<a href="https://www.kaggle.com/code/danielfourie/mushroom-classification-keras-ann-100?scriptVersionId=206196307" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<center>
    <h1>Mushroom Classification - safe or poisonous?</h1>
    <img src="https://www.seattlemet.com/discover/wp-content/uploads/2023/12/Where-do-magic-mushrooms-grow-152201838.jpg">
</center>

# <u><b>Data Reading and Preparation</b></u>

In [1]:
#Importing the libraries
import numpy as np
import pandas as pd

In [2]:
#Importing the dataset
dataset = pd.read_csv('/kaggle/input/mushroom-classification/mushrooms.csv')

In [3]:
#Let's look at the top 5 rows
dataset.head()

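| | class | cap-shape | cap-surface | cap-color | bruises | odor | gill-attachment | gill-spacing | gill-size | gill-color | ... | stalk-surface-below-ring | stalk-color-above-ring | stalk-color-below-ring | veil-type | veil-color | ring-number | ring-type | spore-print-color | population | habitat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | p | x | s | n | t | p | f | c | n | k | ... | s | w | w | p | w | o | p | k | s | u |
| 1 | e | x | s | y | t | a | f | c | b | k | ... | s | w | w | p | w | o | p | n | n | g |
| 2 | e | b | s | w | t | l | f | c | b | n | ... | s | w | w | p | w | o | p | n | n | m |
| 3 | p | x | y | w | t | p | f | c | n | n | ... | s | w | w | p | w | o | p | k | s | u |
| 4 | e | x | s | g | f | n | f | w | b | k | ... | s | w | w | p | w | o | e | n | a | g |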


In [4]:
#Let's see if we have any NaNs in our dataset
dataset.isnull().sum()

class                       0
cap-shape                   0
cap-surface                 0
cap-color                   0
bruises                     0
odor                        0
gill-attachment             0
gill-spacing                0
gill-size                   0
gill-color                  0
stalk-shape                 0
stalk-root                  0
stalk-surface-above-ring    0
stalk-surface-below-ring    0
stalk-color-above-ring      0
stalk-color-below-ring      0
veil-type                   0
veil-color                  0
ring-number                 0
ring-type                   0
spore-print-color           0
population                  0
habitat                     0
dtype: int64

**We have no NaNs - this is great!**
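A zero NaN count only rules out values that pandas itself treats as missing. Categorical datasets sometimes mark missing entries with an ordinary string instead, and the UCI description of the mushroom data reports `?` being used this way for `stalk-root`, which is worth verifying on this CSV. The sketch below uses a small toy frame (its values are illustrative, not taken from the dataset) and assumes `?` is the placeholder token.

```python
import pandas as pd

# Toy frame standing in for the mushroom data (values are illustrative only).
toy = pd.DataFrame({
    "stalk-root": ["e", "?", "b", "?", "c"],
    "odor": ["p", "a", "l", "n", "n"],
})

# isnull() reports nothing here, because '?' is an ordinary string.
nan_counts = toy.isnull().sum()

# Count a suspected placeholder token per column instead.
placeholder = "?"  # assumption: the token used to mark missing entries
placeholder_counts = (toy == placeholder).sum()

print(nan_counts.to_dict())
print(placeholder_counts.to_dict())
```

On the real data the same check would be `(dataset == "?").sum()`. With one-hot encoding the placeholder simply becomes one more category, so a non-zero count does not break the pipeline, but it is useful to know about.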

In [5]:
#Let's see our data types for each feature
dataset.dtypes

class                       object
cap-shape                   object
cap-surface                 object
cap-color                   object
bruises                     object
odor                        object
gill-attachment             object
gill-spacing                object
gill-size                   object
gill-color                  object
stalk-shape                 object
stalk-root                  object
stalk-surface-above-ring    object
stalk-surface-below-ring    object
stalk-color-above-ring      object
stalk-color-below-ring      object
veil-type                   object
veil-color                  object
ring-number                 object
ring-type                   object
spore-print-color           object
population                  object
habitat                     object
dtype: object

In [6]:
#Assign our dependent and independent variables
X = dataset.iloc[:, 1:]
y = dataset.iloc[:, 0]

In [7]:
# Encoding categorical data
#encoding y
from sklearn.preprocessing import LabelEncoder
labelencoder_y = LabelEncoder()
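#reshape y into a column vector; LabelEncoder itself expects a 1-dim array, so only y[:, 0] is passed to it
y = np.reshape(y,(-1,1))
y[:,0] = labelencoder_y.fit_transform(y[:, 0]) #y keeps dtype object here, so it is cast before training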
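To read the later metrics correctly it helps to know which integer each class received. `LabelEncoder` sorts the labels it sees, so the mapping can be read back from `classes_`. The sketch below uses a toy label list rather than the real column; `p` stands for poisonous and `e` for edible, as in the dataset.

```python
from sklearn.preprocessing import LabelEncoder

labels = ["p", "e", "e", "p", "e"]  # toy class column: p = poisonous, e = edible

encoder = LabelEncoder()
encoded = encoder.fit_transform(labels)

# classes_ is sorted, so the label at position i is encoded as integer i.
mapping = {label: code for code, label in enumerate(encoder.classes_)}

print(mapping)            # {'e': 0, 'p': 1}
print(encoded.tolist())   # [1, 0, 0, 1, 0]
```

Under this mapping the positive class (1, or `True` after a boolean cast) is the poisonous one.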

In [8]:
#encoding X
X = pd.get_dummies(data=X,columns=X.columns, drop_first=True, dtype='int') #drop_first = True to avoid the dummy variable trap
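A side effect of `drop_first=True` is that a column with only one category produces no dummy columns at all. In the five rows printed above, `veil-type` is always `p`; if that holds for the whole dataset, the feature silently disappears from `X`. A minimal sketch on a toy frame (values are illustrative):

```python
import pandas as pd

toy = pd.DataFrame({
    "odor": ["p", "a", "l", "a"],
    "veil-type": ["p", "p", "p", "p"],  # a single category in every row
})

encoded = pd.get_dummies(toy, columns=toy.columns, drop_first=True, dtype="int")

# 'odor' has three categories (a, l, p); the first one is dropped.
# 'veil-type' has one category, so nothing is left after dropping it.
print(list(encoded.columns))  # ['odor_l', 'odor_p']
```

Dropping a constant feature loses no information, so this behaviour is harmless here, but it explains why the encoded frame can have fewer source features than the raw one.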

In [9]:
#Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
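The split above is purely random. An optional variation, not used in this notebook, is to pass `stratify` so that both subsets keep the same class ratio. The sketch below demonstrates it on toy arrays; the names `X_toy` and `y_toy` are made up for the illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 60 of class 0 and 40 of class 1.
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([0] * 60 + [1] * 40)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

# Both subsets keep the 60/40 class ratio of the full toy set.
print(len(y_tr), int(y_tr.sum()))  # 80 32
print(len(y_te), int(y_te.sum()))  # 20 8
```

With classes as balanced as in this dataset the difference is small, but stratifying removes one source of run-to-run variation in the reported metrics.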

# <u><b>Building and training the model</b></u>

In [10]:
#import the libraries
import keras
from keras.models import Sequential
from keras.layers import Input, Dense
from sklearn.metrics import accuracy_score, classification_report

In [11]:
# Initialising the ANN
classifier = Sequential()
classifier.add(Input(shape=(X_train.shape[1],)))
# Adding first hidden layer
classifier.add(Dense(units = 48, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the second hidden layer
classifier.add(Dense(units = 48, kernel_initializer = 'uniform', activation = 'relu'))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
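To get a feel for the size of this network, the number of trainable parameters can be worked out by hand: each `Dense` layer has one weight per input-output pair plus one bias per unit. The helper names below are hypothetical, and `n_features = 100` is a placeholder for `X_train.shape[1]`, which this notebook does not print.

```python
def dense_params(n_inputs: int, units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * units + units


def ann_params(n_features: int, hidden=(48, 48), n_outputs: int = 1) -> int:
    """Total trainable parameters for input -> hidden layers -> output."""
    sizes = [n_features, *hidden, n_outputs]
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


n_features = 100  # placeholder; use X_train.shape[1] for the real value
print(ann_params(n_features))  # 7249
```

The result can be compared against the total reported by `classifier.summary()`.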

In [12]:
# Fitting the ANN to the Training set
classifier.fit(X_train, y_train.astype('float64'), batch_size = 32, epochs = 3) #ValueError if y_train datatype not changed

Epoch 1/3
204/204 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - accuracy: 0.9112 - loss: 0.4159
Epoch 2/3
204/204 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 0.9991 - loss: 0.0086
Epoch 3/3
204/204 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - accuracy: 1.0000 - loss: 0.0018


<keras.src.callbacks.history.History at 0x7f74b092a530>
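The `204/204` counter in the training log is the number of mini-batches per epoch, which is the training set size divided by the batch size, rounded up. The sketch below checks that arithmetic. The training size of 6,499 is an assumption (the 80% share of an 8,124-row dataset); the notebook itself does not print it.

```python
import math


def steps_per_epoch(n_samples: int, batch_size: int = 32) -> int:
    """Number of mini-batches needed to cover n_samples once."""
    return math.ceil(n_samples / batch_size)


n_train = 6499  # assumption: 80% share of an 8124-row dataset

print(steps_per_epoch(n_train))  # 204
```

The last batch of each epoch is smaller than 32, which is why the count rounds up instead of down.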

In [13]:
# Predicting the Train set results
y_pred_train = classifier.predict(X_train)
y_pred_train = (y_pred_train > 0.5) #50% threshold
# Predicting the Test set results
y_pred_test = classifier.predict(X_test)
y_pred_test = (y_pred_test > 0.5) #50% threshold

204/204 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step
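The 50% threshold turns the sigmoid outputs into class labels. If class 1 is the poisonous class (which is what `LabelEncoder` produces when it sorts `e` before `p`), a lower threshold makes the model more cautious by flagging borderline mushrooms as poisonous. The probabilities and the 0.3 threshold below are made up for illustration.

```python
import numpy as np

# Toy sigmoid outputs shaped like classifier.predict(...): one row per sample.
probs = np.array([[0.02], [0.48], [0.51], [0.97]])

default_labels = (probs > 0.5).ravel()   # same rule as the 50% threshold
cautious_labels = (probs > 0.3).ravel()  # hypothetical lower threshold

print(default_labels.tolist())   # [False, False, True, True]
print(cautious_labels.tolist())  # [False, True, True, True]
```

With the outputs in this notebook already sitting close to 0 or 1, moving the threshold is unlikely to change any prediction, but the trade-off matters on harder datasets.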


In [14]:
#Displaying model's performance on Train and Test set
print("ANN Model's performance on Train, and Test set:")
print(f"The model's accuracy on the training set is: {accuracy_score(y_train.astype('bool'),y_pred_train)*100}%")
print(f"The model's accuracy on the test set is: {accuracy_score(y_test.astype('bool'),y_pred_test)*100}%\n")
print(f"The model's classification report on the test set\n {classification_report(y_test.astype('bool'),y_pred_test)}")

ANN Model's performance on Train, and Test set:
The model's accuracy on the training set is: 100.0%
The model's accuracy on the test set is: 100.0%

The model's classification report on the test set
               precision    recall  f1-score   support

       False       1.00      1.00      1.00       852
        True       1.00      1.00      1.00       773

    accuracy                           1.00      1625
   macro avg       1.00      1.00      1.00      1625
weighted avg       1.00      1.00      1.00      1625



**We have achieved a perfect score on every metric shown: precision, recall, F1 and accuracy are all 1.00 on the 1,625 test samples!**
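For a poison classifier the most costly mistake is a false negative, a poisonous mushroom predicted as edible. The classification report folds this into recall, and counting the four outcomes directly makes it explicit. The helper below is a hypothetical sketch that assumes `True` (or 1) marks the poisonous class; the toy arrays are not the notebook's predictions.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Count the four outcomes of a binary classifier (True = poisonous)."""
    y_true = np.asarray(y_true, dtype=bool).ravel()
    y_pred = np.asarray(y_pred, dtype=bool).ravel()
    return {
        "true_negative": int(np.sum(~y_true & ~y_pred)),
        "false_positive": int(np.sum(~y_true & y_pred)),
        "false_negative": int(np.sum(y_true & ~y_pred)),
        "true_positive": int(np.sum(y_true & y_pred)),
    }


# Toy column vectors, shaped like y_test and y_pred_test.
y_true_toy = [[1], [0], [1], [0], [1]]
y_pred_toy = [[True], [False], [False], [False], [True]]

counts = confusion_counts(y_true_toy, y_pred_toy)
print(counts)
```

On the real arrays the call would be `confusion_counts(y_test.astype('bool'), y_pred_test)`; a perfect score corresponds to zero false negatives and zero false positives.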

<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#f95b4a;
           font-size:110%;
           font-family:Verdana;
           letter-spacing:0.5px">

<p style="padding: 10px;
              color:black;">
Thank you for reading through my notebook on Mushroom Classification. I hope you enjoyed it and found it interesting☺️👍🏻. I will also reply to any comments you have on this notebook. Have a good day!🚀
</p>
</div>