**Classifier**

Classification is the task of predicting which class an input belongs to. A classifier can be binary, choosing between two outcomes (yes/no, spam/not spam, hot/cold), or multi-class, choosing among several categories (blue, yellow, red, or green). Classification is one of the most common uses of deep learning and applies to a wide range of problems.
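To make the difference concrete, here is a minimal sketch of how raw model outputs are usually turned into a class in each case. The scores and the color names are made-up illustrative values, not outputs of any model in this notebook.

```python
import numpy as np

# Binary case: one score per sample, squashed to a probability by a sigmoid.
binary_scores = np.array([-2.0, 0.3, 4.1])           # illustrative raw outputs
binary_probs = 1.0 / (1.0 + np.exp(-binary_scores))  # sigmoid
binary_labels = (binary_probs >= 0.5).astype(int)    # 1 = "yes", 0 = "no"

# Multi-class case: one score per class, normalized by a softmax.
class_names = ["blue", "yellow", "red", "green"]
multi_scores = np.array([[2.0, 0.1, 0.3, -1.0],
                         [0.2, 0.1, 3.5, 0.0]])
shifted = multi_scores - multi_scores.max(axis=1, keepdims=True)  # numerical stability
multi_probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
multi_labels = [class_names[i] for i in multi_probs.argmax(axis=1)]

print(binary_labels)  # [0 1 1]
print(multi_labels)   # ['blue', 'red']
```

A binary classifier therefore needs only one output, while a multi-class classifier needs one output per category.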

Next, we will build a classification model that labels mushrooms as edible or poisonous.

In [1]:
import numpy as np

import pandas as pd

from sklearn.preprocessing import LabelEncoder
from sklearn.utils import shuffle
from sklearn.metrics import confusion_matrix, classification_report

from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import Adam

Using TensorFlow backend.


**The Data**

The Mushroom Classification dataset is a multivariate dataset that contains hypothetical samples corresponding to 23 species of gilled mushrooms. Each sample is labeled as edible or poisonous and described by 22 categorical attributes, giving the 23 columns listed below:

1. Classes:
 * edible = e
 * poisonous = p
2. Cap shape
3. Cap surface
4. Cap color
5. Bruises
6. Odor
7. Gill attachment
8. Gill spacing
9. Gill size
10. Gill color
11. Stalk shape
12. Stalk root
13. Stalk surface above ring
14. Stalk surface below ring
15. Stalk color above ring
16. Stalk color below ring
17. Veil type
18. Veil color
19. Ring number
20. Ring type
21. Spore print color
22. Population
23. Habitat

More information at [Kaggle Mushroom dataset](https://www.kaggle.com/uciml/mushroom-classification)

**Loading a csv file**

The data is stored in a CSV file. We will read it into a pandas DataFrame and later split it into separate tables for the inputs and the outputs.

In [2]:
import os.path
path = "Datasets/mushrooms.csv"
if os.path.isfile(path) :
    dataset = pd.read_csv(path)
else:
    dataset = pd.read_csv("2-Basic-neural-networks/" + path)

print("The dataset examples:")
dataset

The dataset examples:


Unnamed: 0,class,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,...,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,p,x,s,n,t,p,f,c,n,k,...,s,w,w,p,w,o,p,k,s,u
1,e,x,s,y,t,a,f,c,b,k,...,s,w,w,p,w,o,p,n,n,g
2,e,b,s,w,t,l,f,c,b,n,...,s,w,w,p,w,o,p,n,n,m
3,p,x,y,w,t,p,f,c,n,n,...,s,w,w,p,w,o,p,k,s,u
4,e,x,s,g,f,n,f,w,b,k,...,s,w,w,p,w,o,e,n,a,g
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8119,e,k,s,n,f,n,a,c,b,y,...,s,o,o,p,o,o,p,b,c,l
8120,e,x,s,n,f,n,a,c,b,y,...,s,o,o,p,n,o,p,b,v,l
8121,e,f,s,n,f,n,a,c,b,n,...,s,o,o,p,o,o,p,b,c,l
8122,p,k,y,n,f,y,f,c,n,b,...,k,w,w,p,w,o,e,w,v,l


**Separating the inputs and outputs**

This dataset needs to be organized and preprocessed before it can be fed to the neural network. Let's start by separating the input values into one table (X) and the output values into another (Y).

In [3]:
# Getting the input data
X = dataset.iloc[:,1:]
# Getting the output
Y = dataset.iloc[:,0:1]

In [4]:
X.head()

Unnamed: 0,cap-shape,cap-surface,cap-color,bruises,odor,gill-attachment,gill-spacing,gill-size,gill-color,stalk-shape,...,stalk-surface-below-ring,stalk-color-above-ring,stalk-color-below-ring,veil-type,veil-color,ring-number,ring-type,spore-print-color,population,habitat
0,x,s,n,t,p,f,c,n,k,e,...,s,w,w,p,w,o,p,k,s,u
1,x,s,y,t,a,f,c,b,k,e,...,s,w,w,p,w,o,p,n,n,g
2,b,s,w,t,l,f,c,b,n,e,...,s,w,w,p,w,o,p,n,n,m
3,x,y,w,t,p,f,c,n,n,e,...,s,w,w,p,w,o,p,k,s,u
4,x,s,g,f,n,f,w,b,k,t,...,s,w,w,p,w,o,e,n,a,g


In [5]:
Y.head()

Unnamed: 0,class
0,p
1,e
2,e
3,p
4,e


**Encoding the labels**

Remember that neural networks are mathematical functions, but our labels and feature values are letters. We need to convert them to a numerical form before the network can use them.

Since we have only two categories, edible or poisonous, they will be represented by either 0 or 1, so we are going to have one output.

In [6]:
# Change the classes to numerical values (edible = 1, poisonous = 0)
Y = Y.replace('e', 1)
Y = Y.replace('p', 0)
Y = Y.astype({"class": int})
Y.head()

Unnamed: 0,class
0,0
1,1
2,1
3,0
4,1


In [7]:
# One-hot encode every categorical feature column
def One_Hot_Encode_everything(dataframe):
    new_table = None
    # Iterate over the argument, not the global X, so the function works on any DataFrame
    for column in dataframe.columns:
        # dtype=int keeps the encoded values as 0/1 instead of booleans
        new_column = pd.get_dummies(dataframe[column], prefix=column, dtype=int)
        if new_table is None:
            new_table = new_column
        else:
            new_table = pd.concat([new_table, new_column], axis=1)
    return new_table
X = One_Hot_Encode_everything(X)
X.head()

Unnamed: 0,cap-shape_b,cap-shape_c,cap-shape_f,cap-shape_k,cap-shape_s,cap-shape_x,cap-surface_f,cap-surface_g,cap-surface_s,cap-surface_y,...,population_s,population_v,population_y,habitat_d,habitat_g,habitat_l,habitat_m,habitat_p,habitat_u,habitat_w
0,0,0,0,0,0,1,0,0,1,0,...,1,0,0,0,0,0,0,0,1,0
1,0,0,0,0,0,1,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0
2,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,1,0,0,0
3,0,0,0,0,0,1,0,0,0,1,...,1,0,0,0,0,0,0,0,1,0
4,0,0,0,0,0,1,0,0,1,0,...,0,0,0,0,1,0,0,0,0,0


**Separating the train and test data**

The network needs to perform well on data it has not seen during training. So we hold out a small portion of the dataset from the training process and use it later to evaluate the model's performance.

In [8]:
# Shuffle the dataset
X, Y = shuffle(X, Y, random_state=0)

# 80% for training and 20% for test
p = int(len(dataset)* 0.8 )

x_train = X[:p]
x_test = X[p:]

y_train = Y[:p]
y_test = Y[p:]
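The same shuffle-and-slice split can also be written with scikit-learn's `train_test_split`. The sketch below uses a tiny made-up array rather than the mushroom data; `stratify` is an optional extra that keeps the class proportions equal in both parts, which the manual split above does not guarantee.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Tiny illustrative dataset (assumption): 10 samples, 2 features, balanced labels.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

# 80% for training and 20% for test, shuffled with a fixed seed.
x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

print(x_tr.shape, x_te.shape)  # (8, 2) (2, 2)
```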

**Building a classification model** 

In [9]:
print("Build your neural network here.")

Build your neural network here.
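One possible starting point for this exercise is sketched below. It is not the reference solution: the helper name `build_classifier`, the single hidden layer, and its 16 units are arbitrary assumptions. The only parts dictated by the problem are the input width (one value per one-hot column) and the single sigmoid output for the edible/poisonous decision.

```python
from keras.layers import Dense, Input
from keras.models import Model
from keras.optimizers import Adam


def build_classifier(n_features, hidden_units=16):
    """Return a small compiled binary classifier (hypothetical helper)."""
    inputs = Input(shape=(n_features,))
    hidden = Dense(hidden_units, activation="relu")(inputs)
    # One sigmoid unit: the output is the estimated probability of class 1 (edible).
    outputs = Dense(1, activation="sigmoid")(hidden)
    model = Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer=Adam(), loss="binary_crossentropy", metrics=["accuracy"])
    return model


# 10 is a placeholder width; in this notebook it would be x_train.shape[1].
model = build_classifier(n_features=10)
model.summary()
```

Binary cross-entropy is the usual loss for a single sigmoid output. A multi-class problem would instead use one output per class with a softmax activation.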


**Training the model**

Our data is now separated into inputs and outputs, and into training and test sets, and the architecture is defined. All that is left is to fit the model to the training data (train it).

In [10]:
print("Train your model here.")

Train your model here.
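As a hedged illustration of what a training call can look like, the sketch below fits a very small Keras model on synthetic stand-in data so that it runs on its own. The data, the layer sizes, the number of epochs, and the batch size are all assumptions chosen for brevity, not tuned values.

```python
import numpy as np
from keras.layers import Dense, Input
from keras.models import Model

# Synthetic stand-in data (assumption): 200 samples with 10 binary features.
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 2, size=(200, 10)).astype("float32")
y_demo = x_demo[:, :1].copy()  # the label equals the first feature, so it is learnable

inputs = Input(shape=(10,))
outputs = Dense(1, activation="sigmoid")(Dense(8, activation="relu")(inputs))
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# validation_split holds out the last 20% of the samples to monitor overfitting.
history = model.fit(x_demo, y_demo, epochs=5, batch_size=32,
                    validation_split=0.2, verbose=0)
print(sorted(history.history.keys()))
```

With the notebook's data, the call would take `x_train` and `y_train` instead. Depending on the pandas and Keras versions, the DataFrames may need converting first, for example with `x_train.to_numpy().astype("float32")`.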


**Evaluating the trained neural network**

A good way to evaluate a binary classification model is to compute its accuracy, its mean error, and the standard deviation of the error.

In [11]:
print("Find the model's accuracy, mean error and standard deviation.")

Find the model's accuracy, mean error and standard deviation.
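The three quantities can be computed with NumPy alone, as in the sketch below. It assumes that "error" means the absolute difference between the true label and the predicted probability, and it uses made-up labels and probabilities in place of real model predictions.

```python
import numpy as np

# Illustrative values (assumption): true labels and predicted probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.1, 0.7, 0.8, 0.3])

# Accuracy: fraction of samples whose thresholded prediction matches the label.
y_pred = (y_prob >= 0.5).astype(int)
accuracy = np.mean(y_pred == y_true)

# Per-sample absolute error, then its mean and standard deviation.
errors = np.abs(y_true - y_prob)
mean_error = errors.mean()
std_error = errors.std()

print("accuracy:", accuracy)  # 0.75
print("mean error:", round(mean_error, 3))  # 0.325
print("error std:", round(std_error, 3))
```

With a trained model, `y_prob` would come from `model.predict(x_test)` flattened to one dimension. The `confusion_matrix` and `classification_report` functions imported at the top of the notebook can then give a per-class breakdown of `y_pred`.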
