[This is the article that I'm following](https://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/)
by Jason Brownlee on June 2, 2016 in Deep Learning

# Multi-Class Classification Tutorial with the Keras Deep Learning Library

In [2]:
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder # encodes each category label as an integer, because algorithms can only process numbers.
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


<a id='seed'></a>

In [3]:
seed = 7
np.random.seed(seed)

<img src='np.random.seed.png'>

In [4]:
np.random.seed(seed); np.random.rand(4) # testing the np.random.rand function with the seed

array([0.07630829, 0.77991879, 0.43840923, 0.72346518])

In [5]:
np.random.seed(seed); np.random.rand(4) # testing the np.random.rand function with the seed

array([0.07630829, 0.77991879, 0.43840923, 0.72346518])

This seed allows me to generate the same set of numbers from the random function every time I run it.

---

## Loading the data set
> Note that you are making a prediction on categories based on continuous variables.

In [9]:
df = pd.read_csv('iris.csv')
print(df.head())

print('='*40)
print(df.shape)

print('='*40)
print(df.dtypes)

   sepal_length  sepal_width  petal_length  petal_width   class
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
(100, 5)
sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
class            object
dtype: object


In [10]:
X_col_names = ['sepal_length','sepal_width','petal_length','petal_width']
y_col_names = ['class']

X = df[X_col_names]
y = df[y_col_names]

In [11]:
print(X.head())

print('='*40)
print(y.head())

   sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2
    class
0  setosa
1  setosa
2  setosa
3  setosa
4  setosa


## Encode the target variable

In [12]:
# checking the number of categories in the column and their counts
# value_counts() is a series object method
y['class'].value_counts() 

versicolor    34
virginica     34
setosa        32
Name: class, dtype: int64

What is one-hot encoding?

One-hot encoding reformats a categorical column into numeric columns that a model can process. Each category gets its own column, and a 1 in that column marks the rows that belong to the category.

To reiterate:

1. Algorithms can only process numbers, so we have to translate the categories into numbers.
2. Each category is given its own new column. A row has a 1 in the column of its category and a 0 in every other column.
3. The same rule applies to every category, so each row contains exactly one 1.
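
As a minimal sketch of the idea, written in plain Python rather than with the Keras helper used later in this notebook: every distinct category gets a column, and each row holds a 1 in the column of its own category. The function name one_hot is made up for illustration.

```python
def one_hot(labels):
    """Return (classes, rows): one column per class, one 0/1 row per label."""
    classes = sorted(set(labels))                      # one column per distinct category
    column_of = {c: i for i, c in enumerate(classes)}  # category -> column position
    rows = []
    for label in labels:
        row = [0] * len(classes)
        row[column_of[label]] = 1                      # mark the column of this row's category
        rows.append(row)
    return classes, rows


classes, rows = one_hot(["setosa", "virginica", "versicolor", "setosa"])
print(classes)  # ['setosa', 'versicolor', 'virginica']
for row in rows:
    print(row)
```

Each printed row contains exactly one 1, which is where the name one-hot comes from.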

___

In [13]:
encoder = LabelEncoder()
encoder.fit(y)

# Note that y, the target variable, is a DataFrame, which is 2D. The label encoder expects a 1D array, hence the warning below.
# Use .ravel() to flatten y into 1D.

  y = column_or_1d(y, warn=True)


LabelEncoder()

In [14]:
y.head() 
# remember that y here is 2D: it is a pandas DataFrame
# it has to become a 1D array because that is what the encoder expects, based on the warning above.

    class
0  setosa
1  setosa
2  setosa
3  setosa
4  setosa


In [15]:
y.ravel() 
# .ravel() is a NumPy method, so y has to be converted into a NumPy array before .ravel() can be used

AttributeError: 'DataFrame' object has no attribute 'ravel'

In [None]:
y.values[:10]
# use the .values attribute to turn the DataFrame into a NumPy array first, because .ravel() only works on
# a NumPy array

In [None]:
y.values.ravel()[:10] 
# now .ravel() works on the NumPy array and returns a 1D array

In [16]:
print('checking the object type of this new structure : {}'.format(type(y.values.ravel())))
print('checking the shape of this structure :  {}'.format((y.values.ravel().shape)))

checking the object type of this new structure : <class 'numpy.ndarray'>
checking the shape of this structure :  (100,)


In [17]:
# Finally, the encoder works, and this is what I have after the transformation.
# The LabelEncoder only transforms each category into a unique integer.
# The result is not yet a one-hot encoding.
# In a one-hot encoding each category has its own column, and a row holds 1 for its category and 0 elsewhere.

encoder.fit(y.values.ravel())

encoded_y = encoder.transform(y.values.ravel())
encoded_y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
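
The mapping in this output can be reproduced by hand. LabelEncoder keeps the sorted unique labels (in its classes_ attribute) and encodes each label as its position in that sorted list, which is why setosa, versicolor and virginica become 0, 1 and 2. The sketch below mimics that behaviour in plain Python; label_encode is a hypothetical helper, not part of scikit-learn.

```python
def label_encode(labels):
    """Map each label to its position in the sorted list of unique labels."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[label] for label in labels]


classes, codes = label_encode(["virginica", "setosa", "versicolor", "setosa"])
print(classes)  # ['setosa', 'versicolor', 'virginica']
print(codes)    # [2, 0, 1, 0]

# Decoding is a lookup in the other direction
decoded = [classes[c] for c in codes]
print(decoded)  # ['virginica', 'setosa', 'versicolor', 'setosa']
```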

In [18]:
# This is where we transform the categories that are label encoded into one-hot encoding.
# These are what we call 'dummy variables', variables that are represented by a 1 or 0 in each row.

dummy_y = np_utils.to_categorical(encoded_y)
dummy_y[:10]

array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.]], dtype=float32)

In [19]:
# how about my X?
X.head()

   sepal_length  sepal_width  petal_length  petal_width
0           5.1          3.5           1.4          0.2
1           4.9          3.0           1.4          0.2
2           4.7          3.2           1.3          0.2
3           4.6          3.1           1.5          0.2
4           5.0          3.6           1.4          0.2


In [20]:
# convert to numpy array
X = X.values
X[:10]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       [5. , 3.4, 1.5, 0.2],
       [4.4, 2.9, 1.4, 0.2],
       [4.9, 3.1, 1.5, 0.1]])


## Setting up the neural network model

QUOTE: The Keras library provides wrapper classes to allow you to use neural network models developed with Keras in scikit-learn.

1. What the author is saying here is that we want to evaluate the Keras model using scikit-learn methods. To do this we have to use wrapper classes.
2. One such wrapper class is KerasClassifier.
3. KerasClassifier allows you to use neural network models developed in Keras with scikit-learn.

In [21]:
# Creating the Keras model

model = Sequential() 
# initializing the model as a Sequential model: the most common model type, a linear stack of layers

model.add(Dense(8, input_dim=4, activation='relu'))
# adding a hidden layer with 8 neurons. input_dim refers to the number of predictors (4 here).
# activation='relu' stands for the Rectified Linear Unit activation function.
# This function is here to capture non-linear patterns in the data.
# If we only capture linear patterns in the data then our predictive power is limited.

model.add(Dense(3, activation='softmax'))
# This is the last layer of the model, with 3 neurons because each neuron represents one class / category
# we are trying to predict. softmax turns the 3 outputs into probabilities that sum to 1.

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# .compile configures the model's learning process.
# the previous steps set up the structure of the model
# configuration requires you to specify how the model is trained: the optimizer,
    # the loss function, which is the function that the optimizer is trying to minimize,
    # and the metrics, which are used to monitor the training.
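
To make the loss concrete: for one sample with a one-hot target, categorical cross-entropy is the negative log of the probability the model assigned to the true class. The sketch below is plain Python with made-up probabilities. It only illustrates the formula; the Keras version works on whole batches and adds its own numerical safeguards.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: y_true is one-hot, y_pred holds class probabilities."""
    # eps guards against log(0); the value 1e-7 is an assumption for this sketch
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


target = [1, 0, 0]                 # true class is the first one
confident = [0.90, 0.05, 0.05]     # hypothetical prediction, mostly right
unsure = [0.40, 0.30, 0.30]        # hypothetical prediction, barely right

loss_confident = categorical_crossentropy(target, confident)
loss_unsure = categorical_crossentropy(target, unsure)
print(round(loss_confident, 3))  # 0.105
print(round(loss_unsure, 3))     # 0.916
```

The more probability the model puts on the true class, the smaller the loss, and that is what the optimizer pushes towards.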


Activation functions:
<img src='non-linearity.png' width=750>

[More on activation functions -1](https://towardsdatascience.com/activation-functions-and-its-types-which-is-better-a9a5310cc8f)
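
The two activations used in this model are short enough to write out. This is a minimal sketch in plain Python, not the Keras implementation: relu passes positive values through and sets negative values to 0, and softmax turns a list of scores into probabilities that sum to 1. The input numbers are made up.

```python
import math


def relu(x):
    """Rectified Linear Unit: 0 for negative inputs, the input itself otherwise."""
    return max(0.0, x)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


activations = [relu(v) for v in (-2.0, 0.0, 3.5)]
print(activations)  # [0.0, 0.0, 3.5]

probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))   # very close to 1.0
```

The largest score always gets the largest probability, so the predicted class is the position of the biggest value in the softmax output.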

In [22]:
# The code for the model without the commentary like above

model = Sequential() 
model.add(Dense(8, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
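
As a quick sanity check on the size of this network: a Dense layer has one weight per input-unit pair plus one bias per unit. The arithmetic below is a sketch (dense_params is a made-up helper); model.summary() should report the same counts.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


hidden = dense_params(4, 8)   # 4 predictors feeding 8 hidden neurons
output = dense_params(8, 3)   # 8 hidden neurons feeding 3 output neurons
total = hidden + output
print(hidden, output, total)  # 40 27 67
```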

<a id='Sequential()'></a>

In [23]:
# Putting the model into a function so that we can call it 

def sequential_model():
    
    model = Sequential() 
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return model

# We need this function because KerasClassifier's build_fn parameter expects a function that builds and returns the compiled model

<a id='KerasClassifer'></a>

In [24]:
# Creating the KerasClassifier, the wrapper for scikit-learn,
# i.e. it wraps our Keras model in a structure that the scikit-learn API can work with.

estimator = KerasClassifier(build_fn=sequential_model, epochs=200, batch_size=5,  verbose=0)

# These parameters are passed on to the model's fit step: epochs is how many times the model is trained over
# the whole data set, batch_size is how many rows are used for each weight update (not the number of batches),
# and verbose controls how much of the training progress is printed during model training.

Note that no data has been fit to this estimator yet. Creating the KerasClassifier does not train anything; as of now the model has seen no data. The data is passed in later, in the cross_val_score section.
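
To put numbers on epochs and batch_size: batch_size is the number of rows used for one weight update, so one epoch takes ceil(rows / batch_size) updates. The sketch assumes the 100-row data set loaded above and the 10-fold split configured later in this notebook, which leaves 90 rows for training in each fold.

```python
import math

rows_total = 100   # rows in the data set, from df.shape
n_splits = 10      # folds used in the cross-validation
batch_size = 5
epochs = 200

train_rows = rows_total - rows_total // n_splits        # 90 rows train, 10 rows test
updates_per_epoch = math.ceil(train_rows / batch_size)  # batches per pass over the data
updates_per_fold = updates_per_epoch * epochs

print(train_rows, updates_per_epoch, updates_per_fold)  # 90 18 3600
```

A smaller batch_size means more, noisier updates per epoch; a larger one means fewer, smoother updates.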

---

QUOTE: The scikit-learn has excellent capability to evaluate models using a suite of techniques. The gold standard for evaluating machine learning models is k-fold cross validation.

1. I think the reason we are using the wrapper is that we want to use the k-fold cross-validation technique from scikit-learn.
2. Now that the KerasClassifier wrapper is created and configured, we have to set up KFold for cross-validation.

<a id='KFold'></a>

In [25]:
# KFold is from sklearn.model_selection

kfold = KFold(n_splits=10, shuffle=True, random_state=seed) 

1. Note that the seed number has already been stated at the [beginning of the page](#seed).
    - Note that the reference to the 'seed' id has to come after the hash symbol for this internal link to work.

2. Shuffle: shuffle the data before splitting it into 10 folds.
    - The splitting process splits the data into 10 pieces, then uses each piece in turn as the test set, with the remaining 9 pieces as the training set.
    - Each fold produces one score, and the 10 scores are aggregated afterwards.
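
The splitting described in the list can be sketched in plain Python. This is not scikit-learn's implementation and it will not reproduce the exact rows KFold picks; k_fold_indices is a made-up name. It only shows the mechanics: shuffle the row indices once, cut them into 10 pieces, and let each piece take a turn as the test set.

```python
import random


def k_fold_indices(n_samples, n_splits, seed):
    """Yield (train, test) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # seeded shuffle, so the split is repeatable
    # Spread any remainder over the first folds so sizes differ by at most one
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(100, 10, seed=7))
for train, test in folds[:2]:
    print(len(train), len(test))  # 90 10
```

Every row lands in exactly one test set, so each row is used for testing once and for training nine times.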


<a id='cross_val_score'></a>

In [26]:
# Since KFold has been created I can use it in my cross_val_score now.

results = cross_val_score(estimator, X, dummy_y, cv=kfold)

# If I did not use KFold here then I could just specify cv as a number.
# But KFold allows me to specify the seed number and whether or not I want to shuffle the data set,
    # on top of specifying the number of folds. 

1. estimator is my KerasClassifier wrapper that wraps the Keras Sequential model
2. cross_val_score is a function from scikit-learn that trains and tests the model on several different train-test splits of the data and returns one score per split.

IMPORTANT:
1. I had an error executing: results = cross_val_score(estimator, X, dummy_y, cv=kfold). It said something about being unable to pickle an object. I realized that the problem was that I had added the open and close brackets () to my sequential_model function when passing it as the build_fn parameter of KerasClassifier.
2. The parameter accepts the function without the open and close brackets. Of the two lines below, the first is wrong and the second is correct.
    - estimator = KerasClassifier(build_fn=sequential_model(), epochs=200, batch_size=5,  verbose=0)
    - estimator = KerasClassifier(build_fn=sequential_model, epochs=200, batch_size=5,  verbose=0)
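
The difference between the two lines is the difference between passing a function and passing its return value. The sketch below uses stand-ins, not the Keras classes: TinyWrapper and build_model are hypothetical names. It shows why a wrapper wants the function itself, so that it can build a fresh model each time it fits. The stand-in raises a TypeError for clarity, whereas the real wrapper failed here with a pickling error.

```python
def build_model():
    """Stand-in for sequential_model(): returns a new 'model' object on every call."""
    return {"weights": [0.0, 0.0, 0.0]}


class TinyWrapper:
    """Hypothetical stand-in for a scikit-learn style wrapper around a model builder."""

    def __init__(self, build_fn):
        if not callable(build_fn):
            raise TypeError("build_fn must be a function, not an already-built model")
        self.build_fn = build_fn

    def fit(self):
        self.model = self.build_fn()   # a fresh, untrained model for every fit
        return self


wrapper = TinyWrapper(build_fn=build_model)   # correct: no brackets, pass the function
first = wrapper.fit().model
second = wrapper.fit().model
print(first is second)  # False

try:
    TinyWrapper(build_fn=build_model())       # wrong: brackets call it and pass the result
except TypeError as err:
    message = str(err)
    print(message)
```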

In [27]:
results

array([1.        , 1.        , 0.90000001, 0.80000001, 1.        ,
       1.        , 0.40000001, 1.        , 1.        , 1.        ])

In [28]:
results.mean() #91% accuracy, not bad. What is the variation of the accuracy scores?

0.9100000023841858

In [29]:
results.std() # the standard deviation is 18 percentage points across the folds, surprisingly high.

0.18138356904044883
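
To see where the 18% comes from, the same numbers can be recomputed with the standard library. The scores are the fold results printed above, rounded to one decimal. NumPy's .std() defaults to the population standard deviation, which corresponds to statistics.pstdev here.

```python
import statistics

# Fold accuracies from the results array above, rounded
scores = [1.0, 1.0, 0.9, 0.8, 1.0, 1.0, 0.4, 1.0, 1.0, 1.0]

mean = statistics.mean(scores)
population_std = statistics.pstdev(scores)   # divides by n, like NumPy's default
sample_std = statistics.stdev(scores)        # divides by n - 1, slightly larger

print(round(mean, 2))            # 0.91
print(round(population_std, 4))  # 0.1814
print(sample_std > population_std)  # True
print(min(scores))               # 0.4
```

Most of the spread comes from the single fold that scored 0.4. With only 10 test rows per fold, each misclassified row moves that fold's accuracy by 0.1, so fold scores are coarse.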

In [30]:
print("The mean accuracy score from cross validation is %.2f %%" % (results.mean()*100))
print("The standard deviation from cross validation is %.2f %%" % (results.std()*100))

# note that % starts a format specifier, which marks where a value is inserted
# f means format the value as a float
# .2 is the number of decimal places to show
# %% inserts a literal percent sign.

The mean accuracy score from cross validation is 91.00 %
The standard deviation from cross validation is 18.14 %
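
For comparison, the same formatting can be written in three ways in Python. The % style is what this notebook uses; str.format and f-strings are the newer alternatives. The value is the mean accuracy printed earlier.

```python
mean_accuracy = 0.9100000023841858   # results.mean() from above

old_style = "Mean accuracy: %.2f %%" % (mean_accuracy * 100)          # %% gives a literal %
format_method = "Mean accuracy: {:.2f} %".format(mean_accuracy * 100)
f_string = f"Mean accuracy: {mean_accuracy * 100:.2f} %"              # Python 3.6+

print(old_style)      # Mean accuracy: 91.00 %
print(format_method)  # Mean accuracy: 91.00 %
print(f_string)       # Mean accuracy: 91.00 %
```

Only the % style needs the doubled %%, because only there does a single % start a format specifier.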


## Summary

1. Create a seed number to use throughout the exercise.
2. Load the data using pandas read_csv.
3. Convert the values to NumPy arrays.
4. One-hot encode the target variable: first with the LabelEncoder, then with np_utils.to_categorical (from keras.utils import np_utils).
5. Create the Keras [Sequential model](#Sequential()): input, hidden, output, then compile (loss, optimizer, metrics).
6. Wrap the Keras model using the [KerasClassifier](#KerasClassifer), specifying the number of epochs and the batch size.
7. Test the model using [cross_val_score](#cross_val_score) from scikit-learn, after configuring [KFold](#KFold) first.
8. Check the accuracy results, their mean and their standard deviation.

## Last Meta

1. This is a classification problem.
2. We are using an accuracy score.
3. We want to use cross_val_score from the scikit-learn library.
4. To use cross_val_score, the model must expose an interface that scikit-learn recognizes, hence the KerasClassifier wrapper.
5. We are still using a neural net for this problem but evaluating it through scikit-learn's cross_val_score.


__Set up the neural net (Keras) -> wrap the neural net (Keras) -> test the neural net (scikit-learn)__