# Project: Multiclass Classification Of Flower Species
With [Iris Flowers Classification Dataset]

The iris flower dataset is a well-studied problem, and as such we can expect to achieve a model accuracy in the range of 95% to 97%. This provides a good target to aim for when developing our models. It is also a good problem for practicing with neural networks because all 4 input variables are numeric and share the same unit (centimeters). Each instance describes the measurements of an observed flower, and the output variable is the specific iris species. The attributes of this dataset can be summarized as follows:
- Sepal length in centimeters.
- Sepal width in centimeters.
- Petal length in centimeters.
- Petal width in centimeters.
- Class.

This is a multiclass classification problem, meaning that there are more than two classes to be predicted; in fact, there are **three flower species**. This is an important type of problem on which to practice with neural networks because the three class values require specialized handling.

You can also download the iris flowers dataset from the UCI Machine Learning Repository.

In [1]:
import numpy as np
import pandas as pd

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

from keras.utils import np_utils

from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

np.random.seed(47)  # Fix the NumPy random seed to help make the stochastic training process reproducible.

Using TensorFlow backend.


In [11]:
# The output variable in the dataset contains strings, so it is easiest to load the data using pandas.
df = pd.read_csv('iris.csv', header=None)
data = df.values  # convert to a NumPy array so it can be sliced into X and y
type(df), type(data)

(pandas.core.frame.DataFrame, numpy.ndarray)

In [14]:
df.head(2)

     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa


In [17]:
X = data[:, 0:4].astype(float)
y = data[:, -1]

print(X[:3], y[:3])

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]] ['Iris-setosa' 'Iris-setosa' 'Iris-setosa']


In [24]:
df[4].value_counts()

Iris-virginica     50
Iris-setosa        50
Iris-versicolor    50
Name: 4, dtype: int64

### One hot encoding for the output variable y (multiclass)
The output variable contains three different string values. When modeling multiclass classification problems with neural networks, it is good practice to reshape the output attribute from a vector holding one class value per instance into a matrix with one boolean column per class, indicating whether or not a given instance belongs to that class. This is called one hot encoding, or creating dummy variables from a categorical variable.
- **`LabelEncoder()`**: encodes the strings consistently as **integers**
- **`to_categorical()`**: converts the vector of **integers** to a one hot encoding

In [30]:
encoder = LabelEncoder().fit(y)
encoded_y = encoder.transform(y)

dummy_y = np_utils.to_categorical(encoded_y)
print(dummy_y[:3])

[[1. 0. 0.]
 [1. 0. 0.]
 [1. 0. 0.]]
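
The two steps above can also be sketched in plain NumPy, which makes the mapping from strings to integers to indicator rows explicit. This is only an illustration, not how `LabelEncoder` or `to_categorical` are implemented internally, and the short `labels` array is made up for the example.

```python
import numpy as np

# Hypothetical sample of labels, for illustration only.
labels = np.array(['Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa'])

# Step 1: map each distinct string to an integer.
# np.unique returns the sorted classes and, for each label, its index among them.
classes, encoded = np.unique(labels, return_inverse=True)

# Step 2: one hot encode by selecting rows of an identity matrix.
one_hot = np.eye(len(classes))[encoded]

print(classes)
print(encoded)
print(one_hot)
```

Because the classes are sorted alphabetically, `Iris-setosa` maps to column 0, which matches the `[1. 0. 0.]` rows printed above for the first three instances.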


### Define the Single Layered Baseline NN model

The Keras library provides wrapper classes that allow you to use neural network models developed with Keras in scikit-learn. There is a **`KerasClassifier()`** class in Keras that can be used as an Estimator in scikit-learn, the base type of model in the library. The **`KerasClassifier()`** takes a function as an argument. This function must return the constructed neural network model, ready for training.
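
The idea of handing a model-building function to a wrapper can be illustrated with a toy example. Everything here is hypothetical: `TinyClassifierWrapper`, `MajorityModel`, and `build_majority_model` are invented names, and this is a sketch of the general pattern, not the actual Keras code.

```python
class TinyClassifierWrapper:
    """Hypothetical stand-in for a scikit-learn style wrapper (not Keras code)."""

    def __init__(self, build_fn, **fit_params):
        self.build_fn = build_fn      # function that returns a fresh model
        self.fit_params = fit_params  # forwarded to the model's fit()

    def fit(self, X, y):
        # In this sketch a new model is built on every call to fit().
        self.model_ = self.build_fn()
        self.model_.fit(X, y, **self.fit_params)
        return self


class MajorityModel:
    """Toy model that always predicts the most common training label."""

    def fit(self, X, y, epochs=1):
        self.epochs_seen = epochs
        self.prediction = max(set(y), key=list(y).count)

    def predict(self, X):
        return [self.prediction for _ in X]


def build_majority_model():
    return MajorityModel()


wrapper = TinyClassifierWrapper(build_fn=build_majority_model, epochs=3)
wrapper.fit([[1], [2], [3]], ['a', 'b', 'b'])
print(wrapper.model_.predict([[4]]), wrapper.model_.epochs_seen)
```

The wrapper never needs to know how the model is constructed; it only calls the function it was given and forwards the extra keyword arguments to `fit()`.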

Write a function that will create a **baseline neural network** for the iris classification problem.
- [Input]
- [Hidden]
  - Create a simple fully connected network with one hidden layer containing **8 neurons**.
  - The hidden layer uses a **rectifier** (ReLU) activation function, and the network uses the efficient **Adam** gradient descent optimization algorithm with a logarithmic loss function, which is called **categorical_crossentropy** in Keras.
- [Output]
  - Because we used a one hot encoding for our iris dataset, the output layer must create **3 output** values, one for each class.
  - The output with the **largest value** will be taken as the class predicted by the model. We use a **softmax** activation function in the output layer. This ensures that the output values lie between 0 and 1 and sum to 1, so they may be used as predicted probabilities.

The network topology can be summarized as:
- 4 inputs -> [8 hidden nodes] -> 3 outputs
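
To make this topology concrete, here is a minimal NumPy sketch of a single forward pass through a 4 -> 8 -> 3 network with a ReLU hidden layer and a softmax output. The weights are random and untrained, so the predicted class is meaningless; the point is only to show the shapes, the softmax normalization, and the arg-max decision. The helper names (`relu`, `softmax`, `forward`) are chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights, for illustration only (an untrained network).
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)   # input -> hidden
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)   # hidden -> output

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(X):
    hidden = relu(X @ W1 + b1)
    return softmax(hidden @ W2 + b2)

X_sample = np.array([[5.1, 3.5, 1.4, 0.2]])   # first row of the dataset
probs = forward(X_sample)
predicted_class = probs.argmax(axis=1)        # index of the largest output

# Trainable parameters: (4*8 + 8) + (8*3 + 3)
n_params = W1.size + b1.size + W2.size + b2.size
print(probs.shape, probs.sum(axis=1), predicted_class, n_params)
```

Counting weights and biases this way gives 67 trainable parameters for this topology.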

In [33]:
def baseline_model():
    model = Sequential()
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
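
The `categorical_crossentropy` loss named in `compile()` can be sketched in NumPy as the mean negative log-probability assigned to the true class. The function below is an illustrative reimplementation, not the Keras source; the clipping constant `eps` and the example probabilities are assumptions made for this sketch.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0); eps is an assumption of this sketch.
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    # Only the true class contributes, because y_true is one hot encoded.
    return -np.mean(np.sum(y_true * np.log(y_prob), axis=1))

# Two instances with one hot encoded targets (classes 0 and 1).
y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Hypothetical predictions: one fairly confident and correct, one uniform.
confident = np.array([[0.9, 0.05, 0.05],
                      [0.1, 0.8, 0.1]])
uniform = np.full((2, 3), 1.0 / 3.0)

loss_confident = categorical_crossentropy(y_true, confident)
loss_uniform = categorical_crossentropy(y_true, uniform)
print(loss_confident, loss_uniform)
```

A uniform guess over three classes gives a loss of ln(3), roughly 1.10, so a loss well below that value indicates the model is doing better than guessing.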

Now create our **KerasClassifier** for use in scikit-learn. We can also pass arguments in the construction of the KerasClassifier class that will be passed on to the `fit()` function internally used to train the neural network.

Here, we pass the number of **epochs** as 200 and **batch_size** as 5 to use when training the model. Progress output is also turned off during training by setting **verbose** to 0.
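
As a rough back-of-the-envelope sketch, these settings determine how many weight updates each model receives. The calculation assumes the 150-row dataset and the 10-fold cross-validation used in the next section, so each model trains on nine tenths of the data.

```python
import math

n_samples = 150    # rows in the iris dataset
n_splits = 10      # folds used for cross-validation in the next section
epochs = 200
batch_size = 5

# Each fold holds out one tenth of the rows for testing.
train_size = n_samples - n_samples // n_splits
updates_per_epoch = math.ceil(train_size / batch_size)
total_updates = updates_per_epoch * epochs

print(train_size, updates_per_epoch, total_updates)
```

Under these assumptions each model sees 135 training rows, performs 27 updates per epoch, and 5400 updates in total.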

In [34]:
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

### Evaluate The Model with k-Fold Cross-Validation
Now evaluate the neural network model on our dataset using k-fold cross-validation. Here, we set the number of folds to 10 (an excellent default) and shuffle the data before partitioning it.

In [35]:
kfold = KFold(n_splits=10, shuffle=True, random_state=47)
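
What shuffled k-fold splitting does can be sketched with NumPy alone. The helper `kfold_indices` below is a hypothetical illustration of the idea; it will not reproduce the exact row assignments that scikit-learn's `KFold` produces for the same seed.

```python
import numpy as np

def kfold_indices(n_samples, n_splits, seed):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold splitting."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)        # shuffle once up front
    folds = np.array_split(indices, n_splits)   # near-equal chunks
    for i in range(n_splits):
        test_idx = folds[i]
        # All remaining folds together form the training set.
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield train_idx, test_idx

splits = list(kfold_indices(n_samples=150, n_splits=10, seed=47))
train_idx, test_idx = splits[0]
print(len(splits), len(train_idx), len(test_idx))
```

Every row appears in exactly one test fold, so each of the 10 models is evaluated on 15 rows it never saw during training.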

Now we can evaluate our model (estimator) on our dataset (X and dummy_y). `cross_val_score()` returns an array of scores, one for each of the 10 models constructed for the splits.

In [37]:
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print('Accuracy -> Mean: %.2f (+/- %.2f)' %(results.mean(), results.std()))

Accuracy -> Mean: 0.97 (+/- 0.03)
