# Classic multi-class classification

## I. Iris classification

### Dataset Description
- The classic Hello World example for classification
- Tutorial: http://machinelearningmastery.com/multi-class-classification-tutorial-keras-deep-learning-library/
- Dataset: http://archive.ics.uci.edu/ml/datasets/Iris
- Attributes

|sepal length (cm)|sepal width (cm)|petal length (cm)|petal width (cm)|class|
|-|-|-|-|-|
|5.1|3.5|1.4|0.2|Iris-setosa|

### Preparing the environment

In [31]:
import numpy
import pandas
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

### Training steps

#### 1. Preparing the data
- Download the dataset into the `data` directory and load it into memory. The CSV file has no header row, hence `header=None`

In [25]:
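# Load the dataset; the CSV has no header row
dataframe = pandas.read_csv("data/iris.csv", header=None)
dataset = dataframe.values
# Columns 0-3 hold the four measurements, column 4 holds the class label
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]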

- Inspect the first sample and its label

In [26]:
X[0,], Y[0]

(array([ 5.1,  3.5,  1.4,  0.2]), 'Iris-setosa')

- Numerically encode the output classes with one-hot encoding

|Iris-setosa|Iris-versicolor|Iris-virginica|
|-|-|-|
|1|0|0|
|0|1|0|
|0|0|1|

In [34]:
# Map each class name to an integer (0, 1, 2), then expand each integer into a one-hot row
encoder = LabelEncoder()
encoder.fit(Y)
one_hot_Y = np_utils.to_categorical(encoder.transform(Y))
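
The two encoding steps above can be sketched in plain Python to show what ends up in `one_hot_Y`. This is an illustration only, not the library implementation: the four-element `labels` list is made up for the example, and the result uses integers where `to_categorical` returns floats.

```python
# Hypothetical mini-sample of class labels (not the full dataset)
labels = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]

# Step 1: assign each distinct class name an integer, in sorted order
classes = sorted(set(labels))
class_to_index = {name: index for index, name in enumerate(classes)}
encoded = [class_to_index[name] for name in labels]

# Step 2: turn each integer into a row with a single 1 at that index
one_hot = [
    [1 if index == code else 0 for index in range(len(classes))]
    for code in encoded
]

# Decoding goes the other way: the position of the 1 selects the class name
decoded = [classes[row.index(1)] for row in one_hot]

print(encoded)  # [0, 2, 1, 0]
print(one_hot)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]]
```

Sorting the class names first keeps the column order stable, which matches the column order of the table above.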

#### 2. Designing the network

- Each data point has 4 attributes, so the input vector has length 4
- The output vector is a one-hot encoding of length 3, one element per class
- A network with a single hidden layer can therefore use this topology

> 4 input variables -> 5 neurons -> 3 outputs

- The hidden layer uses 5 neurons to show that its width does not have to match the number of input variables
- In Keras code, the model looks like this

In [36]:
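def multiclass_model():
    # create model
    model = Sequential()

    # Hidden layer: 5 ReLU neurons fed by the 4 input attributes
    # (`init` is the Keras 1 argument name; Keras 2 renamed it to `kernel_initializer`)
    model.add(Dense(5, input_dim=4, init='normal', activation='relu'))
    # Output layer: softmax yields one probability per class, summing to 1,
    # which suits categorical_crossentropy with one-hot targets
    model.add(Dense(3, init='normal', activation='softmax'))

    # Adam optimizer with categorical cross-entropy (logarithmic) loss
    # TODO: experiment with other loss functions and optimizers
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model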
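
To make the 4 -> 5 -> 3 topology concrete, here is a minimal plain-Python sketch of one forward pass and the resulting loss for a single sample. It is not how Keras computes things internally. The helper names, the random weight scale, and the use of a softmax output are assumptions made for illustration.

```python
import math
import random

def dense(inputs, weights, biases):
    """Affine layer: weights has one row per input and one column per neuron."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    # Subtract the maximum before exponentiating for numerical stability
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot_target, probabilities):
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

def random_layer(n_in, n_out, rng):
    # Small Gaussian weights and zero biases; the 0.05 scale is an arbitrary choice
    weights = [[rng.gauss(0.0, 0.05) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

rng = random.Random(7)
hidden_w, hidden_b = random_layer(4, 5, rng)   # 4 inputs -> 5 neurons
output_w, output_b = random_layer(5, 3, rng)   # 5 neurons -> 3 outputs

sample = [5.1, 3.5, 1.4, 0.2]   # first row of the dataset
target = [1, 0, 0]              # one-hot vector for Iris-setosa

hidden = relu(dense(sample, hidden_w, hidden_b))
probabilities = softmax(dense(hidden, output_w, output_b))
loss = categorical_crossentropy(target, probabilities)

# Trainable parameters: weights plus biases for each layer
n_params = (4 * 5 + 5) + (5 * 3 + 3)
print(len(hidden), len(probabilities), n_params)  # 5 3 43
```

With untrained weights the three probabilities are close to uniform, so the loss starts near ln 3; training lowers it by pushing probability toward the correct class.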

#### 3. Create a classifier

In [37]:
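# Wrap the model so scikit-learn can train and score it
# (`nb_epoch` is the Keras 1 argument name; Keras 2 renamed it to `epochs`)
classifier = KerasClassifier(build_fn=multiclass_model, nb_epoch=200, batch_size=5, verbose=0)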

#### 4. Evaluate the model

In [38]:
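# Fix the random seed so the fold assignment is repeatable
seed = 7
numpy.random.seed(seed)
# 10-fold cross-validation on shuffled data: train on 9 folds, score on the held-out fold
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(classifier, X, one_hot_Y, cv=kfold)
# Report the mean and standard deviation of the per-fold accuracy
print("Baseline: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))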

Baseline: 96.00% (4.42%)
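
The evaluation above relies on 10-fold cross-validation. The sketch below shows in plain Python how 150 samples split into ten train/test partitions. `kfold_indices` is a hypothetical helper written for this illustration, not scikit-learn's implementation; its shuffle order will not match `KFold`, only the fold sizes do.

```python
import math
import random

def kfold_indices(n_samples, n_splits, seed):
    """Return (train, test) index lists, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # The first n_samples % n_splits folds get one extra sample
    base, extra = divmod(n_samples, n_splits)
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]

    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds

folds = kfold_indices(n_samples=150, n_splits=10, seed=7)
train, test = folds[0]
print(len(folds), len(train), len(test))  # 10 135 15

# With batch_size=5, one pass over a training fold takes this many weight updates
updates_per_epoch = math.ceil(len(train) / 5)
print(updates_per_epoch)  # 27
```

Each model is therefore trained on 135 samples and scored on the remaining 15. The two numbers in the printed baseline are the mean and the standard deviation of the ten per-fold accuracies.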
