### Keras / TensorFlow Basics II
#### An even better simple example

I'm going to use the `iris` dataset, which is built into sklearn. This is a multi-class classification problem in which 4 features are used to assign each sample to one of 3 classes.

In [None]:
# Imports
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

#### Data Loading / EDA

In [2]:
# Load the data from sklearn
iris = load_iris()

In [3]:
# A little bit about this dataset
print(iris.DESCR)

.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, predictive attributes and the class
    :Attribute Information:
        - sepal length in cm
        - sepal width in cm
        - petal length in cm
        - petal width in cm
        - class:
                - Iris-Setosa
                - Iris-Versicolour
                - Iris-Virginica
                
    :Summary Statistics:

                    Min  Max   Mean    SD   Class Correlation
    sepal length:   4.3  7.9   5.84   0.83    0.7826
    sepal width:    2.0  4.4   3.05   0.43   -0.4194
    petal length:   1.0  6.9   3.76   1.76    0.9490  (high!)
    petal width:    0.1  2.5   1.20   0.76    0.9565  (high!)

    :Missing Attribute Values: None
    :Class Distribution: 33.3% for each of 3 classes.
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :

#### Data transformation

In [4]:
# Set features
X = iris.data

# Set labels
y = iris.target

In [5]:
# One-hot encode the y values, which are currently the integers 0, 1, 2
# We do this because the class labels have no ranking (class 0 is not 'less' than class 1)
# Class 0 --> [1,0,0]
# Class 1 --> [0,1,0]
# Class 2 --> [0,0,1]

y = to_categorical(y)
y[0:3] #Sample print

array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.]], dtype=float32)
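To make the encoding concrete, here is a minimal NumPy-only sketch of the same idea. It is not how `to_categorical` is implemented; the `labels` array is a made-up example, and indexing an identity matrix is just one convenient way to build one-hot rows.

```python
import numpy as np

# Hypothetical integer class labels (not the iris targets)
labels = np.array([0, 1, 2, 1])

# Row i of a 3x3 identity matrix is the one-hot vector for class i
one_hot = np.eye(3, dtype="float32")[labels]

# argmax along each row recovers the original integer label
recovered = one_hot.argmax(axis=1)

print(one_hot)
print(recovered)
```

The `argmax` step is the inverse of the encoding, which is what gets used later to compare `y_test` with the predictions.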

#### Build / Train Model

In [6]:
# Test / Train split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

In [7]:
# Scale each feature to the [0, 1] range (min-max normalization)
scaler = MinMaxScaler()

# Fit to train
scaler.fit(X_train)

# Scale both training and testing X
scaled_X_train = scaler.transform(X_train)
# Transform only: the scaler is never fit on test data, so no test information leaks into training
scaled_X_test = scaler.transform(X_test)
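As a rough illustration of what fitting on train and transforming both sets does, the sketch below applies the min-max formula `(x - min) / (max - min)` by hand. The `train` and `test` arrays are made-up values, and this assumes the default `feature_range` of `(0, 1)`.

```python
import numpy as np

# Hypothetical data: 3 training rows and 1 test row, 2 features each
train = np.array([[4.0, 1.0], [6.0, 3.0], [8.0, 2.0]])
test = np.array([[9.0, 0.0]])

# "Fit": learn the per-feature min and range from the training data only
col_min = train.min(axis=0)
col_range = train.max(axis=0) - col_min

# "Transform": apply the training statistics to both sets
scaled_train = (train - col_min) / col_range
scaled_test = (test - col_min) / col_range

print(scaled_train)
print(scaled_test)
```

Note that the test row lands outside `[0, 1]` here because its values fall outside the training range. That is expected when the scaler only knows the training data.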

In [9]:
# The first hidden layer uses 16 neurons (4x the number of features)
# Each following hidden layer halves the width; the output layer has one neuron per class
#  and uses softmax, since this is a multi-class problem

model = Sequential()
model.add(Dense(16, input_dim=4, activation='relu'))
model.add(Dense(8, activation='relu'))  # input size is inferred from the previous layer
model.add(Dense(4, activation='relu'))
model.add(Dense(3, activation='softmax'))

model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
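For intuition about the output layer and the loss, here is a simplified NumPy sketch of softmax and categorical cross-entropy. It is an illustration only, not Keras's implementation (which, for example, guards against `log(0)`); the `logits` values are made up.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability; the result is unchanged
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, y_prob):
    # With one-hot y_true this reduces to -log(probability of the true class)
    return -np.sum(y_true * np.log(y_prob), axis=-1)

# Hypothetical raw outputs for one sample, and its one-hot label (class 0)
logits = np.array([[2.0, 1.0, 0.1]])
y_true = np.array([[1.0, 0.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)

print(probs, loss)
```

The probabilities in each row sum to 1, and the loss shrinks as the probability assigned to the true class approaches 1.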

In [10]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 16)                80        
_________________________________________________________________
dense_6 (Dense)              (None, 8)                 136       
_________________________________________________________________
dense_7 (Dense)              (None, 4)                 36        
_________________________________________________________________
dense_8 (Dense)              (None, 3)                 15        
Total params: 267
Trainable params: 267
Non-trainable params: 0
_________________________________________________________________
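The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output. A small sketch of that arithmetic, where `layer_sizes` is simply the input width followed by each layer's width:

```python
# Input width followed by the width of each Dense layer
layer_sizes = [4, 16, 8, 4, 3]

# Parameters per dense layer = weights (n_in * n_out) + biases (n_out)
params = [n_in * n_out + n_out
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

print(params, sum(params))  # [80, 136, 36, 15] 267
```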


In [12]:
# Fit the model to the scaled training data
model.fit(scaled_X_train,y_train,epochs=100,verbose=0)

<keras.callbacks.callbacks.History at 0x7fb998d6cb50>

In [13]:
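# Grab predicted class indices for the testing data
# predict returns class probabilities; argmax picks the most likely class
# (this also works on Keras versions that no longer provide predict_classes)
predictions = model.predict(scaled_X_test).argmax(axis=1)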

In [14]:
# Convert y_test from one-hot rows back to class indices: [0,1,0] becomes 1 and [0,0,1] becomes 2
# We do this so it matches the format of the predictions for comparison
y_test.argmax(axis=1)

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0, 1, 2, 2, 1, 2])

#### Determine effectiveness of model

In [15]:
confusion_matrix(y_test.argmax(axis=1),predictions)

array([[19,  0,  0],
       [ 0, 14,  1],
       [ 0,  0, 16]])
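The headline metrics can be derived directly from this matrix. In sklearn's convention rows are true classes and columns are predicted classes, so the sketch below recomputes accuracy, recall, and precision with plain NumPy from the values shown above.

```python
import numpy as np

# Confusion matrix from the output above (rows = true, columns = predicted)
cm = np.array([[19, 0, 0],
               [0, 14, 1],
               [0, 0, 16]])

# Accuracy: correct predictions (the diagonal) over all predictions
accuracy = np.trace(cm) / cm.sum()

# Recall per class: correct / all samples that truly belong to the class
recall = np.diag(cm) / cm.sum(axis=1)

# Precision per class: correct / all samples predicted as the class
precision = np.diag(cm) / cm.sum(axis=0)

print(accuracy, recall.round(2), precision.round(2))
```

The single off-diagonal entry (one class 1 sample predicted as class 2) is what lowers recall for class 1 and precision for class 2.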

In [16]:
print(classification_report(y_test.argmax(axis=1),predictions))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.93      0.97        15
           2       0.94      1.00      0.97        16

   micro avg       0.98      0.98      0.98        50
   macro avg       0.98      0.98      0.98        50
weighted avg       0.98      0.98      0.98        50



In [17]:
accuracy_score(y_test.argmax(axis=1),predictions)

0.98

98% accuracy, and the confusion matrix and classification report look great :-)