## Multi-Class Classification Iris Flowers Project 2 for Mothers who love Gardening and Flowers: Identifying Flower Types


In [1]:
# All necessary imports here
import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.wrappers.scikit_learn import KerasClassifier
from keras.utils import np_utils
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import LabelEncoder
from sklearn.pipeline import Pipeline

Using TensorFlow backend.


In [0]:
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)

### Step 1. Project Description
The Iris dataset consists of 50 samples, each of 3 different speciecs of Iris flower, in total 150 samples with 4 input variables namely: sepal length, sepal width, petal length and petal width. While, 3 output variables as the name of species: Iris-setosa, Iris-versicolor and Iris-virginica.

### Step 2. Making Preparations

In [0]:
# loading the dataset
path_to_data = "https://raw.githubusercontent.com/mahrukh98/incredible-AI/master/datasets/iris.csv"
dataframe = pd.read_csv(path_to_data, header = None)
dataset = dataframe.values

# splitting into input(X) and output(Y) variables
X = dataset[:,0:4].astype(float)
Y = dataset[:,4]

In [4]:
Y

array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-versic

As the output variables need to be numerical so as to be predicted by the model, we're using the Label Encoder class from scikit-learn, then fit_transform method to learn the labels' mean and standard deviation afterwards applying those transformations on the training dataset for encoded labels. The resulting numerical values are then one-hot encoded.

In [0]:
# encode class values as integers
encoder = LabelEncoder()
encoded_Y = encoder.fit_transform(Y)
# convert integers to dummy variables (i.e. one hot encoded)
dummy_y = np_utils.to_categorical(encoded_Y)

Finally, we've received the encoded variables where [1,0,0] represents Iris-setosa, [0,1,0] represents Iris-versicolor and [0,0,1] represents Iris-virginica!

In [6]:
dummy_y

array([[1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.],
       [1., 0

### Step 3: Define the Neural Network Baseline Model
Creating the baseline model with single, densely connected hidden layer of 8 hidden units and randomly initialized weights. The hidden layer is passed through 'relu' activation function that zeroes-out the negative values and only keeps the positive values of computation. Output layer with 3 units is passed through 'softmax' activation function to give the ouput values in between  0 and 1 as probabilities, where the higher value represents the higher class.
Finally, 'categorical_crossentropy' loss function which is particular for the multi-class classification problems , 'Adam' optimizer along with recording accuracy metrics are reserved for compilation.

In [0]:
# define baseline model
def baseline_model():
  # create model
  model = Sequential([
    Dense(8, input_shape=(4,)),
    Activation('relu'),
    Dense(3),
    Activation('softmax'),
   ])
  # Compile model
  model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
  return model

In [0]:
estimator = KerasClassifier(build_fn=baseline_model, epochs=200, batch_size=5, verbose=0)

### Step 4. Evaluate The Model with k-Fold Cross Validation

Now, evaluating the model using k-fold cross-validation in the scikit-learn framework for that purpose we're using KerasClassifier wrapper class with the model creation function, number of epochs and batch_size as argument.

In [9]:
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Baseline model's accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Baseline model's accuracy: 97.33% (4.42%)


### Step 5. Tuning Layers and Number of Neurons in The Model

### Step 5.1. Evaluate a Smaller Network
Observing what reducing the hidden units to 4 gives us! 














In [0]:
# define smaller model
def smaller_model():
  # create model
  model = Sequential([
    Dense(4, input_shape=(4,)),
    Activation('relu'),
    Dense(3),
    Activation('softmax'),
   ])
  # Compile model
  model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
  return model

In [12]:
estimator = KerasClassifier(build_fn=smaller_model, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Smaller model's accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Smaller model's accuracy: 98.00% (3.06%)


Woah! That's pretty awesome!! 98.00% We need to continue or improve this! :D 



### Step 5.2. Evaluate a Larger Network
Just checking, if increasing another layer results in good performance or not. Here, we're using 2 hidden layers, first one with 8 hidden units and the second one with 4 hidden units!



In [0]:
# define larger model
def larger_model():
  # create model
  model = Sequential([
    Dense(8, input_shape=(4,)),
    Activation('relu'),
    Dense(4, input_shape=(4,)),
    Activation('relu'),
    Dense(3),
    Activation('softmax'),
   ])
  # Compile model
  model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
  return model

In [14]:
estimator = KerasClassifier(build_fn=larger_model, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Larger model's accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Larger model's accuracy: 84.00% (23.32%)


### Step 6. Really Scaling up: developing a model that overfits
Now, to figure out the strength of our model that it lies exactly right at the border between the underfitting and overfitting, we'll have to cross that border i.e. overfit the model. This can be achieved by:


1.   Adding layers ----- 3 layers + 1 output layer
2.   Increasing hidden units --- 16-----> 8 ------> 4 ------>3
3.   Training for more epochs ---250


 



In [0]:
# define overfitting model
def overfitting_model():
  # create model
  model = Sequential([
    Dense(16, input_shape=(4,)),
    Activation('relu'),
    Dense(8, input_shape=(4,)),
    Activation('relu'),
    Dense(4, input_shape=(4,)),
    Activation('relu'),
    Dense(3),
    Activation('softmax'),
   ])
  # Compile model
  model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
  return model

In [16]:
estimator = KerasClassifier(build_fn=overfitting_model, epochs=250, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Overfitted model's accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Overfitted model's accuracy: 85.33% (26.63%)


Seems the model is underfitting and even of the worse kind, if we compare it with previously achieved 98%!

### Step 7. Tuning the Model

Trying the 'rmsprop' optimizer instead of 'Adam', but seeems this model got inclined with the baseline model's performance.

In [0]:
# define improved model
def improved_model():
  # create model
  model = Sequential([
    Dense(4, input_shape=(4,)),
    Activation('relu'),
    Dense(3),
    Activation('softmax'),
   ])
  # Compile model
  model.compile(loss='categorical_crossentropy',optimizer='rmsprop',metrics=['accuracy'])
  return model

In [18]:
estimator = KerasClassifier(build_fn=improved_model, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Improved model's accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

Improved model's accuracy: 97.33% (4.42%)


We'll continue with this model onwards!

### Step 8. Rewriting the code using the Keras Functional API

In [0]:
import keras
from keras import layers

In [27]:
# creating functional API for improved model 
def create_improved_fn():
  inputs = keras.Input(shape = (4,))
  x = layers.Dense(4, activation='relu')(inputs)
  output = layers.Dense(3, activation='softmax')(x)

  model = keras.Model(inputs, output)
  
  model.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
  
  return model

estimator = KerasClassifier(build_fn=create_improved_fn, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Improved model's accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))


Improved model's accuracy: 97.33% (4.42%)


### Step 9. Rewriting the code by doing Model Subclassing

In [28]:
# Creating subclass for improved model
class Improved(keras.Model):
  def __init__(self):
    super(Improved, self).__init__()
    self.dense1 = layers.Dense(4, activation='relu')
    self.dense2 = layers.Dense(3, activation='softmax')

  def call(self, inputs):
    x = self.dense1(inputs)
    return self.dense2(x)
  
# DISCLAIMER!!!
# This part is inspired from the functional API style :D 
# As, build_fn needs callable function or class instance, we're generating another method which will also accompany the input shape not specified in the class and 
# compilation step

def create_Improved_subclass():
  inputs = keras.Input(shape = (4,))
  model = Improved()
  output = model.call(inputs)
  
  model = keras.Model(inputs, output)
  model.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
  return model

estimator = KerasClassifier(build_fn=create_Improved_subclass, epochs=200, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
results = cross_val_score(estimator, X, dummy_y, cv=kfold)
print("Improved model's accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))


Improved model's accuracy: 97.33% (3.27%)


### Step 10. Rewriting the code without using scikit-learn
Here, we're asked for selecting the best model and implementing its code + the k-fold cross validation without scikit-learn !

In [0]:
# define improved model
def improved_model():
  # create model
  model = Sequential([
    Dense(4, input_shape=(4,)),
    Activation('relu'),
    Dense(3),
    Activation('softmax'),
   ])
  # Compile model
  model.compile(loss='categorical_crossentropy',optimizer='Adam',metrics=['accuracy'])
  return model

In [32]:
np.random.seed(seed)
k = 10
num_val_samples = len(dataset) // k
np.random.shuffle(dataset)
all_scores = []
num_epochs = 200

for i in range(k):
  print('processing fold #', i)

  # Preparing the validation data and properly partitioning training data
  val_X = X[num_val_samples * i:num_val_samples * (i + 1)]
  train_X = np.append(X[:num_val_samples * i], X[num_val_samples * (i + 1):], axis=0) 
    
  val_Y = dummy_y[num_val_samples * i:num_val_samples * (i + 1)]
  train_Y = np.append(dummy_y[:num_val_samples * i] , dummy_y[num_val_samples * (i + 1):], axis=0)
  # Building the Keras model (already compiled)
  model = improved_model() 
  all_scores = model.fit(train_X,train_Y,epochs=num_epochs,batch_size=5,verbose=0,validation_data = (val_X,val_Y))
  
  # Saving state dictionary of model    
  history_dict = all_scores.history
  val_score = np.average(history_dict['val_acc'])

  print("Final improved model's accuracy: %.2f%% " % (val_score*100))

processing fold # 0
Final improved model's accuracy: 96.80% 
processing fold # 1
Final improved model's accuracy: 92.80% 
processing fold # 2
Final improved model's accuracy: 100.00% 
processing fold # 3
Final improved model's accuracy: 82.97% 
processing fold # 4
Final improved model's accuracy: 66.10% 
processing fold # 5
Final improved model's accuracy: 42.23% 
processing fold # 6
Final improved model's accuracy: 86.17% 
processing fold # 7
Final improved model's accuracy: 100.00% 
processing fold # 8
Final improved model's accuracy: 87.63% 
processing fold # 9
Final improved model's accuracy: 75.30% 


In [33]:
print("Final improved model's accuracy: mean =  %.2f%% and standard deviation = %.2f%% " % (val_score.mean()*100, val_score.std()*100))

Final improved model's accuracy: mean =  75.30% and standard deviation = 0.00% 
