# Image-Classification-on-CIFAR-10-Dataset-Using-CNN

- Image classification is a fascinating deep learning project.
- In this project, we will build a convolution neural network in Keras with python on a CIFAR-10 dataset to Recognise Various Objects and classify them into different classes

# Download and Process your dataset

- The CIFAR-10 dataset consists of 60000 32×32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

- You can download the dataset from https://www.cs.utoronto.ca/~kriz/cifar.html

- Extract the data to a folder and in the same folder create a script to open your dataset.

# About Dataset

1.  The CIFAR-10 data consists of 60,000 32x32 color images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images in the official data. 

2.  The dataset is broken into batches to prevent our machine from running out of memory. The CIFAR-10 dataset consists of 5 batches, named data_batch_1, data_batch_2, etc. As stated in the official web site, each file packs the data using pickle module in python.

3. **Understanding the Original Data:**
    - The original one batch data is (10000 x 3072) matrix expressed in numpy array. 
    
    - The number of columns, (10000), indicates the number of sample data. 
    
    - As stated in the CIFAR-10/CIFAR-100 dataset, the row vector, (3072) represents an color image of 32x32 pixels.
    
    - Since this project is going to use CNN for the classification tasks, the original row vector is not appropriate.
    
    - In order to feed an image data into a CNN model, the dimension of the input tensor should be:
        1. (width x height x num_channel) or 
        2. (num_channel x width x height)
        
    - It depends on your choice (check out the tensorflow conv2d). I am going to use the first choice because the default choice.

4. **How to reshape into a such form?**

- The row vector for an image has the exact same number of elements if you calculate 32*32*3 == 3072. In order to reshape the row vector into (width x height x num_channel) form, there are two steps required. 

    1. The first step is to use reshape function, and the second step is to use transpose function in numpy. The following direction is described in a logical concept.

        - Divide the row vector into 3 pieces, where each piece means each color channel.
            - the resulting array has (3 x 1024) matrix, which makes (10000 x 3 x 1024) tensor in total.
        - Divide the each 3 pieces further by 32. 32 is width and height of an image.
            - this results in (3 x 32 x 32), which makes (10000 x 3 x 32 x 32) tensor in total

    2. This is not the end of story yet. Now, one image data is represented as (num_channel, width, height) form. However, this is not the shape tensorflow / keras are expecting. They are expecting different shape (width, height, num_channel) instead. We need to swap the order of each axes, and that is where transpose comes in.

        - The transpose can take a list of axes, and each value specifies an index of dimension it wants to move. 
            - For example, calling transpose with argument (1, 2, 0) in an numpy array of (num_channel, width, height) will return a new numpy array of (width, height, num_channel).

5. **The label classes in the dataset are:**

- airplane 
- automobile 
- bird 
- cat 
- deer 
- dog 
- frog 
- horse 
- ship 
- truck

6. The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

**I choose CIFAR 10 dataset to experiment my deep learning theory for the below reasons:**

1. CIFAR 10 is a bit challenging since it has 60K images, which is a lot for a begginer.
2. The images are compressed so that they can be trained with less computational power.
3. CIFAR 10 is very popular so that if I was struck at some point I can easily get lot of help from community.

# Importing the libraries

In [2]:
# Basic Libraries
import numpy as np
import pandas as pd

# Plotting Libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Tensorflow
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.layers import Input, Add, Activation, Dropout, Flatten, Dense
from tensorflow.keras.layers import Convolution2D, MaxPool2D, AveragePooling2D, ZeroPadding2D, Conv2D
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import LearningRateScheduler

from keras.utils import np_utils

# Other Files
import pickle
from os import listdir
from os.path import isfile, join
import os

# Bringing the data to a format acceptable by the CNN

In [3]:
# Function to unpickle the dataset
def unpickle_all_data(directory):
    
    # Initialize the variables
    train = dict()
    test = dict()
    
    train_x = []
    train_y = []
    test_x = []
    test_y = []
    
    # Iterate through all files that we want to train and test
    
    # Train is separated into batches
    for filename in listdir(directory):
        if isfile(join(directory, filename)):
            
            # The train data
            if 'data_batch' in filename:
                print('Handing file: %s' % filename)
                
                # Open the file
                with open(directory + '/' + filename, 'rb') as fo:
                    data = pickle.load(fo, encoding = 'bytes')

                if 'data' not in train:
                    train['data'] = data[b'data']
                    train['labels'] = np.array(data[b'labels'])
                else:
                    train['data'] = np.concatenate((train['data'], data[b'data']))
                    train['labels'] = np.concatenate((train['labels'], data[b'labels']))
                    
            # The test data
            elif 'test_batch' in filename:
                print('Handing file: %s' % filename)
                
                # Open the file
                with open(directory + '/' + filename, 'rb') as fo:
                    data = pickle.load(fo, encoding='bytes')
                
                test['data'] = data[b'data']
                test['labels'] = data[b'labels']
    
    # Manipulate the data to the propper format
    for image in train['data']:
        train_x.append(np.transpose(np.reshape(image,(3,32,32)), (1,2,0)))
        
    train_y = [label for label in train['labels']]
    
    for image in test['data']:
        test_x.append(np.transpose(np.reshape(image,(3,32,32)), (1,2,0)))
        
    test_y = [label for label in test['labels']]
    
    # Transform the data to np array format
    train_x = np.array(train_x)
    train_y = np.array(train_y)
    test_x = np.array(test_x)
    test_y = np.array(test_y)
    
    return (train_x, train_y), (test_x, test_y)

In [4]:
# Run the function with and include the folder where the data are
(x_train, y_train), (x_test, y_test) = unpickle_all_data(os.getcwd() + '/cifar-10-batches-py/')

Handing file: data_batch_1
Handing file: data_batch_2
Handing file: data_batch_3
Handing file: data_batch_4
Handing file: data_batch_5
Handing file: test_batch


In [5]:
# Dumping the data generated into seperate files 
with open('x_test.pickle', 'wb') as handle:
    pickle.dump(x_test, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('y_test.pickle', 'wb') as handle:
    pickle.dump(y_test, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('x_train.pickle', 'wb') as handle:
    pickle.dump(x_train, handle, protocol=pickle.HIGHEST_PROTOCOL)
with open('y_train.pickle', 'wb') as handle:
    pickle.dump(y_train, handle, protocol=pickle.HIGHEST_PROTOCOL)

In [6]:
# Training and validation files

files = ['x_train.pickle', 'y_train.pickle',
         'x_test.pickle', 'y_test.pickle']

# Load training samples
with open(files[0], 'rb') as img_file:
    x_train = pickle.load(img_file, encoding='bytes')
    
# Load training labels
with open(files[1], 'rb') as lb_file:
    y_train = pickle.load(lb_file, encoding='bytes')
    
# Load validation samples
with open(files[2], 'rb') as img_file_v:
    x_test = pickle.load(img_file_v, encoding='bytes')
    
# Load validation labels
with open(files[3], 'rb') as lb_file_v:
    y_test = pickle.load(lb_file_v, encoding='bytes')

## Data Preprocessing

In [7]:
# Transofrm the Features to a float32 type
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalizing the input features
x_train /= 255
x_test /= 255

# Applying One-hot Encoding to lables
num_classes = 10
y_train = np_utils.to_categorical(y_train, num_classes)
y_test = np_utils.to_categorical(y_test, num_classes)

# Building the model

# 1. Base Model

1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu
2. Max Pool layer with size 2×2
3. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu
4. Max Pool layer with size 2×2
5. Flatten layer
6. Fully connected layer with 256 units and a rectifier activation function as relu
7. Fully connected output layer with 10 units and a softmax activation function
8. Compiling the Model with adam optimizer and metrices as accuracy

In [8]:
# Initialising the Model_1
Model_1 = Sequential()

# Convolution
Model_1.add(Conv2D(filters = 32, kernel_size = (3,3), input_shape = (32,32,3), activation = 'relu'))

# Max Pooling
Model_1.add(MaxPool2D(pool_size = (2,2)))

# Adding 2nd Convolutional Layer
Model_1.add(Conv2D(filters = 32, kernel_size = (3,3), activation = 'relu'))

# Max Pooling
Model_1.add(MaxPool2D(pool_size = (2,2)))

# Flattening
Model_1.add(Flatten())

# Full Connection
Model_1.add(Dense(256, activation = 'relu'))

# Output Layer
Model_1.add(Dense(10, activation = 'softmax'))

# Compiling Model_1
Model_1.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [9]:
Model_1.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d (Conv2D)              (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 13, 13, 32)        9248      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 6, 6, 32)          0         
_________________________________________________________________
flatten (Flatten)            (None, 1152)              0         
_________________________________________________________________
dense (Dense)                (None, 256)               295168    
_________________________________________________________________
dense_1 (Dense)              (None, 10)                2

In [10]:
Model_1_result = Model_1.fit(x_train, y_train, epochs = 20, validation_data = (x_test, y_test), workers = -1)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [11]:
Model_1_evaluation = Model_1.evaluate(x_test, y_test)
print('Test Loss is : {} and the Test Accuracy is: {}'.format(Model_1_evaluation[0], Model_1_evaluation[1]))

Test Loss is : 2.175469160079956 and the Test Accuracy is: 0.678600013256073


## Model 1 Conclusions

1. Using our base model we are able to get 96.80% training accuracy and 67.86% test accuracy, which is not that great but it is not bad either so we use our very simple model. 

2. If you observe the model training log above you can see that Validation loss is increasing a lot which means our model is overfitting. That means our model performs good on the training data but fails to generalize on the unseen data. 

3. Lets try to reduce overfitting. How to reduce Overfitting?
    1. Add more data
    2. Decrease model complexity
    3. Apply regularization

4. Option 1 and 2 are not a good fit in our case, since getting more data is very tough and out model is not that complex to reduce the architecture. 

5. So lets regularize the model. In deep learning drop out is very good form of regularization. Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel.

6. Drop out literally means we gonna drop some of the neural units randomly from our network, which forces the next layer to learn the patterns again. In this way our model will become more robust and generalizes well. 

7. Lets try to add drop out after each CNN layer + MaxPool Layer.

# 2. Model 2 with Drop outs

1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu
2. Max Pool layer with size 2×2
3. Dropout set to 25%
4. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu
5. Max Pool layer with size 2×2
6. Dropout set to 25%
7. Flatten layer
8. Fully connected layer with 256 units and a rectifier activation function as relu
9. Dropout set to 50%
10. Fully connected output layer with 10 units and a softmax activation function
11. Compiling the Model with adam optimizer and metrices as accuracy

In [12]:
# Initialising the Model_2
Model_2 = Sequential()

# Convolution
Model_2.add(Conv2D(filters = 32, kernel_size = (3,3), input_shape = (32,32,3), activation = 'relu'))

# Max Pooling
Model_2.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_2.add(Dropout(0.25))

# Adding 2nd Convolutional Layer
Model_2.add(Conv2D(filters = 32, kernel_size = (3,3), activation = 'relu'))

# Max Pooling
Model_2.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_2.add(Dropout(0.25))

# Flattening
Model_2.add(Flatten())

# Full Connection
Model_2.add(Dense(256, activation = 'relu'))

# Adding Dropouts
Model_2.add(Dropout(0.50))

# Output Layer
Model_2.add(Dense(10, activation = 'softmax'))

# Compiling Model_1
Model_2.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [13]:
Model_2.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_2 (Conv2D)            (None, 30, 30, 32)        896       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 15, 15, 32)        0         
_________________________________________________________________
dropout (Dropout)            (None, 15, 15, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 13, 13, 32)        9248      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 6, 6, 32)          0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 6, 6, 32)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 1152)             

In [14]:
Model_2_result = Model_2.fit(x_train, y_train, epochs = 30, validation_data = (x_test, y_test), workers = -1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [15]:
Model_2_evaluation = Model_2.evaluate(x_test, y_test)
print('Test Loss is : {} and the Test Accuracy is: {}'.format(Model_2_evaluation[0], Model_2_evaluation[1]))

Test Loss is : 0.788870632648468 and the Test Accuracy is: 0.7297000288963318


## Model 2 Conclusions

1. Thats cool, Improvememt of validation accuracy from 67 to 72.97% is a good start, lets try to add more number of layers with increase in the number of filters. 
2. Also looks like we can train out model for more number of epochs since our validation loss is in control. Thanks to drop outs!!

# Model 3: with Dropouts, More Layers and More Filters

1. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu
2. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu
3. Max Pool layer with size 2×2
4. Dropout set to 40%
5. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu
6. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu
7. Max Pool layer with size 2×2
8. Dropout set to 40%
9. Flatten layer
10. Fully connected layer with 512 units and a rectifier activation function as relu
11. Fully connected layer with 512 units and a rectifier activation function as relu
12. Dropout set to 50%
13. Fully connected output layer with 10 units and a softmax activation function
14. Compiling the Model with adam optimizer and metrices as accuracy

In [16]:
# Initialising the Model_3
Model_3 = Sequential()

# Convolution
Model_3.add(Conv2D(filters = 64, kernel_size = (3,3), input_shape = (32,32,3), activation = 'relu'))

# Adding 2nd Convolutional Layer
Model_3.add(Conv2D(filters = 64, kernel_size = (3,3), activation = 'relu'))

# Max Pooling
Model_3.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_3.add(Dropout(0.40))

# Adding 3rd Convolutional Layer
Model_3.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'))

# Adding 4th Convolutional Layer
Model_3.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu'))

# Max Pooling
Model_3.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_3.add(Dropout(0.40))

# Flattening
Model_3.add(Flatten())

# Full Connection
Model_3.add(Dense(512, activation = 'relu'))

# Full Connection
Model_3.add(Dense(512, activation = 'relu'))

# Adding Dropouts
Model_3.add(Dropout(0.50))

# Output Layer
Model_3.add(Dense(10, activation = 'softmax'))

# Compiling Model_3
Model_3.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [17]:
Model_3.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_4 (Conv2D)            (None, 30, 30, 64)        1792      
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 28, 28, 64)        36928     
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 14, 14, 64)        0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 12, 12, 128)       73856     
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 10, 10, 128)       147584    
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 5, 5, 128)        

In [18]:
Model_3_result = Model_3.fit(x_train, y_train, epochs = 30, validation_data = (x_test, y_test), workers = -1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [19]:
Model_3_evaluation = Model_3.evaluate(x_test, y_test)
print('Test Loss is : {} and the Test Accuracy is: {}'.format(Model_3_evaluation[0], Model_3_evaluation[1]))

Test Loss is : 0.7137004137039185 and the Test Accuracy is: 0.769599974155426


## Model 3 Conclusions

1. Adding more layers and increasing droup out from 0.25 to 0.4 resulted in 76.95 % accuracy which a good improvement, so lets try increase layers.
2. Also dont just blindly add more layers. If you keep adding more CNN's we will lose the data as data will be reduced in size due to convolution. So when we use small size images its good idea to limit the number of layers.

# Model 4: with Increased Dropouts, More Layers, More Filters, Padding and Using He Kernel Initializer

1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
2. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
3. Max Pool layer with size 2×2
4. Dropout set to 20%
5. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
6. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
7. Max Pool layer with size 2×2
8. Dropout set to 25%
9. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
10. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
11. Max Pool layer with size 2×2
12. Dropout set to 30%
13. Flatten layer
14. Fully connected layer with 128 units and a rectifier activation function as relu
15. Dropout set to 40%
16. Fully connected output layer with 10 units and a softmax activation function
17. Compiling the Model with adam optimizer and metrices as accuracy

In [21]:
# Initialising the Model_4
Model_4 = Sequential()

# Convolution
Model_4.add(Conv2D(filters = 32, kernel_size = (3,3), input_shape = (32,32,3), activation = 'relu', padding = 'same',
                   kernel_initializer = 'he_uniform'))

# Adding 2nd Convolutional Layer
Model_4.add(Conv2D(filters = 32, kernel_size = (3,3), activation = 'relu', kernel_initializer = 'he_uniform', 
                   padding='same'))

# Max Pooling
Model_4.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_4.add(Dropout(0.20))

# Adding 3rd Convolutional Layer
Model_4.add(Conv2D(filters = 64, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding 4th Convolutional Layer
Model_4.add(Conv2D(filters = 64, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Max Pooling
Model_4.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_4.add(Dropout(0.25))

# Adding 5th Convolutional Layer
Model_4.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding 6th Convolutional Layer
Model_4.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Max Pooling
Model_4.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_4.add(Dropout(0.30))

# Flattening
Model_4.add(Flatten())

# Full Connection
Model_4.add(Dense(128, activation = 'relu', kernel_initializer='he_uniform'))

# Adding Dropouts
Model_4.add(Dropout(0.40))

# Output Layer
Model_4.add(Dense(10, activation = 'softmax'))

# Compiling Model_3
Model_4.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [22]:
Model_4.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_8 (Conv2D)            (None, 32, 32, 32)        896       
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 32, 32, 32)        9248      
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_6 (Dropout)          (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_10 (Conv2D)           (None, 16, 16, 64)        18496     
_________________________________________________________________
conv2d_11 (Conv2D)           (None, 16, 16, 64)        36928     
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 8, 8, 64)         

In [23]:
Model_4_result = Model_4.fit(x_train, y_train, epochs = 30, validation_data = (x_test, y_test), workers = -1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [24]:
Model_4_evaluation = Model_4.evaluate(x_test, y_test)
print('Test Loss is : {} and the Test Accuracy is: {}'.format(Model_4_evaluation[0], Model_4_evaluation[1]))

Test Loss is : 0.6353070735931396 and the Test Accuracy is: 0.8025000095367432


## Model 4 Conclusions

1.  If you see the behaviour our model, we can still train our model longer since validation loss is not getting worse, ofcourse there are some fluctuations and distrotions in loss. 
2. Either we can run it for 200 epochs or more or simply leave it here and work on how to control those fluctuations and then train our model for more number of epochs. 
3. Lets go for option 2

# Model 5: with Increased Dropouts, More Layers, More Filters, Padding, Using He Kernel Initializer and using Batch Normalization

- Batch normalization works just the same way as we normalize the input data where we divided the x_train/255. What we are trying to do there is we are arranging all the features in same scale so that model converges easily and we can reduce the distrotions. Our input layer is lucky enough to have everything in same scale why not the rest of the layers?

- This is what exactly we do in batch normalization, when ever we passs the CNN throuh a batch normalization layer we are normalizing the weights so that our model will be stable and we can train model longer and also use larger learning rate.

- Also the kind of normalization we use is batch normalization which means we compute mean and varience for each mini batch not the whole data.

1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
2. Add batch Normalization
3. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
4. Add batch Normalization
5. Max Pool layer with size 2×2
6. Dropout set to 20%
7. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
8. Add batch Normalization
9. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
10. Add batch Normalization
11. Max Pool layer with size 2×2
12. Dropout set to 30%
13. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
14. Add batch Normalization
15. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
16. Add batch Normalization
17. Max Pool layer with size 2×2
18. Dropout set to 40%
19. Flatten layer
20. Fully connected layer with 128 units and a rectifier activation function as relu
21. Add batch Normalization
22. Dropout set to 50%
23. Fully connected output layer with 10 units and a softmax activation function
24. Compiling the Model with adam optimizer and metrices as accuracy

In [26]:
# Initialising the Model_5
Model_5 = Sequential()

# Convolution
Model_5.add(Conv2D(filters = 32, kernel_size = (3,3), input_shape = (32,32,3), activation = 'relu', padding = 'same',
                   kernel_initializer='he_uniform'))

# Adding Batch Normalization
Model_5.add(BatchNormalization())

# Adding 2nd Convolutional Layer
Model_5.add(Conv2D(filters = 32, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_5.add(BatchNormalization())

# Max Pooling
Model_5.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_5.add(Dropout(0.20))

# Adding 3rd Convolutional Layer
Model_5.add(Conv2D(filters = 64, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_5.add(BatchNormalization())

# Adding 4th Convolutional Layer
Model_5.add(Conv2D(filters = 64, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_5.add(BatchNormalization())

# Max Pooling
Model_5.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_5.add(Dropout(0.3))

# Adding 5th Convolutional Layer
Model_5.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_5.add(BatchNormalization())

# Adding 6th Convolutional Layer
Model_5.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_5.add(BatchNormalization())

# Max Pooling
Model_5.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_5.add(Dropout(0.40))

# Flattening
Model_5.add(Flatten())

# Full Connection
Model_5.add(Dense(128, activation = 'relu', kernel_initializer='he_uniform'))

# Adding Batch Normalization
Model_5.add(BatchNormalization())

# Adding Dropouts
Model_5.add(Dropout(0.50))

# Output Layer
Model_5.add(Dense(10, activation = 'softmax'))

# Compiling Model_3
Model_5.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [27]:
Model_5.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_14 (Conv2D)           (None, 32, 32, 32)        896       
_________________________________________________________________
batch_normalization (BatchNo (None, 32, 32, 32)        128       
_________________________________________________________________
conv2d_15 (Conv2D)           (None, 32, 32, 32)        9248      
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 32)        128       
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
dropout_10 (Dropout)         (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_16 (Conv2D)           (None, 16, 16, 64)       

In [28]:
Model_5_result = Model_5.fit(x_train, y_train, epochs = 30, validation_data = (x_test, y_test), workers = -1)

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30
Epoch 12/30
Epoch 13/30
Epoch 14/30
Epoch 15/30
Epoch 16/30
Epoch 17/30
Epoch 18/30
Epoch 19/30
Epoch 20/30
Epoch 21/30
Epoch 22/30
Epoch 23/30
Epoch 24/30
Epoch 25/30
Epoch 26/30
Epoch 27/30
Epoch 28/30
Epoch 29/30
Epoch 30/30


In [29]:
Model_5_evaluation = Model_5.evaluate(x_test, y_test)
print('Test Loss is : {} and the Test Accuracy is: {}'.format(Model_5_evaluation[0], Model_5_evaluation[1]))

Test Loss is : 0.43515413999557495 and the Test Accuracy is: 0.8636000156402588


## Model 5 Conclusions

1. Adding batch normalizaton yeilds very good results and we are able to get 86.36% validation accuracy which is really good. 
2. Batch normalization made our average results into excellent results. Still there is a lot of improvement scope as the Benchmarks.ai website have achieved 99.37. So there is definitely a lot we can do.

# Model 6: Image augmentation (Using Keras)

1. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
2. Add batch Normalization
3. Convolutional input layer, 32 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
4. Add batch Normalization
5. Max Pool layer with size 2×2
6. Dropout set to 20%
7. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
8. Add batch Normalization
9. Convolutional input layer, 64 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
10. Add batch Normalization
11. Max Pool layer with size 2×2
12. Dropout set to 30%
13. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
14. Add batch Normalization
15. Convolutional input layer, 128 feature maps with a size of 3×3, a rectifier activation function as relu, padding as same and kernel initializer as He.
16. Add batch Normalization
17. Max Pool layer with size 2×2
18. Dropout set to 40%
19. Flatten layer
20. Fully connected layer with 128 units and a rectifier activation function as relu
21. Add batch Normalization
22. Dropout set to 50%
23. Fully connected output layer with 10 units and a softmax activation function
24. Compiling the Model with adam optimizer and metrices as accuracy

In [31]:
# Initialising the Model_6
Model_6 = Sequential()

# Convolution
Model_6.add(Conv2D(filters = 32, kernel_size = (3,3), input_shape = (32,32,3), activation = 'relu', padding = 'same',
                   kernel_initializer='he_uniform'))

# Adding Batch Normalization
Model_6.add(BatchNormalization())

# Adding 2nd Convolutional Layer
Model_6.add(Conv2D(filters = 32, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_6.add(BatchNormalization())

# Max Pooling
Model_6.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_6.add(Dropout(0.20))

# Adding 3rd Convolutional Layer
Model_6.add(Conv2D(filters = 64, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_6.add(BatchNormalization())

# Adding 4th Convolutional Layer
Model_6.add(Conv2D(filters = 64, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_6.add(BatchNormalization())

# Max Pooling
Model_6.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_6.add(Dropout(0.3))

# Adding 5th Convolutional Layer
Model_6.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_6.add(BatchNormalization())

# Adding 6th Convolutional Layer
Model_6.add(Conv2D(filters = 128, kernel_size = (3,3), activation = 'relu', kernel_initializer='he_uniform', 
                   padding='same'))

# Adding Batch Normalization
Model_6.add(BatchNormalization())

# Max Pooling
Model_6.add(MaxPool2D(pool_size = (2,2)))

# Adding Dropouts
Model_6.add(Dropout(0.40))

# Flattening
Model_6.add(Flatten())

# Full Connection
Model_6.add(Dense(128, activation = 'relu', kernel_initializer='he_uniform'))

# Adding Batch Normalization
Model_6.add(BatchNormalization())

# Adding Dropouts
Model_6.add(Dropout(0.50))

# Output Layer
Model_6.add(Dense(10, activation = 'softmax'))

# Compiling Model_3
Model_6.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

In [33]:
# Image Augmentation

data_generator = ImageDataGenerator(width_shift_range = 0.1, height_shift_range = 0.1, horizontal_flip = True, 
                                    rotation_range = 20)
train_set = data_generator.flow(x_train, y_train)
steps = int(x_train.shape[0] / 64)

In [34]:
Model_6_result = Model_6.fit(x_train, y_train, epochs = 50, validation_data = (x_test, y_test), workers = -1)

Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50


In [35]:
Model_6_evaluation = Model_6.evaluate(x_test, y_test)
print('Test Loss is : {} and the Test Accuracy is: {}'.format(Model_6_evaluation[0], Model_6_evaluation[1]))

Test Loss is : 0.4836590886116028 and the Test Accuracy is: 0.8568999767303467


## Model 6 Conclusions

1. Image augmentation will always helps model to genearalize more. 
2. We are now getting 87.28% validation accuracy with 0.43 validation loss at 47th epoch.
3. If we keep on increasing the Epoch then the accuracy might improve. But let us stop our exploration here and conclude the results. 
3. Later some other time we will explore the rest of the 12 percent accuracy!!

# End Results

- 87.28 % accuracy is not bad and there is lot to try, we have pretrained models, we have complex architectures. Also we can gather more data. If we have good computational resources we can play with batch size and learning rate. Which we will explor in other note books.

- This note book is a combination of multiple tutorials and different kaggle kernals. If you find this note books intresting run the code and play with parameters and see how it is behaving. If you know someother ways to improve accuracy beyond 87.28 please comment so that we can try.

- I read that using pretrained model will improve the accuracy a lot since the models already have certain filters or layers that are good at recognizing certain image. We will try that in a different notebook.