
## A dive into a data scientist's work

The hands-on section will follow 3 main steps:

  - **Prepare the groundwork**

    After importing the necessary dependencies, you will learn what neural network hyperparameters are and which values we chose for them. You will also apply a couple of transformations to the dataset to get it ready for training.

  - **Build and train your model**

    You will create a CNN based on architectural choices we made for you. You will then train the model and visualize some important metrics that guide a data scientist during the development cycle. Take the time to get familiar with the learning process, which is common to many types of neural networks.

  - **Evaluate and enhance the model**

    After visualizing the predictions made by the model on some handwritten digit images, you will have the opportunity to adjust the neural network to get better predictions. To do so, you can tune the hyperparameters and the architecture of the neural network to increase its accuracy.

## Prepare the groundwork

### Importing libraries

You will use Keras, a high-level API for TensorFlow, to design and train your first simple CNN.
Frameworks such as TensorFlow and PyTorch hide much of the computational complexity, for instance the computations of forward and backward propagation. The coding effort can then focus on designing deep and complex neural network architectures, exploring creative metrics to make training more efficient, and so on.
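To give an idea of what the framework automates, here is a minimal hand-written forward and backward pass for a single sigmoid neuron with a squared-error loss, in plain Python. The neuron, the loss and the numeric values are illustrative assumptions, not part of this lab; the gradient obtained by backpropagation (the chain rule) is checked against a finite-difference estimate. TensorFlow performs this kind of computation automatically for every parameter of the network.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x, y):
    """Forward pass: pre-activation, prediction and squared-error loss for one sample."""
    z = w * x + b
    a = sigmoid(z)
    loss = 0.5 * (a - y) ** 2
    return z, a, loss

def backward(w, b, x, y):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    _, a, _ = forward(w, b, x, y)
    dloss_da = a - y            # derivative of 0.5 * (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)       # derivative of the sigmoid
    dloss_dz = dloss_da * da_dz
    return dloss_dz * x, dloss_dz  # gradients with respect to w and b

# illustrative values (hypothetical, not taken from the lab)
w, b, x, y = 0.5, -0.1, 2.0, 1.0
grad_w, grad_b = backward(w, b, x, y)

# central finite-difference estimate of the gradient with respect to w
eps = 1e-6
numeric_w = (forward(w + eps, b, x, y)[2] - forward(w - eps, b, x, y)[2]) / (2 * eps)
print(grad_w, numeric_w)
```

The two printed values agree closely; a negative gradient means that increasing `w` would reduce the loss for this sample.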

In [None]:
import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import warnings

import tensorflow as tf
from tensorflow.keras.datasets import mnist #28x28 images
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.image import img_to_array, load_img
from tensorflow.keras.utils import plot_model


You are handling a dataset in which each image is a 28x28-pixel grayscale handwritten digit.

Each input image belongs to one of the classes '0' to '9', giving a total of 10 classes.

<img src="https://huchma.fi/wp-content/posts_material/001/mnist/mnist.png" width="250" align="center"/>

In [None]:
# input image dimensions
img_rows, img_cols = 28, 28

# numbers of predictable classes, from 0 to 9
nb_classes = 10

### Defining Hyperparameters

While gradient descent, with forward and backpropagation, learns the parameters of a neural network (its weights and biases), hyperparameters are non-learnable settings chosen before training. They still have a big impact on the accuracy reachable after training, and therefore directly affect the performance of the model.
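The distinction can be illustrated with a toy example, a plain-Python sketch that is not part of the lab: gradient descent learns the weight `w` of the model `y = w * x`, while the learning rate and the number of epochs are hyperparameters fixed by hand before training. With the same data, different hyperparameter values lead to very different final losses.

```python
def train(learning_rate, nb_epoch):
    """Fit y = w * x by full-batch gradient descent on the mean squared error."""
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]   # generated with the true weight w = 2
    w = 0.0                     # learnable parameter, updated by training
    for _ in range(nb_epoch):
        # gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return w, loss

# same data, different hyperparameters
for lr, epochs in [(0.01, 5), (0.01, 100), (0.2, 100)]:
    w, loss = train(lr, epochs)
    print(f"learning_rate={lr}, epochs={epochs}: w={w:.4f}, loss={loss:.4g}")
```

Too few epochs leave `w` far from 2, enough epochs recover it, and a learning rate that is too large makes the updates overshoot and diverge.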

To see where each hyperparameter intervenes, refer to the CNN layout below:


<img src="illustration/new network.png" width="500" align="center"/>


Here is a predefined list we suggest starting with:


In [None]:
#################
# Hyperparameters
#################
batch_size = 8
nb_epoch = 1
# number of convolutional filters to use
nb_filters = 10
# number of neurons in the dense layers
nb_dense_layer_1 = 40
nb_dense_layer_2 = 10
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)
dropout_rate = 0.9

Below is a description to better understand what they represent:

- Epoch: a complete pass over the entire training dataset during the training phase.
- Batch: a subset of the dataset over which the loss is computed to perform a single gradient descent step.
- Convolutional filters (or kernels), number and size: the number of filters sets how many feature maps a convolution layer outputs, and the kernel size sets the spatial extent each filter covers.

<img src="https://d2l.ai/_images/conv-pad.svg" width="300" align="center"/>

Image source: https://d2l.ai/chapter_convolutional-neural-networks/padding-and-strides.html

- Pooling size: the size of the window over which max pooling keeps only the largest value.

<img src="https://qph.fs.quoracdn.net/main-qimg-98ecf7ba49710bf56042d035a74505b6" width="250" align="center"/>

Image source: quora.com/What-is-max-pooling-in-convolutional-neural-networks

- Dropout: a regularization technique that helps prevent neural networks from overfitting.

<img src="illustration/bootstrap-aggregating-img2.png" width="550" align="center"/>

Image source: "Deep Learning" book, Ian Goodfellow, Yoshua Bengio and Aaron Courville
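To make the convolution and pooling hyperparameters concrete, here is a minimal plain-Python sketch of a single 3x3 filter applied with zero padding (stride 1, output the same size as the input, as with `padding='same'` in the Keras `Conv2D` layer used later), followed by a 2x2 max pooling with stride 2. The image and kernel values are made up; like Keras, the "convolution" is computed as a cross-correlation (the kernel is not flipped).

```python
def conv2d_same(image, kernel):
    """Stride-1 cross-correlation with zero padding, so the output size equals the input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # padding for an odd kernel size
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:  # pixels outside the image count as zero
                        acc += image[r][c] * kernel[di][dj]
            out[i][j] = acc
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling (stride == pool size) on an evenly divisible map."""
    h, w = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, w, size)]
            for i in range(0, h, size)]

image = [[1, 2, 0, 1],
         [0, 1, 3, 0],
         [2, 0, 1, 2],
         [1, 1, 0, 1]]
kernel = [[0, 1, 0],
          [1, 1, 1],
          [0, 1, 0]]  # sums a pixel and its 4 direct neighbours

feature_map = conv2d_same(image, kernel)  # 4x4, same size as the input
pooled = max_pool(feature_map)            # 2x2, each value is the max of a 2x2 window
print(feature_map)
print(pooled)
```

A CNN layer with `nb_filters = 10` learns 10 such kernels and stacks their 10 feature maps along the channel axis.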

### Data Preparation
The dataset is always split into at least training and test data:
  - **60,000 images used to train** the neural network. Every image is passed through the CNN to update the network's parameters through the learning process.

  - **10,000 images used to test** the accuracy of the network. This subset is never used to update the parameters during training, so it gives an estimate of how the model would behave on real-life data.
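The batch size and the number of training images together determine how many gradient descent steps one epoch contains. A small arithmetic sketch with the values of this lab (60,000 training images, `batch_size = 8`); the helper name is illustrative, and the count is rounded up so that a final, smaller batch still counts as one step.

```python
import math

def steps_per_epoch(nb_samples, batch_size):
    """Number of gradient descent steps (one per batch) in one epoch."""
    # a last, smaller batch still triggers one update, hence the rounding up
    return math.ceil(nb_samples / batch_size)

nb_train = 60000
for batch_size in (8, 32, 128):
    print(batch_size, steps_per_epoch(nb_train, batch_size))
```

With `batch_size = 8`, a single epoch already performs 7,500 parameter updates, which is one reason why training takes a while; a larger batch means fewer, but more expensive, steps per epoch.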

In [None]:
# The data, shuffled and split between train and test sets
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

print('Training data is composed of', X_train.shape[0], 'samples of', X_train.shape[1], 'x', X_train.shape[2], 'grayscale images')
print('Test data is composed of', X_test.shape[0], 'samples of', X_test.shape[1], 'x', X_test.shape[2], 'grayscale images\n')







Other steps are necessary before starting the training process, for example:

- Reshaping, to add an explicit channel dimension: each 28x28 grayscale image becomes a 28x28x1 array, the input format expected by the `Conv2D` layer.
- Normalization, for a more stable and faster learning: pixel values are scaled from [0, 255] to [0, 1].
- Label transformation, to move to a one-hot vector representation of the classes. A one-hot vector representation (see image below) transforms a single-value label into a vector with 10 elements: '1' is set for the element whose index corresponds to the label value (index 5, i.e. the sixth element since indexing starts at 0, when the digit is '5'), and '0' elsewhere.
  The network thus predicts a vector of 10 values, one per class, expressing the probability that the picture belongs to each class. Hopefully, it shows more confidence in the right class through a higher probability value.

<img src="https://i.imgur.com/wKtY1Og.png" width="400" align="center"/> 

Image source: https://www.pluralsight.com/guides/getting-started-tensorflow
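Normalization and one-hot encoding can be sketched in a few lines of plain Python. This mirrors what `X_train /= 255` and `tf.keras.utils.to_categorical` do in the next cell, but the helper names below are illustrative, not Keras functions.

```python
def normalize(pixels):
    """Scale 8-bit grayscale values from [0, 255] to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]

def one_hot(label, nb_classes=10):
    """Vector of nb_classes zeros with a 1 at index `label` (indices start at 0)."""
    vector = [0.0] * nb_classes
    vector[label] = 1.0
    return vector

print(normalize([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(one_hot(5))               # 1.0 at index 5, i.e. the sixth element
```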
 

In [None]:
# add an explicit channel dimension: (N, 28, 28) -> (N, 28, 28, 1)
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
# normalization: scale pixel values from [0, 255] to [0, 1]
X_train /= 255
X_test /= 255

# convert class vectors to binary class matrices (one-hot vector encoding)
first_element = Y_train[0]
Y_train = tf.keras.utils.to_categorical(Y_train, nb_classes)
Y_test = tf.keras.utils.to_categorical(Y_test, nb_classes)
print('Y_train shape after moving to categorical representation:', Y_train.shape, '\n')
print('First element in Y_train is a:', first_element)
print('First element in Y_train after transformation:', Y_train[0])


## Build and train your model

### Building the CNN


The model you are about to build corresponds to the representation below:

<img src="illustration/new network.png" width="500" align="center"/>
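Before building the model, the shape of the data after each layer can be traced by hand. The sketch below assumes the hyperparameters defined earlier (10 filters, `padding='same'` with stride 1 for the convolution, 2x2 pooling with strides equal to the pool size and `padding='same'`, then 40 and 10 dense units); dropout does not change shapes, so it is omitted. The printed shapes should match the "Output Shape" column of `mymodel.summary()`, apart from the leading batch dimension.

```python
import math

def trace_shapes(img_rows=28, img_cols=28, nb_filters=10, pool_size=(2, 2),
                 nb_dense_layer_1=40, nb_dense_layer_2=10):
    """Per-layer output shapes (without the batch dimension) of the lab's CNN."""
    shapes = [("Input", (img_rows, img_cols, 1))]
    # Conv2D with padding='same' and stride 1 keeps the spatial size
    shapes.append(("Conv2D", (img_rows, img_cols, nb_filters)))
    # 'same' pooling with stride == pool size gives ceil(size / pool) per axis
    rows = math.ceil(img_rows / pool_size[0])
    cols = math.ceil(img_cols / pool_size[1])
    shapes.append(("MaxPool2D", (rows, cols, nb_filters)))
    shapes.append(("Flatten", (rows * cols * nb_filters,)))
    shapes.append(("Dense (ReLU)", (nb_dense_layer_1,)))
    shapes.append(("Dense (softmax)", (nb_dense_layer_2,)))
    return shapes

for name, shape in trace_shapes():
    print(f"{name:16s} {shape}")
```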



In [None]:
def mycnn():
    myinput = tf.keras.Input(shape=input_shape)
    # Convolution layers
    x = tf.keras.layers.Conv2D(nb_filters, kernel_size, strides=(1,1), padding='same', activation=tf.nn.relu)(myinput)
    x = tf.keras.layers.MaxPool2D(pool_size, pool_size, padding='same')(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)
    
    # Dense layers
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(nb_dense_layer_1, activation = tf.nn.relu)(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)
    x = tf.keras.layers.Dense(nb_dense_layer_2, activation = tf.nn.softmax)(x)
    model = tf.keras.Model(inputs=myinput, outputs=x)        
    
    return model

warnings.filterwarnings('ignore')
mymodel = mycnn()
mymodel.summary()

In [None]:
mymodel.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
history = mymodel.fit(X_train, Y_train,
                    epochs=nb_epoch, batch_size=batch_size,
                    verbose=1, validation_data=(X_test, Y_test))

Because training is a long process, it was only performed for **1 epoch** here. In practice, the number of epochs is usually much higher.

The graphs below show the evolution of accuracy and loss over a **12-epoch training**, loaded from a previously saved history file.

In [None]:
df_hist=pd.read_csv('models/history_cnn_12_epochs.csv')
plt.plot(df_hist['acc'])
plt.plot(df_hist['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

In [None]:
df_hist = pd.read_csv('models/history_cnn_12_epochs.csv')
plt.plot(df_hist['loss'])
plt.plot(df_hist['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc = 'upper left')
plt.show()
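Both curves are computed from the model's output probabilities. As a plain-Python sketch (not Keras's own implementation): the categorical cross-entropy of one sample is minus the log of the probability assigned to the true class, averaged over samples, and accuracy is the fraction of samples whose highest-probability class (the argmax) is the true class. The probability vectors below are made up and use 3 classes for brevity.

```python
import math

def argmax(values):
    """Index of the largest value (the predicted class)."""
    return max(range(len(values)), key=lambda i: values[i])

def categorical_crossentropy(y_true, y_pred):
    """Mean over samples of -sum(t * log(p)) for one-hot labels t."""
    losses = [-sum(t * math.log(p) for t, p in zip(t_vec, p_vec) if t > 0)
              for t_vec, p_vec in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

def accuracy(y_true, y_pred):
    """Fraction of samples whose argmax prediction matches the one-hot label."""
    hits = sum(argmax(t) == argmax(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.8, 0.1, 0.1],   # confident and correct
          [0.3, 0.6, 0.1],   # correct, but less confident
          [0.5, 0.3, 0.2]]   # wrong: class 0 gets the highest probability

print(categorical_crossentropy(y_true, y_pred))  # about 0.781
print(accuracy(y_true, y_pred))
```

The third sample shows why the two curves can move differently: it counts fully as an error for accuracy, while its contribution to the loss depends on how much probability the true class received.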

## Evaluate and enhance the model

### Model evaluation

In [None]:
res = mymodel.evaluate(X_test, Y_test)
print('Test loss after 1 epoch training:', res[0])
print('Test accuracy after 1 epoch training:', res[1])

In [None]:
for filename in sorted(os.listdir("./samples")):
    if filename.endswith(".jpg"):
        wholename = os.path.join("./samples", filename)
        # load as a single-channel 28x28 image, like the MNIST inputs
        img = load_img(wholename, color_mode='grayscale', target_size=(28, 28))
        x = img_to_array(img)            # shape (28, 28, 1)
        x1 = x / 255.0                   # same normalization as the training data
        x1 = np.expand_dims(x1, axis=0)  # add the batch dimension: (1, 28, 28, 1)
        # the true label of the sample images is not known, so show the filename
        plt.title('Sample %s' % filename)
        plt.imshow(x1.reshape([28, 28]), cmap=plt.get_cmap('gray'))
        plt.show()
        prob = mymodel.predict(x1)
        pred = prob.argmax(axis=-1)
        print('pred', pred, 'prob', prob)
        print("------------------------------")


### Train a better model

You have created and trained a first simple CNN to recognize handwritten digits. As you may have noticed, its performance can still be improved, which is the purpose of this section.

Below, you can view the performance of a model we trained for more epochs (6) and with some changed hyperparameters, such as the number of dense layers.


In [None]:
# Load a pretrained, better-performing model we prepared for you

six_epochs_model = tf.keras.models.load_model('models/cnn_6epochs.h5')
six_epochs_score = six_epochs_model.evaluate(X_test, Y_test)
print('Test loss after 6 epochs training:', six_epochs_score[0])
print('Test accuracy after 6 epochs training:', six_epochs_score[1])



You can try to increase the performance by following **some** of these hints:
* Lowering the dropout rate (hint: read the TensorFlow warning; a rate of 0.9 drops 90% of the inputs of each `Dropout` layer during training)
* Increasing the batch size
* Increasing the number of epochs
* Increasing the number of neurons in the first dense layer (`nb_dense_layer_1`; the last dense layer must keep `nb_classes` = 10 outputs)
* Increasing the number of convolutional filters
* Changing the filter size to 5x5, or adding extra conv layers
* etc.
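The dropout hint can be understood with a small simulation. The sketch below implements "inverted" dropout, the behaviour the Keras `Dropout` layer has during training: each input is zeroed with probability `rate`, and the kept ones are scaled by `1 / (1 - rate)` so that the expected sum is unchanged. With `rate = 0.9`, only about 10% of the 40 outputs of the first dense layer survive each training step, which leaves very little signal to learn from. The function name, the random seed and the input values are illustrative assumptions.

```python
import random

def dropout(inputs, rate, rng):
    """Inverted dropout: zero each input with probability `rate`, rescale the others."""
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else x * scale for x in inputs]

rng = random.Random(0)
activations = [1.0] * 40  # e.g. the 40 outputs of the first dense layer

for rate in (0.9, 0.5, 0.25):
    kept = sum(1 for v in dropout(activations, rate, rng) if v != 0.0)
    print(f"rate={rate}: {kept} of {len(activations)} activations kept")
```

Dropout is only applied during training; at evaluation and prediction time, all units are used.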

Check the number of parameters and the accuracy on the training and validation/test sets. You can also inspect the predictions on some of the samples in the 'samples' folder.
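The number of trainable parameters reported by `summary()` can be checked by hand: a `Conv2D` layer has `(kernel_height * kernel_width * input_channels + 1) * nb_filters` parameters (the `+ 1` is the bias), a `Dense` layer has `(inputs + 1) * units`, and pooling, dropout and flatten layers have none. The sketch below applies these formulas to the single-conv architecture of `mycnn()` (and of `my_new_cnn()` before any modification); it assumes `padding='same'` and pooling with strides equal to the pool size, as in the code above, and the function name is illustrative.

```python
import math

def count_params(img_rows=28, img_cols=28, nb_filters=10, kernel_size=(3, 3),
                 pool_size=(2, 2), nb_dense_layer_1=40, nb_dense_layer_2=10):
    """Trainable parameters of Conv2D -> MaxPool2D -> Flatten -> Dense -> Dense."""
    conv = (kernel_size[0] * kernel_size[1] * 1 + 1) * nb_filters  # 1 input channel
    flat = (math.ceil(img_rows / pool_size[0]) * math.ceil(img_cols / pool_size[1])
            * nb_filters)
    dense_1 = (flat + 1) * nb_dense_layer_1
    dense_2 = (nb_dense_layer_1 + 1) * nb_dense_layer_2
    return conv + dense_1 + dense_2

print(count_params())                                   # 78950 with this lab's values
print(count_params(nb_filters=32, kernel_size=(5, 5)))  # 252162
```

Most of the parameters sit in the first dense layer, because it is connected to every value of the flattened feature maps; adding convolutional filters therefore grows that layer quickly.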

In [None]:
# You can set SOME new values here
#################
# Hyperparameters
#################
batch_size = 8
nb_epoch = 1
# number of convolutional filters to use
nb_filters = 10
# number of neurons in the dense layers
nb_dense_layer_1 = 40
nb_dense_layer_2 = 10
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (3, 3)

# Check the TensorFlow warning: the dropout rate might be too high
dropout_rate = 0.9

def my_new_cnn():
    myinput = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(nb_filters, kernel_size, strides=(1,1), padding='same', activation=tf.nn.relu)(myinput)
    # hint: maybe an additional Conv layer ?
    x = tf.keras.layers.MaxPool2D(pool_size, pool_size, padding='same')(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(nb_dense_layer_1, activation = tf.nn.relu)(x)
    x = tf.keras.layers.Dropout(rate=dropout_rate)(x)
    x = tf.keras.layers.Dense(nb_dense_layer_2, activation = tf.nn.softmax)(x)
    model = tf.keras.Model(inputs=myinput, outputs=x)        
    
    return model

my_new_model = my_new_cnn()
my_new_model.summary()

In [None]:
my_new_model.compile(loss='categorical_crossentropy',
              optimizer='Adam',
              metrics=['accuracy'])
history = my_new_model.fit(X_train, Y_train,
                    epochs=nb_epoch, batch_size=batch_size,
                    verbose=1, validation_data=(X_test, Y_test))

In [None]:
for filename in sorted(os.listdir("./samples")):
    if filename.endswith(".jpg"):
        wholename = os.path.join("./samples", filename)
        # load as a single-channel 28x28 image, like the MNIST inputs
        img = load_img(wholename, color_mode='grayscale', target_size=(28, 28))
        x = img_to_array(img)            # shape (28, 28, 1)
        x1 = x / 255.0                   # same normalization as the training data
        x1 = np.expand_dims(x1, axis=0)  # add the batch dimension: (1, 28, 28, 1)
        # the true label of the sample images is not known, so show the filename
        plt.title('Sample %s' % filename)
        plt.imshow(x1.reshape([28, 28]), cmap=plt.get_cmap('gray'))
        plt.show()
        prob = my_new_model.predict(x1)
        pred = prob.argmax(axis=-1)
        print('pred', pred, 'prob', prob)
        print("------------------------------")


#### Congratulations!  

You have not only trained a simple CNN, but also learned how to improve the performance of the model. This concludes the lab portion.

You can now proceed to the Conclusion notebook.