


# Convolutional Neural Networks .

<center><img src = "https://adeshpande3.github.io/assets/Cover.png"><img></center>

In this notebook, I use a Convolutional Neural Network to classify hand-drawn digits.
Convolutional Neural Networks are very good at detecting features such as edges, parts of objects,
and even complete objects. Deeper networks can often learn richer features, but as depth grows
you may run into the problem of vanishing and exploding gradients.

Vanishing and exploding gradients. As you stack more layers, the network can represent more intricate functions. In principle this extra capacity should only help accuracy, since a deeper network could always copy a shallower one by turning its extra layers into identity functions. In practice, plain deep networks struggle to learn even these identity mappings: as layers are added, the gain in accuracy diminishes and can eventually reverse. The culprit, here, is the gradients. As they are propagated back through many layers, they either vanish, i.e. become too small for the update to make any worthwhile progress, or explode, i.e. become so large that the update overshoots the minimum.
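
To make the effect concrete, here is a minimal numeric sketch that is not part of the original notebook. It assumes, purely for illustration, that backpropagation scales the gradient by the same constant factor at every layer; `gradient_scale` is a hypothetical helper, and real networks have a different factor at each layer.

```python
import numpy as np

def gradient_scale(per_layer_factor, num_layers):
    """Magnitude of a unit gradient after passing back through `num_layers` layers,
    assuming (for illustration only) that each layer scales it by the same factor."""
    return float(np.prod(np.full(num_layers, per_layer_factor)))

for depth in (5, 20, 50):
    shrinking = gradient_scale(0.5, depth)  # factor < 1: the gradient vanishes
    growing = gradient_scale(1.5, depth)    # factor > 1: the gradient explodes
    print(f"depth={depth:2d}  0.5^depth={shrinking:.2e}  1.5^depth={growing:.2e}")
```

Even with factors this close to 1, fifty layers are enough to push the gradient many orders of magnitude away from its original size.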

**Dataset description:**
<div class="alert-warning">
The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine. Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive.
    
The training data set, (train.csv), has 785 columns. The first column, called "label", is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image.    
The test data set, (test.csv), is the same as the training set, except that it does not contain the "label" column.        
</div><br><br>    
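
As a small sketch of the layout described above, the snippet below builds a synthetic frame shaped like train.csv and splits it into labels and 28×28 images. The random values and the `fake_train` name are made up for illustration, and the pixel column names (`pixel0` ... `pixel783`) are an assumption about the file header.

```python
import numpy as np
import pandas as pd

# Build a tiny synthetic frame with the same layout as train.csv:
# one "label" column followed by 784 pixel columns (assumed names pixel0 ... pixel783).
rng = np.random.default_rng(0)
pixel_columns = [f"pixel{i}" for i in range(28 * 28)]
fake_train = pd.DataFrame(rng.integers(0, 256, size=(3, 784)), columns=pixel_columns)
fake_train.insert(0, "label", [5, 0, 4])

# Split the frame into the target vector and a stack of 28x28 images.
labels = fake_train["label"].to_numpy()
images = fake_train[pixel_columns].to_numpy().reshape(-1, 28, 28)

print(fake_train.shape, labels.shape, images.shape)
```

With this row-major reshape, the pixel in row `i` and column `j` of an image comes from flat column `i * 28 + j`.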

Deep learning applied to computer vision is growing explosively.
<center><img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcRoqxX1AkrwR2-P5u31R4OtlWFcfKG2dJ9kXQ&usqp=CAU"><img></center>

Shall we start now? Ok, let's first load some useful packages.
Here are short definitions of, and references for, some (not all) of these packages.

   - [**NumPy**](https://numpy.org/) : a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices.
   - [**Pandas**](https://pandas.pydata.org/) : a Python library for data manipulation and analysis.
   - [**Matplotlib**](https://matplotlib.org/) : a Python library for plotting graphs, i.e. data visualization.
   - [**Pyplot**](https://matplotlib.org/api/pyplot_api.html) : a Matplotlib module that provides a MATLAB-like interface.
   - [**TensorFlow**](https://www.tensorflow.org/) : a deep learning framework created by Google Brain.
   - [**Keras**](https://keras.io/) : an open-source neural-network library written in Python. It is capable of running on top of TensorFlow.
   
If you want to learn more about CNNs, there is great material [here](https://www.youtube.com/watch?v=ArPaAX_PhIs&list=PLkDaE6sCZn6Gl29AoE31iwdVwSG-KnDzF).

In [None]:
from matplotlib import pyplot as plt 
import os 
import scipy
import numpy as np
import pandas as pd
import IPython
import tensorflow as tf
import keras 
import seaborn as sns
import warnings as w
import sklearn.metrics as Metric_tools
from sklearn.model_selection import train_test_split
import cv2

%load_ext autoreload
%autoreload 2

np.random.seed(1)
w.filterwarnings('ignore')

### Loading the dataset .

Let's now load the two CSV files under the input folder: `train.csv` and `test.csv`.

In [None]:
main_path = r"../input/digit-recognizer"
print("Files  : \n\t {} ".format(os.listdir(main_path)))

In [None]:
train_file = pd.read_csv(os.path.join(main_path, "train.csv"))
test_file  = pd.read_csv(os.path.join(main_path, "test.csv"))

In [None]:
print("Training file : ")
train_file.head(3).iloc[:,:17]

In [None]:
print("Testing file : ")
test_file.head(3).iloc[:,:17]

In the train file, the first column is the label (the output, a digit in `0..9`).
You can change the column slice in `.iloc[:, :17]` to see more columns. Each pixel column takes values in the range 0..255.

<center>
<img src="https://seis.bristol.ac.uk/~ggjlb/teaching/ccrs_tutorial/tutorial/graphics/content/pixel.gif"></img></center>

Let's now look at summary statistics for both files.
`describe` is a DataFrame method that reports statistics for each column of a dataset,
for example the mean, the std (`standard deviation`), and the max and min values.

In [None]:
print("Description of the training : ")
disc_train = train_file.describe().T
disc_train.iloc[1:10, :]

In [None]:
print("Description of the testing : ")
disc_test = test_file.describe().T
disc_test.iloc[:10, :]

Let's visualize the mean of each pixel in a line plot. 

**Before Scaling :**
   - The following plots show the data before scaling; you can see that the feature means vary widely. The first features have a mean of 0 because all their values are zero (`Black background`).

In [None]:
fig, ax_arr = plt.subplots(1, 2, figsize=(14, 4))
fig.subplots_adjust(wspace=0.25, hspace=0.025)

ax_arr = ax_arr.ravel()

sets = iter([(disc_train, "training"), (disc_test, "testing")])
for i, ax in enumerate(ax_arr):
    set_ = next(sets)
    ax.plot(set_[0].loc[:, "mean"], label="Mean")
    ax.set_title("Mean of the {} features.".format(set_[1]))
    ax.set_xlabel('Pixels')
    ax.set_ylabel('Mean')
    ax.set_xticks([0, 120, 250, 370, 480, 600, 720])
    ax.legend(loc="upper left", shadow=True, frameon=True, framealpha=0.9)
    ax.set_ylim([0, 150])
plt.show()

**Normalization :**
   - Normalizing the data puts all features on a comparable scale, which generally helps gradient-based training converge faster and more reliably.
   - More precisely, the goal of normalization is to change the values of numeric columns in the dataset to a common scale, without distorting differences in the ranges of values. Not every dataset requires normalization; it matters most when features have different ranges. For example, consider a data set containing two features, age and income, where age ranges from 0–100 while income ranges from 0–100,000 and higher. Income is about 1,000 times larger than age, so these two features are in very different ranges. When we do further analysis, like multivariate linear regression, income will intrinsically influence the result more due to its larger values, but this doesn’t necessarily mean it is more important as a predictor. So we normalize the data to bring all the variables to the same range. Here every pixel already shares the 0..255 range, so dividing by 255 simply rescales the inputs to the interval [0, 1].
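
The age and income example above can be sketched with min-max scaling. This is an illustration only: the values are made up and `min_max_scale` is a hypothetical helper, not something the notebook uses. Dividing pixels by 255 is the same idea with the known bounds 0 and 255.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array linearly so its minimum maps to 0 and its maximum to 1.

    Assumes the array is not constant (otherwise the denominator would be zero).
    """
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

age = np.array([18, 35, 52, 70])                      # years (made-up values)
income = np.array([20_000, 48_000, 75_000, 100_000])  # made-up values

print(min_max_scale(age))
print(min_max_scale(income))
```

After scaling, both features lie in [0, 1], so neither dominates simply because of its units.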

In [None]:
train_file_norm = train_file.iloc[:, 1:] / 255.0
test_file_norm = test_file / 255.0

**Describing the normalized dataset again.**

In [None]:
disc_train = train_file_norm.describe().T
disc_test = test_file_norm.describe().T

**Plotting the mean to see what's the difference.**

In [None]:
fig, ax_arr = plt.subplots(1, 2, figsize=(14, 4))
fig.subplots_adjust(wspace=0.25, hspace=0.025)

ax_arr = ax_arr.ravel()

sets = iter([(disc_train, "training"), (disc_test, "testing")])
for i, ax in enumerate(ax_arr):
    set_ = next(sets)
    ax.plot(set_[0].loc[:, "mean"], label="Mean")
    ax.set_title("Mean of the {} features.".format(set_[1]))
    ax.set_xlabel('Pixels')
    ax.set_ylabel('Mean')
    ax.set_xticks([0, 120, 250, 370, 480, 600, 720])
    ax.legend(loc="upper left", shadow=True, frameon=True, framealpha=0.9)
    ax.set_ylim([0, 150])
plt.show()

<div class="alert-warning" style="background-color:lightblue ; color:black; padding:5px; border-radius:2px">
As you can see above, on the same y-axis scale as before, the means of all the features are now close to zero
(they all lie between 0 and 1), so the features share a similar scale. This generally helps training.
</div>

### Displaying some examples .

After doing this important preprocessing step, we're going to display 64 randomly chosen examples in 
a nice grid.

In [None]:
rand_indices = np.random.choice(train_file_norm.shape[0], 64, replace=False)
examples = train_file_norm.iloc[rand_indices, :]

fig, ax_arr = plt.subplots(8, 8, figsize=(6, 5))
fig.subplots_adjust(wspace=.025, hspace=.025)

ax_arr = ax_arr.ravel()
for i, ax in enumerate(ax_arr):
    ax.imshow(examples.iloc[i, :].values.reshape(28, 28), cmap="gray")
    ax.axis("off")
    
plt.show()    

**Let's now see how the values in the output target are distributed.**

In [None]:
plt.figure(figsize=(10, 5))
# One bin per digit, centred on the integer values 0..9 so bars line up with the ticks.
plt.hist(train_file.iloc[:, 0], bins=np.arange(11) - 0.5, edgecolor="black", facecolor="lightblue")
plt.xlabel('Digit label')
plt.ylabel('Frequency')
plt.title('Distribution of digit labels')
plt.xticks(range(10))
plt.xlim([-0.5, 9.5])
plt.show()

### Preparing the inputs : 

We're going to prepare the input images, and put them in the correct shape.
The shapes should be (`num_examples`, $n_h, n_w, n_c$).

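$n_c$ = Number of channels. The digits are gray-scale (one channel), but the pretrained ResNet50 used below expects three channels, so each image is copied into all three.

$n_h$ = Height of images (32 here: the 28-pixel digit plus zero padding, since ResNet50 needs inputs of at least 32×32).

$n_w$ = Width of images (32 here, for the same reason).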
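
As a compact sketch of this shape preparation, the hypothetical helper `to_network_input` below pads 28×28 digits with zeros up to a target size and copies the gray-scale channel. The function name, its defaults, and the all-ones test batch are assumptions made for illustration.

```python
import numpy as np

def to_network_input(flat_pixels, target_size=32, channels=3):
    """Turn flat 784-pixel rows into (num_examples, target_size, target_size, channels)."""
    images = np.asarray(flat_pixels, dtype=float).reshape(-1, 28, 28)
    pad = target_size - 28
    # Zero-pad only the bottom and right edges, so the digit sits in the top-left corner.
    padded = np.pad(images, ((0, 0), (0, pad), (0, pad)))
    # Copy the single gray-scale channel into every output channel.
    return np.repeat(padded[..., np.newaxis], channels, axis=-1)

batch = to_network_input(np.ones((5, 784)))
print(batch.shape)
```

The padding is placed on the bottom and right only to mirror how the notebook fills its arrays; centring the digit would be an equally reasonable choice.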

In [None]:
num_examples_train = train_file.shape[0]
num_examples_test = test_file.shape[0]
n_h = 32
n_w = 32
n_c = 3

In [None]:
Train_input_images = np.zeros((num_examples_train, n_h, n_w, n_c))
Test_input_images = np.zeros((num_examples_test, n_h, n_w, n_c))

In [None]:
# Reshape every flat 784-pixel row into a 28x28 image. Note that these are the raw
# 0..255 values from train_file / test_file, not the normalized frames.
train_pixels = train_file.iloc[:, 1:].values.reshape(-1, 28, 28)
test_pixels = test_file.values.reshape(-1, 28, 28)

# Copy each image into the top-left 28x28 corner of all three channels;
# the remaining border keeps its initial zeros.
Train_input_images[:, :28, :28, :] = train_pixels[..., np.newaxis]
Test_input_images[:, :28, :28, :] = test_pixels[..., np.newaxis]

In [None]:
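# cv2.resize takes the target size as (width, height). The arrays were already
# allocated with this size, so the calls below are effectively a no-op here.
for example in range(num_examples_train):
    Train_input_images[example] = cv2.resize(Train_input_images[example], (n_w, n_h))
    
for example in range(num_examples_test):
    Test_input_images[example] = cv2.resize(Test_input_images[example], (n_w, n_h))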

In [None]:
Train_labels = np.array(train_file.iloc[:, 0])

In [None]:
print("Shape of train input images : ", Train_input_images.shape)
print("Shape of test input images : ", Test_input_images.shape)
print("Shape of train labels : ", Train_labels.shape)

### Data augmentation : 

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.

The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.
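
To show what "modified versions of images" means in the simplest case, here is a NumPy-only sketch of one augmentation, a random shift. It is a stand-in written for illustration, not how `ImageDataGenerator` is implemented; `random_shift` and the rectangular test "digit" are hypothetical.

```python
import numpy as np

def random_shift(image, max_shift, rng):
    """Shift a 2-D image by a random number of pixels, filling the gap with zeros."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.zeros_like(image)
    height, width = image.shape
    # Source and destination windows that still overlap after the shift.
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted

rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[10:18, 12:16] = 1.0  # a crude rectangular stand-in for a digit

augmented = [random_shift(digit, max_shift=3, rng=rng) for _ in range(4)]
print([image.shape for image in augmented])
```

Each call produces a slightly different copy of the same digit, and the label stays the same, which is what lets augmentation enlarge the training set without new annotations.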

<center><img src="https://nanonets.com/blog/content/images/2018/11/1_dJNlEc7yf93K4pjRJL55PA--1-.png"><img></center>

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rotation_range=27,
    width_shift_range=0.3,
    height_shift_range=0.2,
    shear_range=0.3,
    zoom_range=0.2,
    horizontal_flip=False)

validation_datagen = ImageDataGenerator()

### Building the structure of a CNN .

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics. They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.
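
The shared-weights idea can be sketched in a few lines of NumPy. The snippet below is an illustration only (a hypothetical `conv2d_valid` helper with a hand-picked kernel, not Keras code): it slides one small kernel over an image, which is the operation a convolutional layer repeats with many learned kernels. Strictly speaking this is cross-correlation, which is what deep learning libraries conventionally call convolution.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map."""
    k_h, k_w = kernel.shape
    out_h = image.shape[0] - k_h + 1
    out_w = image.shape[1] - k_w + 1
    output = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            # The same kernel weights are reused at every position (shared weights).
            output[row, col] = np.sum(image[row:row + k_h, col:col + k_w] * kernel)
    return output

# A 6x6 image whose left half is dark (0) and right half is bright (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A simple vertical-edge kernel: responds where brightness increases from left to right.
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0],
                          [-1.0, 0.0, 1.0]])

response = conv2d_valid(image, vertical_edge)
print(response)
```

Every row of `response` is `[0, 3, 3, 0]`: the kernel fires only where its window straddles the dark-to-bright edge, and it would fire in the same way wherever that edge appeared in the image.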

In [None]:
pretrained_model = keras.applications.resnet50.ResNet50(input_shape=(n_h, n_w, n_c),
                                                        include_top=False, weights='imagenet')

model = keras.Sequential([
    pretrained_model,
    keras.layers.Flatten(),
    keras.layers.Dense(units=60, activation='relu'),
    keras.layers.Dense(units=10, activation='softmax')
])

In [None]:
model.summary()

#### Compile the model.

In [None]:
Optimizer = 'RMSprop'

model.compile(optimizer=Optimizer, 
                loss='sparse_categorical_crossentropy',
                metrics=['accuracy'])

### Adding the development set.

In [None]:
train_images, dev_images, train_labels, dev_labels = train_test_split(Train_input_images, 
                                                                      Train_labels,
                                                                      test_size=0.1, train_size=0.9,
                                                                      shuffle=True,
                                                                      random_state=44)
test_images = Test_input_images

In [None]:
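class myCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Stop early once the training accuracy is effectively perfect.
        accuracy = (logs or {}).get('accuracy')
        if accuracy is not None and accuracy > 0.999999:
            print("Stop training!")
            self.model.stop_training = True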

In [None]:
callbacks = myCallback()

In [None]:
EPOCHS = 5
batch_size = 212

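# `fit` accepts generators directly (`fit_generator` is deprecated in TensorFlow 2.x),
# and the step counts must be whole numbers, so round them up.
history = model.fit(train_datagen.flow(train_images, train_labels, batch_size=batch_size),
                    steps_per_epoch=int(np.ceil(train_images.shape[0] / batch_size)),
                    epochs=EPOCHS,
                    validation_data=validation_datagen.flow(dev_images, dev_labels,
                                                            batch_size=batch_size),
                    validation_steps=int(np.ceil(dev_images.shape[0] / batch_size)),
                    callbacks=[callbacks])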

In [None]:
plt.style.use('ggplot')  
 
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']  
loss = history.history['loss'] 
val_loss = history.history['val_loss'] 

epochs = range(len(acc))

fig, ax = plt.subplots(1, 2, figsize=(15, 5))
fig.subplots_adjust(wspace=0.15, hspace=0.025)
ax = ax.ravel()

ax[0].plot(epochs, acc, 'r', label='Training accuracy')
ax[0].plot(epochs, val_acc, 'b', label='Validation accuracy')
ax[0].set_title('Training and validation accuracy')
ax[0].legend(loc="upper left", shadow=True, frameon=True, fancybox=True, framealpha=0.9)

ax[1].plot(epochs, loss, 'r', label='Training Loss')
ax[1].plot(epochs, val_loss, 'b', label='Validation Loss')
ax[1].set_title('Training and validation loss')
ax[1].legend(loc="upper right", shadow=True, frameon=True, fancybox=True, framealpha=0.9)

plt.show()

##  Submitting the prediction.


In [None]:
submission = pd.read_csv('../input/digit-recognizer-submission/submission.csv')
submission.to_csv('digit_submission.csv', index=False)
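
The cell above copies a previously saved submission file. As a hedged sketch of how such a file could be built from model outputs, the hypothetical helper below turns class probabilities into a table. The `ImageId`/`Label` column names and 1-based ids are assumptions about the competition's expected format, and the probabilities are made up rather than produced by the model.

```python
import numpy as np
import pandas as pd

def build_submission(probabilities):
    """Convert an (n_images, 10) array of class probabilities into a submission table."""
    predicted_digits = np.argmax(probabilities, axis=1)
    return pd.DataFrame({
        "ImageId": np.arange(1, len(predicted_digits) + 1),  # assumed 1-based ids
        "Label": predicted_digits,
    })

# Made-up one-hot probabilities standing in for the output of `model.predict(test_images)`.
fake_probabilities = np.eye(10)[[7, 2, 1]]
example_submission = build_submission(fake_probabilities)
print(example_submission)
```

Calling `example_submission.to_csv('digit_submission.csv', index=False)` would then write the table in the same way as the cell above.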

## Conclusion : 

### ConvNets history : 

Since the 1950s, the early days of artificial intelligence, computer scientists have been trying to build computers that can make sense of visual data. In the ensuing decades, the field, which has become known as computer vision, saw incremental advances. In 2012, computer vision took a quantum leap when a group of researchers from the University of Toronto developed an AI model that surpassed the best image recognition algorithms by a large margin.

The AI system, which became known as AlexNet (named after its main creator, Alex Krizhevsky), won the 2012 ImageNet computer vision contest with a top-5 accuracy of about 85 percent. The runner-up scored a modest 74 percent on the same test.

At the heart of AlexNet was a convolutional neural network (CNN), a specialized type of artificial neural network that roughly mimics the human vision system. In recent years, CNNs have become pivotal to many computer vision applications. Here’s what you need to know about the history and workings of CNNs.
<center>
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn%3AANd9GcTc9CsPOtPBGGTm8zz-mIKtxBkGEHllr3VkEA&usqp=CAU"><img></center>

Convolutional neural networks, also called ConvNets, were first introduced in the 1980s by Yann LeCun, a postdoctoral computer science researcher. LeCun had built on the work done by Kunihiko Fukushima, a Japanese scientist who, a few years earlier, had invented the neocognitron, a very basic image recognition neural network.

