# Practical 3

## Image Classification using Convolutional Neural Network 

In this practical, we will see how to use a Convolutional Neural Network to classify images of hands making letters in [American Sign Language](http://www.asl.gs/).

The [American Sign Language alphabet](http://www.asl.gs/) contains 26 letters. Two of those letters (j and z) require movement, so they are not included in the training dataset, which leaves 24 classes.

<img src="./Lesson3_data/asl.png" style="width: 600px;">

The sign language dataset is in [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) (Comma Separated Values) format, a plain-text tabular format that spreadsheet programs such as Microsoft Excel and Google Sheets can open. Run the code below to unzip and load the data.

In [1]:
!unzip Lesson3_data.zip

Archive:  Lesson3_data.zip
  inflating: a.png                   
  inflating: asl.png                 
  inflating: b.png                   
  inflating: conv2d.png              
  inflating: dropout.png             
  inflating: maxpool2d.png           
  inflating: sign_mnist_train.csv    
  inflating: sign_mnist_valid.csv    


In [2]:
import tensorflow.keras as keras
import pandas as pd

# Load in our data from CSV files
train_df = pd.read_csv("Lesson3_data/sign_mnist_train.csv")
valid_df = pd.read_csv("Lesson3_data/sign_mnist_valid.csv")

# Separate out our target values
y_train = train_df['label']
y_valid = valid_df['label']
del train_df['label']
del valid_df['label']

# Separate out our image vectors
x_train = train_df.values
x_valid = valid_df.values

# Turn our scalar targets into binary categories
num_classes = 24
y_train = keras.utils.to_categorical(y_train, num_classes)
y_valid = keras.utils.to_categorical(y_valid, num_classes)

# Normalize our image data
x_train = x_train / 255
x_valid = x_valid / 255
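
The `to_categorical` call above turns each integer label into a vector with a single 1 in the position of its class. As a rough illustration of that idea (a NumPy sketch with made-up labels, not the Keras implementation):

```python
import numpy as np

num_classes = 24
# Made-up integer labels for three images (not taken from the dataset).
labels = np.array([0, 3, 23])

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(num_classes)[labels]

print(one_hot.shape)        # (3, 24)
print(one_hot[1].argmax())  # 3
```

Each row sums to 1, which matches the shape of the softmax output the model will produce later.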

Let's take a look at how the data is formatted.

In [3]:
x_train.shape, x_valid.shape

((27455, 784), (7172, 784))

In this format, we don't have all the information about which pixels are near each other. Because of this, we can't apply convolutions that will detect features. Let's reshape our dataset so that each image is in a 28x28 pixel format. This will allow our convolutions to associate groups of pixels and detect important features.

That means that we need to convert the current shape `(27455, 784)` to `(27455, 28, 28, 1)`. As a convenience, we can pass the [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html#numpy.reshape) method a `-1` for one dimension, and NumPy will infer its size from the total number of values. Here that keeps the number of images unchanged, therefore:

In [4]:
x_train = x_train.reshape(-1,28,28,1)
x_valid = x_valid.reshape(-1,28,28,1)

x_train.shape, x_valid.shape

((27455, 28, 28, 1), (7172, 28, 28, 1))
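
To see what the reshape does to pixel positions, here is a small sketch on fake data (the array `flat` is invented for illustration and stands in for `x_train`):

```python
import numpy as np

# Two fake "images", each a flat row of 784 values (0..783 and 784..1567).
flat = np.arange(2 * 784).reshape(2, 784)

# -1 lets NumPy infer the number of images from the total size.
images = flat.reshape(-1, 28, 28, 1)

print(images.shape)        # (2, 28, 28, 1)
# Flat index 28 becomes row 1, column 0: the pixel directly below pixel 0.
print(images[0, 1, 0, 0])  # 28
```

Pixels that were 28 positions apart in the flat vector are now vertical neighbours, which is the spatial information a convolution needs.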

Let's create the convolutional neural network for classifying the images as below. We have covered many of the different kinds of layers in the lecture. Run the code below after filling in the output layer (its number of units and activation function).

In [5]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (
    Dense,
    Conv2D,
    MaxPool2D,
    Flatten,
    Dropout,
    BatchNormalization,
)

model = Sequential()
model.add(Conv2D(75, (3, 3), strides=1, padding="same", activation="relu", 
                 input_shape=(28, 28, 1)))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(50, (3, 3), strides=1, padding="same", activation="relu"))
model.add(Dropout(0.2))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Conv2D(25, (3, 3), strides=1, padding="same", activation="relu"))
model.add(BatchNormalization())
model.add(MaxPool2D((2, 2), strides=2, padding="same"))
model.add(Flatten())
model.add(Dense(units=512, activation="relu"))
model.add(Dropout(0.3))
model.add(Dense(units=num_classes, activation="softmax")) # fill this



<img src="./Lesson3_data/conv2d.png" width=300 />

These are our 2D convolutional layers. Small kernels will go over the input image and detect features that are important for classification. Earlier convolutions in the model will detect simple features such as lines. Later convolutions will detect more complex features. Let's look at our first Conv2D layer:
```Python
model.add(Conv2D(75 , (3,3) , strides = 1 , padding = 'same'...)
```
75 refers to the number of filters that will be learned. (3,3) refers to the size of those filters. Strides refer to the step size that the filter will take as it passes over the image. Padding controls the size of the output: with `padding='same'` the input is padded with zeros so that, at a stride of 1, the output has the same height and width as the input.
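
The sliding-filter idea can be sketched in plain NumPy. This is only an illustration under simplifying assumptions (one channel, one hand-written kernel, stride 1, no bias or activation); the function `conv2d_same` is a hypothetical helper, not part of Keras:

```python
import numpy as np

def conv2d_same(image, kernel):
    """Slide `kernel` over `image` with stride 1 and zero padding ("same")."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Dot product between the kernel and the patch under it.
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 6x6 image: dark left half, bright right half.
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge kernel (a trained layer would learn its own).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = conv2d_same(image, kernel)
print(feature_map.shape)  # (6, 6), same size as the input

# Learnable parameters in the first Conv2D layer:
# one 3x3x1 kernel plus one bias for each of the 75 filters.
first_layer_params = (3 * 3 * 1 + 1) * 75
print(first_layer_params)  # 750
```

The feature map responds strongly along the boundary between the dark and bright halves and is zero in the flat dark region, which is the sense in which early filters "detect lines".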

Like normalizing our inputs, batch normalization scales the values in the hidden layers to improve training. [Read more about it in detail here](https://blog.paperspace.com/busting-the-myths-about-batch-normalization/). 
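
As a rough sketch of what batch normalization computes during training (an assumption-laden simplification: the real Keras layer learns `gamma` and `beta` and also tracks running statistics for use at inference time), consider:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize each feature over the batch, then rescale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

# Made-up activations: 4 samples, 2 features on very different scales.
activations = np.array([[1.0, 100.0],
                        [2.0, 200.0],
                        [3.0, 300.0],
                        [4.0, 400.0]])

normalized = batch_norm(activations)
print(normalized.mean(axis=0))  # both features are centred near 0
print(normalized.std(axis=0))   # both features have spread close to 1
```

After normalization the two features are on a comparable scale, even though one started out a hundred times larger than the other.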

<img src="./Lesson3_data/maxpool2d.png" width=300 />
Max pooling takes an image and essentially shrinks it to a lower resolution. It does this to help the model be robust to translation (objects moving side to side), and also makes our model faster.
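
A minimal NumPy sketch of 2x2 max pooling with a stride of 2 follows. The helper `max_pool_2x2` is hypothetical and assumes an even height and width, whereas the model above uses `padding="same"` to handle odd sizes:

```python
import numpy as np

def max_pool_2x2(image):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = image.shape
    return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.array([[1, 2, 0, 1],
                  [3, 4, 1, 0],
                  [0, 0, 5, 6],
                  [1, 0, 7, 8]])

pooled = max_pool_2x2(image)
print(pooled)
# [[4 1]
#  [1 8]]
```

The 4x4 input shrinks to 2x2, and a strong activation survives wherever it falls inside its 2x2 block, which is where the tolerance to small shifts comes from.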

Let us look at the model summary.

In [6]:
model.summary()
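
When reading the summary, it can help to work out the expected spatial sizes by hand. The sketch below assumes the architecture defined above: `padding="same"` convolutions with stride 1 keep the height and width, and each stride-2 pooling layer with `padding="same"` rounds the halved size up. The helper `same_pool_out` is introduced here purely for illustration:

```python
import math

def same_pool_out(size, stride=2):
    """Spatial size after a strided pooling layer with padding="same"."""
    return math.ceil(size / stride)

size = 28
sizes = [size]
for _ in range(3):  # the model has three MaxPool2D layers
    size = same_pool_out(size)
    sizes.append(size)

# The last Conv2D layer has 25 filters, so Flatten sees size * size * 25 values.
flatten_units = size * size * 25

print(sizes)          # [28, 14, 7, 4]
print(flatten_units)  # 400
```

These numbers should line up with the output shapes listed for the pooling and `Flatten` layers.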

Let us save this initial model as `initial_model.keras` in case we want to train it again later. Write some code to do this.

<details><summary>Click here for answer</summary>

```python
model.save('initial_model.keras')
```

</details>

In [7]:
# insert code here
model.save('initial_model.keras')

We can now compile the model. Fill in the correct loss function for this problem.

In [10]:
model.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"]) # fill in the necessary
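
To make the choice of loss more concrete, here is a hedged NumPy sketch of categorical cross-entropy for one-hot targets. The function below is a simplified stand-in written for illustration (Keras applies its own clipping and averaging), and the probabilities are made up:

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

y_true = np.array([[1.0, 0.0, 0.0]])        # the correct class is class 0
confident = np.array([[0.9, 0.05, 0.05]])   # most probability on the right class
unsure = np.array([[0.4, 0.3, 0.3]])        # probability spread across classes

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)
print(loss_confident, loss_unsure)
```

The loss only looks at the probability assigned to the true class, so the confident prediction is penalized far less than the unsure one.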

Write code to start the training of the model for 10 epochs.

<br/>
<details>
<summary>Click here for answer</summary>

```python
model.fit(x_train, y_train, epochs=10, verbose=1, validation_data=(x_valid, y_valid))
```

</details>

In [11]:
#insert code here
model.fit(x_train, y_train, epochs=10, verbose=1, validation_data=(x_valid, y_valid))

Epoch 1/10
858/858 ━━━━━━━━━━━━━━━━━━━━ 16s 18ms/step - accuracy: 0.7735 - loss: 0.7780 - val_accuracy: 0.9156 - val_loss: 0.2735
Epoch 2/10
858/858 ━━━━━━━━━━━━━━━━━━━━ 15s 18ms/step - accuracy: 0.9921 - loss: 0.0238 - val_accuracy: 0.9085 - val_loss: 0.2955
Epoch 3/10
858/858 ━━━━━━━━━━━━━━━━━━━━ 15s 18ms/step - accuracy: 0.9963 - loss: 0.0104 - val_accuracy: 0.9541 - val_loss: 0.1740
Epoch 4/10
858/858 ━━━━━━━━━━━━━━━━━━━━ 16s 19ms/step - accuracy: 0.9980 - loss: 0.0057 - val_accuracy: 0.9343 - val_loss: 0.2755
Epoch 5/10
858/858 ━━━━━━━━━━━━━━━━━━━━ 15s 18ms/step - accuracy: 0.9985 - loss: 0.0043 - val_accuracy: 0.9402 - val_loss: 0.2094
Epoch 6/10
858/858 ━━━━━━━━━━━━━━━━━━━━ 15s 17ms/step - accuracy: 0.9989 - loss: 0.0024 - val_accuracy: 0.9667 - val_loss: 0.1235
Epoch 7/10
...

<keras.src.callbacks.history.History at 0x13783f880>

One way we can try to improve the model is to perform data augmentation.

Keras comes with an image augmentation class called `ImageDataGenerator`. We recommend checking out the [documentation here](https://www.tensorflow.org/api_docs/python/tf/keras/preprocessing/image/ImageDataGenerator). It accepts a series of options for augmenting your data. Let's take a look at the options we've selected below, and then execute the cell to create an instance of the class:

In [12]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,  # randomly rotate images in the range (degrees, 0 to 180)
    zoom_range=0.1,  # Randomly zoom image
    width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
    height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
    horizontal_flip=True,  # randomly flip images horizontally
    vertical_flip=False, # Don't randomly flip images vertically
)  
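
To get a feel for what two of these options do to the pixel array, here is a small NumPy sketch on a toy image. It only imitates the effect: `np.roll` wraps pixels around the edge, which is not what the generator does by default, and the real class picks its transformations at random:

```python
import numpy as np

# Tiny 3x3 stand-in for a 28x28 image, shape (height, width, channels).
image = np.arange(9, dtype=float).reshape(3, 3, 1)

# Horizontal flip: reverse the width axis.
flipped = image[:, ::-1, :]

# Crude one-pixel shift to the right along the width axis (wraps around).
shifted = np.roll(image, shift=1, axis=1)

print(image[0, :, 0])    # [0. 1. 2.]
print(flipped[0, :, 0])  # [2. 1. 0.]
print(shifted[0, :, 0])  # [2. 0. 1.]
```

Both results keep the original shape and label, so they can be fed to the model as extra training examples.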

Why would we want to flip images horizontally, but not vertically?

We can now fit the generator on the training dataset. 

In [13]:
datagen.fit(x_train)


Write code to load the initial model as model2 and compile it as before.

<br/>
<details>
<summary>Click here for answer</summary>

```python
model2 = keras.models.load_model('initial_model.keras')
model2.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])
```

</details>

In [14]:
#insert code here
model2 = keras.models.load_model('initial_model.keras')
model2.compile(loss="categorical_crossentropy", optimizer="rmsprop", metrics=["accuracy"])

In [15]:
model2.summary()

When using an image data generator with Keras, a model trains a bit differently: instead of just passing the `x_train` and `y_train` datasets into the model, we pass the generator in, calling the generator's [flow](https://keras.io/api/preprocessing/image/) method. This causes the images to get augmented live and in memory right before they are passed into the model for training.

Generators can supply an indefinite amount of data, so when we use them to train our model, we need to explicitly set how long we want each epoch to run. Otherwise the epoch would go on indefinitely, with the generator creating an endless stream of augmented images for the model.

We explicitly set how long we want each epoch to run using the `steps_per_epoch` named argument. Because `steps * batch_size = number of images trained in an epoch`, a common practice, which we will use here, is to set the number of steps equal to the non-augmented dataset size divided by the `batch_size` (which we set to 32).
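
The need for an explicit step count can be illustrated with a plain Python generator. The function `endless_batches` below is hypothetical and written only for this sketch (it is not how `flow` is implemented), and the arrays are zero-filled placeholders:

```python
import itertools
import numpy as np

def endless_batches(x, y, batch_size, seed=0):
    """Yield randomly sampled (x, y) batches forever."""
    rng = np.random.default_rng(seed)
    while True:
        idx = rng.choice(len(x), size=batch_size, replace=False)
        yield x[idx], y[idx]

# Placeholder data: 100 blank images with 24-way one-hot style labels.
x = np.zeros((100, 28, 28, 1))
y = np.zeros((100, 24))
batch_size = 32
steps_per_epoch = len(x) // batch_size

# Without a step limit this loop would never end, so we take a fixed number of batches.
epoch = list(itertools.islice(endless_batches(x, y, batch_size), steps_per_epoch))

print(len(epoch))         # 3
print(epoch[0][0].shape)  # (32, 28, 28, 1)
```

`itertools.islice` plays the role that `steps_per_epoch` plays in `model.fit`: it decides where one epoch stops.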

Run the following cell to see the results. 

In [16]:
batch_size = 32

# Number of full batches in one pass over the non-augmented training set
steps_per_epoch = len(x_train) // batch_size
print(steps_per_epoch)


857
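
The value 857 comes from integer division, which drops the final partial batch. A quick check of the arithmetic, using the training-set size reported by `x_train.shape` earlier:

```python
import math

n_images = 27455  # training-set size reported by x_train.shape
batch_size = 32

full_batches = n_images // batch_size            # batches of exactly 32 images
leftover = n_images - full_batches * batch_size  # images that do not fill a batch
all_batches = math.ceil(n_images / batch_size)   # count including the partial batch

print(full_batches, leftover, all_batches)  # 857 31 858
```

The 858 here matches the `858/858` steps shown in the earlier training log, where Keras also counted the final partial batch.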


In [17]:
img_iter = datagen.flow(x_train, y_train, batch_size=batch_size)

model2.fit(img_iter,
          epochs=10,
          steps_per_epoch=len(x_train)//batch_size, # Run same number of steps we would if we were not using a generator.
          validation_data=(x_valid, y_valid))

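ImportError: This requires the scipy module. You can install it via `pip install scipy`

The random rotations, zooms, and shifts configured above depend on SciPy. If you see this `ImportError`, install SciPy as the message suggests, restart the kernel, and re-run the cells above before continuing.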

We can use our model to make predictions on new images. It will be useful to show the image as well. We can use the matplotlib library to do this.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

def show_image(image_path):
    image = mpimg.imread(image_path)
    plt.imshow(image, cmap='gray')

In [None]:
show_image('Lesson3_data/b.png')

The images in our dataset were 28x28 pixels and grayscale. We need to make sure to pass the same size and grayscale images into our method for prediction. There are a few ways to edit images with Python, but Keras has a built-in utility that works well. 

In [None]:
from tensorflow.keras.preprocessing import image as image_utils

def load_and_scale_image(image_path):
    image = image_utils.load_img(image_path, color_mode="grayscale", target_size=(28,28))
    return image

In [None]:
image = load_and_scale_image('Lesson3_data/b.png')
plt.imshow(image, cmap='gray')

Now that we have a 28x28 pixel grayscale image, we're close to being ready to pass it into our model for prediction. First we need to reshape our image to match the shape of the dataset the model was trained on. Before we can reshape, we need to convert our image into a more rudimentary format, a NumPy array. We'll do this with a Keras utility called `img_to_array`. Then we will reshape and normalize the array.

In [None]:
image = image_utils.img_to_array(image)

# This reshape corresponds to 1 image of 28x28 pixels with one color channel
image = image.reshape(1,28,28,1) 

image = image / 255

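We are now ready to make the prediction. The model returns one probability per class, and `np.argmax` gives the index of the most likely class.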

In [None]:
import numpy as np
prediction = model.predict(image)
print(prediction)
print(np.argmax(prediction))
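
The printed index still has to be turned into a letter. The sketch below assumes that classes 0 to 23 follow alphabetical order with j and z left out; this mapping is an assumption and is not confirmed by anything above, so check it against the dataset before relying on it. The helper `predicted_letter` and the fake prediction are made up for illustration:

```python
import numpy as np

# Assumption: classes 0..23 follow alphabetical order with j and z left out.
alphabet = "abcdefghiklmnopqrstuvwxy"

def predicted_letter(prediction):
    """Map a (1, 24) array of class probabilities to a letter."""
    return alphabet[int(np.argmax(prediction))]

# Fake model output that puts all probability on class 1.
fake_prediction = np.zeros((1, 24))
fake_prediction[0, 1] = 1.0

print(predicted_letter(fake_prediction))  # b
```

With a real prediction, `predicted_letter(prediction)` would be called on the array returned by `model.predict(image)`.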