# Lab 3 Part 2 - Task 1: Parameters in CNN (5 Marks)

- For the model we have created in **Lab 3 Part 1 Exercise**: Early Stopping with Callbacks, calculate the number of parameters by hand for each layer and compare to the output of model.summary() and print the model summary.
- Then print the model summary of **Exercise 7 in Lab 1**
- Now compare the Lab 3 Part 1 model with the model you created in **Exercise 7 in Lab 1**:
  - Compare the Parameters of the models

  - Compare Model Performance

In [2]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense, Dropout, MaxPooling2D, BatchNormalization
from tensorflow.keras.losses import SparseCategoricalCrossentropy
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.initializers import GlorotUniform
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.datasets import mnist, cifar10
import numpy as np

(train_data, train_labels), (test_data, test_labels) = mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


In [3]:
# Set the random seeds so that results are reproducible between runs.
tf.random.set_seed(101)
np.random.seed(101)

In [4]:
# Use the first 10,000 samples of our training data as our validation set
val_data = train_data[:10000]
val_labels = train_labels[:10000]

# Use the remainder of the original training data for actual training
partial_train_data = train_data[10000:]
partial_train_labels = train_labels[10000:]

In [5]:
# Scale the pixel values so they lie in the range of 0-1
partial_train_data = partial_train_data / 255.
val_data = val_data / 255.
test_data = test_data / 255.

In [6]:
# Add a channel axis so each array is 4D, (samples, 28, 28, 1), as Conv2D expects.
partial_train_data = np.expand_dims(partial_train_data, axis=3)
val_data = np.expand_dims(val_data, axis=3)
test_data = np.expand_dims(test_data, axis=3)

In [7]:
# Converting the labels into one hot encoded format
from tensorflow.keras.utils import to_categorical
partial_train_labels = to_categorical(partial_train_labels)
val_labels = to_categorical(val_labels)
test_labels = to_categorical(test_labels)

In [11]:
# Define a Sequential model for the CNN architecture.
model = Sequential([
    # Convolutional Layer 1
    Conv2D(filters=32,
           kernel_size=(3, 3),
           strides=1,
           padding='same',
           activation='relu',
           input_shape=(28, 28, 1)),

    # Convolutional Layer 2
    Conv2D(filters=32,
           kernel_size=(3, 3),
           strides=2,
           padding='valid',
           activation='relu'),

    # Convolutional Layer 3
    Conv2D(filters=64,
           kernel_size=(3, 3),
           strides=1,
           padding='same',
           activation='relu'),

    # Convolutional Layer 4
    Conv2D(filters=128,
           kernel_size=(3, 3),
           strides=1,
           padding='valid',
           activation='relu'),

    # Flatten Layer to convert the 2D feature maps to a 1D vector
    Flatten(),

    # Fully Connected Layer 1
    Dense(128, activation='relu'),

    # Output Layer with 10 units and softmax activation for classification
    Dense(10, activation='softmax')
])

# Compile the model, specifying the optimizer, loss function, and metrics.
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Define an early stopping callback to monitor the training process.
callback = tf.keras.callbacks.EarlyStopping(monitor='loss', patience=2)

# Train the model using the provided data and labels.
model_history = model.fit(partial_train_data,
                          partial_train_labels,
                          epochs=15,
                          batch_size=256,
                          callbacks=[callback],
                          validation_data=(val_data, val_labels),
                          verbose=1)

# Display a summary of the model's architecture.
model.summary()

# Evaluate the model on the test data
test_loss, test_acc = model.evaluate(test_data, test_labels)
print('Test Accuracy:', test_acc)

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_4 (Conv2D)           (None, 28, 28, 32)        320       
                                                                 
 conv2d_5 (Conv2D)           (None, 13, 13, 32)        9248      
                                                                 
 conv2d_6 (Conv2D)           (None, 13, 13, 64)        18496     
                                                                 
 conv2d_7 (Conv2D)           (None, 11, 11, 128)       73856     
                                                                 
 flatten_1 (Flatten)         (None, 15488)             0         
                                                                 
 dense_5 (Dense)             (None, 128
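The task asks for each layer's parameter count by hand. A Conv2D layer with a k × k kernel, C_in input channels and C_out filters has (k × k × C_in + 1) × C_out parameters (one bias per filter), and a Dense layer with n_in inputs and n_out units has (n_in + 1) × n_out. The plain-Python sketch below applies these formulas to the layer settings in the cell above, assuming Keras' defaults (`use_bias=True`) and its output-size rules for `'same'` padding (ceil(n / s)) and `'valid'` padding (floor((n - k) / s) + 1). The helper names are illustrative, and `output_dense` is a placeholder because the output layer's name is cut off in the summary above.

```python
import math

def conv_out(size, kernel, stride, padding):
    """Spatial output size of a Conv2D layer under Keras' 'same'/'valid' rules."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def conv_params(kernel, c_in, c_out):
    """kernel*kernel*c_in weights per filter, plus one bias per filter."""
    return (kernel * kernel * c_in + 1) * c_out

def dense_params(n_in, n_out):
    """n_in x n_out weight matrix, plus one bias per unit."""
    return (n_in + 1) * n_out

# (name, filters, kernel, stride, padding) for the four Conv2D layers above
conv_layers = [
    ("conv2d_4", 32, 3, 1, "same"),
    ("conv2d_5", 32, 3, 2, "valid"),
    ("conv2d_6", 64, 3, 1, "same"),
    ("conv2d_7", 128, 3, 1, "valid"),
]

size, channels = 28, 1  # MNIST input: 28 x 28 x 1
counts = {}
for name, filters, kernel, stride, padding in conv_layers:
    counts[name] = conv_params(kernel, channels, filters)
    size = conv_out(size, kernel, stride, padding)
    channels = filters
    print(f"{name}: output {size}x{size}x{channels}, params {counts[name]}")

flat = size * size * channels  # Flatten only reshapes: no parameters
counts["flatten_1"] = 0
counts["dense_5"] = dense_params(flat, 128)
counts["output_dense"] = dense_params(128, 10)  # placeholder name
print(f"flatten_1: {flat} values, params 0")
print(f"dense_5: params {counts['dense_5']}")
print(f"output_dense: params {counts['output_dense']}")
print("Total:", sum(counts.values()))
```

The four convolutional counts (320, 9,248, 18,496 and 73,856) and their output shapes match the `model.summary()` output above, and the total of 2,085,802 matches the figure quoted in the comparison of the two models.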

In [9]:
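# Prepare image data for a feed-forward neural network (FNN).
# partial_train_data and val_data were already scaled to [0, 1] above,
# so they only need flattening from (N, 28, 28, 1) to (N, 784).
train_images_fnn = partial_train_data.reshape((-1, 28 * 28)).astype('float32')

# Note: this "test" set is the 10,000-image validation split (val_data),
# not the MNIST test set used to evaluate the CNN.
test_images_fnn = val_data.reshape((-1, 28 * 28)).astype('float32')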

In [10]:
# Create a Sequential FNN model
sequential_7 = Sequential()

# Define the FNN architecture
sequential_7.add(Dense(500, activation='relu', input_shape=(28 * 28,)))
sequential_7.add(Dense(100, activation='tanh', kernel_initializer='glorot_uniform'))
sequential_7.add(Dropout(0.25))
sequential_7.add(Dense(10, activation='softmax'))

# Display model summary
sequential_7.summary()

# Compile the FNN
sequential_7.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the FNN
sequential_7.fit(train_images_fnn, partial_train_labels, epochs=5, batch_size=34, verbose=1)

# Evaluate the FNN
test_loss, test_acc = sequential_7.evaluate(test_images_fnn, val_labels)
print('Test Accuracy:', test_acc)

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_2 (Dense)             (None, 500)               392500    
                                                                 
 dense_3 (Dense)             (None, 100)               50100     
                                                                 
 dropout (Dropout)           (None, 100)               0         
                                                                 
 dense_4 (Dense)             (None, 10)                1010      
                                                                 
Total params: 443610 (1.69 MB)
Trainable params: 443610 (1.69 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test Accuracy: 0.9772999882698059
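The same Dense-layer formula, (n_in + 1) × n_out, accounts for every entry in the FNN summary; Dropout has no weights. The plain-Python sketch below (illustrative helper names) reproduces the per-layer counts and the size figure, assuming 4 bytes per float32 parameter and 1 MB = 2^20 bytes, which is consistent with the 1.69 MB that Keras reports.

```python
def dense_params(n_in, n_out):
    """n_in x n_out weight matrix, plus one bias per unit."""
    return (n_in + 1) * n_out

# (layer, inputs, units) for the FNN above; the Dropout layer adds no parameters
fnn_layers = [("dense_2", 28 * 28, 500), ("dense_3", 500, 100), ("dense_4", 100, 10)]

fnn_counts = {name: dense_params(n_in, n_out) for name, n_in, n_out in fnn_layers}
fnn_total = sum(fnn_counts.values())
for name, count in fnn_counts.items():
    print(f"{name}: {count}")
print("Total:", fnn_total)  # 443610
print(f"Size at 4 bytes/param: {fnn_total * 4 / 2**20:.2f} MB")  # 1.69 MB
```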


# Comparison between the two models

**1. FNN (Feed-Forward Neural Network):**
>- ***Model Architecture:*** The feed-forward neural network (FNN) consists of two dense (fully connected) layers with 500 and 100 units, a dropout layer, and a final dense layer with 10 output units for classification.

>- ***Total Parameters:*** The FNN has a total of 443,610 trainable parameters (approximately 1.69 MB).

>- ***Training:*** The FNN was trained for 5 epochs with a batch size of 34 (1,471 steps per epoch over the 50,000 training images). The training accuracy increased from around 93.4% in the first epoch to 98.9% in the fifth epoch.

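>- ***Validation:*** No validation data was passed to `fit`; after training, the model was evaluated on the 10,000-image validation split (`val_data`) and reached 97.73% accuracy.

>- ***Test Accuracy:*** Not measured on the MNIST test set. The value printed as "Test Accuracy" (97.73%) is the validation-split result, because `test_images_fnn` was built from `val_data`.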

**2. CNN (Convolutional Neural Network):**
>- ***Model Architecture:*** The CNN consists of four convolutional layers with 3x3 kernels and 32, 32, 64 and 128 filters (the second uses stride 2 and, like the fourth, 'valid' padding), followed by a flatten layer, a dense layer with 128 units, and a final dense layer with 10 output units for classification. It has no max-pooling or dropout layers.

>- ***Total Parameters:*** The CNN has a total of 2,085,802 trainable parameters (approximately 7.96 MB).

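>- ***Training:*** The CNN was trained with a batch size of 256 (196 steps per epoch) for up to 15 epochs; early stopping (`monitor='loss'`, `patience=2`) ended training after epoch 13. The training accuracy increased from 90.81% in the first epoch to 99.95% in later epochs.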

>- ***Validation:*** The validation accuracy reached 99.04% after training.

>- ***Test Accuracy:*** The test accuracy achieved was 99.06%.


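1. The CNN achieved higher accuracy than the FNN on both the training set (99.95% vs 98.9%) and the validation split (99.04% vs 97.73%).

2. The convolutional layers exploit the 2D structure of the images: each filter looks at local 3x3 neighborhoods and shares its weights across all positions, whereas the FNN flattens each image into a 784-value vector and loses that spatial arrangement. This likely accounts for much of the CNN's advantage.

3. The CNN also has about 4.7 times as many parameters (2,085,802 vs 443,610). Most of them do not come from the convolutional layers, which hold only 101,920 parameters in total; about 95% (1,982,592) belong to `dense_5`, which connects the 15,488-value flattened feature map to 128 units.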

4. The FNN, while still achieving a high accuracy, had fewer parameters and might be more suitable for simpler datasets.
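The short calculation below shows where the CNN's parameters actually sit, using the per-layer values from the two model summaries above. The output layer's count, (128 + 1) × 10 = 1,290, is computed by hand because that summary line is truncated. The last lines are a hypothetical variant, not a model trained in this notebook: a 2x2 `MaxPooling2D` (default stride 2, 'valid' padding) before `Flatten` would shrink the feature map from 11x11x128 to 5x5x128, and the dense layer with it.

```python
# Per-layer parameter counts taken from the model summaries above
cnn_conv = [320, 9248, 18496, 73856]  # conv2d_4 .. conv2d_7
cnn_dense = [1982592, 1290]           # dense_5, output layer (computed by hand)
fnn_total = 443610                    # FNN total from its summary

cnn_total = sum(cnn_conv) + sum(cnn_dense)
print("CNN total:", cnn_total)
print("Conv layers only:", sum(cnn_conv))
print(f"Share held by dense_5: {cnn_dense[0] / cnn_total:.1%}")
print(f"CNN / FNN parameters: {cnn_total / fnn_total:.2f}x")

# Hypothetical: pooling 11x11x128 down to 5x5x128 before Flatten
pooled_dense = (5 * 5 * 128 + 1) * 128
print("dense_5 after hypothetical pooling:", pooled_dense)
```

The convolutional layers together hold fewer parameters than the FNN's first hidden layer alone (101,920 vs 392,500), so the CNN's size comes almost entirely from flattening a large feature map straight into a dense layer.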


In summary, the CNN is more accurate, but at a higher cost: it has about 4.7 times as many parameters as the FNN, and because every convolution is applied at each spatial position, it also needs considerably more computation per image.

# Lab 3 Part 2 - Task 2: CIFAR-10 Challenge (10 Marks)

In this lab you will experiment with whatever ConvNet architecture/design you'd like on [CIFAR-10 image dataset](https://www.cs.toronto.edu/~kriz/cifar.html).


## Exercise 1: Creating the network

**Goal:** After training, your model should achieve **at least 80%** accuracy on a **validation** set within 20 epochs. (Or as close as possible as long as there is demonstrated effort to achieve this goal.)

**Data split:** The training set should consist of 40000 images, the validation set should consist of 10000 images, and the test set should consist of the remaining 10000 images. **Please use the Keras `load_data()` function to import the data set.**


### Some things you can try:
- Different number/type of layers
- Different filter sizes
- Adjust the number of filters used in any given layer
- Try various pooling strategies
- Consider using batch normalization
- Check if adding regularization helps
- Consider alternative optimizers
- Try different activation functions


### Tips for training
When building/tuning your model, keep in mind the following points:

- This is experimental, so be driven by results achieved on the validation set as opposed to what you have heard/read works well or doesn't
- If the hyperparameters are working well, you should see improvement in the loss/accuracy within approximately one epoch
- For hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all
- Once you have found some sets of hyperparameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
- Prefer random search to grid search for hyperparameters
- You should use the validation set for hyperparameter search and for evaluating different architectures
- The test set should only be used at the very end to evaluate your final model
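The tip to prefer random search over grid search can be sketched as follows: each trial draws every hyperparameter independently (the learning rate on a log scale, so each decade is equally likely), and the best-scoring configuration is kept. This is a minimal pure-Python sketch; the search space and helper names are hypothetical, and `score_fn` stands in for training a model for a few epochs and returning its validation accuracy.

```python
import math
import random

def sample_config(rng):
    """Draw one hyperparameter configuration at random (hypothetical search space)."""
    return {
        # Log-uniform between 1e-5 and 1e-2: every decade is equally likely
        "learning_rate": 10 ** rng.uniform(-5, -2),
        "dropout": rng.uniform(0.1, 0.5),
        "batch_size": rng.choice([64, 128, 150, 256]),
        "first_conv_filters": rng.choice([32, 64, 128, 256]),
    }

def random_search(score_fn, n_trials=10, seed=101):
    """Score n_trials random configurations and return the best one and its score."""
    rng = random.Random(seed)  # seeded so the sampled configurations are repeatable
    best_cfg, best_score = None, -math.inf
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = score_fn(cfg)  # e.g. validation accuracy after a few epochs
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy score for demonstration only: prefers learning rates close to 1e-4
best_cfg, best_score = random_search(
    lambda cfg: -abs(math.log10(cfg["learning_rate"]) + 4), n_trials=20)
print(best_cfg, best_score)
```

In the notebook, `score_fn` could build a model from `cfg`, train it with `model.fit(..., validation_data=(val_data_cifar, val_labels_cifar))` for a few epochs, and return `history.history['val_accuracy'][-1]`. For the same number of trials, random search tries more distinct values of each individual hyperparameter than a grid does, which is why it tends to find good regions faster.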


In [12]:
(train_data, train_labels), (test_data, test_labels) = cifar10.load_data()

# Use the first 10,000 samples of our training data as our validation set
val_data_cifar = train_data[:10000]
val_labels_cifar = train_labels[:10000]

# Use the remainder of the original training data for actual training
train_data_cifar = train_data[10000:]
train_labels_cifar = train_labels[10000:]

# Scale the pixel values so they lie in the range of 0-1
train_data_cifar = train_data_cifar / 255.
val_data_cifar = val_data_cifar / 255.
test_data_cifar = test_data / 255.

print(train_data_cifar.shape)
print(val_data_cifar.shape)
print(test_data_cifar.shape)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
(40000, 32, 32, 3)
(10000, 32, 32, 3)
(10000, 32, 32, 3)


In [13]:
# One-hot encode the labels (10 classes)

train_labels_cifar = to_categorical(train_labels_cifar)
val_labels_cifar = to_categorical(val_labels_cifar)
test_labels_cifar = to_categorical(test_labels)

print(train_labels_cifar.shape)
print(val_labels_cifar.shape)
print(test_labels_cifar.shape)

(40000, 10)
(10000, 10)
(10000, 10)
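`to_categorical` turns each integer label into a length-10 vector containing a single 1. CIFAR-10 labels arrive with shape (N, 1) and the trailing axis is dropped, which is why the shapes above come out as (N, 10). A NumPy sketch of the same operation, for illustration only (`one_hot` is a hypothetical helper, not part of Keras):

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """One-hot encode integer labels of shape (N,) or (N, 1)."""
    labels = np.asarray(labels).reshape(-1)  # flatten (N, 1) to (N,)
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

example = np.array([[6], [9], [9], [4]])  # hypothetical CIFAR-style labels
encoded = one_hot(example)
print(encoded.shape)           # (4, 10)
print(encoded.argmax(axis=1))  # [6 9 9 4]
```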


In [25]:
# Create a Sequential model for the CNN architecture
model_cifar = Sequential([
    # Convolutional Layer 1
    Conv2D(filters=256, kernel_size=(3, 3), strides=1, padding='same', activation='relu', input_shape=(32, 32, 3)),
    BatchNormalization(),

    # Convolutional Layer 2
    Conv2D(filters=128, kernel_size=(3, 3), strides=1, padding='same', activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.2),

    # Convolutional Layer 3
    Conv2D(filters=64, kernel_size=(5, 5), strides=1, padding='same', activation='sigmoid'),
    BatchNormalization(),

    # Convolutional Layer 4
    Conv2D(filters=64, kernel_size=(5, 5), strides=1, padding='same', activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.35),

    # Convolutional Layer 5
    Conv2D(filters=32, kernel_size=(7, 7), strides=1, padding='valid', activation='relu'),
    BatchNormalization(),

    # Flatten Layer to convert the 2D feature maps to a 1D vector
    Flatten(),

    # Fully Connected Layer 1
    Dense(256, activation='sigmoid'),
    Dropout(0.5),

    # Output Layer with 10 units and softmax activation for classification
    Dense(10, activation='softmax')
])

# Compile the model
model_cifar.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])

# Define an early stopping callback
callback = EarlyStopping(monitor='val_loss', patience=10)

# Train the model using the provided data and labels
model_history_cifar = model_cifar.fit(train_data_cifar, train_labels_cifar, batch_size=150,
                                      epochs=20,
                                      validation_data=(val_data_cifar, val_labels_cifar),
                                      callbacks=[callback],
                                      verbose=1)

# Display a summary of the model's architecture
model_cifar.summary()

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20
Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_43 (Conv2D)          (None, 32, 32, 256)       7168      
                                                                 
 batch_normalization_23 (Ba  (None, 32, 32, 256)       1024      
 tchNormalization)                                               
                                                                 
 conv2d_44 (Conv2D)          (None, 32, 32, 128)       295040    
                                                                 
 max_pooling2d_17 (MaxPooli  (None, 16, 16, 128)       0         
 ng2D)                                                           
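The CIFAR-10 summary above is cut off after the first pooling layer, but the full parameter count can be reproduced by hand. Besides the Conv2D and Dense formulas, each `BatchNormalization` layer holds four values per channel: trainable `gamma` and `beta`, plus a non-trainable moving mean and moving variance (so 4 × 256 = 1,024 for the first one, matching the summary). The sketch below hard-codes the layer sequence from the cell above; the list format and helper names are illustrative.

```python
import math

def conv_out(size, kernel, padding, stride=1):
    """Spatial output size under Keras' 'same'/'valid' padding rules."""
    return math.ceil(size / stride) if padding == "same" else (size - kernel) // stride + 1

def conv_params(kernel, c_in, c_out):
    """kernel*kernel*c_in weights per filter, plus one bias per filter."""
    return (kernel * kernel * c_in + 1) * c_out

size, ch = 32, 3  # CIFAR-10 input: 32 x 32 x 3
trainable = non_trainable = 0
layers = [("conv", 256, 3, "same"), ("bn",), ("conv", 128, 3, "same"), ("pool",),
          ("conv", 64, 5, "same"), ("bn",), ("conv", 64, 5, "same"), ("pool",),
          ("conv", 32, 7, "valid"), ("bn",)]
for layer in layers:
    kind = layer[0]
    if kind == "conv":
        _, filters, kernel, padding = layer
        trainable += conv_params(kernel, ch, filters)
        size, ch = conv_out(size, kernel, padding), filters
    elif kind == "bn":
        trainable += 2 * ch      # gamma and beta
        non_trainable += 2 * ch  # moving mean and moving variance
    elif kind == "pool":
        size //= 2               # MaxPooling2D((2, 2)) with 'valid' padding
    # Dropout layers add no parameters and do not change the shape

flat = size * size * ch
trainable += (flat + 1) * 256 + (256 + 1) * 10  # Dense(256) and Dense(10)
print("Flattened size:", flat)               # 128
print("Trainable:", trainable)               # 746218
print("Non-trainable:", non_trainable)       # 704
print("Total:", trainable + non_trainable)   # 746922
```

The first three values it accumulates (7,168, 1,024 and 295,040) agree with the visible part of the summary. The 7x7 'valid' convolution shrinks the 8x8 map to 2x2, which keeps the flattened vector, and hence the dense layer, small.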
                    

In [26]:
# Evaluate the model on the test data
test_loss_cifar, test_acc_cifar = model_cifar.evaluate(test_data_cifar, test_labels_cifar)
print('Test Accuracy:', test_acc_cifar)

Test Accuracy: 0.8077999949455261


## Exercise 2: Describe What you did

All the work you did leading up to your final model should be summarized in this section. This should be a logical and well-organized summary of the various experiments that were tried in **Lab 3 Part 2 - Task 2: Exercise 1**, and should be captured in **table format**. Upon reading this section I should understand what you tried, the reasoning behind trying it, any quantitative values that correspond to what you tried, and the results.

See [this guide](https://www.datacamp.com/community/tutorials/markdown-in-jupyter-notebook) for how to format markdown cells in Jupyter notebooks.

# Summary

- Our main objective was to reach at least 80% validation accuracy within 20 epochs. To get there, we trained models with a range of architectural designs; our observations are summarized below.

- Our base model was the CNN from Lab 3 Part 1: four convolutional layers with 32, 32, 64 and 128 filters, followed by a flatten layer and two dense layers (the second being the output layer). All layers used ReLU except the output layer, which used softmax. The model was trained with the 'rmsprop' optimizer and 'categorical_crossentropy' loss, and reached 63% validation accuracy on CIFAR-10.



---
| Experiment | Change | Reasoning behind the change | Validation Accuracy |
| --- | --- | --- | --- |
| 1 | Increased the number of filters | More filters let the model learn more, and more complex, patterns | 69.3% |
| 2 | Added a fifth convolutional layer | A deeper network extracts richer features | 72.2% |
| 3 | Added batch normalization | Stabilizes and speeds up convergence | 73.4% |
| 4 | Added MaxPooling2D layers | Downsamples the feature maps, keeping the dominant features while reducing computation and overfitting | 74% |
| 5 | Added Dropout layers | Further reduces overfitting and improves generalization | 74.6% |
| 6 | Tried different dropout rates | Training accuracy kept rising while validation accuracy plateaued at about 74%, a sign of overfitting | 75% |
| 7 | Changed some padding to ***valid*** | 'same' padding preserves spatial size and border information; 'valid' padding shrinks the feature maps and lowers computational cost | 76% |
| 8 | Added EarlyStopping | Stops training once validation loss has not improved for a set number of epochs (patience), avoiding overfitting and wasted epochs | 76.1% |
| 9 | Switched the optimizer to Adam with learning rate 0.0001 (RMSprop and SGD were also tried) | Adam combines RMSprop-style adaptive learning rates with momentum and usually trains faster | 77.9% |
| 10 | Tried different numbers of epochs and batch sizes | Balances training speed, generalization and overall performance | 79.1% |
| 11 | Changed some activations from ReLU to sigmoid | Empirical trial: in our runs, sigmoid in the third convolutional layer and the hidden dense layer gave higher validation accuracy | 80% |
| 12 | Changed the filter shapes (kernel_size) | Small kernels pick up small, local patterns; larger kernels (5x5 and 7x7 in the later layers) cover larger regions | 81.5% |

---
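Experiment 8 relies on `EarlyStopping`'s patience rule. The simplified pure-Python sketch below models it for `mode='min'` and `min_delta=0` (as for `monitor='val_loss'`): a counter resets whenever the monitored value beats the best so far, and training stops once it has gone `patience` epochs without improvement. Keras keeps the weights from the last epoch run unless `restore_best_weights=True` is passed, which the CIFAR-10 cell does not do, so the final model is not necessarily the best-epoch model. The helper name and loss values are hypothetical.

```python
def stopping_epoch(monitored, patience):
    """Return the 1-based epoch at which a patience rule would stop training.

    Simplified model of EarlyStopping(mode='min', min_delta=0): the counter
    resets whenever the value improves on the best so far, and training stops
    once `patience` consecutive epochs have passed without improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, value in enumerate(monitored, start=1):
        if value < best:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(monitored)  # rule never triggered: every epoch runs

# Hypothetical validation losses, for illustration only
val_losses = [1.20, 0.95, 0.88, 0.90, 0.92, 0.87, 0.85]
print(stopping_epoch(val_losses, patience=2))   # 5
print(stopping_epoch(val_losses, patience=10))  # 7
```

With `patience=2`, training stops at epoch 5 even though the best loss so far was at epoch 3 and the loss would have improved again at epochs 6 and 7; a larger patience, such as the 10 used for CIFAR-10, tolerates such temporary plateaus at the cost of extra epochs.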


- The final model achieved 80.78% accuracy on the test set, close to its validation accuracy, which suggests that it generalizes to unseen data rather than having overfit the validation split.

- As the table shows, many factors contributed to the improvement. The validation accuracies are approximate (they varied between runs), so they indicate the rough impact of each change rather than exact values.

- The ***change in filter shape*** and the **switch to sigmoid activations** were the major breakthroughs: before them we were stuck at 78-79% validation accuracy, and afterwards we consistently got above 80%.






