## Assignment 9

## 1. What are the advantages of a CNN for image classification over a fully connected DNN?

Ans=>

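Because consecutive layers are only partially connected and each kernel's weights are reused across the whole image, a CNN has far fewer parameters than a fully connected DNN. This makes it faster to train, reduces the risk of overfitting, and lowers the amount of training data required. Weight sharing also means that a kernel which has learned to detect a feature can detect it anywhere in the image, whereas a fully connected DNN that learns a feature in one location can detect it only there. Finally, local receptive fields let the network exploit the spatial structure of images: lower layers pick up small local features, and higher layers combine them into larger ones.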

## 2. Consider a CNN with three convolutional layers, each with 3 x 3 kernels, a stride of 2, and SAME padding. The bottom layer outputs 100 feature maps, the middle layer 200, and the top layer 400. The inputs are RGB images of 200 x 300 pixels. How many parameters does the CNN have in total? How much RAM would this network need when making a prediction for a single instance if we're using 32-bit floats? What about when training on a mini-batch of 50 images?


Ans=>

The number of parameters in a convolutional layer depends only on the kernel size and the channel counts, not on the image size:

- Number of parameters in one layer = (kernel width * kernel height * number of input channels + 1) * number of output feature maps, where the + 1 is the bias term of each feature map.
- Total number of parameters = sum over the three layers.

With 3 x 3 kernels:

- Bottom layer (3 RGB input channels): (3 * 3 * 3 + 1) * 100 = 2,800
- Middle layer (100 input channels): (3 * 3 * 100 + 1) * 200 = 180,200
- Top layer (200 input channels): (3 * 3 * 200 + 1) * 400 = 720,400

The total number of parameters is 2,800 + 180,200 + 720,400 = 903,400. At 4 bytes per parameter this is 3,613,600 bytes, or about 3.6 MB.

The RAM requirement is dominated by the feature maps rather than the parameters. With a stride of 2 and SAME padding, each layer halves the height and width (rounding up), so the feature maps are 100 x 150, 50 x 75, and 25 x 38:

- Bottom layer: 100 * 150 * 100 = 1,500,000 floats = 6.0 MB
- Middle layer: 50 * 75 * 200 = 750,000 floats = 3.0 MB
- Top layer: 25 * 38 * 400 = 380,000 floats = 1.52 MB

For a single instance prediction, the memory used by a layer can be released as soon as the next layer has been computed, so only two consecutive layers need to be held at once. The peak is 6.0 MB + 3.0 MB = 9.0 MB, plus 3.6 MB for the parameters, or roughly 12.6 MB.

When training on a batch of 50 images, every layer's output must be kept for the backward pass. That is 6.0 + 3.0 + 1.52 = 10.52 MB per instance, or about 526 MB for the batch. Adding the input images (50 * 200 * 300 * 3 * 4 bytes = 36 MB) and the parameters (3.6 MB) gives at least about 566 MB. Note that this is a lower bound: it does not include the memory needed for gradients or optimizer state.
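
The arithmetic for this network can be checked with a short script. This is a minimal sketch, not a general memory profiler: the helper name `conv_stack_stats` is made up for illustration, and it assumes SAME padding (output size = ceil(input size / stride)), one bias per feature map, and that nothing other than inputs, layer outputs, and parameters is held in memory.

```python
import math

def conv_stack_stats(height, width, channels, layers, kernel=3, stride=2):
    """Return (total_params, activations) for a stack of conv layers with SAME padding.

    `layers` lists the number of output feature maps per layer, bottom to top.
    `activations[i]` is the number of floats layer i outputs for one image.
    """
    total_params = 0
    activations = []
    for n_maps in layers:
        # One kernel x kernel x channels weight block plus one bias per output map
        total_params += (kernel * kernel * channels + 1) * n_maps
        # SAME padding: output size is ceil(input size / stride)
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        activations.append(height * width * n_maps)
        channels = n_maps
    return total_params, activations

BYTES_PER_FLOAT = 4  # 32-bit floats

params, acts = conv_stack_stats(200, 300, 3, [100, 200, 400])
param_bytes = params * BYTES_PER_FLOAT

# Inference: only two consecutive layers need to be in memory at once
peak_pair = max(a + b for a, b in zip(acts, acts[1:]))
inference_bytes = peak_pair * BYTES_PER_FLOAT + param_bytes

# Training: keep every layer's output for each image in the batch, plus the inputs
batch = 50
input_bytes = batch * 200 * 300 * 3 * BYTES_PER_FLOAT
training_bytes = batch * sum(acts) * BYTES_PER_FLOAT + input_bytes + param_bytes

print(f"parameters: {params:,}")
print(f"inference:  {inference_bytes / 1e6:.1f} MB")
print(f"training:   {training_bytes / 1e6:.1f} MB (lower bound)")
```

Changing the `layers` list or the input size shows how quickly the activation memory, rather than the parameter count, comes to dominate.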

## 3. What are five things you might do to fix the problem if your GPU runs out of memory while training a CNN?



Ans=>

If your GPU runs out of memory while training a CNN, here are five things you can try:

- Reduce the batch size, as this will reduce the memory required for each forward and backward pass.
- Reduce the complexity of the network architecture, such as reducing the number of filters or the size of the filters.
- Use half-precision (float16) data types instead of single-precision (float32) to reduce memory usage.
- Use gradient checkpointing to reduce the memory required to store intermediate activations.
- If possible, use a GPU with more memory.

## 4. Why would you use a max pooling layer instead of a convolutional layer with the same stride?


Ans=>

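A max pooling layer has no parameters at all, whereas a convolutional layer with the same stride has quite a few. Both reduce the spatial dimensions of the feature maps by the same factor, so max pooling delivers the same savings in downstream computation and memory at a lower cost. Because it keeps only the strongest activation in each window, it also preserves the most prominent features and adds a small amount of invariance to local translations.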
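
To illustrate the difference in cost, the sketch below applies 2 x 2 max pooling with stride 2 to a tiny feature map and compares it with the number of weights a strided convolution would need. The helpers `max_pool_2x2` and `conv_params` and the 100-channel figure are illustrative assumptions, not part of the assignment.

```python
def max_pool_2x2(feature_map):
    """Apply 2 x 2 max pooling with stride 2 to a 2D list (even height and width assumed)."""
    pooled = []
    for i in range(0, len(feature_map), 2):
        row = []
        for j in range(0, len(feature_map[0]), 2):
            window = (
                feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1],
            )
            row.append(max(window))  # keep only the strongest activation
        pooled.append(row)
    return pooled

def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a convolutional layer with square kernels."""
    return (kernel * kernel * in_channels + 1) * out_channels

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(feature_map)       # 4 x 4 input becomes 2 x 2
pooling_params = 0                       # max pooling has nothing to learn
strided_conv = conv_params(3, 100, 100)  # hypothetical 3 x 3 conv, 100 -> 100 maps

print(pooled)
print(pooling_params, strided_conv)
```

Both operations halve the height and width, but only the convolution adds weights that must be stored and trained.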

## 5. When would a local response normalization layer be useful?


Ans=>

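A local response normalization (LRN) layer makes the most strongly activated neurons inhibit neurons at the same location in neighboring feature maps. This competition encourages different feature maps to specialize and pushes them to explore a wider range of features, which can improve generalization. It is typically used in the lower layers of a CNN, as in AlexNet, so that the upper layers have a larger pool of low-level features to build on.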
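
As a concrete sketch of what such a layer computes, each activation is divided by a term that grows with the squared activations at the same location in nearby feature maps. The function below is a plain-Python illustration for a single pixel, not a library implementation; the name `local_response_norm` is hypothetical, the parameter names mirror those of `tf.nn.local_response_normalization`, and the default values are the ones commonly quoted for AlexNet (treat them as an assumption).

```python
def local_response_norm(activations, depth_radius=2, bias=2.0, alpha=1e-4, beta=0.75):
    """Normalize one pixel's activations across feature maps.

    `activations[i]` is the output of feature map i at a single spatial location.
    """
    n_maps = len(activations)
    normalized = []
    for i, a_i in enumerate(activations):
        # Neighboring feature maps within `depth_radius` of map i
        low = max(0, i - depth_radius)
        high = min(n_maps - 1, i + depth_radius)
        squared_sum = sum(activations[j] ** 2 for j in range(low, high + 1))
        normalized.append(a_i / (bias + alpha * squared_sum) ** beta)
    return normalized

# One strongly activated map (index 3) surrounded by weak ones
pixel = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0]
normalized = local_response_norm(pixel, depth_radius=1, bias=1.0, alpha=0.1)
print(normalized)
```

In this example the maps directly next to index 3 are damped more than the ones further away, which is the inhibition effect the layer is meant to produce.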

## 6. In comparison to LeNet-5, what are the main innovations in AlexNet? What about GoogLeNet and ResNet's core innovations?


Ans=>

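Compared with LeNet-5, AlexNet is much larger and deeper, and it stacks convolutional layers directly on top of one another instead of placing a pooling layer after every convolutional layer. It also uses ReLU activations, dropout and data augmentation for regularization, and local response normalization.

GoogLeNet's core innovation is the inception module, which applies kernels of several sizes in parallel and uses 1 x 1 convolutions as bottlenecks. This allows a much deeper network with far fewer parameters than AlexNet.

ResNet's core innovation is the skip (shortcut) connection: each residual unit adds its input to its output, so the layers learn a residual function. This makes it possible to train networks well over 100 layers deep.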

## 7. On MNIST, build your own CNN and strive to achieve the best possible accuracy.

Ans=>



```python
import tensorflow as tf
from tensorflow import keras

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Preprocess the data
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

# Convert the labels to one-hot encoding
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# Build the model
model = keras.Sequential([
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(64, (3, 3), activation='relu'),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train, batch_size=64, epochs=10, validation_split=0.1)

# Evaluate the model
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
```

```
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.9926999807357788
```


## 8. Using Inception v3 to classify large images.

a. Download some images of different animals. Load them in Python using, for example, the `matplotlib.image.imread()` function (`scipy.misc.imread()` has been removed from recent SciPy releases). Resize and/or crop them to 299 x 299 pixels, and make sure they have only three channels (RGB) and no transparency channel. The images used to train the Inception model were preprocessed so that their values range from -1.0 to 1.0, so make sure yours are as well.

Ans=>

- Split the images into training and validation sets. The validation set will be used to monitor overfitting, or the extent to which the model is adapting too well to the training data, and not generalizing well enough to new data.

- Use the pre-trained Inception v3 model, which is available in TensorFlow's Keras library, to classify the images. The pre-trained model can be used as a feature extractor, or fine-tuned to adapt to the new task of animal classification. When using the model as a feature extractor, the final dense layer is replaced with a new dense layer trained on the new task.

- Train the model on the training set, using data augmentation to generate more training data by flipping, rotating, and zooming images. This helps to reduce overfitting and increase the model's robustness.

- Evaluate the model on the validation set, and monitor the accuracy and loss to determine if the model is overfitting or underfitting. If overfitting, try reducing the complexity of the model by removing layers or increasing the dropout rate. If underfitting, try increasing the complexity of the model by adding layers or increasing the number of neurons.

- Once satisfied with the model's performance on the validation set, use it to make predictions on new images of animals, or use it to classify new images in a batch.
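
The preprocessing that the question asks for (three channels, 299 x 299 pixels, values from -1.0 to 1.0) could be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions: the helper name `prepare_for_inception` is made up, the input is assumed to be a `uint8` array of shape (height, width, 3 or 4), and the nearest-neighbor resize is a simple stand-in for the higher-quality resizing an image library would provide.

```python
import numpy as np

def prepare_for_inception(image, size=299):
    """Return a float32 array of shape (size, size, 3) with values in [-1, 1].

    `image` is assumed to be a uint8 array of shape (height, width, 3 or 4).
    """
    image = image[..., :3]  # drop the alpha (transparency) channel if present
    height, width = image.shape[:2]
    # Center-crop to a square so that resizing does not distort the aspect ratio
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    image = image[top:top + side, left:left + side]
    # Nearest-neighbor resize: pick `size` evenly spaced rows and columns
    idx = np.arange(size) * side // size
    image = image[idx][:, idx]
    # Map [0, 255] to [-1, 1]
    return image.astype(np.float32) / 127.5 - 1.0

# Synthetic RGBA image standing in for a downloaded photo
dummy = np.zeros((400, 600, 4), dtype=np.uint8)
batch = prepare_for_inception(dummy)[np.newaxis]  # add the batch dimension
print(batch.shape, batch.dtype)
```

In a Keras workflow, `keras.applications.inception_v3.preprocess_input` applies the same scaling to [-1, 1], so only the cropping and resizing would need to be handled separately.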

## 9. Large-scale image recognition using transfer learning.
a. Build a training set containing at least 100 images per class. For example, you could classify your own photos by location (beach, mountain, city, etc.), or use an existing dataset such as the flowers dataset or MIT's Places dataset (requires registration, and it is huge).

b. Create a preprocessing phase that resizes and crops the image to 299 x 299 pixels while also adding some randomness for data augmentation.

c. Using the pretrained Inception v3 model, freeze all layers up to the bottleneck layer (the last layer before the output layer) and replace the output layer with one that has the appropriate number of outputs for your new classification task (e.g., the flowers dataset has five mutually exclusive classes, so the output layer must have five neurons and use the softmax activation function).

d. Separate the data into two sets: a training and a test set. The training set is used to train the model, and the test set is used to evaluate it.



Ans=>

- Train the model for a number of epochs (iterations over the entire dataset) using a suitable optimizer, such as Adam or SGD, and a suitable loss function, such as categorical cross-entropy. You can also experiment with different batch sizes, learning rates, and other hyperparameters to see if they affect the model's accuracy.

- Evaluate the model's performance on the test set by measuring its accuracy, precision, recall, and F1 score. You can also create confusion matrices and ROC curves to visualize the model's performance.

- Fine-tune the model by unfreezing some of the layers and retraining them. You can experiment with unfreezing different numbers of layers and different parts of the network to see if it improves performance.

- Finally, use the trained model to make predictions on new images and evaluate its accuracy. You can also use the model to generate predictions for all images in the dataset and compare them with the actual labels to see if it's working well.
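
The freeze-and-replace step from part (c) might look like the sketch below. It is one possible setup rather than the required solution: `build_transfer_model` is a hypothetical helper, integer class labels are assumed (hence the sparse loss), and global average pooling is used here as the bottleneck.

```python
from tensorflow import keras

def build_transfer_model(n_classes, weights="imagenet"):
    """Inception v3 base with frozen weights and a new softmax output layer."""
    base = keras.applications.InceptionV3(
        include_top=False,          # drop the original 1,000-class output layer
        weights=weights,            # "imagenet" downloads the pretrained weights
        input_shape=(299, 299, 3),
        pooling="avg",              # global average pooling yields the bottleneck vector
    )
    base.trainable = False          # freeze every layer up to the bottleneck

    inputs = keras.Input(shape=(299, 299, 3))
    # training=False keeps the batch normalization layers in inference mode
    features = base(inputs, training=False)
    outputs = keras.layers.Dense(n_classes, activation="softmax")(features)
    model = keras.Model(inputs, outputs)

    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # assumes integer class labels
        metrics=["accuracy"],
    )
    return model
```

For the five-class flowers example, `build_transfer_model(5)` would give a model whose only trainable weights are those of the new output layer. To fine-tune later, set `trainable = True` on the base model (or on some of its layers) and recompile with a lower learning rate.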
