**1. What are the advantages of a CNN for image classification over a fully connected DNN?**

Convolutional Neural Networks (CNNs) have several advantages over fully connected Deep Neural Networks (DNNs) for image classification:

- **Parameter Sharing**: A feature detector (like a kernel or filter) in CNNs learns a specific feature and can be applied across the entire image. This drastically reduces the number of parameters, making the model more efficient and less prone to overfitting.
  
- **Spatial Hierarchies**: CNNs inherently recognize spatial hierarchies in data. Lower layers might detect edges, while deeper layers might detect more complex structures. This hierarchical feature extraction isn't present in regular DNNs.
  
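- **Translation Invariance**: Because the same filter slides over the whole image, a feature learned in one location is detected in any other location, and pooling makes the response tolerant of small shifts. A fully connected DNN would have to relearn the feature at every position.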
  
- **Reduced Computational Needs**: Because weights are shared and pooling shrinks the feature maps, a CNN typically needs far less memory and far fewer multiply-adds than a fully connected DNN applied to the same image.
  
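- **Better Handling of Image Data**: Convolutional and pooling layers work on inputs of any height and width, so only the dense layers at the top of a CNN tie it to a fixed input size. A fully connected DNN fixes the input size in its very first layer.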
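
To make the parameter-sharing point concrete, here is a rough count. The sizes are illustrative assumptions, not part of the exercise: a 100 x 100 RGB input, a dense layer of 1,000 units, and a convolutional layer of 100 filters with 3 x 3 kernels.

```python
def dense_layer_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units


def conv_layer_params(kernel_height, kernel_width, in_channels, n_filters):
    """Weights plus one bias per filter; independent of the image size."""
    return (kernel_height * kernel_width * in_channels + 1) * n_filters


n_pixels = 100 * 100 * 3  # flattened 100 x 100 RGB image (assumed size)
dense = dense_layer_params(n_pixels, 1000)
conv = conv_layer_params(3, 3, 3, 100)

print(f"Dense layer: {dense:,} parameters")
print(f"Conv layer:  {conv:,} parameters")
```

The dense layer's count grows with the image size, while the convolutional layer's count depends only on the kernel size, the input channels, and the number of filters.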

**2. Consider a CNN with three convolutional layers, each of which has 3 x 3 kernels, a stride of two, and SAME padding. The bottom layer generates 100 feature maps, the middle layer 200, and the top layer 400. RGB images with a size of 200 x 300 pixels are used as input. How many parameters does the CNN have in total? How much RAM would this network need when making a single instance prediction if we're using 32-bit floats? What if you were to train on a mini-batch of 50 images?**

Given:
- CNN with three convolutional layers.
- Each convolutional layer has 3 x 3 kernels.
- Stride of two and SAME padding for each layer.
- 1st layer (bottom layer) generates 100 feature maps.
- 2nd layer (middle layer) generates 200 feature maps.
- 3rd layer (top layer) generates 400 feature maps.
- Input: RGB images of size 200 x 300 pixels.

Steps:
1. Calculate the number of parameters for each layer.
2. Compute the total number of parameters.
3. Compute the RAM required for a single instance prediction using 32-bit floats.
4. Calculate the RAM required for a batch of 50 images.

In [2]:
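import math

# Given values
kernel_size = 3                          # 3 x 3 kernels
height, width, channels = 200, 300, 3    # RGB input
filters_per_layer = [100, 200, 400]
bytes_per_float = 4                      # 32-bit floats

# Walk through the layers, tracking parameters and activation sizes (in bytes)
total_params = 0
activation_bytes = [height * width * channels * bytes_per_float]  # input image
for n_filters in filters_per_layer:
    # (kernel weights per filter + 1 bias) * number of filters
    total_params += (kernel_size * kernel_size * channels + 1) * n_filters
    # Stride 2 with SAME padding halves each dimension, rounding up
    height, width, channels = math.ceil(height / 2), math.ceil(width / 2), n_filters
    activation_bytes.append(height * width * channels * bytes_per_float)

param_bytes = total_params * bytes_per_float

# Single-instance prediction: the parameters plus the two largest consecutive
# layers, since a layer's memory can be released once the next layer is computed
ram_single_instance = param_bytes + max(
    a + b for a, b in zip(activation_bytes, activation_bytes[1:]))

# Training on a batch of 50: every layer's output is kept for the backward pass
# (lower bound: gradients and optimizer state are not counted)
ram_batch_50 = param_bytes + 50 * sum(activation_bytes)

total_params, ram_single_instance, ram_batch_50


(903400, 12613600, 565613600)


**2. Results for the given CNN architecture:**

- **Total Parameters**: 903,400 (about 3.61 MB as 32-bit floats)
- **RAM required for a single instance prediction (using 32-bit floats)**: 12,613,600 bytes (approximately 12.6 MB). This is the parameters plus the first and second layers' feature maps (6,000,000 and 3,000,000 bytes), the largest pair that must be held at the same time.
- **RAM required for training on a batch of 50 images**: at least 565,613,600 bytes (approximately 565.6 MB), because the input and all three layers' feature maps must be kept for every image until the backward pass. Gradients and optimizer state add to this.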
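
For reference, the feature-map sizes implied by a stride of two with SAME padding can be checked with a small helper. This is a sketch that assumes the usual SAME rule (output size is the input size divided by the stride, rounded up); the function names are made up for this example.

```python
import math


def same_padding_output(size, kernel=3, stride=2):
    """Return (output size, total padding) along one dimension for SAME padding."""
    out = math.ceil(size / stride)
    total_pad = max((out - 1) * stride + kernel - size, 0)
    return out, total_pad


def feature_map_shapes(height, width, n_layers=3):
    """Height and width of each layer's feature maps, bottom layer first."""
    shapes = []
    for _ in range(n_layers):
        height, _pad_h = same_padding_output(height)
        width, _pad_w = same_padding_output(width)
        shapes.append((height, width))
    return shapes


shapes = feature_map_shapes(200, 300)
for layer, (h, w) in enumerate(shapes, start=1):
    print(f"Layer {layer}: {h} x {w}")
```

Note that the 75-pixel dimension becomes 38, not 37, because SAME padding rounds up.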


**3. What are five things you might do to fix the problem if your GPU runs out of memory while training a CNN?**

- **Reduce Batch Size**: One of the most straightforward methods to reduce GPU memory consumption is to decrease the batch size.
  
- **Gradient Accumulation**: Process several small batches, summing their gradients, and update the weights only once per group. This keeps the effective batch size while only one small batch's activations are in memory at a time.
  
- **Reduce Model Size**: Remove layers, use fewer filters, or downsample earlier with a larger stride, so that there are fewer parameters and smaller feature maps to store.
  
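- **Use Gradient Checkpointing**: Keep only some layers' activations during the forward pass and recompute the rest during the backward pass, trading extra computation for lower memory use. (Saving model weights to disk does not free GPU memory during training.)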
  
- **Lower Precision**: Train with 16-bit floats (or mixed precision) instead of 32-bit floats, which roughly halves the memory needed for activations, usually with little loss in accuracy.
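
As a minimal sketch of gradient accumulation, the toy example below fits `y = w * x + b` with plain Python. The model, data, and function names are assumptions chosen for illustration; the point is only that summing gradients over micro-batches gives the same update as one large batch.

```python
def summed_gradients(w, b, xs, ys):
    """Gradients of the summed squared error of y_hat = w * x + b."""
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys))
    return grad_w, grad_b


def train_step(w, b, xs, ys, micro_batch_size, learning_rate=0.01):
    """One weight update whose gradient is accumulated over micro-batches."""
    acc_w = acc_b = 0.0
    for start in range(0, len(xs), micro_batch_size):
        stop = start + micro_batch_size
        grad_w, grad_b = summed_gradients(w, b, xs[start:stop], ys[start:stop])
        acc_w += grad_w  # accumulate instead of updating the weights
        acc_b += grad_b
    n = len(xs)
    # Single update using the mean gradient over all accumulated samples
    return w - learning_rate * acc_w / n, b - learning_rate * acc_b / n


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [2 * x + 1 for x in xs]

full = train_step(0.0, 0.0, xs, ys, micro_batch_size=8)         # one batch of 8
accumulated = train_step(0.0, 0.0, xs, ys, micro_batch_size=2)  # four batches of 2
print(full, accumulated)
```

In a real CNN the saving comes from the activations: only one micro-batch's feature maps are alive at any time, while the accumulated gradients take no more space than the parameters.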

**4. Why would you use a max pooling layer instead of a convolutional layer with the same stride?**

Max pooling layers are used to downsample the spatial dimensions of the input, reducing the computational burden and number of parameters. Advantages include:
  
- **Parameter Reduction**: Unlike convolutional layers, max pooling doesn't introduce new parameters.
  
- **Translation Invariance**: Max pooling can make the model more robust to slight translations or distortions in the input.
  
- **Reduction in Spatial Dimensions**: Like a strided convolution, it shrinks the feature maps, but it does so by keeping only the strongest activation in each window, with no weights to learn or store.
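
The behaviour described above can be sketched in a few lines of plain Python. The helper name and the 4 x 4 example are assumptions for illustration; the layer has no weights, and a shift that stays inside one pooling window leaves the output unchanged.

```python
def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Max pooling over a 2D list of numbers (no padding, no parameters)."""
    n_rows, n_cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(pool_size)
                for j in range(pool_size)
            )
            for c in range(0, n_cols - pool_size + 1, stride)
        ]
        for r in range(0, n_rows - pool_size + 1, stride)
    ]


original = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
shifted = [  # same feature moved one pixel up and one pixel left
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

pooled_original = max_pool_2d(original)
pooled_shifted = max_pool_2d(shifted)
print(pooled_original, pooled_shifted)
```

The invariance is only local: a shift that moves the feature into a different window does change the pooled output.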

**5. When would a local response normalization layer be useful?**

Local Response Normalization (LRN) layers were popular in the early days of CNNs. An LRN layer makes the most strongly activated neurons inhibit the neurons at the same position in neighboring feature maps (a form of lateral inhibition), which encourages different feature maps to specialize. They can be useful:
  
- When you want each neuron's activation to be scaled relative to the activations at the same location in neighboring feature maps.
- When the model is overfitting, since this competition between feature maps acts as a mild form of regularization.
  
However, it's worth noting that LRN has become less popular with the advent of Batch Normalization, which normalizes neuron activities across batches and has proven to be more effective in many cases.
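
A minimal sketch of the normalization at a single pixel is shown below. It assumes the common formulation in which each activation is divided by `(bias + alpha * sum of squared neighbors) ** beta`, with neighbors taken from `depth_radius` feature maps on either side; the hyperparameter values are illustrative, not taken from any particular paper.

```python
def local_response_normalization(activations, depth_radius=2, bias=1.0,
                                 alpha=1e-4, beta=0.75):
    """Normalize one pixel's activations across neighboring feature maps."""
    n_maps = len(activations)
    normalized = []
    for i, activation in enumerate(activations):
        low = max(0, i - depth_radius)
        high = min(n_maps - 1, i + depth_radius)
        squared_sum = sum(activations[j] ** 2 for j in range(low, high + 1))
        normalized.append(activation / (bias + alpha * squared_sum) ** beta)
    return normalized


# Activations of three feature maps at the same pixel (made-up values)
quiet = local_response_normalization([1.0, 1.0, 1.0], depth_radius=1,
                                     alpha=1.0, beta=1.0)
loud = local_response_normalization([1.0, 10.0, 1.0], depth_radius=1,
                                    alpha=1.0, beta=1.0)
print(quiet)
print(loud)
```

The first activation is the same in both inputs, but it is suppressed much more strongly when its neighbor fires at 10.0, which is the lateral inhibition effect.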

**6. In comparison to LeNet-5, what are the main innovations in AlexNet? What about GoogLeNet and ResNet's core innovations?**

- **AlexNet**:
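  - Larger and deeper architecture, with convolutional layers stacked directly on top of one another instead of a pooling layer after every convolutional layer.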
  - Use of ReLU activation function for faster training.
  - Implementation of dropout layers for regularization.
  - GPU implementation to handle the increased computational demand.
  
- **GoogLeNet**:
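  - Introduction of the "Inception" module, which applies several convolutional kernel sizes and a pooling operation in parallel and concatenates their outputs, so the network can capture patterns at multiple scales.
  - Use of 1x1 convolutions as bottleneck layers that reduce the depth of the feature maps before the more expensive 3x3 and 5x5 convolutions, cutting the number of parameters.
  - Global average pooling in place of the large fully connected layers before the output layer, which removes most of the parameters those layers would need.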
  
- **ResNet**:
  - Introduction of "skip connections" or "residual connections" to allow gradients to flow through the network, mitigating the vanishing gradient problem in deep networks.
  - Ability to train extremely deep networks (e.g., ResNet-152) by leveraging these residual connections.
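
The saving from a 1x1 bottleneck can be checked with simple arithmetic. The channel counts below (192 input channels, a 16-channel bottleneck, 32 output filters with 5x5 kernels) are assumptions chosen for illustration.

```python
def conv_params(kernel_size, in_channels, n_filters):
    """Parameters of a square-kernel convolutional layer, biases included."""
    return (kernel_size * kernel_size * in_channels + 1) * n_filters


in_channels = 192  # assumed depth of the incoming feature maps

# 5x5 convolution applied directly to all 192 channels
direct = conv_params(5, in_channels, 32)

# 1x1 convolution down to 16 channels, followed by the 5x5 convolution
bottleneck = conv_params(1, in_channels, 16) + conv_params(5, 16, 32)

print(f"Direct 5x5:          {direct:,} parameters")
print(f"1x1 bottleneck + 5x5: {bottleneck:,} parameters")
```

Under these assumptions the bottleneck version needs roughly a tenth of the parameters while producing the same number of output feature maps.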

**7. On MNIST, build your own CNN and strive to achieve the best possible accuracy.**

In [3]:
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# 1. Load and preprocess the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# 2. Define the CNN architecture
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))

# 3. Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# 4. Train the model
model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_split=0.2)

# 5. Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f'Test accuracy: {test_acc}')


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.991100013256073


**8. Using Inception v3 to classify large images...**

- Download some images of different animals.
- Load them in Python using, for example, the matplotlib.image.imread() function (or scipy.misc.imread() in older SciPy releases that still provide it).
- Resize and/or crop them to 299 x 299 pixels, and make sure they only have three channels (RGB) and no transparency.
- The photos used to train the Inception model were preprocessed to have values ranging from -1.0 to 1.0, so make sure yours do as well.
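
The cropping and scaling steps above can be sketched in plain Python. The helper names are made up for this example, and an image is represented as a nested list of pixel tuples with values from 0 to 255; a real pipeline would use an image library for the resize itself.

```python
def center_crop_box(height, width):
    """Return (top, left, size) of the largest centered square crop."""
    size = min(height, width)
    return (height - size) // 2, (width - size) // 2, size


def to_inception_range(image):
    """Drop any alpha channel and rescale 0-255 values to [-1.0, 1.0]."""
    return [
        [tuple(channel / 127.5 - 1.0 for channel in pixel[:3]) for pixel in row]
        for row in image
    ]


# A 1 x 2 RGBA image with made-up pixel values
rgba_image = [[(0, 255, 51, 255), (255, 0, 204, 128)]]
scaled = to_inception_range(rgba_image)

print(center_crop_box(200, 300))
print(scaled)
```

After cropping to a square, the image still has to be resized to 299 x 299 before it is passed to the model.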

In [4]:
import tensorflow as tf
import tensorflow_datasets as tfds
from tensorflow.keras.applications.inception_v3 import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input, decode_predictions

# 1. Load the CIFAR-10 training split
ds_train = tfds.load(
    'cifar10',
    split='train',
    shuffle_files=True,
    as_supervised=True,
)

# 2. Keep only the animal images (e.g., cats, which are class index 3 in CIFAR-10)
animal_class_index = 3
animal_ds = ds_train.filter(lambda img, label: tf.equal(label, animal_class_index))

# 3. Resize and preprocess these images for Inception v3 (which expects 299x299 images
#    scaled to [-1, 1]); batching keeps the resized images from all sitting in memory at once
def prepare(img, label):
    img = tf.image.resize(img, [299, 299])
    return preprocess_input(img)

animal_batches = animal_ds.map(prepare).batch(32)

# 4. Use the Inception v3 model for classification
model = InceptionV3(weights='imagenet')

predictions = model.predict(animal_batches)
decoded_predictions = decode_predictions(predictions, top=3)  # Directly pass the batched predictions

# Print the top 3 predictions for each image
for i, pred in enumerate(decoded_predictions):
    print(f"Image {i + 1}:")
    for imagenet_id, label, score in pred:
        print(f"{label} ({score:.2f})")


Downloading data from https://storage.googleapis.com/download.tensorflow.org/data/imagenet_class_index.json
Image 1:
marmoset (0.34)
Windsor_tie (0.15)
fox_squirrel (0.02)
Image 2:
lotion (0.39)
nipple (0.12)
whiskey_jug (0.08)
Image 3:
tabby (0.30)
Egyptian_cat (0.15)
tiger_cat (0.11)
Image 4:
red_wolf (0.17)
lesser_panda (0.11)
kit_fox (0.11)
Image 5:
Japanese_spaniel (0.24)
Border_collie (0.05)
EntleBucher (0.03)
Image 6:
web_site (0.23)
screen (0.22)
book_jacket (0.07)
Image 7:
nipple (0.22)
redbone (0.19)
golden_retriever (0.07)
Image 8:
Persian_cat (0.99)
tabby (0.00)
tiger_cat (0.00)
Image 9:
Persian_cat (0.30)
milk_can (0.21)
Brabancon_griffon (0.04)
Image 10:
cougar (0.98)
jaguar (0.00)
lion (0.00)
Image 11:
schipperke (0.31)
ocarina (0.08)
curly-coated_retriever (0.08)
Image 12:
milk_can (0.07)
Persian_cat (0.04)
Madagascar_cat (0.03)
Image 13:
chiffonier (0.24)
pool_table (0.20)
desk (0.08)
Image 14:
whiskey_jug (0.14)
lotion (0.13)
nipple (0.05)
Image 15:
Japanese_spaniel (

**9. Large-scale image recognition using transfer learning...**

a. Make a training set of at least 100 images for each class. You might, for example, classify your
own photos based on their location (beach, mountain, area, etc.) or use an existing dataset, such as
the flowers dataset or MIT's places dataset (requires registration, and it is huge).

b. Create a preprocessing phase that resizes and crops the image to 299 x 299 pixels while also
adding some randomness for data augmentation.

c. Using the pretrained Inception v3 model, freeze all layers up to the bottleneck layer (the
last layer before the output layer) and replace the output layer with one that has the appropriate
number of outputs for your new classification task (e.g., the flowers dataset has five mutually
exclusive classes, so the output layer must have five neurons and use the softmax activation function).

d. Separate the data into two sets: a training and a test set. The training set is used to train the
model, and the test set is used to evaluate it.
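
For a dataset you collect yourself, the split in step d can be sketched with the standard library alone. The file names below are hypothetical placeholders, and the 80/20 ratio and the seed are arbitrary choices.

```python
import random


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of the items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a repeatable split
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical image paths standing in for a real collection
image_paths = [f"flowers/img_{i:03d}.jpg" for i in range(100)]
train_paths, test_paths = train_test_split(image_paths)

print(len(train_paths), len(test_paths))
```

Shuffling before splitting matters when the files are ordered by class; otherwise the test set could end up containing only one or two classes.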

In [5]:
import tensorflow as tf
import tensorflow_datasets as tfds

# Step 1: Load the Flowers dataset
(raw_train, raw_validation, raw_test), metadata = tfds.load(
    'tf_flowers',
    split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
    with_info=True,
    as_supervised=True,
)

# Step 2: Preprocessing
IMG_SIZE = 299  # All images will be resized to 299x299

def format_example(image, label):
    # Resize and normalize to [-1,1], as Inception v3 expects
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    image = (image / 127.5) - 1
    return image, label

def augment(image, label):
    # Data augmentation - Random flipping (training set only)
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    return image, label

# Augment only the training data so validation and test metrics are deterministic
train = raw_train.map(format_example).map(augment)
validation = raw_validation.map(format_example)
test = raw_test.map(format_example)

# Batching and shuffling
BATCH_SIZE = 32
SHUFFLE_BUFFER_SIZE = 1000

train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
validation_batches = validation.batch(BATCH_SIZE)
test_batches = test.batch(BATCH_SIZE)

# Step 3: Transfer Learning with Inception v3
base_model = tf.keras.applications.InceptionV3(include_top=False, weights='imagenet', input_shape=(IMG_SIZE, IMG_SIZE, 3))
base_model.trainable = False  # Freeze all layers of the base model

# Create new model on top
global_average_layer = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
prediction_layer = tf.keras.layers.Dense(metadata.features['label'].num_classes, activation='softmax')(global_average_layer)
model = tf.keras.models.Model(inputs=base_model.input, outputs=prediction_layer)

model.compile(optimizer=tf.keras.optimizers.Adam(), loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Model summary
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                Output Shape                 Param #   Connected to                  
 input_5 (InputLayer)        [(None, 299, 299, 3)]        0         []                            
                                                                                                  
 conv2d_376 (Conv2D)         (None, 149, 149, 32)         864       ['input_5[0][0]']             
                                                                                                  
 batch_normalization_376 (B  (None, 149, 149, 32)         96        ['conv2d_376[0][0]']          
 atchNormalization)                                                                               
                                                                                                  
 ...
                                                                                                  
 conv2d_387 (Conv2D)         (None, 35, 35, 32)           6144      ['average_pooling2d_36[0][0]']
                                                                                                  
 batch_normalization_381 (B  (None, 35, 35, 64)           192       ['conv2d_381[0][0]']          
 atchNormalization)                                                                               
                                                                                                  
 batch_normalization_383 (B  (None, 35, 35, 64)           192       ['conv2d_383[0][0]']          
 atchNormalization)                                                                               
                                                                                                  
 batch_normalization_386 (B  (None, 35, 35, 96)           288       ['conv2d_386[0][0]']          
 atchNormalization)
 ...
                                                                                                  
 activation_393 (Activation  (None, 35, 35, 96)           0         ['batch_normalization_393[0][0
 )                                                                  ]']                           
                                                                                                  
 activation_394 (Activation  (None, 35, 35, 64)           0         ['batch_normalization_394[0][0
 )                                                                  ]']                           
                                                                                                  
 mixed1 (Concatenate)        (None, 35, 35, 288)          0         ['activation_388[0][0]',      
                                                                     'activation_390[0][0]',
 ...
                                                                                                  
 conv2d_404 (Conv2D)         (None, 35, 35, 96)           55296     ['activation_403[0][0]']      
                                                                                                  
 batch_normalization_404 (B  (None, 35, 35, 96)           288       ['conv2d_404[0][0]']          
 atchNormalization)                                                                               
                                                                                                  
 activation_404 (Activation  (None, 35, 35, 96)           0         ['batch_normalization_404[0][0
 )                                                                  ]']                           
                                                                                                  
 ...
                                                                                                  
 conv2d_406 (Conv2D)         (None, 17, 17, 192)          147456    ['mixed3[0][0]']              
                                                                                                  
 conv2d_409 (Conv2D)         (None, 17, 17, 192)          172032    ['activation_408[0][0]']      
                                                                                                  
 conv2d_414 (Conv2D)         (None, 17, 17, 192)          172032    ['activation_413[0][0]']      
                                                                                                  
 conv2d_415 (Conv2D)         (None, 17, 17, 192)          147456    ['average_pooling2d_39[0][0]']
                                                                                                  
 batch_normalization_406 (B  (None, 17, 17, 192)          576       ['conv2d_406[0][0]']          
 atchNormalization)
 ...
 activation_423 (Activation  (None, 17, 17, 160)          0         ['batch_normalization_423[0][0
 )                                                                  ]']                           
                                                                                                  
 average_pooling2d_40 (Aver  (None, 17, 17, 768)          0         ['mixed4[0][0]']              
 agePooling2D)                                                                                    
                                                                                                  
 conv2d_416 (Conv2D)         (None, 17, 17, 192)          147456    ['mixed4[0][0]']              
                                                                                                  
 conv2d_419 (Conv2D)         (None, 17, 17, 192)          215040    ['activation_418[0][0]']      
                                                                                                  
 ...
                                                                                                  
 activation_428 (Activation  (None, 17, 17, 160)          0         ['batch_normalization_428[0][0
 )                                                                  ]']                           
                                                                                                  
 activation_433 (Activation  (None, 17, 17, 160)          0         ['batch_normalization_433[0][0
 )                                                                  ]']                           
                                                                                                  
 average_pooling2d_41 (Aver  (None, 17, 17, 768)          0         ['mixed5[0][0]']              
 agePooling2D)
 ...
                                                                                                  
 batch_normalization_438 (B  (None, 17, 17, 192)          576       ['conv2d_438[0][0]']          
 atchNormalization)                                                                               
                                                                                                  
 batch_normalization_443 (B  (None, 17, 17, 192)          576       ['conv2d_443[0][0]']          
 atchNormalization)                                                                               
                                                                                                  
 activation_438 (Activation  (None, 17, 17, 192)          0         ['batch_normalization_438[0][0
 )                                                                  ]']                           
                                                                                                  
 ...
                                                                                                  
 conv2d_447 (Conv2D)         (None, 8, 8, 320)            552960    ['activation_446[0][0]']      
                                                                                                  
 conv2d_451 (Conv2D)         (None, 8, 8, 192)            331776    ['activation_450[0][0]']      
                                                                                                  
 batch_normalization_447 (B  (None, 8, 8, 320)            960       ['conv2d_447[0][0]']          
 atchNormalization)                                                                               
                                                                                                  
 batch_normalization_451 (B  (None, 8, 8, 192)            576       ['conv2d_451[0][0]']          
 atchNormalization)
 ...
                                                                                                  
 activation_458 (Activation  (None, 8, 8, 384)            0         ['batch_normalization_458[0][0
 )                                                                  ]']                           
                                                                                                  
 activation_459 (Activation  (None, 8, 8, 384)            0         ['batch_normalization_459[0][0
 )                                                                  ]']                           
                                                                                                  
 batch_normalization_460 (B  (None, 8, 8, 192)            576       ['conv2d_460[0][0]']          
 atchNormalization)
 ...
                                                                                                  
 activation_464 (Activation  (None, 8, 8, 384)            0         ['batch_normalization_464[0][0
 )                                                                  ]']                           
                                                                                                  
 activation_467 (Activation  (None, 8, 8, 384)            0         ['batch_normalization_467[0][0
 )                                                                  ]']                           
                                                                                                  
 activation_468 (Activation  (None, 8, 8, 384)            0         ['batch_normalization_468[0][0
 )                                                                  ]']
 ...

In [6]:
# Train the model
history = model.fit(train_batches, epochs=5, validation_data=validation_batches)

# Evaluate on the test dataset
loss, accuracy = model.evaluate(test_batches)
print(f"Test accuracy: {accuracy*100:.2f}%")

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
Test accuracy: 90.19%
