In [None]:
1. What are the advantages of a CNN over a fully connected DNN for image classification?
2. Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of
2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs
200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels.
What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much
RAM will this network require when making a prediction for a single instance? What about when
training on a mini-batch of 50 images?
3. If your GPU runs out of memory while training a CNN, what are five things you could try to
solve the problem?
4. Why would you want to add a max pooling layer rather than a convolutional layer with the
same stride?
5. When would you want to add a local response normalization layer?
6. Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main
innovations in GoogLeNet, ResNet, SENet, and Xception?
7. What is a fully convolutional network? How can you convert a dense layer into a
convolutional layer?
8. What is the main technical difficulty of semantic segmentation?
9. Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.
10. Use transfer learning for large image classification, going through these steps:
a. Create a training set containing at least 100 images per class. For example, you could
classify your own pictures based on the location (beach, mountain, city, etc.), or
alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).
b. Split it into a training set, a validation set, and a test set.
c. Build the input pipeline, including the appropriate preprocessing operations, and
optionally add data augmentation.
d. Fine-tune a pretrained model on this dataset.

In [None]:
1. Advantages of CNN over fully connected DNN for image classification:
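   - **Translation invariance**: A convolutional layer applies the same kernel weights at every position, so a pattern learned in one location is detected anywhere in the image. Strictly speaking, convolution is translation-equivariant; pooling layers add a degree of invariance.
   - **Parameter efficiency**: Each neuron is connected only to a small receptive field and the weights are shared across positions, so a CNN has far fewer parameters than a fully connected DNN. This makes it faster to train and less prone to overfitting.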
   - **Hierarchical feature learning**: CNNs learn hierarchical representations of features, capturing low-level features in early layers and high-level features in deeper layers.
   - **Better handling of large inputs**: The number of parameters in a convolutional layer does not depend on the image size. By contrast, a dense layer of just 100 neurons fed a 200 × 300 RGB image would need 200 * 300 * 3 * 100 = 18 million weights.

2. Total number of parameters in the CNN:
   - Each convolutional layer with 3x3 kernels and 100, 200, and 400 feature maps respectively:
     - Parameters per layer = (3 * 3 * input_channels + 1) * output_channels
     - Parameters for the first layer = (3 * 3 * 3 + 1) * 100 = 2800
     - Parameters for the second layer = (3 * 3 * 100 + 1) * 200 = 180200
     - Parameters for the third layer = (3 * 3 * 200 + 1) * 400 = 720400
   - Total parameters = 2800 + 180200 + 720400 = 903400

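   RAM required for prediction:
   - Parameters: 903400 * 4 bytes = 3,613,600 bytes ≈ 3.6 MB
   - Feature maps (with a stride of 2 and "same" padding, each layer halves the height and width, rounding up):
     - First layer: 100 maps of 100 × 150 = 1,500,000 floats = 6,000,000 bytes
     - Second layer: 200 maps of 50 × 75 = 750,000 floats = 3,000,000 bytes
     - Third layer: 400 maps of 25 × 38 = 380,000 floats = 1,520,000 bytes
   - During inference, a layer's output can be released once the next layer has been computed, so only two consecutive layers must fit in RAM at once. The largest pair is 6,000,000 + 3,000,000 = 9,000,000 bytes.
   - RAM for prediction ≥ 9,000,000 + 3,613,600 = 12,613,600 bytes ≈ 12.6 MB

   RAM required for training on a mini-batch of 50 images:
   - Backpropagation needs every layer's output from the forward pass, for every instance: 50 * (6,000,000 + 3,000,000 + 1,520,000) = 526,000,000 bytes
   - Input images: 50 * 200 * 300 * 3 * 4 bytes = 36,000,000 bytes
   - RAM for training ≥ 526,000,000 + 36,000,000 + 3,613,600 = 565,613,600 bytes ≈ 566 MB, an optimistic minimum that excludes gradients and optimizer state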
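
The arithmetic for question 2 can be checked with a short script. This is a minimal sketch under stated assumptions: it counts the parameters plus the feature maps each layer outputs (which also occupy RAM), assumes inference only needs two consecutive layers in memory at once, and assumes training keeps every layer's output for the backward pass. The helper names `conv_params` and `same_padding_size` are made up for this example.

```python
import math

BYTES_PER_FLOAT = 4  # 32-bit floats


def conv_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output feature map."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def same_padding_size(size, stride):
    """Output size of a conv layer with "same" padding."""
    return math.ceil(size / stride)


height, width, channels = 200, 300, 3
feature_maps = [100, 200, 400]

n_params = 0
activation_bytes = []  # bytes of output per layer, for one instance
h, w, c = height, width, channels
for n_maps in feature_maps:
    n_params += conv_params(3, c, n_maps)
    h, w = same_padding_size(h, 2), same_padding_size(w, 2)
    activation_bytes.append(h * w * n_maps * BYTES_PER_FLOAT)
    c = n_maps

param_bytes = n_params * BYTES_PER_FLOAT

# Inference: parameters plus the largest pair of consecutive layer outputs
peak_pair = max(a + b for a, b in zip(activation_bytes, activation_bytes[1:]))
predict_bytes = param_bytes + peak_pair

# Training: parameters, the input batch, and every layer's output per instance
batch_size = 50
input_bytes = batch_size * height * width * channels * BYTES_PER_FLOAT
train_bytes = param_bytes + input_bytes + batch_size * sum(activation_bytes)

print("parameters:", n_params)                  # 903400
print("prediction bytes (min):", predict_bytes)  # 12613600
print("training bytes (min):", train_bytes)      # 565613600
```

The training figure is a lower bound: gradients and optimizer state would add to it.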

3. If the GPU runs out of memory while training a CNN, you could try:
   - Reducing the mini-batch size.
   - Reducing model complexity (e.g., removing layers or feature maps), or using a larger stride in one or more layers to shrink the feature maps.
   - Using mixed precision training (16-bit floats instead of 32-bit floats).
   - Using a GPU with more memory.
   - Distributing training across multiple GPUs.

4. A max pooling layer has no parameters at all, whereas a convolutional layer with the same stride has many (weights and a bias for every filter). Max pooling therefore downsamples the feature maps, keeping the strongest activation in each window, without adding any weights to store or train.
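
To make the comparison concrete, here is a minimal NumPy sketch of 2 × 2 max pooling with a stride of 2. The function name `max_pool_2x2` and the 200-channel example are assumptions chosen for illustration; the point is that pooling involves no trainable values, while a strided convolution over the same feature maps does.

```python
import numpy as np


def max_pool_2x2(fmap):
    """2x2 max pooling, stride 2, on an array of shape (height, width, channels)."""
    h, w, c = fmap.shape
    h2, w2 = h // 2, w // 2
    trimmed = fmap[:h2 * 2, :w2 * 2, :]  # drop an odd last row/column, if any
    # Split each spatial axis into (blocks, 2) and take the max inside each block
    return trimmed.reshape(h2, 2, w2, 2, c).max(axis=(1, 3))


fmap = np.arange(16).reshape(4, 4, 1)
pooled = max_pool_2x2(fmap)
print(pooled[:, :, 0])

# Parameters needed to downsample 200 feature maps into 200 feature maps:
max_pool_params = 0
strided_conv_params = (3 * 3 * 200 + 1) * 200  # 3x3 kernels, stride 2
print(max_pool_params, strided_conv_params)
```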

5. A local response normalization (LRN) layer makes the most strongly activated neurons inhibit the neurons at the same position in neighboring feature maps. This competition pushes different feature maps to specialize and respond to a wider variety of features. LRN is typically added after the lower convolutional layers (as in AlexNet), so that the upper layers have a more diverse pool of low-level features to build on.
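
As a rough illustration of the normalization described above, the following NumPy sketch divides each activation by a term that grows with the squared activations of nearby channels at the same position. The parameter names (`depth_radius`, `bias`, `alpha`, `beta`) mirror those of `tf.nn.local_response_normalization`, but the function itself is a hand-written sketch and the default values are arbitrary choices for this example.

```python
import numpy as np


def local_response_norm(a, depth_radius=2, bias=1.0, alpha=1e-4, beta=0.75):
    """Normalize across neighboring channels.

    a: array of shape (height, width, channels).
    Each output is a[..., i] / (bias + alpha * sum of squares over
    channels i - depth_radius .. i + depth_radius) ** beta.
    """
    n_channels = a.shape[-1]
    squared = a.astype(float) ** 2
    out = np.empty(a.shape, dtype=float)
    for i in range(n_channels):
        lo = max(0, i - depth_radius)
        hi = min(n_channels, i + depth_radius + 1)
        scale = (bias + alpha * squared[..., lo:hi].sum(axis=-1)) ** beta
        out[..., i] = a[..., i] / scale
    return out


# One pixel, five channels: channel 2 fires strongly and damps its neighbors
activations = np.array([[[1.0, 1.0, 10.0, 1.0, 1.0]]])
normalized = local_response_norm(activations, depth_radius=1, alpha=0.1)
print(normalized[0, 0])
```

In the example, channels 1 and 3 sit next to the strongly activated channel 2 and are scaled down more than channels 0 and 4, which have identical raw activations.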

6. Main innovations:
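   - **AlexNet**: Much larger and deeper than LeNet-5, and it stacks convolutional layers directly on top of one another instead of placing a pooling layer after each one. It also used ReLU activations, dropout regularization, data augmentation, and GPU training.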
   - **GoogLeNet**: Introduced the inception module with parallel convolutional operations of different sizes to capture features at multiple scales efficiently.
   - **ResNet**: Introduced residual connections to address the vanishing gradient problem, enabling training of very deep networks (hundreds of layers).
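   - **SENet (Squeeze-and-Excitation Networks)**: Adds an SE block after each unit of a base architecture (such as an inception module or a residual unit). The block is a small network that adaptively recalibrates channel-wise feature responses, a form of channel-wise attention.
   - **Xception**: Replaced inception modules with depthwise separable convolutional layers, which handle spatial patterns and cross-channel patterns separately and use fewer parameters and computations than regular convolutional layers.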
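
The parameter savings of a depthwise separable convolution can be estimated with a few lines of arithmetic. This sketch assumes one spatial filter per input channel followed by a 1 × 1 pointwise convolution, with a single bias per output feature map (the layout used by Keras's `SeparableConv2D`, as far as I know); the function names are hypothetical. The example reuses the 3 × 3, 200-to-400 layer from question 2.

```python
def regular_conv_params(kernel_size, in_channels, out_channels):
    """Every filter spans all input channels; one bias per filter."""
    return (kernel_size * kernel_size * in_channels + 1) * out_channels


def separable_conv_params(kernel_size, in_channels, out_channels):
    """Depthwise step (one spatial filter per input channel),
    then a 1x1 pointwise step, plus one bias per output feature map."""
    depthwise = kernel_size * kernel_size * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise + out_channels


regular = regular_conv_params(3, 200, 400)      # 720400
separable = separable_conv_params(3, 200, 400)  # 82200
print(regular, separable, round(regular / separable, 1))
```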

7. A fully convolutional network (FCN) contains only convolutional and pooling layers, with no dense layers, so it can process images of any size (above some minimum) and its outputs keep their spatial dimensions. To convert a dense layer into a convolutional layer, set the number of filters equal to the number of neurons in the dense layer, set the kernel size equal to the size of the layer's input feature maps, and use "valid" padding. The stride can be 1 or more.
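
The equivalence between a dense layer and a convolutional layer can be checked numerically. The sketch below uses plain NumPy rather than Keras, with made-up sizes (7 × 7 × 4 input feature maps, 10 neurons) and a hand-written `conv2d_valid` helper, so treat it as an illustration under those assumptions. The dense weight matrix is reshaped into 10 kernels that each cover the whole input, and a "valid" convolution with those kernels reproduces the dense output.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, channels, n_units = 7, 7, 4, 10

x = rng.normal(size=(height, width, channels))
dense_weights = rng.normal(size=(height * width * channels, n_units))
bias = rng.normal(size=n_units)

# Dense layer applied to the flattened feature maps
dense_out = x.reshape(-1) @ dense_weights + bias

# Same weights viewed as n_units kernels of shape (height, width, channels)
kernels = dense_weights.reshape(height, width, channels, n_units)


def conv2d_valid(image, kernels, bias):
    """Stride-1 convolution with "valid" padding (no padding)."""
    kh, kw, _, n_filters = kernels.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w, n_filters))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw, :]
            # Sum over height, width, and channels for every filter at once
            out[i, j] = np.tensordot(patch, kernels, axes=3) + bias
    return out


conv_out = conv2d_valid(x, kernels, bias)
print(conv_out.shape)                         # (1, 1, 10)
print(np.allclose(conv_out[0, 0], dense_out))

# A larger input now yields a grid of outputs instead of raising a shape error
larger = rng.normal(size=(9, 8, channels))
larger_out = conv2d_valid(larger, kernels, bias)
print(larger_out.shape)                       # (3, 2, 10)
```

Each cell of the larger output is what the dense layer would have produced for the corresponding 7 × 7 window of the input.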

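8. The main technical difficulty of semantic segmentation is that spatial information is progressively lost as the image passes through the CNN, because layers with a stride greater than 1 (including pooling layers) shrink the feature maps. Predicting an accurate class for every pixel requires restoring that resolution, for example with upsampling or transposed convolutional layers combined with skip connections from lower layers. Class imbalance, occlusions, and varying object scales add to the challenge.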

9. Building a CNN from scratch for MNIST classification:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Define CNN architecture
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Load MNIST data
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize pixel values to between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

# Reshape images for CNN input
train_images = train_images.reshape((-1, 28, 28, 1))
test_images = test_images.reshape((-1, 28, 28, 1))

# Train the model, holding out 10% of the training data for validation
# (the test set is kept aside for the final evaluation only)
model.fit(train_images, train_labels, epochs=5, validation_split=0.1)

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)
```

10. Transfer learning for large image classification:
   - (a) Collect or use an existing dataset containing at least 100 images per class.
   - (b) Split the dataset into training, validation, and test sets.
   - (c) Build the input pipeline, preprocess images, and optionally apply data augmentation techniques.
   - (d) Fine-tune a pretrained model (e.g., ResNet, Inception, VGG) on the training set and evaluate its performance on the validation set. Adjust hyperparameters and continue fine-tuning until satisfactory performance is achieved. Finally, evaluate the model on the test set to assess its generalization ability.
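
Step (d) could look something like the following Keras sketch. It is one possible setup, not a prescribed recipe: the choice of Xception, the 224 × 224 input size, the learning rate, and the function name `build_finetune_model` are all assumptions made for illustration, and input pixels are assumed to be in the range [0, 255].

```python
import tensorflow as tf


def build_finetune_model(n_classes, weights="imagenet", image_size=224):
    """Return (model, base): a frozen Xception base with a new softmax head."""
    base = tf.keras.applications.Xception(
        weights=weights,
        include_top=False,
        input_shape=(image_size, image_size, 3),
    )
    base.trainable = False  # phase 1: train only the new head

    inputs = tf.keras.Input(shape=(image_size, image_size, 3))
    # Xception expects pixel values scaled to [-1, 1]
    x = tf.keras.layers.Rescaling(scale=1 / 127.5, offset=-1)(inputs)
    x = base(x, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model, base
```

A typical way to use it would be to call `model.fit(...)` on the training and validation datasets from steps (b) and (c) for a few epochs, then set `base.trainable = True`, recompile with a much lower learning rate, and continue training so the pretrained weights are adjusted gently.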