### 1. What are the advantages of a CNN over a fully connected DNN for image classification?

CNNs have several advantages over fully connected DNNs for image classification:

- Parameter Efficiency: CNNs generally use fewer parameters, reducing the computational cost and the risk of overfitting.
  
- Feature Hierarchies: CNNs can learn hierarchical features, from edges to complex structures, which is beneficial for image data.

- Translational Invariance: CNNs are better equipped to handle translational variance in images.

- Local Connectivity: CNNs consider the local spatial coherence of pixels, making them more semantically meaningful for image tasks.

---

### 2. Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. What is the total number of parameters?

The total number of parameters in this CNN would be calculated as follows:

- First Conv Layer: Parameters = \( (3 \times 3 \times 3 + 1) \times 100 = 2800 \) (Weights + Bias)
- Second Conv Layer: Parameters = \( (3 \times 3 \times 100 + 1) \times 200 = 180200 \)
- Third Conv Layer: Parameters = \( (3 \times 3 \times 200 + 1) \times 400 = 720400 \)

Total Parameters = 2800 + 180200 + 720400 = 903400

---

### 3. If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?

If you're running out of GPU memory, you could try:

1. Decrease the Batch Size: This will lower the memory requirement for each training iteration.

2. Gradient Accumulation: Use smaller batches but accumulate gradients over multiple steps before performing an update.

3. Simplify the Model: Reduce the complexity of your CNN by lowering the number of layers or units in each layer.

4. Data Generators: Stream data into the model batch-by-batch instead of loading it all into memory at once.

5. Mixed-Precision Training: Use a combination of 16-bit and 32-bit floating-point numbers to reduce memory usage.

---

### 4. Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?

Max pooling layers offer advantages such as:

- Parameter Efficiency: Max pooling has zero parameters, thereby not increasing the model complexity.
  
- Effective Downsampling: It provides an aggressive way to reduce spatial dimensions, which can help in focusing on more abstract features.

---

### 5. When would you want to add a local response normalization layer?

Local response normalization is generally used when:

- You want to encourage competition between adjacent feature maps.
  
- You expect neurons detecting similar features to be highly activated and aim to dampen the responses that are uniformly large across all feature maps.

---

### 6. Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?

- AlexNet: Brought deeper architectures, ReLU activation, and dropout.
  
- GoogLeNet: Introduced the inception modules to capture multi-scale information.

- ResNet: Utilized residual connections to train deeper networks effectively.

- SENet: Introduced Squeeze-and-Excitation blocks to recalibrate feature maps.

- Xception: Employed depthwise separable convolutions for more efficient computation.

---

### 7. What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?

A fully convolutional network is a CNN where all layers are convolutional. To convert a dense layer to a convolutional layer, you can replace it with a convolutional layer whose kernel size matches the spatial dimensions of the input volume.

---

### 8. What is the main technical difficulty of semantic segmentation?

The main challenge in semantic segmentation is to maintain spatial resolution throughout the network so that the output segmentation map is precise. This is difficult because typical CNN architectures downsample the input to extract features.

---

### 9. Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.


In [2]:
import tensorflow as tf

# Load the MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Preprocess the data
train_images = train_images.reshape((60000, 28, 28, 1)).astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1)).astype('float32') / 255
train_labels = tf.keras.utils.to_categorical(train_labels, 10)
test_labels = tf.keras.utils.to_categorical(test_labels, 10)

# Build the CNN model
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=10, batch_size=64, validation_split=0.2)

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print("Test accuracy: {:.2f}%".format(test_acc * 100))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 99.20%


### 10. Use transfer learning for large image classification.

a. Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).

b. Split it into a training set, a validation set, and a test set.

c. Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.

d. Fine-tune a pretrained model on this dataset.

---

In [4]:
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the dataset (replace this with your dataset if needed)
(train_set, test_set), dataset_info = tfds.load(
    'tf_flowers',
    split=['train[:80%]', 'train[80%:]'],
    as_supervised=True,
    with_info=True
)

# Preprocess the dataset
def preprocess(dataset):
    def _preprocess_img(image, label):
        image = tf.image.resize(image, (224, 224))
        image = image / 255.0  # normalize to [0,1]
        return image, label
    return dataset.map(_preprocess_img)

train_set = preprocess(train_set).batch(32).shuffle(buffer_size=1000).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_set = preprocess(test_set).batch(32)

# Load a pre-trained model (VGG16)
base_model = tf.keras.applications.VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base model layers
base_model.trainable = False

# Create the final model
model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(dataset_info.features['label'].num_classes, activation='softmax')
])

# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(train_set, epochs=10, validation_data=test_set)

# Unfreeze some layers of the base model for fine-tuning
base_model.trainable = True
for layer in base_model.layers[:15]:
    layer.trainable = False

# Recompile the model (with a lower learning rate)
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fine-tune the model
model.fit(train_set, epochs=10, validation_data=test_set)

# Evaluate the model
loss, accuracy = model.evaluate(test_set)
print(f"Final test accuracy: {accuracy*100:.2f}%")


[1mDownloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to C:\Users\shuklas\tensorflow_datasets\tf_flowers\3.0.1...[0m


Dl Completed...: 0 url [00:00, ? url/s]

Dl Size...: 0 MiB [00:00, ? MiB/s]

Generating splits...:   0%|          | 0/1 [00:00<?, ? splits/s]

Generating train examples...: 0 examples [00:00, ? examples/s]

Shuffling C:\Users\shuklas\tensorflow_datasets\tf_flowers\3.0.1.incompleteVINU9M\tf_flowers-train.tfrecord*...…

[1mDataset tf_flowers downloaded and prepared to C:\Users\shuklas\tensorflow_datasets\tf_flowers\3.0.1. Subsequent calls will reuse this data.[0m
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Final test accuracy: 88.96%
