In [1]:
# 1. What are the advantages of a CNN over a fully connected DNN for image classification?

# Ans:
# Localized feature learning: CNNs leverage convolutional layers to learn spatially localized features, capturing patterns in 
# different regions of the image. This is particularly effective for image analysis tasks where spatial relationships are important.

# Parameter efficiency: CNNs use parameter sharing and local connectivity, significantly reducing the number of parameters compared to
# fully connected DNNs. This makes CNNs more efficient for processing high-dimensional image data.

# Translation invariance: Because the same filters are applied at every position, a CNN can detect a pattern wherever it appears in
# the image, and pooling layers add tolerance to small shifts. A fully connected DNN would have to relearn the pattern at each location.

# Hierarchical feature representation: CNNs typically consist of multiple convolutional and pooling layers, allowing them to learn 
# hierarchical representations of image features. This enables the network to capture increasingly complex and abstract features.

# Overall, CNNs excel in image classification tasks due to their ability to capture spatial information, parameter efficiency, 
# translation invariance, and hierarchical feature learning.
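
# To make the parameter-efficiency point concrete, here is a rough comparison for one layer. The 200 x 300 RGB image size and the
# choice of 100 units / 100 filters are assumptions picked for illustration only.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv_params(in_channels, n_filters, kernel_size):
    """Weights plus biases of a convolutional layer (shared across all positions)."""
    return in_channels * kernel_size * kernel_size * n_filters + n_filters

height, width, channels = 200, 300, 3  # assumed input size

dense = dense_params(height * width * channels, 100)  # every pixel connects to every unit
conv = conv_params(channels, 100, 3)                   # 100 filters of size 3x3

print("Dense layer parameters:", dense)
print("Conv layer parameters: ", conv)
```

# The convolutional layer's count does not depend on the image size at all, which is where most of the saving comes from.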

In [3]:
# 2. Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of
# 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs
# 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels.

# Ans:
# The CNN described consists of three convolutional layers with 3x3 kernels, a stride of 2, and "same" padding. The lowest layer produces
# 100 feature maps, the middle layer produces 200, and the top layer produces 400. The input images are RGB images with dimensions of 
# 200x300 pixels.
# To calculate the total number of parameters in the CNN, we need to consider the parameters in the convolutional layers.

# For each convolutional layer, the number of parameters can be calculated as:
# (number of input channels * kernel size * kernel size * number of output channels) + (number of output channels)

# For the given CNN with three convolutional layers, each with 3x3 kernels, the parameter calculation would be as follows:

# First Convolutional Layer:
# Parameters = (3 * 3 * 3 * 100) + 100 = 2,800

# Second Convolutional Layer:
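# Parameters = (100 * 3 * 3 * 200) + 200 = 180,200

# Third Convolutional Layer:
# Parameters = (200 * 3 * 3 * 400) + 400 = 720,400

# Total number of parameters = 2,800 + 180,200 + 720,400 = 903,400

# Now, to estimate the RAM required for predictions and training:

# With 32-bit floats (4 bytes each), the parameters take:
# 903,400 * 4 = 3,613,600 bytes (about 3.6 MB)

# The feature maps also need memory. With a stride of 2 and "same" padding, each layer halves the height and width (rounding up):
# Layer 1: 100 maps of 100 x 150 = 1,500,000 floats = 6,000,000 bytes
# Layer 2: 200 maps of 50 x 75 = 750,000 floats = 3,000,000 bytes
# Layer 3: 400 maps of 25 x 38 = 380,000 floats = 1,520,000 bytes

# For a single prediction instance:
# Once a layer has been computed, the memory of the layer before it can be released, so only two consecutive layers need to be held
# at a time. The peak is layers 1 and 2:
# RAM = 6,000,000 + 3,000,000 + 3,613,600 = 12,613,600 bytes (about 12.6 MB)

# For a mini-batch of 50 images:
# The parameters are shared across the batch, so they are not multiplied by the batch size. Training must keep every layer's
# output for the backward pass, plus the input batch itself:
# Feature maps = 50 * (6,000,000 + 3,000,000 + 1,520,000) = 526,000,000 bytes
# Input images = 50 * 200 * 300 * 3 * 4 = 36,000,000 bytes
# RAM = 526,000,000 + 36,000,000 + 3,613,600 = 565,613,600 bytes (about 566 MB)

# These are lower bounds: they leave out the gradients, optimizer state, and other framework overhead.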
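
# The per-layer formula can be checked with a short script. This is a sketch under the stated assumptions (32-bit floats, "same"
# padding so each output side is ceil(input / stride)); the helper names are made up for this example. It also tracks the size of
# each layer's feature maps, which dominate the memory needed for activations.

```python
import math

def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus one bias per output channel."""
    return in_channels * kernel * kernel * out_channels + out_channels

def same_padding_output(size, stride=2):
    """Output length of a "same"-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)

BYTES_PER_FLOAT = 4
height, width, channels = 200, 300, 3
input_bytes = height * width * channels * BYTES_PER_FLOAT

params = []
feature_map_bytes = []
for n_filters in [100, 200, 400]:
    params.append(conv_params(channels, n_filters))
    height, width = same_padding_output(height), same_padding_output(width)
    feature_map_bytes.append(height * width * n_filters * BYTES_PER_FLOAT)
    channels = n_filters

total_params = sum(params)
param_bytes = total_params * BYTES_PER_FLOAT

# Prediction: only two consecutive layers need to be in memory at once.
peak_pair = max(a + b for a, b in zip(feature_map_bytes, feature_map_bytes[1:]))
predict_bytes = peak_pair + param_bytes

# Training on 50 images: every layer's output is kept for the backward pass.
train_bytes = 50 * (sum(feature_map_bytes) + input_bytes) + param_bytes

print("Parameters per layer:", params, "total:", total_params)
print("Feature map bytes per layer:", feature_map_bytes)
print("Prediction RAM (bytes):", predict_bytes)
print("Training RAM, batch of 50 (bytes):", train_bytes)
```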

In [5]:
# 3. If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?

# Ans:
# If your GPU runs out of memory while training a CNN, here are five things you could try:

# Reduce batch size: Use a smaller mini-batch, since activation memory grows in proportion to the batch size.
# Decrease model complexity: Remove layers or use fewer filters per layer to cut both parameters and activations.
# Use mixed precision training: Compute in 16-bit floats where it is safe to do so (for example with Keras's mixed_float16 policy),
# which roughly halves activation memory, usually with little or no loss in accuracy.
# Reduce the spatial dimensions: Use a larger stride in one or more layers, or downscale the input images, so that the feature maps
# are smaller.
# Distribute the model: Split the CNN across multiple devices so that no single GPU has to hold all of it.
# These approaches can help mitigate memory issues when training a CNN on a GPU with limited memory.
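
# As a rough illustration of the first remedy, the sketch below halves the batch size until an estimated footprint fits a memory
# budget. The function name and all the byte figures are hypothetical; real usage depends on the framework and the model.

```python
def largest_batch_that_fits(per_image_bytes, fixed_bytes, budget_bytes, start_batch=64):
    """Halve the batch size until fixed cost + per-image cost fits the budget."""
    batch = start_batch
    while batch >= 1:
        if fixed_bytes + batch * per_image_bytes <= budget_bytes:
            return batch
        batch //= 2
    return 0  # even a single image does not fit

MB = 1_000_000
# Hypothetical figures: 40 MB of activations per image, 500 MB of fixed cost, 2,000 MB budget
fp32_batch = largest_batch_that_fits(40 * MB, 500 * MB, 2000 * MB)
# Assume 16-bit activations halve the per-image cost while the fixed cost stays the same
fp16_batch = largest_batch_that_fits(20 * MB, 500 * MB, 2000 * MB)

print("Largest batch with 32-bit activations:", fp32_batch)
print("Largest batch with 16-bit activations:", fp16_batch)
```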

In [6]:
# 4. Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?

# Ans:
# Adding a max pooling layer instead of a convolutional layer with the same stride can be beneficial for two reasons:

# No parameters: Both layers shrink the feature maps by the same factor, but a max pooling layer has no weights at all, whereas a
# convolutional layer adds (input channels * kernel height * kernel width * filters) + filters parameters. Max pooling is therefore
# cheaper to store and to compute.

# Translation invariance: Max pooling introduces a degree of translation invariance by selecting the maximum value within each pooling 
# window. This allows the network to capture the presence of a feature regardless of its precise location in the input, making the
# model more robust to small spatial shifts.

# Overall, a max pooling layer reduces the spatial dimensions without adding any parameters and provides some translation invariance,
# which can improve the network's efficiency and robustness.
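
# A small pure-Python sketch of 2x2 max pooling shows the invariance to small shifts. The 4x4 arrays are made-up inputs; the second
# one is the first shifted one pixel to the right.

```python
def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a list-of-lists image (even height and width)."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]

original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 7, 0],
    [0, 0, 0, 0],
]
# Same pattern translated one pixel to the right
shifted = [
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 7],
    [0, 0, 0, 0],
]

pooled_original = max_pool_2x2(original)
pooled_shifted = max_pool_2x2(shifted)
print(pooled_original)
print(pooled_shifted)
```

# Both calls give the same pooled map because each feature stays inside its pooling window. A larger shift that crosses a window
# boundary would change the output, so the invariance is only local.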

In [7]:
# 5. When would you want to add a local response normalization layer?

# Ans:
# A local response normalization (LRN) layer makes the most strongly activated neurons inhibit neurons at the same position in
# neighboring feature maps. This competition encourages different feature maps to specialize and pushes them apart, which can improve
# generalization. LRN is typically placed after the ReLU step of the lower convolutional layers, as in AlexNet, so that the upper
# layers receive a more diverse set of low-level features. In more recent architectures it has largely been replaced by batch
# normalization.
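
# The normalization can be sketched in pure Python for a single spatial position. Each activation is divided by
# (bias + alpha * sum of squares over neighboring feature maps) ** beta. The hyperparameter values and activations below are
# chosen only to make the effect visible; they are not the values used in any particular network.

```python
def local_response_norm(activations, depth_radius=1, bias=1.0, alpha=1.0, beta=0.5):
    """Normalize activations across feature maps at one spatial position."""
    n = len(activations)
    normalized = []
    for i, a in enumerate(activations):
        lo = max(0, i - depth_radius)
        hi = min(n - 1, i + depth_radius)
        sqr_sum = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        normalized.append(a / (bias + alpha * sqr_sum) ** beta)
    return normalized

# Five feature maps at the same pixel; the middle one fires strongly
activations = [1.0, 1.0, 4.0, 1.0, 1.0]
normalized = local_response_norm(activations)
print([round(v, 3) for v in normalized])
```

# The maps next to the strong activation (indices 1 and 3) end up smaller than the maps further away (indices 0 and 4), even though
# all four started at the same value.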

In [8]:
# 6. Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet,
# and Xception?

# Ans:
# AlexNet is much larger and deeper than LeNet-5, and it stacks convolutional layers directly on top of one another instead of placing
# a pooling layer after each one. It also introduced rectified linear units (ReLU) as activation functions, dropout regularization,
# data augmentation, and the use of GPUs for efficient training of deep neural networks.

# GoogLeNet introduced the inception module, which utilized parallel convolutional layers of different sizes to capture features at 
# various scales and combined them through concatenation. This allowed for efficient and deep network architectures.

# ResNet introduced residual connections, which enabled the training of extremely deep networks by addressing the vanishing gradient 
# problem. These skip connections allowed gradients to flow directly to earlier layers, facilitating the training of deep neural 
# networks with hundreds or even thousands of layers.

# SENet introduced the concept of squeeze-and-excitation blocks, which adaptively recalibrate the channel-wise feature responses by 
# leveraging global information. This mechanism improves the model's ability to capture important features and enhances its performance.

# Xception introduced depthwise separable convolutions, which decouple spatial and channel-wise convolutions. This reduces the 
# computational complexity while maintaining the representational capacity, resulting in more efficient and powerful network architectures.

# These innovations in deep learning architectures have played significant roles in advancing the field and achieving state-of-the-art 
# performance in various computer vision tasks.
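
# The saving from depthwise separable convolutions can be checked with a quick weight count. The layer sizes (128 input channels,
# 256 output channels, 3x3 kernels) are assumed for illustration, and biases are left out to keep the comparison simple.

```python
def regular_conv_weights(in_channels, out_channels, kernel=3):
    """Every filter spans all input channels."""
    return kernel * kernel * in_channels * out_channels

def separable_conv_weights(in_channels, out_channels, kernel=3):
    depthwise = kernel * kernel * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels     # 1x1 convolution that mixes channels
    return depthwise + pointwise

regular = regular_conv_weights(128, 256)
separable = separable_conv_weights(128, 256)
ratio = round(regular / separable, 1)

print("Regular conv weights:  ", regular)
print("Separable conv weights:", separable)
print("Reduction factor:      ", ratio)
```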

In [9]:
# 7. What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?

# Ans:
# A fully convolutional network (FCN) is a type of neural network architecture that consists entirely of convolutional layers, 
# without any fully connected (dense) layers. FCNs are commonly used in tasks such as image segmentation, where the output is a
# dense prediction map.

# To convert a dense layer into a convolutional layer, replace it with a convolutional layer whose kernel size equals the size of the
# dense layer's input feature maps, with one filter per neuron in the dense layer and "valid" padding. The weights can be copied over
# after reshaping. Any dense layers that follow become 1x1 convolutional layers, since their input is then a 1x1 feature map. On an
# input of the original size the network computes exactly the same output; on a larger input it produces a map of outputs, one per
# position of the kernel.
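
# A tiny pure-Python check of the equivalence, assuming a single-channel 2x2 input and a dense layer with one neuron. The weights and
# inputs are made up; the point is that a "valid" convolution whose kernel covers the whole input gives the same number as the dense
# neuron, and slides over a larger input to give a map of outputs.

```python
def dense(inputs, weights, bias):
    """One dense neuron applied to a flattened input."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def conv2d_valid(image, kernel, bias):
    """Single-channel, single-filter 2D convolution with "valid" padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw)) + bias
         for c in range(out_w)]
        for r in range(out_h)
    ]

feature_map = [[1, 2], [3, 4]]         # hypothetical 2x2 input to the dense layer
weights = [0.5, -1.0, 2.0, 0.25]       # dense weights in row-major (flattening) order
bias = 0.1
kernel = [weights[0:2], weights[2:4]]  # the same weights reshaped to the input size

dense_out = dense([v for row in feature_map for v in row], weights, bias)
conv_out = conv2d_valid(feature_map, kernel, bias)
print(dense_out, conv_out)

# On a larger 3x3 input the same kernel yields a 2x2 map of outputs
larger_out = conv2d_valid([[1, 2, 0], [3, 4, 0], [0, 0, 0]], kernel, bias)
print(larger_out)
```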

In [10]:
# 8. What is the main technical difficulty of semantic segmentation?

# Ans:
# The main technical difficulty of semantic segmentation is that every pixel must receive a class label, yet a CNN gradually loses
# spatial information as the signal passes through layers with a stride greater than 1, such as pooling layers. The lost resolution has
# to be recovered to predict accurate boundaries, for example with upsampling (transposed convolution) layers and skip connections
# from lower layers. Varying object sizes, occlusions, and overlapping objects make the task harder still.

In [12]:
pip install tensorflow


In [14]:
# 9. Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.

# Sol:
import tensorflow as tf
from tensorflow.keras import layers

# Define the CNN model
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Load and preprocess the MNIST dataset
mnist = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
train_images = train_images.reshape(-1, 28, 28, 1) / 255.0
test_images = test_images.reshape(-1, 28, 28, 1) / 255.0

# Train the model
model.fit(train_images, train_labels, epochs=10, batch_size=16)

# Evaluate the model
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.9890000224113464


In [21]:
# 10. Use transfer learning for large image classification, going through these steps:
# a. Create a training set containing at least 100 images per class. For example, you could
# classify your own pictures based on the location (beach, mountain, city, etc.), or
# alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).
# b. Split it into a training set, a validation set, and a test set.
# c. Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.
# d. Fine-tune a pretrained model on this dataset.

# Sol:

import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# Convert labels to one-hot encoded vectors
y_train = tf.one_hot(y_train, depth=10)
y_test = tf.one_hot(y_test, depth=10)

# Build the model
model = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train,
          validation_split=0.1,
          batch_size=32,
          epochs=10)

# Evaluate the model on the test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
Test accuracy: 0.9850999712944031
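
# Step d of the exercise calls for fine-tuning a pretrained model. A minimal sketch of that step follows, assuming MobileNetV2 from
# tf.keras.applications as the pretrained base and 5 classes. The helper name build_finetune_model, the class count, and the image
# size are all assumptions for illustration. The demo call passes weights=None only to avoid the ImageNet download; real transfer
# learning would pass weights="imagenet".

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_finetune_model(num_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """Frozen MobileNetV2 base with a new classification head (hypothetical helper)."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights)
    base.trainable = False  # train only the new head at first

    inputs = tf.keras.Input(shape=input_shape)
    x = layers.RandomFlip("horizontal")(inputs)        # optional data augmentation
    x = layers.Rescaling(1.0 / 127.5, offset=-1.0)(x)  # MobileNetV2 expects inputs in [-1, 1]
    x = base(x, training=False)                        # keep batch-norm layers in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model, base

# weights=None skips the download; use weights="imagenet" to start from pretrained features
model, base = build_finetune_model(num_classes=5, weights=None, input_shape=(96, 96, 3))
print(model.output_shape)
```

# After the new head has been trained for a few epochs on your own training and validation sets, a common next step is to set
# base.trainable = True, recompile with a much lower learning rate, and continue training so the pretrained layers are fine-tuned.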
