## Assignment 6

## 1.	What are the advantages of a CNN over a fully connected DNN for image classification?

Ans=>

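Convolutional Neural Networks (CNNs) are well suited to image classification for three main reasons. First, consecutive layers are only partially connected and each filter's weights are reused across the whole image, so a CNN has far fewer parameters than a fully connected DNN. This makes it faster to train, less hungry for training data, and less prone to overfitting. Second, once a CNN has learned a kernel that detects a particular feature, it can detect that feature anywhere in the image, whereas a DNN that learns a feature in one location can only detect it in that location. Third, a fully connected DNN has no prior knowledge of how pixels are organized: it treats every pixel as an unrelated input, while a CNN's architecture builds in the assumption that nearby pixels are related, so lower layers detect small local patterns and higher layers combine them into larger ones.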
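
To give a feel for the difference in parameter count, here is a rough sketch comparing a single dense layer with a single convolutional layer applied to the same image. The input size (200 × 300 RGB) and the layer widths are illustrative assumptions, and the helper names are hypothetical.

```python
def dense_layer_params(n_inputs, n_units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return (n_inputs + 1) * n_units

def conv_layer_params(kernel, in_channels, n_filters):
    """One kernel x kernel x in_channels filter plus one bias per feature map."""
    return (kernel * kernel * in_channels + 1) * n_filters

height, width, channels = 200, 300, 3   # assumed RGB input size
dense = dense_layer_params(height * width * channels, 100)
conv = conv_layer_params(3, channels, 100)

print(f"dense layer, 100 units:      {dense:,} parameters")
print(f"conv layer, 100 3x3 filters: {conv:,} parameters")
```

With these assumptions the dense layer needs about 18 million parameters, while the convolutional layer needs 2,800.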

## 2.	Consider a CNN composed of three convolutional layers, each with 3 × 3 kernels, a stride of 2, and "same" padding. The lowest layer outputs 100 feature maps, the middle one outputs 200, and the top one outputs 400. The input images are RGB images of 200 × 300 pixels.
What is the total number of parameters in the CNN? If we are using 32-bit floats, at least how much RAM will this network require when making a prediction for a single instance? What about when training on a mini-batch of 50 images?

Ans=>
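
One way to work through the arithmetic is sketched below. It is an estimate under stated assumptions, not a definitive answer: each filter has one bias term, "same" padding with a stride of 2 gives an output size of ceil(input / 2), inference only needs to hold the outputs of two consecutive layers at once, and training must keep every layer's output for the backward pass. The helper and variable names are illustrative.

```python
import math

KERNEL = 3                 # 3 x 3 kernels
STRIDE = 2
BYTES_PER_FLOAT = 4        # 32-bit floats
FILTERS = [100, 200, 400]  # feature maps per layer, lowest to top
HEIGHT, WIDTH, CHANNELS = 200, 300, 3  # RGB input

def conv_layer_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output feature map."""
    return (kernel * kernel * in_channels + 1) * out_channels

params_per_layer = []
activation_bytes = []      # bytes for one instance's output of each layer
h, w, c = HEIGHT, WIDTH, CHANNELS
for n_filters in FILTERS:
    params_per_layer.append(conv_layer_params(KERNEL, c, n_filters))
    # "same" padding with stride 2: output size is ceil(input size / 2)
    h, w = math.ceil(h / STRIDE), math.ceil(w / STRIDE)
    activation_bytes.append(h * w * n_filters * BYTES_PER_FLOAT)
    c = n_filters

total_params = sum(params_per_layer)
param_bytes = total_params * BYTES_PER_FLOAT

# Inference: parameters plus the largest pair of consecutive layer outputs
peak_pair = max(a + b for a, b in zip(activation_bytes, activation_bytes[1:]))
inference_bytes = param_bytes + peak_pair

# Training on 50 images: all layer outputs and inputs for every image, plus parameters
batch_size = 50
input_bytes = HEIGHT * WIDTH * CHANNELS * BYTES_PER_FLOAT
training_bytes = batch_size * (sum(activation_bytes) + input_bytes) + param_bytes

print(f"parameters per layer: {params_per_layer}")
print(f"total parameters:     {total_params:,}")
print(f"inference (approx.):  {inference_bytes / 1e6:.1f} MB")
print(f"training (approx.):   {training_bytes / 1e6:.1f} MB")
```

Under these assumptions the network has 903,400 parameters, needs about 12.6 MB to make a prediction for one instance, and needs at least about 566 MB for a mini-batch of 50. The training figure is a lower bound, because it ignores the memory used by gradients and optimizer state.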



## 3.	If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem?


Ans=>

If your GPU runs out of memory while training a CNN, here are five things you could try:

1. Reduce the batch size. This will reduce the amount of memory needed to store the activations and gradients for each layer during training.
2. Reduce the size of the input images. This will reduce the number of activations in each layer, and therefore the memory needed to store them.
3. Use mixed precision training. This technique uses lower-precision (e.g. float16) arithmetic to store the activations and gradients, which reduces the memory footprint.
4. Use gradient checkpointing. This technique trades compute time for memory, by recomputing some intermediate activations during the backward pass rather than storing them.
5. Use a smaller model. Removing layers or reducing the number of filters per layer means fewer parameters and fewer activations to store, so the network uses less memory.

## 4.	Why would you want to add a max pooling layer rather than a convolutional layer with the same stride?



Ans=>

There are two main reasons to add a max pooling layer rather than a convolutional layer with the same stride. First, a max pooling layer has no parameters at all, whereas a convolutional layer has quite a few, so pooling downsamples the feature maps without adding any weights to train or store. Second, by keeping only the maximum activation in each pooling window, max pooling gives the model a small amount of invariance to small translations of the input.
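
To make the comparison concrete, here is a minimal pure-Python sketch of 2 × 2 max pooling. The function name and the sample feature map are made up for illustration. Note that the pooling function has no trainable weights, while a 3 × 3 convolutional layer mapping 64 channels to 64 feature maps would have (3 × 3 × 64 + 1) × 64 = 36,928 parameters.

```python
def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Max pooling over a 2D list of numbers, with "valid" padding."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + dr][c + dc]
                for dr in range(pool_size)
                for dc in range(pool_size)
            )
            for c in range(0, cols - pool_size + 1, stride)
        ]
        for r in range(0, rows - pool_size + 1, stride)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 4, 6],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # [[4, 5], [3, 7]]
```

Shifting the input by a pixel often leaves the maximum of each window unchanged, which is the source of the small translation invariance.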





## 5.	When would you want to add a local response normalization layer?

Ans=>

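A local response normalization (LRN) layer makes the most strongly activated neurons inhibit the neurons at the same location in neighboring feature maps. This competition encourages different feature maps to specialize and pushes them apart, so they explore a wider range of features and the output becomes more sparsely activated. It is typically added in the lower layers of a CNN, to give the upper layers a larger pool of low-level features to build on.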
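
The following sketch shows the normalization at a single spatial location, where each activation is divided by a term that grows with the squared activations of its neighboring feature maps. The parameter names mirror those of `tf.nn.local_response_normalization`, and the default values are the ones commonly quoted for AlexNet (a window of 5 maps, k = 2, α = 1e-4, β = 0.75). Treat both the function and the defaults as assumptions for illustration.

```python
def local_response_normalization(activations, depth_radius=2, bias=2.0,
                                 alpha=1e-4, beta=0.75):
    """Normalize activations at one spatial location across feature maps.

    activations: list of floats, one per feature map, at a single (row, col).
    """
    n_maps = len(activations)
    normalized = []
    for i, a_i in enumerate(activations):
        # Neighboring feature maps within depth_radius, clipped at the edges
        low = max(0, i - depth_radius)
        high = min(n_maps - 1, i + depth_radius)
        sum_sq = sum(activations[j] ** 2 for j in range(low, high + 1))
        normalized.append(a_i / (bias + alpha * sum_sq) ** beta)
    return normalized

# Exaggerated hyperparameters so the inhibition is easy to see
weak_strong_weak = local_response_normalization(
    [1.0, 10.0, 1.0], depth_radius=1, bias=1.0, alpha=1.0, beta=1.0
)
print(weak_strong_weak)
```

In this example the strong middle activation shrinks its two weak neighbors far more than they shrink it, which is the inhibitory effect described above.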

## 6.	Can you name the main innovations in AlexNet, compared to LeNet-5? What about the main innovations in GoogLeNet, ResNet, SENet, and Xception?



Ans=>

Compared to LeNet-5, AlexNet is much larger and deeper, and it stacks convolutional layers directly on top of one another instead of following each one with a pooling layer. It also used ReLU activation functions, dropout regularization, data augmentation, local response normalization, and GPUs for training. GoogLeNet introduced the Inception module, which made it possible to build a much deeper network than earlier CNNs with far fewer parameters. ResNet introduced skip (residual) connections, which let the signal bypass layers and made it possible to train very deep networks without suffering from vanishing gradients. SENet introduced squeeze-and-excitation (SE) blocks, small channel-wise attention modules that recalibrate the relative importance of each feature map. Xception is built largely from depthwise separable convolutions, which look at spatial patterns and cross-channel patterns separately and need far fewer parameters than regular convolutional layers.
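
As a rough illustration of why depthwise separable convolutions are cheaper, the sketch below counts the weights (ignoring biases) of a regular convolution and of its depthwise separable counterpart. The layer sizes are arbitrary assumptions and the helper names are hypothetical.

```python
def standard_conv_params(kernel, in_channels, out_channels):
    """One kernel x kernel x in_channels filter per output feature map."""
    return kernel * kernel * in_channels * out_channels

def separable_conv_params(kernel, in_channels, out_channels):
    """One kernel x kernel spatial filter per input channel, then a 1 x 1 pointwise conv."""
    depthwise = kernel * kernel * in_channels
    pointwise = in_channels * out_channels
    return depthwise + pointwise

standard = standard_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(f"standard:  {standard:,} weights")
print(f"separable: {separable:,} weights")
print(f"ratio:     {standard / separable:.1f}x")
```

For this 3 × 3 layer with 128 input channels and 256 output feature maps, the separable version uses roughly 8.7 times fewer weights.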

## 7.	What is a fully convolutional network? How can you convert a dense layer into a convolutional layer?


Ans=>

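A fully convolutional network (FCN) is a neural network composed only of convolutional and pooling layers, with no dense layers, which means it can process images of any size. FCNs are used mainly for dense prediction tasks such as semantic segmentation and object detection. To convert a dense layer into a convolutional layer, replace it with a convolutional layer whose kernel size equals the size of the dense layer's input feature maps, with one filter per neuron in the dense layer and "valid" padding. The dense layer's weights can be reused by reshaping them to the shape of the convolutional kernel, for example with the reshape function in TensorFlow.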
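
The equivalence can be checked on a toy example. The pure-Python sketch below applies a dense layer to a flattened 2 × 2 × 1 feature map, then reshapes the same weights into a kernel that covers the whole feature map and applies it as a "valid" convolution. All function names and numbers are made up for illustration, and the flattening order is assumed to be row-major with channels last. In Keras this would presumably correspond to a `Conv2D` layer with `kernel_size` equal to the feature map size and `padding="valid"`.

```python
def dense_forward(flat_input, weights, biases):
    """weights[i][u] connects flattened input i to neuron u."""
    return [
        sum(x * weights[i][u] for i, x in enumerate(flat_input)) + biases[u]
        for u in range(len(biases))
    ]

def reshape_dense_weights(weights, height, width, channels):
    """Reshape (height*width*channels, units) weights to (height, width, channels, units)."""
    return [
        [
            [weights[(r * width + c) * channels + ch] for ch in range(channels)]
            for c in range(width)
        ]
        for r in range(height)
    ]

def conv_valid_full_kernel(feature_map, kernel, biases):
    """Convolution whose kernel covers the whole input: a single output position."""
    return [
        sum(
            feature_map[r][c][ch] * kernel[r][c][ch][u]
            for r in range(len(feature_map))
            for c in range(len(feature_map[0]))
            for ch in range(len(feature_map[0][0]))
        ) + biases[u]
        for u in range(len(biases))
    ]

feature_map = [[[1.0], [2.0]], [[3.0], [4.0]]]          # height 2, width 2, 1 channel
flat_input = [v for row in feature_map for pixel in row for v in pixel]
weights = [[0.5, -1.0], [1.0, 0.0], [0.0, 2.0], [-0.5, 1.0]]  # 4 inputs, 2 neurons
biases = [0.5, -0.25]

dense_out = dense_forward(flat_input, weights, biases)
kernel = reshape_dense_weights(weights, height=2, width=2, channels=1)
conv_out = conv_valid_full_kernel(feature_map, kernel, biases)
print(dense_out, conv_out)  # both are [1.0, 8.75]
```

On a larger input image, the same convolutional layer would slide across the feature map and produce one output per position, which is what lets an FCN handle inputs of any size.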

## 8.	What is the main technical difficulty of semantic segmentation?

Ans=>

The main technical difficulty of semantic segmentation is that a lot of spatial information is lost as the signal flows through a CNN, especially through pooling layers and layers with a stride greater than 1. Unlike image classification, where the model predicts a single label for the entire image, semantic segmentation requires a label for every pixel. The model therefore needs a receptive field large enough to capture context, but it must also restore the lost spatial detail (for example with upsampling layers and skip connections from lower layers) to output an accurate, full-resolution segmentation map. Class imbalance, variability in object appearance, background clutter, and occlusions add further challenges.





## 9.	Build your own CNN from scratch and try to achieve the highest possible accuracy on MNIST.


Ans=>



```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Load the MNIST dataset
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()

# Scale the pixel intensities to the [0, 1] range
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0

# Add a channel axis: Conv2D expects 4D input (batch, height, width, channels).
# Without it, fit() raises "ValueError: Input 0 of layer sequential is incompatible
# with the layer: expected min_ndim=4, found ndim=3. Full shape received: (None, 28, 28)"
X_train_full = X_train_full[..., np.newaxis]
X_test = X_test[..., np.newaxis]

# Hold out the last 5,000 training images as a validation set
X_train, X_val = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_val = y_train_full[:-5000], y_train_full[-5000:]

# Define the model architecture: two conv blocks followed by a dense classifier
model = keras.models.Sequential([
    keras.layers.Conv2D(32, kernel_size=3, activation="relu", padding="same", input_shape=[28, 28, 1]),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(32, kernel_size=3, activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Dropout(0.25),
    keras.layers.Conv2D(64, kernel_size=3, activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.Conv2D(64, kernel_size=3, activation="relu", padding="same"),
    keras.layers.BatchNormalization(),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Dropout(0.25),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax")
])

# Compile the model (labels are integer class IDs, hence the sparse loss)
model.compile(loss="sparse_categorical_crossentropy", optimizer="nadam", metrics=["accuracy"])

# Train the model
history = model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))

# Evaluate the model on the test set
model.evaluate(X_test, y_test)
```


## 10.	Use transfer learning for large image classification, going through these steps:
- a. Create a training set containing at least 100 images per class. For example, you could classify your own pictures based on the location (beach, mountain, city, etc.), or alternatively you can use an existing dataset (e.g., from TensorFlow Datasets).
- b. Split it into a training set, a validation set, and a test set.
- c. Build the input pipeline, including the appropriate preprocessing operations, and optionally add data augmentation.
- d. Fine-tune a pretrained model on this dataset.

Ans=>


