1. The advantages of a CNN over a fully connected DNN for image classification are:
- CNNs exploit the spatial structure of images: each neuron looks only at a small local patch, whereas a fully connected DNN flattens the image and connects every pixel to every neuron, which leads to a very large number of parameters and a higher risk of overfitting.
- CNNs share weights: the same filter is applied at every position of the image, which greatly reduces the number of parameters and lets a feature learned in one location be detected anywhere else.
- CNNs can use pooling layers to downsample the feature maps, which cuts computation and memory, shrinks the layers that follow, and adds a small amount of translation invariance.
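
To make the parameter savings concrete, here is a minimal sketch that compares one dense layer with one convolutional layer on the same input. The image size (100 x 100 RGB), the 1,000 dense units, and the 64 filters of size 3x3 are illustrative assumptions, not values from the exercise.

```python
# Illustrative sizes (assumptions, not taken from the exercise).
height, width, channels = 100, 100, 3
dense_units = 1000
filters, kernel = 64, 3

# Dense layer: every unit connects to every input value, plus one bias per unit.
dense_params = (height * width * channels + 1) * dense_units

# Convolutional layer: each filter reuses one small kernel at every position,
# plus one bias per filter. The count does not depend on the image size.
conv_params = (kernel * kernel * channels + 1) * filters

print(f"dense: {dense_params:,} parameters")
print(f"conv:  {conv_params:,} parameters")
```

Under these assumptions the dense layer needs roughly 30 million parameters while the convolutional layer needs fewer than 2,000.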

2. The total number of parameters in the CNN is:
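- First convolutional layer: (3*3*3 + 1)*100 = 2,800
- Second convolutional layer: (3*3*100 + 1)*200 = 180,200
- Third convolutional layer: (3*3*200 + 1)*400 = 720,400
- Fully connected layer: (400 + 1)*10 = 4,010
Total number of parameters: 907,410 (the "+ 1" in each line counts one bias term per filter or output unit).
If we are using 32-bit floats, the parameters alone take 907,410 * 4 bytes, or about 3.6 MB. The RAM needed for a prediction is this plus the feature maps of the two largest consecutive layers, since earlier activations can be freed as soon as the next layer has been computed. During training, the activations of every layer must be kept for the backward pass for each of the 50 images in the mini-batch, so it is the feature-map memory, not the parameter memory, that is multiplied by 50. The exact totals depend on the input resolution and stride, which determine the size of each feature map.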
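
The memory estimate can be worked through in a few lines. This is only a sketch under stated assumptions: the answer above does not give the input size or stride, so the code assumes 200 x 300 RGB images, a stride of 2, and "same" padding in every convolutional layer, and it assumes the dense layer sits on top of 400 pooled features. It counts biases and ignores the input images, gradients, and optimizer state, so the results are lower bounds.

```python
import math

# Assumptions (not stated in the answer above): 200 x 300 RGB input,
# 3x3 kernels, stride 2 and "same" padding in every convolutional layer.
height, width, in_channels = 200, 300, 3
kernel, stride = 3, 2
filters_per_layer = [100, 200, 400]
num_classes = 10
bytes_per_float = 4  # 32-bit floats

params = 0
feature_map_bytes = []
for filters in filters_per_layer:
    params += (kernel * kernel * in_channels + 1) * filters  # weights + biases
    # "same" padding with stride s yields ceil(size / s) outputs per dimension.
    height, width = math.ceil(height / stride), math.ceil(width / stride)
    feature_map_bytes.append(height * width * filters * bytes_per_float)
    in_channels = filters
params += (in_channels + 1) * num_classes  # dense layer on 400 pooled features

param_bytes = params * bytes_per_float
# Prediction: only two consecutive layers need to be held in memory at once.
predict_bytes = param_bytes + max(
    a + b for a, b in zip(feature_map_bytes, feature_map_bytes[1:]))
# Training: every layer's activations are kept for each image in the batch.
batch_size = 50
train_bytes = param_bytes + batch_size * sum(feature_map_bytes)

print(f"parameters: {params:,} ({param_bytes / 1e6:.1f} MB)")
print(f"prediction: at least {predict_bytes / 1e6:.1f} MB")
print(f"training:   at least {train_bytes / 1e6:.1f} MB")
```

With these assumed sizes the feature maps, not the weights, dominate memory use, which is why the training figure grows with the batch size.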

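3. Five things you could try if the GPU runs out of memory while training a CNN are:
- Reduce the batch size.
- Use fewer filters per layer, or a larger stride in one or more layers, so that the feature maps are smaller.
- Remove one or more layers.
- Use smaller images.
- Use mixed precision training (16-bit floats instead of 32-bit floats for most computations).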

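4. Max pooling layers reduce the spatial dimensions of the feature maps while keeping the strongest activation in each window, which cuts computation and memory in the layers that follow and gives some invariance to small translations. A convolutional layer with the same stride shrinks the feature maps just as much, but it adds weights that have to be learned, whereas a max pooling layer has no parameters at all.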
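
As a small illustration, the sketch below applies 2x2 max pooling with stride 2 to a toy feature map and compares the parameter cost with a strided convolution. The helper `max_pool_2x2` and the channel count of 64 are hypothetical, chosen only for the example.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a list-of-lists feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 7]]

# Parameter cost of halving the resolution of 64 feature maps (hypothetical size).
channels = 64
pooling_params = 0                                       # nothing to learn
strided_conv_params = (3 * 3 * channels + 1) * channels  # 3x3 kernels + biases
print(pooling_params, strided_conv_params)
```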

5. Local response normalization makes a strongly activated neuron inhibit the neurons at the same position in neighboring feature maps. This encourages different feature maps to specialize and respond to different features, which tends to improve generalization.
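
For reference, the normalization can be written as follows, in the form used by AlexNet. The symbol names are chosen here for illustration: a_i is the activation of the neuron in feature map i at a given position, b_i is its normalized output, r is the number of adjacent feature maps in the window, f_n is the total number of feature maps, and k, alpha, and beta are hyperparameters.

```latex
b_i = a_i \left( k + \alpha \sum_{j = j_{\text{low}}}^{j_{\text{high}}} a_j^2 \right)^{-\beta},
\qquad
j_{\text{low}} = \max\left(0,\; i - \tfrac{r}{2}\right),
\quad
j_{\text{high}} = \min\left(i + \tfrac{r}{2},\; f_n - 1\right)
```

The AlexNet paper reports k = 2, alpha = 10^-4, beta = 0.75, and a window of 5 adjacent feature maps; deep learning libraries may define the window and scale alpha slightly differently.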

6. AlexNet introduced several innovations compared to LeNet-5, including:
- The use of ReLU activation functions instead of saturating functions such as tanh or sigmoid.
- The use of dropout regularization to prevent overfitting.
- The use of data augmentation to increase the size of the training set.
- The use of GPU acceleration to speed up training.
GoogLeNet introduced several innovations, including:
- The use of inception modules, which concatenate feature maps from multiple different-sized filters.
- The use of global average pooling, which reduces the number of parameters and prevents overfitting.
- The use of auxiliary classifiers, which provide additional supervision and help to prevent vanishing gradients.
ResNet introduced residual (skip) connections, which add a block's input to its output so that gradients can flow through very deep networks.
SENet introduced squeeze-and-excitation modules, which adaptively recalibrate the channel-wise feature responses.
Xception replaced inception modules with depthwise separable convolutions, which apply one spatial filter per input channel and then mix the channels with a 1x1 convolution.
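
A quick count shows why depthwise separable convolutions are attractive. The layer sizes below (3x3 kernels, 128 input channels, 256 output channels) are hypothetical, and biases are ignored to keep the comparison simple.

```python
# Hypothetical layer sizes, chosen only for illustration.
kernel, in_channels, out_channels = 3, 128, 256

# Regular convolution: every filter spans all input channels.
regular_params = kernel * kernel * in_channels * out_channels

# Depthwise separable convolution:
#   depthwise step: one kernel x kernel spatial filter per input channel
#   pointwise step: a 1x1 convolution that mixes the channels
depthwise_params = kernel * kernel * in_channels
pointwise_params = in_channels * out_channels
separable_params = depthwise_params + pointwise_params

print(f"regular:   {regular_params:,}")
print(f"separable: {separable_params:,}")
print(f"ratio:     {regular_params / separable_params:.1f}x")
```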

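7. A fully convolutional network (FCN) is a network made up only of convolutional and pooling layers, with no dense layers. Because it contains no layer that requires a fixed input size, it can process images of any size (at least above a minimum size), and the size of its output feature maps grows with the size of the input. To convert a dense layer into a convolutional layer, replace it with a convolutional layer whose kernel size equals the size of the dense layer's input feature map, with one filter per neuron in the dense layer and "valid" padding; the dense weights are reshaped into the kernel. Any dense layers stacked on top of it can then be converted to convolutional layers with 1x1 kernels.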
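
The equivalence between a dense layer and a convolutional layer can be checked numerically. The following NumPy sketch assumes a 7 x 7 x 16 input feature map and a dense layer with 10 units (hypothetical sizes), and uses a hand-written helper `conv_valid` rather than any library layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a 7 x 7 x 16 feature map feeding a dense layer of 10 units.
h, w, c, units = 7, 7, 16, 10
dense_kernel = rng.normal(size=(h * w * c, units))
dense_bias = rng.normal(size=units)

# Reshape the dense weights into a convolution kernel of shape
# (kernel_height, kernel_width, in_channels, filters).
conv_kernel = dense_kernel.reshape(h, w, c, units)

def conv_valid(feature_map, kernel, bias):
    """Stride-1 convolution with "valid" padding (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape[:2]
    out_h = feature_map.shape[0] - kh + 1
    out_w = feature_map.shape[1] - kw + 1
    out = np.empty((out_h, out_w, kernel.shape[3]))
    for i in range(out_h):
        for j in range(out_w):
            patch = feature_map[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernel, axes=3) + bias
    return out

# On an input of the original size, both layers give the same 10 numbers.
feature_map = rng.normal(size=(h, w, c))
dense_out = feature_map.reshape(-1) @ dense_kernel + dense_bias
conv_out = conv_valid(feature_map, conv_kernel, dense_bias)
print(conv_out.shape)                          # (1, 1, 10)
print(np.allclose(conv_out[0, 0], dense_out))  # True

# On a larger input, the convolutional version still works and returns a grid
# of outputs, one per position where the original layer would fit.
larger = rng.normal(size=(h + 3, w + 2, c))
larger_out = conv_valid(larger, conv_kernel, dense_bias)
print(larger_out.shape)                        # (4, 3, 10)
```

The dense layer would reject the larger input outright, while the converted layer slides over it, which is what lets an FCN accept images of different sizes.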

8. The main technical difficulty of semantic segmentation is that spatial information is lost as the signal flows through a CNN: pooling layers and layers with a stride greater than 1 shrink the feature maps, yet the output must assign a class to every pixel at the original resolution. That resolution has to be restored, for example with upsampling or transposed convolutional layers and skip connections from earlier, higher-resolution layers, while still handling variations in object shape, size, and occlusion.
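
To see how quickly resolution is lost, the sketch below tracks the feature-map width through a stack of stride-2 layers. The 224-pixel input and the five downsampling stages are assumptions, typical of ImageNet classifiers but not taken from the text above.

```python
import math

def output_size(size, stride):
    """Output width of a layer with "same" padding and the given stride."""
    return math.ceil(size / stride)

size = 224        # assumed input width in pixels
num_stages = 5    # assumed number of stride-2 layers (convolution or pooling)

sizes = [size]
for _ in range(num_stages):
    size = output_size(size, 2)
    sizes.append(size)

print(sizes)  # [224, 112, 56, 28, 14, 7]

# A segmentation head has to recover this factor to label every pixel.
upsampling_factor = sizes[0] // sizes[-1]
print(upsampling_factor)  # 32
```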


9. Here's an example CNN architecture that you can build from scratch to achieve high accuracy on the MNIST dataset:
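
The architecture itself is not included in this answer. As a stand-in, here is one possible small CNN in Keras; the layer sizes, dropout rates, and the helper names `build_mnist_cnn` and `train_on_mnist` are illustrative assumptions, not a reference solution.

```python
import tensorflow as tf

def build_mnist_cnn():
    """Small CNN for 28 x 28 grayscale digits; all layer sizes are illustrative."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
        tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

def train_on_mnist(model, epochs=10):
    """Train and evaluate on MNIST (downloads the dataset on first use)."""
    (X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
    # Add a channel axis and scale pixel values to [0, 1].
    X_train = X_train[..., None].astype('float32') / 255.0
    X_test = X_test[..., None].astype('float32') / 255.0
    model.fit(X_train, y_train, epochs=epochs, validation_split=0.1)
    return model.evaluate(X_test, y_test)

model = build_mnist_cnn()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```

Calling `train_on_mnist(model)` runs the training loop and returns the test loss and accuracy.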

10. Here are the steps for using transfer learning for large image classification:
a. Create a training set containing at least 100 images per class. For example, you could load the "tf_flowers" dataset from TensorFlow Datasets, which contains images of flowers categorized into five classes: daisy, dandelion, roses, sunflowers, and tulips.

b. Split the dataset into a training set, a validation set, and a test set.



In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

# The flowers dataset is registered in TFDS as 'tf_flowers' and ships only a
# 'train' split, so the test and validation sets are sliced out of it.
# Non-overlapping slices: 10% test, 20% validation, 70% training.
(test_dataset, validation_dataset, train_dataset), info = tfds.load(
    'tf_flowers',
    split=['train[:10%]', 'train[10%:30%]', 'train[30%:]'],
    with_info=True,
    as_supervised=True)

c. Build the input pipeline: resize and preprocess the images, batch them, and define optional data augmentation.



In [None]:
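IMG_SIZE = 224
BATCH_SIZE = 32

def preprocess_image(image, label):
    # Resize to the model's input size; the result is float32 in [0, 255].
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
    # ResNet50's ImageNet weights expect this preprocessing rather than a
    # plain division by 255.
    image = tf.keras.applications.resnet50.preprocess_input(image)
    return image, label

train_dataset = train_dataset.map(preprocess_image).shuffle(1000).batch(BATCH_SIZE).prefetch(1)
validation_dataset = validation_dataset.map(preprocess_image).batch(BATCH_SIZE)
test_dataset = test_dataset.map(preprocess_image).batch(BATCH_SIZE)

# Random augmentation layers; they are only active during training.
# (Older TensorFlow releases expose them under
# tf.keras.layers.experimental.preprocessing instead.)
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.2),
])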

d. Build a model on top of a pretrained ResNet50, freeze the pretrained layers, and train the new classification head.


In [None]:
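# Pretrained ResNet50 without its ImageNet classification head.
base_model = tf.keras.applications.ResNet50(
    weights='imagenet', include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3))

# Freeze the pretrained layers so that only the new head is trained.
for layer in base_model.layers:
    layer.trainable = False

model = tf.keras.Sequential([
    data_augmentation,
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation='softmax')  # one output per flower class
])

# Labels are integer class ids, hence the sparse loss.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

history = model.fit(train_dataset,
                    epochs=10,
                    validation_data=validation_dataset)

# Report accuracy on the held-out test set.
test_loss, test_accuracy = model.evaluate(test_dataset)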
