In [1]:
# 1. What are the advantages of a CNN for image classification over a fully connected DNN?

# Ans:
# Local connectivity: CNNs exploit the spatial structure of images: each neuron in a convolutional layer is connected only to a small 
# receptive field, so local patterns are captured efficiently. Fully connected DNNs have no such built-in knowledge of pixel layout.

# Parameter sharing: CNNs reuse the same filter weights at every image location, which greatly reduces the total number of parameters. 
# Fewer parameters make training faster, lower the risk of overfitting, and reduce the amount of training data needed.

# Translation invariance: Because the same filters are applied across all image regions, a feature learned in one location is detected 
# anywhere in the image (convolution is translation equivariant). Combined with pooling, this makes CNNs robust to small shifts in the 
# input. Robustness to changes in scale or orientation is not built in and usually has to come from data augmentation.

# Hierarchical feature extraction: CNNs typically have multiple layers with increasing abstraction levels. Lower layers capture low-level
# features like edges and textures, while higher layers capture more complex and semantic features. This hierarchical feature extraction 
# helps CNNs learn meaningful representations for image classification.
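
# A rough illustration of the parameter-sharing point, as a minimal sketch. The 200 x 300 RGB input and the 100 units/filters 
# are assumed values chosen for illustration, not figures from the answer above.

```python
# Hypothetical input: a 200 x 300 RGB image (assumed for illustration)
height, width, channels = 200, 300, 3
units = 100  # neurons in the dense layer, or filters in the conv layer

# Fully connected layer: every neuron sees every input value, plus one bias each
dense_params = (height * width * channels + 1) * units

# Convolutional layer with 3 x 3 kernels: each filter has 3 x 3 x channels weights
# plus one bias, reused at every image location
conv_params = (3 * 3 * channels + 1) * units

print(f"dense layer: {dense_params:,} parameters")
print(f"conv layer:  {conv_params:,} parameters")
```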

In [3]:
# 2. Consider a CNN with three convolutional layers, each with 3 x 3 kernels, a stride of two,
# and SAME padding. The bottom layer outputs 100 feature maps, the middle layer 200, and the
# top layer 400. RGB images with a size of 200 x 300 pixels are used as input. How many parameters does
# the CNN have in total? How much RAM would this network need when making a single instance
# prediction if we're using 32-bit floats? What if you were to train on a batch of 50 images?

# Ans:
# Parameters: each 3 x 3 filter has 3 x 3 x (number of input channels) weights plus one bias term.
# Bottom layer: (3 x 3 x 3 + 1) x 100 = 2,800
# Middle layer: (3 x 3 x 100 + 1) x 200 = 180,200
# Top layer: (3 x 3 x 200 + 1) x 400 = 720,400
# Total: 2,800 + 180,200 + 720,400 = 903,400 parameters, which occupy 903,400 x 4 bytes = 3,613,600 bytes (about 3.6 MB).

# Feature map sizes: with SAME padding and stride 2, each layer halves the height and width, rounding up. The 200 x 300 input becomes 
# 100 x 150, then 50 x 75, then 25 x 38. With 32-bit floats, each value occupies 4 bytes:
# Bottom layer: 100 x 100 x 150 x 4 bytes = 6,000,000 bytes
# Middle layer: 200 x 50 x 75 x 4 bytes = 3,000,000 bytes
# Top layer: 400 x 25 x 38 x 4 bytes = 1,520,000 bytes

# Single instance prediction: the feature maps of one layer can be released once the next layer has been computed, so only two 
# consecutive layers need to be in memory at once. The peak is 6,000,000 + 3,000,000 = 9,000,000 bytes, plus 3,613,600 bytes for the 
# parameters, for roughly 12.6 MB in total.

# Training on a batch of 50 images: backpropagation needs every activation from the forward pass, so all feature maps are kept for all 
# 50 instances: 50 x 10,520,000 = 526,000,000 bytes. Adding the input batch (50 x 200 x 300 x 3 x 4 = 36,000,000 bytes) and the 
# parameters gives about 566 MB as a bare minimum, before counting gradients and optimizer state.
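
# The parameter and memory arithmetic for this network can be sketched as follows. This assumes 3 x 3 kernels, one bias per filter, 
# SAME padding rounding sizes up, and a framework that only keeps two consecutive layers in memory at inference time. Real frameworks 
# add their own overhead, so treat the results as lower bounds.

```python
import math

BYTES_PER_FLOAT = 4                      # 32-bit floats
height, width, channels = 200, 300, 3    # RGB input from the question
layer_filters = [100, 200, 400]          # feature maps per layer
input_bytes = height * width * channels * BYTES_PER_FLOAT

total_params = 0
feature_map_bytes = []
for filters in layer_filters:
    # 3 x 3 x input-channels weights plus one bias per filter
    total_params += (3 * 3 * channels + 1) * filters
    # SAME padding with stride 2 halves each dimension, rounding up
    height, width = math.ceil(height / 2), math.ceil(width / 2)
    feature_map_bytes.append(filters * height * width * BYTES_PER_FLOAT)
    channels = filters

param_bytes = total_params * BYTES_PER_FLOAT

# Inference: only two consecutive layers (counting the input) are alive at once
sizes = [input_bytes] + feature_map_bytes
inference_bytes = param_bytes + max(a + b for a, b in zip(sizes, sizes[1:]))

# Training on 50 images: every activation is kept for the backward pass
batch_size = 50
training_bytes = param_bytes + batch_size * (input_bytes + sum(feature_map_bytes))

print(f"parameters: {total_params:,}")
print(f"inference:  {inference_bytes / 1e6:.1f} MB")
print(f"training:   {training_bytes / 1e6:.1f} MB (excluding gradients)")
```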

In [5]:
# 3. What are five things you might do to fix the problem if your GPU runs out of memory while training a CNN?

# Ans:
# Reduce batch size: Decrease the number of samples processed in each batch, which reduces the memory requirement per batch.

# Use mixed precision training: Utilize lower precision (e.g., float16) for activations and most computations, usually while keeping 
# a float32 copy of the weights. This roughly halves activation memory without significantly impacting model performance.

# Apply gradient checkpointing: Trade computation for memory by discarding some intermediate activations in the forward pass and 
# recomputing them during backpropagation, reducing memory consumption.

# Limit model complexity: Reduce the number of layers, neurons, or parameters in the CNN architecture to decrease memory requirements.

# Utilize model parallelism: Split the model across multiple GPUs, distributing the memory load and allowing for larger models to fit 
# within the combined memory capacity.
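
# To get a feel for the first two options, here is a minimal sketch of how activation memory scales with batch size and float 
# precision. The layer shape (100 feature maps of 100 x 150) is an assumed example, and activation_megabytes is a hypothetical 
# helper written for this illustration.

```python
import numpy as np

def activation_megabytes(batch_size, dtype, maps=100, height=100, width=150):
    """Memory (in MB, 1 MB = 10**6 bytes) for one layer's activations."""
    values = batch_size * maps * height * width
    return values * np.dtype(dtype).itemsize / 1e6

baseline = activation_megabytes(50, np.float32)       # full batch, float32
smaller_batch = activation_megabytes(25, np.float32)  # half the batch size
half_precision = activation_megabytes(50, np.float16) # same batch, float16

print(baseline, smaller_batch, half_precision)
```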

In [6]:
# 4. Why would you use a max pooling layer rather than a convolutional layer with the same stride?

# Ans:
# No parameters: A max pooling layer has no weights to learn, whereas a convolutional layer with the same stride adds kernel weights 
# and biases. Both reduce the spatial dimensions of the feature maps by the same factor, so max pooling achieves the downsampling 
# with lower memory requirements and computational cost.

# Translation invariance: Max pooling introduces a degree of translation invariance by capturing the maximum value within each pooling 
# region. This allows the network to focus on the most prominent features while being less sensitive to slight spatial shifts or 
# variations in the input.
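
# A minimal NumPy sketch of 2 x 2 max pooling with stride 2 on a single feature map. The helper name max_pool_2x2 and the 
# sample values are made up for illustration. Note that the function has no weights at all.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a 2D array (odd edges are trimmed)."""
    rows, cols = feature_map.shape
    out_rows, out_cols = rows // 2, cols // 2
    trimmed = feature_map[:out_rows * 2, :out_cols * 2]
    # Group pixels into 2 x 2 blocks, then take the maximum of each block
    return trimmed.reshape(out_rows, 2, out_cols, 2).max(axis=(1, 3))

feature_map = np.array([
    [1, 2, 0, 0],
    [3, 4, 0, 1],
    [0, 0, 5, 6],
    [0, 2, 7, 8],
])
pooled = max_pool_2x2(feature_map)
print(pooled)
```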

In [7]:
# 5. When would a local response normalization layer be useful?

# Ans:
# A local response normalization (LRN) layer makes the most strongly activated neurons inhibit neurons at the same location in 
# neighboring feature maps, a form of lateral inhibition. This competition encourages different feature maps to specialize and pushes 
# them apart, which helps the network capture a wider variety of features. LRN is typically used in the lower layers, as in AlexNet, 
# so that the upper layers have a larger pool of low-level features to build on.
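
# The normalization can be sketched in NumPy as follows. Each activation is divided by (k + alpha * sum of squares over nearby 
# feature maps) ** beta. The function name is hypothetical, the channels-last layout is an assumption, and the defaults follow the 
# hyperparameters commonly quoted for AlexNet (a 5-map neighborhood, k=2, alpha=1e-4, beta=0.75).

```python
import numpy as np

def local_response_norm(activations, radius=2, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize each feature map by its neighbors along the channel axis.

    activations: array of shape (height, width, channels).
    """
    activations = np.asarray(activations, dtype=np.float64)
    squared = activations ** 2
    channels = activations.shape[-1]
    output = np.empty_like(activations)
    for i in range(channels):
        # Neighboring feature maps within `radius` of map i, clipped at the ends
        low, high = max(0, i - radius), min(channels, i + radius + 1)
        denominator = (k + alpha * squared[..., low:high].sum(axis=-1)) ** beta
        output[..., i] = activations[..., i] / denominator
    return output

# Tiny check with easy numbers: five feature maps, all activations equal to 1
normalized = local_response_norm(np.ones((1, 1, 5)), radius=1, k=1.0, alpha=1.0, beta=1.0)
print(normalized)
```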

In [8]:
# 6. In comparison to LeNet-5, what are the main innovations in AlexNet? What about GoogLeNet and
# ResNet's core innovations?

# Ans:
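# AlexNet: Much larger and deeper than LeNet-5, with convolutional layers stacked directly on top of each other instead of a pooling 
# layer after every convolutional layer. It also uses ReLU activations, dropout, data augmentation, and local response normalization.

# GoogLeNet: Inception modules that apply filters of several sizes in parallel for multi-scale feature extraction, with 1 x 1 
# convolutions acting as bottlenecks. This allows a much deeper network with far fewer parameters than AlexNet.

# ResNet: Skip connections that add a block's input to its output, so each block learns a residual mapping. This makes it possible 
# to train very deep networks, with well over 100 layers.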
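
# The idea of a residual mapping can be sketched with a heavily simplified block. This assumes plain matrix multiplications in 
# place of the convolutions and batch normalization a real ResNet unit uses, and the names are hypothetical. With all weights at 
# zero the block simply passes its (non-negative) input through, which is why very deep stacks remain trainable.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Simplified residual unit: output = relu(F(x) + x)."""
    residual = relu(x @ w1) @ w2   # F(x), the part the block has to learn
    return relu(residual + x)      # the skip connection adds the input back

x = np.array([[1.0, 2.0, 3.0]])
zero_weights = np.zeros((3, 3))

# With zero weights F(x) = 0, so the block acts as the identity on x
identity_output = residual_block(x, zero_weights, zero_weights)
print(identity_output)
```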

In [9]:
# 7. On MNIST, build your own CNN and strive to achieve the best possible accuracy.

# Ans:
# Input layer: Accepts grayscale images of size 28x28.
# Convolutional layer: Applies 32 filters of size 3x3, using ReLU activation.
# Max pooling layer: Performs max pooling with a pool size of 2x2.
# Convolutional layer: Applies 64 filters of size 3x3, using ReLU activation.
# Max pooling layer: Performs max pooling with a pool size of 2x2.
# Flatten layer: Flattens the 2D feature maps into a 1D vector.
# Fully connected layer: Consists of 128 neurons, using ReLU activation.
# Dropout layer: Helps prevent overfitting by randomly dropping out neurons during training.
# Output layer: Consists of 10 neurons with softmax activation for multi-class classification.
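
# One way to write this architecture down, as a sketch rather than a tuned solution. The "valid" padding, the 0.5 dropout rate, the 
# Adam optimizer, and the function name build_mnist_cnn are assumptions not stated above. TensorFlow is imported inside the function 
# so that the parameter arithmetic at the top runs without it.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return (inputs + 1) * units

# Spatial size with "valid" padding: 28 -> 26 (conv) -> 13 (pool) -> 11 (conv) -> 5 (pool)
flattened = 5 * 5 * 64
expected_params = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
                   + dense_params(flattened, 128) + dense_params(128, 10))

def build_mnist_cnn(dropout_rate=0.5):
    """Build and compile the CNN described above (requires TensorFlow)."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation="softmax"),
    ])
    # Integer labels 0-9, hence the sparse categorical cross-entropy loss
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

print(f"expected trainable parameters: {expected_params:,}")
```

# Calling build_mnist_cnn().summary() should report the same parameter total if the assumptions above hold.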


In [None]:
# 8. Using Inception v3 to classify large images.
# a. Images of different animals can be downloaded. Load them in Python using, for example, the
# matplotlib.image.imread() or scipy.misc.imread() functions (the latter only exists in older SciPy versions). Resize and/or crop
# them to 299 x 299 pixels, and make sure they only have three channels (RGB) and no transparency.
# The photos used to train the Inception model were preprocessed to have values ranging from -1.0 to 1.0, so make sure yours do as well.

# Sol:
import urllib.request
import numpy as np
from PIL import Image
import io

# List of image URLs
image_urls = [
    'https://thumbs.dreamstime.com/b/beautiful-rain-forest-ang-ka-nature-trail-36703721.jpg',
    'https://images.unsplash.com/photo-1503023345310-bd7c1de61c7d?ixlib=rb-4.0.3&ixid=MnwxMjA3fDB8MHxzZWFyY2h8Mnx8aHVtYW58ZW58MHx8MHx8&w=1000&q=80',

]

# Load, resize, crop, and preprocess images
preprocessed_images = []

for url in image_urls:
    # Load the image from URL
    with urllib.request.urlopen(url) as f:
        image_data = f.read()

    # Convert image data to PIL Image object
    image = Image.open(io.BytesIO(image_data))

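    # Ensure the image has three channels (RGB) before resizing,
    # so palette and transparent images are interpolated correctly
    if image.mode != 'RGB':
        image = image.convert('RGB')

    # Center-crop to a square so the resize does not distort the aspect ratio
    width, height = image.size
    side = min(width, height)
    left, top = (width - side) // 2, (height - side) // 2
    image = image.crop((left, top, left + side, top + side))

    # Resize the square crop to 299x299 pixels
    image = image.resize((299, 299), Image.BILINEAR)

    # Convert the image to a float32 numpy array and scale [0, 255] to [-1.0, 1.0]
    image_array = np.array(image, dtype=np.float32)
    preprocessed_image = image_array / 127.5 - 1.0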

    # Add preprocessed image to the list
    preprocessed_images.append(preprocessed_image)

# Now, the preprocessed_images list contains the preprocessed images ready for classification using Inception v3

In [13]:
# 9. Large-scale image recognition using transfer learning.
# a. Make a training set of at least 100 images for each class. You might, for example, classify your
# own photos based on their location (beach, mountain, area, etc.) or use an existing dataset, such as
# the flowers dataset or MIT's places dataset (requires registration, and it is huge).
# b. Create a preprocessing phase that resizes and crops the image to 299 x 299 pixels while also
# adding some randomness for data augmentation.
# c. Using the previously trained Inception v3 model, freeze all layers up to the bottleneck layer (the
# last layer before output layer) and replace output layer with appropriate number of outputs for
# your new classification task (e.g., the flowers dataset has five mutually exclusive classes so the
# output layer must have five neurons and use softmax activation function).
# d. Separate the data into two sets: a training and a test set. The training set is used to train the
# model, and the test set is used to evaluate it.

# Sol:

# To perform large-scale image recognition using transfer learning with Inception v3:
# a. Create a training set with at least 100 images per class.
# b. Preprocess the images by resizing, cropping, and applying data augmentation.
# c. Modify the Inception v3 model by freezing layers and replacing the output layer.
# d. Split the data into training and test sets for model evaluation.
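
# Steps b to d can be sketched as follows. All function names are hypothetical, and the code assumes images arrive as uint8 arrays 
# of shape (height, width, 3) that are at least 299 pixels on each side. The model builder assumes the Keras InceptionV3 application 
# with ImageNet weights and global average pooling as the bottleneck. TensorFlow is imported inside that function so the NumPy parts 
# run without it.

```python
import numpy as np

def augment(image, rng, size=299):
    """Step b: random crop to size x size, random horizontal flip, scale to [-1, 1]."""
    height, width, _ = image.shape
    top = rng.integers(0, height - size + 1)
    left = rng.integers(0, width - size + 1)
    patch = image[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # mirror left-right
    return patch.astype(np.float32) / 127.5 - 1.0

def split_indices(n_samples, test_ratio, rng):
    """Step d: shuffle sample indices and split them into train and test sets."""
    shuffled = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def build_transfer_model(num_classes, weights="imagenet"):
    """Step c: frozen Inception v3 base with a new softmax output layer."""
    from tensorflow import keras

    base = keras.applications.InceptionV3(
        include_top=False, weights=weights, pooling="avg",
        input_shape=(299, 299, 3))
    base.trainable = False  # freeze every layer up to the bottleneck
    model = keras.Sequential([
        base,
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

rng = np.random.default_rng(42)
sample = np.full((320, 340, 3), 255, dtype=np.uint8)  # stand-in for a real photo
augmented = augment(sample, rng)
train_idx, test_idx = split_indices(500, 0.2, rng)    # e.g. 5 classes x 100 images
print(augmented.shape, len(train_idx), len(test_idx))
```

# For the five-class flowers dataset, build_transfer_model(5) would give the five-neuron softmax output layer described in step c.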