#**Question 1: What is the role of filters and feature maps in a Convolutional Neural Network (CNN)?**

Filters (also known as kernels) and feature maps are fundamental components of a Convolutional Neural Network (CNN). Here's a breakdown of their roles:

**Filters (Kernels):**

**- Feature Detection:** Filters are small matrices of numbers that slide across the input image (or the output of a previous layer). Each filter is designed to detect a specific type of feature, such as edges (horizontal, vertical, diagonal), corners, textures, or more complex patterns. For example, one filter might activate strongly when it encounters a vertical edge, while another might activate for a horizontal edge.

**- Weight Sharing:** A key characteristic of CNNs is that the same filter is applied across the entire input image. This weight sharing significantly reduces the number of parameters in the network, making it more efficient and helping it generalize better to new, unseen data.

**- Learned Representations:** During the training process, the values within these filters are learned automatically by the network to best identify relevant features for the given task (e.g., image classification).

**Feature Maps (Activation Maps):**

**- Output of Convolution:** When a filter convolves (slides over) an input image or feature map, it performs element-wise multiplication and summation. The result of this operation at each position creates a single value in the output.

**- Spatial Representation of Features:** The collection of these output values, generated by a single filter across the entire input, forms a feature map. Each feature map represents the presence and strength of the specific feature that its corresponding filter is designed to detect, across different spatial locations in the input.

**- Hierarchical Learning:** As data passes through multiple convolutional layers in a CNN, early layers detect simple features (like edges). Subsequent layers combine these simple features to detect more complex, abstract features (like eyes, noses, or entire objects). Each layer's output is a set of feature maps that serve as the input for the next layer, building a hierarchical representation of the input data.
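
As a minimal sketch of how one filter produces one feature map, the plain-Python snippet below slides a hand-written 3x3 vertical-edge kernel over a tiny 4x5 image (stride 1, no padding). The image, the kernel values, and the helper name `convolve2d_valid` are illustrative assumptions; in a trained CNN the kernel values are learned rather than set by hand.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Like deep learning frameworks, this computes cross-correlation:
    the kernel is not flipped.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise multiply the kernel with the patch under it, then sum.
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Dark region (0) on the left, bright region (1) on the right: a vertical edge.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

# Hand-written vertical-edge detector (illustrative; real filters are learned).
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d_valid(image, vertical_edge_kernel)
for row in feature_map:
    print(row)
# [3, 3, 0]
# [3, 3, 0]
```

The feature map is large where the patch straddles the dark-to-bright edge and zero over the uniform bright region, which is the "presence and strength" reading described above.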

#**Question 2: Explain the concepts of padding and stride in CNNs (Convolutional Neural Networks). How do they affect the output dimensions of feature maps?**

Padding and stride are two important concepts in Convolutional Neural Networks (CNNs) that significantly influence the dimensions of the output feature maps. Let's break them down:

**Padding:**

**- Concept:** Padding involves adding extra pixels (typically zeros) around the border of the input image or feature map before applying the convolution operation. This is also known as 'zero-padding' when the added pixel values are zero.

**- Purpose:** The primary purposes of padding are:

**1. Preserving Spatial Dimensions:** Without padding, the output feature map shrinks with each convolutional layer, because the filter cannot be centered on border pixels. Padding lets the output feature map keep the same, or nearly the same, spatial size as the input.

**2. Retaining Information:** Pixels at the borders of an image are covered by fewer filter positions than central pixels. Padding lets border pixels contribute to more output values, which reduces the loss of information from the edges of the input.

**3. Allowing Deeper Networks:** By preventing excessive reduction in feature map size, padding enables the construction of deeper CNNs without losing all spatial information too quickly.

**Types:** Common types include:

**- 'Valid' (No Padding):** No padding is added. The output feature map will be smaller than the input.

**- 'Same' (Zero Padding):** Padding is added such that the output feature map has the same spatial dimensions as the input feature map, assuming a stride of 1.

**Stride:**

**- Concept:** Stride defines the number of pixels by which the convolution filter shifts across the input image or feature map. A stride of 1 means the filter moves one pixel at a time, while a stride of 2 means it moves two pixels at a time, and so on.

**- Purpose:** The main purposes of stride are:

**1. Downsampling/Dimensionality Reduction:** A stride greater than 1 downsamples the input feature map. This reduces the spatial dimensions, which lowers the computational cost and memory use of later layers and, by shrinking the input to any fully connected layers, can reduce the parameter count and with it the risk of overfitting.

**2. Increasing Receptive Field:** By skipping pixels, a larger stride allows subsequent layers to have a larger 'receptive field' (the area of the original input image that a pixel in the feature map corresponds to) without increasing the filter size.

**How they affect the output dimensions of feature maps:**

The output dimension of a feature map (let's say for a square input and filter for simplicity) can be calculated using the following formula:

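Output Dimension = [(Input Dimension - Filter Dimension + 2 * Padding) / Stride] + 1

Here the square brackets denote rounding down to the nearest integer (floor), which matters whenever the division is not exact.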

Let's break down the components:

**- Input Dimension (W):** The height or width of the input feature map.

**- Filter Dimension (F):** The height or width of the convolutional filter/kernel.

**- Padding (P):** The number of pixels added to each side of the input (if padding is symmetric).

**- Stride (S):** The number of pixels the filter shifts at each step.

**Impact of each:**

**- Padding (P):** Increasing padding increases the output dimensions. If P=0, the output will be smaller than the input (for S=1, F>1). If P is chosen correctly (e.g., for 'same' padding), the output dimension can be kept equal to the input dimension.

**- Stride (S):** Increasing stride decreases the output dimensions. A larger stride means fewer steps the filter takes across the input, resulting in a smaller output feature map. A stride of 1 maintains the highest spatial resolution (given appropriate padding), while a stride of 2 or more reduces it.

**Example:**

Consider an input image of size (10, 10), a filter of size (3, 3):

1. No Padding, Stride 1 (P=0, S=1): Output = [(10 - 3 + 2 * 0) / 1] + 1 = 7 + 1 = 8. Output size: (8, 8).

2. Padding 1, Stride 1 (P=1, S=1) (often results in 'same' size output for odd filter sizes): Output = [(10 - 3 + 2 * 1) / 1] + 1 = (7 + 2) + 1 = 9 + 1 = 10. Output size: (10, 10).

3. No Padding, Stride 2 (P=0, S=2): Output = [(10 - 3 + 2 * 0) / 2] + 1 = [7 / 2] + 1 = 3 + 1 = 4. Output size: (4, 4) (Note: result is floored for integer output).

In essence, padding helps control the shrinkage of feature maps, while stride controls the downsampling rate, both working together to manage the spatial dimensions and information flow through the CNN.
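
The formula and the three worked examples above can be checked with a few lines of Python. This is a small sketch: the function names `conv_output_size` and `same_padding` are illustrative, and `same_padding` assumes an odd filter size and a stride of 1.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Return floor((W - F + 2P) / S) + 1 for one spatial dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1


def same_padding(filter_size):
    """Padding per side that preserves the size at stride 1 (odd filter sizes only)."""
    return (filter_size - 1) // 2


print(conv_output_size(10, 3, padding=0, stride=1))  # 8
print(conv_output_size(10, 3, padding=1, stride=1))  # 10
print(conv_output_size(10, 3, padding=0, stride=2))  # 4
print(same_padding(3), same_padding(5))              # 1 2
```

Integer division (`//`) performs the flooring that the third example relies on.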

#**Question 3: Define receptive field in the context of CNNs. Why is it important for deep architectures?**

The **receptive field** in a Convolutional Neural Network (CNN) refers to the region in the input space (e.g., the original input image) that a particular neuron in a feature map is 'looking at' or influenced by. In simpler terms, it's the area of the input image that contributes to the computation of a single output feature.

Let's break it down:

**For the first convolutional layer:** The receptive field of a neuron in the first layer's output feature map is simply the size of the filter itself.

**For deeper layers:** As you move deeper into the network, the receptive field of neurons in subsequent layers becomes progressively larger. Each neuron in a deeper layer's feature map effectively aggregates information from a wider area of the previous layer's feature map, which in turn corresponds to an even wider area of the original input image.

**Why is it important for deep architectures?**

The increasing receptive field in deep CNNs is crucial for several reasons:

**1. Hierarchical Feature Extraction:** Early layers with small receptive fields detect local, low-level features (e.g., edges, corners). As the receptive field grows in deeper layers, neurons can combine these low-level features to recognize more complex, abstract, and global patterns (e.g., textures, parts of objects, or even entire objects). This hierarchical understanding is fundamental to how CNNs achieve impressive performance in tasks like object recognition.

**2. Contextual Understanding:** A larger receptive field allows the network to incorporate more contextual information when making predictions about a specific part of the image. For instance, to identify a 'face', a neuron needs to see not just an 'eye' or a 'nose' but also their spatial relationship within a broader facial structure.

**3. Efficiency:** Instead of using extremely large filters in the first layer to capture global patterns (which would be computationally expensive and require many parameters), deep architectures achieve large receptive fields by stacking multiple smaller filters. This strategy, combined with pooling layers and strides, is far more efficient in terms of parameters and computation.

**4. Spatial Invariance/Robustness:** By gradually increasing the receptive field, the network becomes more robust to small variations in the position or scale of features. A neuron in a deep layer might recognize a feature regardless of its exact pixel location, as it's informed by a wider input area.

In essence, the receptive field determines the scope of information available to a neuron at each level of abstraction within the network, allowing deep CNNs to learn rich and complex representations from raw input data.
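
How the receptive field grows with depth can be computed with a standard recurrence: each layer adds `(kernel_size - 1)` times the current spacing between neighbouring units (measured in input pixels), and each stride multiplies that spacing. The sketch below assumes square kernels and no dilation; the function name `receptive_field` and the layer lists are illustrative.

```python
def receptive_field(layers):
    """Return the receptive field (in input pixels) after a stack of layers.

    `layers` is a list of (kernel_size, stride) pairs, ordered from input to output.
    Pooling layers can be described the same way, e.g. (2, 2) for 2x2 max pooling.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1), (3, 1)]))          # 5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

The first line matches the statement that a first-layer neuron sees exactly one filter-sized patch; the last line shows how a stride-2 pooling layer makes the following 3x3 convolution cover a wider input area.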

#**Question 4: Discuss how filter size and stride influence the number of parameters in a CNN.**

Filter size and stride are critical hyperparameters in Convolutional Neural Networks (CNNs) that significantly influence the number of parameters in the network. Let's break down their impact:

**1. Filter Size (Kernel Size):**

**- Impact on Parameters:** The filter (or kernel) is a small matrix of weights that slides across the input. The number of parameters contributed by a single convolutional layer is primarily determined by:

(Filter Width * Filter Height * Number of Input Channels + 1 (for bias)) * Number of Filters

**- Direct Relationship:** As the filter size increases (e.g., from 3x3 to 5x5 to 7x7), the number of weights within each filter increases quadratically. Consequently, the total number of parameters in that convolutional layer increases proportionally.
For example, if you have 1 input channel and 32 output filters:

A 3x3 filter: (3 * 3 * 1 + 1) * 32 = 10 * 32 = 320 parameters.

A 5x5 filter: (5 * 5 * 1 + 1) * 32 = 26 * 32 = 832 parameters.

**- Trade-offs:** Larger filters can capture broader features but lead to more parameters, increasing computational cost and the risk of overfitting, especially with limited data. Smaller filters (like 3x3) are often preferred in modern architectures because stacking multiple small filters can achieve the same receptive field as one large filter but with fewer parameters and more non-linearities.

**2. Stride:**

**- Impact on Parameters:** Stride defines how many pixels the filter moves at each step. Critically, stride itself does NOT directly influence the number of parameters (weights and biases) within the convolutional filters. The weights and biases of a filter are fixed regardless of how many steps it takes across the input.

**- Indirect Influence:** While stride doesn't change the number of parameters in its own layer, it affects the size of the output feature maps. A larger stride reduces the spatial dimensions of the output feature map, which reduces the number of inputs to any layer that consumes the flattened output. For example, a fully connected layer that follows the convolutional layers has one weight per input value per unit, so a smaller feature map (due to a larger stride in a preceding convolutional layer) results in fewer parameters in that layer. Subsequent convolutional layers are unaffected, because their parameter count depends on filter size and channel counts rather than spatial size, and pooling layers have no trainable parameters at all.

**- Computational Cost:** A larger stride also significantly reduces the number of computations required in a convolutional layer because the filter performs fewer operations on the input.

**Summary:**

**Filter size directly increases the number of parameters** within a convolutional layer. Larger filters mean more weights to learn.

**Stride does not directly affect the number of parameters** in a convolutional layer. However, it **indirectly affects the number of parameters in subsequent layers** by altering the spatial dimensions of the feature maps, and it directly impacts the computational cost.
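
A short sketch makes both points concrete: the parameter formula reproduces the 320 and 832 figures above, and changing the stride leaves the convolutional layer's count untouched while shrinking a following fully connected layer. The helper names, the 28x28 input, and the 128-unit dense layer are illustrative assumptions.

```python
def conv_layer_params(filter_h, filter_w, in_channels, num_filters):
    """Trainable parameters of a 2D convolutional layer with one bias per filter.

    Stride and padding do not appear here: they do not change the count.
    """
    return (filter_h * filter_w * in_channels + 1) * num_filters


def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """floor((W - F + 2P) / S) + 1 for one spatial dimension."""
    return (input_size - filter_size + 2 * padding) // stride + 1


def dense_params(num_inputs, num_units):
    """Trainable parameters of a fully connected layer with biases."""
    return (num_inputs + 1) * num_units


print(conv_layer_params(3, 3, 1, 32))  # 320
print(conv_layer_params(5, 5, 1, 32))  # 832

# Indirect effect of stride: a 28x28x1 input, 32 filters of 3x3, then Dense(128).
for stride in (1, 2):
    side = conv_output_size(28, 3, stride=stride)
    flat = side * side * 32
    print(stride, side, flat, dense_params(flat, 128))
# 1 26 21632 2769024
# 2 13 5408 692352
```

With stride 2 the dense layer needs roughly a quarter of the weights, even though the convolutional layer itself still has 320 parameters.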

#**Question 5: Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.**

Here's a comparison and contrast of LeNet, AlexNet, and VGG, focusing on their depth, filter sizes, and performance:

**1. LeNet-5 (1998)**

**Context:** One of the earliest and foundational CNNs, primarily designed for handwritten digit recognition (e.g., ZIP codes).

**Depth:** Relatively shallow, typically 7 layers (3 convolutional, 2 pooling, 2 fully connected).

**Filter Sizes:** Used small 5x5 convolutional filters.

**Key Innovations:** Introduced the core concepts of CNNs: convolutional layers, pooling layers, and fully connected layers. Employed shared weights and local receptive fields.

**Performance:** Achieved good performance on tasks like MNIST handwritten digit recognition, but limited by today's standards and not scalable to complex image datasets.

**2. AlexNet (2012)**

**Context:** A groundbreaking architecture that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012, significantly outperforming traditional methods. This sparked the deep learning revolution.

**Depth:** Deeper than LeNet, typically 8 layers (5 convolutional, 3 fully connected).

**Filter Sizes:** Started with a large 11x11 filter in the first convolutional layer, followed by 5x5 and 3x3 filters in subsequent layers.

**Key Innovations:** Showcased the power of deeper networks, ReLU activation functions (addressing vanishing gradients), dropout (for regularization), data augmentation, and GPU training (using two GPUs). Its large filter size in the initial layer was a distinctive feature.

**Performance:** Achieved a top-5 error rate of 15.3% on ImageNet, a significant leap forward. It demonstrated that deeper CNNs trained on large datasets with GPUs could achieve state-of-the-art results on complex image recognition tasks.

**3. VGGNet (2014)**

**Context:** Developed by the Visual Geometry Group at Oxford, it was the runner-up in the ILSVRC 2014 classification task (and won the localization task), and further emphasized the importance of network depth.

**Depth:** Significantly deeper, with architectures like VGG-16 (16 layers) and VGG-19 (19 layers) being common. It focused on increasing depth by stacking many convolutional layers.

**Filter Sizes:** A defining characteristic of VGG is its exclusive use of very small 3x3 convolutional filters throughout the network. It showed that stacking multiple 3x3 filters could achieve the same receptive field as a larger filter (e.g., two 3x3 layers have an effective receptive field of 5x5, and three 3x3 layers have a 7x7 receptive field) but with fewer parameters and more non-linearities.

**Key Innovations:** Emphasized the idea of increasing depth with small 3x3 filters as a way to improve performance. Showed that simplicity and uniformity in architecture (repeated blocks of 3x3 convolutions and 2x2 max-pooling) could lead to excellent results.

**Performance:** Achieved a top-5 error rate of 7.3% on ImageNet, further pushing the boundaries of accuracy. However, its very deep structure and numerous filters led to a very large number of parameters (e.g., VGG-16 has 138 million parameters), making it computationally expensive and memory-intensive.

**In Summary:**

**- LeNet:** A conceptual pioneer that established the foundation for CNNs, but was limited in scale and complexity.

**- AlexNet:** The breakthrough, demonstrating that deeper CNNs, combined with modern techniques (like ReLU and GPUs), could solve complex image recognition problems effectively.

**- VGG:** Refined the architectural approach by showing the power of extreme depth achieved through uniform small 3x3 filters, further boosting performance but at the cost of significantly increased parameters and computational demands. Its simplicity and effectiveness made it a popular baseline for subsequent research.
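
The VGG argument that stacked 3x3 layers are cheaper than one large filter with the same receptive field can be checked with simple arithmetic. The sketch below compares one 7x7 layer with three 3x3 layers; the channel width of 64 is an illustrative assumption and biases are ignored.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Weights (biases ignored) in one square-kernel convolutional layer."""
    return kernel_size * kernel_size * in_channels * out_channels


channels = 64  # illustrative channel width, kept constant across layers

single_7x7 = conv_weights(7, channels, channels)
stacked_3x3 = 3 * conv_weights(3, channels, channels)

print(single_7x7)                          # 200704
print(stacked_3x3)                         # 110592
print(round(stacked_3x3 / single_7x7, 2))  # 0.55
```

The ratio is 27/49 regardless of the channel width, and the stacked version also applies three non-linearities instead of one.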

#**Question 6: Using Keras, build and train a simple CNN model on the MNIST dataset from scratch. Include code for model creation, compilation, training, and evaluation.**


In [7]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

In [8]:
# Define the CNN model architecture
model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax') # 10 classes for MNIST digits
])

model.summary()



In [9]:
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [10]:
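# Train the model
# The earlier preprocessing cell is not shown; the standard MNIST preparation below is assumed.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train_reshaped = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0  # scale to [0, 1], add channel axis
x_test_reshaped = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train_one_hot = tf.keras.utils.to_categorical(y_train, 10)  # one-hot encode the labels
y_test_one_hot = tf.keras.utils.to_categorical(y_test, 10)
history = model.fit(x_train_reshaped, y_train_one_hot, epochs=5, batch_size=128, validation_split=0.1)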

Epoch 1/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 48s 111ms/step - accuracy: 0.8554 - loss: 0.4990 - val_accuracy: 0.9752 - val_loss: 0.0822
Epoch 2/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 79s 103ms/step - accuracy: 0.9798 - loss: 0.0643 - val_accuracy: 0.9870 - val_loss: 0.0491
Epoch 3/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 85s 111ms/step - accuracy: 0.9881 - loss: 0.0384 - val_accuracy: 0.9890 - val_loss: 0.0413
Epoch 4/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 80s 106ms/step - accuracy: 0.9911 - loss: 0.0297 - val_accuracy: 0.9900 - val_loss: 0.0362
Epoch 5/5
422/422 ━━━━━━━━━━━━━━━━━━━━ 83s 109ms/step - accuracy: 0.9929 - loss: 0.0216 - val_accuracy: 0.9882 - val_loss: 0.0353


In [11]:
# Evaluate the model
# Assuming x_test_reshaped and y_test_one_hot are already loaded and preprocessed.
loss, accuracy = model.evaluate(x_test_reshaped, y_test_one_hot)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 10ms/step - accuracy: 0.9849 - loss: 0.0451
Test Loss: 0.0338
Test Accuracy: 0.9887
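
The layer shapes and parameter counts of the model defined above can be derived by hand from the output-size and parameter formulas in Questions 2 and 4. The sketch below does that in plain Python; it assumes the Keras defaults used in the model (stride 1 and no padding for `Conv2D`, stride equal to pool size for `MaxPooling2D`), and its numbers are computed from those formulas rather than copied from a recorded `model.summary()` output.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size - kernel + 2 * padding) // stride + 1


size, channels = 28, 1  # MNIST input: 28x28, one channel
total_params = 0

# (layer kind, filters or pool size), mirroring the Sequential model above.
conv_stack = [("conv", 32), ("pool", 2), ("conv", 64), ("pool", 2)]

for kind, value in conv_stack:
    if kind == "conv":
        params = (3 * 3 * channels + 1) * value  # 3x3 kernels, one bias per filter
        size, channels = conv_out(size, 3), value
    else:
        params = 0            # pooling has no trainable parameters
        size = size // value  # 2x2 max pooling with stride 2 (floor)
    total_params += params
    print(f"{kind:5s} -> {size}x{size}x{channels}, params: {params}")

flat_features = size * size * channels
dense_1 = (flat_features + 1) * 128  # Dense(128)
dense_2 = (128 + 1) * 10             # Dense(10)
total_params += dense_1 + dense_2

print(f"flatten -> {flat_features}")
print(f"dense params: {dense_1} + {dense_2}")
print(f"total trainable parameters: {total_params}")
```

Most of the parameters sit in the first dense layer, which is the indirect effect of feature-map size discussed in Question 4.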


#**Question 7: Load and preprocess the CIFAR-10 dataset using Keras, and create a CNN model to classify RGB images. Show your preprocessing and architecture.**


In [12]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

In [13]:
# Load the CIFAR-10 dataset
(x_train_cifar, y_train_cifar), (x_test_cifar, y_test_cifar) = cifar10.load_data()

# Normalize pixel values to be between 0 and 1
x_train_cifar = x_train_cifar.astype('float32') / 255.0
x_test_cifar = x_test_cifar.astype('float32') / 255.0

# One-hot encode the labels
y_train_cifar = tf.keras.utils.to_categorical(y_train_cifar, 10)
y_test_cifar = tf.keras.utils.to_categorical(y_test_cifar, 10)

print(f"x_train_cifar shape: {x_train_cifar.shape}")
print(f"y_train_cifar shape: {y_train_cifar.shape}")
print(f"x_test_cifar shape: {x_test_cifar.shape}")
print(f"y_test_cifar shape: {y_test_cifar.shape}")

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step
x_train_cifar shape: (50000, 32, 32, 3)
y_train_cifar shape: (50000, 10)
x_test_cifar shape: (10000, 32, 32, 3)
y_test_cifar shape: (10000, 10)
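
The two preprocessing steps above, rescaling and one-hot encoding, can be sketched in plain Python to show what they do to individual values. The helper names `normalize_pixels` and `one_hot` are illustrative stand-ins, not Keras functions.

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel intensities from [0, 255] to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]


def one_hot(label, num_classes):
    """Turn an integer class label into a vector with a single 1.0."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


print(normalize_pixels([0, 51, 255]))  # [0.0, 0.2, 1.0]
print(one_hot(3, 10))
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

One-hot labels pair with the `categorical_crossentropy` loss; keeping the integer labels and using `sparse_categorical_crossentropy` is an equivalent alternative in Keras.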


In [14]:
# Define the CNN model architecture for CIFAR-10
cifar_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax') # 10 classes for CIFAR-10
])

# Display the model summary
cifar_model.summary()



In [15]:
# Compile the CIFAR-10 model
cifar_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [16]:
# Train the CIFAR-10 model
# Note: Training CIFAR-10 can take longer than MNIST
history_cifar = cifar_model.fit(x_train_cifar, y_train_cifar, epochs=10, batch_size=64, validation_split=0.1)

Epoch 1/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 71s 95ms/step - accuracy: 0.3293 - loss: 1.8081 - val_accuracy: 0.4724 - val_loss: 1.4810
Epoch 2/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 63s 89ms/step - accuracy: 0.5416 - loss: 1.2794 - val_accuracy: 0.5848 - val_loss: 1.1523
Epoch 3/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 62s 88ms/step - accuracy: 0.6151 - loss: 1.0955 - val_accuracy: 0.6212 - val_loss: 1.1102
Epoch 4/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 64s 91ms/step - accuracy: 0.6449 - loss: 1.0029 - val_accuracy: 0.6702 - val_loss: 0.9546
Epoch 5/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 79s 88ms/step - accuracy: 0.6787 - loss: 0.9092 - val_accuracy: 0.6652 - val_loss: 0.9737
Epoch 6/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 63s 90ms/step - accuracy: 0.7024 - loss: 0.8440 - val_accuracy: 0.6930 - val_loss: 0.8819
Epoch 7/10
...

In [17]:
# Evaluate the CIFAR-10 model
loss_cifar, accuracy_cifar = cifar_model.evaluate(x_test_cifar, y_test_cifar, verbose=0)
print(f"CIFAR-10 Test Loss: {loss_cifar:.4f}")
print(f"CIFAR-10 Test Accuracy: {accuracy_cifar:.4f}")

CIFAR-10 Test Loss: 0.8615
CIFAR-10 Test Accuracy: 0.7050


#**Question 8: Using PyTorch, write a script to define and train a CNN on the MNIST dataset. Include model definition, data loaders, training loop, and accuracy evaluation.**

In [18]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Check for GPU availability
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cpu


### 1. Define the CNN Model

In [19]:
class MNIST_CNN(nn.Module):
    def __init__(self):
        super(MNIST_CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        # Fully connected layers
        # After two MaxPool2d layers (stride 2), the 28x28 image becomes 7x7
        # 64 channels * 7 * 7 = 3136 input features to the first dense layer
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(128, 10) # 10 classes for MNIST

    def forward(self, x):
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7) # Flatten the tensor for the fully connected layer
        x = self.relu3(self.fc1(x))
        x = self.fc2(x)
        return x

model = MNIST_CNN().to(device)
print(model)

MNIST_CNN(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu1): ReLU()
  (pool1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (relu2): ReLU()
  (pool2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (fc1): Linear(in_features=3136, out_features=128, bias=True)
  (relu3): ReLU()
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)


### 2. Load and Preprocess Data, Create DataLoaders

In [20]:
# Define transformations
transform = transforms.Compose([
    transforms.ToTensor(), # Convert PIL Image to Tensor
    transforms.Normalize((0.1307,), (0.3081,)) # Normalize with MNIST mean and std
])

# Load MNIST training and test datasets
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create DataLoaders
train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=1000, shuffle=False)

print(f"Number of training samples: {len(train_dataset)}")
print(f"Number of test samples: {len(test_dataset)}")

Number of training samples: 60000
Number of test samples: 10000





### 3. Define Loss Function and Optimizer

In [21]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
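
`nn.CrossEntropyLoss` expects raw, unnormalized scores (logits) and applies log-softmax followed by negative log-likelihood internally, which is why `forward` above returns the output of `fc2` without a softmax. The plain-Python sketch below reproduces that computation for a single sample; the helper names and the example logits are illustrative assumptions, not part of PyTorch.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    largest = max(logits)  # subtracting the max avoids overflow in exp()
    log_sum = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return [z - log_sum for z in logits]


def cross_entropy_from_logits(logits, target):
    """Negative log-probability assigned to the correct class index."""
    return -log_softmax(logits)[target]


logits = [2.0, 0.5, -1.0]  # illustrative scores for a 3-class problem

loss_if_class_0 = cross_entropy_from_logits(logits, 0)  # highest score is correct
loss_if_class_2 = cross_entropy_from_logits(logits, 2)  # lowest score is correct

print(round(loss_if_class_0, 4))
print(round(loss_if_class_2, 4))
```

The loss is small when the largest logit belongs to the true class and grows as the true class is ranked lower. Adding a softmax layer before this loss would normalize the scores twice.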

### 4. Training Loop

In [22]:
num_epochs = 5

print("Starting training...")
for epoch in range(num_epochs):
    model.train() # Set the model to training mode
    for batch_idx, (data, targets) in enumerate(train_loader):
        data, targets = data.to(device), targets.to(device)

        # Forward pass
        scores = model(data)
        loss = criterion(scores, targets)

        # Backward and optimize
        optimizer.zero_grad() # Clear gradients
        loss.backward() # Compute gradients
        optimizer.step() # Update weights

        if (batch_idx + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{batch_idx+1}/{len(train_loader)}], Loss: {loss.item():.4f}')
print("Training complete.")

Starting training...
Epoch [1/5], Step [100/938], Loss: 0.1306
Epoch [1/5], Step [200/938], Loss: 0.1117
Epoch [1/5], Step [300/938], Loss: 0.1619
Epoch [1/5], Step [400/938], Loss: 0.2401
Epoch [1/5], Step [500/938], Loss: 0.0426
Epoch [1/5], Step [600/938], Loss: 0.0612
Epoch [1/5], Step [700/938], Loss: 0.0625
Epoch [1/5], Step [800/938], Loss: 0.0482
Epoch [1/5], Step [900/938], Loss: 0.2216
Epoch [2/5], Step [100/938], Loss: 0.0455
Epoch [2/5], Step [200/938], Loss: 0.1120
Epoch [2/5], Step [300/938], Loss: 0.0456
Epoch [2/5], Step [400/938], Loss: 0.0109
Epoch [2/5], Step [500/938], Loss: 0.0507
Epoch [2/5], Step [600/938], Loss: 0.0022
Epoch [2/5], Step [700/938], Loss: 0.0184
Epoch [2/5], Step [800/938], Loss: 0.0696
Epoch [2/5], Step [900/938], Loss: 0.0073
Epoch [3/5], Step [100/938], Loss: 0.1190
Epoch [3/5], Step [200/938], Loss: 0.0178
Epoch [3/5], Step [300/938], Loss: 0.1009
Epoch [3/5], Step [400/938], Loss: 0.0050
Epoch [3/5], Step [500/938], Loss: 0.0143
...

### 5. Accuracy Evaluation

In [23]:
def check_accuracy(loader, model):
    model.eval() # Set the model to evaluation mode
    num_correct = 0
    num_samples = 0

    with torch.no_grad(): # Disable gradient calculation during evaluation
        for x, y in loader:
            x, y = x.to(device), y.to(device)

            scores = model(x)
            _, predictions = scores.max(1) # Index of the largest logit, i.e. the predicted class
            num_correct += (predictions == y).sum().item()
            num_samples += predictions.size(0)

    accuracy = (num_correct / num_samples) * 100
    print(f'Got {num_correct} / {num_samples} with accuracy {accuracy:.2f}%')
    return accuracy

print("Checking accuracy on training set:")
check_accuracy(train_loader, model)
print("Checking accuracy on test set:")
check_accuracy(test_loader, model)

Checking accuracy on training set:
Got 59788 / 60000 with accuracy 99.65%
Checking accuracy on test set:
Got 9902 / 10000 with accuracy 99.02%


99.02

#**Question 9: Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.**


In [24]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
import os
import shutil
import numpy as np
from PIL import Image

# Define base directory for the dummy dataset
base_dir = 'custom_image_dataset'

# Define class names
class_names = ['class_a', 'class_b']

# Create dummy dataset directory structure
for dataset_type in ['train', 'validation']:
    for class_name in class_names:
        os.makedirs(os.path.join(base_dir, dataset_type, class_name), exist_ok=True)

# Function to create dummy images
def create_dummy_image(path, size=(64, 64), color=(0, 0, 0)):
    img = Image.new('RGB', size, color)
    img.save(path)

# Create dummy images for training
for i in range(50):
    create_dummy_image(os.path.join(base_dir, 'train', 'class_a', f'img_{i}.png'), color=(255, 0, 0)) # Red for class A
    create_dummy_image(os.path.join(base_dir, 'train', 'class_b', f'img_{i}.png'), color=(0, 255, 0)) # Green for class B

# Create dummy images for validation
for i in range(10):
    create_dummy_image(os.path.join(base_dir, 'validation', 'class_a', f'val_img_{i}.png'), color=(200, 0, 0))
    create_dummy_image(os.path.join(base_dir, 'validation', 'class_b', f'val_img_{i}.png'), color=(0, 200, 0))

print("Dummy dataset created successfully at: ", os.path.abspath(base_dir))


Dummy dataset created successfully at:  /content/custom_image_dataset


### 1. Define Data Generators with Augmentation

In [25]:
# Define image dimensions and batch size
img_height, img_width = 64, 64
batch_size = 32

# Training Data Generator with Augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,             # Normalize pixel values to [0, 1]
    rotation_range=20,          # Randomly rotate images up to 20 degrees
    width_shift_range=0.2,      # Randomly shift images horizontally
    height_shift_range=0.2,     # Randomly shift images vertically
    shear_range=0.2,            # Apply shear transformation
    zoom_range=0.2,             # Apply random zoom
    horizontal_flip=True,       # Randomly flip images horizontally
    fill_mode='nearest'         # Strategy for filling newly created pixels
)

# Validation Data Generator (only rescaling)
validation_datagen = ImageDataGenerator(rescale=1./255)

# Flow training images in batches from 'train' directory
train_generator = train_datagen.flow_from_directory(
    os.path.join(base_dir, 'train'),
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical' # Use 'categorical' for one-hot encoded labels
)

# Flow validation images in batches from 'validation' directory
validation_generator = validation_datagen.flow_from_directory(
    os.path.join(base_dir, 'validation'),
    target_size=(img_height, img_width),
    batch_size=batch_size,
    class_mode='categorical'
)

print(f"Found {train_generator.samples} training images belonging to {train_generator.num_classes} classes.")
print(f"Found {validation_generator.samples} validation images belonging to {validation_generator.num_classes} classes.")


Found 100 images belonging to 2 classes.
Found 20 images belonging to 2 classes.
Found 100 training images belonging to 2 classes.
Found 20 validation images belonging to 2 classes.


### 2. Define the CNN Model Architecture

In [26]:
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(img_height, img_width, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(train_generator.num_classes, activation='softmax') # Output layer for number of classes
])

model.summary()




### 3. Compile the Model

In [27]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])


### 4. Train the Model

In [28]:
epochs = 10

history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=max(1, validation_generator.samples // batch_size)  # 20 // 32 alone would be 0
)

print("Training complete.")


Epoch 1/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 3s 307ms/step - accuracy: 0.3897 - loss: 0.5909 - val_accuracy: 1.0000 - val_loss: 0.1021
Epoch 2/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 1.0000 - loss: 0.0802 - val_accuracy: 1.0000 - val_loss: 0.0250
Epoch 3/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 177ms/step - accuracy: 1.0000 - loss: 0.0084 - val_accuracy: 1.0000 - val_loss: 4.3987e-05
Epoch 4/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 42ms/step - accuracy: 1.0000 - loss: 3.1590e-06 - val_accuracy: 1.0000 - val_loss: 3.7551e-06
Epoch 5/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 177ms/step - accuracy: 1.0000 - loss: 1.2277e-07 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 6/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 41ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 7/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 1s 199ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 8/10
3/3 ━━━━━━━━━━━━━━━━━━━━ 0s 46ms/step - accuracy: 1.0000 - loss: 0.0000e+00 - val_accuracy: 1.0000 - val_loss: 0.0000e+00
Epoch 9/10
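
Step counts derived from `samples // batch_size` round down, so a final partial batch is not counted. The sketch below works through the arithmetic for the dataset sizes used here (100 training and 20 validation images, batch size 32); the helper names are illustrative.

```python
import math


def steps_floor(num_samples, batch_size):
    """Full batches only; a final partial batch is not counted."""
    return num_samples // batch_size


def steps_ceil(num_samples, batch_size):
    """Enough steps to see every sample once, including a final partial batch."""
    return math.ceil(num_samples / batch_size)


batch_size = 32
for split, num_samples in [("train", 100), ("validation", 20)]:
    print(split, steps_floor(num_samples, batch_size), steps_ceil(num_samples, batch_size))
# train 3 4
# validation 0 1
```

Floor division skips the last 4 training images in each pass and yields zero steps for a validation set smaller than one batch. Rounding up, or omitting the step arguments so that Keras takes the count from the generator's length, is a common way to avoid both.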

### 5. Evaluate the Model

In [29]:
loss, accuracy = model.evaluate(validation_generator, verbose=1)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")

# Optional: Clean up the dummy dataset directory
# shutil.rmtree(base_dir)
# print(f"Removed dummy dataset directory: {base_dir}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 91ms/step - accuracy: 1.0000 - loss: 0.0000e+00
Test Loss: 0.0000
Test Accuracy: 1.0000
