# CNN Architecture

Question 1: What is the role of filters and feature maps in a Convolutional Neural Network (CNN)?

Answer : Role of Filters and Feature Maps in Convolutional Neural Networks (CNN)

In a Convolutional Neural Network (CNN), filters and feature maps are fundamental components used for automatic feature extraction from input data, especially images. Filters (or kernels) are small learnable matrices that slide across the input image and perform convolution operations. Their primary role is to detect important local features such as edges, corners, textures, and shapes. Filters are not hand-designed: during training, each filter learns to respond to a specific pattern present in the image.

When a filter is applied to the input image, it generates an output called a feature map. A feature map represents the spatial distribution of the detected feature, showing where and how strongly a particular feature appears in the image. Higher activation values in the feature map indicate a stronger presence of the feature, while lower values indicate weaker or no activation.

By using multiple filters, CNNs generate multiple feature maps that capture different types of features at various levels of complexity. This process enables CNNs to learn hierarchical representations, where early layers detect simple features and deeper layers capture more complex and abstract patterns, improving the performance of tasks such as image classification and object recognition.
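The filter-to-feature-map relationship described above can be illustrated with a minimal pure-Python sketch. The helper name `convolve2d`, the toy 5×5 image, and the hand-written edge filter are all assumptions made for illustration; in a real CNN the filter weights are learned, and frameworks implement this operation (strictly a cross-correlation) far more efficiently.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the image patch element-wise and sum.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge filter (illustrative only).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row prints [3, 3, 0]
```

The high values (3) mark the positions where the dark-to-bright edge falls inside the filter window, and the 0 marks the uniform bright region, which is the "where and how strongly" behaviour of a feature map.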

Question 2: Explain the concepts of padding and stride in CNNs (Convolutional Neural Networks). How do they affect the output dimensions of feature maps?

Answer : Padding and Stride in Convolutional Neural Networks (CNNs)

In Convolutional Neural Networks (CNNs), padding and stride are important hyperparameters that control how convolution operations are applied to the input and how the size of the output feature maps is determined.

Padding refers to the addition of extra pixels, usually with zero values, around the border of the input image before applying the convolution filter. The main purpose of padding is to preserve the spatial dimensions of the input and to ensure that edge information is not lost during convolution. Common types of padding include valid padding (no padding) and same padding (padding added to keep the output size the same as the input size).

Stride defines the number of pixels by which the filter moves across the input image during the convolution operation. A stride of 1 means the filter moves one pixel at a time, resulting in a larger output feature map. A larger stride value reduces the size of the output feature map by skipping pixels, which also decreases computational complexity.

Both padding and stride directly affect the dimensions of the output feature map. Padding increases or preserves the spatial size, while stride controls the rate of downsampling. The output size can be calculated using the formula:

$$
\text{Output Size} = \frac{N - F + 2P}{S} + 1
$$

where $N$ is the input size, $F$ is the filter size, $P$ is the padding, and $S$ is the stride.

Thus, padding and stride play a crucial role in controlling feature map dimensions and preserving important information in CNNs.
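As a quick check of the output-size formula, here is a small sketch. The function names `conv_output_size` and `same_padding` are hypothetical helpers, and the floor division reflects the common framework behaviour of rounding down when $(N - F + 2P)$ is not divisible by $S$, which is an assumption beyond the formula as written.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one spatial dimension: floor((N - F + 2P) / S) + 1."""
    return (n - f + 2 * p) // s + 1


def same_padding(f):
    """Padding that keeps the size unchanged at stride 1 (assumes an odd filter size)."""
    return (f - 1) // 2


valid_out = conv_output_size(28, 3)                      # no padding, stride 1
same_out = conv_output_size(28, 3, p=same_padding(3))    # "same" padding, stride 1
strided_out = conv_output_size(28, 3, p=1, s=2)          # padding 1, stride 2

print(valid_out, same_out, strided_out)  # 26 28 14
```

With a 28×28 input and a 3×3 filter, valid padding shrinks the map to 26×26, same padding keeps it at 28×28, and a stride of 2 roughly halves it to 14×14.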


Question 3: Define receptive field in the context of CNNs. Why is it important for deep architectures?

Answer : Receptive Field in Convolutional Neural Networks (CNNs)

In the context of Convolutional Neural Networks (CNNs), the receptive field refers to the specific region of the input image that influences the activation of a particular neuron in a feature map. In other words, it is the area of the input that a neuron "sees" or responds to when producing its output.

The receptive field is important for deep CNN architectures because it determines how much contextual information a neuron can capture. In early layers, neurons have small receptive fields and focus on local features such as edges and textures. As the network becomes deeper, the receptive field increases, allowing neurons to capture larger and more complex patterns, such as shapes, objects, or entire regions of an image.

A larger receptive field enables deep architectures to learn hierarchical feature representations by combining simple features from earlier layers into more abstract and meaningful features in deeper layers. This capability is essential for tasks like image classification, object detection, and scene understanding, where global context and spatial relationships play a critical role.
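The growth of the receptive field with depth can be sketched numerically. The helper below is an illustration rather than part of any library: it assumes a plain chain of convolution and pooling layers without dilation and applies the standard recurrence $r_l = r_{l-1} + (k_l - 1)\, j_{l-1}$, $j_l = j_{l-1}\, s_l$, where $k_l$ is the kernel size, $s_l$ the stride, and $j_l$ the spacing (in input pixels) between neighbouring units of layer $l$.

```python
def receptive_field(layers):
    """Receptive field of one output unit.

    `layers` is a list of (kernel_size, stride) tuples ordered from input to output.
    """
    rf = 1    # an input pixel "sees" only itself
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


two_convs = receptive_field([(3, 1), (3, 1)])
three_convs = receptive_field([(3, 1), (3, 1), (3, 1)])
# conv 3x3 -> pool 2x2 -> conv 3x3 -> pool 2x2
conv_pool_twice = receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)])

print(two_convs, three_convs, conv_pool_twice)  # 5 7 10
```

Stacking stride-1 convolutions grows the receptive field linearly, while each strided pooling layer multiplies the growth contributed by every layer after it.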


Question 4: Discuss how filter size and stride influence the number of parameters in a CNN.

Answer : Influence of Filter Size and Stride on the Number of Parameters in CNN

In a Convolutional Neural Network (CNN), the number of parameters is mainly determined by the filter size, the number of filters, and the depth of the input, while stride affects the output feature map size and computational cost but does not directly change the number of parameters.

Filter size directly influences the number of parameters in a CNN. A larger filter contains more weights, which increases the total number of trainable parameters. For example, a $3 \times 3$ filter has fewer parameters than a $5 \times 5$ filter. The total number of parameters in a convolutional layer is calculated as:

$$
(\text{Filter height} \times \text{Filter width} \times \text{Input channels} + 1) \times \text{Number of filters}
$$

where the extra 1 represents the bias term. Therefore, increasing the filter size increases the model's capacity as well as the risk of overfitting.

Stride, on the other hand, controls how the filter moves across the input image. Changing the stride does not alter the number of parameters because the same filter weights are reused at every position. However, a larger stride reduces the size of the output feature maps, which lowers the number of activations and reduces computational complexity in subsequent layers.

In summary, filter size directly affects the number of parameters, while stride affects the spatial dimensions of feature maps and computational efficiency but not the number of parameters.
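The parameter formula can be checked with a few lines of Python. The function name `conv2d_params` is a hypothetical helper. Stride is deliberately absent from its arguments, because the same weights are reused at every position.

```python
def conv2d_params(filter_h, filter_w, in_channels, num_filters, use_bias=True):
    """Trainable parameters of one convolutional layer."""
    weights_per_filter = filter_h * filter_w * in_channels
    return (weights_per_filter + int(use_bias)) * num_filters


params_3x3 = conv2d_params(3, 3, in_channels=3, num_filters=32)
params_5x5 = conv2d_params(5, 5, in_channels=3, num_filters=32)

print(params_3x3)  # 896
print(params_5x5)  # 2432
```

Going from a 3×3 to a 5×5 filter on an RGB input with 32 filters raises the count from 896 to 2432, whereas changing the stride would leave both numbers untouched.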


Question 5: Compare and contrast different CNN-based architectures like LeNet,
AlexNet, and VGG in terms of depth, filter sizes, and performance.

Answer : Comparison of CNN Architectures: LeNet, AlexNet, and VGG

LeNet, AlexNet, and VGG are landmark CNN architectures that show the evolution of deep learning models in terms of depth, filter sizes, and performance.

LeNet is one of the earliest CNN architectures, designed primarily for handwritten digit recognition. It is a shallow network with only a few layers, and its best-known variant, LeNet-5, uses $5 \times 5$ filters in its convolutional layers. LeNet has a low number of parameters (on the order of tens of thousands) and modest computational requirements, making it suitable for simple tasks, but its performance is limited on complex image datasets.

AlexNet marked a major breakthrough in deep learning by winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It is deeper than LeNet, with five convolutional layers followed by three fully connected layers, and its filter sizes shrink from $11 \times 11$ in the first layer to $5 \times 5$ and then $3 \times 3$ in later layers. AlexNet popularized ReLU activation, dropout, and GPU-based training, which significantly improved performance on large-scale image classification tasks.

VGG further increased network depth by using a very deep architecture with many convolutional layers. Unlike AlexNet, VGG consistently uses small $3 \times 3$ filters throughout the network. This design increases the number of layers and parameters, allowing the model to learn more complex and hierarchical features. VGG achieved strong performance on ImageNet but at the cost of high computational and memory requirements.

In comparison, LeNet is shallow with limited performance, AlexNet offers improved depth and performance with moderate complexity, and VGG is very deep with superior feature representation but high computational cost.
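One way to see why stacks of small filters are attractive is to compare them with a single larger filter. Two stacked $3 \times 3$ convolutions (stride 1) cover the same $5 \times 5$ input region as one $5 \times 5$ convolution. The sketch below compares their parameter counts, assuming 64 input and 64 output channels at every layer. The channel width and the helper name `conv_params` are illustrative choices, not figures taken from any of the three architectures.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a square-kernel convolutional layer."""
    return (kernel * kernel * in_channels + 1) * out_channels


channels = 64  # assumed width, for illustration only

two_3x3_layers = 2 * conv_params(3, channels, channels)
one_5x5_layer = conv_params(5, channels, channels)

print(two_3x3_layers)  # 73856
print(one_5x5_layer)   # 102464
```

Under these assumptions the stacked version needs fewer weights for the same coverage and also inserts an extra non-linearity between the two layers.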


Question 6: Using Keras, build and train a simple CNN model on the MNIST dataset
from scratch. Include code for model creation, compilation, training, and evaluation.

Answer :

# Import required libraries

import tensorflow as tf

from tensorflow.keras import layers, models

from tensorflow.keras.datasets import mnist

from tensorflow.keras.utils import to_categorical

# Load MNIST dataset

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data: add a channel dimension and scale pixels to [0, 1]

x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0

x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

y_train = to_categorical(y_train, 10)

y_test = to_categorical(y_test, 10)

# Build the CNN model

model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())

model.add(layers.Dense(128, activation='relu'))

model.add(layers.Dense(10, activation='softmax'))

# Compile the model

model.compile(optimizer='adam',

              loss='categorical_crossentropy',

              metrics=['accuracy'])

# Train the model

history = model.fit(x_train, y_train,

                    epochs=5,

                    batch_size=64,

                    validation_split=0.1)

# Evaluate the model

test_loss, test_accuracy = model.evaluate(x_test, y_test)

print("Test Loss:", test_loss)

print("Test Accuracy:", test_accuracy)

Sample Output:

Epoch 1/5
843/843 ━━━━━━━━━━━━━━━━━━━━ 10s 11ms/step - accuracy: 0.9201 - loss: 0.2604 - val_accuracy: 0.9783 - val_loss: 0.0725
Epoch 2/5
843/843 ━━━━━━━━━━━━━━━━━━━━ 9s 10ms/step - accuracy: 0.9831 - loss: 0.0546 - val_accuracy: 0.9856 - val_loss: 0.0468
Epoch 3/5
843/843 ━━━━━━━━━━━━━━━━━━━━ 9s 10ms/step - accuracy: 0.9892 - loss: 0.0332 - val_accuracy: 0.9891 - val_loss: 0.0382
Epoch 4/5
843/843 ━━━━━━━━━━━━━━━━━━━━ 9s 10ms/step - accuracy: 0.9923 - loss: 0.0234 - val_accuracy: 0.9904 - val_loss: 0.0351
Epoch 5/5
843/843 ━━━━━━━━━━━━━━━━━━━━ 9s 10ms/step - accuracy: 0.9941 - loss: 0.0178 - val_accuracy: 0.9912 - val_loss: 0.0329

313/313 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.9910 - loss: 0.0307
Test Loss: 0.0307
Test Accuracy: 0.9910


Question 7: Load and preprocess the CIFAR-10 dataset using Keras, and create a
CNN model to classify RGB images. Show your preprocessing and architecture.

Answer :

# Import required libraries

import tensorflow as tf

from tensorflow.keras import layers, models

from tensorflow.keras.datasets import cifar10

from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 dataset

(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Preprocess the data

# Normalize pixel values to range [0, 1]

x_train = x_train.astype('float32') / 255.0

x_test = x_test.astype('float32') / 255.0

# One-hot encode class labels

y_train = to_categorical(y_train, 10)

y_test = to_categorical(y_test, 10)

# Build CNN model for RGB images

model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), activation='relu',

                        input_shape=(32, 32, 3)))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation='relu'))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())

model.add(layers.Dense(128, activation='relu'))

model.add(layers.Dense(10, activation='softmax'))

# Compile the model

model.compile(optimizer='adam',

              loss='categorical_crossentropy',

              metrics=['accuracy'])

# Display model architecture

model.summary()

# Train the model

model.fit(x_train, y_train,

          epochs=10,

          batch_size=64,

          validation_split=0.1)

# Evaluate the model

test_loss, test_accuracy = model.evaluate(x_test, y_test)

print("Test Loss:", test_loss)

print("Test Accuracy:", test_accuracy)

Expected Output (Sample):

Model: "sequential"
___________________________________________________________________
Layer (type)                     Output Shape              Param #
===================================================================
conv2d (Conv2D)                  (None, 30, 30, 32)        896
max_pooling2d (MaxPooling2D)     (None, 15, 15, 32)        0
conv2d_1 (Conv2D)                (None, 13, 13, 64)        18496
max_pooling2d_1 (MaxPooling2D)   (None, 6, 6, 64)          0
conv2d_2 (Conv2D)                (None, 4, 4, 128)         73856
max_pooling2d_2 (MaxPooling2D)   (None, 2, 2, 128)         0
flatten (Flatten)                (None, 512)               0
dense (Dense)                    (None, 128)               65664
dense_1 (Dense)                  (None, 10)                1290
===================================================================
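The shapes and parameter counts in this summary can be reproduced by hand. The sketch below walks through the same layer sequence in plain Python, assuming $3 \times 3$ convolutions with no padding and stride 1, and $2 \times 2$ pooling with stride 2. The variable names are illustrative and no Keras call is involved.

```python
# Layer sequence of the model above: (layer kind, number of filters).
conv_stack = [
    ("conv", 32), ("pool", 0),
    ("conv", 64), ("pool", 0),
    ("conv", 128), ("pool", 0),
]

size, channels = 32, 3  # CIFAR-10 images are 32x32 RGB
total_params = 0

for kind, filters in conv_stack:
    if kind == "conv":
        params = (3 * 3 * channels + 1) * filters  # weights plus one bias per filter
        size = size - 3 + 1                        # no padding, stride 1
        channels = filters
    else:
        params = 0                                 # pooling has no trainable weights
        size = size // 2                           # 2x2 window, stride 2
    total_params += params
    print(f"{kind:<5} output=({size}, {size}, {channels}) params={params}")

flattened = size * size * channels
dense_params = (flattened * 128 + 128) + (128 * 10 + 10)
total_params += dense_params

print("flattened features:", flattened)    # 512
print("total parameters:", total_params)   # 160202
```

Each printed row matches the corresponding line of the summary, and the two dense layers contribute 65664 and 1290 parameters respectively.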


Question 8: Using PyTorch, write a script to define and train a CNN on the MNIST
dataset. Include model definition, data loaders, training loop, and accuracy evaluation.

Answer :

# Import required libraries

import torch

import torch.nn as nn

import torch.optim as optim

from torchvision import datasets, transforms

from torch.utils.data import DataLoader

# Device configuration

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Data preprocessing and loading

transform = transforms.Compose([

    transforms.ToTensor(),

    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST(root='./data',

                               train=True,

                               transform=transform,

                               download=True)

test_dataset = datasets.MNIST(root='./data',

                              train=False,

                              transform=transform,

                              download=True)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)

test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Define CNN model

class CNN(nn.Module):

    def __init__(self):

        super(CNN, self).__init__()

        self.conv1 = nn.Conv2d(1, 32, kernel_size=3)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3)

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(64 * 5 * 5, 128)

        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):

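        x = self.pool(torch.relu(self.conv1(x)))  # 28x28 -> 26x26 -> 13x13

        x = self.pool(torch.relu(self.conv2(x)))  # 13x13 -> 11x11 -> 5x5

        x = x.view(x.size(0), -1)  # flatten to 64 * 5 * 5 = 1600 features for fc1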

        x = torch.relu(self.fc1(x))

        x = self.fc2(x)

        return x

# Initialize model, loss, and optimizer

model = CNN().to(device)

criterion = nn.CrossEntropyLoss()

optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop

epochs = 5

for epoch in range(epochs):

    model.train()

    running_loss = 0.0

    for images, labels in train_loader:

        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()

        outputs = model(images)

        loss = criterion(outputs, labels)

        loss.backward()

        optimizer.step()

        running_loss += loss.item()

    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}")

# Model evaluation

model.eval()

correct = 0

total = 0

with torch.no_grad():

    for images, labels in test_loader:

        images, labels = images.to(device), labels.to(device)

        outputs = model(images)

        predicted = outputs.argmax(dim=1)  # index of the highest score per image

        total += labels.size(0)

        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total

print(f"Test Accuracy: {accuracy:.2f}%")


Sample Output:

Epoch [1/5], Loss: 0.1856

Epoch [2/5], Loss: 0.0583

Epoch [3/5], Loss: 0.0412

Epoch [4/5], Loss: 0.0316

Epoch [5/5], Loss: 0.0254

Test Accuracy: 98.90%
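Note that the `forward` method returns raw scores (logits) rather than softmax probabilities. This is intentional: `nn.CrossEntropyLoss` applies a log-softmax internally, so adding a softmax layer in the model would apply it twice. The pure-Python sketch below shows the computation for a single example. The function name and the three-class logits are hypothetical, and the real loss additionally averages over the batch.

```python
import math


def cross_entropy_from_logits(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target]


logits = [2.0, 0.5, -1.0]  # hypothetical scores for three classes

loss_if_class_0 = cross_entropy_from_logits(logits, target=0)
loss_if_class_2 = cross_entropy_from_logits(logits, target=2)

# The predicted class is simply the index of the largest logit.
predicted = max(range(len(logits)), key=lambda i: logits[i])

print(predicted)
print(loss_if_class_0 < loss_if_class_2)
```

The loss is small when the true class has the largest logit and large otherwise, and the predicted class can be read directly from the logits without ever computing probabilities.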


Question 9: Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.

Answer :

# Import required libraries

import tensorflow as tf

from tensorflow.keras.preprocessing.image import ImageDataGenerator

from tensorflow.keras import layers, models

# Define dataset directory structure
#
# dataset/
# ├── train/
# │   ├── class1/
# │   ├── class2/
# └── validation/
#     ├── class1/
#     ├── class2/

train_dir = "dataset/train"

validation_dir = "dataset/validation"

# ImageDataGenerator for preprocessing and augmentation

train_datagen = ImageDataGenerator(

    rescale=1./255,

    rotation_range=20,

    width_shift_range=0.2,

    height_shift_range=0.2,

    zoom_range=0.2,

    horizontal_flip=True
)

validation_datagen = ImageDataGenerator(rescale=1./255)

# Load images from directory

train_generator = train_datagen.flow_from_directory(

    train_dir,

    target_size=(128, 128),

    batch_size=32,

    class_mode='categorical'
)

validation_generator = validation_datagen.flow_from_directory(

    validation_dir,

    target_size=(128, 128),

    batch_size=32,

    class_mode='categorical'
)

# Build CNN model

model = models.Sequential()

model.add(layers.Conv2D(32, (3, 3), activation='relu',

                        input_shape=(128, 128, 3)))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(64, (3, 3), activation='relu'))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Conv2D(128, (3, 3), activation='relu'))

model.add(layers.MaxPooling2D((2, 2)))

model.add(layers.Flatten())

model.add(layers.Dense(128, activation='relu'))

model.add(layers.Dense(train_generator.num_classes, activation='softmax'))

# Compile the model

model.compile(

    optimizer='adam',

    loss='categorical_crossentropy',

    metrics=['accuracy']
)

# Display model architecture

model.summary()

# Train the model

history = model.fit(

    train_generator,

    epochs=10,

    validation_data=validation_generator
)

# Evaluate the model

loss, accuracy = model.evaluate(validation_generator)

print("Validation Loss:", loss)

print("Validation Accuracy:", accuracy)


Sample Output:

Found 800 images belonging to 2 classes.
Found 200 images belonging to 2 classes.
Epoch 1/10
25/25 ━━━━━━━━━━━━━━━━━━━━ accuracy: 0.78 - loss: 0.52 - val_accuracy: 0.82 - val_loss: 0.45
...
Epoch 10/10
accuracy: 0.93 - loss: 0.18 - val_accuracy: 0.90 - val_loss: 0.25

Validation Loss: 0.25
Validation Accuracy: 0.90
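The "Found 800 images belonging to 2 classes" lines come from `flow_from_directory`, which infers labels from the subfolder names and assigns class indices in alphanumeric order (the mapping is exposed as `train_generator.class_indices`). The sketch below imitates that mapping with the standard library only. The helper name is hypothetical and the temporary folders merely stand in for `dataset/train/`.

```python
import os
import tempfile


def class_indices_from_directory(directory):
    """Map each subfolder name to an integer label, in sorted order."""
    class_names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    return {name: index for index, name in enumerate(class_names)}


# Build a throwaway directory tree that mirrors the train/ layout.
with tempfile.TemporaryDirectory() as root:
    for class_name in ("class2", "class1"):
        os.makedirs(os.path.join(root, class_name))
    mapping = class_indices_from_directory(root)

print(mapping)  # {'class1': 0, 'class2': 1}
```

Knowing this mapping matters at prediction time, since the position of each softmax output corresponds to these indices rather than to the order in which folders were created.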

Question 10: You are working on a web application for a medical imaging startup. Your task is to build and deploy a CNN model that classifies chest X-ray images into "Normal" and "Pneumonia" categories. Describe your end-to-end approach, from data preparation and model training to deploying the model as a web app using Streamlit.

Answer: End-to-End Approach for Chest X-ray Classification Using CNN and Streamlit

To build and deploy a CNN model that classifies chest X-ray images into Normal and Pneumonia categories, an end-to-end pipeline is followed, covering data preparation, model training, evaluation, and deployment as a web application.

Data Preparation:

The first step involves collecting a labeled chest X-ray dataset and organizing it into training, validation, and testing directories. Since X-ray images vary in size and intensity, images are resized to a fixed resolution and normalized to scale pixel values between 0 and 1. Data augmentation techniques such as rotation, flipping, and zooming are applied to improve model generalization and reduce overfitting.

Model Training:

Either a Convolutional Neural Network (CNN) is designed from scratch, or a pre-trained model such as ResNet or VGG is fine-tuned using transfer learning. The model learns discriminative features from X-ray images through convolution, pooling, and fully connected layers. The network is compiled with an appropriate optimizer (e.g., Adam) and a binary classification loss such as binary cross-entropy. Model performance is evaluated using accuracy, precision, recall, and a confusion matrix; recall on the Pneumonia class deserves particular attention, because a missed case is usually costlier than a false alarm.
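The evaluation metrics named above can be computed directly from predicted and true labels. The following is a minimal sketch, assuming labels are encoded as 1 for Pneumonia and 0 for Normal. The function name and the ten example labels are made up for illustration and are not results from any real model.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when a class is never predicted or present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative labels only: 1 = Pneumonia, 0 = Normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy example one Pneumonia case is missed (a false negative) and one Normal case is flagged (a false positive), giving precision and recall of 0.8 each.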

Model Evaluation and Saving:

After training, the model is evaluated on held-out test data to estimate how well it generalizes to unseen images. The trained model is then saved for deployment, for example in the native Keras `.keras` format, HDF5, or SavedModel.

Web Application Deployment Using Streamlit:

For deployment, a Streamlit-based web application is developed. The saved CNN model is loaded into the app, and users can upload chest X-ray images through the interface. The uploaded image is preprocessed in the same way as the training data and passed to the model for prediction. The app displays the predicted class (Normal or Pneumonia) along with the confidence score.
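The last step, turning the model output into a label and a confidence score for display, can be sketched as below. This assumes the model ends in a single sigmoid unit whose output is the probability of Pneumonia. The function name, the 0.5 threshold, and the example probabilities are hypothetical. In the app, the returned values would be passed to the Streamlit display code.

```python
def describe_prediction(p_pneumonia, threshold=0.5):
    """Convert a sigmoid output into (label, confidence) for display."""
    if not 0.0 <= p_pneumonia <= 1.0:
        raise ValueError("expected a probability between 0 and 1")
    if p_pneumonia >= threshold:
        return "Pneumonia", p_pneumonia
    # Confidence in "Normal" is the complement of the Pneumonia probability.
    return "Normal", 1.0 - p_pneumonia


label_a, confidence_a = describe_prediction(0.92)
label_b, confidence_b = describe_prediction(0.25)

print(f"{label_a} ({confidence_a:.0%} confidence)")
print(f"{label_b} ({confidence_b:.0%} confidence)")
```

The threshold is a design choice: lowering it below 0.5 trades more false alarms for fewer missed Pneumonia cases.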

Conclusion:

This end-to-end approach ensures a seamless workflow from data preparation and CNN training to real-time deployment, enabling healthcare professionals to quickly and efficiently analyze chest X-ray images through a user-friendly web application.









