**Question 1:** What is the role of filters and feature maps in Convolutional Neural Networks (CNN)?

**Answer:**  
In Convolutional Neural Networks (CNNs), filters (also called kernels) and feature maps are the fundamental components responsible for automatic feature extraction from input data, particularly images. A filter is a small matrix of learnable parameters that slides over the input image and, at each position, performs an element-wise multiplication with the patch beneath it followed by summation. During training, each filter learns to detect a specific pattern such as edges, corners, textures, or more complex shapes.

When a filter convolves with the input, the result is a **feature map**. A feature map represents the spatial locations where a particular feature is detected. For example, an edge-detection filter will produce high values in the feature map where edges are present in the image. Multiple filters are used in each convolutional layer, resulting in multiple feature maps, each capturing a different type of feature.
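
To make the sliding-window operation concrete, here is a minimal sketch in plain Python. The function name `convolve2d_valid`, the 4×4 image, and the 2×2 vertical-edge kernel are illustrative assumptions, not part of any library. Like most deep learning frameworks, the sketch computes cross-correlation (the kernel is not flipped), with no padding and a stride of 1.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            # Element-wise multiply the kernel with the patch under it, then sum.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 2, 0]
```

The feature map is non-zero only in the middle column, which is exactly where the dark-to-bright edge sits in the image.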

The hierarchical nature of CNNs allows early layers to learn low-level features like edges and gradients, while deeper layers learn high-level semantic features such as objects or shapes. This automatic feature learning eliminates the need for manual feature engineering, which is a major advantage of CNNs over traditional machine learning approaches.

Additionally, filters are shared across the entire image, which drastically reduces the number of parameters compared to fully connected networks. This weight sharing makes CNNs computationally efficient and helps them generalize better. Feature maps preserve spatial relationships, enabling CNNs to understand the structure and composition of images effectively. Overall, filters and feature maps are the core mechanisms that enable CNNs to learn rich and meaningful representations from visual data.
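
The effect of weight sharing can be checked with simple arithmetic. The sketch below is an illustration under assumed sizes (a 28×28 grayscale input, 32 filters of size 3×3, no padding); the helper names are hypothetical. It compares a convolutional layer with a fully connected layer that would produce the same number of output values.

```python
def conv_layer_params(kernel, in_channels, filters):
    # Each filter has kernel*kernel*in_channels weights plus 1 bias,
    # and the same filter is reused at every spatial position.
    return (kernel * kernel * in_channels + 1) * filters


def dense_layer_params(inputs, outputs):
    # Every output has its own weight for every input, plus 1 bias.
    return (inputs + 1) * outputs


side = 28
out_side = side - 3 + 1  # 26 output positions per axis without padding

conv = conv_layer_params(kernel=3, in_channels=1, filters=32)
dense = dense_layer_params(inputs=side * side, outputs=out_side * out_side * 32)

print(conv)   # 320
print(dense)  # 16981120
```

Under these assumptions the convolutional layer needs 320 parameters, while the equivalent dense layer needs roughly 17 million.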

---

**Question 2:** Explain the concepts of padding and stride in CNNs (Convolutional Neural Networks). How do they affect the output dimensions of feature maps?


**Answer:**  
Padding and stride are two critical hyperparameters in convolutional neural networks that control how convolution operations are applied to input data. **Padding** refers to adding extra pixels (usually zeros) around the border of the input image before applying the convolution. The main purpose of padding is to control the spatial size of the output feature map and to preserve edge information. Without padding, each convolution shrinks the feature map, and border pixels are covered by fewer filter positions than central pixels, which may result in loss of important boundary features. Common padding types include *valid padding* (no padding) and *same padding* (enough padding that the output size equals the input size when the stride is 1).

**Stride** defines how many pixels the filter moves at each step while sliding across the input. A stride of 1 means the filter moves one pixel at a time, resulting in a larger output feature map. A larger stride reduces the spatial dimensions of the output and acts as a form of downsampling.

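The output size along each spatial dimension is given by:  
\[ \text{Output} = \left\lfloor \frac{N - F + 2P}{S} \right\rfloor + 1 \]  
where N is the input size, F is the filter size, P is the padding added to each side, and S is the stride. The floor applies when the stride does not divide (N - F + 2P) evenly.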

Padding increases the output dimensions relative to an unpadded convolution, while a larger stride decreases them. Together, they control computational cost, feature resolution, and model performance. Proper selection of padding and stride ensures efficient learning without losing critical spatial information.
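
The output-size formula is easy to check numerically. The following is a minimal sketch; the helper name `conv_output_size` and the example sizes are assumptions chosen for illustration. It uses integer (floor) division for strides that do not divide evenly, which is how common frameworks behave.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output length along one spatial axis: floor((n - f + 2p) / s) + 1."""
    return (n - f + 2 * p) // s + 1


print(conv_output_size(28, 3))            # valid padding, stride 1 -> 26
print(conv_output_size(28, 3, p=1))       # same padding for a 3x3 filter -> 28
print(conv_output_size(28, 3, p=1, s=2))  # stride 2 roughly halves the size -> 14

# The same formula applies to pooling windows. Tracing a 28x28 input through
# two stages of (3x3 conv, no padding) followed by (2x2 pooling, stride 2):
size = 28
trace = [size]
for f, s in [(3, 1), (2, 2), (3, 1), (2, 2)]:
    size = conv_output_size(size, f, s=s)
    trace.append(size)
print(trace)  # [28, 26, 13, 11, 5]
```

The final 5×5 size is why a network of this shape with 64 output channels flattens to 64 × 5 × 5 = 1600 features.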

---

**Question 3:** Define receptive field in the context of CNNs. Why is it important for deep architectures?

**Answer:**  
The receptive field in a Convolutional Neural Network refers to the region of the input image that influences the activation of a particular neuron in a feature map. In simpler terms, it defines how much of the original image a neuron can “see.” In the first convolutional layer, the receptive field is equal to the filter size. However, as we move deeper into the network, the receptive field grows due to stacking of convolutional and pooling layers.

The importance of the receptive field lies in its ability to capture contextual information. Small receptive fields focus on local features such as edges and textures, while larger receptive fields enable neurons to capture global structures and object-level information. Deep architectures allow CNNs to gradually expand the receptive field without increasing filter size, which improves efficiency.

A properly designed receptive field ensures that neurons in deeper layers can integrate information from larger portions of the image, which is crucial for tasks like object detection and medical image analysis. If the receptive field is too small, the network may fail to capture long-range dependencies. If it grows too large too early (for example, through aggressive downsampling in the first layers), fine details may be lost. Thus, receptive field design directly impacts learning quality and performance.
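
Receptive field growth can be computed layer by layer. The sketch below uses a standard recurrence, ignoring padding and dilation: each layer enlarges the receptive field by (kernel - 1) times the current spacing between adjacent outputs (the "jump"), and the jump is multiplied by the layer's stride. The function name and the example layer stacks are assumptions for illustration.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, ordered from input to output."""
    rf, jump = 1, 1  # a single input pixel; adjacent positions are 1 pixel apart
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (kernel - 1) jumps
        jump *= stride             # striding spreads subsequent positions further apart
    return rf


print(receptive_field([(3, 1)]))                  # first layer: equals the filter size -> 3
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three stacked 3x3 convs -> 7
# 3x3 conv, 2x2 pool (stride 2), 3x3 conv, 2x2 pool (stride 2):
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # -> 10
```

Pooling and strided layers make the receptive field grow faster because every later layer operates on positions that are further apart in the original image.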

---

**Question 4:**  Discuss how filter size and stride influence the number of parameters in a CNN.

**Answer:**  
The number of parameters in a convolutional layer is determined by the filter size, the number of filters, and the input depth. Each filter has (filter height × filter width × input channels) weights plus one bias, so a layer with K filters has (filter height × filter width × input channels + 1) × K parameters. Increasing the filter size therefore increases the parameter count quadratically, leading to higher computational cost and a greater risk of overfitting.

Stride, on the other hand, does not directly affect the number of parameters but influences the size of the output feature map. Larger strides reduce spatial dimensions, which indirectly reduces the number of activations and computations in subsequent layers.

Stacking small filters (e.g., 3×3) in deeper networks is often preferred over a single large filter (e.g., 7×7): three stacked 3×3 layers cover the same 7×7 receptive field with fewer parameters and more non-linearities. This design principle is widely used in modern architectures like VGG.

Thus, careful selection of filter size balances expressiveness and efficiency, while stride controls spatial resolution and computational load.
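
As a rough check of the filter-size argument, the sketch below counts parameters for one 7×7 layer versus three stacked 3×3 layers. The helper name `conv_params` and the choice of 64 input and output channels are assumptions for illustration. Note that stride does not appear anywhere in the count.

```python
def conv_params(kernel, in_channels, out_channels, bias=True):
    """Parameter count of a square-kernel conv layer; independent of stride and padding."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


channels = 64  # assumed channel width, kept constant across layers

one_7x7 = conv_params(7, channels, channels)
three_3x3 = 3 * conv_params(3, channels, channels)

print(one_7x7)    # 200768
print(three_3x3)  # 110784
```

With these assumed sizes the stacked 3×3 design uses a little over half the parameters of the single 7×7 layer.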

---

**Question 5:** Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.

**Answer:**  
LeNet is one of the earliest CNN architectures, designed for handwritten digit recognition. It is shallow, with only a few convolutional layers that use small 5×5 filters. It performs well on simple datasets like MNIST but lacks capacity for complex tasks.

AlexNet marked a breakthrough in deep learning by winning the ImageNet (ILSVRC) competition in 2012. It used a deeper architecture than LeNet and popularized ReLU activation, dropout, and GPU training. AlexNet uses large filters in its early layers (11×11 in the first layer) and has significantly more parameters than LeNet, roughly 60 million.

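VGG networks further deepened CNNs by using uniform 3×3 filters throughout the architecture. This design increased depth while keeping the parameters per convolutional layer manageable, although the full networks are large (VGG-16 has roughly 138 million parameters, most of them in the fully connected layers). VGG achieved better performance than AlexNet on ImageNet and demonstrated that depth is crucial for representation learning.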

In summary, LeNet is simple and lightweight, AlexNet is deeper and more powerful, and VGG emphasizes depth and small filters for high performance.

---

**Question 6:** **Using Keras, build and train a simple CNN model on the MNIST dataset from scratch. Include code for model creation, compilation, training, and evaluation.**

In [1]:
import tensorflow as tf
from tensorflow.keras import layers, models

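# Load MNIST, add a channel axis for Conv2D, and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1,28,28,1)/255.0
x_test = x_test.reshape(-1,28,28,1)/255.0

# Model creation: two conv/pool stages followed by a dense classifier
model = models.Sequential([
    layers.Input(shape=(28,28,1)),  # explicit Input layer instead of input_shape=
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')  # one probability per digit
])

# Compilation: labels are integers, so use sparse categorical cross-entropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
# Training, holding out 10% of the training set for validation
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
# Evaluation on the test set; returns [loss, accuracy]
model.evaluate(x_test, y_test)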


Epoch 1/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 44s 23ms/step - accuracy: 0.9578 - loss: 0.1383 - val_accuracy: 0.9858 - val_loss: 0.0500
Epoch 2/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 40s 22ms/step - accuracy: 0.9857 - loss: 0.0459 - val_accuracy: 0.9902 - val_loss: 0.0370
Epoch 3/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 38s 23ms/step - accuracy: 0.9899 - loss: 0.0309 - val_accuracy: 0.9908 - val_loss: 0.0311
Epoch 4/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 37s 22ms/step - accuracy: 0.9930 - loss: 0.0218 - val_accuracy: 0.9878 - val_loss: 0.0426
Epoch 5/5
1688/1688 ━━━━━━━━━━━━━━━━━━━━ 37s 22ms/step - accuracy: 0.9948 - loss: 0.0168 - val_accuracy: 0.9918 - val_loss: 0.0342
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.9895 - loss: 0.0348


[0.03478849306702614, 0.9894999861717224]

---

**Question 7:**  **Load and preprocess the CIFAR-10 dataset using Keras, and create a CNN model to classify RGB images. Show your preprocessing and architecture.**

In [4]:
from tensorflow.keras import models, layers
from tensorflow.keras.datasets import cifar10

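# Load CIFAR-10: 50,000 training and 10,000 test RGB images of size 32x32
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
# Preprocessing: scale pixel values from [0, 255] to [0, 1]
x_train, x_test = x_train/255.0, x_test/255.0

# Architecture: two conv/pool stages on 3-channel input, then a dense classifier
model = models.Sequential([
    layers.Input(shape=(32,32,3)),  # explicit Input layer instead of input_shape=
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # one probability per CIFAR-10 class
])

# Labels are integer class ids, so use sparse categorical cross-entropy
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_split=0.1)
# Returns [loss, accuracy] on the test set
model.evaluate(x_test, y_test)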

Epoch 1/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 39s 25ms/step - accuracy: 0.4542 - loss: 1.5153 - val_accuracy: 0.5382 - val_loss: 1.2831
Epoch 2/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 40s 24ms/step - accuracy: 0.5918 - loss: 1.1634 - val_accuracy: 0.6208 - val_loss: 1.0938
Epoch 3/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 41s 24ms/step - accuracy: 0.6394 - loss: 1.0397 - val_accuracy: 0.6422 - val_loss: 1.0516
Epoch 4/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 41s 23ms/step - accuracy: 0.6672 - loss: 0.9583 - val_accuracy: 0.6562 - val_loss: 1.0052
Epoch 5/5
1407/1407 ━━━━━━━━━━━━━━━━━━━━ 34s 24ms/step - accuracy: 0.6886 - loss: 0.8935 - val_accuracy: 0.6740 - val_loss: 0.9500
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.6653 - loss: 0.9713


[0.9713466763496399, 0.6653000116348267]

---

**Question 8:** **Using PyTorch, write a script to define and train a CNN on the MNIST dataset. Include model definition, data loaders, training loop, and accuracy evaluation**

In [1]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

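# Data loaders: ToTensor converts images to float tensors scaled to [0, 1]
transform = transforms.Compose([transforms.ToTensor()])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=True, download=True, transform=transform),
    batch_size=64, shuffle=True)
# Held-out test split, used only for the accuracy evaluation
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('.', train=False, download=True, transform=transform),
    batch_size=1000)

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(64*5*5, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.max_pool2d(x, 2)
        x = torch.relu(self.conv2(x))
        x = torch.max_pool2d(x, 2)
        # Spatial size: 28 -> 26 -> 13 -> 11 -> 5, with 64 channels
        x = x.view(-1, 64*5*5)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # raw logits; CrossEntropyLoss applies log-softmax

model = CNN()
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

# Training loop (one epoch)
model.train()
for epoch in range(1):
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

# Accuracy evaluation on the test set
model.eval()
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        correct += (model(data).argmax(dim=1) == target).sum().item()
print(f"Test accuracy: {correct / len(test_loader.dataset):.4f}")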




---

**Question 9:** **Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.**

In [4]:
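from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# "data" must contain one subfolder per class; labels are inferred from folder names
datagen = ImageDataGenerator(
    rescale=1./255,        # scale pixel values to [0, 1]
    validation_split=0.2   # hold out 20% of the images for validation
)

train_gen = datagen.flow_from_directory(
    "data",
    target_size=(224,224),
    batch_size=32,
    class_mode="binary",
    subset="training"
)

val_gen = datagen.flow_from_directory(
    "data",
    target_size=(224,224),
    batch_size=32,
    class_mode="binary",
    subset="validation"
)

# Example architecture; any Keras model with a single sigmoid output fits class_mode="binary"
model = models.Sequential([
    layers.Input(shape=(224,224,3)),
    layers.Conv2D(32, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D((2,2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_gen, validation_data=val_gen, epochs=5)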

Found 14055 images belonging to 1 classes.
Found 3513 images belonging to 1 classes.


---

**Question 10:** **You are working on a web application for a medical imaging startup. Your task is to build and deploy a CNN model that classifies chest X-ray images into “Normal” and “Pneumonia” categories. Describe your end-to-end approach–from data preparation and model training to deploying the model as a web app using Streamlit**

### Problem Statement

Chest X-Ray Pneumonia Detection Using CNN

Pneumonia is a serious lung infection that can be life-threatening if not detected early. Chest X-ray imaging is one of the most common diagnostic tools used by radiologists. However, manual analysis is time-consuming and error-prone.

The objective of this project is to build an end-to-end deep learning system that:

 - Classifies chest X-ray images into Normal and Pneumonia
 - Uses a Convolutional Neural Network (CNN)
 - Deploys the trained model as a web application using Streamlit

This solution covers data preparation, model training, evaluation, and deployment.

**Dataset Organization**

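The dataset is organized into training, validation, and testing folders. Each folder contains two subfolders corresponding to the class labels, NORMAL and PNEUMONIA. The code below creates the training and validation folders under `small_data`.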

In [12]:
import os

base_dir = "small_data"

folders = [
    "train/NORMAL",
    "train/PNEUMONIA",
    "val/NORMAL",
    "val/PNEUMONIA"
]

for folder in folders:
    os.makedirs(os.path.join(base_dir, folder), exist_ok=True)

**Why Data Preprocessing is Important**

Medical image datasets are often small and imbalanced. To improve generalization and avoid overfitting:

 - Images are resized to a fixed size
 - Pixel values are normalized
 - Data augmentation is applied

Augmentation simulates real-world variations such as rotation and flipping.

In [18]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    zoom_range=0.1,
    horizontal_flip=True
)

val_datagen = ImageDataGenerator(rescale=1./255)

train_data = train_datagen.flow_from_directory(
    "small_data/train",
    target_size=(224,224),
    batch_size=16,
    class_mode="binary"
)

val_data = val_datagen.flow_from_directory(
    "small_data/val",
    target_size=(224,224),
    batch_size=16,
    class_mode="binary"
)

print("Train samples:", train_data.samples)
print("Validation samples:", val_data.samples)


Found 390 images belonging to 2 classes.
Found 16 images belonging to 2 classes.
Train samples: 390
Validation samples: 16


**Compilation Details**

- Optimizer: Adam (fast convergence)
- Loss Function: Binary Cross-Entropy (binary classification)
- Metric: Accuracy

In [19]:
# compiling the model
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",
    metrics=["accuracy"]
)

In [20]:
# model training
history = model.fit(
    train_data,
    validation_data=val_data,
    epochs=10
)

Epoch 1/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 82s 2s/step - accuracy: 0.8051 - loss: 0.4486 - val_accuracy: 0.8125 - val_loss: 0.2974
Epoch 2/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 33s 1s/step - accuracy: 0.9026 - loss: 0.2533 - val_accuracy: 0.9375 - val_loss: 0.3244
Epoch 3/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 38s 1s/step - accuracy: 0.9282 - loss: 0.2002 - val_accuracy: 0.8750 - val_loss: 0.3337
Epoch 4/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 42s 1s/step - accuracy: 0.9128 - loss: 0.2215 - val_accuracy: 0.8125 - val_loss: 0.4741
Epoch 5/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 45s 1s/step - accuracy: 0.9487 - loss: 0.1457 - val_accuracy: 0.8125 - val_loss: 0.4063
Epoch 6/10
25/25 ━━━━━━━━━━━━━━━━━━━━ 39s 1s/step - accuracy: 0.9513 - loss: 0.1465 - val_accuracy: 0.7500 - val_loss: 0.3627
Epoch 7/10
25/25 ━━━━━━━━━━

In [21]:
# saving the model
model.save("pneumonia_model.h5")



In [1]:
# creating the streamlit app file
# app.py
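
One step any such app needs is turning the model's sigmoid output into a readable result. The following is a minimal sketch of that step only, not the full Streamlit app. It assumes the class index order NORMAL=0, PNEUMONIA=1 (the alphabetical order that `flow_from_directory` assigns to the folder names used above) and a hypothetical decision threshold of 0.5; the names `CLASS_NAMES` and `interpret_prediction` are illustrative.

```python
CLASS_NAMES = ["NORMAL", "PNEUMONIA"]  # assumed index order: NORMAL=0, PNEUMONIA=1


def interpret_prediction(probability, threshold=0.5):
    """Map the model's sigmoid output, P(PNEUMONIA), to a (label, confidence) pair."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    is_pneumonia = probability >= threshold
    label = CLASS_NAMES[1] if is_pneumonia else CLASS_NAMES[0]
    # Confidence is the probability of whichever class was chosen.
    confidence = probability if is_pneumonia else 1.0 - probability
    return label, confidence


print(interpret_prediction(0.9))   # ('PNEUMONIA', 0.9)
print(interpret_prediction(0.25))  # ('NORMAL', 0.75)
```

In `app.py`, this function would be called on the single value the saved model predicts for an uploaded image, after the image has been resized to 224×224 and rescaled to [0, 1] to match training. In a medical setting the threshold would normally be tuned on validation data rather than fixed at 0.5.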

**Why Transfer Learning?**

Training a CNN from scratch requires a large dataset. Medical datasets are usually limited.
Hence, transfer learning is used.

We use MobileNetV2, a lightweight and efficient CNN pre-trained on ImageNet.

Benefits:

 - Faster convergence
 - Often better accuracy when labeled data is scarce
 - Reduced computational cost
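
A transfer-learning model of this kind could be sketched as follows. This is an assumption-laden illustration rather than the notebook's actual model: the function name `build_model`, the frozen base, the `Rescaling` layer, the dropout rate of 0.2, and the single sigmoid output are all choices made here. Only the use of MobileNetV2 pre-trained on ImageNet comes from the text above.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_model(input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen MobileNetV2 base with a small binary classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights=weights
    )
    base.trainable = False  # keep the pre-trained features fixed

    return models.Sequential([
        layers.Input(shape=input_shape),
        # Inputs arrive in [0, 1] (rescale=1./255); MobileNetV2 expects [-1, 1].
        layers.Rescaling(2.0, offset=-1.0),
        base,
        layers.GlobalAveragePooling2D(),  # collapse spatial dimensions
        layers.Dropout(0.2),              # assumed rate, to limit overfitting
        layers.Dense(1, activation="sigmoid"),  # single output for two classes
    ])
```

Calling `build_model()` downloads the ImageNet weights on first use. The returned model can then be compiled with binary cross-entropy and trained on generators that yield 224×224 images with `class_mode="binary"`.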

---