# **CNN Architecture**

**Question 1: What is the role of filters and feature maps in a Convolutional Neural
Network (CNN)?**


**Answer:**
In a CNN, filters (also called kernels) slide over the image and detect specific patterns like edges, textures, or colors. Each filter learns a different pattern during training.

The output created by applying a filter is called a feature map. It shows where that specific pattern was found in the image. As you go deeper into the network, feature maps capture more complex patterns like shapes or objects.

So, filters detect patterns, and feature maps represent those patterns in the image.
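To make the filter/feature-map relationship concrete, here is a minimal pure-Python sketch. The toy image, the hand-written vertical-edge filter, and the helper name `conv2d_valid` are illustrative assumptions; in a real CNN the filter weights are learned, and frameworks compute the same sliding dot product (technically a cross-correlation) far more efficiently.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2D image (no padding, stride 1).

    Returns the feature map as a list of rows.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the kernel and the image patch under it
            acc = 0
            for di in range(k):
                for dj in range(k):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 5x5 image: dark on the left (0), bright on the right (1)
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-written vertical-edge filter (a trained CNN would learn such weights)
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = conv2d_valid(image, vertical_edge)
for row in feature_map:
    print(row)  # each row prints [0, 3, 3]
```

The feature map is zero where the patch is uniform and positive where the dark-to-bright edge falls under the filter, which is the sense in which a feature map "shows where the pattern was found".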

**Question 2: Explain the concepts of padding and stride in CNNs (Convolutional Neural
Networks). How do they affect the output dimensions of feature maps?**

**Answer:**
Padding means adding extra pixels (usually zeros) around the image before applying a filter.
Its purpose is to control how much the feature map shrinks after convolution.

 - With padding, the output feature map stays larger; with "same" padding at stride 1 it keeps the input size.

 - Without padding ("valid" convolution), the feature map becomes smaller.

Stride is how many pixels the filter moves at a time.

 - A stride of 1 moves the filter one pixel at a time, producing a larger output.

 - A stride greater than 1 skips pixels, producing a smaller output.

Effect: for input size n, filter size k, padding p on each side, and stride s, the output size is ⌊(n + 2p - k) / s⌋ + 1. Padding therefore increases output size, while a larger stride decreases it.
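The effect on output dimensions can be checked with a small helper. This is a sketch of the standard output-size relation for one spatial dimension; the function name `conv_output_size` and the example numbers (a 28-pixel input with a 3×3 filter) are assumptions chosen for illustration.

```python
def conv_output_size(n, k, p=0, s=1):
    """Output size along one dimension: floor((n + 2p - k) / s) + 1.

    n: input size, k: filter size, p: padding per side, s: stride.
    """
    return (n + 2 * p - k) // s + 1


print(conv_output_size(28, 3))            # no padding, stride 1 -> 26
print(conv_output_size(28, 3, p=1))       # padding 1, stride 1  -> 28
print(conv_output_size(28, 3, s=2))       # no padding, stride 2 -> 13
print(conv_output_size(28, 3, p=1, s=2))  # padding 1, stride 2  -> 14
```

Holding the filter fixed, adding padding raises the result and raising the stride lowers it.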

**Question 3: Define receptive field in the context of CNNs. Why is it important for deep
architectures?**


**Answer:**
The receptive field is the region of the input image that a particular neuron in a CNN "looks at" or responds to.

In shallow layers, the receptive field is small because neurons only see tiny parts of the image. As the network becomes deeper, the receptive field grows because each layer builds on the previous layer’s outputs.

Importance:
A larger receptive field allows deep CNNs to understand bigger and more complex patterns, such as shapes, objects, or context in the image. This helps the network make more accurate predictions.
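How the receptive field grows with depth can be sketched with the usual layer-by-layer recurrence: each layer adds (k - 1) times the current "jump" (the product of all earlier strides). The helper name `receptive_field` and the example layer stacks are assumptions for illustration.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after the given layers.

    layers: list of (kernel_size, stride) tuples, in order from input to output.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf


print(receptive_field([(3, 1)]))                  # one 3x3 conv -> 3
print(receptive_field([(3, 1), (3, 1)]))          # two 3x3 convs -> 5
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three 3x3 convs -> 7

# conv 3x3 -> pool 2x2 (stride 2) -> conv 3x3 -> pool 2x2 (stride 2)
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # -> 10
```

Strided layers such as pooling make every later layer grow the receptive field faster, which is one reason deep networks can cover large image regions with small filters.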

**Question 4: Discuss how filter size and stride influence the number of parameters in a
CNN.**


**Answer:**
The filter size affects the number of parameters because each filter has its own weights.
A larger filter (for example 5×5 instead of 3×3) means more weights, so the CNN has more parameters.

The stride does not change the number of parameters in the filters.
It only controls how the filter moves across the image.
Changing stride affects the output size, not the number of learnable weights.

So, filter size increases parameters, while stride does not affect parameters.
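A quick way to see this is to count the parameters of a standard 2D convolution layer: each of the output filters has kernel_size × kernel_size × in_channels weights plus one bias. The helper name `conv_params` and the channel counts are illustrative assumptions; note that stride is not an input to the calculation at all.

```python
def conv_params(kernel_size, in_channels, out_channels, bias=True):
    """Learnable parameters in a standard 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    biases = out_channels if bias else 0
    return weights + biases


print(conv_params(3, 1, 32))   # 3x3 filters, 1 -> 32 channels: 320
print(conv_params(5, 1, 32))   # 5x5 filters, 1 -> 32 channels: 832
print(conv_params(3, 32, 64))  # 3x3 filters, 32 -> 64 channels: 18496
```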

**Question 5: Compare and contrast different CNN-based architectures like LeNet,
AlexNet, and VGG in terms of depth, filter sizes, and performance.**


**Answer:**
 - **LeNet** is one of the earliest CNNs. It is shallow, with only a few convolutional layers. It uses 5×5 filters and works well for simple tasks like digit recognition. Its performance is basic compared to modern models.

 - **AlexNet** is deeper than LeNet (five convolutional layers) and uses larger filters in its early layers (11×11, then 5×5) followed by 3×3 filters. It uses ReLU activation and dropout. It performs much better than LeNet and was a major breakthrough for large-scale image classification on ImageNet.

 - **VGG** is much deeper than both LeNet and AlexNet (16 or 19 weight layers in VGG-16 and VGG-19). It uses many layers with small 3×3 filters. Its depth gives very strong performance and high accuracy, but it requires more computation and memory.

In summary: LeNet is shallow, AlexNet is deeper with larger filters, and VGG is the deepest with small filters but very high performance.
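One reason VGG's small-filter design works is that stacked 3×3 convolutions cover the same receptive field as one larger filter with fewer weights. The sketch below counts weights only (biases ignored) and assumes, purely for illustration, 64 input and 64 output channels in every layer.

```python
def conv_weights(kernel_size, channels):
    """Weights in one conv layer with `channels` input and output channels."""
    return kernel_size * kernel_size * channels * channels


C = 64  # assumed channel count for the comparison

one_5x5 = conv_weights(5, C)        # receptive field 5
two_3x3 = 2 * conv_weights(3, C)    # receptive field 5
one_7x7 = conv_weights(7, C)        # receptive field 7
three_3x3 = 3 * conv_weights(3, C)  # receptive field 7

print(one_5x5, two_3x3)    # 102400 73728
print(one_7x7, three_3x3)  # 200704 110592
```

The stacked version also inserts a non-linearity after every 3×3 layer, so it is both cheaper and more expressive than the single large filter.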

**Question 6: Using keras, build and train a simple CNN model on the MNIST dataset
from scratch. Include code for module creation, compilation, training, and evaluation.**

In [1]:
# Import libraries
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape and normalize
x_train = x_train.reshape(-1, 28, 28, 1) / 255.0
x_test = x_test.reshape(-1, 28, 28, 1) / 255.0

# One-hot encode labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Build model
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(28,28,1)),
    MaxPooling2D((2,2)),

    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D((2,2)),

    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train model
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_split=0.1)

# Evaluate
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test Accuracy:", test_acc)



Epoch 1/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 48s 54ms/step - accuracy: 0.8832 - loss: 0.3867 - val_accuracy: 0.9865 - val_loss: 0.0498
Epoch 2/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 46s 54ms/step - accuracy: 0.9836 - loss: 0.0516 - val_accuracy: 0.9863 - val_loss: 0.0440
Epoch 3/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 79s 52ms/step - accuracy: 0.9894 - loss: 0.0328 - val_accuracy: 0.9900 - val_loss: 0.0326
Epoch 4/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 84s 54ms/step - accuracy: 0.9928 - loss: 0.0211 - val_accuracy: 0.9905 - val_loss: 0.0356
Epoch 5/5
844/844 ━━━━━━━━━━━━━━━━━━━━ 81s 53ms/step - accuracy: 0.9947 - loss: 0.0170 - val_accuracy: 0.9917 - val_loss: 0.0341
313/313 ━━━━━━━━━━━━━━━━━━━━ 3s 9ms/step - accuracy: 0.9854 - loss: 0.0436
Test Accuracy: 0.9887999892234802


**Question 7: Load and preprocess the CIFAR-10 dataset using Keras, and create a
CNN model to classify RGB images. Show your preprocessing and architecture.**


In [2]:
# Import libraries
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values (RGB)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Build CNN model
model = Sequential([
    Conv2D(32, (3,3), activation='relu', input_shape=(32,32,3)),
    MaxPooling2D((2,2)),

    Conv2D(64, (3,3), activation='relu'),
    MaxPooling2D((2,2)),

    Conv2D(128, (3,3), activation='relu'),
    MaxPooling2D((2,2)),

    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train model
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.1)


Epoch 1/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 67s 93ms/step - accuracy: 0.3153 - loss: 1.8594 - val_accuracy: 0.5242 - val_loss: 1.3119
Epoch 2/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 63s 89ms/step - accuracy: 0.5574 - loss: 1.2451 - val_accuracy: 0.6140 - val_loss: 1.1130
Epoch 3/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 61s 86ms/step - accuracy: 0.6300 - loss: 1.0511 - val_accuracy: 0.6424 - val_loss: 1.0215
Epoch 4/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 83s 88ms/step - accuracy: 0.6787 - loss: 0.9164 - val_accuracy: 0.6730 - val_loss: 0.9639
Epoch 5/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 62s 88ms/step - accuracy: 0.7069 - loss: 0.8386 - val_accuracy: 0.7016 - val_loss: 0.8836
Epoch 6/10

**Question 8: Using PyTorch, write a script to define and train a CNN on the MNIST
dataset. Include model definition, data loaders, training loop, and accuracy evaluation.**


In [4]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Data Loaders
train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=1000, shuffle=False)

# Define CNN
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)      # 28 → 26
        self.conv2 = nn.Conv2d(32, 64, 3)     # 26 → 24
        self.pool = nn.MaxPool2d(2,2)         # 24 → 12

        # 64 channels * 12 * 12 = 9216
        self.fc1 = nn.Linear(64 * 12 * 12, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(x.size(0), -1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = CNN()

# Loss + Optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training Loop
for epoch in range(5):
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    # loss.item() here is the loss of the last mini-batch only, so it fluctuates between epochs
    print("Epoch:", epoch+1, "Loss:", loss.item())

# Evaluation
model.eval()  # inference mode; matters once dropout or batch norm layers are added
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print("Test Accuracy:", correct / total)


Epoch: 1 Loss: 0.01631917618215084
Epoch: 2 Loss: 0.009268590249121189
Epoch: 3 Loss: 0.0019121990771964192
Epoch: 4 Loss: 0.0023090927861630917
Epoch: 5 Loss: 0.029460176825523376
Test Accuracy: 0.9876


**Question 9: Given a custom image dataset stored in a local directory, write code using
Keras ImageDataGenerator to preprocess and train a CNN model.**


In [None]:
# STEP 1 — Upload ZIP file
from google.colab import files
uploaded = files.upload()

# STEP 2 — Unzip
import zipfile
import io

zip_name = list(uploaded.keys())[0]  # automatically gets the uploaded file name
print("Uploaded file:", zip_name)

zip_ref = zipfile.ZipFile(io.BytesIO(uploaded[zip_name]), 'r')
zip_ref.extractall('/content/rice')
zip_ref.close()

# Path to dataset
data_path = "/content/rice/Rice_Image_Dataset"

# STEP 3 — Image Preprocessing
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

train_data = datagen.flow_from_directory(
    data_path,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='training'
)

val_data = datagen.flow_from_directory(
    data_path,
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical',
    subset='validation'
)

# STEP 4 — Build CNN Model
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3,3), activation='relu', input_shape=(224,224,3)),
    layers.MaxPooling2D(2,2),

    layers.Conv2D(64, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),

    layers.Conv2D(128, (3,3), activation='relu'),
    layers.MaxPooling2D(2,2),

    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(train_data.num_classes, activation='softmax')  # one unit per class folder
])

model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# STEP 5 — Train
history = model.fit(
    train_data,
    epochs=10,
    validation_data=val_data
)

# STEP 6 — Save model in Keras 3 format
model.save("rice_model.keras")

print("Model saved as rice_model.keras")


**Question 10: You are working on a web application for a medical imaging startup. Your
task is to build and deploy a CNN model that classifies chest X-ray images into “Normal”
and “Pneumonia” categories. Describe your end-to-end approach, from data preparation
and model training to deploying the model as a web app using Streamlit.**


**Answer:**

**1. Problem framing**
Binary classification: input = chest X-ray image, output = “Normal” or “Pneumonia”. Goal: high sensitivity for Pneumonia, robust generalisation, fast inference for web app.

**2. Data preparation**

Collect labeled X-rays and split at the patient level, so that no patient's images appear in more than one of the train, validation, and test sets.

Inspect class balance and image sizes.

Clean: remove corrupted files, standardize file formats, and convert images to single-channel (grayscale), or to 3-channel when using ImageNet-pretrained backbones.

Split: train / val / test (e.g., 70/15/15) at patient-level.

Augmentation (only for train): random rotation ±15°, horizontal flip (if clinically acceptable), random zoom, brightness/contrast jitter. Do NOT apply augmentation to validation/test.

Normalize pixel values; if using pretrained backbone, use that model’s normalization (e.g., mean/std or scale to [0,1]).

Handle imbalance: use class weights in loss, oversample minority, or focal loss.
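The patient-level split and the class-weight idea above can be sketched in plain Python. Everything here is an assumption for illustration: the record layout `(patient_id, image_path, label)`, the helper names, the 70/15/15 percentages, and the synthetic records. The weight formula is the common "balanced" heuristic, total / (n_classes × class_count).

```python
import random
from collections import Counter


def patient_level_split(records, percents=(70, 15, 15), seed=0):
    """Split (patient_id, image_path, label) records so patients never overlap."""
    patients = sorted({pid for pid, _, _ in records})
    random.Random(seed).shuffle(patients)  # fixed seed for reproducibility
    n = len(patients)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),
    }
    return {name: [r for r in records if r[0] in ids]
            for name, ids in groups.items()}


def balanced_class_weights(labels):
    """Weight each class by total / (n_classes * class_count)."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: total / (len(counts) * n) for cls, n in counts.items()}


# Synthetic example: 20 hypothetical patients with 2 images each
records = [
    (f"p{i:02d}", f"p{i:02d}_{j}.png", "Normal" if i % 4 == 0 else "Pneumonia")
    for i in range(20)
    for j in range(2)
]

splits = patient_level_split(records)
train_labels = [label for _, _, label in splits["train"]]
class_weights = balanced_class_weights(train_labels)
print({name: len(rows) for name, rows in splits.items()})
print(class_weights)
```

The weights would be computed on the training split only and passed to the loss (for example as `class_weight` in Keras `model.fit`), so the rarer class contributes as much to the loss as the common one.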

**3. Model choice & architecture**

Prefer transfer learning (faster, better with limited medical data). Use a pretrained backbone (e.g., DenseNet121 / ResNet50 / EfficientNet) + small classification head: global average pooling → dropout → dense (1, sigmoid).

If training from scratch, use a small custom CNN and heavy augmentation, but transfer learning is standard for X-rays.

**4. Loss, metrics, and other training details**

Loss: binary cross-entropy; consider focal loss if there are many hard examples or heavy class imbalance.

Metrics to monitor: sensitivity (recall) for Pneumonia, specificity, accuracy, AUC-ROC. Prioritize sensitivity in model selection.

Optimizer: Adam with lr schedule; use ReduceLROnPlateau or cosine schedule.

Regularization: dropout (0.3–0.5), weight decay.

Early stopping on validation AUC/sensitivity.

Use mixed precision and batch size tuning to speed training.

Use stratified mini-batches and patient-level grouping when possible.

**5. Training pipeline**

Build data generators / tf.data pipeline for efficient IO and augmentation.

Freeze backbone initially, train head for few epochs, then unfreeze some layers and fine-tune with a lower lr.

Save best model checkpoint by validation AUC or sensitivity.

Evaluate on the held-out test set only once, after the final model has been chosen.
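The two-stage schedule described above (train the head on a frozen backbone, then fine-tune with a lower learning rate) could look roughly like this in Keras. It is a sketch under assumptions: DenseNet121 as the backbone, the learning rates, the dropout rate, and the number of unfrozen layers are illustrative choices, and inputs are assumed to be already preprocessed the way the backbone expects. TensorFlow is imported inside the functions so the module itself loads without it.

```python
HEAD_LR = 1e-3       # assumed learning rate while only the head is trained
FINE_TUNE_LR = 1e-5  # assumed lower rate once backbone layers are unfrozen


def compile_model(model, learning_rate):
    """Compile with binary cross-entropy and the metrics used for model selection."""
    import tensorflow as tf

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=[
            tf.keras.metrics.AUC(name="auc"),
            tf.keras.metrics.Recall(name="sensitivity"),
        ],
    )


def build_classifier(input_shape=(224, 224, 3), dropout=0.3):
    """Stage 1: frozen ImageNet backbone plus a small sigmoid head."""
    import tensorflow as tf

    base = tf.keras.applications.DenseNet121(
        include_top=False, weights="imagenet", input_shape=input_shape
    )
    base.trainable = False  # freeze the backbone while the head is trained

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep batch norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(dropout)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    compile_model(model, HEAD_LR)
    return model, base


def unfreeze_top_layers(model, base, n_layers=30):
    """Stage 2: unfreeze the last `n_layers` backbone layers and recompile."""
    base.trainable = True
    for layer in base.layers[:-n_layers]:
        layer.trainable = False
    compile_model(model, FINE_TUNE_LR)  # recompile so the change takes effect
```

A typical use would be `model, base = build_classifier()`, a few epochs of `model.fit(...)`, then `unfreeze_top_layers(model, base)` and further training with a `ModelCheckpoint` callback that monitors `val_auc` or `val_sensitivity`.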

**6. Model evaluation & validation**

Evaluate on the test set: report AUC, the confusion matrix, sensitivity, and specificity.

Calibration: check reliability (e.g., calibration curve); apply temperature scaling if needed.

Explainability: generate Grad-CAM / saliency maps for sample predictions to verify model focuses on lungs (important for clinical trust).

Perform robustness checks: different image sources, scanners, noise.
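Sensitivity and specificity follow directly from the confusion matrix, and both depend on the decision threshold applied to the predicted probability. The sketch below uses made-up labels and probabilities (1 = Pneumonia, 0 = Normal) and hypothetical helper names to show the trade-off; it is not tied to any particular model.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fp, tn, fn) for binary labels, with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if truth == 1 and predicted == 1:
            tp += 1
        elif truth == 0 and predicted == 1:
            fp += 1
        elif truth == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Made-up example: four Pneumonia cases followed by four Normal cases
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

print(sensitivity_specificity(y_true, y_prob, threshold=0.5))   # (0.75, 0.75)
print(sensitivity_specificity(y_true, y_prob, threshold=0.25))  # (1.0, 0.5)
```

Lowering the threshold catches the missed Pneumonia case at the cost of more false alarms, which is the trade-off to tune when sensitivity is the priority.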

**7. Exporting & packaging model**

Export final model to a format suitable for deployment: SavedModel (TensorFlow) or TorchScript/ONNX (PyTorch).

Optionally create a lightweight version (quantized TFLite) for edge/mobile.

Include preprocessing pipeline code with the model or bundle normalization parameters.

**8. Web app design (Streamlit)**

Frontend: simple UI to upload an X-ray, show original image + Grad-CAM overlay + predicted probability + label.

Backend inference: load exported model once on app start, apply same preprocessing, run predict, optionally run explainability (Grad-CAM) and return results.

Example minimal Streamlit flow: user uploads image → app preprocesses → model.predict → display probability and heatmap → allow download of report.
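The upload → preprocess → predict → display flow could be sketched as follows. This is a minimal sketch, not a production app: the model path `xray_model.keras`, the 224×224 input size, the [0, 1] scaling, and the 0.5 threshold are assumptions that must match how the model was actually trained, and Grad-CAM and report download are left out. Heavy imports sit inside `run_app` so the small labeling helper can be tested on its own.

```python
def label_from_probability(p_pneumonia, threshold=0.5):
    """Map the model's sigmoid output to a class label."""
    return "Pneumonia" if p_pneumonia >= threshold else "Normal"


def run_app(model_path="xray_model.keras", image_size=(224, 224)):
    """Streamlit page: upload an X-ray, run the model, show the result."""
    import numpy as np
    import streamlit as st
    import tensorflow as tf
    from PIL import Image

    @st.cache_resource  # load the model once, not on every script rerun
    def load_model(path):
        return tf.keras.models.load_model(path)

    st.title("Chest X-ray classifier (demo)")
    uploaded = st.file_uploader("Upload a chest X-ray", type=["png", "jpg", "jpeg"])
    if uploaded is None:
        return  # nothing to do until the user uploads a file

    image = Image.open(uploaded).convert("RGB").resize(image_size)
    st.image(image, caption="Uploaded image")

    # Assumed preprocessing: scale pixels to [0, 1] and add a batch dimension
    batch = np.asarray(image, dtype="float32")[None, ...] / 255.0
    probability = float(load_model(model_path).predict(batch)[0][0])

    st.write(f"Predicted label: {label_from_probability(probability)}")
    st.write(f"Pneumonia probability: {probability:.3f}")


# To use this as an app, save it as a script whose last line calls run_app().
```

Saved as a hypothetical `app.py` that ends with a `run_app()` call, it would be started with `streamlit run app.py`.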


**9. Deployment options**

Local/Cloud: deploy Streamlit on a VM, or use Streamlit Cloud, Heroku, AWS EC2, or Docker + any cloud provider.

Containerize: create Dockerfile including model files, expose Streamlit port.

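Scale: Streamlit runs its own server and is not served through Gunicorn, so for higher load either run several app instances behind a load balancer or wrap model inference behind a REST microservice (FastAPI served by Uvicorn/Gunicorn workers) and let the frontend call the API. Use GPU instances for high throughput.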

**10. Security, privacy & compliance**

De-identify images and follow HIPAA/GDPR rules as applicable.

Secure uploads (HTTPS), limit file sizes, validate image MIME types.

Audit logs and user authentication for clinical use. Never store PHI unless permitted.

**11. Monitoring & maintenance**

Monitor latency, throughput, and model drift. Log predictions and periodically re-evaluate on new labeled data.

Set up alerting if performance drops. Retrain with new data when performance degrades.

**12. Testing & validation for clinical readiness**

Extensive external validation on datasets from different hospitals.

Clinical review of false positives/negatives.

If intended for clinical use, follow regulatory requirements and perform prospective trials.

**13. Checklist before release**

Reproducible training script and fixed random seeds.

Unit tests for preprocessing and inference.

Key documentation: input format, expected normalization, model versioning, and instructions for rollback.

**14. Quick summary (one-line)**
Prepare patient-split, augmented data → use transfer learning and prioritize sensitivity → validate with AUC + Grad-CAM → export model → deploy as Streamlit app (Docker) with secure uploads, monitoring, and clinical validation.

