# CNN Architecture | Assignment

## Question 1: What is the role of filters and feature maps in a Convolutional Neural Network (CNN)?

Filters and feature maps are fundamental components of Convolutional Neural Networks (CNNs). Here's a breakdown of their roles:

*   **Filters (or Kernels):** These are small matrices of learnable weights that slide over the input image (or a previous layer's feature maps). Each filter learns during training to respond to a specific pattern, such as an edge, a corner, or a texture. At each position of the convolution, the filter is multiplied element-wise with the patch of input beneath it, and the products are summed (plus a bias) to produce a single value in the output feature map. Different filters in a layer learn to detect different features.

*   **Feature Maps:** The output of the convolution operation is a feature map. Each feature map is the result of applying one filter across the entire input. It shows where in the input the filter's feature is present and how strongly. For example, if a filter responds to vertical edges, its feature map will highlight the locations where vertical edges occur. As data progresses through the layers of the CNN, the feature maps become more abstract, representing higher-level features such as object parts or even complete objects.
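
The multiply-and-sum step described above can be sketched in a few lines of plain Python. This is only an illustration: the toy image, the hand-written `vertical_edge` filter, and the helper name `convolve2d` are assumptions made for the example, whereas a real CNN learns its filter weights during training.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1.

    This is the cross-correlation that CNN layers compute: multiply the
    kernel element-wise with the patch under it, then sum the products.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge filter (hypothetical, not learned).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)
# [3, 3, 0]
# [3, 3, 0]
# [3, 3, 0]
```

The feature map is large where the 3x3 window straddles the dark-to-bright boundary and zero where the window covers a uniform region, which is what "highlighting vertical edges" means in practice.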

## Question 2: Explain the concepts of padding and stride in CNNs (Convolutional Neural Networks). How do they affect the output dimensions of feature maps?

Ans : Padding and stride are two important concepts in Convolutional Neural Networks (CNNs) that influence the output dimensions of feature maps and the overall behavior of the network.

*   **Padding:** Padding involves adding extra pixels (typically with values of zero) around the border of the input image or feature map. The primary purpose of padding is to prevent the loss of information at the edges of the input during convolution. Without padding, the pixels at the edges are only included in a few convolution operations, while the pixels in the center are included in many. This can lead to the underrepresentation of edge features in the output feature map. Padding ensures that edge pixels are processed more equally, helping to preserve spatial information. Padding can also be used to maintain the spatial dimensions of the output feature map the same as the input. Common types of padding include "valid" (no padding) and "same" (padding to maintain output dimensions).

*   **Stride:** Stride is the number of pixels the filter shifts over the input at each step. A stride of 1 moves the filter one pixel at a time. A stride of 2 moves it two pixels at a time, roughly halving each spatial dimension of the output feature map. Increasing the stride shrinks the output feature map, which reduces computational cost and memory in later layers; it does not change the number of weights in the convolutional layer itself.

**How they affect output dimensions:**

The output dimensions of a feature map after a convolutional layer are determined by the input dimensions, the filter size, the padding, and the stride. The formula for calculating the output dimension (either height or width) is:

$$ \text{Output Dimension} = \left\lfloor \frac{\text{Input Dimension} - \text{Filter Size} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1 $$

Where:
*   Input Dimension: The spatial dimension (height or width) of the input.
*   Filter Size: The spatial dimension (height or width) of the filter.
*   Padding: The amount of padding applied to one side of the input. If padding is applied symmetrically, the total padding is $2 \times Padding$.
*   Stride: The number of pixels the filter shifts.

In terms of this formula:

*   **Padding:** Increases the effective input dimension, which leads to a larger output feature map. With "same" padding and a stride of 1, the output dimensions equal the input dimensions.
*   **Stride:** Divides the number of positions the filter visits, leading to a smaller output feature map. A larger stride results in a greater reduction in dimensions.

By carefully selecting padding and stride values, CNN architects can control the spatial dimensions of feature maps at each layer, influencing the network's ability to capture features at different scales and manage computational resources.
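
As a quick check of the formula above, here is a minimal sketch in plain Python. The function name `conv_output_size` is an illustrative choice, and the sketch assumes the same amount of padding on both sides of the input.

```python
def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Output height or width of a convolution, per the formula above.

    `padding` is the number of pixels added to EACH side (symmetric padding).
    Integer division implements the floor.
    """
    return (input_size - filter_size + 2 * padding) // stride + 1


# 28x28 input, 3x3 filter, no padding ("valid"): the map shrinks by 2.
print(conv_output_size(28, 3))                       # 26

# Padding of 1 with a 3x3 filter keeps the size ("same" at stride 1).
print(conv_output_size(28, 3, padding=1))            # 28

# Stride 2 roughly halves the spatial dimension.
print(conv_output_size(28, 3, padding=1, stride=2))  # 14
```

The same function applies to pooling layers if the pool size is passed as `filter_size`; for example, a 2x2 pool with stride 2 on a 26-pixel dimension gives `conv_output_size(26, 2, stride=2)`, which is 13.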

## Question 3: Define receptive field in the context of CNNs. Why is it important for deep architectures?

Ans: In the context of Convolutional Neural Networks (CNNs), the **receptive field** of a neuron in a particular layer is the region in the original input image that influences that neuron's activation. In simpler terms, it's the area of the input image that a specific filter "sees" when computing the output value at a particular location in a feature map.

**How it's determined:**

The receptive field size is not just the size of the filter in the current layer. It's a cumulative effect of the filter sizes and strides in all the preceding layers. As you go deeper into the network, the receptive field of neurons in later layers becomes larger, covering a wider area of the original input image.
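
This cumulative effect can be sketched with a short recurrence. The example assumes plain convolution or pooling layers without dilation, and the function name `receptive_field` and the `(kernel_size, stride)` layer description are illustrative choices rather than any library's API.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one unit after the given layers.

    `layers` is a list of (kernel_size, stride) tuples, first layer first.
    """
    rf = 1    # a single input pixel "sees" only itself
    jump = 1  # spacing, in input pixels, between neighbouring units
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump  # each layer widens the view
        jump *= stride                  # strides spread later units apart
    return rf


print(receptive_field([(3, 1)]))                          # 3
print(receptive_field([(3, 1), (3, 1)]))                  # 5
# conv 3x3 -> pool 2x2 (stride 2) -> conv 3x3 -> pool 2x2 (stride 2)
print(receptive_field([(3, 1), (2, 2), (3, 1), (2, 2)]))  # 10
```

Note how the strided pooling layers make every later layer grow the receptive field faster, since `jump` doubles each time.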

**Why it's important for deep architectures:**

The increasing receptive field size in deeper layers is crucial for the effectiveness of CNNs in deep architectures:

*   **Hierarchical Feature Extraction:** Early layers with smaller receptive fields focus on extracting local, low-level features like edges and corners. As the receptive field grows in deeper layers, neurons can combine these low-level features to detect more complex, high-level features like textures, shapes, and eventually, object parts or even entire objects. This hierarchical extraction of features is a key reason why CNNs are so powerful for image recognition tasks.
*   **Contextual Information:** A larger receptive field allows neurons to incorporate more contextual information from the input image when making a decision. For example, to identify an object like a car, a neuron in a deep layer needs to consider not just a small patch of pixels, but a larger area that includes wheels, windows, and the overall shape. The receptive field enables this by providing a wider view of the input.
*   **Robustness to Translations:** A larger receptive field, especially when it comes from pooling or strided layers, also contributes to the network's robustness to small translations (shifts of the object's position in the image). Because deeper layers summarize larger areas, a slight shift in the object's position is less likely to significantly change their activation patterns.
*   **Efficiency:** A large receptive field could in principle be achieved with a single large filter, but stacking several layers with small filters is more efficient in parameters and computation. For example, two stacked 3x3 layers cover a 5x5 region with 18 weights per input-output channel pair, compared with 25 for a single 5x5 filter, and the extra non-linearity between them adds expressive power.

In summary, the receptive field is a fundamental concept that explains how CNNs build up a hierarchical representation of the input image. Its increasing size in deeper layers allows the network to capture increasingly complex features and contextual information, which is essential for achieving high performance in various computer vision tasks.

## Question 4: Discuss how filter size and stride influence the number of parameters in a CNN.

Ans: In a Convolutional Neural Network (CNN), the number of parameters is a critical factor that affects the model's complexity, memory usage, and computational cost. Filter size affects the parameter count of a convolutional layer directly, while stride affects the network's parameter count only indirectly.

**Filter Size:**

*   The number of weights in a single filter equals its spatial dimensions (height * width) multiplied by the number of input channels; each filter usually also has one bias term.
*   A convolutional layer typically has multiple filters (also known as kernels). The total number of parameters in a convolutional layer is the number of filters multiplied by the number of parameters in each filter.
*   Therefore, increasing the filter size directly increases the number of parameters in a convolutional layer. For example, a 5x5 filter has 25 weights (assuming a single input channel), while a 3x3 filter has only 9. Using larger filters, especially in layers with many channels, can rapidly increase the total number of parameters in the network.

**Stride:**

*   Stride does **not** directly affect the number of parameters within a convolutional layer. The number of parameters in a filter and the number of filters remain the same regardless of the stride value.
*   However, stride can indirectly influence the number of parameters in later layers. A larger stride reduces the spatial dimensions of the output feature map, which becomes the input to the next layer.
*   A subsequent convolutional layer is unaffected, because its parameter count depends on its filter size and channel counts, not on the spatial size of its input. A subsequent fully connected (dense) layer is affected: after flattening, its number of weights is proportional to height * width * channels of the feature map, so a smaller feature map means fewer parameters.

**In summary:**

*   **Filter size** has a **direct** impact on the number of parameters in a convolutional layer. Larger filters mean more parameters.
*   **Stride** has no effect on the parameters of convolutional layers, but an **indirect** effect on fully connected layers that follow: larger strides produce smaller feature maps, so the first dense layer after flattening needs fewer weights.

Therefore, careful consideration of both filter size and stride is essential when designing a CNN architecture to balance the model's capacity to learn complex features with computational efficiency and memory constraints.
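
The counts can be verified with a small sketch. The helper names `conv2d_params` and `dense_params` are illustrative, and the layer sizes are arbitrary examples. A convolutional layer's count depends only on filter size and channel counts, while a dense layer placed after flattening depends on the spatial size of the feature map it receives.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters, use_bias=True):
    """Trainable parameters of a 2D convolutional layer."""
    weights = kernel_h * kernel_w * in_channels * filters
    return weights + (filters if use_bias else 0)


def dense_params(inputs, units, use_bias=True):
    """Trainable parameters of a fully connected layer."""
    return inputs * units + (units if use_bias else 0)


# Filter size has a direct effect (1 input channel, 32 filters).
print(conv2d_params(3, 3, 1, 32))   # 320
print(conv2d_params(5, 5, 1, 32))   # 832

# Stride never appears in the formula; only kernel size and channels do.
print(conv2d_params(3, 3, 32, 64))  # 18496

# Indirect effect of stride: a 64-channel feature map flattened into a
# 128-unit dense layer, at 28x28 versus 14x14 (e.g. after a stride of 2).
print(dense_params(28 * 28 * 64, 128))  # 6422656
print(dense_params(14 * 14 * 64, 128))  # 1605760
```

Halving each spatial dimension cuts the dense layer's weights by about a factor of four, while the convolutional layers keep exactly the same number of parameters.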

## Question 5: Compare and contrast different CNN-based architectures like LeNet, AlexNet, and VGG in terms of depth, filter sizes, and performance.

Ans: Here's a comparison of some prominent CNN architectures: LeNet, AlexNet, and VGG, highlighting their characteristics in terms of depth, filter sizes, and performance:

| Feature          | LeNet                                       | AlexNet                                            | VGG (e.g., VGG16/VGG19)                                              |
| :--------------- | :------------------------------------------ | :------------------------------------------------- | :------------------------------------------------------------------- |
| **Depth**        | Shallow (a few convolutional layers)        | Deeper: 8 weight layers (5 conv + 3 fully connected) | Much deeper: 16 or 19 weight layers                                  |
| **Filter Sizes** | Relatively larger filters (5x5)             | Mixed filter sizes (11x11, 5x5, 3x3)               | Small 3x3 filters throughout                                         |
| **Architecture** | Simple, with convolution and pooling layers | Introduced ReLU activation, dropout, and LRN       | Blocks of stacked 3x3 convolutions separated by pooling layers       |
| **Parameters**   | Relatively few (about 60 thousand)          | About 60 million                                   | About 138 million (VGG16), mostly in the fully connected layers      |
| **Performance**  | Early success on handwritten digits         | Won ILSVRC 2012 on ImageNet by a large margin      | Top results in ILSVRC 2014 (first in localization, second in classification) |
| **Key Idea**     | Basic CNN principles                        | Leveraging GPUs, regularization techniques         | Depth is key, using small, stacked filters                           |

**Comparison and Contrast:**

*   **Depth:** The most significant difference is depth. LeNet is shallow, AlexNet is deeper, and VGG is much deeper. This increase in depth allows later architectures to learn more complex and hierarchical feature representations.
*   **Filter Sizes:** LeNet used larger filters, while AlexNet used a mix. VGG's key contribution was demonstrating that using only small (3x3) filters throughout the network, stacked in multiple layers, could be highly effective. This is because stacking two 3x3 filters with a stride of 1 has an effective receptive field equivalent to a 5x5 filter, but with fewer parameters.
*   **Performance:** Each architecture represented a significant step forward in performance on image recognition tasks compared to its predecessors. AlexNet's success on ImageNet in 2012 was a turning point, and VGG further improved upon this by exploring the impact of depth.
*   **Computational Cost:** With increasing depth and parameters, the computational cost also increased significantly from LeNet to VGG.
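
The parameter saving from stacking small filters can be checked with a short count. The sketch assumes, purely for illustration, that every layer has $C$ input channels and $C$ output channels, and it ignores bias terms:

$$ \underbrace{2 \times (3 \times 3 \times C \times C)}_{\text{two stacked } 3 \times 3 \text{ layers}} = 18C^2 \quad < \quad \underbrace{5 \times 5 \times C \times C}_{\text{one } 5 \times 5 \text{ layer}} = 25C^2 $$

The same reasoning extends to three stacked 3x3 layers, which cover a 7x7 receptive field:

$$ 3 \times (3 \times 3 \times C \times C) = 27C^2 \quad < \quad 7 \times 7 \times C \times C = 49C^2 $$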

In essence, the evolution from LeNet to VGG shows a trend towards deeper networks, the use of smaller, stacked filters, and the incorporation of regularization techniques to handle the increased complexity and prevent overfitting. These advancements have been crucial in achieving the high performance seen in modern CNNs.

## Question 6: Using Keras, build and train a simple CNN model on the MNIST dataset from scratch. Include code for model creation, compilation, training, and evaluation.

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Load and preprocess the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape data for CNN
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1).astype('float32') / 255

# One-hot encode the labels
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)

# Build the CNN model
model = Sequential()
model.add(tf.keras.Input(shape=(28, 28, 1)))  # declare the input shape once, up front
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

# Train the model
model.fit(x_train, y_train,
          batch_size=128,
          epochs=10,
          verbose=1,
          validation_data=(x_test, y_test))

# Evaluate the model
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step
Epoch 1/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 9s 9ms/step - accuracy: 0.8571 - loss: 0.4956 - val_accuracy: 0.9811 - val_loss: 0.0626
Epoch 2/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9807 - loss: 0.0626 - val_accuracy: 0.9859 - val_loss: 0.0455
Epoch 3/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.9871 - loss: 0.0412 - val_accuracy: 0.9884 - val_loss: 0.0360
Epoch 4/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9902 - loss: 0.0305 - val_accuracy: 0.9890 - val_loss: 0.0327
Epoch 5/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - accuracy: 0.9921 - loss: 0.0260 - val_accuracy: 0.9893 - val_loss: 0.0334
Epoch 6/10
469/469 ━━━━━━━━━━━━━━━━━━━━ 3s 4ms/step - accuracy: 0.9943 - loss: 0.0184 - val_accuracy: 0.9914 - val_loss: 0.0273
Epoch 7/10
469/469

## Question 7: Load and preprocess the CIFAR-10 dataset using Keras, and create a CNN model to classify RGB images. Show your preprocessing and architecture.

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Preprocessing
# Normalize pixel values to be between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode the labels
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

print("CIFAR-10 dataset loaded and preprocessed.")
print("x_train shape:", x_train.shape)
print("y_train shape:", y_train.shape)
print("x_test shape:", x_test.shape)
print("y_test shape:", y_test.shape)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 ━━━━━━━━━━━━━━━━━━━━ 4s 0us/step
CIFAR-10 dataset loaded and preprocessed.
x_train shape: (50000, 32, 32, 3)
y_train shape: (50000, 10)
x_test shape: (10000, 32, 32, 3)
y_test shape: (10000, 10)


In [None]:
# Build the CNN model
model_cifar10 = Sequential()

# Convolutional block 1
model_cifar10.add(tf.keras.Input(shape=(32, 32, 3)))  # 32x32 RGB images
model_cifar10.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model_cifar10.add(Conv2D(32, (3, 3), activation='relu', padding='same'))
model_cifar10.add(MaxPooling2D(pool_size=(2, 2)))
model_cifar10.add(Dropout(0.25))

# Convolutional block 2
model_cifar10.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model_cifar10.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model_cifar10.add(MaxPooling2D(pool_size=(2, 2)))
model_cifar10.add(Dropout(0.25))

# Flattening
model_cifar10.add(Flatten())

# Dense layers
model_cifar10.add(Dense(512, activation='relu'))
model_cifar10.add(Dropout(0.5))
model_cifar10.add(Dense(10, activation='softmax'))

# Compile the model
model_cifar10.compile(loss='categorical_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])

# Show model architecture
model_cifar10.summary()


In [None]:
# Train the model
history = model_cifar10.fit(x_train, y_train,
                            batch_size=64,
                            epochs=20,
                            verbose=1,
                            validation_data=(x_test, y_test),
                            shuffle=True)

# Evaluate the model
score = model_cifar10.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Epoch 1/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - accuracy: 0.8903 - loss: 0.3140 - val_accuracy: 0.7963 - val_loss: 0.6617
Epoch 2/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.8908 - loss: 0.3046 - val_accuracy: 0.8015 - val_loss: 0.6581
Epoch 3/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 6s 7ms/step - accuracy: 0.8941 - loss: 0.2993 - val_accuracy: 0.7974 - val_loss: 0.6792
Epoch 4/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.8961 - loss: 0.2963 - val_accuracy: 0.7984 - val_loss: 0.6812
Epoch 5/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 10s 7ms/step - accuracy: 0.8998 - loss: 0.2826 - val_accuracy: 0.7932 - val_loss: 0.6936
Epoch 6/20
782/782 ━━━━━━━━━━━━━━━━━━━━ 11s 7ms/step - accuracy: 0.9012 - loss: 0.2726 - val_accuracy: 0.7992 - val_loss: 0.6888
Epoch 7/20
782/782

## Question 8: Using PyTorch, write a script to define and train a CNN on the MNIST dataset. Include model definition, data loaders, training loop, and accuracy evaluation.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Define the CNN model
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Calculate the size of the flattened layer
        # Input size is 28x28. After conv1 (3x3, padding=1) it's 28x28.
        # After conv2 (3x3, padding=1) it's 28x28.
        # After pool (2x2, stride=2) it's 14x14.
        self.fc1 = nn.Linear(64 * 14 * 14, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()
        self.flatten = nn.Flatten()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool(self.relu(self.conv2(x)))
        x = self.flatten(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Load and preprocess the MNIST dataset
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)

print("MNIST dataset loaded and preprocessed for PyTorch.")

# Select the device, then create the model, loss function, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_pytorch = Net().to(device)
criterion = nn.CrossEntropyLoss()  # expects raw logits, so the model has no softmax layer
optimizer = optim.Adam(model_pytorch.parameters(), lr=0.001)

# Training loop
epochs = 10

for epoch in range(epochs):
    model_pytorch.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model_pytorch(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * images.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    print(f"Epoch {epoch+1}/{epochs}, Loss: {epoch_loss:.4f}")

print("Finished Training")

# Evaluate the model
model_pytorch.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model_pytorch(images)
        predicted = outputs.argmax(dim=1)  # index of the largest logit is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Accuracy of the PyTorch model on the 10000 test images: {accuracy:.2f}%')

MNIST dataset loaded and preprocessed for PyTorch.
Epoch 1/10, Loss: 0.1124
Epoch 2/10, Loss: 0.0353
Epoch 3/10, Loss: 0.0214
Epoch 4/10, Loss: 0.0154
Epoch 5/10, Loss: 0.0092
Epoch 6/10, Loss: 0.0110
Epoch 7/10, Loss: 0.0066
Epoch 8/10, Loss: 0.0060
Epoch 9/10, Loss: 0.0048
Epoch 10/10, Loss: 0.0062
Finished Training
Accuracy of the PyTorch model on the 10000 test images: 99.22%


## Question 9: Given a custom image dataset stored in a local directory, write code using Keras ImageDataGenerator to preprocess and train a CNN model.

In [None]:
import os
import numpy as np
from PIL import Image

# Create dummy directories for a simple dataset (e.g., two classes: 'class_a' and 'class_b')
base_dir = './dummy_dataset'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

train_class_a_dir = os.path.join(train_dir, 'class_a')
train_class_b_dir = os.path.join(train_dir, 'class_b')
validation_class_a_dir = os.path.join(validation_dir, 'class_a')
validation_class_b_dir = os.path.join(validation_dir, 'class_b')

# Create the directories if they don't exist
os.makedirs(train_class_a_dir, exist_ok=True)
os.makedirs(train_class_b_dir, exist_ok=True)
os.makedirs(validation_class_a_dir, exist_ok=True)
os.makedirs(validation_class_b_dir, exist_ok=True)

# Create some dummy images (simple colored squares)
def create_dummy_image(directory, filename, color, size=(50, 50)):
    img = Image.new('RGB', size, color=color)
    img.save(os.path.join(directory, filename))

# Create dummy images for training
for i in range(10):
    create_dummy_image(train_class_a_dir, f'image_a_{i}.png', color='red')
    create_dummy_image(train_class_b_dir, f'image_b_{i}.png', color='blue')

# Create dummy images for validation
for i in range(5):
    create_dummy_image(validation_class_a_dir, f'image_a_{i}.png', color='red')
    create_dummy_image(validation_class_b_dir, f'image_b_{i}.png', color='blue')

print("Dummy dataset created.")

Dummy dataset created.


In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Define the path to the dummy dataset
base_dir = './dummy_dataset'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

# Set up data generators for training and validation
train_datagen = ImageDataGenerator(rescale=1./255) # Only rescaling for this simple example

validation_datagen = ImageDataGenerator(rescale=1./255)

# Flow training images in batches from the directory
train_generator = train_datagen.flow_from_directory(
    train_dir,                 # Path to the training directory
    target_size=(50, 50),      # Resize images to match dummy image size
    batch_size=4,
    class_mode='categorical')  # Use categorical labels for classification

# Flow validation images in batches from the directory
validation_generator = validation_datagen.flow_from_directory(
    validation_dir,            # Path to the validation directory
    target_size=(50, 50),
    batch_size=4,
    class_mode='categorical')

Found 20 images belonging to 2 classes.
Found 10 images belonging to 2 classes.


In [None]:
# Build a simple CNN model
model_custom_data = Sequential()

model_custom_data.add(tf.keras.Input(shape=(50, 50, 3)))  # must match target_size of the generators
model_custom_data.add(Conv2D(16, (3, 3), activation='relu'))
model_custom_data.add(MaxPooling2D(pool_size=(2, 2)))

model_custom_data.add(Conv2D(32, (3, 3), activation='relu'))
model_custom_data.add(MaxPooling2D(pool_size=(2, 2)))

model_custom_data.add(Flatten())
model_custom_data.add(Dense(64, activation='relu'))
model_custom_data.add(Dense(train_generator.num_classes, activation='softmax')) # Output layer with number of classes

# Compile the model
model_custom_data.compile(loss='categorical_crossentropy',
                          optimizer='adam',
                          metrics=['accuracy'])

# Show model architecture
model_custom_data.summary()


In [None]:
# Train the model using the data generators
epochs = 5 # Train for a few epochs on the dummy data

history_custom_data = model_custom_data.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size)

Epoch 1/5
5/5 ━━━━━━━━━━━━━━━━━━━━ 3s 158ms/step - accuracy: 0.7931 - loss: 0.4465 - val_accuracy: 1.0000 - val_loss: 0.0417
Epoch 2/5
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 57ms/step - accuracy: 1.0000 - loss: 0.0255 - val_accuracy: 1.0000 - val_loss: 0.0020
Epoch 3/5
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 1.0000 - loss: 7.6240e-04 - val_accuracy: 1.0000 - val_loss: 4.7324e-05
Epoch 4/5
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 14ms/step - accuracy: 1.0000 - loss: 1.9851e-05 - val_accuracy: 1.0000 - val_loss: 3.0398e-06
Epoch 5/5
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 12ms/step - accuracy: 1.0000 - loss: 2.6766e-06 - val_accuracy: 1.0000 - val_loss: 4.1723e-07


## Question 10: You are working on a web application for a medical imaging startup. Your task is to build and deploy a CNN model that classifies chest X-ray images into "Normal" and "Pneumonia" categories. Describe your end-to-end approach, from data preparation and model training to deploying the model as a web app using Streamlit.

Ans: Here's an end-to-end approach for building and deploying a CNN model to classify chest X-ray images as "Normal" or "Pneumonia," and serving it as a web application using Streamlit:

**1. Data Preparation:**

*   **Dataset Acquisition:** Obtain a suitable dataset of chest X-ray images with labels for "Normal" and "Pneumonia." Publicly available datasets like the ChestX-ray8 dataset or the Kaggle Chest X-Ray Images (Pneumonia) dataset can be used. Ensure you have appropriate licenses for the data.
*   **Data Loading and Exploration:** Load the images and their corresponding labels. Explore the dataset to understand the class distribution, image dimensions, and potential data imbalances.
*   **Data Preprocessing:**
    *   **Resizing:** Resize all images to a consistent size suitable for the CNN model (e.g., 224x224 pixels).
    *   **Normalization:** Normalize pixel values to a standard range (e.g., 0-1) by dividing by 255.
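    *   **Data Augmentation:** Apply label-preserving augmentations (e.g., small rotations, zooming, shifts) to the training set only, to increase its diversity; this helps improve model generalization and reduce overfitting. Use horizontal flips with care, since they mirror the anatomy.
*   **Splitting Data:** Split the dataset into training, validation, and testing sets before applying any augmentation. A common split is 70-15-15 or 80-10-10. If several images come from the same patient, keep them in the same split to avoid leakage.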
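
A 70-15-15 split that preserves the class balance in each subset could be sketched as follows. This is a minimal, dependency-free illustration: the function name `stratified_split`, the fixed seed, and the made-up label list are assumptions, and a real project would more likely use an existing library routine and split by patient where patient IDs are available.

```python
import random
from collections import defaultdict


def stratified_split(labels, percents=(70, 15, 15), seed=42):
    """Return (train, val, test) index lists with similar class proportions.

    `percents` are integer percentages for train and validation; whatever
    remains of each class goes to the test set.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = len(indices) * percents[0] // 100
        n_val = len(indices) * percents[1] // 100
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Hypothetical imbalanced dataset: 100 "Normal" and 300 "Pneumonia" images.
labels = ["Normal"] * 100 + ["Pneumonia"] * 300
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))  # 280 60 60
```

Splitting per class keeps the 1:3 ratio of the example in all three subsets, so validation and test metrics are measured on the same class mix the model was trained on.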

**2. Model Building and Training:**

*   **Choose a CNN Architecture:** Select an appropriate CNN architecture. You can build a model from scratch or use a pre-trained model (transfer learning) like VGG16, ResNet, or EfficientNet, fine-tuning it on your chest X-ray dataset. Transfer learning is often beneficial for medical imaging tasks with limited data.
*   **Model Definition:** Define the CNN model using a framework like TensorFlow/Keras or PyTorch. The model will typically consist of convolutional layers, pooling layers, activation functions (e.g., ReLU), and fully connected layers for classification, with a final output layer that is either a single sigmoid unit (the probability of "Pneumonia") or two softmax units (one each for "Normal" and "Pneumonia").
*   **Compilation:** Compile the model with a loss function that matches the output layer (binary cross-entropy for a single sigmoid unit, categorical cross-entropy for two softmax units), an optimizer (e.g., Adam), and evaluation metrics (e.g., accuracy, precision, recall, F1-score).
*   **Training:** Train the model on the training data, using the validation data to monitor performance and prevent overfitting. Use techniques like early stopping and learning rate scheduling if needed.

**3. Model Evaluation:**

*   **Evaluate on Test Set:** Evaluate the trained model on the held-out test set to assess its performance on unseen data. Calculate relevant metrics to understand the model's strengths and weaknesses.
*   **Confusion Matrix:** Generate a confusion matrix to visualize the model's predictions and identify misclassifications.
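
The metrics mentioned above follow directly from the four cells of a binary confusion matrix. Below is a minimal sketch, treating "Pneumonia" as the positive class; the function name `binary_metrics` and the counts are made up for illustration and are not results from any real model.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (sensitivity), specificity, and F1.

    tp: pneumonia correctly flagged     fp: normal flagged as pneumonia
    fn: pneumonia missed                tn: normal correctly cleared
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Illustrative counts only: 100 pneumonia cases and 100 normal cases.
metrics = binary_metrics(tp=90, fp=30, fn=10, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.800
# precision: 0.750
# recall: 0.900
# specificity: 0.700
# f1: 0.818
```

For a screening task like this one, recall on the "Pneumonia" class (how many true cases are caught) is usually worth reporting alongside accuracy, because accuracy alone can look good on an imbalanced dataset while many cases are missed.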

**4. Model Deployment (Streamlit Web App):**

*   **Save the Trained Model:** Save the trained CNN model in a format suitable for deployment (e.g., Keras `.keras` or legacy `.h5`, TensorFlow SavedModel, PyTorch `.pth`).
*   **Build the Streamlit App:**
    *   **Install Streamlit:** Install the Streamlit library (`pip install streamlit`).
    *   **Create a Python Script:** Write a Python script using Streamlit to create the web application.
    *   **Load the Model:** Load the saved trained model within the Streamlit script.
    *   **File Uploader:** Add a file uploader component to allow users to upload chest X-ray images.
    *   **Image Preprocessing:** Implement the same preprocessing steps used during training for the uploaded image.
    *   **Prediction:** Use the loaded model to predict the class (Normal or Pneumonia) of the preprocessed image.
    *   **Display Results:** Display the uploaded image and the model's prediction to the user.
*   **Deployment:**
    *   **Local Deployment:** Run the Streamlit app locally using `streamlit run your_app_script.py`.
    *   **Cloud Deployment:** Deploy the Streamlit app to a cloud platform (e.g., Streamlit Community Cloud, Heroku, Google Cloud Platform, AWS). This typically involves creating a `requirements.txt` file listing dependencies and configuring the deployment environment.

**End-to-end Flow in the Web App:**

1.  User uploads a chest X-ray image through the Streamlit interface.
2.  The Streamlit app receives the image and preprocesses it.
3.  The preprocessed image is fed into the loaded CNN model.
4.  The model predicts whether the image shows a "Normal" chest or "Pneumonia."
5.  The Streamlit app displays the original image and the prediction result to the user.

This end-to-end approach covers the key steps from data handling and model development to creating an interactive web application for practical use.