# <font color="#418FDE" size="6.5" uppercase>**CNNs from Scratch**</font>

>Last update: 20260129.
    
By the end of this Lecture, you will be able to:
- Construct a convolutional neural network using Conv2d, pooling, and fully connected layers in PyTorch. 
- Train the CNN on a small image dataset using the standard training loop and appropriate loss and metrics. 
- Analyze model performance using accuracy, confusion matrices, and simple error inspection. 


## **1. Building CNN Layers**

### **1.1. Understanding Conv2d Kernels**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_01_01.jpg?v=1769713549" width="250">



>* A kernel is a small grid of weights that slides across the image
>* During training, kernels learn to respond to patterns such as edges

>* A layer has many kernels; each spans all input channels over a small region
>* Different kernels learn to detect different patterns, from edges to object parts

>* Kernel size, stride, and padding determine the feature map size
>* These choices control which patterns are detected and how much detail survives
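
The size arithmetic behind kernel size, stride, and padding is `out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1`, the per-axis formula PyTorch documents for `nn.Conv2d`. Below is a minimal, dependency-free sketch assuming square kernels and symmetric padding; `conv_output_size` is a hypothetical helper name, and the same arithmetic applies to pooling windows.

```python
def conv_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis for a conv or pooling window."""
    # Hypothetical helper mirroring the nn.Conv2d shape formula (floor division).
    effective_kernel = dilation * (kernel - 1) + 1
    if size + 2 * padding < effective_kernel:
        raise ValueError("Kernel does not fit inside the padded input.")
    return (size + 2 * padding - effective_kernel) // stride + 1


# 5x5 input, 3x3 kernel, stride 1, no padding (the NumPy demo below): 3x3 output.
print(conv_output_size(5, 3))
# Padding 1 keeps a 28-pixel axis at 28 with a 3x3 kernel and stride 1.
print(conv_output_size(28, 3, padding=1))
# Stride 2 roughly halves the axis.
print(conv_output_size(28, 3, stride=2, padding=1))
# A 2x2 pooling window with stride 2 halves 26 to 13.
print(conv_output_size(26, 2, stride=2))
```

For odd kernel sizes with stride 1, choosing `padding = (kernel - 1) // 2` preserves the spatial size, which is why 3x3 kernels are often paired with padding 1.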



In [None]:
#@title Python Code - Understanding Conv2d Kernels

# This script applies a Conv2d-style kernel by hand.
# It helps beginners see how a kernel slides over an image.
# We use NumPy only and keep outputs small.

# import required library for arrays.
import numpy as np

# set deterministic random seed for reproducibility.
np.random.seed(42)

# create a tiny grayscale image with simple pattern.
image = np.array(
    [[0, 0, 0, 0, 0],
     [0, 1, 1, 1, 0],
     [0, 1, 2, 1, 0],
     [0, 1, 1, 1, 0],
     [0, 0, 0, 0, 0]],
    dtype=float,
)

# define a horizontal edge kernel: responds where intensity increases downward.
kernel = np.array(
    [[-1, -1, -1],
     [0, 0, 0],
     [1, 1, 1]],
    dtype=float,
)

# compute output spatial size for valid convolution.
out_height = image.shape[0] - kernel.shape[0] + 1
out_width = image.shape[1] - kernel.shape[1] + 1

# validate that output size is positive.
assert out_height > 0 and out_width > 0

# allocate output feature map for convolution result.
output = np.zeros((out_height, out_width), dtype=float)

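# perform manual convolution with stride one and no padding.
# like PyTorch's Conv2d, this is cross-correlation: the kernel is not flipped.
for i in range(out_height):
    for j in range(out_width):
        # extract current image patch under kernel.
        patch = image[i : i + kernel.shape[0], j : j + kernel.shape[1]]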

        # compute elementwise product and sum.
        value = np.sum(patch * kernel)

        # store result in output feature map.
        output[i, j] = value

# print original image to inspect local pattern.
print("Input image (5x5 pixels):")
print(image)

# print kernel values (hand-set here; a trained Conv2d layer learns its own).
print("\nKernel (3x3 horizontal edge detector):")
print(kernel)

# print resulting feature map after convolution.
print("\nOutput feature map (3x3):")
print(output)




### **1.2. Understanding Pooling Layers**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_01_02.jpg?v=1769713603" width="250">



>* Pooling shrinks feature maps while keeping key information
>* Sliding windows summarize regions, adding shift robustness

>* Max pooling keeps the strongest activation in each region
>* Average pooling keeps the mean, giving smoother summaries

>* Pooling cuts computation and speeds up models
>* Pooling adds robustness to small image changes
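
To make the two pooling rules concrete, here is a pure-Python sketch of non-overlapping 2x2 pooling (window 2, stride 2, no padding), the configuration that `nn.MaxPool2d(2)` and `nn.AvgPool2d(2)` use in PyTorch. The helper `pool2d` and its `mode` argument are hypothetical names for illustration; the input is the same 4x4 map used in the TensorFlow demo below.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Valid (no padding) pooling over a 2D list of numbers."""
    if mode not in ("max", "avg"):
        raise ValueError("mode must be 'max' or 'avg'")
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height - size + 1, stride):
        row = []
        for j in range(0, width - size + 1, stride):
            # Gather the size x size window whose top-left corner is (i, j).
            window = [feature_map[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        pooled.append(row)
    return pooled


example_map = [
    [1.0, 2.0, 0.5, 0.0],
    [3.0, 4.0, 1.0, 0.5],
    [0.0, 1.0, 2.0, 3.0],
    [0.5, 1.5, 2.5, 4.0],
]
print("Max pooled:", pool2d(example_map, mode="max"))  # [[4.0, 1.0], [1.5, 4.0]]
print("Avg pooled:", pool2d(example_map, mode="avg"))  # [[2.5, 0.5], [0.75, 2.875]]
```

Note how the bottom-left block keeps 1.5 under max pooling but drops to 0.75 under average pooling: max pooling reports whether a feature fired anywhere in the window, while averaging dilutes it with weaker neighbors.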



In [None]:
#@title Python Code - Understanding Pooling Layers

# This script explains pooling layers visually.
# It uses TensorFlow to simulate simple pooling.
# Run all cells to see printed comparisons.

# !pip install tensorflow==2.20.0.

# Import required libraries safely.
import os
import random
import numpy as np
import tensorflow as tf

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Create a tiny example feature map.
example_map = np.array(
    [
        [1.0, 2.0, 0.5, 0.0],
        [3.0, 4.0, 1.0, 0.5],
        [0.0, 1.0, 2.0, 3.0],
        [0.5, 1.5, 2.5, 4.0],
    ],
    dtype=np.float32,
)

# Validate the feature map shape.
if example_map.shape != (4, 4):
    raise ValueError("Example map must be shape (4,4).")

# Add batch and channel dimensions.
input_tensor = example_map[np.newaxis, ..., np.newaxis]

# Confirm the tensor has correct shape.
if input_tensor.shape != (1, 4, 4, 1):
    raise ValueError("Input tensor must be shape (1,4,4,1).")

# Define a 2x2 max pooling layer.
max_pool = tf.keras.layers.MaxPool2D(
    pool_size=(2, 2),
    strides=(2, 2),
    padding="valid",
)

# Define a 2x2 average pooling layer.
avg_pool = tf.keras.layers.AveragePooling2D(
    pool_size=(2, 2),
    strides=(2, 2),
    padding="valid",
)

# Apply max pooling to the input.
max_pooled = max_pool(input_tensor)

# Apply average pooling to the input.
avg_pooled = avg_pool(input_tensor)

# Convert pooled outputs to numpy arrays.
max_pooled_np = max_pooled.numpy()[0, :, :, 0]
avg_pooled_np = avg_pooled.numpy()[0, :, :, 0]

# Print the original feature map.
print("Original 4x4 feature map:")
print(example_map)

# Print the max pooled result.
print("\nMax pooled 2x2 feature map:")
print(max_pooled_np)

# Print the average pooled result.
print("\nAverage pooled 2x2 feature map:")
print(avg_pooled_np)

# Explain how pooling reduces spatial size.
print("\nEach 2x2 block became one pooled value.")



### **1.3. Flattening and Linear Layers**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_01_03.jpg?v=1769713666" width="250">



>* Flattening turns stacked 2D feature maps into one vector per image
>* These vectors feed linear layers that make predictions

>* Every unit in a fully connected layer sees all flattened features
>* Linear layers combine detected patterns into class scores

>* The flattened size sets the first linear layer's number of inputs
>* Intermediate linear layers refine features before the final class scores
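
The flattened size is just channels × height × width after the last conv or pool layer, so it can be tracked by hand. A minimal sketch, assuming stride-1 unpadded convolutions and non-overlapping pooling that rounds down (the default floor mode); `flatten_size` and its tuple layer specs are hypothetical, not a library API.

```python
def flatten_size(in_channels, height, width, stack):
    """Return channels * height * width after a simple conv/pool stack.

    Each entry in `stack` is a hypothetical tuple:
      ("conv", out_channels, kernel): stride 1, no padding
      ("pool", size): non-overlapping window, stride == size (floor mode)
    """
    channels = in_channels
    for layer in stack:
        if layer[0] == "conv":
            _, channels, kernel = layer
            height, width = height - kernel + 1, width - kernel + 1
        elif layer[0] == "pool":
            size = layer[1]
            height, width = height // size, width // size
        else:
            raise ValueError(f"Unknown layer type: {layer[0]!r}")
        if height < 1 or width < 1:
            raise ValueError("Feature map shrank to zero; check the stack.")
    return channels * height * width


# Same stack as the Keras demo below: conv(8, 3x3) -> pool(2) -> conv(16, 3x3).
mnist_stack = [("conv", 8, 3), ("pool", 2), ("conv", 16, 3)]
print(flatten_size(1, 28, 28, mnist_stack))  # 16 * 11 * 11 = 1936
```

In PyTorch this number would be the `in_features` of the first `nn.Linear` after `nn.Flatten()`; alternatively, `nn.LazyLinear` infers it on the first forward pass.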



In [None]:
#@title Python Code - Flattening and Linear Layers

# This script shows flattening and linear layers.
# We build a tiny CNN using TensorFlow Keras.
# Focus on connecting conv outputs to dense layers.

# !pip install tensorflow==2.20.0.

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras layers.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Select a small subset for speed.
train_samples = 2000
test_samples = 500
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data subset.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Normalize pixel values to range zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolution.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Confirm shapes before building model.
print("Train shape:", x_train.shape)
print("Test shape:", x_test.shape)

# Define input shape for the model.
input_shape = (28, 28, 1)
num_classes = 10

# Build a simple sequential CNN model.
model = keras.Sequential([
    layers.Input(shape=input_shape),
    layers.Conv2D(8, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(16, (3, 3), activation="relu"),
])

# Add flatten layer to unroll feature maps.
model.add(layers.Flatten())

# Add a hidden dense layer with ReLU.
model.add(layers.Dense(32, activation="relu"))

# Add final dense layer for class scores.
model.add(layers.Dense(num_classes, activation="softmax"))

# The model is already built because it starts with an Input layer.
# Print the parameter count instead of the full summary table.
print("Total parameters:", model.count_params())
print("Model built with Flatten and Dense layers.")

# Compile model with optimizer and loss.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train model briefly with silent output.
history = model.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate model on test subset.
test_loss, test_acc = model.evaluate(
    x_test,
    y_test,
    verbose=0,
)

# Print evaluation results clearly.
print("Test loss:", round(float(test_loss), 4))
print("Test accuracy:", round(float(test_acc), 4))

# Locate layers by type rather than by fragile list index.
last_conv = [layer for layer in model.layers if isinstance(layer, layers.Conv2D)][-1]
flatten_layer = [layer for layer in model.layers if isinstance(layer, layers.Flatten)][0]
first_dense = [layer for layer in model.layers if isinstance(layer, layers.Dense)][0]

# Build a sub-model that stops at the last convolution.
conv_output_model = keras.Model(
    inputs=model.inputs,
    outputs=last_conv.output,
)

# Take one sample batch from test set.
sample_batch = x_test[:1]

# Get convolutional feature maps.
feature_maps = conv_output_model.predict(sample_batch, verbose=0)

# Apply the Flatten layer to the feature maps.
flattened_vector = flatten_layer(feature_maps).numpy()

# Print shapes before and after flattening.
print("Feature maps shape:", feature_maps.shape)
print("Flattened vector shape:", flattened_vector.shape)

# Confirm flattened size equals the first Dense layer's input units.
print("Dense input units:", first_dense.kernel.shape[0])



## **2. Training Vision CNNs**

### **2.1. Image Normalization Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_02_01.jpg?v=1769713801" width="250">



>* Normalize pixels to a stable, consistent value range
>* This usually stabilizes training and often improves final performance

>* Standardize each RGB channel to zero mean and unit variance
>* This reduces sensitivity to overall brightness, helping the model focus on shapes

>* Apply the same training-set statistics at test and deployment time
>* Consistent preprocessing keeps inputs on one scale, stabilizing training over epochs
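
The key step is computing per-channel statistics once on training data and reusing them everywhere. This is what torchvision's `transforms.Normalize(mean, std)` applies as `(x - mean) / std` per channel. The pure-Python sketch below assumes images are lists of `(r, g, b)` tuples already scaled to [0, 1]; `channel_stats`, `normalize`, and the toy pixel values are illustrative, not taken from any dataset.

```python
import math


def channel_stats(images):
    """Per-channel mean and population std over every pixel of every image.

    `images` is a list of images; each image is a list of (r, g, b) tuples in [0, 1].
    """
    num_channels = len(images[0][0])
    values = [[px[c] for img in images for px in img] for c in range(num_channels)]
    means = [sum(v) / len(v) for v in values]
    stds = [math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
            for v, m in zip(values, means)]
    return means, stds


def normalize(image, means, stds, eps=1e-8):
    """Standardize one image channel by channel with precomputed statistics."""
    return [tuple((px[c] - means[c]) / (stds[c] + eps) for c in range(len(means)))
            for px in image]


# Toy "training set": two identical 2-pixel RGB images (illustrative values).
train_images = [
    [(0.0, 0.2, 0.4), (1.0, 0.4, 0.8)],
    [(0.0, 0.2, 0.4), (1.0, 0.4, 0.8)],
]
means, stds = channel_stats(train_images)
print("Train means:", [round(m, 3) for m in means])
print("Train stds:", [round(s, 3) for s in stds])

# A "test" image is normalized with the training statistics, not its own.
test_image = [(0.5, 0.3, 0.6)]
print("Normalized test pixel:", [round(x, 3) for x in normalize(test_image, means, stds)[0]])
```

The small `eps` guards against division by zero for a constant channel; recomputing statistics on each test batch instead would silently shift inputs between training and deployment.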



In [None]:
#@title Python Code - Image Normalization Basics

# This script shows basic image normalization concepts.
# We use TensorFlow to load and normalize images.
# Focus is on simple, clear, and short code.

# !pip install tensorflow==2.20.0.

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow.keras import datasets

# Set deterministic random seeds for reproducibility.
SEED_VALUE = 42
random.seed(SEED_VALUE)
np.random.seed(SEED_VALUE)

# Set TensorFlow random seed for reproducibility.
tf.random.set_seed(SEED_VALUE)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load CIFAR10 dataset with small color images.
(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

# Confirm dataset shapes before normalization.
print("Train shape:", x_train.shape, "Test shape:", x_test.shape)

# Select a tiny subset for quick demonstration.
subset_size = 8
x_small = x_train[:subset_size]

# Copy subset for normalized version comparison.
x_small_norm = x_small.astype("float32")

# Show original pixel range for the subset.
print("Original min:", x_small.min(), "max:", x_small.max())

# Scale pixels from integers to range zero one.
x_small_norm = x_small_norm / 255.0

# Compute per channel mean over the image, height, and width axes.
channel_means = x_small_norm.mean(axis=(0, 1, 2))

# Compute standard deviation for each color channel.
channel_stds = x_small_norm.std(axis=(0, 1, 2))

# Standardize each channel using mean and standard deviation.
x_small_norm = (x_small_norm - channel_means) / channel_stds

# Verify normalized subset shape matches original subset.
assert x_small_norm.shape == x_small.shape

# Compute new per channel mean after normalization.
new_means = x_small_norm.mean(axis=(0, 1, 2))

# Compute new per channel standard deviation values.
new_stds = x_small_norm.std(axis=(0, 1, 2))

# Print channel means of the scaled subset before standardization.
print("Means before standardization:", channel_means)

# Print new channel statistics after normalization.
print("New means:", new_means)

# Print new standard deviations after normalization.
print("New stds:", new_stds)

# Show a few sample normalized pixel values.
print("Sample normalized pixels:", x_small_norm[0, 0, 0, :])




### **2.2. CrossEntropy Loss for Labels**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_02_02.jpg?v=1769713832" width="250">



>* Cross entropy compares predictions to true class labels
>* It outputs a single loss minimized during training

>* Loss is high when the correct class gets low probability
>* Loss shrinks as confidence in the correct class grows

>* Pass raw logits and integer labels; the loss applies softmax internally
>* Average the loss over the batch, then backpropagate to raise correct-class probabilities
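
PyTorch's `nn.CrossEntropyLoss` takes raw logits and integer class labels and, by default, averages over the batch. The sketch below reproduces that computation in plain Python using the log-sum-exp trick for numerical stability; the function names are hypothetical and this is not PyTorch's actual implementation.

```python
import math


def cross_entropy_from_logits(logits, label):
    """-log(softmax(logits)[label]) for one example, via log-sum-exp."""
    # Subtracting the max logit keeps exp() from overflowing.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


def mean_cross_entropy(batch_logits, labels):
    """Average per-example loss, like reduction="mean" (PyTorch's default)."""
    losses = [cross_entropy_from_logits(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


logits = [2.0, 0.5, -1.0]  # raw scores for 3 classes (illustrative)
print("Correct class 0:", round(cross_entropy_from_logits(logits, 0), 4))  # low loss
print("Correct class 2:", round(cross_entropy_from_logits(logits, 2), 4))  # high loss
print("Uniform logits:", round(cross_entropy_from_logits([0.0, 0.0, 0.0], 0), 4))  # log(3)
print("Batch mean:", round(mean_cross_entropy([logits, logits], [0, 2]), 4))
```

With equal logits the loss equals log(number of classes), a handy sanity check for an untrained model's first-batch loss.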



In [None]:
#@title Python Code - CrossEntropy Loss for Labels

# This script shows cross entropy loss usage.
# It uses TensorFlow for a tiny CNN example.
# Focus is on labels and logits representation.

# !pip install tensorflow==2.20.0.

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras submodules.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Select a small subset for speed.
num_train = 2000
num_test = 500
x_train = x_train[:num_train]
y_train = y_train[:num_train]

# Slice test data subset.
x_test = x_test[:num_test]
y_test = y_test[:num_test]

# Normalize pixel values to range zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for CNN input.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Confirm shapes are as expected.
print("Train shape:", x_train.shape, y_train.shape)

# Build a simple CNN model.
model = keras.Sequential([
    layers.Conv2D(8, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(10)
])

# Show that final layer outputs logits.
print("Output shape (logits):", model.output_shape)

# Define sparse categorical crossentropy loss.
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Compile model with accuracy metric.
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

# Train model briefly with silent verbose.
history = model.fit(x_train, y_train, epochs=3, batch_size=64, verbose=0)

# Evaluate model on test subset.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)

# Print loss and accuracy summary.
print("Test loss (cross entropy):", float(test_loss))
print("Test accuracy:", float(test_acc))

# Take a small batch for manual loss demo.
batch_images = x_test[:4]
batch_labels = y_test[:4]

# Get raw logits from the model.
logits = model(batch_images, training=False)

# Compute mean cross entropy loss over the batch.
batch_loss = loss_fn(batch_labels, logits).numpy()

# Print labels and corresponding loss value.
print("Batch labels:", batch_labels)
print("Batch cross entropy loss:", float(batch_loss))

# Confirm logits are not probabilities yet.
print("Logits sample:", logits[0].numpy())



### **2.3. Accuracy Metric Setup**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_02_03.jpg?v=1769713901" width="250">



>* Accuracy is the fraction of images classified correctly
>* Compute batch accuracy to track learning progress

>* Count correct predictions across all batches in an epoch
>* Epoch accuracy is more stable; measure it on validation data to check generalization

>* Accuracy is simple but has important limits
>* Use extra metrics for imbalanced, high-stakes tasks
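
Summing correct counts across batches gives the exact epoch accuracy, whereas averaging per-batch accuracies overweights a smaller final batch. A pure-Python sketch with toy scores (in PyTorch the same idea is usually written with `outputs.argmax(dim=1) == labels` followed by `.sum().item()`); `count_correct` is a hypothetical helper.

```python
def count_correct(scores_batch, labels_batch):
    """Number of rows whose highest score matches the label."""
    correct = 0
    for scores, label in zip(scores_batch, labels_batch):
        predicted = max(range(len(scores)), key=scores.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct


# Two toy batches of class scores; the last batch is smaller, as often happens.
batches = [
    ([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]], [0, 1, 1, 1]),
    ([[0.7, 0.3], [0.4, 0.6]], [1, 0]),
]

total_correct, total_seen, batch_accs = 0, 0, []
for scores_batch, labels_batch in batches:
    correct = count_correct(scores_batch, labels_batch)
    total_correct += correct
    total_seen += len(labels_batch)
    batch_accs.append(correct / len(labels_batch))

print("Epoch accuracy (pooled counts):", total_correct / total_seen)  # 0.5
print("Mean of batch accuracies:", sum(batch_accs) / len(batch_accs))  # 0.375
```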



In [None]:
#@title Python Code - Accuracy Metric Setup

# This script shows accuracy metric setup.
# We use TensorFlow to mimic PyTorch ideas.
# Focus is computing batch and epoch accuracy.

# !pip install tensorflow==2.20.0.

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version once.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Select a small subset for speed.
train_samples = 2000
test_samples = 500
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test subset for evaluation.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Normalize pixel values to zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for convolution.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Confirm shapes are as expected.
assert x_train.shape[1:] == (28, 28, 1)
assert x_test.shape[1:] == (28, 28, 1)

# Build a small CNN classification model.
model = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Define the loss and optimizer used by the custom training loop.
loss_fn = keras.losses.SparseCategoricalCrossentropy()
optimizer = keras.optimizers.Adam(learning_rate=0.001)

# Create TensorFlow datasets for batching.
batch_size = 64
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_ds = train_ds.shuffle(buffer_size=train_samples, seed=seed_value)
train_ds = train_ds.batch(batch_size)

# Prepare test dataset for evaluation.
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_ds = test_ds.batch(batch_size)

# Define a function to compute batch accuracy.
# argmax picks the same class for logits or softmax probabilities.
def batch_accuracy(logits_batch, labels_batch):
    # Get predicted class indices.
    preds = tf.argmax(logits_batch, axis=1, output_type=tf.int32)
    # Compare predictions with true labels.
    matches = tf.equal(preds, tf.cast(labels_batch, tf.int32))
    # Compute mean accuracy for batch.
    return tf.reduce_mean(tf.cast(matches, tf.float32))

# Train for a few epochs manually.
epochs = 2
for epoch in range(epochs):
    # Reset running counters each epoch.
    epoch_correct = 0
    epoch_total = 0
    for images, labels in train_ds:
        # Run forward pass; the model outputs softmax probabilities.
        with tf.GradientTape() as tape:
            probs = model(images, training=True)
            loss_value = loss_fn(labels, probs)
        # Apply gradients to update weights.
        grads = tape.gradient(loss_value, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # Compute batch accuracy using helper.
        acc_value = batch_accuracy(probs, labels)
        batch_size_now = int(images.shape[0])
        correct_now = float(acc_value) * batch_size_now

        # Update running epoch totals.
        epoch_correct += correct_now
        epoch_total += batch_size_now

    # Compute epoch accuracy from totals.
    epoch_acc = epoch_correct / epoch_total
    print("Epoch", epoch + 1, "train accuracy:", round(epoch_acc, 4))

# Evaluate accuracy on test split.
all_correct = 0
all_total = 0
for images, labels in test_ds:
    # Forward pass in inference mode.
    logits = model(images, training=False)
    acc_value = batch_accuracy(logits, labels)
    batch_size_now = int(images.shape[0])
    correct_now = float(acc_value) * batch_size_now

    # Update running test totals.
    all_correct += correct_now
    all_total += batch_size_now

# Compute final test accuracy value.
test_accuracy = all_correct / all_total
print("Test accuracy:", round(test_accuracy, 4))



## **3. Evaluating CNN Performance**

### **3.1. Confusion Matrix Basics**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_03_01.jpg?v=1769713992" width="250">



>* A confusion matrix counts each true class against each predicted class
>* It reveals which errors occur, not just how many

>* Off-diagonal cells show which classes get mixed up
>* Helps prioritize dangerous errors and guide improvements

>* Confusion matrices reveal imbalance and rare-class errors
>* They guide data, class, and threshold adjustments
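
Accuracy can hide a weak class, and the matrix makes it visible. A minimal pure-Python sketch that builds the matrix (rows are true classes, columns are predictions, matching the NumPy demo below), then reads off per-class recall and the largest off-diagonal cell; all helper names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true class predicted correctly (None if the class is absent)."""
    return [row[i] / sum(row) if sum(row) else None for i, row in enumerate(matrix)]


def most_confused_pair(matrix):
    """(true, predicted, count) for the largest off-diagonal cell."""
    n = len(matrix)
    return max(((t, p, matrix[t][p]) for t in range(n) for p in range(n) if t != p),
               key=lambda cell: cell[2])


# Toy labels: overall accuracy is 8/10, yet class 1 is only half right.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 2, 2, 1, 2, 2]
matrix = confusion_matrix(y_true, y_pred, 3)
print("Matrix:", matrix)
print("Per-class recall:", per_class_recall(matrix))
print("Most confused (true, pred, count):", most_confused_pair(matrix))
```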



In [None]:
#@title Python Code - Confusion Matrix Basics

# This script shows confusion matrix basics.
# We use a tiny CNN on MNIST digits.
# Focus is on evaluation, not training.

# !pip install tensorflow==2.20.0.

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Select a small subset for speed.
train_samples = 4000
test_samples = 1000
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data subset.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Normalize pixel values to zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for CNN.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Validate shapes before modeling.
assert x_train.shape[0] == y_train.shape[0]
assert x_test.shape[0] == y_test.shape[0]

# Build a simple CNN model.
model = keras.Sequential([
    layers.Conv2D(8, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Compile model with optimizer.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train briefly with silent output.
history = model.fit(
    x_train,
    y_train,
    epochs=2,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate accuracy on test subset.
test_loss, test_acc = model.evaluate(
    x_test,
    y_test,
    verbose=0,
)

# Print overall test accuracy.
print("Test accuracy on subset:", round(float(test_acc), 4))

# Get predicted class probabilities.
y_prob = model.predict(x_test, verbose=0)

# Convert probabilities to predicted labels.
y_pred = np.argmax(y_prob, axis=1)

# Validate prediction shape matches labels.
assert y_pred.shape[0] == y_test.shape[0]

# Compute confusion matrix manually.
num_classes = 10
conf_matrix = np.zeros((num_classes, num_classes), dtype=int)

# Fill confusion matrix counts.
for true_label, pred_label in zip(y_test, y_pred):
    conf_matrix[true_label, pred_label] += 1

# Print small confusion matrix summary.
print("Confusion matrix shape:", conf_matrix.shape)
print("Diagonal correct predictions:", np.trace(conf_matrix))

# Show confusion matrix for first three classes.
print("Top left 3x3 confusion block:")
print(conf_matrix[:3, :3])

# Count misclassified examples for inspection.
mis_idx = np.where(y_pred != y_test)[0]
print("Total misclassified examples:", mis_idx.shape[0])

# Inspect first few misclassified pairs.
for i in mis_idx[:5]:
    print("Index", int(i), "true", int(y_test[i]), "pred", int(y_pred[i]))




### **3.2. Reviewing Misclassified Images**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_03_02.jpg?v=1769714077" width="250">



>* Inspect misclassified images with true and predicted labels
>* Decide if errors reflect ambiguity or model weaknesses

>* Compare many misclassified images to find patterns
>* Use patterns to uncover data and preprocessing biases

>* Use misclassified images to guide data improvements
>* Target risky failure modes to build safer models
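
When there are too many errors to view one by one, grouping them by (true, predicted) pair and ranking by model confidence is a quick way to find patterns; confident mistakes often deserve a first look, since they may point to label noise or a systematic confusion. A pure-Python sketch with toy values; `summarize_errors` is a hypothetical helper, and in the code below you might pass `y_test`, `y_pred`, and `probs.max(axis=1)`.

```python
from collections import Counter


def summarize_errors(y_true, y_pred, confidences, top_k=3):
    """Count (true, predicted) error pairs and rank the most confident mistakes.

    `confidences[i]` is the probability the model gave its predicted class for example i.
    """
    errors = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
    pair_counts = Counter((y_true[i], y_pred[i]) for i in errors)
    most_confident = sorted(errors, key=lambda i: confidences[i], reverse=True)[:top_k]
    return pair_counts.most_common(), most_confident


# Toy labels and confidences for illustration, not outputs of the model below.
y_true = [4, 9, 4, 7, 3, 4, 5]
y_pred = [9, 9, 9, 1, 5, 9, 5]
confidences = [0.55, 0.98, 0.91, 0.60, 0.45, 0.70, 0.99]

pairs, top_errors = summarize_errors(y_true, y_pred, confidences)
print("Error pairs (true, pred) by count:", pairs)
print("Most confident mistakes (indices):", top_errors)
```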



In [None]:
#@title Python Code - Reviewing Misclassified Images

# This script reviews misclassified CNN predictions visually.
# We use TensorFlow to train a tiny CNN quickly.
# Then we plot a few misclassified MNIST images.

# !pip install tensorflow==2.20.0.

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic seeds for reproducibility.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version in one short line.
print("TensorFlow version:", tf.__version__)

# Load MNIST dataset from Keras datasets.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()

# Use a small subset for quick training.
train_samples = 800
test_samples = 200
x_train = x_train[:train_samples]
y_train = y_train[:train_samples]

# Slice test data to a manageable subset.
x_test = x_test[:test_samples]
y_test = y_test[:test_samples]

# Validate shapes before further processing.
assert x_train.shape[0] == train_samples
assert x_test.shape[0] == test_samples

# Normalize pixel values to range zero one.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Add channel dimension for CNN input.
x_train = np.expand_dims(x_train, axis=-1)
x_test = np.expand_dims(x_test, axis=-1)

# Confirm final input shape is correct.
input_shape = x_train.shape[1:]
print("Input shape:", input_shape)

# Build a simple sequential CNN model.
model = keras.Sequential([
    layers.Conv2D(8, (3, 3), activation="relu", input_shape=input_shape),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(16, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(32, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Compile model with suitable loss and metric.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Detect available device for information only.
physical_gpus = tf.config.list_physical_devices("GPU")
print("GPUs available:", len(physical_gpus))

# Train the model quietly for few epochs.
history = model.fit(
    x_train,
    y_train,
    epochs=3,
    batch_size=64,
    verbose=0,
    validation_split=0.1,
)

# Evaluate accuracy on the small test subset.
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy on subset:", round(float(acc), 3))

# Get predicted class probabilities for test set.
probs = model.predict(x_test, verbose=0)

# Convert probabilities to predicted class indices.
y_pred = np.argmax(probs, axis=1)

# Identify indices where predictions are incorrect.
mis_idx = np.where(y_pred != y_test)[0]

# Handle the case with no misclassifications.
if mis_idx.size == 0:
    print("No misclassified images found in subset.")

# Select up to nine misclassified indices.
max_examples = 9
selected_idx = mis_idx[:max_examples]

# Print a short textual summary of errors.
print("Total misclassified examples:", int(mis_idx.size))
print("Showing up to", int(selected_idx.size), "examples.")

# Import matplotlib for plotting misclassified images.
import matplotlib.pyplot as plt

# Create a grid with at least one row for visualization.
cols = 3
rows = max(1, int(np.ceil(selected_idx.size / cols)))
fig, axes = plt.subplots(rows, cols, figsize=(6, 2 * rows))

# Ensure axes is always a flat iterable.
axes = np.array(axes).reshape(-1)

# Loop through selected misclassified indices.
for ax, idx in zip(axes, selected_idx):
    # Get image, true label, and predicted label.
    img = x_test[idx].squeeze()
    true_label = int(y_test[idx])
    pred_label = int(y_pred[idx])

    # Show grayscale image on the subplot.
    ax.imshow(img, cmap="gray")

    # Set title to compare true and predicted.
    ax.set_title(f"T:{true_label} P:{pred_label}")

    # Hide axis ticks for clarity.
    ax.axis("off")

# Turn off any unused subplot axes.
for ax in axes[len(selected_idx) :]:
    ax.axis("off")

# Adjust layout and display the figure.
plt.tight_layout()
plt.show()




### **3.3. Spotting Overfitting Patterns**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Master PyTorch 2.10.0/Module_05/Lecture_A/image_03_03.jpg?v=1769714168" width="250">



>* Overfitting means memorizing training quirks, not general patterns
>* Compare train versus validation metrics to detect overfitting

>* Watch for training and validation curves diverging
>* Use early signs to adjust model or regularization

>* Compare performance across varied conditions and subsets
>* Use confusion matrices to reveal brittle, memorized behavior
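
Divergence between the curves can be checked programmatically from per-epoch losses, such as Keras's `history.history["loss"]` and `history.history["val_loss"]` used in the code below (in PyTorch you would collect these lists yourself). A minimal sketch under a simple rule of thumb, which is an assumption rather than a standard test: flag overfitting when validation loss has not improved for `patience` epochs while training loss kept falling. `keras.callbacks.EarlyStopping(monitor="val_loss", patience=...)` automates a similar stopping rule. The curves below are invented toy values, not measured results.

```python
def overfitting_report(train_loss, val_loss, patience=2):
    """Summarize per-epoch loss curves (epochs numbered from 1).

    Returns (best_epoch, final_gap, looks_overfit): looks_overfit is True when
    validation loss has not improved for `patience` epochs while training loss
    kept falling.
    """
    best_idx = min(range(len(val_loss)), key=val_loss.__getitem__)
    epochs_since_best = len(val_loss) - 1 - best_idx
    final_gap = val_loss[-1] - train_loss[-1]
    looks_overfit = (epochs_since_best >= patience
                     and train_loss[-1] < train_loss[best_idx])
    return best_idx + 1, final_gap, looks_overfit


# Toy curves for illustration only, not results from the models in this section.
diverging = ([2.1, 1.6, 1.2, 0.8, 0.5, 0.3], [2.0, 1.7, 1.5, 1.6, 1.8, 2.0])
healthy = ([2.1, 1.8, 1.6, 1.5, 1.4, 1.35], [2.0, 1.8, 1.65, 1.55, 1.5, 1.45])

for name, (train_loss, val_loss) in [("diverging", diverging), ("healthy", healthy)]:
    best_epoch, gap, flag = overfitting_report(train_loss, val_loss)
    print(f"{name}: best val epoch {best_epoch}, final gap {gap:.2f}, overfitting? {flag}")
```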



In [None]:
#@title Python Code - Spotting Overfitting Patterns

# This script shows CNN overfitting patterns simply.
# We use TensorFlow to train tiny image models.
# Then we compare training and validation performance.

# !pip install tensorflow==2.20.0.

# Import required standard libraries.
import os
import random
import numpy as np

# Import TensorFlow and Keras utilities.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Set deterministic random seeds.
seed_value = 42
random.seed(seed_value)
np.random.seed(seed_value)
tf.random.set_seed(seed_value)

# Print TensorFlow version briefly.
print("TensorFlow version:", tf.__version__)

# Load CIFAR10 dataset from Keras.
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Confirm dataset shapes are as expected.
assert x_train.shape[1:] == (32, 32, 3)
assert y_train.shape[1:] == (1,)

# Select small training subset to force overfitting.
small_train_size = 500
x_train_small = x_train[:small_train_size]
y_train_small = y_train[:small_train_size]

# Select validation subset with different images.
val_size = 2000
x_val = x_train[small_train_size:small_train_size + val_size]
y_val = y_train[small_train_size:small_train_size + val_size]

# Normalize pixel values to range zero one.
x_train_small = x_train_small.astype("float32") / 255.0
x_val = x_val.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

# Define a simple CNN model function.
def create_cnn_model():
    model = keras.Sequential([
        layers.Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Create the baseline model without regularization.
model_overfit = create_cnn_model()

# Build a second model that adds dropout for regularization.
model_regularized = keras.Sequential([
    layers.Conv2D(16, (3, 3), activation="relu", input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])

# Compile the regularized model.
model_regularized.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Train overfitting model on tiny subset.
history_overfit = model_overfit.fit(
    x_train_small,
    y_train_small,
    epochs=10,
    batch_size=64,
    validation_data=(x_val, y_val),
    verbose=0,
)

# Train regularized model on same data.
history_reg = model_regularized.fit(
    x_train_small,
    y_train_small,
    epochs=10,
    batch_size=64,
    validation_data=(x_val, y_val),
    verbose=0,
)

# Extract final training and validation metrics.
train_acc_overfit = history_overfit.history["accuracy"][-1]
val_acc_overfit = history_overfit.history["val_accuracy"][-1]
train_loss_overfit = history_overfit.history["loss"][-1]
val_loss_overfit = history_overfit.history["val_loss"][-1]

# Extract metrics for regularized model.
train_acc_reg = history_reg.history["accuracy"][-1]
val_acc_reg = history_reg.history["val_accuracy"][-1]
train_loss_reg = history_reg.history["loss"][-1]
val_loss_reg = history_reg.history["val_loss"][-1]

# Print concise comparison to spot overfitting.
print("Overfit model train acc:", round(train_acc_overfit, 3))
print("Overfit model val acc:", round(val_acc_overfit, 3))
print("Overfit model train loss:", round(train_loss_overfit, 3))
print("Overfit model val loss:", round(val_loss_overfit, 3))

# Print regularized model metrics for contrast.
print("Regularized model train acc:", round(train_acc_reg, 3))
print("Regularized model val acc:", round(val_acc_reg, 3))
print("Regularized model train loss:", round(train_loss_reg, 3))
print("Regularized model val loss:", round(val_loss_reg, 3))

# Evaluate both models on held out test set.
_, test_acc_overfit = model_overfit.evaluate(x_test, y_test, verbose=0)
_, test_acc_reg = model_regularized.evaluate(x_test, y_test, verbose=0)

# Print final test accuracies to inspect generalization.
print("Overfit model test acc:", round(test_acc_overfit, 3))
print("Regularized model test acc:", round(test_acc_reg, 3))



# <font color="#418FDE" size="6.5" uppercase>**CNNs from Scratch**</font>


In this lecture, you learned to:
- Construct a convolutional neural network using Conv2d, pooling, and fully connected layers in PyTorch. 
- Train the CNN on a small image dataset using the standard training loop and appropriate loss and metrics. 
- Analyze model performance using accuracy, confusion matrices, and simple error inspection. 

In the next Lecture (Lecture B), we will go over 'Transfer Learning'.