# CNN Theory and Implementation with Keras

Convolutional Neural Networks (CNNs) are specialized neural networks designed for processing grid-structured data like images. Unlike fully connected networks, which treat every pixel as an unrelated input feature and so ignore the image's spatial layout, CNNs exploit that structure through local connectivity and weight sharing, making them far more efficient and effective for computer vision tasks.

We will build and train CNN models using Keras on a flowers classification dataset, exploring core concepts like convolution, pooling, and data augmentation.

## Table of Contents

1. [Setup and Dataset Loading](#Setup-and-Dataset-Loading)
2. [Why CNNs for Images?](#Why-CNNs-for-Images?)
3. [CNN Architecture Components](#CNN-Architecture-Components)
4. [Building a CNN with Keras](#Building-a-CNN-with-Keras)
5. [Training and Evaluating the Model](#Training-and-Evaluating-the-Model)
6. [Data Augmentation](#Data-Augmentation)
7. [Recap](#Recap)

## Setup and Dataset Loading

First, let's import the necessary libraries and load our dataset.

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
import pathlib
import os

# Set random seed for reproducibility
tf.random.set_seed(42)
np.random.seed(42)

%matplotlib inline

### Loading the Flowers Dataset

We use a dataset of approximately 3,700 flower photographs from 5 different species. This is a moderate-sized dataset perfect for learning CNNs.

In [None]:
# Download and extract the flowers dataset
dataset_url = 'https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz'
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)

# Handle directory structure (Colab vs local)
contents = os.listdir(data_dir)
if 'flower_photos' in contents and len(contents) == 1:
    data_dir = os.path.join(data_dir, 'flower_photos')
    data_dir = pathlib.Path(data_dir)

# Count total images
image_count = len(list(data_dir.glob('*/*.jpg')))
print(f'Total images: {image_count}')
print(f'Categories: {sorted([item.name for item in data_dir.glob("*") if item.is_dir()])}')

### Creating Training and Validation Datasets

We split the data into 80% training and 20% validation. Images are resized to 96×96 pixels for computational efficiency.

In [None]:
image_size = (96, 96)
batch_size = 64

# Create training dataset (80% of data)
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset='training',
    seed=42,
    image_size=image_size,
    batch_size=batch_size
)

# Create validation dataset (20% of data)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset='validation',
    seed=42,
    image_size=image_size,
    batch_size=batch_size
)

class_names = train_ds.class_names
print(f'Classes: {class_names}')
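
To make the 80/20 split concrete, the sketch below works out how many images and batches each subset would contain. The image count of 3,670 is an assumption standing in for the "approximately 3,700" quoted above; substitute the `image_count` printed earlier for exact figures.

```python
import math

# Assumed total; the text above only says "approximately 3,700" photographs.
assumed_image_count = 3670
validation_split = 0.2
batch_size = 64

# Hold out 20% of the images for validation (rounded down).
num_val = int(validation_split * assumed_image_count)
num_train = assumed_image_count - num_val

# The final batch may be smaller than batch_size, hence the ceiling.
train_batches = math.ceil(num_train / batch_size)
val_batches = math.ceil(num_val / batch_size)

print(f'Training images: {num_train} in {train_batches} batches')
print(f'Validation images: {num_val} in {val_batches} batches')
```

The batch counts are what you would expect to see as steps per epoch in the training progress bar.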

### Visualizing Sample Images

Let's look at a few examples from our training dataset.

In [None]:
plt.figure(figsize=(10, 10))
for images, labels in train_ds.take(1):
    for i in range(9):
        ax = plt.subplot(3, 3, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[labels[i]])
        plt.axis("off")
plt.tight_layout()
plt.show()

## Why CNNs for Images?

### The Problem with Fully Connected Networks

Consider using a standard fully connected (dense) neural network for image classification. A 96×96 RGB image has:

- 96 × 96 × 3 = **27,648 input features**

If we connect this to a hidden layer with just 1,000 neurons:

- Number of parameters: 27,648 × 1,000 = **27.6 million parameters**

This creates several problems:

- **Too many parameters**: Difficult to train, requires massive amounts of data
- **Ignores spatial structure**: Treats neighboring pixels such as (10, 10) and (10, 11) no differently from distant pairs such as (10, 10) and (90, 90)
- **Not translation invariant**: Must learn the same pattern (e.g., an edge) at every position independently

### CNNs as the Solution

CNNs address these issues through:

- **Local connectivity**: Neurons connect only to a small region of the input
- **Weight sharing**: Same filter applied across the entire image
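- **Translation invariance**: A filter learned at one location detects the same feature anywhere in the image (strictly, convolution is translation-equivariant; pooling adds invariance to small shifts)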

This dramatically reduces the parameter count and typically improves accuracy on image tasks.
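
As a rough illustration of the savings, the sketch below compares the 1,000-neuron dense layer from the example above with a single convolutional layer. The choice of 32 filters with a 3×3 kernel is an assumption made for the comparison.

```python
# Dense layer: every input feature connects to every hidden neuron.
height, width, channels = 96, 96, 3
hidden_units = 1000
dense_layer_params = height * width * channels * hidden_units + hidden_units  # weights + biases

# Conv layer: each filter owns one small kernel that is reused at every position.
kernel_size, num_filters = 3, 32  # assumed values for illustration
conv_layer_params = (kernel_size * kernel_size * channels) * num_filters + num_filters

print(f'Dense layer parameters:  {dense_layer_params:,}')
print(f'Conv2D layer parameters: {conv_layer_params:,}')
print(f'The dense layer is roughly {dense_layer_params // conv_layer_params:,}x larger')
```

Note that the convolutional parameter count does not depend on the image size at all, only on the kernel size, input channels, and number of filters.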

## CNN Architecture Components

A typical CNN consists of several types of layers working together to extract and classify features.

### 1. Convolutional Layers

Convolutional layers are the core building blocks of CNNs. They apply learned filters (kernels) that slide across the input image, detecting features like edges, textures, or more complex patterns.

**Key parameters:**
- `filters`: Number of different feature maps to learn
- `kernel_size`: Size of the sliding window (commonly 3×3 or 5×5)
- `padding`: `'valid'` (no padding) or `'same'` (output same size as input when `strides=1`)
- `strides`: How far the filter moves (default is 1)
- `activation`: Typically `'relu'` for non-linearity

**Understanding Padding:**
- **`padding='same'`**: Adds zeros around the input so that, with `strides=1`, output size equals input size. Example: 96×96 input with 3×3 kernel stays 96×96.
- **`padding='valid'`**: No padding. Output shrinks. Example: 96×96 input with 3×3 kernel becomes 94×94.

**Understanding Strides:**
- **`strides=1`** (default): Filter moves 1 pixel at a time
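- **`strides=2`**: Filter moves 2 pixels at a time, halving each spatial dimension. Example: 96×96 → 48×48 with `'same'` padding (cheaper than a stride-1 convolution followed by pooling, since only a quarter of the positions are computed)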

In [None]:
# Example: Convolution with different padding
sample_conv_same = layers.Conv2D(filters=32, kernel_size=3, activation='relu', padding='same')
sample_conv_valid = layers.Conv2D(filters=32, kernel_size=3, activation='relu', padding='valid')

# For a 96×96×3 input:
# - 'same' padding: output is 96×96×32
# - 'valid' padding: output is 94×94×32
# - Parameters: (3×3×3 input channels) × 32 filters + 32 biases = 896 parameters
print(f'Conv2D parameters: {(3*3*3) * 32 + 32}')
print("With 'same' padding: 96×96 → 96×96")
print("With 'valid' padding: 96×96 → 94×94")
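
The padding and stride rules above can be folded into one small helper. This is a minimal sketch of the standard output-size arithmetic along one spatial dimension; `conv_output_size` is a hypothetical name, not a Keras function.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding='valid'):
    """Spatial output size of a convolution along one dimension."""
    if padding == 'same':
        # Zero padding is chosen so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == 'valid':
        # The kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f'Unknown padding: {padding!r}')

for padding in ('same', 'valid'):
    for stride in (1, 2):
        size = conv_output_size(96, 3, stride=stride, padding=padding)
        print(f"padding={padding!r}, strides={stride}: 96×96 → {size}×{size}")
```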

### 2. Pooling Layers

Pooling layers reduce spatial dimensions while retaining important features. This reduces computation, adds robustness to small translations, and helps prevent overfitting.

**MaxPooling vs AveragePooling:**

- **MaxPooling2D**: Takes the maximum value in each window
  - Preserves strongest features (e.g., edges, important patterns)
  - Most commonly used
  - Best for feature detection tasks

- **AveragePooling2D**: Computes the average value
  - Smooths features
  - Less common, but useful for reducing noise
  - Best for smoother downsampling

**Concrete Example:**

Consider a 4×4 feature map with a 2×2 pooling window:

```
Input:               MaxPool Result:   AvgPool Result:
[ 1   2 |  5   6]    [ 4 |  8]         [ 2.5 |  6.5]
[ 3   4 |  7   8]    [---+---]         [-----+-----]
[-------+-------]    [12 | 16]         [10.5 | 14.5]
[ 9  10 | 13  14]
[11  12 | 15  16]
```
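
The following NumPy sketch recomputes both pooling results for the 4×4 feature map. It relies on a reshape trick that only works for non-overlapping windows that divide the input evenly; it is meant to illustrate the arithmetic, not to mirror how Keras implements pooling.

```python
import numpy as np

feature_map = np.array([
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
], dtype=float)

# Split the 4×4 map into non-overlapping 2×2 windows.
# Axes: (row block, row within block, column block, column within block).
windows = feature_map.reshape(2, 2, 2, 2)

max_pooled = windows.max(axis=(1, 3))   # strongest value in each window
avg_pooled = windows.mean(axis=(1, 3))  # mean value in each window

print('Max pooling:\n', max_pooled)
print('Average pooling:\n', avg_pooled)
```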

**GlobalAveragePooling2D:**
- Reduces entire feature map (H×W) to a single value per channel
- Example: 12×12×128 → 1×1×128 (then squeezed to 128)
- Alternative to Flatten, reduces parameters significantly
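
To see how much `GlobalAveragePooling2D` can save, the sketch below counts the parameters of the dense layer that would follow either option. The 128-unit dense layer is an assumption chosen for illustration.

```python
height, width, channels = 12, 12, 128
dense_units = 128  # assumed size of the following dense layer

flatten_features = height * width * channels  # every activation is kept
gap_features = channels                       # one average per channel

# Dense parameters = inputs × units + one bias per unit
flatten_dense_params = flatten_features * dense_units + dense_units
gap_dense_params = gap_features * dense_units + dense_units

print(f'After Flatten:              {flatten_dense_params:,} dense parameters')
print(f'After GlobalAveragePooling: {gap_dense_params:,} dense parameters')
```

The trade-off is that global averaging discards where in the feature map each activation occurred.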

In [None]:
# Demonstrate pooling operations
# MaxPooling with 2×2 window reduces each dimension by half
pool_layer = layers.MaxPooling2D(pool_size=(2, 2))

# For a 96×96 input, this produces a 48×48 output
# Each 2×2 region becomes a single pixel (the maximum value)

# GlobalAveragePooling alternative to Flatten
# Reduces each entire feature map to one value
global_pool = layers.GlobalAveragePooling2D()

print("MaxPooling (2×2): 96×96 → 48×48 (keeps strongest features)")
print("AveragePooling (2×2): 96×96 → 48×48 (smooths features)")
print("GlobalAveragePooling: 12×12×128 → 128 (one value per channel)")

### 3. Activation Functions

Activation functions introduce non-linearity, allowing the network to learn complex patterns.

**ReLU (Rectified Linear Unit)** is the most common choice:
- Formula: `f(x) = max(0, x)`
- Advantages: Fast to compute, helps mitigate vanishing gradients, produces sparse activations (many exact zeros)
- Applied after convolutional layers
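
A minimal NumPy sketch of the ReLU formula, applied to a handful of made-up activation values:

```python
import numpy as np

def relu(x):
    """Element-wise max(0, x)."""
    return np.maximum(0, x)

activations = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])  # made-up values
outputs = relu(activations)

# Sparsity: the fraction of outputs that are exactly zero
sparsity = float(np.mean(outputs == 0))

print('Input: ', activations)
print('Output:', outputs)
print(f'Fraction of zeros: {sparsity:.1f}')
```

Negative inputs are clipped to zero while positive inputs pass through unchanged, which is where the sparsity comes from.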

### 4. Flatten and Dense Layers

After extracting features with convolutional and pooling layers:

- **Flatten**: Converts the 3D feature maps into a 1D vector
- **Dense layers**: Fully connected layers that learn combinations of features
- **Output layer**: Dense layer with `softmax` activation for multi-class classification

In [None]:
# After convolutional layers, we have a 3D tensor (batch, height, width, channels)
# Flatten converts it to (batch, height × width × channels)

flatten_layer = layers.Flatten()  # Converts 3D to 1D
dense_layer = layers.Dense(64, activation='relu')  # Learns feature combinations
output_layer = layers.Dense(5, activation='softmax')  # 5 classes, produces probabilities

print("Flatten: 12×12×128 → 18,432 features")
print("Dense: Learns which feature combinations predict each class")
print("Output: Softmax produces class probabilities that sum to 1.0")
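
To show why softmax outputs can be read as probabilities, here is a small NumPy sketch. The five scores are made up for illustration, and subtracting the maximum is a common numerical-stability step rather than part of the mathematical definition.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # avoids overflow in exp; result is unchanged
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])  # made-up scores for 5 classes
probs = softmax(logits)

print('Probabilities:', np.round(probs, 3))
print('Sum:', probs.sum())
print('Predicted class index:', int(np.argmax(probs)))
```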

## Building a CNN with Keras

Now we'll build a complete CNN architecture using the Functional API. Our model will have:

1. Input layer (96×96×3)
2. Rescaling layer (normalize pixel values to 0-1)
3. Three convolutional blocks (Conv2D + MaxPooling)
4. Flatten layer
5. Dense layer
6. Output layer (5 classes)

### Model Architecture

In [None]:
# Input layer
inputs = keras.Input(shape=image_size + (3,), name='input')

# Rescaling to [0, 1] - normalizes pixel values for better training
x = layers.Rescaling(1./255)(inputs)

# First convolutional block: detect basic patterns (edges, colors)
x = layers.Conv2D(32, 3, padding='same', activation='relu', name='conv_1')(x)
x = layers.MaxPooling2D(pool_size=(2, 2), name='pool_1')(x)  # 96×96 → 48×48

# Second convolutional block: detect more complex patterns (textures)
x = layers.Conv2D(64, 3, padding='same', activation='relu', name='conv_2')(x)
x = layers.MaxPooling2D(pool_size=(2, 2), name='pool_2')(x)  # 48×48 → 24×24

# Third convolutional block: detect high-level patterns (petal shapes, flower parts)
x = layers.Conv2D(128, 3, padding='same', activation='relu', name='conv_3')(x)
x = layers.MaxPooling2D(pool_size=(2, 2), name='pool_3')(x)  # 24×24 → 12×12

# Flatten 3D features to 1D for classification
x = layers.Flatten(name='flatten')(x)

# Dense layer learns combinations of features
x = layers.Dense(128, activation='relu', name='dense')(x)

# Output layer with 5 neurons (one per flower class)
outputs = layers.Dense(len(class_names), activation='softmax', name='output')(x)

# Create model
model = keras.Model(inputs=inputs, outputs=outputs, name='flower_cnn')

### Model Summary

Let's inspect the architecture and count parameters.

In [None]:
model.summary()

**Observations:**

- The spatial dimensions decrease progressively: 96×96 → 48×48 → 24×24 → 12×12
- The number of feature maps increases: 32 → 64 → 128
- The total (roughly 2.45 million parameters with 5 classes) is far smaller than the 27.6 million weights of the single 1,000-neuron dense layer considered earlier
- Most parameters are in the dense layers, not the convolutional layers
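
The last observation can be checked by hand. The sketch below recomputes the parameter count of each trainable layer from the architecture defined above, assuming 5 output classes; the helper names are hypothetical.

```python
def count_conv2d_params(kernel_size, in_channels, filters):
    """Weights (k × k × in_channels per filter) plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

def count_dense_params(in_features, units):
    """Weights (in_features per unit) plus one bias per unit."""
    return in_features * units + units

conv_total = (
    count_conv2d_params(3, 3, 32)      # conv_1
    + count_conv2d_params(3, 32, 64)   # conv_2
    + count_conv2d_params(3, 64, 128)  # conv_3
)
dense_total = (
    count_dense_params(12 * 12 * 128, 128)  # dense (after flatten)
    + count_dense_params(128, 5)            # output, assuming 5 classes
)

total = conv_total + dense_total
print(f'Convolutional layers: {conv_total:,} ({conv_total / total:.1%})')
print(f'Dense layers:         {dense_total:,} ({dense_total / total:.1%})')
print(f'Total:                {total:,}')
```

Pooling, flatten, and rescaling layers have no trainable parameters, so they do not appear in the count.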

### Checking Data Shapes

Verify the input data shape matches our model.

In [None]:
for image_batch, labels_batch in train_ds.take(1):
    print(f'Image batch shape: {image_batch.shape}')  # (batch_size, height, width, channels)
    print(f'Labels batch shape: {labels_batch.shape}')  # (batch_size,)
    print(f'\nFirst image min/max values: {image_batch[0].numpy().min():.0f}/{image_batch[0].numpy().max():.0f}')

## Training and Evaluating the Model

### Compiling the Model

We configure the model for training with:
- **Optimizer**: Adam (adaptive learning rate, works well by default)
- **Loss**: Sparse categorical crossentropy (for integer labels, not one-hot)
- **Metrics**: Accuracy (percentage of correct predictions)

In [None]:
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)
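
For intuition about the loss, the sketch below computes sparse categorical crossentropy by hand in NumPy: the mean negative log-probability that the model assigns to each true class. The probabilities and labels are made up, and this simplified version omits the clipping that a library implementation would typically apply to avoid `log(0)`.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability of the true class (integer labels)."""
    labels = np.asarray(labels)
    probs = np.asarray(probs, dtype=float)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return float(-np.log(true_class_probs).mean())

# Made-up softmax outputs for two images and five classes
probs = np.array([
    [0.7, 0.1, 0.1, 0.05, 0.05],  # confident and correct (label 0)
    [0.2, 0.2, 0.2, 0.2, 0.2],    # uniform guess (label 3)
])
labels = [0, 3]

loss = sparse_categorical_crossentropy(labels, probs)
print(f'Loss: {loss:.4f}')
```

The "sparse" variant takes the integer label directly as an index, which is why no one-hot encoding of the labels is needed.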

### Training

Train the model for a modest number of epochs.

In [None]:
epochs = 10
history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

### Visualizing Training History

Plot training and validation accuracy and loss to understand model behavior.

In [None]:
import pandas as pd

def plot_training_history(history):
    """Plot training and validation metrics."""
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

    # Loss plot shows how well model fits the data
    ax1.plot(hist['epoch'], hist['loss'], label='Train Loss', marker='o')
    ax1.plot(hist['epoch'], hist['val_loss'], label='Val Loss', marker='s')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.set_title('Training and Validation Loss')
    ax1.legend()
    ax1.grid(True, alpha=0.3)

    # Accuracy plot shows prediction correctness
    ax2.plot(hist['epoch'], hist['accuracy'], label='Train Accuracy', marker='o')
    ax2.plot(hist['epoch'], hist['val_accuracy'], label='Val Accuracy', marker='s')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.set_title('Training and Validation Accuracy')
    ax2.legend()
    ax2.grid(True, alpha=0.3)

    plt.tight_layout()
    plt.show()

plot_training_history(history)

**Interpretation:**

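- If validation loss decreases along with training loss: the model is learning patterns that generalize
- If validation loss plateaus or rises while training loss keeps falling: likely overfitting
- A large gap between training and validation accuracy indicates poor generalization; a small gap indicates the model performs similarly on unseen data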
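
These checks can also be done numerically. The sketch below is a hypothetical helper, `summarize_history`, that works on a dictionary shaped like `history.history`; the numbers in `example_history` are invented for illustration and are not results from this notebook.

```python
def summarize_history(history_dict):
    """Report where validation loss bottomed out and the final accuracy gap."""
    val_loss = history_dict['val_loss']
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    final_gap = history_dict['accuracy'][-1] - history_dict['val_accuracy'][-1]
    return {
        'best_epoch': best_epoch,                             # 0-based, like history.epoch
        'epochs_since_best': len(val_loss) - 1 - best_epoch,  # > 0 hints at overfitting
        'final_accuracy_gap': final_gap,                      # train minus validation
    }

# Invented values for illustration only
example_history = {
    'loss': [1.40, 1.05, 0.85, 0.60, 0.40],
    'val_loss': [1.20, 1.00, 0.95, 1.02, 1.15],
    'accuracy': [0.40, 0.58, 0.68, 0.78, 0.87],
    'val_accuracy': [0.50, 0.60, 0.63, 0.62, 0.61],
}

summary = summarize_history(example_history)
print(summary)
```

With a trained model, the same helper could be called as `summarize_history(history.history)`.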

### Evaluating on Validation Set

Check final performance metrics.

In [None]:
val_loss, val_accuracy = model.evaluate(val_ds, verbose=0)
print(f'Validation Loss: {val_loss:.4f}')
print(f'Validation Accuracy: {val_accuracy:.4f}')

## Data Augmentation

Data augmentation artificially increases dataset diversity by applying random transformations to training images. It is especially useful for reducing overfitting and improving generalization when training data is limited.

### Why Data Augmentation?

- **Prevents overfitting**: Model sees variations of each image, not just memorizing exact examples
- **Improves generalization**: Learns to be invariant to rotations, flips, zooms, etc.
- **Increases effective dataset size**: No need to collect more images

### Common Augmentation Techniques

Keras provides built-in augmentation layers:

- **RandomFlip**: Horizontal/vertical mirroring (useful when orientation doesn't matter)
- **RandomRotation**: Rotations within a chosen range (realistic for flowers photographed at different angles)
- **RandomZoom**: Simulates different camera distances
- **RandomContrast**: Handles varying lighting conditions

### Creating an Augmentation Pipeline

In [None]:
# Define augmentation pipeline - applied randomly during training only
data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),  # Mirrors image
    layers.RandomRotation(0.2),  # Rotate by ±20% of 360° = ±72°
    layers.RandomZoom(0.2),      # Zoom in/out by ±20%
    layers.RandomContrast(0.2),  # Adjust contrast for lighting variations
], name='data_augmentation')
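
To illustrate what a random flip does to the pixel array, here is a small NumPy sketch. The function `random_horizontal_flip` is a hypothetical stand-in written for this example, not the Keras implementation, and the tiny 2×3 array stands in for a real photo. The last lines convert the rotation factor used above into degrees.

```python
import numpy as np

def random_horizontal_flip(image, rng, probability=0.5):
    """Mirror a (height, width, channels) image left-to-right with the given probability."""
    if rng.random() < probability:
        return image[:, ::-1, :]  # reverse the width axis
    return image

rng = np.random.default_rng(42)
image = np.arange(6).reshape(2, 3, 1)  # tiny stand-in for a real photo

for draw in range(4):
    result = random_horizontal_flip(image, rng)
    print(f'Draw {draw}: first row = {result[0, :, 0].tolist()}')

# A rotation factor is a fraction of a full turn
max_rotation_degrees = 0.2 * 360
print(f'Maximum rotation: ±{max_rotation_degrees:.0f}°')
```

Each draw either leaves the first row as `[0, 1, 2]` or reverses it to `[2, 1, 0]`, which is why the model sees a different variant of the same image from one epoch to the next.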

### Visualizing Augmented Images

Let's see how augmentation transforms our images.

In [None]:
plt.figure(figsize=(10, 10))
for images, _ in train_ds.take(1):
    first_image = images[:1]  # keep the batch dimension: shape (1, 96, 96, 3)
    for i in range(9):
        # Apply augmentation - each call draws new random transformations
        augmented = data_augmentation(first_image, training=True)
        ax = plt.subplot(3, 3, i + 1)
        # Clip before casting in case contrast changes push values outside [0, 255]
        plt.imshow(np.clip(augmented[0].numpy(), 0, 255).astype("uint8"))
        plt.title(f'Augmented {i+1}')
        plt.axis("off")
plt.tight_layout()
plt.show()

print("Notice: each panel shows the same image under a different random draw of the transformations")

### Building a Model with Augmentation

We integrate augmentation into the model architecture. It's applied only during training, not during validation or prediction.

In [None]:
# Build model with augmentation at the beginning
inputs = keras.Input(shape=image_size + (3,), name='input')

# Data augmentation (only applied when training=True)
x = data_augmentation(inputs)

# Rescaling
x = layers.Rescaling(1./255)(x)

# Convolutional blocks (same architecture as before)
x = layers.Conv2D(32, 3, padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

x = layers.Conv2D(128, 3, padding='same', activation='relu')(x)
x = layers.MaxPooling2D((2, 2))(x)

# Classifier
x = layers.Flatten()(x)
x = layers.Dense(128, activation='relu')(x)
outputs = layers.Dense(len(class_names), activation='softmax')(x)

model_aug = keras.Model(inputs=inputs, outputs=outputs, name='flower_cnn_aug')

# Compile
model_aug.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy']
)

### Training with Augmentation

In [None]:
epochs = 15
history_aug = model_aug.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)

### Comparing Results

In [None]:
plot_training_history(history_aug)

val_loss_aug, val_accuracy_aug = model_aug.evaluate(val_ds, verbose=0)
print(f'\nValidation Results:')
print(f'Without augmentation: {val_accuracy:.4f}')
print(f'With augmentation: {val_accuracy_aug:.4f}')

**Expected observations:**

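- Training may be slower (the augmentation layers add computation, and convergence usually takes more epochs)
- Training accuracy might be lower (task is harder with augmentation)
- Validation accuracy often improves (better generalization)
- Smaller gap between training and validation metrics
- The two runs use different epoch counts (10 vs. 15), so treat the comparison as indicative rather than strictly controlled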

## Recap

**Why CNNs:**
- Fully connected networks have too many parameters for images
- CNNs use local connectivity and weight sharing for efficiency
- Translation invariance makes CNNs effective for vision tasks

**CNN Components:**
- **Conv2D**: Extracts features using learned filters
- **MaxPooling**: Reduces spatial dimensions, keeps strongest features
- **Padding='same'**: Maintains spatial dimensions
- **Stride**: Controls filter movement and output size
- **ReLU**: Introduces non-linearity
- **Dense layers**: Learns feature combinations for classification

**Keras Workflow:**
- Functional API for flexible model building
- `model.summary()` shows architecture and parameters
- `compile()`, `fit()`, `evaluate()` for training

**Data Augmentation:**
- Random transformations prevent overfitting
- Applied during training only (not validation/prediction)
- Improves generalization with limited data