# Deep Computer Vision with CNNs
## Chapter 14 - Convolutional Neural Networks Implementation Guide

## 1. Introduction to CNNs

Convolutional Neural Networks (CNNs) are specialized for processing grid-like data (images, videos). Key advantages:
- **Local connectivity**: Neurons connect only to local regions (receptive fields)
- **Parameter sharing**: Same weights used across spatial locations
- **Hierarchical features**: Learn from simple to complex patterns
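
The effect of parameter sharing is easy to see with a rough count. The sketch below compares a dense layer with a convolutional layer on the same input. The sizes (a 28×28 grayscale image, 32 units or filters, 3×3 kernels) are illustrative assumptions, not figures from a specific model.

```python
# Illustrative sizes (assumptions, not taken from a specific model)
height, width, channels = 28, 28, 1
n_units = 32            # dense layer: 32 neurons, each connected to every pixel
n_filters, k = 32, 3    # conv layer: 32 filters of size 3x3

# Dense: one weight per (input pixel, neuron) pair, plus one bias per neuron
dense_params = height * width * channels * n_units + n_units

# Conv: each filter reuses the same k*k*channels weights at every position
conv_params = k * k * channels * n_filters + n_filters

print(f"Dense layer: {dense_params:,} parameters")  # 25,120
print(f"Conv layer:  {conv_params:,} parameters")   # 320
```

The convolutional layer's count does not depend on the image size, which is why the gap widens quickly for larger inputs.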

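### Biological Inspiration
CNNs draw on the organization of the visual cortex described by Hubel & Wiesel (1959-1968): neurons respond to stimuli in a small local receptive field, and higher-level neurons combine the outputs of lower-level ones.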

## 2. Core CNN Components

### 2.1 Convolutional Layers
- Apply filters/kernels that detect patterns
- Output feature maps highlight where patterns occur
- Key parameters:
  - `filters`: Number of output channels
  - `kernel_size`: Spatial dimensions of filters (e.g., 3×3)
  - `strides`: Step size for sliding window
  - `padding`: 'valid' (no padding, so the output shrinks) or 'same' (zero padding; the output keeps the input's spatial size when `strides=1`)
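
How `kernel_size`, `strides`, and `padding` interact is easiest to see through the output size along one spatial axis. The helper below is a minimal sketch; `conv_output_size` is a hypothetical name, and the formulas are the ones Keras uses for 'valid' and 'same' padding.

```python
import math

def conv_output_size(n, kernel_size, strides=1, padding="valid"):
    """Output length along one spatial axis for an input of length n."""
    if padding == "valid":
        # Only positions where the kernel fits entirely inside the input
        return (n - kernel_size) // strides + 1
    if padding == "same":
        # Zero padding is added so that only the stride reduces the size
        return math.ceil(n / strides)
    raise ValueError(f"unknown padding: {padding!r}")

print(conv_output_size(28, 3))                             # 26
print(conv_output_size(28, 3, padding="same"))             # 28
print(conv_output_size(28, 3, strides=2))                  # 13
print(conv_output_size(28, 3, strides=2, padding="same"))  # 14
```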

### 2.2 Pooling Layers
- Reduce spatial dimensions (downsampling)
- Types:
  - Max pooling: Takes maximum value in window
  - Average pooling: Takes average value in window
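
The two pooling types differ only in the reduction applied to each window. The NumPy sketch below assumes non-overlapping windows (stride equal to the window size) on a single-channel input; `pool2d` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling (stride == size) over a 2-D array."""
    h, w = x.shape
    h, w = h - h % size, w - w % size  # drop rows/cols that do not fill a window
    windows = x[:h, :w].reshape(h // size, size, w // size, size)
    reduce = np.max if mode == "max" else np.mean
    return reduce(windows, axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]], dtype=float)

print(pool2d(x, mode="max"))      # rows: [4, 2] and [2, 8]
print(pool2d(x, mode="average"))  # rows: [2.5, 1.0] and [1.25, 6.5]
```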

In [None]:
import tensorflow as tf
from tensorflow.keras import layers

# Basic CNN architecture example
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),  # 28x28 grayscale images
    # Convolutional block 1
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Convolutional block 2
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Classifier head
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.summary()
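
As a sanity check on what `model.summary()` reports for this architecture, the shapes and parameter counts can be traced by hand. The sketch below is plain arithmetic and assumes the defaults used above: 'valid' padding, stride 1 for the convolutions, and 2×2 pooling.

```python
# Trace of the spatial size and parameter count through the model above
size, channels = 28, 1
total_params = 0

for n_filters in (32, 64):
    total_params += 3 * 3 * channels * n_filters + n_filters  # 3x3 kernels + biases
    size = size - 3 + 1   # 'valid' 3x3 convolution, stride 1
    size = size // 2      # 2x2 max pooling
    channels = n_filters
    print(f"after conv block: {size}x{size}x{channels}")  # 13x13x32, then 5x5x64

flat = size * size * channels    # Flatten
total_params += flat * 64 + 64   # Dense(64)
total_params += 64 * 10 + 10     # Dense(10)
print(f"flattened features: {flat}")          # 1600
print(f"total parameters: {total_params:,}")  # 121,930
```

Most of the parameters sit in the first dense layer, not in the convolutions.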

## 3. CNN Architectures

### 3.1 LeNet-5 (1998)
- One of the earliest successful CNN architectures, designed by Yann LeCun et al. for handwritten digit recognition
- Key features:
  - Alternating convolutions and pooling
  - Tanh activation functions
  - Fully connected final layers
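
A LeNet-5-style model can be sketched in Keras as follows. This is a simplified approximation, not a faithful reproduction: it assumes 32×32 grayscale inputs, uses plain average pooling, and replaces the original output layer with a softmax `Dense` layer.

```python
import tensorflow as tf
from tensorflow.keras import layers

# LeNet-5-style model (simplified sketch, see the assumptions above)
lenet5 = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),         # digits padded to 32x32
    layers.Conv2D(6, 5, activation="tanh"),    # -> 28x28x6
    layers.AveragePooling2D(2),                # -> 14x14x6
    layers.Conv2D(16, 5, activation="tanh"),   # -> 10x10x16
    layers.AveragePooling2D(2),                # -> 5x5x16
    layers.Conv2D(120, 5, activation="tanh"),  # -> 1x1x120
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),
])
lenet5.summary()
```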

### 3.2 AlexNet (2012)
- Won the 2012 ImageNet challenge (ILSVRC) by a wide margin
- Innovations:
  - ReLU activation
  - Dropout regularization
  - Data augmentation
  - GPU acceleration
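
Two of these ideas, ReLU and dropout, fit in a few lines of NumPy. The sketch below uses the "inverted" dropout formulation common in current frameworks (survivors are rescaled during training). That formulation is an assumption here, not a reproduction of AlexNet's original code.

```python
import numpy as np

def relu(x):
    """ReLU: keep positive values, zero out the rest."""
    return np.maximum(x, 0.0)

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero roughly a fraction `rate` of units, rescale survivors."""
    if not training or rate == 0.0:
        return x  # inference: the layer is a no-op
    keep_mask = rng.random(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
activations = relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0]))
print(activations)                                     # negatives are clipped to 0
print(dropout(activations, 0.5, rng))                  # random units zeroed, rest doubled
print(dropout(activations, 0.5, rng, training=False))  # unchanged at inference
```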

### 3.3 ResNet (2015)
- Introduced residual connections
- Enabled training of very deep networks (100+ layers)
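- Key concept: a skip connection adds a block's input to its output, so the block only has to learn a residual and gradients can flow through the shortcut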

In [None]:
# ResNet-style residual block implementation
class ResidualBlock(tf.keras.layers.Layer):
    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.activation = tf.keras.activations.get(activation)
        self.main_layers = [
            tf.keras.layers.Conv2D(filters, 3, strides=strides, padding="same", use_bias=False),
            tf.keras.layers.BatchNormalization(),
            self.activation,
            tf.keras.layers.Conv2D(filters, 3, strides=1, padding="same", use_bias=False),
            tf.keras.layers.BatchNormalization()
        ]
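        # Projection shortcut (1x1 conv) so the skip path matches the main path.
        # It is only created when strides > 1, so with strides=1 the block's
        # input must already have `filters` channels.
        self.skip_layers = []
        if strides > 1: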
            self.skip_layers = [
                tf.keras.layers.Conv2D(filters, 1, strides=strides, padding="same", use_bias=False),
                tf.keras.layers.BatchNormalization()
            ]
    
    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        return self.activation(Z + skip_Z)

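# Example usage
inputs = tf.keras.Input(shape=(32, 32, 3))
# Stem convolution: brings the input to 64 channels so that the first block's
# identity shortcut (strides=1) has the same shape as its main path
x = tf.keras.layers.Conv2D(64, 3, padding="same", use_bias=False)(inputs)
x = ResidualBlock(64)(x)
x = ResidualBlock(128, strides=2)(x)
outputs = tf.keras.layers.GlobalAvgPool2D()(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.summary()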

## 4. Transfer Learning with Pretrained Models

### 4.1 Using Keras Applications
Leverage models pretrained on ImageNet:

In [None]:
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

# Load pretrained model
model = ResNet50(weights="imagenet")

# Prepare image
img_path = "example.jpg"  # Replace with your image
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

# Make prediction
preds = model.predict(x)
print("Top predictions:", decode_predictions(preds, top=3)[0])

### 4.2 Fine-tuning for Custom Tasks
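Adapt a pretrained model to a new dataset: replace the classification head and train it while the pretrained base stays frozen. The top layers of the base can be unfrozen afterwards and trained with a low learning rate.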

In [None]:
# Flower classification example
import tensorflow_datasets as tfds

# Load dataset
dataset, info = tfds.load("tf_flowers", as_supervised=True, with_info=True)
n_classes = info.features["label"].num_classes

# Preprocess function
def preprocess(image, label):
    image = tf.image.resize(image, [224, 224])
    image = preprocess_input(image)
    return image, label

# Prepare datasets (shuffle before batching so batches differ between epochs)
batch_size = 32
train_set = dataset["train"].shuffle(1000).map(preprocess).batch(batch_size).prefetch(1)

# Create model
base_model = ResNet50(weights="imagenet", include_top=False)
avg = tf.keras.layers.GlobalAveragePooling2D()(base_model.output)
output = tf.keras.layers.Dense(n_classes, activation="softmax")(avg)
model = tf.keras.Model(inputs=base_model.input, outputs=output)

# Freeze base model
for layer in base_model.layers:
    layer.trainable = False

# Compile and train
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_set, epochs=5)
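
The cell above trains only the new head. A second phase that unfreezes part of the base might look like the sketch below. `unfreeze_top_layers` is a hypothetical helper, the choice of 30 layers and a learning rate of 1e-5 are arbitrary assumptions, and keeping batch normalization layers frozen is a common precaution, not a requirement.

```python
import tensorflow as tf

def unfreeze_top_layers(base_model, n_layers):
    """Make the last `n_layers` layers of `base_model` trainable, freeze the rest."""
    base_model.trainable = True
    cutoff = len(base_model.layers) - n_layers
    for i, layer in enumerate(base_model.layers):
        # Batch norm layers stay frozen so their running statistics are not disturbed
        is_batch_norm = isinstance(layer, tf.keras.layers.BatchNormalization)
        layer.trainable = i >= cutoff and not is_batch_norm

# Hypothetical usage with `base_model`, `model`, and `train_set` from the cell above:
# unfreeze_top_layers(base_model, n_layers=30)
# model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
#               loss="sparse_categorical_crossentropy",
#               metrics=["accuracy"])  # recompile after changing `trainable`
# model.fit(train_set, epochs=5)
```

The model has to be recompiled after changing `trainable` flags for the change to take effect during training.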

## 5. Advanced CNN Architectures

### 5.1 Inception Modules
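- Apply several filter sizes (1×1, 3×3, 5×5) and a pooling branch in parallel, then concatenate the outputs along the channel axis
- Efficient "network within network" design
- 1×1 convolutions act as bottlenecks, cutting parameters while still capturing multi-scale features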
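
The parallel-branch idea can be sketched with the Keras functional API. `inception_module` and its argument names are hypothetical, and the filter counts in the usage lines are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, f_pool):
    """Inception-style block: four parallel branches concatenated on the channel axis."""
    def conv(filters, size):
        return layers.Conv2D(filters, size, padding="same", activation="relu")

    branch1 = conv(f1, 1)(x)                      # 1x1 branch
    branch3 = conv(f3, 3)(conv(f3_reduce, 1)(x))  # 1x1 reduction, then 3x3
    branch5 = conv(f5, 5)(conv(f5_reduce, 1)(x))  # 1x1 reduction, then 5x5
    pooled = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    branch_pool = conv(f_pool, 1)(pooled)         # pooling, then 1x1
    return layers.Concatenate(axis=-1)([branch1, branch3, branch5, branch_pool])

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = inception_module(inputs, 64, 96, 128, 16, 32, 32)
print(outputs.shape)  # 64 + 128 + 32 + 32 = 256 output channels
```

Every branch uses 'same' padding and stride 1, so all four outputs share the same height and width and can be concatenated.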

### 5.2 Xception Architecture
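- An "extreme" variant of Inception (the name stands for Extreme Inception)
- Built from depthwise separable convolutions: a per-channel spatial convolution followed by a 1×1 pointwise convolution that mixes channels
- Needs far fewer parameters and multiplications than regular convolutions with the same kernel size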
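
The savings from a depthwise separable convolution follow from a simple weight count. The layer sizes below are illustrative assumptions, and biases are ignored.

```python
k, c_in, c_out = 3, 128, 256  # illustrative layer sizes (assumptions)

regular = k * k * c_in * c_out  # one k x k x c_in kernel per output channel
depthwise = k * k * c_in        # one k x k spatial filter per input channel
pointwise = c_in * c_out        # 1x1 convolution that mixes channels
separable = depthwise + pointwise

print(f"regular:   {regular:,}")                 # 294,912
print(f"separable: {separable:,}")               # 33,920
print(f"ratio:     {regular / separable:.1f}x")  # 8.7x
```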

### 5.3 Attention Mechanisms
- Squeeze-and-Excitation Networks (SENet)
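- Channel-wise attention for feature recalibration: each feature map is rescaled by a learned weight between 0 and 1 computed from globally pooled features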
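
The forward pass of a squeeze-and-excitation block is short enough to write out in NumPy. This is a minimal sketch for a single feature map: biases are omitted, the weights are random stand-ins for learned parameters, and the reduction ratio of 4 is an arbitrary assumption.

```python
import numpy as np

def squeeze_excite(x, w1, w2):
    """SE block forward pass on one feature map `x` of shape (H, W, C)."""
    squeezed = x.mean(axis=(0, 1))                # squeeze: global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)       # bottleneck dense + ReLU -> (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2)))  # dense + sigmoid -> (C,), values in (0, 1)
    return x * scale                              # excite: rescale each channel

rng = np.random.default_rng(0)
channels, ratio = 16, 4
x = rng.normal(size=(8, 8, channels))
w1 = rng.normal(size=(channels, channels // ratio))  # stand-in for learned weights
w2 = rng.normal(size=(channels // ratio, channels))  # stand-in for learned weights

y = squeeze_excite(x, w1, w2)
print(y.shape)  # (8, 8, 16): same shape as the input, channels rescaled
```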

## 6. Computer Vision Tasks

### 6.1 Object Detection
- **YOLO (You Only Look Once)**: Fast real-time detection
- **Faster R-CNN**: High accuracy with region proposals
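
Detectors such as these are usually compared by how well predicted boxes overlap the ground truth, measured with intersection over union (IoU). The metric is not named in the list above. The helper below is a minimal sketch that assumes boxes in `(x_min, y_min, x_max, y_max)` format.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height are clamped at 0 when the boxes do not overlap
    intersection = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.143
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # 1.0
print(iou((0, 0, 1, 1), (2, 2, 3, 3)))  # 0.0
```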

### 6.2 Semantic Segmentation
- Fully Convolutional Networks (FCNs)
- U-Net architecture with skip connections
- Transposed convolutions for upsampling
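
How a transposed convolution upsamples can be shown with a small NumPy sketch. It assumes a single channel, a square kernel, and no padding; `conv2d_transpose` here is a hypothetical helper, not the Keras layer.

```python
import numpy as np

def conv2d_transpose(x, kernel, stride=2):
    """Single-channel transposed convolution with no padding."""
    h, w = x.shape
    k = kernel.shape[0]  # square kernel assumed
    out = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            # Each input pixel stamps a scaled copy of the kernel onto the output
            out[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return out

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])
up = conv2d_transpose(x, np.ones((2, 2)), stride=2)
print(up.shape)  # (4, 4)
print(up)        # each input value fills its own 2x2 block
```

With a kernel of ones and a stride equal to the kernel size, the result is nearest-neighbor upsampling. A learned kernel lets the network choose its own interpolation.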

## 7. Exercises

1. Implement a CNN from scratch for CIFAR-10 classification
2. Fine-tune a pretrained model on a custom dataset
3. Visualize CNN feature maps to understand what layers learn
4. Compare performance of different CNN architectures
5. Implement data augmentation for improved generalization

## 8. Key Takeaways

- CNNs excel at processing grid-like data through local connectivity and parameter sharing
- Modern architectures use techniques like residual connections and attention mechanisms
- Transfer learning is powerful for custom tasks with limited data
- Different architectures suit different tasks (classification, detection, segmentation)
- Proper preprocessing and augmentation are crucial for performance