# Chapter 4 — Dipping Toes in Deep Learning
This chapter provides a practical introduction to three fundamental types of deep neural networks: Fully Connected Networks (FCNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). Each section includes implementation examples and real-world applications.

## 4.1 Fully Connected Networks (FCNs)
FCNs, also known as Multilayer Perceptrons (MLPs), form the foundation of deep learning. In this section, we explore autoencoders, a type of FCN used for unsupervised learning tasks such as image reconstruction and denoising.

### 4.1.1 Autoencoder Architecture
Autoencoders learn to compress input data into a latent representation and then reconstruct it. They consist of:
- **Encoder**: Compresses input to latent space
- **Decoder**: Reconstructs input from latent space
- **Applications**: Denoising, dimensionality reduction, pretraining

**Key Concept**: A denoising autoencoder takes corrupted images as input and learns to reconstruct the original images by compressing the input to a latent representation and then decoding it.
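
To make the compression concrete, the following is a small sketch that counts the trainable parameters of the layer widths used in this section (784 → 64 → 32 → 64 → 784). The helper `dense_params` is a hypothetical name introduced for illustration and is not part of Keras; it assumes each dense layer has one weight per input-unit pair plus one bias per unit.

```python
# Hypothetical helper (not a Keras function): parameters in one dense layer
def dense_params(n_inputs, n_units):
    # one weight per (input, unit) pair, plus one bias per unit
    return n_inputs * n_units + n_units

# Layer widths of the autoencoder in this section, input first
layer_sizes = [784, 64, 32, 64, 784]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [50240, 2080, 2112, 50960]
print(total_params)  # 105392
```

The 32-unit middle layer is the bottleneck: every image has to pass through 32 numbers before being expanded back to 784 values.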

In [None]:
# Implementation of Denoising Autoencoder
from tensorflow.keras import layers, models

autoencoder = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(784, activation='tanh')
])

autoencoder.compile(loss='mse', optimizer='adam')
autoencoder.summary()

### 4.1.2 Training Process
- **Input**: Corrupted images (each pixel independently set to 0 with probability 0.5, so about 50% of pixels are masked)
- **Target**: Original clean images
- **Loss**: Mean Squared Error (MSE)
- **Result**: Model learns to reconstruct original digits from corrupted versions
- **Dataset**: MNIST - 70,000 handwritten digit images (28×28 pixels)
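
The training setup above can be sketched with NumPy alone. This is an illustrative example on random stand-in data, not the MNIST pipeline itself; the array names (`raw`, `clean`, `corrupted`) are assumptions made for this sketch. Note that after scaling to [-1, 1), a masked value of 0 corresponds to mid-gray rather than black.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so the sketch is repeatable

# Stand-in for a batch of 4 flattened 28x28 images with raw pixel values 0..255
raw = rng.integers(0, 256, size=(4, 784)).astype("float32")

# Same scaling as used in this section: (x - 128) / 128 maps 0..255 into [-1, 1)
clean = (raw - 128.0) / 128.0

# Bernoulli mask: each pixel is kept with probability p and zeroed otherwise
p = 0.5
mask = rng.binomial(n=1, p=p, size=clean.shape).astype("float32")
corrupted = clean * mask

# MSE between corrupted input and clean target: the error a model would
# incur if it simply copied its input to its output
baseline_mse = float(np.mean((corrupted - clean) ** 2))

print(clean.min(), clean.max())
print(mask.mean())
print(baseline_mse)
```

A trained denoising autoencoder should reach a loss below this copy-the-input baseline on the same data.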

In [None]:
# Data preprocessing for MNIST
import numpy as np
from tensorflow.keras.datasets.mnist import load_data

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = load_data()

def generate_masked_inputs(x, p, seed=None):
    """Corrupt images by zeroing pixels; each pixel is kept with probability p"""
    # Compare against None so that seed=0 is still honored
    if seed is not None:
        np.random.seed(seed)
    mask = np.random.binomial(n=1, p=p, size=x.shape).astype('float32')
    return x * mask

# Normalize and reshape MNIST data
norm_x_train = ((x_train - 128.0)/128.0).reshape([-1,784])
masked_x_train = generate_masked_inputs(norm_x_train, 0.5)

print(f"Original data shape: {x_train.shape}")
print(f"Normalized data shape: {norm_x_train.shape}")
print(f"Masked data shape: {masked_x_train.shape}")

### 4.1.3 Results and Applications
- **Performance**: Loss decreases from ~0.15 to ~0.078 over 10 epochs
- **Visual Results**: Model successfully reconstructs corrupted digit images
- **Real-world Use**: Photo restoration, feature learning, pretraining for supervised tasks

In [None]:
# Train the autoencoder
history = autoencoder.fit(masked_x_train, norm_x_train, 
                        batch_size=64, epochs=10, verbose=1)

# Test reconstruction on held-out images the model has not seen during training
test_sample = x_test[:5]
norm_test = ((test_sample - 128.0)/128.0).reshape([-1,784])
masked_test = generate_masked_inputs(norm_test, 0.5, seed=2048)
reconstructed = autoencoder.predict(masked_test)

print("Reconstruction completed!")
print(f"Input shape: {masked_test.shape}")
print(f"Output shape: {reconstructed.shape}")

## 4.2 Convolutional Neural Networks (CNNs)
CNNs revolutionized computer vision by preserving spatial information and learning hierarchical features through convolution and pooling operations.

### 4.2.1 CNN Architecture Components
- **Convolution Layers**: Extract spatial features using learnable filters
- **Pooling Layers**: Reduce spatial dimensions (MaxPooling, AveragePooling)
- **Fully Connected Layers**: Final classification layers
- **Key Advantage**: Parameter efficiency through weight sharing
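
The parameter efficiency of weight sharing can be illustrated with simple arithmetic. The sketch below compares a 3×3 convolution with 16 filters on a 32×32×3 image against a hypothetical dense layer that maps the flattened image to the same number of outputs (16×16×16, assuming stride 2 with same padding). Both helper functions are illustrative names, not library calls.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # each filter has kernel_h * kernel_w * in_channels weights plus one bias,
    # and the same filter is reused at every spatial position
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_inputs, n_units):
    # every input connects to every unit, plus one bias per unit
    return (n_inputs + 1) * n_units

conv_count = conv2d_params(3, 3, 3, 16)
dense_count = dense_params(32 * 32 * 3, 16 * 16 * 16)

print(conv_count)   # 448
print(dense_count)  # 12587008
```

The convolution's parameter count does not depend on the image size at all, which is why the same filters scale to larger inputs.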

In [None]:
# CNN Implementation for CIFAR-10
from tensorflow.keras import layers, models
import tensorflow.keras.backend as K

K.clear_session()

cnn = models.Sequential([
    # First Conv-Pool Block
    layers.Conv2D(16, (3,3), strides=(2,2), activation='relu', 
                  padding='same', input_shape=(32,32,3)),
    layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),
    
    # Second Conv-Pool Block
    layers.Conv2D(32, (3,3), activation='relu', padding='same'),
    layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),
    
    # Fully Connected Layers
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(10, activation='softmax')
])

cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
cnn.summary()

### 4.2.2 Convolution Operation Details
Key hyperparameters that affect convolution output:

**Filters**: Number of output channels  
**Kernel Size**: Spatial dimensions of convolution window  
**Strides**: Step size for sliding the kernel  
**Padding**: 'same' (input is padded so that output size = input size when the stride is 1) or 'valid' (no padding)

**Output Size Formula**:  
For valid padding: `output_size = floor((input_size - kernel_size) / stride) + 1`  
For same padding: `output_size = ceil(input_size / stride)`
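
The valid-padding formula can be checked against a direct implementation. The following is a naive, single-channel sketch written for clarity rather than speed; `conv2d_valid` is a hypothetical helper that assumes a square kernel and, like most deep learning libraries, computes a cross-correlation (the kernel is not flipped).

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Naive single-channel 2D cross-correlation with 'valid' padding."""
    h, w = image.shape
    k = kernel.shape[0]  # assumes a square kernel
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # window whose top-left corner is at (i * stride, j * stride)
            patch = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(49, dtype="float32").reshape(7, 7)
kernel = np.ones((3, 3), dtype="float32")

out_s1 = conv2d_valid(image, kernel, stride=1)  # (7 - 3) // 1 + 1 = 5
out_s2 = conv2d_valid(image, kernel, stride=2)  # (7 - 3) // 2 + 1 = 3
print(out_s1.shape, out_s2.shape)
```

With stride 2, the kernel visits every second position, so `out_s2` contains a subset of the values in `out_s1`.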

In [None]:
# Calculate convolution output sizes
def calculate_output_size(input_size, kernel_size, stride, padding):
    """Return the output size along one spatial dimension of a conv or pooling layer"""
    if padding == 'valid':
        return (input_size - kernel_size) // stride + 1
    else:  # 'same'
        return (input_size + stride - 1) // stride

# Example: Check CNN architecture validity
input_size = 32
layers_config = [
    {'type': 'conv', 'kernel': 3, 'stride': 2, 'padding': 'same'},
    {'type': 'pool', 'kernel': 2, 'stride': 2, 'padding': 'same'},
    {'type': 'conv', 'kernel': 3, 'stride': 1, 'padding': 'same'},
    {'type': 'pool', 'kernel': 2, 'stride': 2, 'padding': 'same'}
]

current_size = input_size
for i, layer in enumerate(layers_config):
    current_size = calculate_output_size(current_size, layer['kernel'], layer['stride'], layer['padding'])
    print(f"After layer {i+1} ({layer['type']}): {current_size}")

print(f"\nFinal feature map size before flattening: {current_size}")

### 4.2.3 CIFAR-10 Implementation
- **Dataset**: 50,000 training and 10,000 test images across 10 classes
- **Image Size**: 32×32 RGB images
- **Classes**: Airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- **Results**: ~72% training accuracy after 25 epochs
- **Application**: Vehicle detection feasibility study
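
The CNN in this section is compiled with `categorical_crossentropy`, which expects one-hot labels. The sketch below shows, in plain NumPy, what one-hot encoding and the loss compute for three example labels. Both functions are hypothetical stand-ins written for illustration; they are not the TensorFlow implementations, and the label indices assume the class order listed above.

```python
import numpy as np

def one_hot(labels, depth):
    """Hypothetical NumPy counterpart of one-hot encoding for integer labels."""
    return np.eye(depth, dtype="float32")[labels]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

labels = np.array([3, 0, 9])          # cat, airplane, truck
targets = one_hot(labels, depth=10)   # shape (3, 10)

# A prediction that assigns equal probability to all 10 classes
uniform = np.full((3, 10), 0.1, dtype="float32")
loss_uniform = categorical_crossentropy(targets, uniform)

print(targets.shape)
print(loss_uniform)  # close to ln(10), about 2.30
```

A loss near ln(10) at the start of training therefore indicates a model that has not yet learned to separate the classes.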

In [None]:
# Data preparation for CIFAR-10
import tensorflow_datasets as tfds
import tensorflow as tf

def format_data(x, depth):
    """Convert images to float32 and labels to one-hot"""
    return (tf.cast(x["image"], 'float32'), tf.one_hot(x["label"], depth=depth))

# Load and prepare dataset
data = tfds.load('cifar10')
tr_data = data["train"].map(lambda x: format_data(x, depth=10)).batch(32)

# Inspect the data
for batch in tr_data.take(1):
    images, labels = batch
    print(f"Batch images shape: {images.shape}")
    print(f"Batch labels shape: {labels.shape}")
    print(f"Image range: [{tf.reduce_min(images):.2f}, {tf.reduce_max(images):.2f}]")
    break

In [None]:
# Train the CNN model
print("Training CNN on CIFAR-10...")
history = cnn.fit(tr_data, epochs=5, verbose=1)  # Using 5 epochs for demonstration

print("Training completed!")
final_accuracy = history.history['acc'][-1]
print(f"Final training accuracy: {final_accuracy:.2%}")

## 4.3 Recurrent Neural Networks (RNNs)
RNNs are designed for sequential data and time series analysis, maintaining memory of previous inputs through hidden states.

### 4.3.1 RNN vs Feed-Forward Networks
- **Feed-Forward**: Each input processed independently
- **RNN**: Maintains hidden state that captures information from previous time steps
- **Applications**: Time series forecasting, natural language processing, speech recognition
- **Key Concept**: RNNs use previous context to inform current predictions

### 4.3.2 CO2 Concentration Prediction
Practical application: predicting future CO2 levels from historical data.
- **Data Source**: Monthly CO2 concentration measurements since 1980
- **Task**: Predict next month's CO2 level using previous 12 months
- **Approach**: RNN learns temporal patterns in atmospheric CO2 data
- **Columns**: `Date`, `Decimal Date`, `Average` (average CO2 concentration), `Trend`
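
The task framing above (a fixed window of past values as input, the next value as target) can be sketched on a toy series. The values and the window length of 3 are assumptions chosen to keep the output readable; the chapter's task uses 12. The final reshape reflects the `(samples, timesteps, features)` layout that Keras recurrent layers expect.

```python
import numpy as np

values = np.arange(10, dtype="float32")  # toy stand-in for a monthly series
window = 3                               # the chapter's task uses 12

# Each row of X holds `window` consecutive values; y holds the value that follows
X = np.array([values[i:i + window] for i in range(len(values) - window)])
y = values[window:]

# Add a trailing feature axis: one measurement per time step
X_rnn = X[..., np.newaxis]

print(X.shape, y.shape, X_rnn.shape)  # (7, 3) (7,) (7, 3, 1)
print(X[0], y[0])                     # [0. 1. 2.] 3.0
```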

In [None]:
# Data Preparation for CO2 Prediction
import pandas as pd
import requests
import os

def download_co2_data():
    """Download CO2 concentration data"""
    save_dir = "data"
    save_path = os.path.join(save_dir, 'co2-mm-gl.csv')
    
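    # Create the target directory if needed (no error if it already exists)
    os.makedirs(save_dir, exist_ok=True)
    
    if not os.path.exists(save_path):
        url = "https://datahub.io/core/co2-ppm/r/co2-mm-gl.csv"
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # fail loudly instead of saving an error page as CSV
        with open(save_path, 'wb') as f:
            f.write(r.content)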
        print("Data downloaded successfully!")
    else:
        print("Data already exists!")
    
    return save_path

# Load and explore data
data_path = download_co2_data()
co2_data = pd.read_csv(data_path)

print("CO2 Data Sample:")
print(co2_data.head())
print(f"\nDataset shape: {co2_data.shape}")
print(f"\nColumns: {co2_data.columns.tolist()}")
print(f"\nDate range: {co2_data['Date'].min()} to {co2_data['Date'].max()}")
print(f"\nCO2 concentration range: {co2_data['Average'].min():.2f} to {co2_data['Average'].max():.2f}")

### 4.3.3 Sequence Processing in RNNs
RNNs process sequences by:
- Maintaining internal state across time steps
- Learning temporal dependencies
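- Feeding the previous hidden state into the current step
- Handling variable-length sequences
- **Mathematical Form**: `h_t = f(W * x_t + U * h_{t-1} + b)`, where `x_t` is the input at step t, `h_{t-1}` is the previous hidden state, and `f` is a nonlinearity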
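
The recurrence can be written out directly in NumPy. This is a minimal sketch of the forward pass only, assuming `f` is tanh (a common choice) and using small random weights; the sizes and the function name `simple_rnn_forward` are illustrative assumptions.

```python
import numpy as np

def simple_rnn_forward(x_seq, W, U, b):
    """Apply h_t = tanh(W @ x_t + U @ h_{t-1} + b) over a sequence.

    Returns the hidden state at every time step, shape (timesteps, n_hidden).
    """
    h = np.zeros(U.shape[0])  # initial hidden state h_0 = 0
    states = []
    for x_t in x_seq:
        h = np.tanh(W @ x_t + U @ h + b)  # the same W, U, b are reused at every step
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
timesteps, n_features, n_hidden = 12, 1, 4
W = rng.normal(scale=0.5, size=(n_hidden, n_features))
U = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

x_seq = rng.normal(size=(timesteps, n_features))
hidden_states = simple_rnn_forward(x_seq, W, U, b)
print(hidden_states.shape)  # (12, 4)
```

Because `h` is carried from one iteration to the next, the state at step t depends on every earlier input, which is what distinguishes this loop from a feed-forward layer applied to each step independently.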

In [None]:
# Build input/target sequences for a simple RNN (CO2 prediction)
from tensorflow.keras import layers, models
import numpy as np

# Prepare sequential data
def create_sequences(data, sequence_length=12):
    """Create sequences for RNN training"""
    sequences = []
    targets = []
    
    for i in range(len(data) - sequence_length):
        seq = data[i:i + sequence_length]
        target = data[i + sequence_length]
        sequences.append(seq)
        targets.append(target)
    
    return np.array(sequences), np.array(targets)

# Use CO2 average values
co2_values = co2_data['Average'].values
sequences, targets = create_sequences(co2_values, sequence_length=12)

print(f"Sequences shape: {sequences.shape}")
print(f"Targets shape: {targets.shape}")
print(f"Sample sequence: {sequences[0]}")
print(f"Corresponding target: {targets[0]}")

## Key Network Comparisons

### Table 4.1 — Network Type Applications
| Network Type | Best For | Key Features | Example Applications |
|-------------|----------|--------------|---------------------|
| FCNs | Tabular data, Simple patterns | Fully connected layers, No spatial preservation | Autoencoders, Basic classification |
| CNNs | Image data, Spatial patterns | Convolution layers, Parameter sharing, Translation invariance | Image classification, Object detection |
| RNNs | Sequential data, Time series | Hidden states, Temporal dependencies | Forecasting, NLP, Speech recognition |

## 4.4 Practical Considerations

### 4.4.1 Hyperparameter Optimization
- Deep learning models often use empirically chosen architectures
- Hyperparameter optimization is computationally expensive
- Common strategies: Transfer learning, following published architectures, rules of thumb
- **Rule of Thumb**: Reduce output size as you go deeper into the network

### 4.4.2 Performance Bottlenecks
- **CNNs**: First fully connected layer after convolution blocks often contains most parameters
- **Memory**: Large dense layers can cause out-of-memory errors
- **Example**: 8×8×256 input to 1024-node dense layer ≈ 16.8M parameters (8 × 8 × 256 × 1024 weights + 1024 biases = 16,778,240)
- **Solution**: Use global average pooling instead of flattening for large spatial dimensions
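
The difference between the two approaches comes down to the size of the vector fed into the dense layer. The sketch below reproduces both operations in NumPy on a random feature map and counts the resulting dense-layer parameters; it is an arithmetic illustration, not a Keras model.

```python
import numpy as np

# Random stand-in for one feature map of shape (batch, height, width, channels)
feature_map = np.random.default_rng(0).normal(size=(1, 8, 8, 256))

flattened = feature_map.reshape(1, -1)   # keeps every value: (1, 16384)
pooled = feature_map.mean(axis=(1, 2))   # one average per channel: (1, 256)

units = 1024
flatten_params = flattened.shape[1] * units + units  # weights + biases
gap_params = pooled.shape[1] * units + units

print(flattened.shape, flatten_params)  # (1, 16384) 16778240
print(pooled.shape, gap_params)         # (1, 256) 263168
```

Global average pooling reduces the dense layer's parameter count by roughly a factor of 64 here (the 8 × 8 spatial positions), at the cost of discarding where in the feature map each activation occurred.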

In [None]:
# Compare Flatten vs Global Average Pooling
from tensorflow.keras import layers, models

input_shape = (8, 8, 256)

# Method 1: Flatten (many parameters)
model_flatten = models.Sequential([
    layers.Input(shape=input_shape),
    layers.Flatten(),
    layers.Dense(1024, activation='relu')
])

# Method 2: Global Average Pooling (few parameters)
model_gap = models.Sequential([
    layers.Input(shape=input_shape),
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation='relu')
])

print("Flatten method:")
model_flatten.summary()

print("\nGlobal Average Pooling method:")
model_gap.summary()

## Exercises

### Exercise 1: Autoencoder Implementation
Implement an autoencoder with architecture: 512 → 32 → 16 → 512 using sigmoid activation for all layers.

In [None]:
# Exercise 1 Solution
def create_autoencoder():
    # 512-dimensional input -> 32 -> 16 -> 512, as specified in the exercise
    autoencoder = models.Sequential([
        layers.Dense(32, activation='sigmoid', input_shape=(512,)),
        layers.Dense(16, activation='sigmoid'),
        layers.Dense(512, activation='sigmoid')
    ])
    
    autoencoder.compile(loss='mse', optimizer='adam')
    return autoencoder

exercise_autoencoder = create_autoencoder()
exercise_autoencoder.summary()

### Exercise 2: CNN Output Size Calculation
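Calculate the final output size (ignoring the batch dimension) for a CNN that takes a 64×64 input and applies, in order: a Conv2D layer with kernel size 5, stride 1, and valid padding; a MaxPool layer with stride 2 and same padding; a Conv2D layer with stride 1 and same padding; a MaxPool layer with stride 2 and same padding; and a Conv2D layer with 32 filters, stride 2, and same padding.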

In [None]:
# Exercise 2 Solution
def calculate_cnn_output():
    input_size = 64
    
    # Layer 1: Conv2D with valid padding
    size_after_conv1 = (input_size - 5) // 1 + 1  # 60
    
    # Layer 2: MaxPool with same padding
    size_after_pool1 = (size_after_conv1 + 2 - 1) // 2  # 30
    
    # Layer 3: Conv2D with same padding
    size_after_conv2 = (size_after_pool1 + 1 - 1) // 1  # 30
    
    # Layer 4: MaxPool with same padding
    size_after_pool2 = (size_after_conv2 + 2 - 1) // 2  # 15
    
    # Layer 5: Conv2D with same padding and stride 2
    size_after_conv3 = (size_after_pool2 + 2 - 1) // 2  # 8
    
    return size_after_conv3

final_size = calculate_cnn_output()
print(f"Final output size: {final_size}×{final_size}")
print(f"With 32 filters, final output shape would be: (None, {final_size}, {final_size}, 32)")

### Exercise 3: Data Pipeline Implementation
Create a data pipeline for the CO2 dataset that prepares sequences of 12 months for RNN training.

In [None]:
# Exercise 3 Solution
def create_co2_pipeline(data, sequence_length=12, batch_size=32):
    """Create a tf.data pipeline for CO2 sequences"""
    
    # Normalize the data
    co2_values = data['Average'].values
    co2_mean = co2_values.mean()
    co2_std = co2_values.std()
    normalized_co2 = (co2_values - co2_mean) / co2_std
    
    # Create sequences
    sequences, targets = create_sequences(normalized_co2, sequence_length)
    
    # Create tf.data.Dataset
    dataset = tf.data.Dataset.from_tensor_slices((sequences, targets))
    dataset = dataset.shuffle(1000).batch(batch_size)
    
    return dataset, co2_mean, co2_std

# Test the pipeline
co2_dataset, mean, std = create_co2_pipeline(co2_data)

print("CO2 Data Pipeline created successfully!")
print(f"Normalization - Mean: {mean:.2f}, Std: {std:.2f}")

for seq_batch, target_batch in co2_dataset.take(1):
    print(f"Sequence batch shape: {seq_batch.shape}")
    print(f"Target batch shape: {target_batch.shape}")
    break

## Chapter 4 Summary

This chapter provided hands-on experience with three fundamental deep learning architectures:

1. **Fully Connected Networks (Autoencoders)**: 
   - Unsupervised learning for image reconstruction
   - Compression and reconstruction phases
   - Applications in denoising and feature learning
   - MNIST dataset: 70,000 28×28 handwritten digit images

2. **Convolutional Neural Networks**:
   - Specialized for spatial data like images
   - Hierarchical feature learning through convolution and pooling
   - Parameter efficiency through weight sharing
   - CIFAR-10 dataset: 60,000 32×32 color images across 10 classes
   - Achieved ~72% training accuracy

3. **Recurrent Neural Networks**:
   - Designed for sequential and time-series data
   - Maintain memory through hidden states
   - Applications in forecasting and temporal pattern recognition
   - CO2 concentration prediction using historical data

**Key Takeaways**:
- Each network type has distinct strengths for different data types
- Choosing kernel size, stride, and padding carefully is crucial to avoid invalid output sizes and oversized dense layers
- Understanding these fundamentals enables exploration of advanced architectures
- Real-world applications demonstrate practical utility of each approach

**Next Steps**:
- Explore advanced architectures (ResNet, LSTM, Transformers)
- Learn about regularization techniques to prevent overfitting
- Study transfer learning and fine-tuning pre-trained models
- Experiment with different optimization strategies and loss functions