# Chapter 4 — Dipping Toes in Deep Learning
This chapter provides a practical introduction to three fundamental types of deep neural networks: Fully Connected Networks (FCNs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). Each section includes implementation examples and real-world applications.

## 4.1 Fully Connected Networks (FCNs)
FCNs, also known as Multilayer Perceptrons (MLPs), form the foundation of deep learning. This section explores autoencoders, a type of FCN used for unsupervised learning tasks such as image reconstruction and denoising.

### Figure 4.1 — MNIST Dataset Samples
<p align='left'><img src='./figure/figure4.1.png' width='60%'></p>
The MNIST dataset contains 70,000 handwritten digit images (60,000 training, 10,000 test). Each image is 28×28 pixels with labels 0-9.

### 4.1.1 Autoencoder Architecture
Autoencoders learn to compress input data into a latent representation and then reconstruct it. They consist of:
- **Encoder**: Compresses input to latent space
- **Decoder**: Reconstructs input from latent space
- **Applications**: Denoising, dimensionality reduction, pretraining

### Figure 4.2 — Autoencoder Structure
<p align='left'><img src='./figure/figure4.2.png' width='60%'></p>
The autoencoder takes corrupted images as input and learns to reconstruct the original images through compression and reconstruction phases.

```python
# Implementation of Denoising Autoencoder
from tensorflow.keras import layers, models

autoencoder = models.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(784, activation='tanh')
])

autoencoder.compile(loss='mse', optimizer='adam')
autoencoder.summary()
```

### 4.1.2 Training Process
- **Input**: Corrupted images (50% of pixels randomly set to 0)
- **Target**: Original clean images
- **Loss**: Mean Squared Error (MSE)
- **Result**: Model learns to reconstruct original digits from corrupted versions
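
The corruption step listed above can be sketched in NumPy. This is a minimal illustration, not necessarily the exact procedure used for the figures: the helper name `corrupt_images`, the random stand-in data, and the scaling of pixels to [-1, 1] (chosen here to match the `tanh` output layer) are all assumptions.

```python
import numpy as np

def corrupt_images(images, drop_fraction=0.5, seed=None):
    """Return a copy of `images` with roughly `drop_fraction` of pixels set to 0."""
    rng = np.random.default_rng(seed)
    # Each pixel is kept independently with probability 1 - drop_fraction.
    keep_mask = rng.random(images.shape) >= drop_fraction
    return images * keep_mask

# Stand-in data (assumption): 4 flattened 28x28 "images" scaled to [-1, 1].
clean = np.random.default_rng(0).uniform(-1.0, 1.0, size=(4, 784)).astype("float32")
noisy = corrupt_images(clean, drop_fraction=0.5, seed=1)

print(noisy.shape)                 # same shape as the clean input
print(float((noisy == 0).mean()))  # fraction of zeroed pixels, close to 0.5

# Training would then pair noisy inputs with clean targets, for example:
# autoencoder.fit(noisy, clean, epochs=10, batch_size=64)  # hypothetical settings
```

Because the mask is drawn independently per pixel, the zeroed fraction is only approximately 50% for any single image.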

### Figure 4.3 — Image Reconstruction Results
<p align='left'><img src='./figure/figure4.3.png' width='60%'></p>
Comparison between corrupted input images (top row) and reconstructed outputs (bottom row) showing the autoencoder's denoising capability.

## 4.2 Convolutional Neural Networks (CNNs)
CNNs revolutionized computer vision by preserving spatial information and learning hierarchical features through convolution and pooling operations.

### 4.2.1 CNN Architecture Components
- **Convolution Layers**: Extract spatial features using learnable filters
- **Pooling Layers**: Reduce spatial dimensions (MaxPooling, AveragePooling)
- **Fully Connected Layers**: Final classification layers

### Figure 4.4 — CNN Architecture Overview
<p align='left'><img src='./figure/figure4.4.png' width='60%'></p>
Typical CNN structure showing convolution-pooling blocks followed by fully connected layers for classification.
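
To make the pooling step concrete, here is a small NumPy sketch of 2×2 max pooling with stride 2 on a single-channel feature map. The function `max_pool_2x2` is a hypothetical helper written for illustration; Keras's `MaxPool2D` also handles batches, channels, and padding, which this sketch omits.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on an (H, W) array; H and W must be even."""
    h, w = feature_map.shape
    # Split into non-overlapping 2x2 blocks, then take the max of each block.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [0, 5, 4, 1]])
pooled = max_pool_2x2(x)
print(pooled)  # 2x2 result: [[4, 8], [9, 4]]
```

Each output value summarizes one 2×2 region, so the height and width are halved while the strongest activation in each region is kept.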

```python
# CNN Implementation for CIFAR-10
from tensorflow.keras import layers, models
import tensorflow.keras.backend as K

K.clear_session()

cnn = models.Sequential([
    # First Conv-Pool Block
    layers.Conv2D(16, (3,3), strides=(2,2), activation='relu',
                  padding='same', input_shape=(32,32,3)),
    layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),

    # Second Conv-Pool Block
    layers.Conv2D(32, (3,3), activation='relu', padding='same'),
    layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),

    # Fully Connected Layers
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(10, activation='softmax')
])

cnn.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])
cnn.summary()
```

### 4.2.2 Convolution Operation Details
Key hyperparameters that affect convolution output:
- **Filters**: Number of output channels
- **Kernel Size**: Spatial dimensions of convolution window
- **Strides**: Step size for sliding the kernel
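- **Padding**: 'same' (zero-pads the input so the output size is ⌈input size / stride⌉, which equals the input size only when the stride is 1) or 'valid' (no padding, so the output shrinks by kernel size − 1 at stride 1)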
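
The effect of these hyperparameters on the spatial size can be checked with a few lines of Python. The sketch below follows the Keras conventions for `'same'` and `'valid'` padding. The helper name `conv_output_size` is an assumption for illustration, and the same rule applies to pooling layers when the pool size is passed as the kernel size.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial axis for a convolution or pooling layer."""
    if padding == "same":
        # The input is padded so that only the stride shrinks the output.
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return (input_size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")

# Trace the spatial size through the CNN defined earlier in this section.
size = 32
size = conv_output_size(size, 3, stride=2, padding="same")  # Conv2D    -> 16
size = conv_output_size(size, 2, stride=2, padding="same")  # MaxPool2D -> 8
size = conv_output_size(size, 3, stride=1, padding="same")  # Conv2D    -> 8
size = conv_output_size(size, 2, stride=2, padding="same")  # MaxPool2D -> 4
print(size)  # 4

print(conv_output_size(28, 5, stride=1, padding="valid"))  # 24
```

With stride 1, `'same'` padding leaves the size unchanged; with a larger stride the output is still reduced, as the first convolution in the trace shows.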

### Figure 4.5 — Convolution Operation Visualization
<p align='left'><img src='./figure/figure4.5.png' width='60%'></p>
Illustration of how convolution kernels slide over input images to produce feature maps.
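
The sliding-window computation can also be written out directly. The following is a deliberately naive, single-channel sketch with 'valid' padding, intended only to show the mechanics. The function `conv2d_valid` and the example kernel are assumptions, and real layers use optimized implementations and learn their kernels during training.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Naive single-channel 2D convolution (cross-correlation), 'valid' padding."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            # Take the patch under the kernel and sum the elementwise products.
            top, left = i * stride, j * stride
            patch = image[top:top + kh, left:left + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

# A 4x4 "image" whose values increase by 1 from left to right within each row.
image = np.arange(16, dtype=float).reshape(4, 4)
# Hand-picked kernel that responds to horizontal intensity changes.
horizontal_diff = np.array([[1.0, -1.0],
                            [1.0, -1.0]])
feature_map = conv2d_valid(image, horizontal_diff)
print(feature_map)  # 3x3 map, every entry is -2.0
```

Every output is -2.0 because each row of the input changes by the same amount from one column to the next, so the kernel sees the same pattern at every position.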

### Figure 4.6 — Feature Hierarchy in CNNs
<p align='left'><img src='./figure/figure4.6.png' width='60%'></p>
Deep CNN layers learn hierarchical features: early layers detect edges, middle layers detect patterns, and deeper layers detect complex objects.

### 4.2.3 CIFAR-10 Implementation
- **Dataset**: 50,000 training and 10,000 test images across 10 classes
- **Image Size**: 32×32 RGB images
- **Classes**: Airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck
- **Results**: ~72% training accuracy after 25 epochs
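
Since the model above is compiled with `categorical_crossentropy`, it expects one-hot encoded labels. The sketch below shows one way to prepare the data. It assumes integer labels of shape `(N, 1)`, as returned by `tf.keras.datasets.cifar10.load_data()`. The helper `preprocess`, the scaling to [0, 1], and the random stand-in arrays are illustrative assumptions rather than the exact pipeline behind the reported accuracy.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Scale uint8 images to [0, 1] and one-hot encode integer class labels."""
    x = images.astype("float32") / 255.0
    # Row k of the identity matrix is the one-hot vector for class k.
    y = np.eye(num_classes, dtype="float32")[labels.reshape(-1)]
    return x, y

# Stand-in data (assumption) with the same shapes and dtypes as CIFAR-10.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)
fake_labels = np.array([[3], [0], [9], [1], [3]])

x, y = preprocess(fake_images, fake_labels)
print(x.shape, y.shape)  # (5, 32, 32, 3) (5, 10)
```

An alternative is to keep the integer labels and compile the model with `sparse_categorical_crossentropy` instead.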

## 4.3 Recurrent Neural Networks (RNNs)
RNNs are designed for sequential data and time series analysis, maintaining memory of previous inputs through hidden states.

### 4.3.1 RNN vs Feed-Forward Networks
- **Feed-Forward**: Each input processed independently
- **RNN**: Maintains hidden state that captures information from previous time steps
- **Applications**: Time series forecasting, natural language processing, speech recognition

### Figure 4.7 — RNN Architecture
<p align='left'><img src='./figure/figure4.7.png' width='60%'></p>
An RNN processing sequence data: at each time step, the network receives the current input and the previous hidden state, and produces an output and an updated hidden state.
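
The recurrence can be sketched in a few lines of NumPy. This is a minimal forward pass of a simple (Elman-style) RNN using the update `h_t = tanh(x_t · W_x + h_{t-1} · W_h + b)`. The function name `rnn_forward`, the weight names, and the random weights are assumptions for illustration, and here the output at each step is simply the hidden state.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    """Run a simple RNN over `inputs` of shape (T, input_dim).

    Returns the hidden state at every time step, shape (T, hidden_dim).
    """
    hidden = np.zeros(w_h.shape[0])  # initial hidden state h_0
    states = []
    for x_t in inputs:
        # The new state depends on the current input and the previous state.
        hidden = np.tanh(x_t @ w_x + hidden @ w_h + b)
        states.append(hidden)
    return np.stack(states)

rng = np.random.default_rng(0)
T, input_dim, hidden_dim = 12, 1, 4
w_x = rng.normal(scale=0.5, size=(input_dim, hidden_dim))
w_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

sequence = rng.normal(size=(T, input_dim))
states = rnn_forward(sequence, w_x, w_h, b)
print(states.shape)  # (12, 4)
```

The same weights `w_x`, `w_h`, and `b` are reused at every time step, which is what lets the network handle sequences of different lengths.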

### 4.3.2 CO2 Concentration Prediction
Practical application: Predicting future CO2 levels using historical data
- **Data Source**: Monthly CO2 concentration measurements since 1980
- **Task**: Predict next month's CO2 level using previous 12 months
- **Approach**: RNN learns temporal patterns in atmospheric CO2 data

```python
# Data Preparation for CO2 Prediction
import os

import pandas as pd
import requests


def download_co2_data():
    """Download the CO2 concentration CSV once and return its local path."""
    save_dir = "data"
    save_path = os.path.join(save_dir, 'co2-mm-gl.csv')

    # Create the target directory if it does not exist yet
    os.makedirs(save_dir, exist_ok=True)

    # Skip the download when a cached copy is already on disk
    if not os.path.exists(save_path):
        url = "https://datahub.io/core/co2-ppm/r/co2-mm-gl.csv"
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # fail loudly instead of saving an error page
        with open(save_path, 'wb') as f:
            f.write(r.content)

    return save_path


# Load and explore data
data_path = download_co2_data()
co2_data = pd.read_csv(data_path)
print(co2_data.head())
```

### Figure 4.8 — CO2 Data Sample
<p align='left'><img src='./figure/figure4.8.png' width='60%'></p>
Sample of CO2 dataset showing date, decimal date, average concentration, and trend columns.
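
To frame the task from Section 4.3.2 (previous 12 months in, next month out), the series has to be cut into overlapping windows. The sketch below shows one way to do this. The helper `make_windows` is hypothetical and a synthetic series stands in for the real measurements. With the downloaded data, one would pass the average-concentration column instead (for example `co2_data["Average"]`, where the exact column name is an assumption).

```python
import numpy as np

def make_windows(series, window=12):
    """Split a 1-D series into (inputs, targets) for next-step prediction.

    inputs  has shape (N, window, 1): `window` consecutive values per sample.
    targets has shape (N,): the value that immediately follows each window.
    """
    series = np.asarray(series, dtype="float32")
    n = len(series) - window
    if n <= 0:
        raise ValueError("series must be longer than the window")
    x = np.stack([series[i:i + window] for i in range(n)])
    y = series[window:]
    # Add a trailing feature axis, the layout Keras recurrent layers expect.
    return x[..., np.newaxis], y

values = np.arange(20, dtype="float32")  # synthetic stand-in series
x, y = make_windows(values, window=12)
print(x.shape, y.shape)  # (8, 12, 1) (8,)
```

Each target is the value directly after its window, so a series of length 20 with a 12-step window yields 8 training samples.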

### 4.3.3 Sequence Processing
RNNs process sequences by:
- Maintaining internal state across time steps
- Learning temporal dependencies
- Feeding earlier predictions back in as inputs when forecasting several steps ahead
- Handling variable-length sequences

## Key Network Comparisons

### Table 4.1 — Network Type Applications
| Network Type | Best For | Key Features | Example Applications |
|-------------|----------|--------------|---------------------|
| FCNs | Tabular data, Simple patterns | Fully connected layers, No spatial preservation | Autoencoders, Basic classification |
| CNNs | Image data, Spatial patterns | Convolution layers, Parameter sharing, Translation invariance | Image classification, Object detection |
| RNNs | Sequential data, Time series | Hidden states, Temporal dependencies | Forecasting, NLP, Speech recognition |

## 4.4 Practical Considerations

### 4.4.1 Hyperparameter Optimization
- Deep learning models often use empirically chosen architectures
- Hyperparameter optimization is computationally expensive
- Common strategies: Transfer learning, following published architectures, rules of thumb

### 4.4.2 Performance Bottlenecks
- **CNNs**: First fully connected layer after convolution blocks often contains most parameters
- **Memory**: Large dense layers can cause out-of-memory errors
- **Solution**: Use global average pooling instead of flattening for large spatial dimensions
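
A quick parameter count illustrates the point. The sketch below compares the first dense layer after `Flatten` with the same layer after global average pooling, assuming a final feature map of 4×4×32 (the shape the CIFAR-10 model in Section 4.2 reaches before flattening) and a 64-unit dense layer. The helper `dense_params` is a hypothetical name.

```python
def dense_params(inputs, units):
    """Trainable parameters in a dense layer: one weight per input-unit pair plus biases."""
    return inputs * units + units

height, width, channels = 4, 4, 32  # assumed final feature-map shape
units = 64

# Flatten keeps every spatial position, so the dense layer sees H * W * C inputs.
flatten_params = dense_params(height * width * channels, units)
# Global average pooling reduces each channel to one value, leaving C inputs.
gap_params = dense_params(channels, units)

print(flatten_params)  # 32832
print(gap_params)      # 2112
```

The gap widens with larger feature maps, because the flattened input grows with height × width while the pooled input depends only on the channel count.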

## Exercises

### Exercise 1: Autoencoder Implementation
Implement an autoencoder with architecture: 512 → 32 → 16 → 512 using sigmoid activation for all layers.

### Exercise 2: CNN Output Size Calculation
Calculate the final output size of the following CNN architecture (ignoring the batch dimension):
```python
models.Sequential([
    layers.Conv2D(16, (5,5), padding='valid', input_shape=(64,64,3)),
    layers.MaxPool2D(pool_size=(3,3), strides=(2,2), padding='same'),
    layers.Conv2D(32, (3,3), activation='relu', padding='same'),
    layers.MaxPool2D(pool_size=(2,2), strides=(2,2), padding='same'),
    layers.Conv2D(32, (3,3), strides=(2,2), activation='relu', padding='same')
])
```

## Chapter 4 Summary

This chapter provided hands-on experience with three fundamental deep learning architectures:

1. **Fully Connected Networks (Autoencoders)**: 
   - Unsupervised learning for image reconstruction
   - Compression and reconstruction phases
   - Applications in denoising and feature learning

2. **Convolutional Neural Networks**:
   - Specialized for spatial data like images
   - Hierarchical feature learning through convolution and pooling
   - Parameter efficiency through weight sharing

3. **Recurrent Neural Networks**:
   - Designed for sequential and time-series data
   - Maintain memory through hidden states
   - Applications in forecasting and temporal pattern recognition

Each network type has distinct strengths and is suited for different types of data and tasks. Understanding these fundamental architectures provides the foundation for exploring more advanced deep learning models and applications.