# CNN Architecture Deep Dive: Understanding Every Layer

This notebook walks through Convolutional Neural Networks (CNNs) component by component, with a detailed explanation of each layer type.

## Complete Learning Objectives:
1. **Convolution Operation**: Mathematics, padding, stride, dilation
2. **Pooling Layers**: Max, average, global pooling mechanics
3. **Feature Maps**: What CNNs actually learn and see
4. **Receptive Fields**: How information flows through layers
5. **Architecture Patterns**: LeNet, AlexNet, VGG, ResNet principles
6. **Parameter Calculations**: Exact formulas for memory and computation
7. **Implementation**: From-scratch NumPy implementations alongside TensorFlow

**Prerequisites**: Complete `01_deep_learning_foundations.ipynb` and `02_neural_network_fundamentals.ipynb`

In [1]:
# Cell 1: Comprehensive CNN Library Setup
"""
CNN-SPECIFIC LIBRARY EXPLANATIONS:

Computer Vision Libraries:
- tensorflow: Core deep learning framework with excellent CNN support
- opencv-python (cv2): Advanced image processing operations
- PIL/Pillow: Python Imaging Library for basic image operations
- imageio: Reading and writing various image formats

Scientific Computing for CNNs:
- numpy: Multi-dimensional array operations (essential for image tensors)
- scipy: Signal processing functions (convolution operations)
- matplotlib: Visualization of images, filters, feature maps
- seaborn: Statistical plots for model performance analysis

Specialized CNN Tools:
- tensorflow.keras.applications: Pre-trained models (VGG, ResNet, etc.)
- tensorflow.keras.preprocessing: Image augmentation and preprocessing
- tensorflow.keras.utils: Model visualization and utilities
"""

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, applications, preprocessing
from tensorflow.keras.utils import plot_model

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import ndimage
import cv2
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.decomposition import PCA

import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
tf.random.set_seed(RANDOM_SEED)

print(f"🔧 CNN ENVIRONMENT SETUP")
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
print(f"OpenCV version: {cv2.__version__}")
print(f"GPU available: {len(tf.config.list_physical_devices('GPU')) > 0}")
if len(tf.config.list_physical_devices('GPU')) > 0:
    print(f"GPU devices: {tf.config.list_physical_devices('GPU')}")
print(f"Random seed: {RANDOM_SEED}")

# Configure plotting for high-quality CNN visualizations
plt.style.use('seaborn-v0_8')
plt.rcParams['figure.figsize'] = (14, 10)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.alpha'] = 0.3

print("\n✅ All CNN-specific libraries imported and configured successfully!")
print("🎯 Ready for deep CNN exploration!")


## 1. Convolution Operation: The Heart of CNNs

**What is Convolution?**
- **Mathematical operation**: Sliding a filter (kernel) over an input to detect features
- **Biological inspiration**: How visual cortex processes images
- **Key insight**: Local features (edges, textures) matter more than global position

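**Convolution Mathematics:**
```
Output[i,j] = Σ_m Σ_n Input[i+m, j+n] * Filter[m,n]
```
Here `m` and `n` range over the filter's rows and columns, with stride 1 and no padding. Strictly speaking this is cross-correlation, because the filter is not flipped; deep learning frameworks implement this operation under the name "convolution".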
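
To make the double sum concrete, here is a minimal sketch that evaluates the formula term by term on a tiny input. The 3x3 input, the 2x2 filter, and the helper name `output_element` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical 3x3 input and 2x2 filter, chosen only for illustration.
x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=np.float32)
k = np.array([[1, 0],
              [0, -1]], dtype=np.float32)

def output_element(x, k, i, j):
    """Evaluate Output[i, j] = sum_m sum_n Input[i+m, j+n] * Filter[m, n]."""
    total = 0.0
    for m in range(k.shape[0]):
        for n in range(k.shape[1]):
            total += x[i + m, j + n] * k[m, n]
    return total

# A 3x3 input and a 2x2 filter give a 2x2 output (stride 1, no padding).
out = np.array([[output_element(x, k, i, j) for j in range(2)]
                for i in range(2)])
print(out)
# [[-4. -4.]
#  [-4. -4.]]

# The same value via the "window * filter, then sum" shortcut.
print(np.sum(x[0:2, 0:2] * k))  # -4.0
```

Every output is -4 because this filter subtracts the lower-right neighbour from each pixel, and neighbouring diagonal entries of this input always differ by 4.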

**Critical Parameters:**
1. **Filter Size**: Usually 3x3, 5x5, 7x7 (odd numbers for symmetry)
2. **Stride**: How much to move filter each step (1, 2, 3...)
3. **Padding**: Add zeros around input to control output size
4. **Dilation**: Spacing between filter elements (for larger receptive field)
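
Dilation is the least intuitive of the four parameters, so a small sketch may help. It assumes the usual convention that dilation `d` places `d - 1` skipped pixels between neighbouring filter elements; the function names are hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int = 1) -> int:
    """Input span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_tap_offsets(kernel_size: int, dilation: int = 1) -> list:
    """Offsets along one axis of the input pixels that the kernel reads."""
    return [m * dilation for m in range(kernel_size)]

for d in (1, 2, 3):
    print(d, effective_kernel_size(3, d), dilated_tap_offsets(3, d))
# 1 3 [0, 1, 2]
# 2 5 [0, 2, 4]
# 3 7 [0, 3, 6]
```

A 3x3 filter with dilation 2 therefore spans a 5x5 input region while still using only 9 weights.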

**Output Size Formula:**
```
Output_size = floor((Input_size + 2*Padding - Filter_size) / Stride) + 1
```
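
The output size formula can be checked with a few lines of Python. This is a minimal sketch for one spatial dimension; `conv_output_size` is a hypothetical helper name, and integer (floor) division is used because partial windows at the border are dropped.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output length along one dimension, using floor division."""
    padded = input_size + 2 * padding
    if filter_size > padded:
        raise ValueError("filter is larger than the padded input")
    return (padded - filter_size) // stride + 1

# 8-pixel input, 3-pixel filter, no padding
print(conv_output_size(8, 3, stride=1))   # 6
print(conv_output_size(8, 3, stride=2))   # 3  (5 // 2 + 1)
print(conv_output_size(8, 3, stride=3))   # 2  (5 // 3 + 1)

# Padding of (filter_size - 1) // 2 keeps the size unchanged at stride 1
print(conv_output_size(8, 3, stride=1, padding=1))    # 8
print(conv_output_size(32, 5, stride=1, padding=2))   # 32
```

Note that with stride 2 the plain division would give 3.5, which is why the floor matters.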

In [None]:
# Cell 2: Convolution Operation Deep Analysis

print("=== CONVOLUTION OPERATION MATHEMATICS ===")

def manual_convolution_2d(input_array, filter_array, stride=1, padding=0):
    """
    Manual implementation of 2D convolution to understand the mathematics
    
    Args:
        input_array: 2D numpy array (height, width)
        filter_array: 2D numpy array (filter_height, filter_width)
        stride: Step size for filter movement
        padding: Number of zero-padding pixels around input
    
    Returns:
        output_array: 2D numpy array with convolution result
    """
    
    # Add padding if specified
    if padding > 0:
        input_array = np.pad(input_array, padding, mode='constant', constant_values=0)
    
    input_h, input_w = input_array.shape
    filter_h, filter_w = filter_array.shape
    
    # Calculate output dimensions
    output_h = (input_h - filter_h) // stride + 1
    output_w = (input_w - filter_w) // stride + 1
    
    # Initialize output array
    output_array = np.zeros((output_h, output_w))
    
    # Perform convolution
    for i in range(0, output_h):
        for j in range(0, output_w):
            # Extract the current window
            window = input_array[i*stride:i*stride+filter_h, j*stride:j*stride+filter_w]
            # Element-wise multiplication and sum
            output_array[i, j] = np.sum(window * filter_array)
    
    return output_array

# Create example input and filters
print("\n🖼️ EXAMPLE INPUT IMAGE (8x8):")
example_input = np.array([
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1]
], dtype=np.float32)

print(f"Input shape: {example_input.shape}")
print(f"Input pattern: Checkerboard-like pattern with clear edges")

# Define different types of filters
filters = {
    'vertical_edge': np.array([[-1, 0, 1],
                              [-1, 0, 1],
                              [-1, 0, 1]], dtype=np.float32),
    
    'horizontal_edge': np.array([[-1, -1, -1],
                                [ 0,  0,  0],
                                [ 1,  1,  1]], dtype=np.float32),
    
    'blur': np.array([[1, 1, 1],
                     [1, 1, 1],
                     [1, 1, 1]], dtype=np.float32) / 9,
    
    'sharpen': np.array([[ 0, -1,  0],
                        [-1,  5, -1],
                        [ 0, -1,  0]], dtype=np.float32),
    
    'diagonal_edge': np.array([[-1, -1,  0],
                              [-1,  0,  1],
                              [ 0,  1,  1]], dtype=np.float32)
}

print(f"\n🔧 DEFINED FILTERS:")
for name, filter_array in filters.items():
    print(f"  {name:15s}: {filter_array.shape} - {filter_array.sum():.2f} (sum)")

# Demonstrate convolution with different parameters
print(f"\n🧮 CONVOLUTION PARAMETER ANALYSIS:")

# Test different stride values
strides_to_test = [1, 2, 3]
padding_values = [0, 1, 2]

print(f"\nSTRIDE EFFECTS (using vertical_edge filter, no padding):")
for stride in strides_to_test:
    output = manual_convolution_2d(example_input, filters['vertical_edge'], stride=stride, padding=0)
    print(f"  Stride {stride}: Input {example_input.shape} → Output {output.shape}")
    print(f"           Formula: ({example_input.shape[0]} - {filters['vertical_edge'].shape[0]}) // {stride} + 1 = {output.shape[0]}")

print(f"\nPADDING EFFECTS (using vertical_edge filter, stride=1):")
for padding in padding_values:
    output = manual_convolution_2d(example_input, filters['vertical_edge'], stride=1, padding=padding)
    effective_input_size = example_input.shape[0] + 2 * padding
    print(f"  Padding {padding}: Effective input {effective_input_size}x{effective_input_size} → Output {output.shape}")
    print(f"            Formula: ({effective_input_size} - {filters['vertical_edge'].shape[0]}) / 1 + 1 = {output.shape[0]}")

# Visualize convolution operations
fig, axes = plt.subplots(3, 3, figsize=(18, 15))

# Plot original input
axes[0, 0].imshow(example_input, cmap='gray', interpolation='nearest')
axes[0, 0].set_title('Original Input\n(8x8 Checkerboard Pattern)')
axes[0, 0].set_xlabel('Width')
axes[0, 0].set_ylabel('Height')

# Add grid to show pixels clearly
for i in range(example_input.shape[0] + 1):
    axes[0, 0].axhline(i - 0.5, color='white', linewidth=0.5)
for j in range(example_input.shape[1] + 1):
    axes[0, 0].axvline(j - 0.5, color='white', linewidth=0.5)

# Plot different filter effects
filter_names = ['vertical_edge', 'horizontal_edge', 'blur', 'sharpen', 'diagonal_edge']
positions = [(0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

for i, (name, (row, col)) in enumerate(zip(filter_names, positions)):
    filter_array = filters[name]
    output = manual_convolution_2d(example_input, filter_array, stride=1, padding=0)
    
    # Plot filter
    filter_plot = axes[row, col]
    im = filter_plot.imshow(filter_array, cmap='RdBu', interpolation='nearest', vmin=-1, vmax=1)
    filter_plot.set_title(f'{name.replace("_", " ").title()} Filter\n{filter_array.shape}')
    
    # Add values to filter visualization
    for fi in range(filter_array.shape[0]):
        for fj in range(filter_array.shape[1]):
            filter_plot.text(fj, fi, f'{filter_array[fi, fj]:.1f}', 
                           ha='center', va='center', color='white', fontweight='bold')
    
    # Plot convolution output for the first three filters only
    # (row 2 has three panels; reusing one would overwrite an earlier output)
    output_plot = axes[2, i] if i < 3 else None
    if output_plot is not None:
        output_plot.imshow(output, cmap='viridis', interpolation='nearest')
        output_plot.set_title(f'Output: {name.replace("_", " ").title()}\n{output.shape}')
        
        # Show some output values
        if output.shape[0] <= 6:  # Only for small outputs
            for oi in range(min(3, output.shape[0])):
                for oj in range(min(3, output.shape[1])):
                    output_plot.text(oj, oi, f'{output[oi, oj]:.1f}', 
                                   ha='center', va='center', color='white', fontsize=8)

# Add colorbar for filters
plt.colorbar(im, ax=axes[0:2, :].ravel().tolist(), shrink=0.6, label='Filter Weight')

plt.tight_layout()
plt.suptitle('Convolution Operation: Different Filters and Their Effects', y=1.02, fontsize=16)
plt.show()

# Helper functions for the filter analysis (defined before the loop that calls them)
def get_filter_purpose(name):
    purposes = {
        'vertical_edge': 'Detects vertical edges and transitions',
        'horizontal_edge': 'Detects horizontal edges and transitions', 
        'blur': 'Smooths image by averaging neighboring pixels',
        'sharpen': 'Enhances edges by emphasizing differences',
        'diagonal_edge': 'Detects diagonal edges and corners'
    }
    return purposes.get(name, 'Unknown purpose')

def analyze_filter_characteristics(filter_array):
    characteristics = []
    
    # Check if filter is symmetric
    if np.allclose(filter_array, filter_array.T):
        characteristics.append('Symmetric')
    
    # Check filter sum (important for brightness preservation)
    filter_sum = np.sum(filter_array)
    if abs(filter_sum) < 0.1:
        characteristics.append('Zero-sum (edge detector)')
    elif filter_sum > 0.9:
        characteristics.append('Positive-sum (feature enhancer)')
    
    # Check the spread between the largest and smallest weight
    if np.max(filter_array) - np.min(filter_array) > 1:
        characteristics.append('High contrast')
    
    return ', '.join(characteristics) if characteristics else 'Basic filter'

# Analyze what each filter detects
print(f"\n🔍 FILTER ANALYSIS:")
for name, filter_array in filters.items():
    output = manual_convolution_2d(example_input, filter_array, stride=1, padding=0)
    max_response = np.max(output)
    min_response = np.min(output)
    
    print(f"\n{name.upper()} FILTER:")
    print(f"  Purpose: {get_filter_purpose(name)}")
    print(f"  Response range: [{min_response:.2f}, {max_response:.2f}]")
    print(f"  Strongest response at: {np.unravel_index(np.argmax(output), output.shape)}")
    print(f"  Filter characteristics: {analyze_filter_characteristics(filter_array)}")

print(f"\n💡 CONVOLUTION KEY INSIGHTS:")
print(f"\n1. MATHEMATICAL OPERATION:")
print(f"   • Convolution = Element-wise multiplication + Sum")
print(f"   • Filter slides across entire input image")
print(f"   • Each position produces one output value")

print(f"\n2. PARAMETER EFFECTS:")
print(f"   • Larger stride → Smaller output, faster computation")
print(f"   • Padding → Control output size, preserve border information")
print(f"   • Filter size → Receptive field, computational cost")

print(f"\n3. FEATURE DETECTION:")
print(f"   • Different filters detect different features")
print(f"   • Zero-sum filters → Edge detectors")
print(f"   • Positive-sum filters → Feature enhancers")
print(f"   • CNNs learn optimal filter weights automatically!")

## 2. Pooling Operations: Spatial Dimension Reduction

**Why Pooling?**
- **Dimensionality reduction**: Reduce spatial size → Less computation
- **Translation invariance**: Small shifts in input don't change output much
- **Feature abstraction**: Focus on presence of features, not exact location
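
The translation-invariance claim can be tested directly. The sketch below uses a hypothetical 4x4 map with a single activation and a small helper named `max_pool_2x2`; it shows that the invariance is only partial, holding for shifts that stay inside one pooling window.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max pooling on a 2D array."""
    h, w = x.shape
    return np.array([[x[i:i + 2, j:j + 2].max() for j in range(0, w - 1, 2)]
                     for i in range(0, h - 1, 2)])

# Hypothetical 4x4 feature maps with a single active pixel each.
a = np.zeros((4, 4)); a[0, 0] = 1.0   # original position
b = np.zeros((4, 4)); b[1, 1] = 1.0   # shifted, still in the same 2x2 window
c = np.zeros((4, 4)); c[1, 2] = 1.0   # shifted across a window boundary

print(np.array_equal(max_pool_2x2(a), max_pool_2x2(b)))  # True
print(np.array_equal(max_pool_2x2(a), max_pool_2x2(c)))  # False
```

So pooling absorbs small shifts that stay within a window, while a shift that crosses a window boundary still moves the pooled activation.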

**Types of Pooling:**

1. **Max Pooling**: Take maximum value in each window
   - Most common in CNNs
   - Preserves strongest features
   - Creates translation invariance

2. **Average Pooling**: Take average value in each window
   - Smoother downsampling
   - Less aggressive feature selection
   - Better for fine-grained features

3. **Global Pooling**: Pool entire feature map to single value
   - Extreme dimensionality reduction
   - Often used before final classification layer
   - Completely position-invariant
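
Global pooling is applied per channel, which is easy to miss when thinking about a single 2D map. A minimal NumPy sketch, assuming a channels-last tensor of shape (height, width, channels) with hypothetical values:

```python
import numpy as np

# Hypothetical feature tensor: height 2, width 2, 3 channels (channels last).
fmap = np.arange(12, dtype=np.float32).reshape(2, 2, 3)

# Reduce over the two spatial axes; one number remains per channel.
global_avg = fmap.mean(axis=(0, 1))
global_max = fmap.max(axis=(0, 1))

print(global_avg)  # [4.5 5.5 6.5]
print(global_max)  # [ 9. 10. 11.]
```

The result has one entry per channel regardless of the spatial size, which is why global pooling is convenient right before a classification layer.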

**Pooling Mathematics:**
```
Max Pool: Output[i,j] = max(Input[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size])
Avg Pool: Output[i,j] = mean(Input[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size])
```
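
The two pooling formulas can also be written without explicit loops. The sketch below is one possible vectorized version, assuming NumPy 1.20 or newer for `numpy.lib.stride_tricks.sliding_window_view`; the function name `pool2d` and its `reducer` parameter are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pool2d(x, pool_size=2, stride=None, reducer=np.max):
    """Pool a 2D array; stride defaults to pool_size (non-overlapping)."""
    if stride is None:
        stride = pool_size
    # All pool_size x pool_size windows: shape (H-p+1, W-p+1, p, p)
    windows = sliding_window_view(x, (pool_size, pool_size))
    # Keep only the windows that start at multiples of the stride
    windows = windows[::stride, ::stride]
    # Reduce each window to a single value
    return reducer(windows, axis=(2, 3))

x = np.arange(16, dtype=np.float32).reshape(4, 4)
print(pool2d(x))                          # [[ 5.  7.] [13. 15.]]
print(pool2d(x, reducer=np.mean))         # [[ 2.5  4.5] [10.5 12.5]]
print(pool2d(x, pool_size=2, stride=1).shape)  # (3, 3)
```

Window `[i, j]` after slicing starts at row `i*stride` and column `j*stride`, matching the index ranges in the formulas above.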

In [None]:
# Cell 3: Pooling Operations Comprehensive Analysis

print("=== POOLING OPERATIONS DEEP DIVE ===")

def manual_pooling_2d(input_array, pool_size=2, stride=None, pool_type='max'):
    """
    Manual implementation of pooling operations
    
    Args:
        input_array: 2D numpy array
        pool_size: Size of pooling window (assumes square)
        stride: Step size (defaults to pool_size for non-overlapping)
        pool_type: 'max', 'average', or 'min'
    
    Returns:
        pooled_array: 2D numpy array after pooling
    """
    
    if stride is None:
        stride = pool_size
    
    input_h, input_w = input_array.shape
    
    # Calculate output dimensions
    output_h = (input_h - pool_size) // stride + 1
    output_w = (input_w - pool_size) // stride + 1
    
    # Initialize output array
    pooled_array = np.zeros((output_h, output_w))
    
    # Perform pooling
    for i in range(output_h):
        for j in range(output_w):
            # Extract the current window
            window = input_array[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size]
            
            # Apply pooling operation
            if pool_type == 'max':
                pooled_array[i, j] = np.max(window)
            elif pool_type == 'average':
                pooled_array[i, j] = np.mean(window)
            elif pool_type == 'min':
                pooled_array[i, j] = np.min(window)
            else:
                raise ValueError(f"Unknown pool_type: {pool_type}")
    
    return pooled_array

# Create a more complex test image
print("\n🖼️ CREATING TEST IMAGE FOR POOLING:")

# Create an 8x8 image with various patterns
test_image = np.array([
    [1.0, 0.8, 0.2, 0.1, 0.9, 0.7, 0.3, 0.0],
    [0.9, 1.0, 0.0, 0.3, 0.8, 1.0, 0.1, 0.2],
    [0.1, 0.2, 0.7, 0.9, 0.0, 0.3, 0.8, 0.6],
    [0.0, 0.4, 1.0, 0.8, 0.2, 0.1, 0.9, 1.0],
    [0.8, 0.9, 0.3, 0.0, 1.0, 0.6, 0.2, 0.4],
    [1.0, 0.7, 0.1, 0.5, 0.8, 1.0, 0.0, 0.3],
    [0.2, 0.0, 0.9, 1.0, 0.1, 0.4, 0.7, 0.8],
    [0.4, 0.3, 0.8, 0.6, 0.0, 0.2, 1.0, 0.9]
], dtype=np.float32)

print(f"Test image shape: {test_image.shape}")
print(f"Value range: [{test_image.min():.1f}, {test_image.max():.1f}]")
print(f"Contains: Hand-picked values simulating feature map activations")

# Test different pooling operations
pooling_types = ['max', 'average', 'min']
pool_sizes = [2, 3, 4]

print(f"\n📊 POOLING PARAMETER ANALYSIS:")

# Analyze effect of different pool sizes
print(f"\nPOOL SIZE EFFECTS (Max Pooling):")
for pool_size in pool_sizes:
    pooled = manual_pooling_2d(test_image, pool_size=pool_size, pool_type='max')
    reduction_ratio = (test_image.size / pooled.size)
    print(f"  Pool size {pool_size}x{pool_size}: {test_image.shape} → {pooled.shape} (reduction: {reduction_ratio:.1f}x)")

# Analyze different pooling types
print(f"\nPOOLING TYPE COMPARISON (2x2 pools):")
for pool_type in pooling_types:
    pooled = manual_pooling_2d(test_image, pool_size=2, pool_type=pool_type)
    mean_val = np.mean(pooled)
    std_val = np.std(pooled)
    print(f"  {pool_type:8s}: Mean={mean_val:.3f}, Std={std_val:.3f}, Range=[{pooled.min():.3f}, {pooled.max():.3f}]")

# Demonstrate overlapping vs non-overlapping pooling
print(f"\nOVERLAPPING vs NON-OVERLAPPING POOLING:")
non_overlapping = manual_pooling_2d(test_image, pool_size=2, stride=2, pool_type='max')
overlapping = manual_pooling_2d(test_image, pool_size=2, stride=1, pool_type='max')
print(f"  Non-overlapping (stride=pool_size): {test_image.shape} → {non_overlapping.shape}")
print(f"  Overlapping (stride=1):             {test_image.shape} → {overlapping.shape}")
print(f"  Overlap effect: More spatial resolution retained with overlapping")

# Comprehensive visualization
fig, axes = plt.subplots(3, 4, figsize=(20, 15))

# Plot original image
im0 = axes[0, 0].imshow(test_image, cmap='viridis', interpolation='nearest', vmin=0, vmax=1)
axes[0, 0].set_title('Original Image\n(8x8 Feature Map)')
axes[0, 0].set_xlabel('Width')
axes[0, 0].set_ylabel('Height')

# Add value annotations for clarity
for i in range(test_image.shape[0]):
    for j in range(test_image.shape[1]):
        axes[0, 0].text(j, i, f'{test_image[i, j]:.1f}', 
                       ha='center', va='center', color='white', fontsize=8)

# Plot different pooling types (2x2)
for idx, pool_type in enumerate(pooling_types):
    pooled = manual_pooling_2d(test_image, pool_size=2, pool_type=pool_type)
    
    im = axes[0, idx+1].imshow(pooled, cmap='viridis', interpolation='nearest', vmin=0, vmax=1)
    axes[0, idx+1].set_title(f'{pool_type.title()} Pooling 2x2\n{pooled.shape}')
    
    # Add value annotations
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            axes[0, idx+1].text(j, i, f'{pooled[i, j]:.2f}', 
                               ha='center', va='center', color='white', fontsize=10)

# Plot different pool sizes (max pooling)
for idx, pool_size in enumerate([2, 3, 4]):
    pooled = manual_pooling_2d(test_image, pool_size=pool_size, pool_type='max')
    
    im = axes[1, idx].imshow(pooled, cmap='viridis', interpolation='nearest', vmin=0, vmax=1)
    axes[1, idx].set_title(f'Max Pool {pool_size}x{pool_size}\n{pooled.shape}')
    
    # Add value annotations
    for i in range(pooled.shape[0]):
        for j in range(pooled.shape[1]):
            axes[1, idx].text(j, i, f'{pooled[i, j]:.2f}', 
                             ha='center', va='center', color='white', fontsize=10)

# Demonstrate stride effects
stride_examples = [(2, 2), (2, 1), (3, 1)]
for idx, (pool_size, stride) in enumerate(stride_examples):
    pooled = manual_pooling_2d(test_image, pool_size=pool_size, stride=stride, pool_type='max')
    
    # The first example fills the last panel of row 1; the others start row 2
    ax = axes[1, 3] if idx == 0 else axes[2, idx - 1]
    ax.imshow(pooled, cmap='viridis', interpolation='nearest', vmin=0, vmax=1)
    ax.set_title(f'Pool {pool_size}x{pool_size}, Stride {stride}\n{pooled.shape}')
    
    # Add value annotations for smaller outputs
    if pooled.size <= 16:
        for i in range(pooled.shape[0]):
            for j in range(pooled.shape[1]):
                ax.text(j, i, f'{pooled[i, j]:.2f}', 
                        ha='center', va='center', color='white', fontsize=10)

# Global pooling demonstration
global_max = np.max(test_image)
global_avg = np.mean(test_image)
global_min = np.min(test_image)

global_values = np.array([[global_max], [global_avg], [global_min]])
im_global = axes[2, 2].imshow(global_values, cmap='viridis', interpolation='nearest', vmin=0, vmax=1)
axes[2, 2].set_title('Global Pooling\n(Max, Avg, Min)')
axes[2, 2].set_ylabel('Pool Type')
axes[2, 2].set_yticks([0, 1, 2])
axes[2, 2].set_yticklabels(['Max', 'Avg', 'Min'])
axes[2, 2].set_xticks([])

for i, val in enumerate([global_max, global_avg, global_min]):
    axes[2, 2].text(0, i, f'{val:.3f}', ha='center', va='center', color='white', fontsize=12)

# Information loss analysis
axes[2, 3].axis('off')
info_text = f"""
INFORMATION ANALYSIS:

Original: {test_image.size} values
Max Pool 2x2: {manual_pooling_2d(test_image, 2, pool_type='max').size} values
Compression: {test_image.size / manual_pooling_2d(test_image, 2, pool_type='max').size:.1f}x

Information Preserved:
• Max Pool: Strongest activations
• Avg Pool: General feature strength
• Min Pool: Weakest activations

Trade-offs:
• Smaller size → Faster computation
• Lost detail → Less precise localization
• Translation invariance gained
"""
axes[2, 3].text(0.05, 0.95, info_text, transform=axes[2, 3].transAxes, 
               verticalalignment='top', fontsize=10, fontfamily='monospace')

# Add colorbars
plt.colorbar(im0, ax=axes[0, :].ravel().tolist(), shrink=0.8, label='Activation Value')

plt.tight_layout()
plt.suptitle('Pooling Operations: Types, Sizes, and Effects', y=1.02, fontsize=16)
plt.show()

# Analyze pooling effects on different image patterns
print(f"\n🔍 POOLING EFFECTS ON DIFFERENT PATTERNS:")

# Create different test patterns
patterns = {
    'uniform': np.ones((4, 4)) * 0.5,
    'gradient': np.array([[i*j/9 for j in range(4)] for i in range(4)]),
    'checkerboard': np.array([[((i+j) % 2) for j in range(4)] for i in range(4)]),
    'sparse_features': np.zeros((4, 4))
}
patterns['sparse_features'][1, 1] = 1.0
patterns['sparse_features'][2, 3] = 0.8

for pattern_name, pattern in patterns.items():
    max_pooled = manual_pooling_2d(pattern, pool_size=2, pool_type='max')
    avg_pooled = manual_pooling_2d(pattern, pool_size=2, pool_type='average')
    
    original_info = np.sum(pattern > 0.1)  # Count significant activations
    max_info = np.sum(max_pooled > 0.1)
    avg_info = np.sum(avg_pooled > 0.1)
    
    print(f"\n{pattern_name.upper()} PATTERN:")
    print(f"  Original significant features: {original_info}")
    print(f"  Max pooling preserves: {max_info} features")
    print(f"  Avg pooling preserves: {avg_info} features")
    print(f"  Best pooling for this pattern: {'Max' if max_info >= avg_info else 'Average'}")

print(f"\n💡 POOLING KEY INSIGHTS:")
print(f"\n1. DIMENSIONALITY REDUCTION:")
print(f"   • Pool size 2x2 (stride 2) → 2x smaller per dimension, 4x fewer activations")
print(f"   • Pool size 3x3 (stride 3) → 3x smaller per dimension, roughly 9x fewer activations")
print(f"   • Critical for computational efficiency in deep networks")

print(f"\n2. POOLING TYPE SELECTION:")
print(f"   • Max pooling: Best for sparse, distinct features (most common)")
print(f"   • Average pooling: Better for distributed, texture-like features")
print(f"   • Global pooling: Complete spatial invariance for classification")

print(f"\n3. TRANSLATION INVARIANCE:")
print(f"   • Small shifts that stay inside a pooling window → Same pooled output")
print(f"   • Essential for robust image recognition")
print(f"   • Trade-off: Lost spatial precision")

print(f"\n4. ARCHITECTURAL CONSIDERATIONS:")
print(f"   • Usually placed after convolution layers")
print(f"   • Stride typically equals pool size (non-overlapping)")
print(f"   • Modern architectures sometimes replace with strided convolutions")

## 3. Feature Maps and Receptive Fields: What CNNs Really See

**Feature Maps Explained:**
- **Output of convolution**: Each filter produces one feature map
- **Multiple filters**: Create multiple feature maps per layer
- **Feature hierarchy**: Early layers→edges, Later layers→complex objects
- **Spatial arrangement**: Feature maps preserve spatial relationships
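
Since each filter produces one feature map, both the output shape and the parameter count of a convolution layer follow from a few numbers. The sketch below uses hypothetical helper names and assumes square kernels, one bias per filter, and no padding unless stated.

```python
def conv2d_param_count(kernel_size, in_channels, out_channels, use_bias=True):
    """Weights: k * k * C_in per filter, times C_out filters, plus biases."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    biases = out_channels if use_bias else 0
    return weights + biases

def feature_map_shape(height, width, kernel_size, num_filters, stride=1, padding=0):
    """Output shape (H_out, W_out, channels): one channel per filter."""
    h_out = (height + 2 * padding - kernel_size) // stride + 1
    w_out = (width + 2 * padding - kernel_size) // stride + 1
    return (h_out, w_out, num_filters)

# A 3x3 convolution with 16 filters on a 32x32 RGB image
print(feature_map_shape(32, 32, 3, 16))   # (30, 30, 16)
print(conv2d_param_count(3, 3, 16))       # 448

# Stacking further 3x3 layers with 32 and then 64 filters
print(conv2d_param_count(3, 16, 32))      # 4640
print(conv2d_param_count(3, 32, 64))      # 18496
```

The parameter count does not depend on the image size, only on the kernel size and the channel counts.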

**Receptive Field:**
- **Definition**: Input region that affects a particular output neuron
- **Growth through layers**: Deeper layers see larger input regions
- **Critical concept**: Determines what patterns the network can detect

**Receptive Field Calculation:**
```
RF = 1 (for the input itself)
For each conv/pool layer, in order:
RF_new = RF_old + (kernel_size - 1) * stride_product
stride_product = product of the strides of all earlier layers
```
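
A short worked example of the recurrence, as a sketch: the helper name `receptive_field` and the variable `jump` (the running stride product) are hypothetical. It also illustrates why stacks of small kernels are popular: two 3x3 layers see as much of the input as one 5x5 layer, and three as much as one 7x7.

```python
def receptive_field(layers):
    """layers: (kernel_size, stride) pairs, ordered from input to output."""
    rf, jump = 1, 1  # jump = product of the strides of all earlier layers
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1), (3, 1)]))          # 5, same as one 5x5 layer
print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # 7, same as one 7x7 layer

# 3x3 conv, then 2x2 pool with stride 2, then 3x3 conv: 1 -> 3 -> 4 -> 8
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

In the last case the second convolution adds 4 pixels rather than 2, because each of its steps covers two input pixels after the stride-2 pooling.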

**Feature Learning Hierarchy:**
1. **Layer 1**: Edges, colors, simple textures
2. **Layer 2-3**: Shapes, patterns, object parts
3. **Layer 4-5**: Objects, complex features
4. **Final layers**: High-level semantic concepts

In [None]:
# Cell 4: Feature Maps and Receptive Field Analysis

print("=== FEATURE MAPS AND RECEPTIVE FIELD ANALYSIS ===")

def calculate_receptive_field(layers_config):
    """
    Calculate receptive field for a CNN architecture
    
    Args:
        layers_config: List of tuples (layer_type, kernel_size, stride, padding)
    
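    Returns:
        Tuple (receptive_fields, stride_product): the receptive field size at the
        input and after each layer, and the cumulative stride (total downsampling)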
    """
    receptive_fields = [1]  # Start with RF=1 for input
    stride_product = 1
    
    for layer_type, kernel_size, stride, padding in layers_config:
        if layer_type in ['conv', 'pool']:
            # Update receptive field
            current_rf = receptive_fields[-1] + (kernel_size - 1) * stride_product
            receptive_fields.append(current_rf)
            
            # Update stride product for next layer
            stride_product *= stride
        else:
            # For other layers (like dense), RF doesn't change
            receptive_fields.append(receptive_fields[-1])
    
    return receptive_fields, stride_product

# Define several CNN architectures to analyze
architectures = {
    'Simple CNN': [
        ('conv', 3, 1, 0),  # 3x3 conv, stride 1
        ('pool', 2, 2, 0),  # 2x2 max pool, stride 2
        ('conv', 3, 1, 0),  # 3x3 conv, stride 1  
        ('pool', 2, 2, 0),  # 2x2 max pool, stride 2
        ('conv', 3, 1, 0),  # 3x3 conv, stride 1
    ],
    
    'Deep CNN': [
        ('conv', 3, 1, 1),  # 3x3 conv, stride 1, padding 1
        ('conv', 3, 1, 1),  # 3x3 conv, stride 1, padding 1
        ('pool', 2, 2, 0),  # 2x2 max pool, stride 2
        ('conv', 3, 1, 1),  # 3x3 conv, stride 1, padding 1
        ('conv', 3, 1, 1),  # 3x3 conv, stride 1, padding 1
        ('pool', 2, 2, 0),  # 2x2 max pool, stride 2
        ('conv', 3, 1, 1),  # 3x3 conv, stride 1, padding 1
        ('conv', 3, 1, 1),  # 3x3 conv, stride 1, padding 1
        ('pool', 2, 2, 0),  # 2x2 max pool, stride 2
    ],
    
    'Large Kernel CNN': [
        ('conv', 7, 1, 0),  # 7x7 conv, stride 1
        ('pool', 2, 2, 0),  # 2x2 max pool, stride 2
        ('conv', 5, 1, 0),  # 5x5 conv, stride 1
        ('pool', 2, 2, 0),  # 2x2 max pool, stride 2
        ('conv', 3, 1, 0),  # 3x3 conv, stride 1
    ]
}

print(f"\n📐 RECEPTIVE FIELD ANALYSIS:")

# Analyze each architecture
rf_results = {}
for arch_name, config in architectures.items():
    rfs, final_stride = calculate_receptive_field(config)
    rf_results[arch_name] = {'rfs': rfs, 'stride': final_stride, 'config': config}
    
    print(f"\n{arch_name.upper()}:")
    print(f"  Layer-by-layer receptive field growth:")
    
    layer_names = ['Input'] + [f"{layer[0].title()} {i+1}" for i, layer in enumerate(config)]
    for i, (name, rf) in enumerate(zip(layer_names, rfs)):
        if i == 0:
            # The input row has no kernel or stride of its own
            print(f"    {name:12s}: RF = {rf:2d}x{rf:2d}")
        else:
            # rfs[i] is the receptive field after layer config[i - 1]
            layer_info = config[i - 1]
            print(f"    {name:12s}: RF = {rf:2d}x{rf:2d} | Kernel: {layer_info[1]}x{layer_info[1]}, Stride: {layer_info[2]}")
    
    print(f"  Final receptive field: {rfs[-1]}x{rfs[-1]} pixels")
    print(f"  Total downsampling: {final_stride}x")

# Visualize receptive field growth
plt.figure(figsize=(18, 12))

# Plot 1: Receptive field growth curves
plt.subplot(2, 3, 1)
for arch_name, results in rf_results.items():
    rfs = results['rfs']
    plt.plot(range(len(rfs)), rfs, marker='o', linewidth=2, label=arch_name)

plt.title('Receptive Field Growth Through Layers')
plt.xlabel('Layer Number')
plt.ylabel('Receptive Field Size (pixels)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.yscale('log')

# Plot 2: Final receptive field comparison
plt.subplot(2, 3, 2)
arch_names = list(rf_results.keys())
final_rfs = [results['rfs'][-1] for results in rf_results.values()]
num_layers = [len(results['config']) for results in rf_results.values()]

bars = plt.bar(arch_names, final_rfs, color=['skyblue', 'lightgreen', 'lightcoral'])
plt.title('Final Receptive Field Size')
plt.ylabel('Receptive Field Size (pixels)')
plt.xticks(rotation=45)

# Add value labels on bars
for bar, rf, layers in zip(bars, final_rfs, num_layers):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1,
             f'{rf}x{rf}\n({layers} layers)', ha='center', va='bottom', fontsize=9)

# Plot 3: Create a real CNN to visualize feature maps
print(f"\n🧠 BUILDING REAL CNN FOR FEATURE MAP VISUALIZATION:")

# Load CIFAR-10 for demonstration
(x_train_demo, y_train_demo), (x_test_demo, y_test_demo) = keras.datasets.cifar10.load_data()
x_train_demo = x_train_demo.astype('float32') / 255.0
x_test_demo = x_test_demo.astype('float32') / 255.0

# Build a simple CNN for feature visualization
demo_cnn = models.Sequential([
    layers.Conv2D(16, 3, activation='relu', input_shape=(32, 32, 3), name='conv1'),
    layers.Conv2D(32, 3, activation='relu', name='conv2'),
    layers.MaxPooling2D(2, name='pool1'),
    layers.Conv2D(64, 3, activation='relu', name='conv3'),
    layers.MaxPooling2D(2, name='pool2'),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])

demo_cnn.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

print(f"Demo CNN architecture:")
for i, layer in enumerate(demo_cnn.layers):
    if hasattr(layer, 'kernel_size'):
        print(f"  Layer {i+1}: {layer.name:8s} - Filters: {layer.filters if hasattr(layer, 'filters') else 'N/A':2}, "
              f"Kernel: {layer.kernel_size if hasattr(layer, 'kernel_size') else 'N/A'}")
    else:
        print(f"  Layer {i+1}: {layer.name:8s} - {type(layer).__name__}")

# Train briefly to get meaningful filters
print(f"\nTraining briefly to learn meaningful filters...")
demo_cnn.fit(x_train_demo[:1000], y_train_demo[:1000], epochs=3, batch_size=32, verbose=0)

# Function to extract and visualize feature maps
def visualize_feature_maps(model, input_image, layer_names=None):
    """Extract feature maps (activations) from the specified layers; the caller does the plotting"""
    
    if layer_names is None:
        layer_names = [layer.name for layer in model.layers if 'conv' in layer.name]
    
    # Create a model that outputs feature maps
    layer_outputs = []
    for layer_name in layer_names:
        for layer in model.layers:
            if layer.name == layer_name:
                layer_outputs.append(layer.output)
                break
    
    if not layer_outputs:
        # Fail loudly: returning None would break callers that unpack two values
        raise ValueError(f"No matching layers found for: {layer_names}")
    
    activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
    
    # Get activations
    activations = activation_model.predict(input_image[np.newaxis, ...], verbose=0)
    
    return activations, layer_names

# Select an interesting test image
test_idx = 0
test_image = x_test_demo[test_idx]
true_label = y_test_demo[test_idx][0]
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

print(f"\nAnalyzing image: {class_names[true_label]}")

# Extract feature maps
conv_layer_names = ['conv1', 'conv2', 'conv3']
activations, layer_names = visualize_feature_maps(demo_cnn, test_image, conv_layer_names)

# Plot original image
plt.subplot(2, 3, 3)
plt.imshow(test_image)
plt.title(f'Original Image\n{class_names[true_label]}')
plt.axis('off')

# Plot feature maps from different layers
for layer_idx, (activation, layer_name) in enumerate(zip(activations, layer_names)):
    if layer_idx < 3:  # Show first 3 layers
        plt.subplot(2, 3, 4 + layer_idx)
        
        # Show first 6 feature maps from this layer
        feature_maps_to_show = min(6, activation.shape[-1])
        
        # Create a grid to show multiple feature maps
        grid_size = int(np.ceil(np.sqrt(feature_maps_to_show)))
        combined_map = np.zeros((activation.shape[1] * grid_size, activation.shape[2] * grid_size))
        
        for i in range(feature_maps_to_show):
            row = i // grid_size
            col = i % grid_size
            start_row = row * activation.shape[1]
            end_row = start_row + activation.shape[1]
            start_col = col * activation.shape[2]
            end_col = start_col + activation.shape[2]
            
            feature_map = activation[0, :, :, i]
            # Normalize for visualization
            feature_map = (feature_map - feature_map.min()) / (feature_map.max() - feature_map.min() + 1e-8)
            combined_map[start_row:end_row, start_col:end_col] = feature_map
        
        plt.imshow(combined_map, cmap='viridis')
        plt.title(f'{layer_name} Feature Maps\n{activation.shape[1]}x{activation.shape[2]}x{activation.shape[3]}')
        plt.axis('off')

plt.tight_layout()
plt.show()

# Analyze feature map statistics
print(f"\n📊 FEATURE MAP ANALYSIS:")
for layer_idx, (activation, layer_name) in enumerate(zip(activations, layer_names)):
    mean_activation = np.mean(activation)
    std_activation = np.std(activation)
    sparsity = np.mean(activation == 0)  # Fraction of zero activations (due to ReLU)
    max_activation = np.max(activation)
    
    print(f"\n{layer_name.upper()}:")
    print(f"  Shape: {activation.shape[1:]} (H x W x Channels)")
    print(f"  Total activations: {activation.size:,}")
    print(f"  Mean activation: {mean_activation:.4f}")
    print(f"  Std activation: {std_activation:.4f}")
    print(f"  Sparsity (zeros): {sparsity:.2%}")
    print(f"  Max activation: {max_activation:.4f}")
    
    # Analyze what this layer might be detecting
    if layer_idx == 0:
        print(f"  Likely detects: Edges, colors, simple textures")
    elif layer_idx == 1:
        print(f"  Likely detects: Shapes, patterns, combinations of edges")
    else:
        print(f"  Likely detects: Object parts, complex features")

# Calculate actual receptive fields for our demo CNN
demo_config = [
    ('conv', 3, 1, 0),  # conv1
    ('conv', 3, 1, 0),  # conv2  
    ('pool', 2, 2, 0),  # pool1
    ('conv', 3, 1, 0),  # conv3
    ('pool', 2, 2, 0),  # pool2
]

demo_rfs, demo_stride = calculate_receptive_field(demo_config)

print(f"\n🔍 DEMO CNN RECEPTIVE FIELD ANALYSIS:")
layer_names_rf = ['Input', 'Conv1', 'Conv2', 'Pool1', 'Conv3', 'Pool2']
for name, rf in zip(layer_names_rf, demo_rfs):
    print(f"  {name:8s}: {rf:2d}x{rf:2d} pixels")

print(f"\nThis means:")
print(f"  • Conv1 neurons see {demo_rfs[1]}x{demo_rfs[1]} pixel regions")
print(f"  • Conv2 neurons see {demo_rfs[2]}x{demo_rfs[2]} pixel regions")
print(f"  • Conv3 neurons see {demo_rfs[4]}x{demo_rfs[4]} pixel regions")
print(f"  • Final layer sees {demo_rfs[-1]}x{demo_rfs[-1]} = {demo_rfs[-1]**2} pixels")
print(f"  • On 32x32 input, that's {(demo_rfs[-1]**2)/(32*32):.1%} of the image")

print(f"\n💡 FEATURE MAP AND RECEPTIVE FIELD INSIGHTS:")
print(f"\n1. FEATURE HIERARCHY:")
print(f"   • Early layers: Local features (edges, textures)")
print(f"   • Middle layers: Mid-level features (shapes, patterns)")
print(f"   • Late layers: High-level features (objects, concepts)")

print(f"\n2. RECEPTIVE FIELD GROWTH:")
print(f"   • Convolution: Adds (kernel_size - 1) x accumulated stride to receptive field")
print(f"   • Pooling: Multiplies effective stride for subsequent layers")
print(f"   • Larger kernels: Faster receptive field growth")

print(f"\n3. ARCHITECTURAL DESIGN:")
print(f"   • Want large final receptive field to see whole objects")
print(f"   • But not too large early on (lose local detail)")
print(f"   • Balance: Gradual growth through multiple layers")

print(f"\n4. FEATURE MAP SPARSITY:")
print(f"   • ReLU activation → Many zero values (sparsity)")
print(f"   • Good: Can enable efficient computation and storage")
print(f"   • Indicates selective feature detection")
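
The receptive-field growth rules printed above can be checked by hand. The sketch below is a minimal, standalone version of the usual recurrence: each layer adds `(kernel - 1) * jump` to the receptive field, where `jump` is the product of all earlier strides. It is not the notebook's `calculate_receptive_field`; the name `receptive_field_sketch` and the `(type, kernel, stride)` tuple layout are assumptions made for this illustration. Padding is left out because it does not change the receptive field size.

```python
def receptive_field_sketch(layer_specs):
    """Return the receptive field after each layer and the final jump.

    layer_specs: list of (layer_type, kernel_size, stride) tuples.
    This tuple layout is an assumption made for this illustration.
    """
    rf, jump = 1, 1  # A single input pixel sees only itself
    history = [rf]
    for _layer_type, kernel, stride in layer_specs:
        rf += (kernel - 1) * jump  # Growth is scaled by the accumulated stride
        jump *= stride             # Later layers step further across the input
        history.append(rf)
    return history, jump


demo_specs = [
    ("conv", 3, 1),  # conv1
    ("conv", 3, 1),  # conv2
    ("pool", 2, 2),  # pool1
    ("conv", 3, 1),  # conv3
    ("pool", 2, 2),  # pool2
]
history, jump = receptive_field_sketch(demo_specs)
print(history)  # [1, 3, 5, 6, 10, 12]
print(jump)     # 4
```

Note how conv3 adds 4 pixels rather than 2: it sits after a stride-2 pooling layer, so each step of its kernel covers two input pixels.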

## 4. CNN Architecture Patterns: From LeNet to Modern Networks

**Historical Evolution:**

1. **LeNet-5 (1998)**: One of the first successful CNNs
   - 2 conv layers, 3 fully connected (counting the output layer)
   - Sigmoid/tanh activations
   - Proved CNNs work for digit recognition

2. **AlexNet (2012)**: ImageNet breakthrough
   - 5 conv layers, 3 fully connected
   - ReLU activations, dropout, data augmentation
   - GPU acceleration

3. **VGG (2014)**: Deeper with small filters
   - Only 3x3 convolutions
   - 16-19 layers deep
   - Showed depth matters

4. **ResNet (2015)**: Skip connections
   - Residual learning
   - 50-152 layers
   - Made very deep networks trainable by easing gradient flow through identity shortcuts
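
The VGG point about small filters can be made concrete with a little arithmetic. A stack of `n` 3x3 convolutions with stride 1 covers the same receptive field as a single `(2n+1)x(2n+1)` convolution, but with fewer weights. The sketch below counts weights for equal input and output channel widths and ignores biases; the helper names and the choice of 64 channels are illustrative assumptions.

```python
def conv_weights(kernel, channels_in, channels_out):
    """Weights in one conv layer (biases ignored)."""
    return kernel * kernel * channels_in * channels_out


def stacked_3x3(n_layers, channels):
    """Receptive field and weight count of n stacked 3x3 convs, stride 1."""
    receptive_field = 1 + 2 * n_layers
    weights = n_layers * conv_weights(3, channels, channels)
    return receptive_field, weights


channels = 64  # Illustrative channel width
for n_layers in (2, 3):
    rf, stacked = stacked_3x3(n_layers, channels)
    single = conv_weights(rf, channels, channels)
    print(f"{n_layers} x (3x3): RF {rf}x{rf}, {stacked:,} weights "
          f"vs one {rf}x{rf} conv: {single:,} weights")
```

The stacked version also applies a nonlinearity after every 3x3 layer, so it gains expressive power while using fewer parameters.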

**Key Design Patterns:**
- **Feature extraction**: Conv + Pool layers
- **Classification**: Global pooling + Dense layers
- **Regularization**: Dropout, batch normalization
- **Skip connections**: For very deep networks
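
To see why global pooling is a common choice for the classification head, it helps to count the parameters of the final dense layer under both designs. The sketch below assumes a 4x4x128 final feature map (what three 2x2 poolings leave from a 32x32 input with 128 filters) and 10 classes; the function names are hypothetical and exist only for this comparison.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


def head_params_flatten(height, width, channels, num_classes):
    """Flatten -> Dense(num_classes): every spatial position gets its own weights."""
    return dense_params(height * width * channels, num_classes)


def head_params_global_pool(channels, num_classes):
    """GlobalAveragePooling -> Dense(num_classes): one value per channel."""
    return dense_params(channels, num_classes)


flatten_head = head_params_flatten(4, 4, 128, 10)
gap_head = head_params_global_pool(128, 10)
print(f"Flatten + Dense: {flatten_head:,} parameters")        # 20,490
print(f"GlobalAvgPool + Dense: {gap_head:,} parameters")      # 1,290
```

The gap widens quickly with larger feature maps or wide hidden dense layers, which is where most of the parameters in AlexNet- and VGG-style heads come from.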

In [None]:
# Cell 5: CNN Architecture Patterns Implementation and Analysis

print("=== CNN ARCHITECTURE PATTERNS THROUGH HISTORY ===")

def create_lenet_style(input_shape=(32, 32, 1), num_classes=10):
    """
    LeNet-5 style architecture (adapted for modern frameworks)
    Original: Designed for 32x32 grayscale images (MNIST-style)
    """
    model = models.Sequential([
        # Feature extraction
        layers.Conv2D(6, 5, activation='tanh', input_shape=input_shape, name='conv1'),
        layers.AveragePooling2D(2, name='pool1'),
        layers.Conv2D(16, 5, activation='tanh', name='conv2'),
        layers.AveragePooling2D(2, name='pool2'),
        
        # Classification
        layers.Flatten(),
        layers.Dense(120, activation='tanh', name='fc1'),
        layers.Dense(84, activation='tanh', name='fc2'),
        layers.Dense(num_classes, activation='softmax', name='output')
    ], name='LeNet_Style')
    
    return model

def create_alexnet_style(input_shape=(224, 224, 3), num_classes=10):
    """
    AlexNet style architecture (simplified for smaller inputs)
    Original: Designed for 224x224 color images (ImageNet)
    padding='same' on conv1 and the pooling layers keeps the feature maps
    from shrinking below 1x1 when the input is as small as 32x32
    """
    model = models.Sequential([
        # Feature extraction
        layers.Conv2D(64, 11, strides=4, padding='same', activation='relu', input_shape=input_shape, name='conv1'),
        layers.MaxPooling2D(3, strides=2, padding='same', name='pool1'),
        
        layers.Conv2D(192, 5, padding='same', activation='relu', name='conv2'),
        layers.MaxPooling2D(3, strides=2, padding='same', name='pool2'),
        
        layers.Conv2D(384, 3, padding='same', activation='relu', name='conv3'),
        layers.Conv2D(256, 3, padding='same', activation='relu', name='conv4'),
        layers.Conv2D(256, 3, padding='same', activation='relu', name='conv5'),
        layers.MaxPooling2D(3, strides=2, padding='same', name='pool3'),
        
        # Classification
        layers.Flatten(),
        layers.Dense(4096, activation='relu', name='fc1'),
        layers.Dropout(0.5, name='dropout1'),
        layers.Dense(4096, activation='relu', name='fc2'),
        layers.Dropout(0.5, name='dropout2'),
        layers.Dense(num_classes, activation='softmax', name='output')
    ], name='AlexNet_Style')
    
    return model

def create_vgg_style(input_shape=(224, 224, 3), num_classes=10, depth='11'):
    """
    VGG style architecture (VGG-11 variant)
    Key innovation: Only 3x3 convolutions
    """
    model = models.Sequential(name=f'VGG_{depth}_Style')
    model.add(layers.Input(shape=input_shape))
    
    # VGG-11 configuration: [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']
    # Where 'M' means MaxPooling
    
    # Block 1
    model.add(layers.Conv2D(64, 3, padding='same', activation='relu', name='conv1_1'))
    model.add(layers.MaxPooling2D(2, strides=2, name='pool1'))
    
    # Block 2  
    model.add(layers.Conv2D(128, 3, padding='same', activation='relu', name='conv2_1'))
    model.add(layers.MaxPooling2D(2, strides=2, name='pool2'))
    
    # Block 3
    model.add(layers.Conv2D(256, 3, padding='same', activation='relu', name='conv3_1'))
    model.add(layers.Conv2D(256, 3, padding='same', activation='relu', name='conv3_2'))
    model.add(layers.MaxPooling2D(2, strides=2, name='pool3'))
    
    # Block 4
    model.add(layers.Conv2D(512, 3, padding='same', activation='relu', name='conv4_1'))
    model.add(layers.Conv2D(512, 3, padding='same', activation='relu', name='conv4_2'))
    model.add(layers.MaxPooling2D(2, strides=2, name='pool4'))
    
    # Block 5
    model.add(layers.Conv2D(512, 3, padding='same', activation='relu', name='conv5_1'))
    model.add(layers.Conv2D(512, 3, padding='same', activation='relu', name='conv5_2'))
    model.add(layers.MaxPooling2D(2, strides=2, name='pool5'))
    
    # Classification
    model.add(layers.Flatten())
    model.add(layers.Dense(4096, activation='relu', name='fc1'))
    model.add(layers.Dropout(0.5, name='dropout1'))
    model.add(layers.Dense(4096, activation='relu', name='fc2'))
    model.add(layers.Dropout(0.5, name='dropout2'))
    model.add(layers.Dense(num_classes, activation='softmax', name='output'))
    
    return model

def create_modern_cnn(input_shape=(32, 32, 3), num_classes=10):
    """
    Modern CNN with best practices:
    - Batch normalization
    - Global average pooling
    - Data augmentation layers
    Note: a Sequential model cannot express skip connections, so this
    network has no residual paths; those require the functional API.
    """
    model = models.Sequential([
        # Explicit input so the model is built and count_params() works
        layers.Input(shape=input_shape),
        
        # Data augmentation (built into model)
        layers.RandomFlip('horizontal'),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.1),
        
        # Block 1
        layers.Conv2D(32, 3, padding='same', name='conv1_1'),
        layers.BatchNormalization(name='bn1_1'),
        layers.Activation('relu', name='relu1_1'),
        layers.Conv2D(32, 3, padding='same', name='conv1_2'),
        layers.BatchNormalization(name='bn1_2'),
        layers.Activation('relu', name='relu1_2'),
        layers.MaxPooling2D(2, name='pool1'),
        layers.Dropout(0.25, name='dropout1'),
        
        # Block 2
        layers.Conv2D(64, 3, padding='same', name='conv2_1'),
        layers.BatchNormalization(name='bn2_1'),
        layers.Activation('relu', name='relu2_1'),
        layers.Conv2D(64, 3, padding='same', name='conv2_2'),
        layers.BatchNormalization(name='bn2_2'),
        layers.Activation('relu', name='relu2_2'),
        layers.MaxPooling2D(2, name='pool2'),
        layers.Dropout(0.25, name='dropout2'),
        
        # Block 3
        layers.Conv2D(128, 3, padding='same', name='conv3_1'),
        layers.BatchNormalization(name='bn3_1'),
        layers.Activation('relu', name='relu3_1'),
        layers.Conv2D(128, 3, padding='same', name='conv3_2'),
        layers.BatchNormalization(name='bn3_2'),
        layers.Activation('relu', name='relu3_2'),
        layers.MaxPooling2D(2, name='pool3'),
        layers.Dropout(0.25, name='dropout3'),
        
        # Global pooling instead of flatten + dense
        layers.GlobalAveragePooling2D(name='global_pool'),
        layers.Dropout(0.5, name='dropout_final'),
        layers.Dense(num_classes, activation='softmax', name='output')
    ], name='Modern_CNN')
    
    return model

# Create all architectures for comparison
print(f"\n🏗️ CREATING HISTORICAL CNN ARCHITECTURES:")

# Adjust input shapes for fair comparison (all using CIFAR-10 size)
input_shape_cifar = (32, 32, 3)
num_classes = 10

architectures_historical = {
    'LeNet-Style': create_lenet_style(input_shape_cifar, num_classes),
    'AlexNet-Style': create_alexnet_style(input_shape_cifar, num_classes), 
    'VGG-Style': create_vgg_style(input_shape_cifar, num_classes),
    'Modern-CNN': create_modern_cnn(input_shape_cifar, num_classes)
}

# Define the lookup first so the analysis loop below can call it
def get_architecture_innovations(name):
    """Get key innovations for each architecture"""
    innovations = {
        'LeNet-Style': 'Early successful CNN, proved the concept works',
        'AlexNet-Style': 'ReLU activation, dropout, data augmentation, GPU training',
        'VGG-Style': 'Small 3x3 filters only, very deep networks',
        'Modern-CNN': 'Batch normalization, global pooling, built-in augmentation'
    }
    return innovations.get(name, 'Unknown innovations')

# Analyze each architecture
print(f"\n📊 ARCHITECTURE COMPARISON:")
analysis_data = []

for name, model in architectures_historical.items():
    # Count different types of layers
    conv_layers = sum(1 for layer in model.layers if isinstance(layer, layers.Conv2D))
    pool_layers = sum(1 for layer in model.layers if isinstance(layer, (layers.MaxPooling2D, layers.AveragePooling2D, layers.GlobalAveragePooling2D)))
    dense_layers = sum(1 for layer in model.layers if isinstance(layer, layers.Dense))
    total_params = model.count_params()
    
    analysis_data.append({
        'name': name,
        'conv_layers': conv_layers,
        'pool_layers': pool_layers, 
        'dense_layers': dense_layers,
        'total_layers': len(model.layers),
        'total_params': total_params
    })
    
    print(f"\n{name.upper()}:")
    print(f"  Total layers: {len(model.layers)}")
    print(f"  Conv layers: {conv_layers}")
    print(f"  Pooling layers: {pool_layers}")
    print(f"  Dense layers: {dense_layers}")
    print(f"  Total parameters: {total_params:,}")
    print(f"  Key innovations: {get_architecture_innovations(name)}")

# Create comprehensive visualization
fig, axes = plt.subplots(3, 2, figsize=(16, 18))

# Plot 1: Parameter count comparison
names = [data['name'] for data in analysis_data]
param_counts = [data['total_params'] for data in analysis_data]

bars1 = axes[0, 0].bar(names, param_counts, color=['lightblue', 'lightgreen', 'lightcoral', 'gold'])
axes[0, 0].set_title('Total Parameters by Architecture')
axes[0, 0].set_ylabel('Number of Parameters')
axes[0, 0].tick_params(axis='x', rotation=45)
axes[0, 0].set_yscale('log')

# Add value labels
for bar, count in zip(bars1, param_counts):
    axes[0, 0].text(bar.get_x() + bar.get_width()/2, bar.get_height(),
                   f'{count:,}', ha='center', va='bottom', rotation=45, fontsize=8)

# Plot 2: Layer composition
layer_types = ['Conv Layers', 'Pool Layers', 'Dense Layers']
x = np.arange(len(names))
width = 0.25

conv_counts = [data['conv_layers'] for data in analysis_data]
pool_counts = [data['pool_layers'] for data in analysis_data]
dense_counts = [data['dense_layers'] for data in analysis_data]

axes[0, 1].bar(x - width, conv_counts, width, label='Conv Layers', color='skyblue')
axes[0, 1].bar(x, pool_counts, width, label='Pool Layers', color='lightgreen')
axes[0, 1].bar(x + width, dense_counts, width, label='Dense Layers', color='lightcoral')

axes[0, 1].set_title('Layer Composition by Architecture')
axes[0, 1].set_xlabel('Architecture')
axes[0, 1].set_ylabel('Number of Layers')
axes[0, 1].set_xticks(x)
axes[0, 1].set_xticklabels(names, rotation=45)
axes[0, 1].legend()

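# Plot 3: Architecture evolution timeline
# NOTE: illustrative, approximate figures measured on different datasets and
# metrics (e.g. MNIST for LeNet, ImageNet top-5 for AlexNet and VGG), so the
# points are not directly comparable with each other
years = [1998, 2012, 2014, 2020]  # Approximate years
accuracies = [99.2, 84.7, 92.7, 95.0]  # Approximate accuracies on their respective datasets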

axes[1, 0].plot(years, accuracies, 'o-', linewidth=3, markersize=8, color='darkblue')
axes[1, 0].set_title('CNN Evolution: Performance Over Time')
axes[1, 0].set_xlabel('Year')
axes[1, 0].set_ylabel('Approximate Accuracy (%)')
axes[1, 0].grid(True, alpha=0.3)

# Add architecture labels
arch_names_short = ['LeNet', 'AlexNet', 'VGG', 'Modern']
for year, acc, name in zip(years, accuracies, arch_names_short):
    axes[1, 0].annotate(name, (year, acc), xytext=(5, 5), 
                       textcoords='offset points', fontsize=10)

# Plot 4: Detailed architecture comparison table
axes[1, 1].axis('off')
table_data = []
for data in analysis_data:
    table_data.append([
        data['name'],
        str(data['conv_layers']),
        str(data['pool_layers']),
        str(data['dense_layers']),
        f"{data['total_params']:,}"
    ])

table = axes[1, 1].table(
    cellText=table_data,
    colLabels=['Architecture', 'Conv', 'Pool', 'Dense', 'Parameters'],
    cellLoc='center',
    loc='center'
)
table.auto_set_font_size(False)
table.set_fontsize(9)
table.scale(1.2, 1.5)
axes[1, 1].set_title('Architecture Comparison Table')

# Plot 5: Show detailed layer structure for one architecture
modern_cnn = architectures_historical['Modern-CNN']
layer_info = []
current_shape = input_shape_cifar

for i, layer in enumerate(modern_cnn.layers[:15]):  # Show first 15 layers
    layer_type = type(layer).__name__
    # Simplified shape tracking: valid here because every conv uses
    # padding='same' with stride 1, so only pooling changes height and width
    if isinstance(layer, layers.Conv2D):
        current_shape = (*current_shape[:2], layer.filters)
    elif isinstance(layer, layers.MaxPooling2D):
        pool_h, pool_w = layer.pool_size
        current_shape = (current_shape[0] // pool_h, current_shape[1] // pool_w, current_shape[2])
    
    # Unbuilt layers have no weights to count yet
    params = layer.count_params() if layer.built else 0
    layer_info.append((i+1, layer_type, str(current_shape), f"{params:,}"))

# Create layer structure visualization
axes[2, 0].axis('off')
layer_table = axes[2, 0].table(
    cellText=layer_info,
    colLabels=['#', 'Layer Type', 'Output Shape', 'Parameters'],
    cellLoc='center',
    loc='center'
)
layer_table.auto_set_font_size(False)
layer_table.set_fontsize(8)
layer_table.scale(1.2, 1.8)
axes[2, 0].set_title('Modern CNN Layer Structure (First 15 Layers)')

# Plot 6: Key insights and evolution
axes[2, 1].axis('off')
evolution_text = """
CNN EVOLUTION KEY INSIGHTS:

LeNet (1998):
• Early successful CNN
• Sigmoid/Tanh activations
• Average pooling
• Small datasets (MNIST)

AlexNet (2012):
• ReLU breakthrough
• Dropout regularization
• Data augmentation
• GPU acceleration

VGG (2014):
• Small 3x3 filters only
• Very deep (16-19 layers)
• Systematic architecture
• Showed depth importance

Modern CNNs:
• Batch normalization
• Skip connections (ResNet)
• Global pooling
• Efficient architectures
"""

axes[2, 1].text(0.05, 0.95, evolution_text, transform=axes[2, 1].transAxes,
               verticalalignment='top', fontsize=9, fontfamily='monospace')

plt.tight_layout()
plt.show()

# Calculate memory requirements
print(f"\n💾 MEMORY ANALYSIS:")
for name, model in architectures_historical.items():
    # Estimate memory for batch size 32
    batch_size = 32
    
    # Parameters memory (float32)
    param_memory_mb = model.count_params() * 4 / (1024 * 1024)
    
    # Activation memory (approximate, for batch_size)
    # Sums the outputs of conv and pooling layers only, using each layer's
    # real output shape so strides and 'valid' padding are accounted for
    activation_elements = 0
    
    for layer in model.layers:
        if isinstance(layer, (layers.Conv2D, layers.MaxPooling2D, layers.AveragePooling2D)):
            output_shape = tuple(layer.output.shape)[1:]  # Drop the batch axis
            activation_elements += int(np.prod(output_shape))
    
    activation_memory_mb = activation_elements * batch_size * 4 / (1024 * 1024)
    total_memory_mb = param_memory_mb + activation_memory_mb
    
    print(f"\n{name.upper()}:")
    print(f"  Parameters: {param_memory_mb:.1f} MB")
    print(f"  Activations (batch={batch_size}): {activation_memory_mb:.1f} MB")
    print(f"  Total memory: {total_memory_mb:.1f} MB")
    
    # Performance characteristics
    if 'LeNet' in name:
        print(f"  Best for: Small datasets, simple patterns")
    elif 'AlexNet' in name:
        print(f"  Best for: Medium datasets, breakthrough performance")
    elif 'VGG' in name:
        print(f"  Best for: High accuracy, systematic design")
    elif 'Modern' in name:
        print(f"  Best for: Production use, efficient training")

print(f"\n💡 ARCHITECTURE DESIGN INSIGHTS:")
print(f"\n1. HISTORICAL PROGRESSION:")
print(f"   • LeNet → AlexNet: ReLU + Dropout + Scale")
print(f"   • AlexNet → VGG: Systematic design + Depth")
print(f"   • VGG → ResNet: Skip connections for very deep networks")
print(f"   • Modern: Efficiency + Best practices")

print(f"\n2. KEY INNOVATIONS IMPACT:")
print(f"   • ReLU: Mitigated vanishing gradients, faster training")
print(f"   • Dropout: Reduced overfitting, better generalization")
print(f"   • Batch Norm: Stable training, higher learning rates")
print(f"   • Skip connections: Ultra-deep networks possible")

print(f"\n3. MODERN DESIGN PRINCIPLES:")
print(f"   • Start with proven architectures")
print(f"   • Use batch normalization after conv layers")
print(f"   • Prefer global pooling over large dense layers")
print(f"   • Add data augmentation for robustness")
print(f"   • Monitor parameter count vs. performance")