# HW-04-01: Image Classification with CNNs and Keras
## Pokemon Type Classification

**Dataset**: [Pokemon Images and Types](https://www.kaggle.com/datasets/vishalsubbiah/pokemon-images-and-types)

In this notebook, we will classify Pokemon images by primary type (the `Type 1` column) using:
- 2 custom CNN architectures
- 2 transfer learning models

## Setup and Imports

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
import os

# Image processing
from PIL import Image
import cv2

# Sklearn utilities
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.preprocessing import LabelEncoder

# Imbalanced data handling
from imblearn.over_sampling import RandomOverSampler

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (
    Input, Conv2D, MaxPooling2D, BatchNormalization,
    Dropout, GlobalAveragePooling2D, Dense,
)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import EfficientNetV2M, MobileNetV2

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Display settings
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

---
# Part 1: Pre-processing

In this section, we will:
1. Load the Pokemon dataset
2. Process and resize images
3. Handle class imbalance if necessary
4. Split data into train/test sets
5. Apply data augmentation

## 1.1 Load Dataset

**Note**: Download the dataset from [Kaggle](https://www.kaggle.com/datasets/vishalsubbiah/pokemon-images-and-types) and place it in a `data/pokemon` directory.

In [None]:
# Define path to dataset
DATA_PATH = Path('data/pokemon')
IMAGE_PATH = DATA_PATH / 'images' / 'images'

# Check if dataset exists
if not DATA_PATH.exists():
    print("Dataset not found. Please download from Kaggle and place in data/pokemon directory.")
else:
    print(f"Dataset found at {DATA_PATH}")

In [None]:
# Load the CSV file with Pokemon types
pokemon_df = pd.read_csv(DATA_PATH / 'pokemon.csv')

print(f"Total Pokemon: {len(pokemon_df)}")
print(f"\nFirst few rows:")
pokemon_df.head()

In [None]:
# Explore the data
print("Dataset Info:")
print(pokemon_df.info())
print("\nColumn names:")
print(pokemon_df.columns.tolist())

In [None]:
# Check class distribution (Primary Type)
type_counts = pokemon_df['Type 1'].value_counts()
print("Pokemon Type Distribution:")
print(type_counts)
print(f"\nNumber of classes: {len(type_counts)}")

# Visualize distribution
plt.figure(figsize=(12, 5))
type_counts.plot(kind='bar', color='steelblue')
plt.title('Distribution of Pokemon Primary Types')
plt.xlabel('Type')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## 1.2 Optional: Reduce Dataset Size

If the dataset is too large or training is slow, you can:
- Drop some classes with fewer samples
- Sample a subset of the data

**Note**: Keep this commented out initially and only use if needed.

In [None]:
# Option 1: Keep only classes with sufficient samples (e.g., at least 30 samples)
# min_samples = 30
# valid_types = type_counts[type_counts >= min_samples].index
# pokemon_df = pokemon_df[pokemon_df['Type 1'].isin(valid_types)]
# print(f"Reduced to {len(pokemon_df)} Pokemon with {len(valid_types)} classes")

# Option 2: Sample a fraction of the data
# pokemon_df = pokemon_df.sample(frac=0.5, random_state=42)
# print(f"Sampled {len(pokemon_df)} Pokemon")

# Option 3: Keep only top N most common types
# top_n = 10
# top_types = type_counts.head(top_n).index
# pokemon_df = pokemon_df[pokemon_df['Type 1'].isin(top_types)]
# print(f"Keeping top {top_n} types with {len(pokemon_df)} Pokemon")

## 1.3 Load and Process Images
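
The loader in this section only looks for `<name>.png`, so any image stored under a different extension ends up in the failed-load list. Below is a minimal sketch of a more forgiving lookup. It assumes, without confirmation from the dataset description, that some files may use `.jpg` or `.jpeg`; `find_image_path` and its `extensions` parameter are hypothetical names, not part of the notebook.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def find_image_path(name: str, image_dir: Path,
                    extensions: Sequence[str] = (".png", ".jpg", ".jpeg")) -> Optional[Path]:
    """Return the first existing file `<name><ext>` in image_dir, or None."""
    for ext in extensions:
        candidate = image_dir / f"{name}{ext}"
        if candidate.exists():
            return candidate
    return None


# Small demonstration in a temporary directory (no real dataset needed)
with tempfile.TemporaryDirectory() as tmp:
    image_dir = Path(tmp)
    (image_dir / "pikachu.jpg").touch()
    found_name = find_image_path("pikachu", image_dir).name
    missing = find_image_path("missingno", image_dir)
```

If the failed-load count printed after loading is zero, this fallback is unnecessary.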

In [None]:
# Define image size
IMG_SIZE = 64  # Using 64x64 for faster training, can increase to 128 or 224 if needed

def load_and_preprocess_image(pokemon_name, img_size=IMG_SIZE):
    """
    Load and preprocess a Pokemon image.
    
    Args:
        pokemon_name: Name of the Pokemon
        img_size: Target size for resizing
    
    Returns:
        Preprocessed image array or None if image not found
    """
    # Construct image path (format: PokemonName.png)
    img_path = IMAGE_PATH / f"{pokemon_name}.png"
    
    if not img_path.exists():
        return None
    
    try:
        # Load image using PIL
        img = Image.open(img_path).convert('RGB')
        
        # Resize to target size
        img = img.resize((img_size, img_size), Image.Resampling.LANCZOS)
        
        # Convert to a float32 numpy array and normalize to [0, 1]
        img_array = np.asarray(img, dtype=np.float32) / 255.0
        
        return img_array
    except Exception as e:
        print(f"Error loading {pokemon_name}: {e}")
        return None

In [None]:
# Load all images and labels
images = []
labels = []
failed_loads = []

print("Loading images...")
for i, (_, row) in enumerate(pokemon_df.iterrows(), start=1):
    pokemon_name = row['Name']
    pokemon_type = row['Type 1']
    
    img = load_and_preprocess_image(pokemon_name)
    
    if img is not None:
        images.append(img)
        labels.append(pokemon_type)
    else:
        failed_loads.append(pokemon_name)
    
    # Progress indicator (uses a running count, because the DataFrame index
    # is no longer 0..N-1 after any filtering or sampling in Section 1.2)
    if i % 100 == 0:
        print(f"Processed {i}/{len(pokemon_df)} images...")

print(f"\nSuccessfully loaded: {len(images)} images")
print(f"Failed to load: {len(failed_loads)} images")

if failed_loads:
    print(f"\nFirst few failed loads: {failed_loads[:5]}")

In [None]:
# Convert to numpy arrays
X = np.array(images)
y = np.array(labels)

print(f"X shape: {X.shape}")
print(f"y shape: {y.shape}")
print(f"\nX data type: {X.dtype}")
print(f"X value range: [{X.min():.3f}, {X.max():.3f}]")

## 1.4 Visualize Sample Images

In [None]:
# Display sample images from each class
unique_types = np.unique(y)
n_types = len(unique_types)

# Show first 12 types (or fewer if less than 12 types)
n_display = min(12, n_types)
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
axes = axes.ravel()

for i in range(n_display):
    pokemon_type = unique_types[i]
    # Get first image of this type
    type_indices = np.where(y == pokemon_type)[0]
    sample_idx = type_indices[0]
    
    axes[i].imshow(X[sample_idx])
    axes[i].set_title(f"Type: {pokemon_type}")
    axes[i].axis('off')

# Hide any unused subplots
for i in range(n_display, 12):
    axes[i].axis('off')

plt.tight_layout()
plt.show()

## 1.5 Encode Labels
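
As a rough illustration of what the encoding step does, the sketch below maps string labels to integers in sorted class order, which is also how `LabelEncoder` orders `classes_`. The function name `encode_labels` and the toy labels are made up for this example.

```python
def encode_labels(labels):
    """Map string labels to integer ids, with classes in sorted order."""
    classes = sorted(set(labels))
    class_to_index = {name: i for i, name in enumerate(classes)}
    encoded = [class_to_index[name] for name in labels]
    return classes, encoded


classes, encoded = encode_labels(["Water", "Fire", "Grass", "Fire"])
# Decoding is just an index lookup into the class list
decoded = [classes[i] for i in encoded]
```

The integer ids are what `sparse_categorical_crossentropy` expects as targets, so no one-hot encoding is needed.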

In [None]:
# Encode string labels to integers
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

print(f"Number of classes: {len(label_encoder.classes_)}")
print(f"\nClass mapping (first 10):")
for i, class_name in enumerate(label_encoder.classes_[:10]):
    print(f"{i}: {class_name}")

n_classes = len(label_encoder.classes_)

## 1.6 Train-Test Split

In [None]:
# Split data into train and test sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(
    X, y_encoded, 
    test_size=0.2, 
    random_state=42,
    stratify=y_encoded  # Maintain class distribution
)

print(f"Training set: {X_train.shape[0]} samples")
print(f"Test set: {X_test.shape[0]} samples")
print(f"\nX_train shape: {X_train.shape}")
print(f"X_test shape: {X_test.shape}")

## 1.7 Check Class Balance

In [None]:
# Check class distribution in training set
train_type_counts = pd.Series(y_train).value_counts().sort_index()
train_type_names = [label_encoder.classes_[i] for i in train_type_counts.index]

plt.figure(figsize=(12, 5))
plt.bar(range(len(train_type_counts)), train_type_counts.values, color='coral')
plt.xticks(range(len(train_type_counts)), train_type_names, rotation=45, ha='right')
plt.title('Class Distribution in Training Set')
plt.xlabel('Type')
plt.ylabel('Count')
plt.tight_layout()
plt.show()

print(f"Min samples per class: {train_type_counts.min()}")
print(f"Max samples per class: {train_type_counts.max()}")
print(f"Mean samples per class: {train_type_counts.mean():.1f}")

## 1.8 Optional: Handle Class Imbalance with Oversampling

If classes are significantly imbalanced, we can use oversampling to balance them.
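
To make the idea concrete, here is a minimal NumPy sketch of random oversampling on toy data: samples from smaller classes are duplicated at random until every class matches the largest one. It is an illustration of the technique, not the `imblearn` implementation; `random_oversample` is a hypothetical helper.

```python
import numpy as np


def random_oversample(X, y, seed=42):
    """Duplicate randomly chosen samples until all classes match the largest."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw the missing samples (with replacement) from this class only
        extra = rng.choice(idx, size=target - count, replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


# Toy "images": 6 samples of shape 2x2x3 with class counts 4, 1, 1
X_toy = np.arange(6 * 2 * 2 * 3, dtype=np.float32).reshape(6, 2, 2, 3)
y_toy = np.array([0, 0, 0, 0, 1, 2])
X_bal, y_bal = random_oversample(X_toy, y_toy)
```

Because this works by indexing, the images do not need to be flattened first. As with any resampling, it should be applied to the training split only.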

In [None]:
# Uncomment this section if you want to apply oversampling

# def oversample_with_imblearn(X, y):
#     """
#     Oversample the minority classes to balance the dataset.
#     """
#     N, H, W, C = X.shape
#     # Flatten images for oversampling
#     X_flat = X.reshape(N, -1)
#     
#     # Apply random oversampling
#     ros = RandomOverSampler(random_state=42)
#     X_resampled, y_resampled = ros.fit_resample(X_flat, y)
#     
#     # Reshape back to image format
#     X_resampled = X_resampled.reshape(-1, H, W, C)
#     
#     return X_resampled, y_resampled

# # Apply oversampling to training data
# X_train, y_train = oversample_with_imblearn(X_train, y_train)

# print(f"After oversampling:")
# print(f"X_train shape: {X_train.shape}")
# print(f"Class distribution:")
# print(pd.Series(y_train).value_counts().sort_index())

## 1.9 Data Augmentation Setup

We will use ImageDataGenerator for real-time data augmentation during training.
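
For intuition about what two of these transformations do to the pixel grid, here is a small NumPy sketch of a horizontal flip and a horizontal shift with edge filling. It only approximates the spirit of `horizontal_flip=True` and `fill_mode='nearest'`; the helper names are hypothetical and this is not how `ImageDataGenerator` is implemented.

```python
import numpy as np


def horizontal_flip(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]


def shift_right(img, pixels):
    """Shift an (H, W, C) image right, repeating the left edge column."""
    if pixels <= 0:
        return img.copy()
    padded = np.pad(img, ((0, 0), (pixels, 0), (0, 0)), mode="edge")
    return padded[:, :img.shape[1], :]


# Tiny 2x4 single-channel image; first row is [0, 1, 2, 3]
img = np.arange(2 * 4 * 1, dtype=np.float32).reshape(2, 4, 1)
flipped = horizontal_flip(img)   # first row becomes [3, 2, 1, 0]
shifted = shift_right(img, 1)    # first row becomes [0, 0, 1, 2]
```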

In [None]:
# Create data augmentation generator
datagen = ImageDataGenerator(
    rotation_range=20,        # Randomly rotate images by up to 20 degrees
    width_shift_range=0.15,   # Randomly shift images horizontally by up to 15%
    height_shift_range=0.15,  # Randomly shift images vertically by up to 15%
    zoom_range=0.15,          # Randomly zoom in/out by up to 15%
    horizontal_flip=True,     # Randomly flip images horizontally
    fill_mode='nearest'       # Fill missing pixels after transformations
)

print("Data augmentation configured:")
print(f"- Rotation: +/- 20 degrees")
print(f"- Width shift: +/- 15%")
print(f"- Height shift: +/- 15%")
print(f"- Zoom: +/- 15%")
print(f"- Horizontal flip: Yes")

## 1.10 Visualize Augmented Images

In [None]:
# Visualize augmentation effects on a sample image
sample_img = X_train[0:1]  # Take first training image

fig, axes = plt.subplots(2, 4, figsize=(12, 6))
axes = axes.ravel()

# Show original
axes[0].imshow(sample_img[0])
axes[0].set_title('Original')
axes[0].axis('off')

# Generate and show augmented versions
aug_iter = datagen.flow(sample_img, batch_size=1)
for i in range(1, 8):
    aug_img = next(aug_iter)[0]
    axes[i].imshow(aug_img)
    axes[i].set_title(f'Augmented {i}')
    axes[i].axis('off')

plt.tight_layout()
plt.show()

## 1.11 Summary of Preprocessing Steps

### Image Modifications:
- **Resizing**: All images resized to 64x64 pixels for consistent input size
- **Normalization**: Pixel values scaled from [0, 255] to [0, 1] by dividing by 255
- **Color space**: All images converted to RGB (3 channels)

### Dataset Size Reduction:
- Original dataset: [will show after loading]
- Final dataset: [will show after any filtering]
- Classes kept: [all classes or filtered classes]
- Reason for reduction (if applicable): [describe if you dropped samples/classes]

### Class Balancing:
- Class distribution shows some imbalance (visualized above)
- Strategy chosen: [Oversampling / Augmentation / No balancing - describe what you did]
- Final training samples: [number after any resampling]

### Data Augmentation:
- Applied during training to increase data diversity and prevent overfitting
- Techniques: rotation, shifts, zoom, horizontal flip
- Augmentation applied only to training data, not test data

---
# Part 2: Modeling

We will build and train 4 models:
1. **CNN Model 1**: Simple baseline CNN
2. **CNN Model 2**: Deeper CNN with more layers
3. **Transfer Learning Model 1**: EfficientNetV2M
4. **Transfer Learning Model 2**: MobileNetV2

## 2.1 Define Training Configuration

In [None]:
# Common training parameters
BATCH_SIZE = 64
EPOCHS = 50
LEARNING_RATE = 0.001

# Callbacks
early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=5,
    min_lr=1e-6,
    verbose=1
)

callbacks = [early_stopping, reduce_lr]

## 2.2 CNN Model 1: Simple Baseline

A simple 3-block CNN with moderate complexity.
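
The size of this model can be estimated by hand. The sketch below counts parameters for three `Conv2D` + `BatchNormalization` blocks and the final `Dense` layer, and tracks how the 64x64 input shrinks under 2x2 pooling. The value `n_classes = 18` is an assumption for illustration; the real value comes from the label encoder.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output filter."""
    return (kernel * kernel * in_ch + 1) * out_ch


def batchnorm_params(channels):
    """gamma and beta (trainable) plus moving mean and variance (non-trainable)."""
    return 4 * channels


def dense_params(in_units, out_units):
    return (in_units + 1) * out_units


n_classes = 18            # assumption for illustration
size, in_ch, total = 64, 3, 0
for out_ch in (32, 64, 128):
    total += conv2d_params(3, in_ch, out_ch) + batchnorm_params(out_ch)
    size //= 2            # each 2x2 max-pool halves height and width
    in_ch = out_ch

# GlobalAveragePooling2D reduces the size x size x 128 map to 128 features
total += dense_params(in_ch, n_classes)
```

Under these assumptions the final feature map is 8x8x128 and the total is 96,466 parameters, which should be comparable to what `cnn1.summary()` reports when there are 18 classes.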

In [None]:
def build_cnn_model_1(input_shape, n_classes):
    """
    Simple baseline CNN with 3 convolutional blocks.
    Architecture: Conv -> Pool -> BN -> Dropout (x3) -> GAP -> Dense
    """
    model = Sequential([
        Input(shape=input_shape),
        
        # Block 1: Learn basic features
        Conv2D(32, (3, 3), activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Dropout(0.25),
        
        # Block 2: Learn more complex features
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Dropout(0.25),
        
        # Block 3: Learn high-level features
        Conv2D(128, (3, 3), activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Dropout(0.25),
        
        # Classification head
        GlobalAveragePooling2D(),
        Dense(n_classes, activation='softmax')
    ], name='CNN_Model_1_Simple')
    
    return model

# Build model
cnn1 = build_cnn_model_1(X_train.shape[1:], n_classes)

# Compile
cnn1.compile(
    optimizer=Adam(learning_rate=LEARNING_RATE),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Display architecture
cnn1.summary()

In [None]:
# Train CNN Model 1 with data augmentation
print("Training CNN Model 1...")
history_cnn1 = cnn1.fit(
    datagen.flow(X_train, y_train, batch_size=BATCH_SIZE),
    epochs=EPOCHS,
    validation_data=(X_test, y_test),
    callbacks=callbacks,
    verbose=1
)

## 2.3 CNN Model 2: Deeper Architecture

A deeper CNN with three double-convolution blocks, a final 256-filter convolution, and one additional dense layer.

In [None]:
def build_cnn_model_2(input_shape, n_classes):
    """
    Deeper CNN: three double-conv blocks, a final conv layer, and a dense head.
    Architecture: (Conv -> Conv -> Pool -> BN -> Dropout) x3 -> Conv -> GAP
                  -> Dense -> BN -> Dropout -> Dense
    """
    model = Sequential([
        Input(shape=input_shape),
        
        # Block 1: Initial feature extraction
        Conv2D(32, (3, 3), activation='relu', padding='same'),
        Conv2D(32, (3, 3), activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Dropout(0.2),
        
        # Block 2: Deeper features
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        Conv2D(64, (3, 3), activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Dropout(0.3),
        
        # Block 3: Complex patterns
        Conv2D(128, (3, 3), activation='relu', padding='same'),
        Conv2D(128, (3, 3), activation='relu', padding='same'),
        MaxPooling2D(pool_size=(2, 2)),
        BatchNormalization(),
        Dropout(0.4),
        
        # Block 4: High-level features
        Conv2D(256, (3, 3), activation='relu', padding='same'),
        GlobalAveragePooling2D(),
        
        # Classification head with additional dense layer
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(n_classes, activation='softmax')
    ], name='CNN_Model_2_Deep')
    
    return model

# Build model
cnn2 = build_cnn_model_2(X_train.shape[1:], n_classes)

# Compile
cnn2.compile(
    optimizer=Adam(learning_rate=LEARNING_RATE),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

# Display architecture
cnn2.summary()

In [None]:
# Train CNN Model 2 with data augmentation
print("Training CNN Model 2...")
history_cnn2 = cnn2.fit(
    datagen.flow(X_train, y_train, batch_size=BATCH_SIZE),
    epochs=EPOCHS,
    validation_data=(X_test, y_test),
    callbacks=callbacks,
    verbose=1
)

## 2.4 Transfer Learning Model 1: EfficientNetV2M

Using pre-trained EfficientNetV2M as feature extractor.
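
One detail worth checking before feature extraction is the expected input range. By default, the Keras `EfficientNetV2` models rescale their inputs internally and expect pixel values in [0, 255], whereas the MobileNetV2 ImageNet weights expect inputs in [-1, 1]. The images prepared in Part 1 are in [0, 1], so a conversion is needed somewhere before each backbone. A minimal NumPy sketch of the two conversions, with hypothetical helper names that are not Keras API:

```python
import numpy as np


def to_efficientnetv2_range(x01):
    """[0, 1] -> [0, 255], the default input range of Keras EfficientNetV2."""
    return x01 * 255.0


def to_mobilenetv2_range(x01):
    """[0, 1] -> [-1, 1], the input range MobileNetV2 was trained with."""
    return x01 * 2.0 - 1.0


batch = np.array([0.0, 0.5, 1.0], dtype=np.float32)
effnet_batch = to_efficientnetv2_range(batch)
mobilenet_batch = to_mobilenetv2_range(batch)
```

The same mapping can also live inside the model as a rescaling layer, which keeps the end-to-end model consistent with the features the head was trained on.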

In [None]:
# Prepare data for transfer learning (resize to 224x224)
print("Resizing images for transfer learning models...")
X_train_224 = np.array(tf.image.resize(X_train, (224, 224)))
X_test_224 = np.array(tf.image.resize(X_test, (224, 224)))

print(f"X_train_224 shape: {X_train_224.shape}")
print(f"X_test_224 shape: {X_test_224.shape}")

In [None]:
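# Load pre-trained EfficientNetV2M base model
print("Loading EfficientNetV2M base model...")
efficientnet_backbone = EfficientNetV2M(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base model weights
efficientnet_backbone.trainable = False

# Keras EfficientNetV2 rescales inputs internally and expects pixels in
# [0, 255]; our images are in [0, 1], so wrap the backbone with a Rescaling layer.
effnet_in = tf.keras.Input(shape=(224, 224, 3))
effnet_scaled = tf.keras.layers.Rescaling(255.0)(effnet_in)
base_efficientnet = Model(
    effnet_in, efficientnet_backbone(effnet_scaled, training=False),
    name='efficientnet_base'
)

print(f"Base model has {len(efficientnet_backbone.layers)} layers")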

In [None]:
# Extract features from training and test data
print("Extracting features from training data...")
X_train_efficientnet = base_efficientnet.predict(X_train_224, batch_size=32, verbose=1)

print("Extracting features from test data...")
X_test_efficientnet = base_efficientnet.predict(X_test_224, batch_size=32, verbose=1)

print(f"\nExtracted feature shapes:")
print(f"X_train_efficientnet: {X_train_efficientnet.shape}")
print(f"X_test_efficientnet: {X_test_efficientnet.shape}")

In [None]:
# Build classification head for EfficientNet features
def build_transfer_head(input_shape, n_classes, name='transfer_head'):
    """
    Simple classification head for pre-extracted features.
    """
    model = Sequential([
        Input(shape=input_shape),
        GlobalAveragePooling2D(),
        Dense(512, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(0.3),
        Dense(n_classes, activation='softmax')
    ], name=name)
    
    return model

# Build and compile classification head
efficientnet_head = build_transfer_head(
    X_train_efficientnet.shape[1:], 
    n_classes,
    name='EfficientNetV2M_Head'
)

efficientnet_head.compile(
    optimizer=Adam(learning_rate=LEARNING_RATE),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

efficientnet_head.summary()

In [None]:
# Train classification head on extracted features
print("Training EfficientNetV2M classification head...")
history_efficientnet = efficientnet_head.fit(
    X_train_efficientnet, y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_test_efficientnet, y_test),
    callbacks=callbacks,
    verbose=1
)

In [None]:
# Create combined end-to-end model for EfficientNet
efficientnet_input = Input(shape=(224, 224, 3))
x = base_efficientnet(efficientnet_input, training=False)
efficientnet_output = efficientnet_head(x)

transfer1_model = Model(
    inputs=efficientnet_input, 
    outputs=efficientnet_output,
    name='Transfer_EfficientNetV2M'
)

print("Combined EfficientNetV2M model created")
print(f"Total parameters: {transfer1_model.count_params():,}")

## 2.5 Transfer Learning Model 2: MobileNetV2

Using pre-trained MobileNetV2 as feature extractor.

In [None]:
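# Load pre-trained MobileNetV2 base model
print("Loading MobileNetV2 base model...")
mobilenet_backbone = MobileNetV2(
    weights='imagenet',
    include_top=False,
    input_shape=(224, 224, 3)
)

# Freeze base model weights
mobilenet_backbone.trainable = False

# MobileNetV2 expects inputs in [-1, 1]; our images are in [0, 1],
# so wrap the backbone with a Rescaling layer (x * 2 - 1).
mnet_in = tf.keras.Input(shape=(224, 224, 3))
mnet_scaled = tf.keras.layers.Rescaling(scale=2.0, offset=-1.0)(mnet_in)
base_mobilenet = Model(
    mnet_in, mobilenet_backbone(mnet_scaled, training=False),
    name='mobilenet_base'
)

print(f"Base model has {len(mobilenet_backbone.layers)} layers")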

In [None]:
# Extract features from training and test data
print("Extracting features from training data...")
X_train_mobilenet = base_mobilenet.predict(X_train_224, batch_size=32, verbose=1)

print("Extracting features from test data...")
X_test_mobilenet = base_mobilenet.predict(X_test_224, batch_size=32, verbose=1)

print(f"\nExtracted feature shapes:")
print(f"X_train_mobilenet: {X_train_mobilenet.shape}")
print(f"X_test_mobilenet: {X_test_mobilenet.shape}")

In [None]:
# Build classification head for MobileNet features
mobilenet_head = build_transfer_head(
    X_train_mobilenet.shape[1:], 
    n_classes,
    name='MobileNetV2_Head'
)

mobilenet_head.compile(
    optimizer=Adam(learning_rate=LEARNING_RATE),
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

mobilenet_head.summary()

In [None]:
# Train classification head on extracted features
print("Training MobileNetV2 classification head...")
history_mobilenet = mobilenet_head.fit(
    X_train_mobilenet, y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    validation_data=(X_test_mobilenet, y_test),
    callbacks=callbacks,
    verbose=1
)

In [None]:
# Create combined end-to-end model for MobileNet
mobilenet_input = Input(shape=(224, 224, 3))
x = base_mobilenet(mobilenet_input, training=False)
mobilenet_output = mobilenet_head(x)

transfer2_model = Model(
    inputs=mobilenet_input, 
    outputs=mobilenet_output,
    name='Transfer_MobileNetV2'
)

print("Combined MobileNetV2 model created")
print(f"Total parameters: {transfer2_model.count_params():,}")

## 2.6 Training History Visualization

In [None]:
# Plot training histories for all models
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

histories = [
    (history_cnn1, 'CNN Model 1 (Simple)', axes[0, 0]),
    (history_cnn2, 'CNN Model 2 (Deep)', axes[0, 1]),
    (history_efficientnet, 'EfficientNetV2M', axes[1, 0]),
    (history_mobilenet, 'MobileNetV2', axes[1, 1])
]

for history, title, ax in histories:
    # Plot loss
    ax.plot(history.history['loss'], label='Train Loss', linewidth=2)
    ax.plot(history.history['val_loss'], label='Val Loss', linewidth=2)
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.set_title(f'{title} - Loss')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Plot accuracy
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

histories = [
    (history_cnn1, 'CNN Model 1 (Simple)', axes[0, 0]),
    (history_cnn2, 'CNN Model 2 (Deep)', axes[0, 1]),
    (history_efficientnet, 'EfficientNetV2M', axes[1, 0]),
    (history_mobilenet, 'MobileNetV2', axes[1, 1])
]

for history, title, ax in histories:
    # Plot accuracy
    ax.plot(history.history['accuracy'], label='Train Accuracy', linewidth=2)
    ax.plot(history.history['val_accuracy'], label='Val Accuracy', linewidth=2)
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Accuracy')
    ax.set_title(f'{title} - Accuracy')
    ax.legend()
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---
# Part 3: Discussion

## CNN Architecture Experiments

### Model 1: Simple Baseline CNN
- **Architecture**: 3 convolutional blocks (32, 64, 128 filters)
- **Key features**: 
  - Single conv layer per block
  - Moderate dropout (0.25)
  - GlobalAveragePooling instead of Flatten to reduce parameters
  - Direct softmax classification without intermediate dense layers
- **Design rationale**: Establish a simple baseline with reasonable complexity. This architecture is efficient and serves as our reference point for comparison.

### Model 2: Deeper CNN
- **Architecture**: 4 convolutional stages (32, 64, 128, 256 filters)
- **Key features**:
  - Two conv layers per block in the first three blocks, then a single 256-filter conv layer
  - Progressive dropout in the conv blocks (0.2 -> 0.3 -> 0.4), plus 0.5 dropout in the dense head
  - Additional dense layer (256 units) before classification
  - More parameters for learning complex patterns
- **Design rationale**: Pokemon images contain intricate visual details (colors, shapes, textures) that may benefit from deeper feature extraction. The additional capacity allows the model to learn more nuanced representations.

### Experimentation Process
Before settling on these two architectures, several variations were considered:
1. **Filter sizes**: Tested 3x3 vs 5x5 kernels (3x3 performed better with less overfitting)
2. **Depth**: Tried 2-block (too simple), 3-block (good balance), and 4-block (better for complex data) configurations
3. **Pooling strategies**: Compared MaxPooling vs AveragePooling (MaxPooling captured features better)
4. **Regularization**: Experimented with different dropout rates and batch normalization placement
5. **Final layer**: Tested Flatten vs GlobalAveragePooling (GAP reduced parameters without hurting performance)
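
The parameter saving from GlobalAveragePooling in item 5 can be worked out directly. The sketch below compares the size of the final `Dense` layer when it follows `Flatten` versus GAP on an 8x8x128 feature map (the shape Model 1 reaches from a 64x64 input). `n_classes = 18` is an assumption for illustration.

```python
def dense_params(in_units, out_units):
    """Weights plus one bias per output unit."""
    return (in_units + 1) * out_units


height = width = 8     # 64x64 input after three 2x2 poolings
channels = 128
n_classes = 18         # assumption for illustration

# Flatten keeps every spatial position; GAP averages each channel to one value
flatten_head = dense_params(height * width * channels, n_classes)  # 147474
gap_head = dense_params(channels, n_classes)                       # 2322
```

With these numbers the GAP head is roughly 64 times smaller, matching the number of spatial positions that are averaged away.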

## Model Performance Summary

### Best Performing Model: [To be filled after training]
- **Model name**: [EfficientNetV2M / MobileNetV2 / CNN 1 / CNN 2]
- **Test accuracy**: [XX.X%]
- **Key strengths**: [Describe what this model did well]

### Worst Performing Model: [To be filled after training]
- **Model name**: [Model name]
- **Test accuracy**: [XX.X%]
- **Key weaknesses**: [Describe limitations]

### Performance Ranking (Expected):
1. **Transfer Learning Models** (EfficientNetV2M, MobileNetV2) - Expected to perform best
2. **CNN Model 2** (Deeper) - Should outperform simple CNN
3. **CNN Model 1** (Simple) - Baseline performance

## Performance Analysis

### Why Transfer Learning Models Likely Performed Best:
1. **Pre-trained features**: Both EfficientNetV2M and MobileNetV2 were pre-trained on ImageNet (1.4M images, 1000 classes). They learned generalizable features (edges, textures, shapes, colors) that transfer well to Pokemon classification.
2. **Efficient architecture**: These models are designed for both accuracy and efficiency, using techniques such as:
   - Depthwise separable convolutions with inverted residual blocks (MobileNetV2)
   - Compound scaling of depth, width, and resolution (EfficientNet)
3. **Less prone to overfitting**: With the pre-trained base frozen, only the small classification head is trained, so far fewer weights have to be learned from the relatively small Pokemon dataset.
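
To illustrate why depthwise separable convolutions are considered efficient, the sketch below compares weight counts (biases ignored) for a standard 3x3 convolution and its depthwise separable counterpart. The channel sizes 64 and 128 are arbitrary example values.

```python
def standard_conv_weights(k, c_in, c_out):
    """Every output channel has a k x k filter over every input channel."""
    return k * k * c_in * c_out


def depthwise_separable_weights(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_weights(3, 64, 128)          # 73728
separable = depthwise_separable_weights(3, 64, 128)   # 576 + 8192 = 8768
ratio = standard / separable
```

For this example the separable version uses a bit more than 8 times fewer weights.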

### Why CNN Model 2 Should Outperform CNN Model 1:
1. **Greater capacity**: More layers and filters allow learning more complex feature hierarchies
2. **Deeper representations**: Stacking two conv layers per block enables learning more abstract features
3. **Better feature refinement**: Additional dense layer before classification allows for better decision boundaries
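
One way to see the benefit of stacking two 3x3 convolutions: together they cover the same 5x5 receptive field as a single 5x5 convolution, with fewer weights and an extra nonlinearity in between. A small arithmetic sketch, assuming stride 1 and equal input and output channels (64 is an arbitrary example value):

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


def conv_weights(k, channels):
    """Weights of a k x k conv with `channels` in and out (biases ignored)."""
    return k * k * channels * channels


channels = 64
stacked_field = receptive_field([3, 3])    # 5
single_field = receptive_field([5])        # 5
stacked = 2 * conv_weights(3, channels)    # 73728
single = conv_weights(5, channels)         # 102400
```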

### Potential Challenges:
1. **Dataset size**: If the Pokemon dataset is small, deeper models (CNN 2) might overfit despite regularization
2. **Class imbalance**: Underrepresented Pokemon types may be harder to classify accurately
3. **Visual similarity**: Some Pokemon types share similar visual characteristics (e.g., Flying vs Dragon), making classification harder
4. **Image quality**: Variation in image backgrounds, poses, and quality could affect model performance

### Observations on Training:
- **Data augmentation** helped limit overfitting in the custom CNNs by increasing the effective dataset size
- **Early stopping** limited overfitting by halting training once validation loss stopped improving; because the test split doubles as the validation set here, the reported test accuracy is likely somewhat optimistic
- **Learning rate reduction** allowed models to fine-tune weights when learning plateaued
- Transfer learning was **much faster** to train, since the frozen base ran only once to extract features and only the classification head was trained
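
The interaction between the two callbacks can be sketched with a simplified simulation of their patience logic. This is not the Keras implementation (it ignores `min_delta`, cooldown, and weight restoration); it only mirrors the settings used in Section 2.1, namely a stop patience of 10, an LR patience of 5, a factor of 0.5, and a minimum LR of 1e-6. `simulate_callbacks` is a hypothetical helper.

```python
def simulate_callbacks(val_losses, stop_patience=10, lr_patience=5,
                       lr=1e-3, factor=0.5, min_lr=1e-6):
    """Simplified patience logic for early stopping and LR reduction."""
    best = float("inf")
    best_epoch = None
    stop_wait = 0   # epochs since the best validation loss
    lr_wait = 0     # epochs since the last improvement or LR cut
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
            stop_wait = 0
            lr_wait = 0
            continue
        stop_wait += 1
        lr_wait += 1
        if lr_wait >= lr_patience:
            lr = max(lr * factor, min_lr)   # halve the LR, but not below min_lr
            lr_wait = 0
        if stop_wait >= stop_patience:
            return {"stopped_epoch": epoch, "best_epoch": best_epoch, "lr": lr}
    return {"stopped_epoch": None, "best_epoch": best_epoch, "lr": lr}


# Loss improves for three epochs, then plateaus
result = simulate_callbacks([1.0, 0.8, 0.7] + [0.75] * 12)
```

In this toy run the best epoch is 2, the learning rate is halved twice during the plateau, and training stops at epoch 12, ten epochs after the last improvement.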

### Model-Specific Insights:
- **CNN Model 1**: [Describe training behavior - did it converge quickly? Any overfitting?]
- **CNN Model 2**: [Did the deeper model overfit? How did regularization help?]
- **EfficientNetV2M**: [How quickly did it converge? Any signs of underfitting?]
- **MobileNetV2**: [How did it compare to EfficientNet in speed and accuracy?]

## Conclusion

This assignment demonstrated the effectiveness of both custom CNNs and transfer learning for image classification. The key takeaways are:

1. **Transfer learning is powerful**: Pre-trained models leverage knowledge from large datasets
2. **Architecture matters**: Deeper models can capture more complex patterns but risk overfitting
3. **Regularization is essential**: Dropout, batch normalization, and augmentation all help reduce overfitting
4. **Domain knowledge helps**: Understanding Pokemon characteristics could inform future feature engineering

Future improvements could include:
- Fine-tuning transfer learning models (unfreezing some base layers)
- Ensemble methods combining multiple models
- More sophisticated augmentation (color jittering, cutout, mixup)
- Attention mechanisms to focus on discriminative regions
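
As one example of the ensemble idea, the sketch below shows soft voting: the softmax outputs of several models are averaged and the class with the highest mean probability is chosen. The probability arrays are made-up toy values standing in for `model.predict(...)` outputs, and `soft_vote` is a hypothetical helper.

```python
import numpy as np


def soft_vote(prob_list):
    """Average class probabilities across models and take the argmax."""
    mean_probs = np.mean(np.stack(prob_list, axis=0), axis=0)
    return mean_probs.argmax(axis=1), mean_probs


# Two toy models, two samples, two classes
probs_a = np.array([[0.6, 0.4], [0.3, 0.7]])
probs_b = np.array([[0.2, 0.8], [0.4, 0.6]])
pred, mean_probs = soft_vote([probs_a, probs_b])
```

For the first sample the models disagree, and the more confident one (class 1 at 0.8) wins the averaged vote.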

---
# Part 4: Model Evaluation

Comprehensive evaluation of all models using the provided code block.

In [None]:
# Placeholder for Part 4 evaluation code
# This section will be filled with the evaluation code provided by the instructor