# Facial Expression Recognition (FER) Model Training
## Using the FER 2013 Dataset

This notebook builds and trains a Convolutional Neural Network (CNN) for facial expression recognition using the FER 2013 dataset. It's optimized to run on Google Colab with GPU acceleration.

### Emotions Recognized:
- 0: Angry 😠
- 1: Disgust 🤢
- 2: Fear 😨
- 3: Happy 😊
- 4: Sad 😢
- 5: Surprise 😲
- 6: Neutral 😐
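
The index-to-emotion mapping above can be kept in a small lookup table so that predictions are easy to turn back into names. This is a minimal sketch; `EMOTIONS` and `label_name` are illustrative names and do not appear elsewhere in the notebook.

```python
# Index-to-name lookup matching the emotion list above.
# EMOTIONS and label_name are illustrative names, not part of the notebook.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def label_name(index: int) -> str:
    """Return the emotion name for a FER 2013 class index (0-6)."""
    if not 0 <= index < len(EMOTIONS):
        raise ValueError(f"label must be in 0..{len(EMOTIONS) - 1}, got {index}")
    return EMOTIONS[index]

print(label_name(3))  # Happy
```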

## 1. Setup and Environment Check

In [None]:
# Check if running on Google Colab
try:
    import google.colab
    IN_COLAB = True
    print("✅ Running on Google Colab")
except ImportError:
    IN_COLAB = False
    print("❌ Not running on Google Colab")

print("📝 Note: GPU availability will be checked after package installation")

## 2. Install Required Packages

In [None]:
# Install required packages with compatible versions
# First, uninstall existing torch packages to avoid conflicts
!pip uninstall -y torch torchvision torchaudio

# Check current environment
import subprocess
import sys

try:
    # Get CUDA version if available
    result = subprocess.run(['nvcc', '--version'], capture_output=True, text=True)
    if result.returncode == 0:
        print("CUDA version info:")
        print(result.stdout)
        
        # Extract CUDA version
        if "release 12" in result.stdout:
            cuda_version = "cu121"
        elif "release 11.8" in result.stdout:
            cuda_version = "cu118"
        else:
            cuda_version = "cu121"  # Default to latest
    else:
        cuda_version = "cu121"  # Default
        
except Exception as e:
    print(f"Could not detect CUDA version: {e}")
    cuda_version = "cu121"  # Default to latest

print(f"Installing PyTorch with CUDA support: {cuda_version}")

# Install PyTorch with appropriate CUDA version
if cuda_version == "cu118":
    !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
else:
    !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install other required packages
!pip install opencv-python-headless
!pip install matplotlib seaborn
!pip install scikit-learn
!pip install pandas numpy
!pip install pillow
!pip install tqdm

print("✅ All packages installed successfully!")
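
The CUDA detection in the cell above relies on substring checks against the `nvcc --version` output. A regular expression makes the same decision a little more explicit. The sketch below is hedged: `pick_wheel_tag` is a hypothetical helper, and the sample string only assumes the usual `release X.Y` wording that `nvcc` prints.

```python
import re

def pick_wheel_tag(nvcc_output: str, default: str = "cu121") -> str:
    """Map `nvcc --version` text to a PyTorch wheel tag, falling back to `default`."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return default
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) == (11, 8):
        return "cu118"
    return default  # CUDA 12.x and anything unrecognized use the default tag

# Assumed example of the line nvcc prints; real output has more lines around it.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(pick_wheel_tag(sample))  # cu118
```

The mapping mirrors the cell above: only CUDA 11.8 selects `cu118`, and everything else falls back to `cu121`.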

In [None]:
# Verify installation and compatibility
try:
    import torch
    import torchvision
    print(f"✅ PyTorch version: {torch.__version__}")
    print(f"✅ Torchvision version: {torchvision.__version__}")
    
    # Check CUDA availability and set device
    if torch.cuda.is_available():
        print(f"✅ CUDA available: {torch.version.cuda}")
        print(f"✅ GPU device: {torch.cuda.get_device_name(0)}")
        device = torch.device('cuda')
    else:
        print("⚠️ CUDA not available, will use CPU")
        device = torch.device('cpu')
    
    print(f"🎯 Using device: {device}")
    
    # Test basic functionality
    x = torch.randn(1, 3, 224, 224)
    if torch.cuda.is_available():
        x = x.cuda()
    print("✅ PyTorch tensor operations working correctly")
    
    # Test torchvision transforms
    import torchvision.transforms as transforms
    transform = transforms.Compose([transforms.Resize((224, 224))])
    print("✅ Torchvision transforms working correctly")
    
    # Make device available globally
    globals()['device'] = device
    
    print("\n🎉 All installations verified successfully!")
    print(f"🔧 Device variable set globally: {device}")
    
except Exception as e:
    print(f"❌ Installation verification failed: {e}")
    print("\nIf you see this error, please restart the runtime and try again:")
    print("Runtime → Restart runtime, then run the installation cell again.")
    # Fall back to CPU, but only if torch itself was imported successfully;
    # otherwise torch.device would raise a NameError inside this handler
    if 'torch' in globals():
        device = torch.device('cpu')

## 3. Import Libraries

In [None]:
# Import Libraries with error handling
import warnings
warnings.filterwarnings('ignore')

# PyTorch should already be imported from verification cell
try:
    # Check if torch is already available
    if 'torch' in globals():
        print("✅ PyTorch already imported from verification cell")
    else:
        import torch
        print("✅ PyTorch imported")
    
    import torch.nn as nn
    import torch.optim as optim
    import torch.nn.functional as F
    from torch.utils.data import Dataset, DataLoader
    print("✅ PyTorch modules imported successfully")
    
except ImportError as e:
    print(f"❌ PyTorch import error: {e}")
    print("Please restart runtime and run the installation cell again")
    raise

try:
    import torchvision.transforms as transforms
    print("✅ Torchvision transforms imported successfully")
except ImportError as e:
    print(f"❌ Torchvision import error: {e}")
    print("Please restart runtime and run the installation cell again")
    raise

try:
    import pandas as pd
    import numpy as np
    import cv2
    import matplotlib.pyplot as plt
    import seaborn as sns
    from PIL import Image
    from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
    from tqdm import tqdm
    import os
    print("✅ Other libraries imported successfully")
except ImportError as e:
    print(f"❌ Other library import error: {e}")
    print("Please run the installation cell again")
    raise

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Ensure device variable is available
if 'device' not in globals():
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"⚠️ Device variable not found, setting to {device}")

# Final verification
print("\n🎉 All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"Device available: {device}")
print(f"CUDA available: {torch.cuda.is_available()}")

## 4. Download and Setup FER 2013 Dataset

In [None]:
# Download and Setup FER 2013 Dataset
if IN_COLAB:
    # Mount Google Drive if needed
    from google.colab import drive
    drive.mount('/content/drive')
    
    # Install Kaggle API
    !pip install kaggle -q
    
    # Create directories
    os.makedirs('/content/data', exist_ok=True)
    
    print("📁 Setting up dataset directories...")
    print("\n🔗 To download FER 2013 dataset automatically:")
    print("1. Go to https://www.kaggle.com/account")
    print("2. Create a new API token (kaggle.json)")
    print("3. Upload kaggle.json to Colab files")
    print("4. Run the following commands:")
    print("   !mkdir -p ~/.kaggle")
    print("   !cp kaggle.json ~/.kaggle/")
    print("   !chmod 600 ~/.kaggle/kaggle.json")
    print("   !kaggle datasets download -d msambare/fer2013 -p /content/data")
    print("   !unzip /content/data/fer2013.zip -d /content/data/fer2013/")
    
    print("\n🌐 Alternative: Manual download from:")
    print("   https://www.kaggle.com/datasets/msambare/fer2013")
    print("   Extract to: /content/data/fer2013/")
    
    dataset_path = '/content/data/fer2013'
else:
    # Local setup
    dataset_path = './data/fer2013'
    os.makedirs(dataset_path, exist_ok=True)
    print(f"📁 Local dataset path: {dataset_path}")
    print("🌐 Please download FER 2013 dataset from:")
    print("   https://www.kaggle.com/datasets/msambare/fer2013")
    print(f"   Extract to: {dataset_path}")

print(f"\n📍 Dataset path: {dataset_path}")

# Check if dataset exists
if os.path.exists(dataset_path):
    subdirs = os.listdir(dataset_path)
    print(f"✅ Dataset directory found with contents: {subdirs}")
else:
    print("⚠️ Dataset directory not found - will use dummy data for demonstration")

## 5. Custom Dataset Class for FER 2013

In [None]:
class FER2013Dataset(Dataset):
    def __init__(self, data_dir, split='train', transform=None):
        """
        FER 2013 Dataset class
        
        Args:
            data_dir (str): Path to the dataset directory
            split (str): 'train', 'test', or 'validation'
            transform: Data transformations to apply
        """
        self.data_dir = data_dir
        self.split = split
        self.transform = transform
        
        # Emotion labels
        self.emotion_labels = {
            'angry': 0, 'disgust': 1, 'fear': 2, 'happy': 3,
            'sad': 4, 'surprise': 5, 'neutral': 6
        }
        
        self.label_to_emotion = {v: k for k, v in self.emotion_labels.items()}
        
        # Load image paths and labels
        self.images = []
        self.labels = []
        
        split_dir = os.path.join(data_dir, split)
        
        if os.path.exists(split_dir):
            for emotion in self.emotion_labels.keys():
                emotion_dir = os.path.join(split_dir, emotion)
                if os.path.exists(emotion_dir):
                    for img_file in os.listdir(emotion_dir):
                        if img_file.lower().endswith(('.png', '.jpg', '.jpeg')):
                            self.images.append(os.path.join(emotion_dir, img_file))
                            self.labels.append(self.emotion_labels[emotion])
        
        print(f"Loaded {len(self.images)} images for {split} split")
    
    def __len__(self):
        return len(self.images)
    
    def __getitem__(self, idx):
        img_path = self.images[idx]
        label = self.labels[idx]
        
        # Load image
        image = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
        
        if image is None:
            # Create a dummy image if loading fails
            image = np.zeros((48, 48), dtype=np.uint8)
        
        # Resize to 48x48 if needed
        if image.shape != (48, 48):
            image = cv2.resize(image, (48, 48))
        
        # Convert to PIL Image for transforms
        image = Image.fromarray(image)
        
        if self.transform:
            image = self.transform(image)
        
        return image, label

print("✅ FER2013Dataset class defined")
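
The class above expects one folder per emotion inside each split folder (`<data_dir>/<split>/<emotion>/`). A missing split folder does not raise an error; it simply yields an empty dataset. The sketch below reproduces only the directory scan with the standard library so that behaviour can be checked without image files. `scan_split` is a hypothetical helper, and the temporary tree holds empty placeholder files.

```python
import os
import tempfile

EMOTION_LABELS = {"angry": 0, "disgust": 1, "fear": 2, "happy": 3,
                  "sad": 4, "surprise": 5, "neutral": 6}

def scan_split(data_dir, split):
    """Collect (path, label) pairs from data_dir/split/<emotion>/ like FER2013Dataset."""
    samples = []
    split_dir = os.path.join(data_dir, split)
    for emotion, label in EMOTION_LABELS.items():
        emotion_dir = os.path.join(split_dir, emotion)
        if not os.path.isdir(emotion_dir):
            continue  # missing class folders are skipped silently
        for name in sorted(os.listdir(emotion_dir)):
            if name.lower().endswith((".png", ".jpg", ".jpeg")):
                samples.append((os.path.join(emotion_dir, name), label))
    return samples

# Build a tiny throwaway tree with placeholder files to exercise the scan.
with tempfile.TemporaryDirectory() as root:
    for emotion, count in (("happy", 2), ("sad", 1)):
        folder = os.path.join(root, "train", emotion)
        os.makedirs(folder)
        for i in range(count):
            open(os.path.join(folder, f"img_{i}.jpg"), "wb").close()
    found = scan_split(root, "train")
    missing = scan_split(root, "validation")

print(len(found), len(missing))  # 3 0
```

Because an absent split produces zero samples rather than an exception, any code that relies on a `try`/`except` around the dataset constructor should also check the resulting length.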

## 6. Data Transformations and Augmentation

In [None]:
# Define data transformations
train_transform = transforms.Compose([
    transforms.Resize((48, 48)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])  # Normalize to [-1, 1]
])

val_test_transform = transforms.Compose([
    transforms.Resize((48, 48)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5])
])

print("✅ Data transformations defined")
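
`ToTensor` scales 8-bit pixel values to [0, 1], and `Normalize(mean=[0.5], std=[0.5])` then applies `(x - mean) / std`, which maps that range to [-1, 1]. The per-pixel arithmetic can be sketched in plain Python; `normalize` and `denormalize` are illustrative helpers, not torchvision functions.

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Per-pixel arithmetic of Normalize: (x - mean) / std."""
    return (pixel - mean) / std

def denormalize(value: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Inverse mapping, useful before displaying an image."""
    return value * std + mean

# ToTensor divides 8-bit values by 255 before Normalize is applied.
print(normalize(0 / 255), normalize(255 / 255))  # -1.0 1.0
print(denormalize(normalize(0.25)))              # 0.25
```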

## 7. Load Datasets

In [None]:
# Create datasets
try:
    train_dataset = FER2013Dataset(dataset_path, split='train', transform=train_transform)
    val_dataset = FER2013Dataset(dataset_path, split='validation', transform=val_test_transform)
    test_dataset = FER2013Dataset(dataset_path, split='test', transform=val_test_transform)
    
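    print(f"Train dataset size: {len(train_dataset)}")
    print(f"Validation dataset size: {len(val_dataset)}")
    print(f"Test dataset size: {len(test_dataset)}")
    
    # FER2013Dataset does not raise on a missing directory (it just loads 0 images),
    # so trigger the fallback below explicitly when no training images were found
    if len(train_dataset) == 0:
        raise FileNotFoundError(f"No training images found under {dataset_path}")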
    
except Exception as e:
    print(f"Error loading dataset: {e}")
    print("Please make sure the FER 2013 dataset is properly downloaded and extracted.")
    
    # Create dummy datasets for demonstration
    print("Creating dummy datasets for demonstration...")
    
    class DummyDataset(Dataset):
        def __init__(self, size=1000, transform=None):
            self.size = size
            self.transform = transform
        
        def __len__(self):
            return self.size
        
        def __getitem__(self, idx):
            # Generate random grayscale image
            image = torch.randn(1, 48, 48)
            label = torch.randint(0, 7, (1,)).item()
            return image, label
    
    train_dataset = DummyDataset(size=20000)
    val_dataset = DummyDataset(size=3000)
    test_dataset = DummyDataset(size=3000)

### Load Datasets with Fallback Handling

In [None]:
# Load Datasets with robust error handling for FER2013 structure
def create_dummy_dataset(split_name, num_samples):
    """Create a dummy dataset for demonstration when real data isn't available"""
    
    class DummyFERDataset(Dataset):
        def __init__(self, num_samples, transform=None):
            self.num_samples = num_samples
            self.transform = transform
            self.emotion_labels = {
                'angry': 0, 'disgust': 1, 'fear': 2, 'happy': 3,
                'sad': 4, 'surprise': 5, 'neutral': 6
            }
            
        def __len__(self):
            return self.num_samples
        
        def __getitem__(self, idx):
            # Generate a synthetic face-like image (48x48 grayscale)
            # Use a local generator so the global NumPy seed is left untouched
            rng = np.random.default_rng(idx)  # Consistent images for same index
            
            # Create a noisy background
            image = rng.integers(50, 200, (48, 48), dtype=np.uint8)
            
            # Add some face-like features
            # Eyes
            cv2.circle(image, (15, 18), 3, 100, -1)
            cv2.circle(image, (33, 18), 3, 100, -1)
            
            # Nose
            cv2.circle(image, (24, 28), 2, 120, -1)
            
            # Mouth (varies by emotion)
            emotion = idx % 7
            if emotion == 3:  # Happy
                cv2.ellipse(image, (24, 35), (8, 4), 0, 0, 180, 80, -1)
            elif emotion == 4:  # Sad
                cv2.ellipse(image, (24, 38), (8, 4), 0, 180, 360, 80, -1)
            else:
                cv2.line(image, (18, 36), (30, 36), 90, 2)
            
            # Convert to PIL Image
            image = Image.fromarray(image)
            
            if self.transform:
                image = self.transform(image)
            
            label = emotion
            return image, label
    
    return DummyFERDataset(num_samples)

# Try to load real datasets first - FER2013 typically has train/test structure
dataset_loaded = False
try:
    if os.path.exists(dataset_path):
        # Check for available directories
        available_dirs = []
        for split in ['train', 'test', 'validation']:
            if os.path.exists(os.path.join(dataset_path, split)):
                available_dirs.append(split)
        
        print(f"📂 Found dataset directories: {available_dirs}")
        
        # random_split is used by both splitting strategies below
        from torch.utils.data import random_split
        
        # Strategy 1: train + validation + test directories (if validation exists)
        # This is checked first; otherwise the train + test branch would always match
        if all(d in available_dirs for d in ('train', 'validation', 'test')):
            print("🎯 Using FER2013 extended structure: train + validation + test")
            
            train_dataset = FER2013Dataset(dataset_path, split='train', transform=train_transform)
            val_dataset = FER2013Dataset(dataset_path, split='validation', transform=val_test_transform)
            test_dataset = FER2013Dataset(dataset_path, split='test', transform=val_test_transform)
            
            if len(train_dataset) > 0:
                dataset_loaded = True
                print(f"✅ Successfully loaded FER 2013 dataset with existing validation!")
                print(f"   Train: {len(train_dataset)} samples")
                print(f"   Validation: {len(val_dataset)} samples") 
                print(f"   Test: {len(test_dataset)} samples")
        
        # Strategy 2: train + test directories (most common FER2013 structure)
        elif 'train' in available_dirs and 'test' in available_dirs:
            print("🎯 Using FER2013 standard structure: train + test")
            
            # Load full training dataset
            full_train_dataset = FER2013Dataset(dataset_path, split='train', transform=train_transform)
            test_dataset = FER2013Dataset(dataset_path, split='test', transform=val_test_transform)
            
            if len(full_train_dataset) > 0:
                # Split training data into train/validation (80/20 split)
                train_size = int(0.8 * len(full_train_dataset))
                val_size = len(full_train_dataset) - train_size
                
                # Set generator for reproducible splits
                # Note: val_dataset is a view of full_train_dataset, so it also uses train_transform
                generator = torch.Generator().manual_seed(42)
                train_dataset, val_dataset = random_split(
                    full_train_dataset, 
                    [train_size, val_size], 
                    generator=generator
                )
                
                dataset_loaded = True
                print(f"✅ Successfully loaded and split FER 2013 dataset!")
                print(f"   Original train: {len(full_train_dataset)} samples")
                print(f"   Split train: {len(train_dataset)} samples (80%)")
                print(f"   Validation: {len(val_dataset)} samples (20%)") 
                print(f"   Test: {len(test_dataset)} samples")
        
        # Strategy 3: Only training data available
        elif 'train' in available_dirs:
            print("🎯 Only training data found, splitting into train/val/test")
            
            full_dataset = FER2013Dataset(dataset_path, split='train', transform=train_transform)
            
            if len(full_dataset) > 0:
                # Split: 70% train, 15% validation, 15% test
                total_size = len(full_dataset)
                train_size = int(0.7 * total_size)
                val_size = int(0.15 * total_size)
                test_size = total_size - train_size - val_size
                
                generator = torch.Generator().manual_seed(42)
                train_dataset, val_dataset, test_dataset = random_split(
                    full_dataset, 
                    [train_size, val_size, test_size],
                    generator=generator
                )
                
                dataset_loaded = True
                print(f"✅ Split single dataset into train/val/test!")
                print(f"   Train: {len(train_dataset)} samples (70%)")
                print(f"   Validation: {len(val_dataset)} samples (15%)") 
                print(f"   Test: {len(test_dataset)} samples (15%)")
        
        else:
            print("❌ No valid dataset structure found")
            print("Expected: 'train' directory (required), 'test' directory (optional)")
    else:
        print("❌ Dataset directory does not exist")
        
except Exception as e:
    print(f"❌ Error loading real dataset: {e}")

# Create dummy datasets if real data not available
if not dataset_loaded:
    print("\n🎭 Creating dummy datasets for demonstration...")
    print("   (These contain synthetic face-like images with basic emotion patterns)")
    
    train_dataset = create_dummy_dataset('train', 20000)
    val_dataset = create_dummy_dataset('validation', 3000)  
    test_dataset = create_dummy_dataset('test', 3000)
    
    # Apply transforms to dummy datasets
    train_dataset.transform = train_transform
    val_dataset.transform = val_test_transform
    test_dataset.transform = val_test_transform
    
    print(f"✅ Dummy datasets created:")
    print(f"   Train: {len(train_dataset)} samples")
    print(f"   Validation: {len(val_dataset)} samples")
    print(f"   Test: {len(test_dataset)} samples")
    print("\n💡 To use real data, download FER 2013 dataset and rerun this cell")

# Final verification
print(f"\n📊 Final dataset sizes:")
print(f"   Training: {len(train_dataset)}")
print(f"   Validation: {len(val_dataset)}")
print(f"   Test: {len(test_dataset)}")

# Summarize split proportions (only applies when random_split produced Subset objects)
if dataset_loaded and hasattr(train_dataset, 'dataset'):
    n_split = len(train_dataset) + len(val_dataset)
    print(f"\n🔄 Dataset split summary:")
    print(f"   Original dataset: {len(train_dataset.dataset)}")
    print(f"   Train split: {len(train_dataset)} ({len(train_dataset)/n_split*100:.1f}% of train+val)")
    print(f"   Validation split: {len(val_dataset)} ({len(val_dataset)/n_split*100:.1f}% of train+val)")

if len(train_dataset) == 0:
    raise ValueError("Training dataset is empty! Please check dataset setup.")
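
When the training folder is divided with `random_split`, the resulting subsets are views of the same parent dataset, so the validation subset inherits the parent's augmenting `train_transform`. One possible way around this is a thin wrapper that applies its own transform to a shared, untransformed base dataset. The sketch below is an assumption-laden illustration: `TransformSubset` is a hypothetical class, and integers and lambdas stand in for images and transforms.

```python
class TransformSubset:
    """View of `base` restricted to `indices`, applying its own transform.

    Hypothetical helper: `base` must return untransformed (image, label) pairs.
    """
    def __init__(self, base, indices, transform=None):
        self.base = base
        self.indices = list(indices)
        self.transform = transform

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        image, label = self.base[self.indices[i]]
        if self.transform is not None:
            image = self.transform(image)
        return image, label

# Toy stand-ins: integers play the role of images, lambdas the role of transforms.
base = [(10, 0), (20, 1), (30, 2), (40, 3)]
train_view = TransformSubset(base, [0, 2], transform=lambda x: x + 1)  # "augmenting"
val_view = TransformSubset(base, [1, 3], transform=lambda x: x)        # deterministic

print(train_view[0], val_view[0])  # (11, 0) (20, 1)
```

With the notebook's classes, the base could be built as `FER2013Dataset(dataset_path, split='train', transform=None)` and the index lists taken from the `indices` attribute of the subsets that `random_split` returns. A `DataLoader` accepts any object that implements `__len__` and `__getitem__`, so the wrapper does not need to subclass `Dataset`.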

## 8. Create Data Loaders

In [None]:
# Create Data Loaders with validation
# Verify datasets are not empty
for dataset_name, dataset in [('train', train_dataset), ('validation', val_dataset), ('test', test_dataset)]:
    if len(dataset) == 0:
        raise ValueError(f"{dataset_name} dataset is empty! Cannot create data loaders.")
    print(f"✅ {dataset_name} dataset verified: {len(dataset)} samples")

# Set batch size based on available memory and dataset size
min_samples = min(len(train_dataset), len(val_dataset), len(test_dataset))
if device.type == 'cuda':
    BATCH_SIZE = min(64, min_samples)  # Don't exceed available samples
else:
    BATCH_SIZE = min(32, min_samples)

# Ensure batch size is reasonable
BATCH_SIZE = max(1, min(BATCH_SIZE, 64))

NUM_WORKERS = 2 if IN_COLAB else 0

print(f"📦 Using batch size: {BATCH_SIZE}")
print(f"👥 Using {NUM_WORKERS} workers")

# Create data loaders with error handling
try:
    train_loader = DataLoader(
        train_dataset, 
        batch_size=BATCH_SIZE, 
        shuffle=True, 
        num_workers=NUM_WORKERS,
        pin_memory=True if device.type == 'cuda' else False,
        drop_last=True if len(train_dataset) > BATCH_SIZE else False
    )

    val_loader = DataLoader(
        val_dataset, 
        batch_size=BATCH_SIZE, 
        shuffle=False, 
        num_workers=NUM_WORKERS,
        pin_memory=True if device.type == 'cuda' else False,
        drop_last=False
    )

    test_loader = DataLoader(
        test_dataset, 
        batch_size=BATCH_SIZE, 
        shuffle=False, 
        num_workers=NUM_WORKERS,
        pin_memory=True if device.type == 'cuda' else False,
        drop_last=False
    )

    print(f"✅ Data loaders created successfully!")
    print(f"📊 Training batches: {len(train_loader)}")
    print(f"📊 Validation batches: {len(val_loader)}")
    print(f"📊 Test batches: {len(test_loader)}")
    
    # Test data loader functionality
    print("\n🧪 Testing data loaders...")
    try:
        # Test training loader
        train_iter = iter(train_loader)
        batch_data, batch_labels = next(train_iter)
        print(f"✅ Train batch shape: {batch_data.shape}, Labels shape: {batch_labels.shape}")
        
        # Test validation loader  
        val_iter = iter(val_loader)
        batch_data, batch_labels = next(val_iter)
        print(f"✅ Validation batch shape: {batch_data.shape}, Labels shape: {batch_labels.shape}")
        
        print("🎉 All data loaders working correctly!")
        
    except Exception as e:
        print(f"⚠️ Data loader test failed: {e}")
        
except Exception as e:
    print(f"❌ Error creating data loaders: {e}")
    print("This might be due to dataset issues. Please check the dataset loading cell.")
    raise
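
The split sizes and batch counts printed by the cells above follow from simple integer arithmetic. The sketch below reproduces it without PyTorch. `split_sizes` and `num_batches` are hypothetical helpers, and the figure of 28,709 training images is an assumption about the Kaggle copy of FER 2013; substitute the count the dataset cell reports.

```python
import math

def split_sizes(total, fractions):
    """Integer split sizes; the last split absorbs the rounding remainder."""
    sizes = [int(f * total) for f in fractions[:-1]]
    sizes.append(total - sum(sizes))
    return sizes

def num_batches(n_samples, batch_size, drop_last):
    """Number of batches a DataLoader yields for a map-style dataset."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)

TOTAL_TRAIN = 28709  # assumed size of the FER 2013 training folder
train_n, val_n = split_sizes(TOTAL_TRAIN, [0.8, 0.2])
print(train_n, val_n)                            # 22967 5742
print(num_batches(train_n, 64, drop_last=True))  # 358
print(num_batches(val_n, 64, drop_last=False))   # 90
```

With `drop_last=True` the final partial training batch (55 images in this example) is skipped each epoch, while the validation and test loaders keep their last, smaller batch.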

## 9. Visualize Sample Data

In [None]:
# Visualize Sample Data
# Emotion labels for visualization
emotion_names = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']
emotion_emojis = ['😠', '🤢', '😨', '😊', '😢', '😲', '😐']

def visualize_samples(data_loader, num_samples=8, title="Sample Images"):
    """Visualize sample images from a data loader"""
    try:
        data_iter = iter(data_loader)
        images, labels = next(data_iter)
        
        # Handle case where batch size is smaller than num_samples
        actual_samples = min(num_samples, len(images))
        
        # Calculate subplot dimensions
        cols = 4
        rows = (actual_samples + cols - 1) // cols  # Ceiling division
        
        fig, axes = plt.subplots(rows, cols, figsize=(12, 3*rows))
        if rows == 1:
            axes = axes.reshape(1, -1)
        
        for i in range(actual_samples):
            row, col = i // cols, i % cols
            
            # Convert tensor to numpy and denormalize
            img = images[i].squeeze().numpy()
            
            # Handle different normalization ranges
            if img.min() < 0:  # If normalized to [-1, 1]
                img = (img + 1) / 2  # Convert to [0, 1]
            elif img.max() > 1:  # If not normalized
                img = img / 255.0   # Convert to [0, 1]
            
            # Ensure values are in valid range
            img = np.clip(img, 0, 1)
            
            axes[row, col].imshow(img, cmap='gray')
            axes[row, col].set_title(f'{emotion_names[labels[i]]} {emotion_emojis[labels[i]]}')
            axes[row, col].axis('off')
        
        # Hide empty subplots
        for i in range(actual_samples, rows * cols):
            row, col = i // cols, i % cols
            axes[row, col].axis('off')
        
        plt.suptitle(title, fontsize=14, fontweight='bold')
        plt.tight_layout()
        plt.show()
        
        # Show label distribution in this batch
        unique_labels, counts = np.unique(labels.numpy(), return_counts=True)
        print(f"📊 Labels in batch: {dict(zip([emotion_names[l] for l in unique_labels], counts))}")
        
    except Exception as e:
        print(f"❌ Visualization error: {e}")
        print("This might be due to data format issues.")

# Visualize samples from different sets
print("🖼️ Sample images from training set:")
visualize_samples(train_loader, title="Training Set Samples")

print("\n🖼️ Sample images from validation set:")
visualize_samples(val_loader, num_samples=4, title="Validation Set Samples")

# Show overall dataset statistics
print("\n📈 Dataset Statistics:")
print("-" * 40)

def get_label_distribution(dataset):
    """Get label distribution for a dataset"""
    labels = []
    
    # Sample a subset for efficiency (especially for large datasets)
    sample_size = min(1000, len(dataset))
    indices = np.random.choice(len(dataset), sample_size, replace=False)
    
    for idx in indices:
        _, label = dataset[idx]
        labels.append(label)
    
    unique_labels, counts = np.unique(labels, return_counts=True)
    return dict(zip([emotion_names[l] for l in unique_labels], counts))

try:
    train_dist = get_label_distribution(train_dataset)
    print(f"Training set distribution: {train_dist}")
    
    val_dist = get_label_distribution(val_dataset)
    print(f"Validation set distribution: {val_dist}")
    
except Exception as e:
    print(f"Could not calculate distribution: {e}")

print(f"\n✅ Data visualization complete!")
print(f"Ready to proceed with model training on {len(train_dataset)} training samples.")

## 10. Define CNN Model Architecture

In [None]:
class EmotionCNN(nn.Module):
    def __init__(self, num_classes=7, dropout_rate=0.5):
        super(EmotionCNN, self).__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 256, kernel_size=3, padding=1)
        self.bn4 = nn.BatchNorm2d(256)
        
        # Pooling
        self.pool = nn.MaxPool2d(2, 2)
        self.adaptive_pool = nn.AdaptiveAvgPool2d((1, 1))
        
        # Dropout
        self.dropout = nn.Dropout(dropout_rate)
        
        # Fully connected layers
        self.fc1 = nn.Linear(256, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, num_classes)
        
    def forward(self, x):
        # First conv block
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        
        # Second conv block
        x = self.pool(F.relu(self.bn3(self.conv3(x))))
        x = self.pool(F.relu(self.bn4(self.conv4(x))))
        
        # Global average pooling
        x = self.adaptive_pool(x)
        x = x.view(x.size(0), -1)
        
        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        
        return x

# Create model instance
model = EmotionCNN(num_classes=7, dropout_rate=0.5)
model = model.to(device)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

print(f"✅ Model created and moved to {device}")
print(f"Total parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# Model summary
print("\nModel Architecture:")
print(model)
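
The feature-map sizes and the parameter total reported for `EmotionCNN` can be checked by hand. The sketch below redoes the arithmetic in plain Python, assuming 48x48 inputs, 3x3 convolutions with `padding=1`, and the layer widths defined above; the helper names are illustrative.

```python
def conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k + c_out  # weights + biases

def bn_params(channels):
    return 2 * channels  # learnable scale and shift; running stats are buffers

def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weights + biases

channels = [1, 32, 64, 128, 256]
size = 48
total = 0
for c_in, c_out in zip(channels, channels[1:]):
    total += conv_params(c_in, c_out) + bn_params(c_out)
    size //= 2  # 3x3 conv with padding=1 keeps the size; MaxPool2d(2, 2) halves it
    print(f"{c_out:>3} channels, {size}x{size}")

for n_in, n_out in [(256, 512), (512, 256), (256, 7)]:
    total += linear_params(n_in, n_out)

print(f"{total:,}")  # 653,511
```

The spatial size shrinks 48 → 24 → 12 → 6 → 3 before the adaptive average pool reduces it to 1x1, which is why `fc1` takes 256 input features.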

## 11. Define Loss Function and Optimizer

In [None]:
# Define Loss Function and Optimizer
# Check if model is defined
try:
    if 'model' not in globals():
        raise NameError("Model not found. Please run the model definition cell first.")
    
    # Loss function
    criterion = nn.CrossEntropyLoss()

    # Optimizer
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)

    # Learning rate scheduler
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=5
    )  # `verbose` is deprecated in recent PyTorch releases; the training loop prints the LR each epoch

    print("✅ Loss function, optimizer, and scheduler defined")
    print(f"Loss function: {criterion}")
    print(f"Optimizer: Adam with lr=0.001, weight_decay=1e-4")
    print(f"Scheduler: ReduceLROnPlateau with patience=5")
    
except NameError as e:
    print(f"❌ Error: {e}")
    print("\n🔧 To fix this:")
    print("1. Make sure you've run all previous cells in order")
    print("2. Especially run the 'Define CNN Model Architecture' cell")
    print("3. Check that the model was created successfully")
    raise
    
except Exception as e:
    print(f"❌ Unexpected error: {e}")
    print("Please check the model definition and try again.")
    raise
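
`ReduceLROnPlateau` multiplies the learning rate by `factor` once the monitored metric has failed to improve for more than `patience` consecutive epochs. The sketch below is a simplified stand-in for that rule, not the PyTorch implementation: `PlateauSketch` is a hypothetical class that ignores the scheduler's threshold, cooldown, and minimum learning rate.

```python
class PlateauSketch:
    """Simplified ReduceLROnPlateau in 'min' mode (no threshold, cooldown, or min_lr)."""
    def __init__(self, lr, factor=0.5, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, metric):
        if metric < self.best:
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor   # reduce the learning rate
            self.bad_epochs = 0      # and start counting again
        return self.lr

sched = PlateauSketch(lr=0.001)
history = [sched.step(loss) for loss in [1.0] + [1.0] * 6]
print(history[-2], history[-1])  # 0.001 0.0005
```

With `patience=5`, the reduction happens on the sixth consecutive epoch without improvement.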

## 12. Training Functions

In [None]:
def train_epoch(model, train_loader, criterion, optimizer, device):
    """Train the model for one epoch"""
    model.train()
    running_loss = 0.0
    correct = 0
    total = 0
    
    progress_bar = tqdm(train_loader, desc='Training')
    
    for batch_idx, (data, target) in enumerate(progress_bar):
        data, target = data.to(device), target.to(device)
        
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
        predicted = output.argmax(dim=1)
        total += target.size(0)
        correct += (predicted == target).sum().item()
        
        # Update progress bar
        progress_bar.set_postfix({
            'Loss': f'{running_loss/(batch_idx+1):.4f}',
            'Acc': f'{100.*correct/total:.2f}%'
        })
    
    epoch_loss = running_loss / len(train_loader)
    epoch_acc = 100. * correct / total
    
    return epoch_loss, epoch_acc

def validate_epoch(model, val_loader, criterion, device):
    """Validate the model"""
    model.eval()
    running_loss = 0.0
    correct = 0
    total = 0
    
    with torch.no_grad():
        progress_bar = tqdm(val_loader, desc='Validation')
        
        for batch_idx, (data, target) in enumerate(progress_bar):
            data, target = data.to(device), target.to(device)
            
            output = model(data)
            loss = criterion(output, target)
            
            running_loss += loss.item()
            predicted = output.argmax(dim=1)
            total += target.size(0)
            correct += (predicted == target).sum().item()
            
            # Update progress bar
            progress_bar.set_postfix({
                'Loss': f'{running_loss/(batch_idx+1):.4f}',
                'Acc': f'{100.*correct/total:.2f}%'
            })
    
    epoch_loss = running_loss / len(val_loader)
    epoch_acc = 100. * correct / total
    
    return epoch_loss, epoch_acc

print("✅ Training and validation functions defined")
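
Both functions above report the epoch loss as the mean of per-batch mean losses. When the last batch is smaller than the others (as it can be for the validation and test loaders), this differs slightly from the true per-sample mean. The sketch below compares the two with made-up numbers; the helper names are illustrative.

```python
def mean_of_batch_means(batches):
    """Average the per-batch mean losses, as train_epoch and validate_epoch do."""
    return sum(loss for loss, _ in batches) / len(batches)

def sample_weighted_mean(batches):
    """Weight each batch's mean loss by its size to get the per-sample mean."""
    total = sum(loss * n for loss, n in batches)
    return total / sum(n for _, n in batches)

# (mean loss, batch size) pairs; values are invented for illustration.
batches = [(1.0, 64), (1.0, 64), (4.0, 8)]

print(mean_of_batch_means(batches))             # 2.0
print(round(sample_weighted_mean(batches), 4))  # 1.1765
```

The gap is usually small with realistic losses, but weighting by `target.size(0)` removes it if exact per-sample averages matter.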

## 13. Training Loop

In [None]:
# Training parameters
NUM_EPOCHS = 50
EARLY_STOPPING_PATIENCE = 10
BEST_VAL_LOSS = float('inf')
PATIENCE_COUNTER = 0

# Lists to store training history
train_losses = []
train_accuracies = []
val_losses = []
val_accuracies = []

print(f"Starting training for {NUM_EPOCHS} epochs...")
print(f"Device: {device}")
print(f"Batch size: {BATCH_SIZE}")
print("-" * 60)

for epoch in range(NUM_EPOCHS):
    print(f"\nEpoch {epoch+1}/{NUM_EPOCHS}")
    print("-" * 40)
    
    # Training phase
    train_loss, train_acc = train_epoch(model, train_loader, criterion, optimizer, device)
    
    # Validation phase
    val_loss, val_acc = validate_epoch(model, val_loader, criterion, device)
    
    # Update learning rate scheduler
    scheduler.step(val_loss)
    
    # Store metrics
    train_losses.append(train_loss)
    train_accuracies.append(train_acc)
    val_losses.append(val_loss)
    val_accuracies.append(val_acc)
    
    # Print epoch results
    print(f"Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%")
    print(f"Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.2f}%")
    print(f"Learning Rate: {optimizer.param_groups[0]['lr']:.6f}")
    
    # Early stopping and model saving
    if val_loss < BEST_VAL_LOSS:
        BEST_VAL_LOSS = val_loss
        PATIENCE_COUNTER = 0
        
        # Save best model
        if IN_COLAB:
            torch.save(model.state_dict(), '/content/best_fer_model.pth')
        else:
            torch.save(model.state_dict(), 'best_fer_model.pth')
        
        print(f"✅ New best model saved! Val Loss: {val_loss:.4f}")
    else:
        PATIENCE_COUNTER += 1
        print(f"⏳ No improvement. Patience: {PATIENCE_COUNTER}/{EARLY_STOPPING_PATIENCE}")
    
    # Early stopping
    if PATIENCE_COUNTER >= EARLY_STOPPING_PATIENCE:
        print(f"\n🛑 Early stopping triggered after {epoch+1} epochs")
        break

print("\n🎉 Training completed!")
print(f"Best validation loss: {BEST_VAL_LOSS:.4f}")
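
The patience bookkeeping in the loop above can also be isolated into a small helper, which makes it easier to test without running a training job. The following is a minimal sketch in plain Python; the `EarlyStopping` class and the loss values are illustrative assumptions, not part of the notebook.

```python
class EarlyStopping:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.counter = 0

    def step(self, val_loss):
        """Record one epoch's loss; return (improved, should_stop)."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.counter = 0
            return True, False
        self.counter += 1
        return False, self.counter >= self.patience


# Made-up validation losses: improvement stalls after the second epoch.
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate([1.20, 1.05, 1.07, 1.06, 1.08, 1.04], start=1):
    improved, should_stop = stopper.step(val_loss)
    print(f"epoch {epoch}: val_loss={val_loss:.2f} improved={improved}")
    if should_stop:
        print(f"stopping after epoch {epoch}; best loss {stopper.best_loss:.2f}")
        break
```

In the notebook's loop, `improved` would gate the `torch.save` call and `should_stop` would replace the `PATIENCE_COUNTER >= EARLY_STOPPING_PATIENCE` check.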

## 14. Plot Training History

In [None]:
# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Plot loss
ax1.plot(train_losses, label='Training Loss', color='blue')
ax1.plot(val_losses, label='Validation Loss', color='red')
ax1.set_title('Model Loss')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.legend()
ax1.grid(True)

# Plot accuracy
ax2.plot(train_accuracies, label='Training Accuracy', color='blue')
ax2.plot(val_accuracies, label='Validation Accuracy', color='red')
ax2.set_title('Model Accuracy')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy (%)')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()

# Print final metrics
print(f"Final Training Accuracy: {train_accuracies[-1]:.2f}%")
print(f"Final Validation Accuracy: {val_accuracies[-1]:.2f}%")
print(f"Best Validation Loss: {min(val_losses):.4f}")
print(f"Best Validation Accuracy: {max(val_accuracies):.2f}%")

## 15. Load Best Model and Test

In [None]:
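# Load the best checkpoint saved during training
best_model_path = '/content/best_fer_model.pth' if IN_COLAB else 'best_fer_model.pth'
try:
    # map_location lets a GPU-trained checkpoint load on a CPU-only machine
    model.load_state_dict(torch.load(best_model_path, map_location=device))
    print("✅ Best model loaded successfully")
except Exception as e:
    print(f"⚠️ Could not load saved model ({e}); using current model")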

# Test the model
def test_model(model, test_loader, device):
    model.eval()
    all_predictions = []
    all_targets = []
    
    with torch.no_grad():
        progress_bar = tqdm(test_loader, desc='Testing')
        
        for data, target in progress_bar:
            data, target = data.to(device), target.to(device)
            output = model(data)
            _, predicted = torch.max(output, 1)
            
            all_predictions.extend(predicted.cpu().numpy())
            all_targets.extend(target.cpu().numpy())
    
    return np.array(all_predictions), np.array(all_targets)

# Run test
print("Testing the model...")
test_predictions, test_targets = test_model(model, test_loader, device)

# Calculate test accuracy
test_accuracy = accuracy_score(test_targets, test_predictions)
print(f"\n🎯 Test Accuracy: {test_accuracy*100:.2f}%")

## 16. Detailed Evaluation Metrics

In [None]:
# Classification report
print("📊 Classification Report:")
print("-" * 50)
class_report = classification_report(
    test_targets, 
    test_predictions, 
    target_names=emotion_names,
    digits=4
)
print(class_report)

# Confusion Matrix
cm = confusion_matrix(test_targets, test_predictions)

# Plot confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(
    cm, 
    annot=True, 
    fmt='d', 
    cmap='Blues',
    xticklabels=[f'{name}\n{emoji}' for name, emoji in zip(emotion_names, emotion_emojis)],
    yticklabels=[f'{name}\n{emoji}' for name, emoji in zip(emotion_names, emotion_emojis)]
)
plt.title('Confusion Matrix - Facial Expression Recognition')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.show()

# Per-class accuracy
print("\n📈 Per-class Accuracy:")
print("-" * 30)
for i, emotion in enumerate(emotion_names):
    class_mask = test_targets == i
    if np.sum(class_mask) > 0:
        class_acc = np.sum(test_predictions[class_mask] == i) / np.sum(class_mask)
        print(f"{emotion} {emotion_emojis[i]}: {class_acc*100:.2f}%")
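
The per-class accuracy printed above is the same quantity as the per-class recall in the classification report: the diagonal entry of the confusion matrix divided by its row total. The sketch below shows that relationship in plain Python on made-up labels for a 3-class problem; `confusion_counts` and `per_class_recall` are hypothetical helpers written for illustration.

```python
def confusion_counts(targets, predictions, num_classes):
    """Build a confusion matrix as nested lists: rows = true, cols = predicted."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, predictions):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal count divided by row total; None for classes with no samples."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        recalls.append(row[i] / support if support else None)
    return recalls


# Made-up labels for a 3-class problem.
targets = [0, 0, 0, 1, 1, 2, 2, 2, 2]
predictions = [0, 0, 1, 1, 1, 2, 2, 0, 2]

matrix = confusion_counts(targets, predictions, num_classes=3)
for class_id, recall in enumerate(per_class_recall(matrix)):
    print(f"class {class_id}: recall = {recall:.2f}")
```

With the NumPy array `cm` computed in this cell, the equivalent expression would be `cm.diagonal() / cm.sum(axis=1)`, provided every class has at least one test sample.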

## 17. Sample Predictions Visualization

In [None]:
def visualize_predictions(model, test_loader, device, num_samples=12):
    """Visualize sample predictions"""
    model.eval()
    
    # Get a batch of test data
    data_iter = iter(test_loader)
    images, labels = next(data_iter)
    images, labels = images.to(device), labels.to(device)
    
    with torch.no_grad():
        outputs = model(images)
        _, predictions = torch.max(outputs, 1)
        probabilities = F.softmax(outputs, dim=1)
    
    # Move back to CPU for visualization
    images = images.cpu()
    labels = labels.cpu()
    predictions = predictions.cpu()
    probabilities = probabilities.cpu()
    
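    # Create a 4-column grid sized to the number of images actually shown
    num_shown = min(num_samples, len(images))
    num_rows = max(1, (num_shown + 3) // 4)
    fig, axes = plt.subplots(num_rows, 4, figsize=(16, 4 * num_rows))
    axes = axes.ravel()
    for ax in axes:
        ax.axis('off')  # hide cells that end up unused
    
    for i in range(num_shown):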
        # Denormalize image
        img = images[i].squeeze().numpy()
        img = (img + 1) / 2  # Convert from [-1, 1] to [0, 1]
        
        # Get prediction info
        true_label = labels[i].item()
        pred_label = predictions[i].item()
        confidence = probabilities[i][pred_label].item()
        
        # Set title color based on correctness
        color = 'green' if true_label == pred_label else 'red'
        
        # Plot image
        axes[i].imshow(img, cmap='gray')
        axes[i].set_title(
            f'True: {emotion_names[true_label]} {emotion_emojis[true_label]}\n'
            f'Pred: {emotion_names[pred_label]} {emotion_emojis[pred_label]}\n'
            f'Conf: {confidence:.2f}',
            color=color,
            fontsize=10
        )
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

print("Sample predictions (Green = Correct, Red = Incorrect):")
visualize_predictions(model, test_loader, device)

## 18. Save Final Model and Results

In [None]:
# Save the final model
if IN_COLAB:
    model_path = '/content/fer2013_final_model.pth'
    results_path = '/content/training_results.json'
else:
    model_path = 'fer2013_final_model.pth'
    results_path = 'training_results.json'

# Save model
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'test_accuracy': float(test_accuracy),  # plain float, in case this is a NumPy scalar
    'best_val_loss': BEST_VAL_LOSS,
    'emotion_labels': emotion_names,
    'model_architecture': str(model)
}, model_path)

print(f"✅ Model saved to: {model_path}")

# Save training results
import json

results = {
    'test_accuracy': float(test_accuracy),
    'best_validation_loss': float(BEST_VAL_LOSS),
    'best_validation_accuracy': float(max(val_accuracies)),
    'final_training_accuracy': float(train_accuracies[-1]),
    'final_validation_accuracy': float(val_accuracies[-1]),
    'epochs_trained': len(train_losses),
    'train_losses': [float(x) for x in train_losses],
    'train_accuracies': [float(x) for x in train_accuracies],
    'val_losses': [float(x) for x in val_losses],
    'val_accuracies': [float(x) for x in val_accuracies],
    'emotion_labels': emotion_names,
    'model_parameters': total_params
}

with open(results_path, 'w') as f:
    json.dump(results, f, indent=2)

print(f"✅ Results saved to: {results_path}")

# Summary
print("\n" + "="*60)
print("🎉 TRAINING SUMMARY")
print("="*60)
print(f"📊 Test Accuracy: {test_accuracy*100:.2f}%")
print(f"📈 Best Validation Accuracy: {max(val_accuracies):.2f}%")
print(f"📉 Best Validation Loss: {BEST_VAL_LOSS:.4f}")
print(f"🕐 Epochs Trained: {len(train_losses)}")
print(f"🔧 Model Parameters: {total_params:,}")
print(f"💾 Model saved to: {model_path}")
print(f"📋 Results saved to: {results_path}")
print("="*60)

## 19. Model Inference Function

In [None]:
def predict_emotion(model, image_path, transform, device):
    """
    Predict emotion from a single image
    
    Args:
        model: Trained emotion recognition model
        image_path: Path to the image file
        transform: Image preprocessing transforms
        device: Device to run inference on
    
    Returns:
        predicted_emotion: Predicted emotion name
        confidence: Prediction confidence
        probabilities: All class probabilities
    """
    model.eval()
    
    # Load and preprocess image
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise ValueError(f"Could not load image from {image_path}")
    
    # Resize to 48x48
    image = cv2.resize(image, (48, 48))
    
    # Convert to PIL Image and apply transforms
    image = Image.fromarray(image)
    image_tensor = transform(image).unsqueeze(0).to(device)
    
    # Make prediction
    with torch.no_grad():
        output = model(image_tensor)
        probabilities = F.softmax(output, dim=1)
        confidence, predicted = torch.max(probabilities, 1)
    
    predicted_emotion = emotion_names[predicted.item()]
    confidence_score = confidence.item()
    all_probs = probabilities.cpu().numpy()[0]
    
    return predicted_emotion, confidence_score, all_probs

def display_prediction_results(emotion, confidence, probabilities):
    """
    Display prediction results in a formatted way
    """
    print(f"\n🎯 Predicted Emotion: {emotion} {emotion_emojis[emotion_names.index(emotion)]}")
    print(f"📊 Confidence: {confidence:.4f} ({confidence*100:.2f}%)")
    print("\n📈 All Probabilities:")
    print("-" * 40)
    
    for emotion_name, emoji, prob in zip(emotion_names, emotion_emojis, probabilities):
        print(f"{emotion_name} {emoji}: {prob:.4f} ({prob*100:.2f}%)")

print("✅ Inference functions defined")
print("\nTo use the model for inference on a new image:")
print("  emotion, confidence, probs = predict_emotion(model, 'path/to/image.jpg', val_test_transform, device)")
print("  display_prediction_results(emotion, confidence, probs)")
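
`display_prediction_results` lists all seven probabilities in class order. When only the most likely emotions matter, the probabilities can be ranked first. The sketch below is a minimal illustration in plain Python; `top_k_emotions` is a hypothetical helper, and the probabilities are made up to stand in for the third value returned by `predict_emotion`.

```python
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def top_k_emotions(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up probabilities standing in for the output of predict_emotion.
example_probs = [0.05, 0.01, 0.09, 0.60, 0.10, 0.03, 0.12]

for label, prob in top_k_emotions(example_probs, EMOTIONS):
    print(f"{label}: {prob * 100:.1f}%")
```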

## 20. Instructions for Using the Trained Model

### 🚀 How to Use Your Trained FER Model

#### **Loading the Model:**
```python
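# Create a model instance and load the saved weights
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
checkpoint = torch.load('fer2013_final_model.pth', map_location=device)
model = EmotionCNN(num_classes=7).to(device)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()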
```

#### **Making Predictions:**
```python
# For a single image
emotion, confidence, probs = predict_emotion(
    model, 
    'path/to/your/image.jpg', 
    val_test_transform, 
    device
)
display_prediction_results(emotion, confidence, probs)
```

#### **Integration Tips:**
- The model expects 48x48 grayscale images
- Use the same preprocessing as validation and testing (`val_test_transform`); do not apply any training-time augmentation at inference
- The model outputs 7 emotion classes: Angry, Disgust, Fear, Happy, Sad, Surprise, Neutral
- For real-time applications, consider using GPU acceleration

#### **Performance Expectations:**
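- Test accuracy: printed in the Section 18 training summary and saved as `test_accuracy` in `training_results.json`
- Best validation accuracy: saved as `best_validation_accuracy` in `training_results.json`
- Model size: saved as `model_parameters` in `training_results.json`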
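
These figures are written to `training_results.json` in Section 18, so they can be read back after the notebook session ends. The sketch below assumes the key names used in that cell; `summarize_results` is a hypothetical helper, and the sample values are made up so the example runs on its own. Because the best checkpoint is chosen by validation loss, the sketch reports the validation accuracy at that epoch, which may differ from the maximum validation accuracy.

```python
import json
import os
import tempfile


def summarize_results(results):
    """Pick out headline metrics from a results dict shaped like Section 18's."""
    val_losses = results["val_losses"]
    best_epoch = val_losses.index(min(val_losses)) + 1  # epochs are 1-based
    return {
        "test_accuracy_pct": results["test_accuracy"] * 100,
        "best_epoch": best_epoch,
        "val_accuracy_at_best_epoch": results["val_accuracies"][best_epoch - 1],
        "model_parameters": results["model_parameters"],
    }


# Made-up stand-in for training_results.json so the sketch is self-contained.
sample = {
    "test_accuracy": 0.5,
    "val_losses": [1.4, 1.2, 1.3],
    "val_accuracies": [45.0, 52.0, 51.0],
    "model_parameters": 1000,
}
path = os.path.join(tempfile.mkdtemp(), "training_results.json")
with open(path, "w") as f:
    json.dump(sample, f)

with open(path) as f:
    summary = summarize_results(json.load(f))
print(summary)
```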

#### **Next Steps:**
1. **Fine-tuning**: Retrain on your specific domain data
2. **Deployment**: Convert to ONNX for production use
3. **Integration**: Combine with face detection for end-to-end emotion recognition
4. **Evaluation**: Test on your specific use case data

---

### 📚 Additional Resources:
- [PyTorch Documentation](https://pytorch.org/docs/)
- [FER 2013 Dataset](https://www.kaggle.com/datasets/msambare/fer2013)
- [OpenCV Face Detection](https://docs.opencv.org/4.x/db/d28/tutorial_cascade_classifier.html)

### 🎉 Congratulations!
You have successfully trained a Facial Expression Recognition model! The model is ready for inference and can be integrated into your applications.