# Handwritten Names Recognition - Google Colab Training

This notebook trains a Transformer model for handwritten name recognition using Google Colab's GPU.

**Before running:**
1. Get your Kaggle API key: Go to kaggle.com → Account → Create New API Token
2. Set runtime to GPU: `Runtime > Change runtime type > T4 GPU`
3. Run cells in order

**The dataset is downloaded directly from Kaggle. The only file you upload is your `kaggle.json` API token.**

**Model:** Vision Transformer (ViT) with patch embeddings (~5.8M parameters). To switch to CRNN, set `USE_TRANSFORMER = False` in the config cell below and uncomment the CRNN settings in the same cell.

## Setup

In [None]:
# Mount Google Drive (only for saving checkpoints)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Clone repository
!git clone https://github.com/sdswitz/handwritten-names.git
%cd handwritten-names

In [None]:
# Install dependencies
!pip install -q torch torchvision pillow pandas numpy tqdm python-Levenshtein kaggle

## Download Dataset from Kaggle

Upload your `kaggle.json` file when prompted below.

**To get kaggle.json:**
1. Go to https://www.kaggle.com/
2. Click your profile picture → Account
3. Scroll to API section
4. Click "Create New API Token"
5. Upload the downloaded file below

In [None]:
# Upload kaggle.json
from google.colab import files
uploaded = files.upload()

# Setup Kaggle credentials
!mkdir -p ~/.kaggle
!mv kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

print("✓ Kaggle credentials configured")
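
A malformed or wrongly chosen upload is a common reason the download step fails later. As a minimal sketch, the check below parses the uploaded bytes and confirms the two fields a Kaggle API token file contains, `username` and `key`. The helper name `validate_kaggle_token` is hypothetical and not part of the repository.

```python
import json


def validate_kaggle_token(raw: bytes) -> dict:
    """Parse kaggle.json bytes and confirm the expected fields are present."""
    try:
        token = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError("kaggle.json is not valid JSON") from exc
    if not isinstance(token, dict):
        raise ValueError("kaggle.json should contain a JSON object")
    # A Kaggle token file holds a username and an API key.
    missing = [field for field in ("username", "key") if not token.get(field)]
    if missing:
        raise ValueError(f"kaggle.json is missing: {', '.join(missing)}")
    return token
```

In the cell above, `uploaded` maps file names to file contents as bytes, so a call such as `validate_kaggle_token(uploaded['kaggle.json'])` could run before the file is moved into `~/.kaggle/`.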

In [None]:
# Download dataset from Kaggle
print("Downloading dataset from Kaggle...")
!kaggle datasets download -d landlord/handwriting-recognition

print("\nExtracting dataset...")
# -o overwrites existing files so re-running this cell does not stall on a prompt
!unzip -q -o handwriting-recognition.zip -d /content/data

print("\n✓ Dataset ready!")
!ls -lh /content/data/
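
Before moving on, it can help to confirm that the archive unpacked into the layout the rest of the notebook expects. The sketch below checks for the CSV files and image folders named in the config cell later in this notebook. The helper `missing_dataset_entries` is a hypothetical name, not something the repository provides.

```python
from pathlib import Path

# File and folder names taken from the Config class used in this notebook.
EXPECTED_FILES = [
    "written_name_train_v2.csv",
    "written_name_validation_v2.csv",
    "written_name_test_v2.csv",
]
EXPECTED_DIRS = [
    "train_v2/train",
    "validation_v2/validation",
    "test_v2/test",
]


def missing_dataset_entries(data_dir):
    """Return the expected files and folders that are absent under data_dir."""
    root = Path(data_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

Calling `missing_dataset_entries('/content/data')` should return an empty list when extraction succeeded; anything it lists points to a path that needs fixing before training.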

## Configure for Colab

In [None]:
# Overwrite config.py with Colab-specific settings
config_updates = """import torch

class Config:
    # Paths - data downloaded from Kaggle to /content/data/
    DATA_DIR = '/content/data/'
    TRAIN_CSV = 'written_name_train_v2.csv'
    VAL_CSV = 'written_name_validation_v2.csv'
    TEST_CSV = 'written_name_test_v2.csv'
    TRAIN_IMG_DIR = 'train_v2/train'
    VAL_IMG_DIR = 'validation_v2/validation'
    TEST_IMG_DIR = 'test_v2/test'

    # Model checkpoints - saved to Google Drive
    CHECKPOINT_DIR = '/content/drive/MyDrive/handwritten-names/checkpoints'

    # Image settings
    IMG_HEIGHT = 128
    IMG_WIDTH = 512
    NUM_CHANNELS = 1

    # Character vocabulary
    CHARS = ' ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
    BLANK_LABEL = len(CHARS)
    NUM_CLASSES = len(CHARS) + 1

    # Model selection
    USE_TRANSFORMER = True  # Toggle between CRNN and Transformer

    # CRNN Model architecture (not in use - fallback option)
    # Uncomment these if USE_TRANSFORMER = False
    # CNN_OUTPUT_CHANNELS = 512
    # RNN_HIDDEN_SIZE = 256
    # RNN_NUM_LAYERS = 2
    # RNN_DROPOUT = 0.2

    # Transformer architecture settings (active by default)
    PATCH_SIZE = 64  # Size of each patch (64x64)
    EMBED_DIM = 256  # Embedding dimension
    TRANSFORMER_LAYERS = 6  # Number of encoder layers
    TRANSFORMER_HEADS = 8   # Number of attention heads
    TRANSFORMER_DIM_FF = 1024  # Feed-forward dimension
    TRANSFORMER_DROPOUT = 0.1

    # Training hyperparameters
    BATCH_SIZE = 32
    NUM_EPOCHS = 50
    LEARNING_RATE = 0.001
    WEIGHT_DECAY = 1e-5

    # Device - will use GPU on Colab
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # DataLoader
    NUM_WORKERS = 2
    PIN_MEMORY = True

    # Early stopping
    PATIENCE = 5

    # Logging
    PRINT_FREQ = 100
    SAVE_FREQ = 5  # Save every 5 epochs
"""

with open('config.py', 'w') as f:
    f.write(config_updates)

print("✓ Config updated for Colab")
print(f"✓ Using {'Transformer' if 'USE_TRANSFORMER = True' in config_updates else 'CRNN'} architecture")
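
The vocabulary settings reserve one extra class index, `BLANK_LABEL`, beyond the 37 characters in `CHARS`. That layout suggests CTC-style training, although this notebook does not confirm how `train.py` uses it. Under that assumption, the sketch below shows how a name maps to class indices and how a per-timestep prediction collapses back into text. The functions `encode` and `greedy_ctc_decode` are illustrative names, not the repository's API.

```python
CHARS = ' ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
BLANK_LABEL = len(CHARS)      # index reserved for the blank symbol
NUM_CLASSES = len(CHARS) + 1  # characters plus the blank


def encode(text):
    """Map a name to class indices.

    Assumption: characters outside CHARS are skipped rather than rejected.
    """
    return [CHARS.index(c) for c in text.upper() if c in CHARS]


def greedy_ctc_decode(indices):
    """Collapse repeated indices, then drop blanks (greedy CTC decoding)."""
    chars = []
    previous = None
    for idx in indices:
        if idx != previous and idx != BLANK_LABEL:
            chars.append(CHARS[idx])
        previous = idx
    return ''.join(chars)
```

The blank is what lets doubled letters survive decoding: the sequence `N, blank, N` keeps both letters, while `N, N` collapses to one.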

In [None]:
# Verify GPU is available
import torch
print(f"GPU Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")
else:
    print("⚠️ WARNING: No GPU detected! Training will be very slow.")
    print("Go to Runtime > Change runtime type > Select GPU")

## Training

This will train the model and save checkpoints to your Google Drive.
- **Model:** Vision Transformer with patch embeddings (~5.8M parameters)
- **Dataset:** Downloaded from Kaggle (in Colab's temp storage)
- **Checkpoints:** Saved to `MyDrive/handwritten-names/checkpoints/best_model.pth`
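- **Training time:** roughly 1-3 hours depending on the GPU assigned (the Transformer processes all patches in parallel, so it typically trains faster than the recurrent CRNN)
- Keep the browser tab open: Colab can disconnect idle or closed sessions, which stops training. Checkpoints already written to Google Drive are kept.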

**Note:** To switch to the CRNN architecture (~15M parameters), set `USE_TRANSFORMER = False` in the config cell above, uncomment the CRNN settings, and re-run that cell before training.
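
The ~5.8M parameter figure can be sanity-checked from the config values. The sketch below assumes a linear patch projection and standard Transformer encoder layers (four attention projections, a two-layer feed-forward block, two layer norms). The repository's actual model may differ, and positional embeddings and the output head are left out, so treat the result as a rough estimate.

```python
IMG_HEIGHT, IMG_WIDTH, NUM_CHANNELS = 128, 512, 1
PATCH_SIZE, EMBED_DIM = 64, 256
TRANSFORMER_LAYERS, TRANSFORMER_DIM_FF = 6, 1024


def patch_grid(img_height, img_width, patch_size):
    """Return (rows, cols) of non-overlapping square patches."""
    if img_height % patch_size or img_width % patch_size:
        raise ValueError("image size must be a multiple of the patch size")
    return img_height // patch_size, img_width // patch_size


def patch_embedding_params(patch_size, channels, embed_dim):
    """Weights and biases of a linear projection from a flat patch to embed_dim."""
    return patch_size * patch_size * channels * embed_dim + embed_dim


def encoder_params(embed_dim, dim_ff, num_layers):
    """Parameters in num_layers standard encoder layers (assumed layout)."""
    attention = 4 * (embed_dim * embed_dim + embed_dim)  # Q, K, V, output
    feed_forward = (embed_dim * dim_ff + dim_ff) + (dim_ff * embed_dim + embed_dim)
    layer_norms = 2 * 2 * embed_dim  # scale and shift for two norms
    return num_layers * (attention + feed_forward + layer_norms)


rows, cols = patch_grid(IMG_HEIGHT, IMG_WIDTH, PATCH_SIZE)
total = (patch_embedding_params(PATCH_SIZE, NUM_CHANNELS, EMBED_DIM)
         + encoder_params(EMBED_DIM, TRANSFORMER_DIM_FF, TRANSFORMER_LAYERS))
print(f"{rows * cols} patches per image, about {total / 1e6:.2f}M parameters")
```

With a 128x512 image and 64x64 patches, the sequence is only 16 tokens long, so most of the parameters sit in the encoder layers rather than in the patch projection.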

In [None]:
# Start training
!python train.py
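
The config sets `PATIENCE = 5` for early stopping, but this notebook does not show how `train.py` applies it. A common pattern, sketched here as an assumption rather than as the repository's implementation, is to stop once a monitored validation metric has failed to improve for `PATIENCE` consecutive epochs. The class name `EarlyStopping` is hypothetical.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without a lower validation metric."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float('inf')
        self.bad_epochs = 0

    def step(self, val_metric):
        """Record one epoch's metric and return True when training should stop."""
        if val_metric < self.best:
            self.best = val_metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

A training loop would call `step(val_loss)` once per epoch and break out of the loop when it returns `True`.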

## Evaluation (Optional)

After training completes, evaluate the model on the validation set.

In [None]:
# Evaluate the trained model
!python evaluate.py
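
The metrics tracked in this notebook are character error rate (CER) and word error rate (WER). How `evaluate.py` computes them is not shown here, so the sketch below uses the usual definition: total edit distance divided by total reference length, at character or word level. It uses a pure-Python edit distance instead of `python-Levenshtein` to stay self-contained, and all function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(references, predictions):
    """Character edits needed, divided by total reference characters."""
    errors = sum(edit_distance(r, p) for r, p in zip(references, predictions))
    total = sum(len(r) for r in references)
    return errors / total if total else 0.0


def word_error_rate(references, predictions):
    """Word edits needed, divided by total reference words."""
    errors = sum(edit_distance(r.split(), p.split())
                 for r, p in zip(references, predictions))
    total = sum(len(r.split()) for r in references)
    return errors / total if total else 0.0
```

For short names the two metrics diverge sharply: one wrong letter in `JOHN SMITH` changes 1 of 10 characters but 1 of 2 words.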

## Test Single Image (Optional)

Test the model on a single image.

In [None]:
# Test on a single image
!python inference.py --image /content/data/test_v2/test/TEST_00001.jpg
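
If the hard-coded file name does not match what was extracted, the inference command fails with a missing-file error. As a small sketch, the helper below picks the first image found in a folder instead of relying on a fixed name. `first_image` is a hypothetical helper, and the `*.jpg` pattern assumes the dataset images use that extension.

```python
from pathlib import Path


def first_image(img_dir, pattern="*.jpg"):
    """Return the alphabetically first image in img_dir matching pattern."""
    images = sorted(Path(img_dir).glob(pattern))
    if not images:
        raise FileNotFoundError(f"no files matching {pattern} in {img_dir}")
    return images[0]
```

In Colab, the result can be stored in a variable such as `path = first_image('/content/data/test_v2/test')` and passed to the script with `!python inference.py --image {path}`.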

## Download Model Weights (Optional)

Your model is already saved in Google Drive, but you can also download it directly.

In [None]:
# Download best model to your local machine
from google.colab import files
files.download('/content/drive/MyDrive/handwritten-names/checkpoints/best_model.pth')

## View Training History (Optional)

In [None]:
# Plot training curves
import pandas as pd
import matplotlib.pyplot as plt

history = pd.read_csv('/content/drive/MyDrive/handwritten-names/checkpoints/training_history.csv')

# (column suffix, legend/axis label, panel title) for each of the four panels
panels = [
    ('loss', 'Loss', 'Loss'),
    ('cer', 'CER', 'Character Error Rate'),
    ('wer', 'WER', 'Word Error Rate'),
    ('acc', 'Accuracy', 'Accuracy'),
]

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# axes.flat walks the 2x2 grid row by row, matching the order of `panels`
for ax, (suffix, label, title) in zip(axes.flat, panels):
    ax.plot(history[f'train_{suffix}'], label=f'Train {label}')
    ax.plot(history[f'val_{suffix}'], label=f'Val {label}')
    ax.set_xlabel('Epoch')
    ax.set_ylabel(label)
    ax.set_title(title)
    ax.legend()
    ax.grid(True)

plt.tight_layout()
plt.show()

print(f"Best validation CER: {history['val_cer'].min():.4f}")
print(f"Best validation accuracy: {history['val_acc'].max():.4f}")
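
The two summary lines above report the best CER and the best accuracy independently, and those can come from different epochs. The sketch below finds the single row with the lowest value of one metric and returns that whole row, so the other metrics can be read for the same epoch. It uses the standard `csv` module to stay dependency-free. `best_epoch` is a hypothetical helper, and it assumes one row per epoch with the column names used in the plot.

```python
import csv


def best_epoch(history_csv, metric="val_cer"):
    """Return (row_index, row) for the row with the lowest value of metric."""
    with open(history_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("training history is empty")
    index = min(range(len(rows)), key=lambda i: float(rows[i][metric]))
    return index, rows[index]
```

The returned row keeps its values as strings, so convert with `float(row['val_acc'])` before doing arithmetic on them.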