# OCR Model Fine-Tuning on Google Colab

This notebook walks through the process of fine-tuning a TrOCR model on the IAM Handwriting dataset using Google Colab's T4 GPU.

**Project Overview:** We'll fine-tune a vision-encoder-decoder model for optical character recognition (OCR) on handwritten text. The model uses a vision transformer (ViT) as the encoder, which turns the image into a sequence of features, and a language model as the decoder, which generates the text from those features.

**Make sure you have GPU acceleration enabled!**
To check: Runtime > Change runtime type > Hardware accelerator > GPU

In [None]:
# Check for GPU availability
!nvidia-smi

import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    device_name = torch.cuda.get_device_name(0)
    print(f"GPU: {device_name}")
    
    # Print total GPU memory
    total_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)  # Convert to GB
    print(f"Total GPU memory: {total_memory:.2f} GB")
else:
    print("No GPU available. Please enable GPU acceleration in Runtime > Change runtime type.")

## Step 1: Mount Google Drive

Mount your Google Drive to save model checkpoints and results. This ensures your trained model persists after the Colab session ends.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Create directories for saving model outputs
!mkdir -p /content/drive/MyDrive/ocr_output
!mkdir -p /content/drive/MyDrive/ocr_output/checkpoints
!mkdir -p /content/drive/MyDrive/ocr_output/final_model
!mkdir -p /content/drive/MyDrive/ocr_output/evaluation

print("Google Drive mounted. Model checkpoints and results will be saved to /content/drive/MyDrive/ocr_output")

## Step 2: Get the Code

There are two options to get the code:

1. Clone from GitHub repository (preferred)
2. Upload required files directly to Colab

In [None]:
# Option 1: Clone from GitHub
# Replace with your actual repository URL
!git clone https://github.com/yourusername/ocr-finetuning.git
# Change to project directory
%cd ocr-finetuning

**Option 2: Upload files directly**

If you don't have a GitHub repository, you can upload the files directly to Colab. You'll need to upload the following files:
- `config.py`: Configuration settings
- `data_utils.py`: Data handling utilities
- `model.py`: Model definition
- `train.py`: Training script
- `evaluate.py`: Evaluation script
- `demo.py`: Inference script
- `setup_colab.py`: Colab setup script
- `verify_setup.py`: Verification script
- `run_test.py`: Test script

**IMPORTANT**: Only run the code cell below if you're uploading files manually. If you cloned from GitHub, skip this cell.

In [None]:
# Option 2: Upload files directly (only run if you didn't clone from GitHub)
# This will allow you to upload files
from google.colab import files

print("Please upload the project files. You can select multiple files at once.")
uploaded = files.upload()

print("Files uploaded:")
for filename in uploaded:
    print(f"- {filename}")
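
After uploading, it can help to confirm that none of the files listed above is missing before moving on. The sketch below assumes the files were uploaded into the current working directory (where `files.upload()` saves them); `find_missing_files` is a hypothetical helper and not part of the project.

```python
from pathlib import Path

# File names taken from the upload list above.
REQUIRED_FILES = [
    "config.py",
    "data_utils.py",
    "model.py",
    "train.py",
    "evaluate.py",
    "demo.py",
    "setup_colab.py",
    "verify_setup.py",
    "run_test.py",
]


def find_missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required file names that are not present in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


missing = find_missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required project files are present.")
```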

## Step 3: Install Dependencies

Install the required packages for the project. We'll use the `setup_colab.py` script to handle this.

In [None]:
!python setup_colab.py

## Step 4: Verify Setup

Make sure everything is properly set up before proceeding with training.

In [None]:
!python verify_setup.py

## Step 5: Run a Quick Test

Before starting the full training, run a quick test with a small subset of data to make sure everything works.

In [None]:
!python run_test.py --num_samples 5 --batch_size 2 --num_epochs 1

## Step 6: Start Training

Now that we've verified everything is working, we can start the full training process.

**Note**: You may want to edit `config.py` to adjust training parameters for your needs before starting full training.

In [None]:
# View current config
!cat config.py

In [None]:
# Start training
# Set use_wandb to True if you want to track the training with Weights & Biases
!python train.py --use_wandb False
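
How `--use_wandb False` is interpreted depends on how `train.py` parses the flag, which is not shown in this notebook. One common pitfall is worth checking: if a script declares the argument with `argparse` and `type=bool`, the string `"False"` is converted to `True`, because any non-empty string is truthy. The sketch below demonstrates the pitfall and one possible fix; `str2bool` is a hypothetical helper, not something the project is known to contain.

```python
import argparse


def str2bool(value):
    """Convert common true/false strings to a bool (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in {"true", "t", "yes", "y", "1"}:
        return True
    if text in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"Expected a boolean value, got {value!r}")


# Pitfall: bool("False") is True because any non-empty string is truthy.
naive = argparse.ArgumentParser()
naive.add_argument("--use_wandb", type=bool, default=False)
print(naive.parse_args(["--use_wandb", "False"]).use_wandb)  # True

# Explicit string-to-bool conversion gives the intended result.
safe = argparse.ArgumentParser()
safe.add_argument("--use_wandb", type=str2bool, default=False)
print(safe.parse_args(["--use_wandb", "False"]).use_wandb)  # False
```

If W&B logging starts even though you passed `False`, this parsing behavior is a likely place to look.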

## Step 7: Evaluate the Model

After training, evaluate the model to assess its performance.

In [None]:
!python evaluate.py --model_path ./model_output --visualize --save_results
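
OCR quality is commonly reported as character error rate (CER) and word error rate (WER): the edit distance between the prediction and the reference, divided by the reference length. Which metrics `evaluate.py` actually reports is not shown here, so the following is only a minimal pure-Python sketch of how the two are typically computed; the function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_item != hyp_item)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        # Convention chosen for this sketch: an empty reference scores 0.0
        # only when the hypothesis is empty too.
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


print(cer("hello", "hallo"))  # 0.2 (one substitution over five characters)
print(wer("the quick brown fox", "the quick brown dog"))  # 0.25
```

Lower is better for both metrics, and because insertions count as errors, either value can exceed 1.0 when the prediction is much longer than the reference.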

## Step 8: Run the Demo

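Test the model on a sample image to see how it performs at inference time. The cell below downloads one handwriting sample and displays it; the following cell runs the demo script on it.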

In [None]:
# Download a sample image from IAM dataset
!mkdir -p sample_images
!wget https://fki.tic.heia-fr.ch/static/img/a01-122-02.jpg -O sample_images/sample1.jpg

# Display the image
from PIL import Image
import matplotlib.pyplot as plt

img = Image.open('sample_images/sample1.jpg')
plt.figure(figsize=(10, 4))
plt.imshow(img)
plt.axis('off')
plt.title('Sample Image')
plt.show()

In [None]:
# Run demo on the sample image
!python demo.py --image_path sample_images/sample1.jpg --use_beam_search

## Colab Tips

### Preventing Disconnection
Google Colab may disconnect a session after a period of inactivity. To reduce the chance of this, run the keep-alive cell at the end of this section, which clicks the Connect button at a fixed interval.

**Note**: Only use this when running long training sessions and when you're actively monitoring the notebook, as it consumes resources.

### Managing RAM
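If you run into out-of-memory errors, run the last cell in this section. It triggers Python garbage collection and releases PyTorch's cached GPU memory. Memory that is still referenced by live variables (for example a loaded model) is not freed, so delete those objects first or restart the runtime if the error persists.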

### Session Duration
Remember that Colab sessions have a limited duration (usually 12 hours). Save your model checkpoints to Google Drive regularly.
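
One way to save checkpoints regularly is to copy the local output directory to the Drive folder created in Step 1, either between training runs or from a periodic call. The sketch below is an illustration under assumptions: `sync_checkpoints` is a hypothetical helper, and `./model_output` is only known to be the path passed to `evaluate.py` in Step 7, so adjust it to wherever `train.py` actually writes.

```python
import shutil
from pathlib import Path


def sync_checkpoints(local_dir, drive_dir):
    """Copy new or updated files from local_dir to drive_dir.

    Returns the relative paths (POSIX style) of the files that were copied.
    """
    src = Path(local_dir)
    dst = Path(drive_dir)
    copied = []
    if not src.is_dir():
        return copied  # Nothing to sync yet
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        # Skip files whose copy on Drive is at least as recent as the source.
        if target.exists() and target.stat().st_mtime >= path.stat().st_mtime:
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(path.relative_to(src).as_posix())
    return copied


# Assumed source path; the destination is the checkpoints folder from Step 1.
copied = sync_checkpoints(
    "./model_output", "/content/drive/MyDrive/ocr_output/checkpoints"
)
print(f"Copied {len(copied)} file(s) to Drive")
```

Because files that are already up to date are skipped, calling the function repeatedly only copies what changed since the last call.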

In [None]:
# Prevent Colab from disconnecting due to inactivity
# Only run this cell when necessary (during long training runs)

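from IPython.display import display, Javascript


def keep_alive():
    """Inject JavaScript that clicks Colab's connect button every 60 seconds."""
    display(Javascript('''
    function clickConnect() {
        // The selector depends on Colab's page structure and may change.
        const button = document.querySelector("colab-toolbar-button#connect");
        if (button) {
            button.click();
        }
    }
    setInterval(clickConnect, 60000);
    '''))


keep_alive()  # Clicks the "Connect" button every 60 seconds while this page stays open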

In [None]:
# Clear memory if needed
import gc
import torch

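gc.collect()              # Free Python objects that are no longer referenced
torch.cuda.empty_cache()  # Release cached GPU memory that PyTorch is not using
print("Garbage collected and unused CUDA cache released.")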