# MobileTeXOCR: Train HME Recognition on Google Colab

This notebook trains the MobileTeXOCR Handwritten Mathematical Expression (HME) recognition model on the CROHME dataset, then exports the trained model for download.

**Before running:**
1. Go to Runtime → Change runtime type → Select **T4 GPU** (or better)
2. Run all cells in order

**Model Variants:**

| Variant | Training Time | Model Size | Expected Accuracy |
|---------|--------------|------------|-------------------|
| Ultra-light | ~2-3 hours | <10MB | ~55% ExpRate |
| Balanced | ~6-8 hours | 15-30MB | ~62% ExpRate |
| Accuracy | ~12-16 hours | 50-80MB | ~65% ExpRate |
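
The accuracy column reports ExpRate (expression recognition rate), commonly defined as the fraction of test expressions whose predicted LaTeX matches the ground truth exactly. The sketch below illustrates that definition. The function name `exp_rate` and the whitespace tokenization are assumptions made for illustration, not the repository's evaluation code.

```python
def exp_rate(predictions, references):
    """Fraction of expressions whose predicted tokens exactly match the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Assumption: LaTeX strings are compared as whitespace-separated token sequences.
    matches = sum(p.split() == r.split() for p, r in zip(predictions, references))
    return matches / len(references)


preds = ["x ^ 2 + 1", "\\frac { a } { b }", "y = 2"]
refs = ["x ^ 2 + 1", "\\frac { a } { c }", "y  =  2"]
print(f"ExpRate: {exp_rate(preds, refs):.2%}")
```

A single wrong token counts the whole expression as incorrect, which is why ExpRate values are much lower than token-level accuracy.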


In [None]:
# Check GPU availability
!nvidia-smi


## 1. Install PaddlePaddle


In [None]:
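# Install the GPU build of PaddlePaddle.
# The -i flag points pip at the Tsinghua PyPI mirror; drop it to use the default index.
%pip install paddlepaddle-gpu==2.6.1 -i https://pypi.tuna.tsinghua.edu.cn/simple

# Verify installation
import paddle
print(f"PaddlePaddle version: {paddle.__version__}")
# is_compiled_with_cuda() reports CUDA support in the build; the device count shows usable GPUs
print(f"Compiled with CUDA: {paddle.device.is_compiled_with_cuda()}")
print(f"GPU count: {paddle.device.cuda.device_count()}")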
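
Because training takes hours, it can help to stop early when the runtime has no GPU attached. The following is a minimal sketch that queries `nvidia-smi` from Python using only the standard library. The helper name `list_gpus` is hypothetical, and the sketch simply returns an empty list when `nvidia-smi` is missing or fails.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, memory' string per visible GPU, or [] if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    print("GPUs found:", gpus)
else:
    print("No GPU detected. Select a GPU runtime before training.")
```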


## 2. Clone Repository & Install Dependencies


In [None]:
# Use ONE of the two options below.

# Option A: Clone from GitHub (uncomment both lines and replace YOUR_USERNAME)
# !git clone https://github.com/YOUR_USERNAME/MobileTeXOCR.git
# %cd MobileTeXOCR

# Option B (default): Upload a zip file from your local machine
from google.colab import files
print("Upload MobileTeXOCR.zip (create it with: zip -r MobileTeXOCR.zip MobileTeXOCR/)")
uploaded = files.upload()

# -o overwrites existing files so re-running this cell does not stop at a prompt
!unzip -q -o MobileTeXOCR.zip
%cd MobileTeXOCR
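
The upload step assumes the archive is named exactly `MobileTeXOCR.zip`. As a sketch of a more tolerant approach, the helper below extracts whichever `.zip` appears in the dictionary returned by `files.upload()` (filenames mapped to file bytes), which also covers an upload saved under a different name. `extract_uploaded_zip` is a hypothetical helper, and the simulated upload exists only so the example runs outside Colab.

```python
import io
import os
import tempfile
import zipfile


def extract_uploaded_zip(uploaded, dest="."):
    """Extract the first .zip in an upload dict and return its top-level entries."""
    zip_names = [name for name in uploaded if name.lower().endswith(".zip")]
    if not zip_names:
        raise ValueError("No .zip file found in the upload")
    with zipfile.ZipFile(io.BytesIO(uploaded[zip_names[0]])) as archive:
        archive.extractall(dest)
        return sorted({entry.split("/")[0] for entry in archive.namelist()})


# Simulated upload so the sketch also runs outside Colab.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("MobileTeXOCR/requirements.txt", "# placeholder\n")
demo_upload = {"MobileTeXOCR (1).zip": buffer.getvalue()}

workdir = tempfile.mkdtemp()
top_level = extract_uploaded_zip(demo_upload, dest=workdir)
print("Extracted:", top_level)
```

In the notebook, `extract_uploaded_zip(uploaded)` could stand in for the `!unzip` line, followed by `%cd` into the returned folder.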


In [None]:
# Install dependencies
%pip install -q -r requirements.txt
%pip install -q visualdl shapely pyclipper lmdb


## 3. Download CROHME Dataset


In [None]:
!python tools/download_hme_datasets.py --dataset crohme --data_dir ./train_data


## 4. Select Model Variant & Train


In [None]:
# Choose your model variant: "ultralight", "balanced", or "accuracy"
MODEL_VARIANT = "balanced"

config_map = {
    "ultralight": "configs/rec/hme_latex_ocr_ultralight.yml",
    "balanced": "configs/rec/hme_latex_ocr_balanced.yml",
    "accuracy": "configs/rec/hme_latex_ocr_accuracy.yml",
}

CONFIG_PATH = config_map[MODEL_VARIANT]
print(f"Training {MODEL_VARIANT} model with config: {CONFIG_PATH}")
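
A mistyped variant name raises a bare `KeyError`, and a missing config file only surfaces once training starts. The sketch below shows one way to validate both up front. `resolve_config` is a hypothetical helper, not part of the repository.

```python
import os


def resolve_config(variant, config_map):
    """Return the config path for a variant, with readable errors on bad input."""
    if variant not in config_map:
        valid = ", ".join(sorted(config_map))
        raise ValueError(f"Unknown variant {variant!r}; choose one of: {valid}")
    path = config_map[variant]
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Config not found: {path} (run from the repository root)")
    return path


demo_map = {"balanced": "configs/rec/hme_latex_ocr_balanced.yml"}
try:
    resolve_config("Balanced", demo_map)  # wrong capitalization
except ValueError as err:
    print(err)
```

In the notebook, `CONFIG_PATH = resolve_config(MODEL_VARIANT, config_map)` would replace the direct dictionary lookup.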


In [None]:
# Start training!
!python tools/train.py -c {CONFIG_PATH}
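
Colab sessions can disconnect during runs of this length, so it is useful to be able to resume rather than restart. The sketch below builds an optional `Global.checkpoints` override when an earlier checkpoint exists. It assumes PaddleOCR-style checkpoint naming (a `latest.pdparams` file inside the output directory) and the same output path used in the export step. Both are assumptions to verify against your config, and `resume_override` is a hypothetical helper.

```python
import os


def resume_override(output_dir, name="latest"):
    """Return a '-o Global.checkpoints=...' override if a checkpoint exists, else ''."""
    prefix = os.path.join(output_dir, name)
    # Assumption: checkpoints are saved as '<prefix>.pdparams' (PaddleOCR-style naming).
    if os.path.isfile(prefix + ".pdparams"):
        return f"-o Global.checkpoints={prefix}"
    return ""


RESUME = resume_override("./output/rec/hme_balanced")
print(f"Extra training arguments: {RESUME!r}")
```

The training cell could then run `!python tools/train.py -c {CONFIG_PATH} {RESUME}`, which behaves like the original command when no checkpoint is found.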


## 5. Export & Download Trained Model


In [None]:
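# Locate the best checkpoint and export it
import os
# Assumes the config saves checkpoints under this directory; adjust the path if yours differs
output_dir = f"./output/rec/hme_{MODEL_VARIANT}/"
# Checkpoint prefix (no file extension), as expected by Global.checkpoints
best_ckpt = os.path.join(output_dir, "best_accuracy")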

# Export to inference model
!python tools/export_model.py -c {CONFIG_PATH} \
    -o Global.checkpoints={best_ckpt} \
    Global.save_inference_dir=./inference/hme_{MODEL_VARIANT}/

# Check model size
!du -sh ./inference/hme_{MODEL_VARIANT}/
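
To compare the exported model against the size ranges in the variant table, the size can also be computed in Python. This is a minimal sketch: the upper bounds are copied from the table at the top of the notebook, the helper names are hypothetical, and the result sums file sizes, so it can differ slightly from what `du -sh` reports.

```python
import os

# Approximate upper size bounds in MB, taken from the variant table above.
MAX_SIZE_MB = {"ultralight": 10, "balanced": 30, "accuracy": 80}


def dir_size_mb(path):
    """Total size of all files under path, in MB (0.0 if the path does not exist)."""
    total_bytes = 0
    for root, _dirs, filenames in os.walk(path):
        for filename in filenames:
            total_bytes += os.path.getsize(os.path.join(root, filename))
    return total_bytes / (1024 * 1024)


def within_budget(path, variant):
    """True if the directory is no larger than the table's bound for the variant."""
    return dir_size_mb(path) <= MAX_SIZE_MB[variant]


variant = "balanced"
inference_dir = f"./inference/hme_{variant}/"
print(f"{inference_dir}: {dir_size_mb(inference_dir):.1f} MB, "
      f"within budget: {within_budget(inference_dir, variant)}")
```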


In [None]:
# Zip and download trained model
!zip -r hme_model.zip ./output/rec/hme_{MODEL_VARIANT}/ ./inference/hme_{MODEL_VARIANT}/

from google.colab import files
files.download('hme_model.zip')
