# 🚀 Enhanced Khmer OCR Hyperparameter Tuning

**✨ NEW FEATURES:**
- 💾 **Google Drive Integration** - Checkpoints are written to Drive, so they survive Colab session resets
- 🔄 **Resumable Training** - Continue from the latest saved checkpoint
- 📊 **Persistent Results** - Results JSON files are saved to Drive
- 🛡️ **Crash Recovery** - Re-running the notebook after a disconnection resumes incomplete experiments

## 📋 Quick Start:
1. ✅ Enable GPU: Runtime → Change runtime type → GPU
2. ✅ Run the cells in order (use only one of the two upload options)
3. ✅ Results automatically saved to Drive!


In [None]:
# 🔧 Initial Setup
import torch
import sys
import os
from google.colab import drive

print(f"🐍 Python: {sys.version}")
print(f"🔥 PyTorch: {torch.__version__}")
print(f"⚡ CUDA: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"🎮 GPU: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")

# Install dependencies
!pip install -q opencv-python-headless albumentations tensorboard efficientnet-pytorch Pillow pyyaml tqdm
print("✅ Dependencies installed!")

In [None]:
# 💾 Google Drive Integration
from google.colab import drive
import shutil

print("🔐 Mounting Google Drive...")
drive.mount('/content/drive')

# Setup Drive paths
DRIVE_ROOT = '/content/drive/MyDrive'
PROJECT_DRIVE_PATH = f'{DRIVE_ROOT}/Khmer_OCR_Experiments'
MODELS_DRIVE_PATH = f'{PROJECT_DRIVE_PATH}/training_output'
RESULTS_DRIVE_PATH = f'{PROJECT_DRIVE_PATH}/results'

# Create directories
for path in [PROJECT_DRIVE_PATH, MODELS_DRIVE_PATH, RESULTS_DRIVE_PATH]:
    os.makedirs(path, exist_ok=True)

print(f"✅ Drive mounted: {PROJECT_DRIVE_PATH}")
print(f"🏗️ Models: {MODELS_DRIVE_PATH}")
print(f"📊 Results: {RESULTS_DRIVE_PATH}")

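# Create symlinks in the working directory for quick access to Drive
for link, target in [('drive_training_output', MODELS_DRIVE_PATH),
                     ('drive_results', RESULTS_DRIVE_PATH)]:
    # islink() also catches broken symlinks, which os.path.exists() reports as missing
    if os.path.islink(link):
        os.unlink(link)
    elif os.path.lexists(link):
        print(f"⚠️ '{link}' exists and is not a symlink; leaving it untouched")
        continue
    os.symlink(target, link)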
print("🔗 Symlinks created")

## 📁 Upload Project Files

**Option 1: Upload ZIP file** (Recommended)
- Compress the entire project folder (it must contain `src/`, `config/`, and `generated_data/`)
- Upload and extract it using the cell below

**Option 2: Clone from GitHub**
- Uncomment the git clone cell below and set your repository URL
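Zipping a folder usually wraps everything in one top-level directory whose name depends on how the archive was made, so the fixed candidates in the path-setup cell (`.`, `./khmer-ocr-digits`, `../`) can miss it. A minimal sketch, assuming the project root is whichever directory contains `src/`; `extract_project` is a hypothetical helper, not part of the project:

```python
import zipfile
from pathlib import Path


def extract_project(zip_path, dest="."):
    """Extract a project ZIP into `dest` and return the directory containing `src/`.

    Checks `dest` itself first, then its immediate subdirectories, because
    zipping a folder usually nests everything under one top-level directory.
    Returns None if no `src/` directory is found.
    """
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)

    candidates = [dest] + sorted(p for p in dest.iterdir() if p.is_dir())
    for candidate in candidates:
        if (candidate / "src").is_dir():
            return candidate
    return None
```

In the upload cell, `root = extract_project(filename)` would replace the manual extraction; the result can then be passed to `os.chdir(root)` and `sys.path.append(str(root / "src"))`.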

In [None]:
# Option 1: Upload ZIP file
from google.colab import files
import zipfile

print("📁 Upload your project ZIP file:")
uploaded = files.upload()

# Extract the uploaded ZIP
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        print(f"📦 Extracting {filename}...")
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('.')
        print(f"✅ Extracted {filename}")
    else:
        print(f"⚠️ Skipping {filename}: not a .zip file")

# List contents to verify
print("\n📂 Current directory contents:")
!ls -la

In [None]:
# Option 2: Clone from GitHub (uncomment and modify URL)
# !git clone https://github.com/yourusername/khmer-ocr-digits.git
# %cd khmer-ocr-digits
# print("✅ Project cloned from GitHub")

In [None]:
# Setup project paths
import sys
from pathlib import Path

# Find project root (adjust path if needed)
project_root = None
for root in ['.', './khmer-ocr-digits', '../']:
    if os.path.exists(os.path.join(root, 'src')):
        project_root = Path(root).resolve()
        break

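if project_root:
    os.chdir(project_root)
    src_path = str(project_root / 'src')
    if src_path not in sys.path:  # avoid duplicate entries when the cell is re-run
        sys.path.append(src_path)
    print(f"✅ Project root: {project_root}")
else:
    print("❌ Could not find a directory containing 'src/'. Please check your upload.")
    !ls -la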

## ✅ Verify Setup
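The import check below wraps all four imports in one `try`, so it stops at the first failure and a second missing module only shows up after the first is fixed. A sketch that reports every failing module in one pass, using the standard library's `importlib.import_module`; `check_imports` is a hypothetical helper:

```python
import importlib


def check_imports(module_names):
    """Try to import each module and return {name: error} for every failure.

    Catches any exception (not only ImportError), since a module that exists
    but crashes at import time is just as broken for this notebook.
    """
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

With the project's own modules this would be called as `check_imports(["modules.data_utils", "models", "modules.trainers", "modules.trainers.utils"])`; an empty dict means everything imported.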

In [None]:
# Test imports to verify everything is working
try:
    from modules.data_utils import KhmerDigitsDataset
    from models import create_model
    from modules.trainers import OCRTrainer
    from modules.trainers.utils import setup_training_environment, TrainingConfig
    print("✅ All imports successful!")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("Please check your project structure and file uploads.")

In [None]:
# Check if data files exist
required_files = [
    'generated_data/metadata.yaml',
    'config/phase3_training_configs.yaml',
    'config/model_config.yaml'
]

print("📋 Checking required files:")
all_files_exist = True
for file_path in required_files:
    if os.path.exists(file_path):
        print(f"✅ {file_path}")
    else:
        print(f"❌ {file_path} - NOT FOUND")
        all_files_exist = False

if all_files_exist:
    print("\n🎉 All required files found! Ready to start training.")
else:
    print("\n⚠️ Some files are missing. Please check your upload.")

## 🔍 Check for Resumable Experiments
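Checkpoint names embed the epoch without zero-padding, so sorting the paths as strings ranks `checkpoint_epoch_9.pth` above `checkpoint_epoch_10.pth`. Picking the latest checkpoint therefore needs the epoch parsed as an integer. A minimal sketch, assuming the `checkpoint_epoch_<N>.pth` naming used in this notebook; `latest_checkpoint` is a hypothetical helper:

```python
import re
from pathlib import Path

_EPOCH_RE = re.compile(r"checkpoint_epoch_(\d+)\.pth$")


def latest_checkpoint(paths):
    """Return (path, epoch) for the highest-numbered checkpoint, or None.

    Files that do not match the checkpoint_epoch_<N>.pth pattern
    (e.g. best_model.pth) are ignored.
    """
    best = None
    for path in paths:
        match = _EPOCH_RE.search(Path(path).name)
        if match is None:
            continue
        epoch = int(match.group(1))
        if best is None or epoch > best[1]:
            best = (str(path), epoch)
    return best
```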

In [None]:
# Check for existing experiments that can be resumed
import glob
import json
import re
import yaml
from IPython.display import display, HTML

def find_resumable_experiments():
    """Find experiments in Drive that have at least one epoch checkpoint."""
    resumable = []
    exp_dirs = glob.glob(f'{MODELS_DRIVE_PATH}/*')
    
    for exp_dir in exp_dirs:
        if not os.path.isdir(exp_dir):
            continue
        exp_name = os.path.basename(exp_dir)
        checkpoint_dir = os.path.join(exp_dir, 'checkpoints')
        
        # Parse epoch numbers: a string sort would rank epoch_9 above epoch_10
        checkpoints = []
        for path in glob.glob(f'{checkpoint_dir}/checkpoint_epoch_*.pth'):
            match = re.search(r'checkpoint_epoch_(\d+)\.pth$', path)
            if match:
                checkpoints.append((int(match.group(1)), path))
        if not checkpoints:
            continue
        epoch_num, latest_checkpoint = max(checkpoints)
        
        # Read total epochs from the experiment config (default: 50)
        total_epochs = 50
        config_file = os.path.join(exp_dir, 'config.yaml')
        if os.path.exists(config_file):
            try:
                with open(config_file, 'r') as f:
                    config = yaml.safe_load(f) or {}
                total_epochs = config.get('num_epochs', 50)
            except (OSError, yaml.YAMLError, AttributeError):
                pass  # unreadable or malformed config: keep the default
        
        is_complete = epoch_num >= total_epochs
        
        resumable.append({
            'experiment_name': exp_name,
            'latest_epoch': epoch_num,
            'total_epochs': total_epochs,
            'is_complete': is_complete,
            'checkpoint_path': latest_checkpoint,
            'experiment_dir': exp_dir,
            'progress': f"{epoch_num}/{total_epochs}"
        })
    
    return resumable

def display_resumable_table(experiments):
    """Display resumable experiments in HTML table."""
    if not experiments:
        print("📂 No previous experiments found in Drive.")
        return
        
    html = """
    <div style="border: 2px solid #2196F3; padding: 15px; margin: 10px 0; border-radius: 10px; background: #f8f9fa;">
        <h3>🔄 Found Resumable Experiments</h3>
        <table style="width: 100%; border-collapse: collapse; border: 1px solid #ddd;">
            <thead>
                <tr style="background: #2196F3; color: white;">
                    <th style="border: 1px solid #ddd; padding: 8px;">Experiment</th>
                    <th style="border: 1px solid #ddd; padding: 8px;">Progress</th>
                    <th style="border: 1px solid #ddd; padding: 8px;">Status</th>
                </tr>
            </thead>
            <tbody>
    """
    
    for exp in experiments:
        status = "✅ Complete" if exp['is_complete'] else "🔄 Resumable"
        bg_color = "#e8f5e8" if exp['is_complete'] else "#fff3cd"
        
        html += f"""
            <tr style="background: {bg_color};">
                <td style="border: 1px solid #ddd; padding: 8px;">{exp['experiment_name']}</td>
                <td style="border: 1px solid #ddd; padding: 8px; text-align: center;">{exp['progress']}</td>
                <td style="border: 1px solid #ddd; padding: 8px; text-align: center;">{status}</td>
            </tr>
        """
    
    html += """
            </tbody>
        </table>
        <p style="margin-top: 10px; color: #666; font-style: italic;">
            💡 Incomplete experiments resume from their latest checkpoint when run with the enhanced tuner (auto_resume=True).
        </p>
    </div>
    """
    
    display(HTML(html))

# Find and display resumable experiments
resumable_experiments = find_resumable_experiments()
display_resumable_table(resumable_experiments)

# Save for later use
with open('resumable_experiments.json', 'w') as f:
    json.dump(resumable_experiments, f, indent=2)

print(f"💾 Found {len(resumable_experiments)} existing experiments")

## 🚀 Enhanced Hyperparameter Tuning

**⚠️ IMPORTANT:** The Drive-backed resume features are implemented in the enhanced trainer script, `src/sample_scripts/enhanced_colab_trainer.py`. Basic usage:

```python
# Load the complete enhanced trainer
exec(open('src/sample_scripts/enhanced_colab_trainer.py').read())

# Use enhanced tuner
tuner = EnhancedColabHyperparameterTuner(
    drive_models_path=MODELS_DRIVE_PATH,
    drive_results_path=RESULTS_DRIVE_PATH
)
tuner.run_experiments()
```
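`exec(open(...).read())` runs the script with `__name__ == "__main__"` (the notebook's own namespace), so any `if __name__ == "__main__":` block in the script would also execute, and the file handle is never explicitly closed. An alternative sketch using the standard library's `runpy.run_path`, which runs the file under a different module name; `load_script` is a hypothetical helper, and whether the trainer script has such a main block is an assumption:

```python
import runpy


def load_script(path, namespace):
    """Execute a Python file and copy its public top-level names into `namespace`.

    runpy.run_path defaults to run_name="<run_path>", so a
    `if __name__ == "__main__":` block in the script is not triggered.
    """
    script_globals = runpy.run_path(path)
    # Skip dunder names such as __name__ and __file__ so the caller's own are kept.
    for name, value in script_globals.items():
        if not name.startswith("__"):
            namespace[name] = value
    return script_globals
```

Usage would be `load_script('src/sample_scripts/enhanced_colab_trainer.py', globals())`. One trade-off: classes defined this way report `__module__ == '<run_path>'`, so pickling their instances (for example saving a whole model object rather than a `state_dict`) can fail; keep `exec` if the script relies on that.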

The cell below loads that script and creates the tuner with `auto_resume=True`. If the script is missing, use the basic fallback section instead.

In [None]:
# Load the complete enhanced trainer script
try:
    print("🔧 Loading enhanced trainer script...")
    with open('src/sample_scripts/enhanced_colab_trainer.py') as f:
        exec(f.read())  # defines EnhancedColabHyperparameterTuner in this namespace
    print("✅ Enhanced trainer loaded successfully!")
    
    # Initialize the enhanced tuner
    tuner = EnhancedColabHyperparameterTuner(
        drive_models_path=MODELS_DRIVE_PATH,
        drive_results_path=RESULTS_DRIVE_PATH,
        auto_resume=True
    )
    
    print("\n🎯 Enhanced tuner ready! Usage options:")
    print("\n1. Run all experiments:")
    print("   tuner.run_experiments()")
    print("\n2. Run specific experiments:")
    print("   tuner.run_experiments(['conservative_small', 'baseline_optimized'])")
    print("\n3. Quick test (single experiment):")
    print("   tuner.run_experiments(['conservative_small'])")
    
except FileNotFoundError:
    print("❌ Enhanced trainer script not found.")
    print("Please ensure 'src/sample_scripts/enhanced_colab_trainer.py' exists in your project.")
    print("\n💡 You can still use the basic functionality below.")
except Exception as e:
    print(f"❌ Error loading enhanced trainer: {e}")
    print("\n💡 You can still use the basic functionality below.")

## 🏃‍♂️ Quick Start Training

**If the enhanced script loaded successfully, start with a single-experiment test:**

In [None]:
# Quick test with one experiment (if enhanced tuner is available)
if 'tuner' in globals():
    print("🧪 Running quick test with conservative_small configuration...")
    print("This will automatically resume if the experiment was previously started.")
    
    # Run just one experiment for testing
    tuner.run_experiments(['conservative_small'])
    
    # Save results
    results_file = tuner.save_results()
    print(f"\n💾 Results saved to: {results_file}")
    
else:
    print("⚠️ Enhanced tuner not available. Please load the enhanced script first.")

In [None]:
# Run all experiments (if enhanced tuner is available)
if 'tuner' in globals():
    print("🚀 Starting full hyperparameter tuning...")
    print("⏰ Expect roughly 45-60 minutes on a Colab GPU (varies with GPU type).")
    print("💡 If interrupted, re-run the notebook to resume from the latest checkpoints.")
    
    # Run all experiments with auto-resume
    tuner.run_experiments()
    
    # Save final results
    results_file = tuner.save_results()
    
    print(f"\n🎉 Hyperparameter tuning completed!")
    if tuner.best_result:
        print(f"🏆 Best result: {tuner.best_result['experiment_name']}")
        print(f"📊 Character accuracy: {tuner.best_result['best_val_char_accuracy']:.1%}")
        print(f"📊 Sequence accuracy: {tuner.best_result['best_val_seq_accuracy']:.1%}")
    
    print(f"\n💾 Results saved to: {results_file}")
    print(f"📁 Models saved to: {MODELS_DRIVE_PATH}")
    
else:
    print("⚠️ Enhanced tuner not available. Please load the enhanced script first.")

## 🔧 Basic Fallback (If Enhanced Script Not Available)

This cell does not train: it only lists the experiments defined in `config/phase3_training_configs.yaml` and points to the alternatives.

In [None]:
# Basic trainer fallback (without enhanced features)
import yaml

# Only run if enhanced tuner is not available
if 'tuner' not in globals():
    print("🔧 Setting up basic trainer...")
    
    # Load config
    with open('config/phase3_training_configs.yaml', 'r') as f:
        config = yaml.safe_load(f)
    
    print(f"📋 Found {len(config['experiments'])} experiments:")
    for exp_name in config['experiments'].keys():
        print(f"  - {exp_name}")
    
    print("\n💡 To run experiments manually:")
    print("1. Load the enhanced script: exec(open('src/sample_scripts/enhanced_colab_trainer.py').read())")
    print("2. Or use the original notebook: colab_hyperparameter_tuning.ipynb")
    
else:
    print("✅ Enhanced tuner is available - no need for basic fallback.")

## 📥 Download Results

In [None]:
# Download results and model checkpoints
from google.colab import files
import zipfile
from datetime import datetime

# Find result files
result_files = glob.glob(f"{RESULTS_DRIVE_PATH}/colab_hyperparameter_results_*.json")
checkpoint_dirs = glob.glob(f"{MODELS_DRIVE_PATH}/*")

print("📁 Available for download:")
print(f"\n📊 Results files: {len(result_files)}")
for f in result_files:
    print(f"  - {os.path.basename(f)}")

print(f"\n🏗️ Experiment directories: {len(checkpoint_dirs)}")
for d in checkpoint_dirs:
    if os.path.isdir(d):
        print(f"  - {os.path.basename(d)}/")

# Download latest results
if result_files:
    latest_results = sorted(result_files)[-1]
    print(f"\n📥 Downloading latest results: {os.path.basename(latest_results)}")
    files.download(latest_results)

# Create and download summary zip
if checkpoint_dirs:
    zip_filename = f"khmer_ocr_training_summary_{datetime.now().strftime('%Y%m%d_%H%M%S')}.zip"
    
    with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
        # Add result files
        for f in result_files:
            zipf.write(f, os.path.basename(f))
        
        # Add best models only to save space
        for exp_dir in checkpoint_dirs:
            if os.path.isdir(exp_dir):
                exp_name = os.path.basename(exp_dir)
                best_model = os.path.join(exp_dir, 'checkpoints', 'best_model.pth')
                if os.path.exists(best_model):
                    zipf.write(best_model, f"{exp_name}_best_model.pth")
    
    print(f"\n📦 Downloading summary package: {zip_filename}")
    files.download(zip_filename)

print("\n✅ Downloads started (check your browser's download prompts).")
print(f"🔗 All files remain available in your Drive: {PROJECT_DRIVE_PATH}")

## 📋 Summary & Next Steps

### 🎉 What You've Accomplished:
- ✅ **Google Drive Integration**: Models and results are stored in Drive
- ✅ **Resumable Training**: Incomplete experiments can continue from their latest checkpoint after a disconnection
- ✅ **GPU Acceleration**: Training runs on the Colab GPU (roughly 10x faster than CPU, depending on the GPU)
- ✅ **Persistent Results**: Results remain in Drive after the Colab session ends

### 📊 Expected Results:
- **Best Configuration**: conservative_small
- **Expected Character Accuracy**: 45-60% (with full epochs)
- **Target Goal**: 85% character accuracy
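Character accuracy and sequence accuracy can differ widely: a single wrong digit leaves most characters correct but makes the whole sequence wrong. A sketch of one common definition, assuming character accuracy is 1 minus the character error rate (Levenshtein edit distance divided by reference length) and sequence accuracy is the exact-match rate; the project's trainer may compute these differently, and both function names are hypothetical:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def ocr_accuracies(predictions, targets):
    """Return (char_accuracy, seq_accuracy) over paired prediction/target strings."""
    total_chars = sum(len(t) for t in targets)
    total_errors = sum(levenshtein(p, t) for p, t in zip(predictions, targets))
    char_acc = max(0.0, 1.0 - total_errors / total_chars) if total_chars else 0.0
    exact = sum(p == t for p, t in zip(predictions, targets))
    seq_acc = exact / len(targets) if targets else 0.0
    return char_acc, seq_acc
```

For example, with targets `["០១២៣", "៤៥៦"]` and predictions `["០១២៣", "៤៥៧"]`, one wrong Khmer digit out of seven gives a character accuracy of about 85.7% but a sequence accuracy of only 50%.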

### 🔄 If Training Was Interrupted:
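Re-run the notebook from the top. If the Colab runtime was reset, local files are gone, so remount Drive and re-upload (or re-clone) the project first. The enhanced tuner (`auto_resume=True`) then detects incomplete experiments in Drive and resumes them from their latest checkpoint.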
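The enhanced tuner's internals are not shown in this notebook, but epoch-level resuming generally follows one pattern: persist progress after every epoch, and on restart skip epochs already recorded. A minimal sketch with a JSON file standing in for a `.pth` checkpoint; `train_with_resume` and `train_one_epoch` are hypothetical names, not the tuner's API:

```python
import json
import os


def train_with_resume(state_path, total_epochs, train_one_epoch):
    """Run epochs 1..total_epochs, skipping any already recorded in state_path.

    After each epoch the completed epoch number and the metrics returned by
    train_one_epoch are saved, so an interruption loses at most the epoch
    in progress.
    """
    state = {"epoch": 0, "history": []}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    for epoch in range(state["epoch"] + 1, total_epochs + 1):
        metrics = train_one_epoch(epoch)
        state["epoch"] = epoch
        state["history"].append(metrics)
        # Write to a temporary file, then rename (atomic on POSIX filesystems),
        # so an interruption mid-write cannot leave a truncated state file.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, state_path)
    return state
```

A real checkpoint would also store model and optimizer state (for example via `torch.save`), but the skip-completed-epochs logic is the same.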

### 🚀 Next Steps:
1. **Analyze Results**: Check which configuration performed best
2. **Fine-tune**: Run refined experiments around the best parameters
3. **Deploy**: Use the best model for production OCR tasks
4. **Share**: Your Drive folder can be shared with collaborators
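The layout of the `colab_hyperparameter_results_*.json` files is not shown in this notebook. Assuming you load a list of per-experiment dicts carrying the same keys the full-run cell reads from `tuner.best_result` (`experiment_name`, `best_val_char_accuracy`, `best_val_seq_accuracy`), a sketch for ranking configurations; both helpers are hypothetical:

```python
def rank_results(results, metric="best_val_char_accuracy"):
    """Return result dicts sorted best-first by `metric`, skipping entries without it."""
    scored = [r for r in results if isinstance(r.get(metric), (int, float))]
    return sorted(scored, key=lambda r: r[metric], reverse=True)


def summarize(results):
    """Format one line per experiment: name, character and sequence accuracy."""
    lines = []
    for r in rank_results(results):
        seq = r.get("best_val_seq_accuracy", float("nan"))
        lines.append(
            f"{r['experiment_name']:<24} char={r['best_val_char_accuracy']:.1%} seq={seq:.1%}"
        )
    return lines
```

Entries without a character-accuracy value (for example a run that failed) are dropped rather than ranked.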

### 📁 Your Drive Structure:
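```
📁 MyDrive/Khmer_OCR_Experiments/
├── 📁 training_output/          (one folder per experiment)
│   └── 📁 <experiment_name>/
│       ├── config.yaml
│       └── 📁 checkpoints/
│           ├── checkpoint_epoch_<N>.pth
│           └── best_model.pth
└── 📁 results/                  (colab_hyperparameter_results_*.json)
```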

🎯 **Happy Training with Enhanced Colab!** 🚀
