# XiYan-SQL Training on Google Colab

This notebook is a complete step-by-step guide to training the XiYan-SQL model on Google Colab.

## Prerequisites
- Upload your model files to Google Drive (e.g., `Qwen2.5-Coder-3B-Instruct` folder)
- Upload your dataset files to Google Drive (raw data, processed data, or both)
- Enable GPU runtime in Colab (Runtime → Change runtime type → GPU)

## Step 1: Install Dependencies

Install all required packages for XiYan-SQL training.

In [None]:
# Install system dependencies
!apt-get update -qq
!apt-get install -y -qq libaio-dev  # Required for DeepSpeed

# Install Python packages
# Version specifiers are quoted so the shell does not treat ">" as output redirection
!pip install -q "accelerate>=1.12.0"
!pip install -q "datasets>=3.0.0"
!pip install -q "deepspeed>=0.18.4"
!pip install -q "llama-index>=0.9.6.post2"
!pip install -q "markupsafe==2.1.3"  # Pin to <3.0
!pip install -q "modelscope>=1.33.0"
!pip install -q "mysql-connector-python>=9.5.0"
!pip install -q "ninja>=1.13.0"
!pip install -q "numpy>=1.23.0,<2.0"
!pip install -q "packaging>=24.1"
!pip install -q "pandas>=2.3.3"
!pip install -q "peft==0.11.1"
!pip install -q "protobuf>=6.33.3"
!pip install -q "psycopg2-binary>=2.9.11"
!pip install -q "sentencepiece>=0.2.1"
!pip install -q "setuptools>=70.2.0"
!pip install -q "sqlalchemy>=2.0.45"
!pip install -q "sqlglot>=28.5.0"
!pip install -q "swanlab>=0.7.6"
!pip install -q "textdistance>=4.6.3"
!pip install -q "torch==2.9.0" --index-url https://download.pytorch.org/whl/cu126
!pip install -q "torchaudio==2.9.0" --index-url https://download.pytorch.org/whl/cu126
!pip install -q "torchvision==0.24.0" --index-url https://download.pytorch.org/whl/cu126
!pip install -q "transformers==4.42.3"
!pip install -q "wheel>=0.45.1"

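# Install flash-attn (optional, for faster attention)
# Note: This may take a while to compile
# A "!pip" line never raises a Python exception, so check the exit code instead
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-m", "pip", "install", "-q", "flash-attn", "--no-build-isolation"]
)
if result.returncode == 0:
    print("✅ flash-attn installed successfully")
else:
    print("⚠️  flash-attn installation failed, continuing without it")

print("\n✅ All dependencies installed!")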
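
Colab ships with its own preinstalled packages, so it can help to confirm which versions actually ended up in the environment after the installs above. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper (not part of XiYan-SQL), and the package list is just a sample of the pins above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Sample of the packages pinned above; extend the list as needed.
for name in ["torch", "transformers", "peft", "deepspeed", "accelerate"]:
    print(f"{name}: {installed_version(name) or 'NOT INSTALLED'}")
```

If a printed version differs from the pin, restarting the runtime after installation is usually worth trying before re-running the install cell.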

## Step 2: Clone Repository

Clone the XiYan-SQL repository to Colab.

In [None]:
# Change to content directory
import os
import sys
os.chdir('/content')

# Clone the repository
# Replace with your repository URL
REPO_URL = "https://github.com/rezaarrazi/XiYan-SQL.git"  # ⚠️ UPDATE THIS

if not os.path.exists('XiYan-SQL'):
    # os.system returns a non-zero status when the clone fails
    if os.system(f'git clone {REPO_URL}') != 0:
        raise RuntimeError(f"git clone failed for {REPO_URL}")
    print("✅ Repository cloned successfully")
else:
    print("✅ Repository already exists")

# Navigate to training directory
os.chdir('XiYan-SQL/XiYan-SQLTraining')

# Add to Python path so imports work correctly
TRAINING_DIR = os.getcwd()
if TRAINING_DIR not in sys.path:
    sys.path.insert(0, TRAINING_DIR)
if os.path.dirname(TRAINING_DIR) not in sys.path:
    sys.path.insert(0, os.path.dirname(TRAINING_DIR))

print(f"\n📁 Current directory: {os.getcwd()}")
print(f"✅ Python path configured")

## Step 3: Mount Google Drive

Mount your Google Drive to access model and dataset files.

In [None]:
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

print("✅ Google Drive mounted successfully")
print("\n📂 Drive path: /content/drive/MyDrive")
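
Before moving on, it may be worth checking that the Drive folders used in later steps are visible from the mount. The sketch below assumes the folder layout referenced in Steps 4 and 5; `missing_paths` is a hypothetical helper introduced only for illustration.

```python
import os


def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Folders this notebook reads from later (paths taken from Steps 4 and 5).
expected = [
    "/content/drive/MyDrive/Xiyan-SQL/Models/Qwen",
    "/content/drive/MyDrive/Xiyan-SQL/Dataset",
]
for path in missing_paths(expected):
    print(f"Missing: {path}")
```

The dataset folder is only a fallback, so a missing entry there matters only if the dataset is also absent from the cloned repository.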

## Step 4: Copy Model from Google Drive

Copy your pre-downloaded model from Google Drive to the model directory.

**Configured Path:** `My Drive/Xiyan-SQL/Models/Qwen/`

The script will automatically detect and copy the model folder(s) from this location.

In [None]:
import shutil
import os

# Path to your model in Google Drive
MODEL_DRIVE_PATH = "/content/drive/MyDrive/Xiyan-SQL/Models/Qwen"

# Target directory in the repository
MODEL_TARGET_DIR = "train/model/Qwen"

# Create target directory if it doesn't exist
os.makedirs(MODEL_TARGET_DIR, exist_ok=True)

# Check if model directory exists in Drive
if os.path.exists(MODEL_DRIVE_PATH):
    print(f"📥 Found model directory at {MODEL_DRIVE_PATH}")
    
    # List contents to see what's inside
    contents = os.listdir(MODEL_DRIVE_PATH)
    print(f"📁 Contents: {contents}")
    
    # Check if it's a single model folder or contains multiple model folders
    model_folders = [item for item in contents if os.path.isdir(os.path.join(MODEL_DRIVE_PATH, item))]
    
    if len(model_folders) == 1:
        # Single model folder - copy it directly
        model_name = model_folders[0]
        source_path = os.path.join(MODEL_DRIVE_PATH, model_name)
        target_path = os.path.join(MODEL_TARGET_DIR, model_name)
        
        if os.path.exists(target_path):
            print(f"⚠️  Model already exists at {target_path}")
            print("Skipping copy (delete manually if you want to re-copy)")
        else:
            print(f"📥 Copying model '{model_name}' from {source_path}...")
            shutil.copytree(source_path, target_path)
            print(f"✅ Model copied to {target_path}")
        
        MODEL_PATH = target_path
    else:
        # Multiple folders or files - copy the entire Qwen directory
        target_path = MODEL_TARGET_DIR
        if os.path.exists(target_path) and os.listdir(target_path):
            print(f"⚠️  Model directory already exists at {target_path}")
            print("Skipping copy (delete manually if you want to re-copy)")
        else:
            print(f"📥 Copying all models from {MODEL_DRIVE_PATH}...")
            for item in contents:
                source_item = os.path.join(MODEL_DRIVE_PATH, item)
                target_item = os.path.join(target_path, item)
                if os.path.isdir(source_item):
                    if not os.path.exists(target_item):
                        shutil.copytree(source_item, target_item)
                        print(f"  ✅ Copied {item}")
                else:
                    if not os.path.exists(target_item):
                        shutil.copy2(source_item, target_item)
                        print(f"  ✅ Copied {item}")
            print(f"✅ All models copied to {target_path}")
        
        # Set MODEL_PATH to the first model folder found, or let user specify
        if model_folders:
            MODEL_PATH = os.path.join(MODEL_TARGET_DIR, model_folders[0])
            print(f"\n📌 Using model: {MODEL_PATH}")
            print(f"💡 If you want to use a different model, update MODEL_PATH before running Step 6")
        else:
            MODEL_PATH = MODEL_TARGET_DIR
            print(f"\n📌 Model directory: {MODEL_PATH}")
            print(f"💡 Please set MODEL_PATH to the exact model folder before running Step 6")
    
    print(f"\n📌 Model path for training: {MODEL_PATH}")
else:
    print(f"❌ Model not found at {MODEL_DRIVE_PATH}")
    print("\nPlease check:")
    print("1. Google Drive is mounted correctly")
    print("2. The path 'My Drive/Xiyan-SQL/Models/Qwen/' exists in your Drive")
    MODEL_PATH = None
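
A copied folder can still be incomplete, for example if the Drive upload was interrupted. The sketch below checks for a few files a Hugging Face style model folder normally contains. The exact file names are assumptions (they are not listed in this notebook), and `missing_model_files` is a hypothetical helper.

```python
import os


def missing_model_files(model_dir, required=("config.json", "tokenizer_config.json")):
    """List required files absent from `model_dir`, plus a marker if no weights are found."""
    present = set(os.listdir(model_dir)) if os.path.isdir(model_dir) else set()
    missing = [name for name in required if name not in present]
    # Assumption: weights are stored as *.safetensors or *.bin files.
    if not any(name.endswith((".safetensors", ".bin")) for name in present):
        missing.append("<weight files>")
    return missing


# Usage in the notebook (MODEL_PATH comes from the cell above):
# print(missing_model_files(MODEL_PATH) or "Model folder looks complete")
```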

## Step 5: Verify Training Dataset

The English training dataset should already be in the repository (via Git LFS).

**Expected file:** `train/datasets/nl2sql_standard_train_en.json` (55MB)

If the file is not present, the cell below copies it from Google Drive as a fallback.

In [None]:
import os
import json

# Check if training dataset exists in repository
TRAIN_DATASET_PATH = "train/datasets/nl2sql_standard_train_en.json"

if os.path.exists(TRAIN_DATASET_PATH):
    print(f"✅ Training dataset found in repository!")
    print(f"   Path: {TRAIN_DATASET_PATH}")
    
    size_mb = os.path.getsize(TRAIN_DATASET_PATH) / (1024 * 1024)
    print(f"   Size: {size_mb:.1f} MB")
    
    # Quick verification
    with open(TRAIN_DATASET_PATH, 'r') as f:
        data = json.load(f)
        print(f"   Samples: {len(data)}")
        
        # Check if English
        if data and data[0].get('conversations'):
            prompt = data[0]['conversations'][0]['content']
            if prompt.startswith("You are a SQLite expert"):
                print(f"   Language: ✅ English")
            else:
                print(f"   Language: ⚠️ Not English")
    
    print("\n🎉 Ready to start training! Continue to Step 6.")
    
else:
    print(f"⚠️  Training dataset not found in repository")
    print(f"   Expected: {TRAIN_DATASET_PATH}")
    print("\n📥 Attempting to download from Google Drive as backup...")
    
    # Backup: Download from Google Drive
    DRIVE_DATASET_PATH = "/content/drive/MyDrive/Xiyan-SQL/Dataset/nl2sql_standard_train_en.json"
    
    if os.path.exists(DRIVE_DATASET_PATH):
        import shutil
        os.makedirs("train/datasets", exist_ok=True)
        shutil.copy2(DRIVE_DATASET_PATH, TRAIN_DATASET_PATH)
        print(f"✅ Dataset copied from Google Drive")
        
        size_mb = os.path.getsize(TRAIN_DATASET_PATH) / (1024 * 1024)
        print(f"   Size: {size_mb:.1f} MB")
    else:
        print(f"❌ Dataset not found in Google Drive either")
        print(f"   Expected: {DRIVE_DATASET_PATH}")
        print("\n💡 Options:")
        print("1. Make sure Git LFS pulled the dataset when cloning")
        print("2. Upload nl2sql_standard_train_en.json to Google Drive")
        print("3. Or run data processing from BIRD raw data")
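
One failure mode the check above does not cover: if the repository was cloned without Git LFS, the dataset path exists but holds a small text pointer instead of the real JSON, and `json.load` fails. A minimal sketch for detecting that case follows; it relies on the standard first line of a Git LFS pointer file, and `is_lfs_pointer` is a hypothetical helper name.

```python
def is_lfs_pointer(path):
    """Return True if `path` is a Git LFS pointer file rather than the real content."""
    with open(path, "rb") as f:
        head = f.read(64)
    # Pointer files start with this fixed version line.
    return head.startswith(b"version https://git-lfs.github.com/spec/v1")


# Usage in the notebook:
# if is_lfs_pointer(TRAIN_DATASET_PATH):
#     print("Dataset is an LFS pointer; fetch the real file before training")
```

If the file turns out to be a pointer, running `git lfs pull` inside the repository (with Git LFS installed) or using the Google Drive fallback should give the full file.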

## Step 6: Configure Training Parameters

The defaults target a Colab GPU with about 15 GB of memory (T4 or better).

Adjust these parameters based on your GPU memory and requirements.

In [None]:
# Check available GPU memory
import subprocess
import re

try:
    result = subprocess.run(['nvidia-smi', '--query-gpu=memory.total', '--format=csv,noheader,nounits'], 
                          capture_output=True, text=True, check=True)
    # nvidia-smi prints one line per GPU; use the first one
    gpu_memory_mb = int(result.stdout.strip().splitlines()[0])
    gpu_memory_gb = gpu_memory_mb / 1024
    print(f"🎮 Detected GPU Memory: {gpu_memory_gb:.1f} GB")
except Exception:
    gpu_memory_gb = 15.0  # Default assumption
    print(f"⚠️  Could not detect GPU, assuming {gpu_memory_gb} GB")

# Auto-configure based on GPU memory
if gpu_memory_gb >= 14:
    # 14 GB or more (a T4 reports about 15 GB) - Optimized settings
    MAX_LENGTH = 8192
    LORA_R = 128
    BATCH_SIZE = 2
    GRAD_ACC = 16
    print(f"📊 Using HIGH MEMORY config for {gpu_memory_gb:.1f}GB GPU")
elif gpu_memory_gb >= 10:
    # 10 GB to 14 GB - Moderate settings
    MAX_LENGTH = 4096
    LORA_R = 64
    BATCH_SIZE = 1
    GRAD_ACC = 32
    print(f"📊 Using MEDIUM MEMORY config for {gpu_memory_gb:.1f}GB GPU")
else:
    # Under 10 GB - Conservative settings
    MAX_LENGTH = 2048
    LORA_R = 32
    BATCH_SIZE = 1
    GRAD_ACC = 64
    print(f"📊 Using LOW MEMORY config for {gpu_memory_gb:.1f}GB GPU")

TRAINING_CONFIG = {
    # Experiment ID
    "expr_id": "nl2sql_3b_colab_en",
    
    # Model path (set in Step 4); falls back to the default if Step 4 was skipped or left it as None
    "model_path": globals().get("MODEL_PATH") or "train/model/Qwen/Qwen2.5-Coder-3B-Instruct",
    
    # Dataset path - Using English version
    "data_path": "train/datasets/nl2sql_standard_train_en.json",
    
    # Output directory
    "output_dir": "train/output/dense/nl2sql_3b_colab_en/",
    
    # Training hyperparameters
    "epochs": 3,  # Reduced for faster training in Colab
    "learning_rate": 2e-5,
    "weight_decay": 0.1,
    "max_length": MAX_LENGTH,
    
    # LoRA configuration
    "use_lora": True,
    "lora_r": LORA_R,
    "lora_alpha": LORA_R * 2,
    
    # Batch configuration
    "batch_size": BATCH_SIZE,
    "gradient_accumulation_steps": GRAD_ACC,
    
    # Other settings
    "save_steps": 200,
    "group_by_length": True,
    "shuffle": True,
    "use_flash_attention": True,
    "bf16": True,
}

print("\n📋 Training Configuration:")
print(f"  Experiment ID: {TRAINING_CONFIG['expr_id']}")
print(f"  Dataset: {TRAINING_CONFIG['data_path']}")
print(f"  Max Length: {TRAINING_CONFIG['max_length']} tokens")
print(f"  LoRA Rank: {TRAINING_CONFIG['lora_r']}")
print(f"  Batch Size: {TRAINING_CONFIG['batch_size']}")
print(f"  Gradient Accumulation: {TRAINING_CONFIG['gradient_accumulation_steps']}")
print(f"  Effective Batch Size: {TRAINING_CONFIG['batch_size'] * TRAINING_CONFIG['gradient_accumulation_steps']}")
print(f"  Epochs: {TRAINING_CONFIG['epochs']}")
print(f"  Learning Rate: {TRAINING_CONFIG['learning_rate']}")

print("\n💡 Estimated Training Time:")
samples = 9431
steps_per_epoch = samples // (TRAINING_CONFIG['batch_size'] * TRAINING_CONFIG['gradient_accumulation_steps'])
total_steps = steps_per_epoch * TRAINING_CONFIG['epochs']
time_per_step_sec = 3  # Placeholder; replace with the step time you observe in the training logs
total_hours = (total_steps * time_per_step_sec) / 3600
print(f"  Steps per epoch: ~{steps_per_epoch}")
print(f"  Total steps: ~{total_steps}")
print(f"  Estimated time: ~{total_hours:.1f} hours")

print("\n⚠️  Colab Tips:")
print("  - Free tier: 12 hour runtime limit")
print("  - Keep browser tab active to prevent disconnection")
print("  - Consider Colab Pro for longer sessions")
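
The configuration above enables `bf16` and `use_flash_attention` unconditionally. Native bfloat16 and FlashAttention 2 generally need an NVIDIA GPU with compute capability 8.0 or higher (Ampere or newer), while a T4 is compute capability 7.5. The sketch below is a rough heuristic for deciding whether to keep those two settings; `supports_bf16_and_flash_attn` is a hypothetical helper, and the threshold is an assumption based on those hardware generations rather than something stated in the XiYan-SQL repository.

```python
def supports_bf16_and_flash_attn(compute_capability):
    """Return True for CUDA compute capability 8.0 or higher (Ampere or newer)."""
    major, _minor = compute_capability
    return major >= 8


# Usage in the notebook (requires torch and a CUDA GPU):
# import torch
# ok = supports_bf16_and_flash_attn(torch.cuda.get_device_capability(0))
# TRAINING_CONFIG["use_flash_attention"] = ok
# TRAINING_CONFIG["bf16"] = ok
```

Note that Step 7 passes `--bf16` and sets `"bf16": {"enabled": True}` regardless of `TRAINING_CONFIG["bf16"]`, so those lines would also need to change on an older GPU.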

## Step 7: Start Training

Run the training with the configuration from Step 6.

In [None]:
import os
import subprocess
import json

# Set training directory
TRAINING_DIR = "/content/XiYan-SQL/XiYan-SQLTraining"
os.chdir(TRAINING_DIR)

# Create an Accelerate launch config that enables DeepSpeed ZeRO-2 on a single GPU
ds_config = {
    "compute_environment": "LOCAL_MACHINE",
    "distributed_type": "DEEPSPEED",
    "deepspeed_config": {
        "gradient_accumulation_steps": TRAINING_CONFIG["gradient_accumulation_steps"],
        "gradient_clipping": 1.0,
        "offload_optimizer_device": "cpu",
        "offload_param_device": "cpu",
        "zero3_init_flag": False,
        "zero3_save_16bit_model": False,
        "zero_stage": 2,
        "bf16": {
            "enabled": True
        }
    },
    "machine_rank": 0,
    "main_process_ip": None,
    "main_process_port": None,
    "num_machines": 1,
    "num_processes": 1,
    "rdzv_backend": "static",
    "same_network": True,
    "tpu_env": [],
    "tpu_use_cluster": False,
    "tpu_use_sudo": False,
    "use_cpu": False
}

# Save DeepSpeed config
os.makedirs("train/config", exist_ok=True)
ds_config_path = "train/config/colab_zero2.json"
with open(ds_config_path, 'w') as f:
    json.dump(ds_config, f, indent=2)

print("🚀 Starting XiYan-SQL Training")
print("="*60)
print(f"📁 Model: {TRAINING_CONFIG['model_path']}")
print(f"📊 Dataset: {TRAINING_CONFIG['data_path']} (English)")
print(f"💾 Output: {TRAINING_CONFIG['output_dir']}")
print(f"🎯 Effective Batch: {TRAINING_CONFIG['batch_size'] * TRAINING_CONFIG['gradient_accumulation_steps']}")
print(f"📏 Max Length: {TRAINING_CONFIG['max_length']} tokens")
print(f"🔧 LoRA Rank: {TRAINING_CONFIG['lora_r']}")
print("="*60)
print("\n⏳ Training will take several hours...")
print("💡 Keep this tab active to prevent disconnection\n")

# Build training command
cmd = [
    "accelerate", "launch",
    "--config_file", ds_config_path,
    "--num_processes", "1",
    "train/sft4xiyan.py",
    "--save_only_model", "True",
    "--resume", "False",
    "--model_name_or_path", TRAINING_CONFIG["model_path"],
    "--data_path", TRAINING_CONFIG["data_path"],
    "--output_dir", TRAINING_CONFIG["output_dir"],
    "--num_train_epochs", str(TRAINING_CONFIG["epochs"]),
    "--per_device_train_batch_size", str(TRAINING_CONFIG["batch_size"]),
    "--gradient_accumulation_steps", str(TRAINING_CONFIG["gradient_accumulation_steps"]),
    "--save_strategy", "steps",
    "--save_steps", str(TRAINING_CONFIG["save_steps"]),
    "--save_total_limit", "3",
    "--learning_rate", str(TRAINING_CONFIG["learning_rate"]),
    "--weight_decay", str(TRAINING_CONFIG["weight_decay"]),
    "--adam_beta2", "0.95",
    "--warmup_ratio", "0.1",
    "--lr_scheduler_type", "cosine",
    "--logging_steps", "10",
    "--report_to", "none",
    "--model_max_length", str(TRAINING_CONFIG["max_length"]),
    "--lazy_preprocess", "False",
    "--gradient_checkpointing", "True",
    "--predict_with_generate", "True",
    "--include_inputs_for_metrics", "True",
    "--use_lora", str(TRAINING_CONFIG["use_lora"]),
    "--lora_r", str(TRAINING_CONFIG["lora_r"]),
    "--lora_alpha", str(TRAINING_CONFIG["lora_alpha"]),
    "--do_shuffle", str(TRAINING_CONFIG["shuffle"]),
    "--torch_compile", "False",
    "--group_by_length", str(TRAINING_CONFIG["group_by_length"]),
    "--model_type", "auto",
    "--use_flash_attention", str(TRAINING_CONFIG["use_flash_attention"]),
    "--bf16",
    "--expr_id", TRAINING_CONFIG["expr_id"]
]

# Run training
try:
    result = subprocess.run(cmd, cwd=TRAINING_DIR, check=False)
    
    if result.returncode == 0:
        print("\n" + "="*60)
        print("✅ Training completed successfully!")
        print(f"📁 Model saved to: {TRAINING_CONFIG['output_dir']}")
        print("="*60)
    else:
        print("\n" + "="*60)
        print(f"❌ Training failed with return code {result.returncode}")
        print("="*60)
except Exception as e:
    print(f"\n❌ Error during training: {e}")
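
With `--save_strategy steps` and `--save_steps 200`, intermediate checkpoints should appear in the output directory while training runs. The sketch below finds the most recent one. It assumes the training script follows the Hugging Face Trainer naming convention of `checkpoint-<step>` folders, which has not been verified against `train/sft4xiyan.py`; `latest_checkpoint` is a hypothetical helper.

```python
import os
import re


def latest_checkpoint(output_dir):
    """Return the path of the highest-numbered checkpoint-<step> folder, or None."""
    if not os.path.isdir(output_dir):
        return None
    steps = []
    for name in os.listdir(output_dir):
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        if match and os.path.isdir(os.path.join(output_dir, name)):
            steps.append(int(match.group(1)))
    if not steps:
        return None
    # Compare numerically so checkpoint-1000 ranks above checkpoint-200.
    return os.path.join(output_dir, f"checkpoint-{max(steps)}")


# Usage in the notebook:
# print(latest_checkpoint(TRAINING_CONFIG["output_dir"]))
```

Because the command passes `--save_only_model True`, checkpoints likely contain model weights only (no optimizer or scheduler state), so they are suitable for evaluation or export but probably not for an exact resume.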

## Step 8: Save Trained Model to Google Drive (Optional)

After training completes, save your model to Google Drive for future use.

In [None]:
import shutil
import os

# Path to trained model
TRAINED_MODEL_PATH = TRAINING_CONFIG["output_dir"]

# Destination in Google Drive
DRIVE_SAVE_PATH = f"/content/drive/MyDrive/Xiyan-SQL/Trained-Models/{TRAINING_CONFIG['expr_id']}"

if os.path.exists(TRAINED_MODEL_PATH):
    print(f"📥 Copying trained model to Google Drive...")
    print(f"   From: {TRAINED_MODEL_PATH}")
    print(f"   To: {DRIVE_SAVE_PATH}")
    
    # Create parent directory
    os.makedirs(os.path.dirname(DRIVE_SAVE_PATH), exist_ok=True)
    
    # Copy model
    if os.path.exists(DRIVE_SAVE_PATH):
        shutil.rmtree(DRIVE_SAVE_PATH)
    
    shutil.copytree(TRAINED_MODEL_PATH, DRIVE_SAVE_PATH)
    print(f"\n✅ Model saved to Google Drive!")
    print(f"📁 Location: {DRIVE_SAVE_PATH}")
else:
    print(f"⚠️  Trained model not found at {TRAINED_MODEL_PATH}")
    print("Make sure training completed successfully in Step 7.")
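
Writes to a mounted Drive can lag or fail partway through, so a quick comparison of the source and destination folders is a cheap sanity check after the copy. The sketch below compares file counts and total bytes; it does not verify file contents, and `dir_stats` is a hypothetical helper.

```python
import os


def dir_stats(path):
    """Return (file_count, total_bytes) for all files under `path`."""
    count, total = 0, 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            count += 1
            total += os.path.getsize(os.path.join(root, name))
    return count, total


# Usage in the notebook, after the copy cell above:
# print("source:", dir_stats(TRAINED_MODEL_PATH))
# print("drive: ", dir_stats(DRIVE_SAVE_PATH))
```

Matching counts and sizes suggest the copy completed; a mismatch is a signal to re-run the copy cell.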

## Troubleshooting

### Out of Memory (OOM) Errors
- Reduce `batch_size` to 1
- Reduce `max_length` to 4096 or 2048
- Increase `gradient_accumulation_steps` to maintain effective batch size
- The DeepSpeed config already uses CPU offloading, which helps
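
The first and third suggestions work together: the effective batch size is `batch_size * gradient_accumulation_steps`, so halving one and doubling the other keeps it unchanged. A minimal sketch of that adjustment, using a hypothetical helper `halve_batch`:

```python
def halve_batch(batch_size, grad_acc):
    """Halve the per-device batch size and scale accumulation to keep the product.

    The product is preserved exactly when batch_size * grad_acc is divisible
    by the new batch size (always true for the powers of two used in Step 6).
    """
    new_batch = max(1, batch_size // 2)
    new_grad_acc = grad_acc * batch_size // new_batch
    return new_batch, new_grad_acc


# Usage in the notebook, before re-running Step 7:
# b, g = halve_batch(TRAINING_CONFIG["batch_size"], TRAINING_CONFIG["gradient_accumulation_steps"])
# TRAINING_CONFIG["batch_size"], TRAINING_CONFIG["gradient_accumulation_steps"] = b, g
```

Once `batch_size` is already 1, the helper returns the inputs unchanged, and reducing `max_length` is the remaining lever.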

### Model Not Found
- Check that `MODEL_DRIVE_PATH` in Step 4 is correct
- Verify the model folder exists in Google Drive
- Ensure the model folder contains all required files (config.json, tokenizer files, etc.)

### Dataset Not Found
- Check that dataset paths in Step 5 are correct
- Verify files exist in Google Drive
- If processing raw data, ensure `db_conn.json` exists

### Training Too Slow
- Colab free tier has limited GPU time
- Consider using Colab Pro for longer training sessions
- Reduce dataset size for testing (set `sample_num` in dataset config)
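
As an alternative to `sample_num`, a smaller copy of the training file can be written out for a quick smoke test. The sketch below assumes the dataset is a single JSON list of records, which is how Step 5 reads it; `write_subset` and the output file name are hypothetical.

```python
import json
import random


def write_subset(src_path, dst_path, n, seed=0):
    """Write a random sample of up to `n` records from a JSON list file to `dst_path`."""
    with open(src_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    random.Random(seed).shuffle(records)  # fixed seed keeps the subset reproducible
    subset = records[:n]
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(subset, f, ensure_ascii=False)
    return len(subset)


# Usage in the notebook:
# write_subset("train/datasets/nl2sql_standard_train_en.json",
#              "train/datasets/nl2sql_smoke_test.json", 200)
# TRAINING_CONFIG["data_path"] = "train/datasets/nl2sql_smoke_test.json"
```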

### Connection Issues
- Colab sessions may disconnect after inactivity
- Use `nohup` or save checkpoints frequently
- Consider running training in multiple sessions if needed
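
One way to get `nohup`-like behavior from inside the notebook is to start the training command in its own session and send its output to a log file, so the process does not depend on the cell that launched it. This is a sketch only: it does not keep the Colab VM alive, `launch_detached` is a hypothetical helper, and the log path is an example.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start `cmd` in its own session with stdout and stderr appended to `log_path`."""
    log_file = open(log_path, "ab")
    process = subprocess.Popen(
        cmd,
        stdout=log_file,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # POSIX only: detach from the notebook's process group
    )
    log_file.close()  # the child keeps its own copy of the file descriptor
    return process


# Usage in the notebook (`cmd` is the list built in Step 7):
# process = launch_detached(cmd, "train/output/train.log")
# !tail -n 20 train/output/train.log
```

`process.poll()` returns `None` while training is still running and the exit code once it has finished.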