# PyCodeAI - Google Colab Training

This notebook allows you to train your PyCodeAI model using Google Colab's free GPU.

## Instructions

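1.  **Upload Project**: Upload your entire `PyCodeAI` folder to your Google Drive. If you upload it as a zip archive, extract it before continuing, because the notebook changes directly into the folder.
2.  **Select a GPU Runtime**: In Colab, choose Runtime > Change runtime type and select a GPU.
3.  **Mount Drive**: Run the first code cell to mount your Google Drive.
4.  **Install Dependencies**: Run the install cell to add CuPy, which the training command uses for GPU acceleration.
5.  **Navigate**: Set `PROJECT_PATH` to the location of `PyCodeAI` in your Drive, then run the cell.
6.  **Train**: Run the training cell. It resumes from `best_model.npz` and saves to `best_model_new.npz`.
7.  **Verify (Optional)**: Run the last cell to generate a sample with the newly trained model.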

In [None]:
# 1. Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
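
If the project was uploaded as a zip archive, it has to be extracted once Drive is mounted, since the later cells expect a plain folder. The following is a minimal sketch using the standard library's `zipfile` module. The helper name `extract_project` and the archive location `/content/drive/MyDrive/PyCodeAI.zip` are assumptions for illustration; adjust them to your own upload.

```python
import zipfile
from pathlib import Path


def extract_project(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the archive's member names."""
    zip_path = Path(zip_path)
    dest_dir = Path(dest_dir)
    if not zip_path.is_file():
        raise FileNotFoundError(f"Archive not found: {zip_path}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


# Hypothetical paths: change these to match where you uploaded the archive.
ZIP_PATH = "/content/drive/MyDrive/PyCodeAI.zip"
DEST_DIR = "/content/drive/MyDrive"

# Only extract when the archive is actually present.
if Path(ZIP_PATH).is_file():
    names = extract_project(ZIP_PATH, DEST_DIR)
    print(f"Extracted {len(names)} entries into {DEST_DIR}")
else:
    print(f"No archive at {ZIP_PATH}; skipping extraction.")
```

Extraction only needs to happen once per upload; on later sessions the extracted folder is already in Drive.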

In [None]:
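# 2. Install Dependencies
# CuPy provides GPU acceleration. cupy-cuda12x is the wheel built for CUDA 12.x;
# if the runtime reports a different CUDA version (check with !nvidia-smi),
# install the matching CuPy wheel instead.
!pip install cupy-cuda12x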
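
Before training, it can help to confirm that CuPy actually sees a GPU, since `--device gpu` is unlikely to work on a CPU-only runtime. The sketch below uses `cupy.cuda.runtime.getDeviceCount()`; the helper name `gpu_device_count` is hypothetical and not part of PyCodeAI.

```python
def gpu_device_count():
    """Return the number of CUDA devices CuPy can see, or 0 if none are usable."""
    try:
        import cupy
        return int(cupy.cuda.runtime.getDeviceCount())
    except Exception:
        # CuPy is not installed, or no usable CUDA driver/device was found.
        return 0


count = gpu_device_count()
if count:
    print(f"CuPy sees {count} CUDA device(s).")
else:
    print("No GPU visible to CuPy. In Colab, choose "
          "Runtime > Change runtime type and select a GPU.")
```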

In [None]:
# 3. Navigate to Project Directory
import os

# CHANGE THIS PATH to match where you uploaded the folder in your Drive
# Example: '/content/drive/MyDrive/PyCodeAI'
PROJECT_PATH = '/content/drive/MyDrive/PyCodeAI'

if os.path.isdir(PROJECT_PATH):
    os.chdir(PROJECT_PATH)
    print(f"Current working directory: {os.getcwd()}")
else:
    # Stop here so the later cells do not run in the wrong directory
    raise FileNotFoundError(
        f"Path not found: {PROJECT_PATH}. "
        "Upload your PyCodeAI folder to Google Drive and update PROJECT_PATH above."
    )
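
The training cell reads `cli.py`, `best_model.npz`, and `tokenizer.json` from the current directory, so a quick check that they exist can save a failed run. This is a minimal sketch; `missing_files` is a hypothetical helper, and the file list is taken from the commands used in this notebook.

```python
from pathlib import Path


def missing_files(project_dir, required):
    """Return the names in `required` that are not regular files in project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_file()]


# Files referenced by the training cell in this notebook.
REQUIRED = ["cli.py", "best_model.npz", "tokenizer.json"]

missing = missing_files(".", REQUIRED)
if missing:
    print("Missing from the current directory: " + ", ".join(missing))
else:
    print("All required files are present.")
```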

In [None]:
# 4. Resume Training
# This command will:
# - Use the GPU (--device gpu)
# - Load your existing best model (--load-model best_model.npz)
# - Save the result to a NEW file (--output-model best_model_new.npz)
# - Use a cloned tokenizer so the original is untouched (--output-tokenizer tokenizer_new.json)

# Create a copy of the tokenizer first so we don't overwrite the original
!cp tokenizer.json tokenizer_new.json

!python cli.py train \
    --device gpu \
    --load-model best_model.npz \
    --output-model best_model_new.npz \
    --output-tokenizer tokenizer_new.json \
    --epochs 5 \
    --batch-size 32 \
    --log-interval 10
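
After training, you may want to confirm that `best_model_new.npz` was written and see what it contains. A `.npz` file produced by `numpy.savez` is a zip archive with one `.npy` entry per array, so the standard library can list the array names without loading the weights. The sketch below assumes the checkpoint was saved that way, which the file extension suggests but this notebook does not confirm; `list_npz_arrays` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def list_npz_arrays(path):
    """Return {array_name: uncompressed_size_in_bytes} for a .npz archive."""
    arrays = {}
    with zipfile.ZipFile(path) as archive:
        for info in archive.infolist():
            name = info.filename
            # Each array is stored as "<name>.npy" inside the archive.
            if name.endswith(".npy"):
                name = name[:-4]
            arrays[name] = info.file_size
    return arrays


CHECKPOINT = "best_model_new.npz"

if Path(CHECKPOINT).is_file():
    for name, size in list_npz_arrays(CHECKPOINT).items():
        print(f"{name}: {size} bytes")
else:
    print(f"{CHECKPOINT} not found; training may not have finished.")
```

The reported sizes include each entry's small `.npy` header, so they are slightly larger than the raw array data.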

In [None]:
# 5. (Optional) Verify Generation with New Model
# Test the newly trained model

!python cli.py generate "def fibonacci(n):" \
    --model-path best_model_new.npz \
    --tokenizer-path tokenizer_new.json \
    --max-tokens 100