# Model Training Example

This notebook demonstrates how to use the `ModelTrainer` class to submit preprocessing, training, and hyperparameter tuning jobs to Vertex AI.

## 1. Configuration
Locate the project root and add it to the Python path:

In [None]:
import sys
from pathlib import Path

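def setup_project_path() -> Path:
    """Add the project root to the Python path.

    The root is the nearest directory, starting from the current working
    directory, that contains a .git folder or a pyproject.toml file.
    """
    # Markers that identify the project root
    root_indicators = ['.git', 'pyproject.toml']

    # Check the current directory, then each parent up to the filesystem root
    current_path = Path.cwd()
    for candidate in (current_path, *current_path.parents):
        if any((candidate / indicator).exists() for indicator in root_indicators):
            # Avoid duplicate entries when the cell is re-run
            if str(candidate) not in sys.path:
                sys.path.append(str(candidate))
            return candidate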
    
    raise RuntimeError(
        "Could not find project root. "
        "Please run this notebook from within the project directory."
    )

# Setup path
project_root = setup_project_path()
print(f"Project root detected at: {project_root}")

Load your Vertex AI configuration:

In [None]:
from models.vertex_ai import get_config

# Get the Vertex AI configuration for the 'dev' environment
config = get_config("dev")
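
The implementation of `get_config` is not shown in this notebook. As a rough illustration of the shape the later cells rely on (`config.project_id` and `config.environment`), the sketch below uses a hypothetical `VertexConfig` stand-in and a hypothetical `data_bucket_name` helper. Both names and the `my-project` value are assumptions made for the example, not part of `models.vertex_ai`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VertexConfig:
    """Hypothetical stand-in for the object returned by get_config()."""
    project_id: str
    environment: str


def data_bucket_name(config: VertexConfig) -> str:
    """Build the bucket name with the same f-string pattern used for data_bucket."""
    return f"{config.project_id}-{config.environment}-data"


# "my-project" is a placeholder, not a real project ID.
example = VertexConfig(project_id="my-project", environment="dev")
print(data_bucket_name(example))
```

Keeping the naming rule in one place makes it easier to spot a mismatch between the environment passed to `get_config` and the bucket the trainer reads from.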

## 2. Create an Experiment
First, we import the `ModelTrainer` class and instantiate it.
We then create an experiment with a name and description.

In [4]:
from models.vertex_ai import ModelTrainer

trainer = ModelTrainer(
    environment="dev",
    data_bucket=f"{config.project_id}-{config.environment}-data",
)

experiment = trainer.create_experiment(
    "bert-fine-tuning",
    description="Fine-tuning BERT for text classification"
)

## 3. Submit a Preprocessing Job

Next, we can submit a preprocessing job, which runs a Python script on Vertex AI. The `script_path` should point to a script inside your project directory.

In [None]:
# Submit a preprocessing job
preprocessing_job = trainer.submit_preprocessing_job(
    script_path="models/scripts/processing/extract_landmarks.py",
    args={
        "input-folder": "raw-data/",
        "output-folder": "landmarks/",
        "batch-size": 32
    },
    sync=True  # Set to False if you don't want to wait for the job to complete
)
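
How `ModelTrainer` forwards `args` to the script is not shown here. A common convention, assumed in the sketch below, is to turn each key into a `--key value` command-line flag that the script then parses with `argparse`. The helpers `to_cli_flags` and `build_parser` are hypothetical names used only for illustration.

```python
import argparse
from typing import Dict, List


def to_cli_flags(args: Dict[str, object]) -> List[str]:
    """Flatten a dict into ["--key", "value", ...] (assumed convention)."""
    flags: List[str] = []
    for name, value in args.items():
        flags.extend([f"--{name}", str(value)])
    return flags


def build_parser() -> argparse.ArgumentParser:
    """Parser that a script such as extract_landmarks.py could define."""
    parser = argparse.ArgumentParser(description="Preprocessing script sketch")
    parser.add_argument("--input-folder", required=True)
    parser.add_argument("--output-folder", required=True)
    parser.add_argument("--batch-size", type=int, default=32)
    return parser


job_args = {"input-folder": "raw-data/", "output-folder": "landmarks/", "batch-size": 32}
flags = to_cli_flags(job_args)
parsed = build_parser().parse_args(flags)

# argparse exposes hyphenated flags with underscores, e.g. parsed.batch_size
print(parsed.input_folder, parsed.output_folder, parsed.batch_size)
```

If the script and the `args` dict disagree on a flag name, `argparse` exits with an error, so running the parser locally like this is a cheap check before submitting a job.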

## 4. Train a Model

Finally, we submit a training job, which also runs a Python script on Vertex AI. The script must meet the requirements specified in the `submit_training_job` method.

In [4]:
# Regular training with experiment tracking
trainer.submit_training_job(
    script_path="train.py",
    args={
        "data-folder": "training/",
        "model-folder": "models/",
        "learning_rate": 1e-4
    },
    experiment_name="bert-fine-tuning",  # Matches the experiment created in section 2
    run_name="run-001"
)

Alternatively, we can use hyperparameter tuning to find the best parameters for our model using the `submit_hyperparameter_tuning_job` method.

In [None]:
# Hyperparameter tuning
trainer.submit_hyperparameter_tuning_job(
    script_path="train.py",
    metric_id="accuracy",
    parameter_specs={
        "learning_rate": {"min": 1e-5, "max": 1e-3, "scale": "log"},
        "batch_size": {"min": 16, "max": 128, "scale": "linear"}
    },
    max_trials=10
)
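
To see what the `"scale"` field changes, the sketch below draws random values from a spec on a log scale and on a linear scale. It only illustrates the scaling idea: Vertex AI chooses trial values with its own search strategy, and `sample_parameter` is a hypothetical helper, not part of `ModelTrainer`. Values are returned as floats, so an integer parameter such as `batch_size` would still need rounding.

```python
import math
import random
from typing import Dict


def sample_parameter(spec: Dict[str, object], rng: random.Random) -> float:
    """Draw one value from a {"min", "max", "scale"} spec (illustrative only)."""
    low, high = float(spec["min"]), float(spec["max"])
    if spec["scale"] == "log":
        # Sample uniformly in log space, then map back
        value = math.exp(rng.uniform(math.log(low), math.log(high)))
    else:
        value = rng.uniform(low, high)
    # Clamp to guard against floating-point drift at the bounds
    return min(max(value, low), high)


rng = random.Random(0)
log_spec = {"min": 1e-5, "max": 1e-3, "scale": "log"}
linear_spec = {"min": 1e-5, "max": 1e-3, "scale": "linear"}

log_samples = [sample_parameter(log_spec, rng) for _ in range(1000)]
linear_samples = [sample_parameter(linear_spec, rng) for _ in range(1000)]

# Share of samples in the lower decade of the range (below 1e-4)
log_share = sum(v < 1e-4 for v in log_samples) / len(log_samples)
linear_share = sum(v < 1e-4 for v in linear_samples) / len(linear_samples)
print(f"log scale: {log_share:.2f}, linear scale: {linear_share:.2f}")
```

A log scale spreads samples evenly across orders of magnitude, which is why it is the usual choice for learning rates, while a linear scale concentrates almost all samples near the upper end of a range like this one.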