# Train GPT-OSS 20B → Rust Coding Agent (v2)

End-to-end pipeline for training a Rust coding agent on OpenAI's GPT-OSS 20B (MoE, ~3.6B active params).

**v2 Optimisations** (see `docs/V2_OPTIMIZATION_PLAN.md`):
- **Split LoRA** — 7-12x faster MoE training via reordered LoRA computation
- **FP8 RL** — 1.6x throughput, 60% less VRAM on H100 (auto-fallback to 4-bit on A100)
- **GRPO long context** — Chunked batching enables 65K+ context (up from 32K)
- **Flex Attention** — 8x longer sequences with attention sinks
- **Auto packing** — 3x faster SFT with uncontaminated packing (zero-config)
- **Expert monitoring** — Routing utilisation tracking across all phases
- **QAT export** — 97-100% MXFP4 quality retention (vs 59-89% with PTQ)

**4-Phase Pipeline:**
1. **Lang Adapter** — Rust domain specialisation via QLoRA (scripts 13 + 19)
2. **Core Agent SFT** — Agent trajectory training with tool use (script 14)
3. **IPO Preference** — Identity Preference Optimisation on ranked pairs (script 17)
4. **GRPO RL** — Group Relative Policy Optimisation with execution rewards (script 18)

**Requirements:**
- **GPU**: A100 40GB+ (H100 80GB recommended for FP8 + extended context)
- **Storage**: Google Drive for persistent checkpoints
- **Rust toolchain**: Installed automatically (rustup + cargo-mutants)

---
## Step 0: Environment Setup

### 0.1 Mount Google Drive & Clone Repository

**PyCharm / headless users:** If `drive.mount()` doesn't work (e.g. the PyCharm Colab
plugin can't relay the OAuth popup), set `use_service_account = True` and provide
your service-account JSON key path in Step 0.3.


In [1]:
import os
import sys

IN_COLAB = 'google.colab' in sys.modules

use_service_account = True

DRIVE_MOUNTED = False

if IN_COLAB and not use_service_account:
    try:
        from google.colab import drive
        drive.mount('/content/drive')
        DRIVE_MOUNTED = True
        print("Google Drive mounted")
    except Exception as e:
        print(f"drive.mount() failed: {e}")
        print("Falling back to local-only mode.")
        print("Tip: set use_service_account=True and provide a JSON key in Step 0.3.")
elif IN_COLAB and use_service_account:
    print("Service-account mode selected — skipping drive.mount()")
    print("Configure credentials in Step 0.3.")
else:
    print("Running locally")

REPO_URL = "https://github.com/rmarnold/llm-training-pipeline.git"
BRANCH = "main"

REPO_DIR = "/content/llm-training-pipeline"

if IN_COLAB:
    if os.path.exists(REPO_DIR):
        %cd {REPO_DIR}
        !git pull origin {BRANCH}
    else:
        !git clone -b {BRANCH} {REPO_URL} {REPO_DIR}
        %cd {REPO_DIR}

    PROJECT_ROOT = REPO_DIR
else:
    PROJECT_ROOT = os.getcwd()

os.chdir(PROJECT_ROOT)
print(f"\nProject root: {PROJECT_ROOT}")


Service-account mode selected — skipping drive.mount()
Configure credentials in Step 0.3.
/content/llm-training-pipeline
remote: Enumerating objects: 15, done.
remote: Counting objects: 100% (15/15), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 8 (delta 7), reused 8 (delta 7), pack-reused 0 (from 0)
Unpacking objects: 100% (8/8), 867 bytes | 867.00 KiB/s, done.
From https://github.com/rmarnold/llm-training-pipeline
 * branch            main       -> FETCH_HEAD
   5c0647e..14453f8  main       -> origin/main
Updating 5c0647e..14453f8
Fast-forward
 scripts/pipeline_lib/text_cleaning/__init__.py | 10 +++++-----
 scripts/pipeline_lib/text_cleaning/toxicity.py |  2 +-
 scripts/pipeline_lib/training_callbacks.py     |  2 +-
 3 files changed, 7 insertions(+), 7 deletions(-)

Project root: /content/llm-training-pipeline


### 0.2 Install Dependencies

Installs pipeline deps, latest Unsloth (with Split LoRA + FP8 RL), and the Rust toolchain.

**Note:** flash-attn is intentionally NOT installed. FA3 is incompatible with GPT-OSS
backward passes (incorrect training loss). Unsloth's Flex Attention replaces it
automatically — no compilation step needed.
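
To confirm that no stray flash-attn build is present, a metadata lookup is enough and avoids importing the package. A minimal sketch, assuming the distribution name is `flash-attn` (its name on PyPI); `is_installed` is a hypothetical helper, not part of the pipeline:

```python
from importlib.metadata import PackageNotFoundError, version


def is_installed(dist_name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        version(dist_name)
        return True
    except PackageNotFoundError:
        return False


# Assumption: the package is published under the name "flash-attn".
if is_installed("flash-attn"):
    print("flash-attn is installed (not expected for this pipeline)")
else:
    print("flash-attn is not installed (expected)")
```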


In [2]:
if IN_COLAB:
    print("Installing Python dependencies...")
    print("=" * 60)
    !pip install -q -e ".[gpt_oss,rust_eval,colab]"

    # Fix pyarrow binary incompatibility with datasets 4.x on Colab
    # (Colab's pre-installed pyarrow C extension doesn't match the new header)
    !pip install -q --force-reinstall pyarrow

    # v2: Force latest Unsloth with Split LoRA + FP8 RL + GRPO long context
    # Flex Attention (bundled with Unsloth) replaces Flash Attention for GPT-OSS
    print("\nInstalling latest Unsloth (Split LoRA + FP8 RL + Flex Attention)...")
    !pip install -q --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
    !pip install -q "unsloth[colab-new]"

    # v2: vLLM for FP8 RL inference (H100 only, optional)
    # Quote the specifier: an unquoted >= is treated by the shell as a redirect
    !pip install -q "vllm>=0.12.0" 2>/dev/null || true

    print("\nInstalling Rust toolchain...")
    print("=" * 60)
    !curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    os.environ["PATH"] = f"{os.environ['HOME']}/.cargo/bin:{os.environ['PATH']}"
    !cargo install cargo-mutants

    # Verification — use importlib.metadata to check versions without importing
    # (importing unsloth triggers heavy CUDA init that can hang in a notebook cell)
    from importlib.metadata import version, PackageNotFoundError
    print("\n" + "=" * 60)
    print("Dependency Verification:")
    print("=" * 60)

    for pkg in ["unsloth", "trl", "peft", "datasets", "tiktoken", "vllm"]:
        try:
            ver = version(pkg)
            print(f"\u2713 {pkg}: {ver}")
        except PackageNotFoundError:
            if pkg == "vllm":
                print(f"\u2014 {pkg}: not installed (optional, H100 FP8 RL only)")
            else:
                print(f"\u2717 {pkg}: not installed")

    import subprocess
    for cmd, label in [("cargo --version", "cargo"), ("cargo mutants --version", "cargo-mutants")]:
        result = subprocess.run(cmd.split(), capture_output=True, text=True)
        if result.returncode == 0:
            print(f"\u2713 {label}: {result.stdout.strip()}")
        else:
            print(f"\u2717 {label}: not found")

    print("=" * 60)
else:
    print("Running locally \u2014 ensure deps are installed:")
    print("  pip install -e '.[gpt_oss,rust_eval]'")
    print("  pip install --upgrade unsloth unsloth_zoo")


Installing Python dependencies...
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
  Building editable for llm-training-pipeline (pyproject.toml) ... done
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.9.1 requires torch==2.9.1, but you have torch 2.10.0 which is incompatible.
vllm 0.15.1 requires torch==2.9.1, but you have torch 2.10.0 which is incompatible.
vllm 0.15.1 requires torchvision==0.24.1, but you have torchvision 0.25.0 which is incompatible.
fastai 2.8.6 requires torch<2.10,>=1.10, but you have torch 2.10.0 which is incompatible.
cuda-python 12.9.5 requires cuda-bindings~=12.9.5, but you have cuda-bindings 

### 0.3 Configure Pipeline

Edit the variables below to configure the training run.

**Training Scope** (`training_scope`):
- `full` — All 4 phases end-to-end
- `quick_test` — Short runs (100 steps each) to verify setup
- `lang_adapter_only` — Only train lang_rust adapter + merge
- `skip_to_rl` — Start from existing core_agent checkpoint (IPO + GRPO only)
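
The mapping from scope to executed phases can be sketched as a small function. This mirrors the skip conditions used by the training cells in this notebook; the phase labels and the helper name `phases_for_scope` are illustrative assumptions, not identifiers from the pipeline:

```python
def phases_for_scope(training_scope: str, include_grpo: bool = True) -> list[str]:
    """Return the training phases that run for a given scope (illustrative labels)."""
    all_phases = ["lang_adapter", "core_agent_sft", "ipo", "grpo"]
    if training_scope in ("full", "quick_test"):
        phases = list(all_phases)
    elif training_scope == "lang_adapter_only":
        phases = ["lang_adapter"]
    elif training_scope == "skip_to_rl":
        phases = ["ipo", "grpo"]
    else:
        raise ValueError(f"Unknown training_scope: {training_scope!r}")
    # include_grpo=False drops the RL phase regardless of scope
    if not include_grpo:
        phases = [p for p in phases if p != "grpo"]
    return phases


for scope in ("full", "quick_test", "lang_adapter_only", "skip_to_rl"):
    print(f"{scope}: {phases_for_scope(scope)}")
```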

**Other settings:**
- `gpu_tier` — Initial tier; Step 0.5 auto-detects the GPU and overrides this value if it differs
- `max_steps_override` — Set >0 to cap all training stages (0 = use defaults)
- `skip_data_generation` — Use pre-generated data from Drive
- `include_grpo` — GRPO RL is slow; set `False` to skip
- `enable_qat_export` — v2: QAT for MXFP4 export (97-100% quality vs 59-89% PTQ)
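
How the step count for a phase is resolved can be sketched as follows. This reproduces the `max_steps_override or default` expression used in the config cell; the function name `resolve_max_steps` is a hypothetical helper introduced for illustration:

```python
def resolve_max_steps(tier_default: int, max_steps_override: int = 0,
                      training_scope: str = "full") -> int:
    """Return the number of steps a phase will train for."""
    # quick_test forces a 100-step override, as in the config cell
    if training_scope == "quick_test":
        max_steps_override = 100
    # 0 is falsy, so `or` falls back to the tier default
    return max_steps_override or tier_default


print(resolve_max_steps(3000))                                # tier default
print(resolve_max_steps(3000, max_steps_override=500))        # explicit override
print(resolve_max_steps(3000, training_scope="quick_test"))   # quick test
```

Note that with this expression a non-zero override replaces the tier default even when it is larger, so it behaves as a fixed value rather than a strict upper bound.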

**Service Account** (PyCharm / headless):
- `service_account_key` — Path to JSON key on the VM, or leave empty to upload
- `drive_folder_id` — Google Drive folder ID for `gpt-oss-20b-rust-agent-v2`

> **Note:** The key file must exist on the Colab VM, not your local machine.
> If the path doesn't exist, the cell will prompt you to upload via `files.upload()`.
> For PyCharm plugin (where `files.upload()` can't show a dialog), manually upload
> the key to `/content/service_account.json` before running this cell.
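
The storage mode is derived from these settings. A minimal sketch of that decision, restating the logic from the cell below as a standalone function (the name `resolve_drive_mode` is an assumption for illustration):

```python
def resolve_drive_mode(drive_mounted: bool, use_service_account: bool,
                       service_account_key: str, drive_folder_id: str) -> str:
    """Pick the storage mode: 'mounted', 'service_account', or 'local'."""
    if drive_mounted:
        return "mounted"
    # Service-account mode needs BOTH a key path and a Drive folder ID
    if use_service_account and service_account_key and drive_folder_id:
        return "service_account"
    return "local"


print(resolve_drive_mode(False, True, "/content/service_account.json", ""))
```

With a key but an empty `drive_folder_id`, the result is `local`, which means checkpoints are not backed up to Drive.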


In [None]:
training_scope = "quick_test"  # "full", "quick_test", "lang_adapter_only", "skip_to_rl"

gpu_tier = "h100_80gb"  # "a100_40gb", "a100_80gb", "h100_80gb"

max_steps_override = 0  # Set >0 to cap all stages (0 = use defaults)

skip_data_generation = False  # True to use pre-generated data from Drive

include_grpo = True  # False to skip GRPO RL (slow)

enable_qat_export = False  # True for MXFP4 QAT export


service_account_key = ""  # Path to service-account JSON key (PyCharm/headless)

drive_folder_id = ""  # Google Drive folder ID for gpt-oss-20b-rust-agent-v2

# ============================================================
# SERVICE ACCOUNT KEY — upload to VM if needed
# ============================================================
# The key file lives on your local machine but the notebook runs on a
# remote Colab VM.  This block gets the key onto the VM.

_SA_VM_PATH = "/content/service_account.json"

if use_service_account and not service_account_key:
    # No path specified — check if a previous upload exists on the VM
    if os.path.exists(_SA_VM_PATH):
        service_account_key = _SA_VM_PATH
        print(f"Using previously uploaded key: {_SA_VM_PATH}")
    elif IN_COLAB:
        # Upload via browser (works in standard Colab, not PyCharm plugin)
        try:
            from google.colab import files as _files
            print("Upload your service-account JSON key file:")
            _uploaded = _files.upload()
            if _uploaded:
                _name = list(_uploaded.keys())[0]
                with open(_SA_VM_PATH, "wb") as _f:
                    _f.write(_uploaded[_name])
                service_account_key = _SA_VM_PATH
                print(f"Saved to {_SA_VM_PATH}")
            else:
                print("No file uploaded — falling back to local mode (no Drive backup).")
        except Exception as _e:
            print(f"files.upload() failed: {_e}")
            print("For PyCharm/headless: upload the key manually to the VM at")
            print("/content/service_account.json, then re-run this cell.")
    else:
        print("Running locally — set service_account_key to your JSON key path.")

elif use_service_account and service_account_key:
    if not os.path.exists(service_account_key):
        # Path doesn't exist on the VM — it's probably a local machine path
        if os.path.exists(_SA_VM_PATH):
            print(f"Key not found at {service_account_key}")
            print(f"Using previously uploaded key: {_SA_VM_PATH}")
            service_account_key = _SA_VM_PATH
        elif IN_COLAB:
            print(f"Key not found at {service_account_key} (local path?)")
            print(f"Upload it to the Colab VM:")
            try:
                from google.colab import files as _files
                _uploaded = _files.upload()
                if _uploaded:
                    _name = list(_uploaded.keys())[0]
                    with open(_SA_VM_PATH, "wb") as _f:
                        _f.write(_uploaded[_name])
                    service_account_key = _SA_VM_PATH
                    print(f"Saved to {_SA_VM_PATH}")
            except Exception as _e:
                print(f"files.upload() failed: {_e}")
                print("Upload manually to /content/service_account.json and re-run.")
                service_account_key = ""

# ============================================================
# DRIVE MODE
# ============================================================
from scripts.pipeline_lib.drive_utils import DriveHelper

DRIVE_BASE = "/content/drive/MyDrive/gpt-oss-20b-rust-agent-v2"

if DRIVE_MOUNTED:
    DRIVE_MODE = "mounted"
elif use_service_account and service_account_key and drive_folder_id:
    DRIVE_MODE = "service_account"
else:
    DRIVE_MODE = "local"

drive_helper = DriveHelper(
    mode=DRIVE_MODE,
    drive_base=DRIVE_BASE,
    credentials_path=service_account_key or None,
    folder_id=drive_folder_id or None,
)

# ============================================================
# v2 GPU TIER CONFIGS (with H100 FP8 tier)
# ============================================================

GPU_CONFIGS = {
    "a100_40gb": {
        "moe_backend": "unsloth_triton",
        "load_mode": "4bit",
        "fast_inference": False,
        "lang_rust": {"batch": 1, "grad_accum": 8, "seq_len": 8192, "max_steps": 3000},
        "core_agent": {"batch": 1, "grad_accum": 4, "seq_len": 12288, "max_steps": 2000},
        "ipo": {"batch": 1, "grad_accum": 8, "seq_len": 12288, "max_steps": 1000},
        "grpo": {"batch": 1, "grad_accum": 4, "seq_len": 16384, "max_steps": 2000, "num_gen": 2},
    },
    "a100_80gb": {
        "moe_backend": "unsloth_triton",
        "load_mode": "4bit",
        "fast_inference": False,
        "lang_rust": {"batch": 1, "grad_accum": 8, "seq_len": 8192, "max_steps": 5000},
        "core_agent": {"batch": 1, "grad_accum": 4, "seq_len": 16384, "max_steps": 3000},
        "ipo": {"batch": 1, "grad_accum": 16, "seq_len": 16384, "max_steps": 2000},
        "grpo": {"batch": 1, "grad_accum": 8, "seq_len": 32768, "max_steps": 5000, "num_gen": 4},
    },
    "h100_80gb": {
        "moe_backend": "grouped_mm",
        "load_mode": "fp8",
        "fast_inference": True,
        "lang_rust": {"batch": 2, "grad_accum": 4, "seq_len": 8192, "max_steps": 5000},
        "core_agent": {"batch": 1, "grad_accum": 4, "seq_len": 16384, "max_steps": 3000},
        "ipo": {"batch": 1, "grad_accum": 16, "seq_len": 16384, "max_steps": 2000},
        "grpo": {"batch": 1, "grad_accum": 8, "seq_len": 65536, "max_steps": 7000, "num_gen": 4},
    },
}

# Quick test overrides
if training_scope == "quick_test":
    max_steps_override = 100

gpu_cfg = GPU_CONFIGS[gpu_tier]

# Detect CPU count and RAM for parallel mutation jobs.
# Each cargo-mutants worker spawns cargo build/test subprocesses that can
# each use 1-2 GB RAM.  Cap jobs to avoid OOM kills on Colab instances.
import multiprocessing
import os as _os
cpu_count = multiprocessing.cpu_count()

# RAM-aware job limit: allow ~20 GB per mutation worker for headroom
try:
    _mem_bytes = _os.sysconf('SC_PAGE_SIZE') * _os.sysconf('SC_PHYS_PAGES')
    total_ram_gb = _mem_bytes / (1024**3)
    ram_based_jobs = max(1, int(total_ram_gb / 20))
except (ValueError, OSError):
    total_ram_gb = 0
    ram_based_jobs = cpu_count

mutation_jobs = min(max(1, cpu_count - 2), ram_based_jobs)

# Build CONFIG dict
CONFIG = {
    "training_scope": training_scope,
    "gpu_tier": gpu_tier,
    "include_grpo": include_grpo,
    "skip_data_generation": skip_data_generation,
    "enable_qat_export": enable_qat_export,
    # v2: MoE backend + load mode
    "moe_backend": gpu_cfg["moe_backend"],
    "load_mode": gpu_cfg["load_mode"],
    "fast_inference": gpu_cfg["fast_inference"],
    # Lang adapter
    "lang_rust_batch": gpu_cfg["lang_rust"]["batch"],
    "lang_rust_grad_accum": gpu_cfg["lang_rust"]["grad_accum"],
    "lang_rust_seq_len": gpu_cfg["lang_rust"]["seq_len"],
    "lang_rust_max_steps": max_steps_override or gpu_cfg["lang_rust"]["max_steps"],
    # Core agent
    "core_agent_batch": gpu_cfg["core_agent"]["batch"],
    "core_agent_grad_accum": gpu_cfg["core_agent"]["grad_accum"],
    "core_agent_seq_len": gpu_cfg["core_agent"]["seq_len"],
    "core_agent_max_steps": max_steps_override or gpu_cfg["core_agent"]["max_steps"],
    # IPO
    "ipo_batch": gpu_cfg["ipo"]["batch"],
    "ipo_grad_accum": gpu_cfg["ipo"]["grad_accum"],
    "ipo_seq_len": gpu_cfg["ipo"]["seq_len"],
    "ipo_max_steps": max_steps_override or gpu_cfg["ipo"]["max_steps"],
    # GRPO
    "grpo_batch": gpu_cfg["grpo"]["batch"],
    "grpo_grad_accum": gpu_cfg["grpo"]["grad_accum"],
    "grpo_seq_len": gpu_cfg["grpo"]["seq_len"],
    "grpo_max_steps": max_steps_override or gpu_cfg["grpo"]["max_steps"],
    "grpo_num_gen": gpu_cfg["grpo"]["num_gen"],
    # Mutation generation — balance CPU parallelism with RAM headroom
    "max_mutations_per_repo": 50 if training_scope == "quick_test" else 100,
    "mutation_jobs": mutation_jobs,
    # Eval
    "eval_num_samples": 10 if training_scope == "quick_test" else 50,
}

print("=" * 60)
print("PIPELINE CONFIGURATION (v2)")
print("=" * 60)
print(f"\nScope: {training_scope.upper()}")
print(f"GPU tier: {gpu_tier}")
ram_str = f" | RAM: {total_ram_gb:.0f} GB" if total_ram_gb else ""
print(f"CPUs: {cpu_count}{ram_str} (mutation jobs: {CONFIG['mutation_jobs']})")
print(f"MoE backend: {CONFIG['moe_backend']}")
print(f"Load mode: {CONFIG['load_mode']}")
print(f"Fast inference (vLLM): {CONFIG['fast_inference']}")
print(f"Include GRPO: {include_grpo}")
print(f"QAT export: {enable_qat_export}")
print(f"Skip data gen: {skip_data_generation}")
print(f"Drive mode: {DRIVE_MODE}")
if max_steps_override:
    print(f"Max steps override: {max_steps_override}")
print(f"\nLang Adapter:  batch={CONFIG['lang_rust_batch']} x grad_accum={CONFIG['lang_rust_grad_accum']}, seq={CONFIG['lang_rust_seq_len']}, steps={CONFIG['lang_rust_max_steps']}")
print(f"Core Agent:    batch={CONFIG['core_agent_batch']} x grad_accum={CONFIG['core_agent_grad_accum']}, seq={CONFIG['core_agent_seq_len']}, steps={CONFIG['core_agent_max_steps']}")
print(f"IPO:           batch={CONFIG['ipo_batch']} x grad_accum={CONFIG['ipo_grad_accum']}, seq={CONFIG['ipo_seq_len']}, steps={CONFIG['ipo_max_steps']}")
if include_grpo:
    print(f"GRPO:          batch={CONFIG['grpo_batch']} x grad_accum={CONFIG['grpo_grad_accum']}, seq={CONFIG['grpo_seq_len']}, steps={CONFIG['grpo_max_steps']}, gen={CONFIG['grpo_num_gen']}")
print("=" * 60)


### 0.4 Set Up Persistent Storage


In [4]:
DRIVE_SUBDIRS = [
    "checkpoints/lang_rust",
    "checkpoints/core_agent",
    "checkpoints/core_agent_ipo",
    "checkpoints/core_agent_grpo",
    "checkpoints/gpt-oss-20b-rust-merged",
    "data/rust/lang_rust",
    "data/rust/core_agent",
    "data/rust/mutations",
    "data/rust/ipo",
    "data/rust/grpo",
    "data/rust/eval",
    "data/rust/repos",
    "logs",
    "evals/rust_agent",
]

if DRIVE_MODE == "mounted":
    # Mounted mode: create Drive dirs + symlink local → Drive (original behaviour)
    print(f"Setting up storage at: {DRIVE_BASE}")
    for subdir in DRIVE_SUBDIRS:
        os.makedirs(os.path.join(DRIVE_BASE, subdir), exist_ok=True)

    for dir_name in ["checkpoints", "data", "logs", "evals"]:
        local_path = os.path.join(PROJECT_ROOT, dir_name)
        drive_path = os.path.join(DRIVE_BASE, dir_name)

        if os.path.exists(local_path) and not os.path.islink(local_path):
            !cp -r {local_path}/* {drive_path}/ 2>/dev/null || true
            !rm -rf {local_path}
        elif os.path.islink(local_path):
            os.unlink(local_path)

        os.symlink(drive_path, local_path)
        print(f"  {dir_name} -> Drive (mounted)")

elif DRIVE_MODE == "service_account":
    # Service-account mode: create local dirs, restore existing data from Drive
    print("Setting up local storage + Drive API restore...")
    for subdir in DRIVE_SUBDIRS:
        os.makedirs(os.path.join(PROJECT_ROOT, subdir), exist_ok=True)
        drive_helper.ensure_dir(subdir)

    for dir_name in ["checkpoints", "data", "logs", "evals"]:
        local_path = os.path.join(PROJECT_ROOT, dir_name)
        # Remove stale symlinks from previous mounted runs
        if os.path.islink(local_path):
            os.unlink(local_path)
            os.makedirs(local_path, exist_ok=True)
        print(f"  {dir_name} -> local (backed up via Drive API)")

    print("\nRestoring existing data from Drive...")
    for subdir in DRIVE_SUBDIRS:
        local_target = os.path.join(PROJECT_ROOT, subdir)
        drive_helper.restore(subdir, local_target)
    print("Restore complete.")

else:
    # Local-only mode — no Drive
    for d in ["checkpoints", "data/rust", "logs", "evals/rust_agent"]:
        os.makedirs(d, exist_ok=True)
    print("Local directories created (no Drive backup).")

print("\nStorage ready!")


Local directories created (no Drive backup).

Storage ready!


### 0.5 Check GPU & Configure MoE Backend

v2: Auto-detects H100 for FP8 RL and sets the optimal Split LoRA backend.
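
The detection rules used in the cell below can be expressed as pure functions, which makes them easy to check without a GPU. A sketch that restates those rules; `detect_tier` and `fp8_available` are hypothetical helper names:

```python
HOPPER_OR_NEWER_TAGS = ("H100", "H200", "B200")


def detect_tier(gpu_name: str, gpu_memory_gb: float) -> str:
    """Map a device name and its VRAM (in GiB) to a GPU_CONFIGS tier key."""
    if any(tag in gpu_name for tag in HOPPER_OR_NEWER_TAGS):
        return "h100_80gb"
    if gpu_memory_gb >= 70:
        return "a100_80gb"
    return "a100_40gb"


def fp8_available(gpu_name: str, capability_major: int) -> bool:
    """FP8 is enabled only for compute capability >= 9 on a recognised device name."""
    return capability_major >= 9 and any(tag in gpu_name for tag in HOPPER_OR_NEWER_TAGS)


print(detect_tier("NVIDIA H100 80GB HBM3", 79.0), fp8_available("NVIDIA H100 80GB HBM3", 9))
```

Any GPU that is not matched by name and has less than 70 GB falls through to `a100_40gb`, so smaller cards also receive that tier's settings.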


In [5]:
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
    capability = torch.cuda.get_device_capability()
    is_h100 = "H100" in gpu_name or "H200" in gpu_name or "B200" in gpu_name

    CONFIG["use_fp8"] = capability[0] >= 9 and is_h100

    # v2: Auto-detect GPU tier (now includes H100)
    if is_h100:
        detected_tier = "h100_80gb"
    elif gpu_memory >= 70:
        detected_tier = "a100_80gb"
    else:
        detected_tier = "a100_40gb"

    if detected_tier != CONFIG["gpu_tier"]:
        print(f"NOTE: Auto-detected {detected_tier}, overriding configured {CONFIG['gpu_tier']}")
        CONFIG["gpu_tier"] = detected_tier
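        # Re-derive tier-specific settings, including per-phase batch/seq/steps
        gpu_cfg = GPU_CONFIGS[detected_tier]
        CONFIG["moe_backend"] = gpu_cfg["moe_backend"]
        CONFIG["load_mode"] = gpu_cfg["load_mode"]
        CONFIG["fast_inference"] = gpu_cfg["fast_inference"]
        for _phase in ("lang_rust", "core_agent", "ipo", "grpo"):
            for _key, _val in gpu_cfg[_phase].items():
                if _key == "max_steps":
                    _val = max_steps_override or _val
                CONFIG[f"{_phase}_{_key}"] = _val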

    # v2: Set Split LoRA MoE backend
    os.environ["UNSLOTH_MOE_BACKEND"] = CONFIG["moe_backend"]

    print("=" * 60)
    print(f"GPU: {gpu_name} ({gpu_memory:.0f} GB)")
    print(f"Compute capability: {capability[0]}.{capability[1]}")
    print(f"Tier: {CONFIG['gpu_tier']}")
    print(f"\nv2 Optimisations:")
    print(f"  Split LoRA backend: {CONFIG['moe_backend']}")
    print(f"  Load mode: {CONFIG['load_mode']}")
    print(f"  FP8 available: {CONFIG['use_fp8']}")
    print(f"  Fast inference (vLLM): {CONFIG['fast_inference']}")

    if gpu_memory < 40:
        print("\nWARNING: <40 GB VRAM. Long-context training (16K+) may OOM.")
    print("=" * 60)
else:
    print("No GPU detected!")
    CONFIG["use_fp8"] = False
    os.environ["UNSLOTH_MOE_BACKEND"] = "native_torch"

GPU: NVIDIA H100 80GB HBM3 (79 GB)
Compute capability: 9.0
Tier: h100_80gb

v2 Optimisations:
  Split LoRA backend: grouped_mm
  Load mode: fp8
  FP8 available: True
  Fast inference (vLLM): True


---
## Step 1: Data Generation

Generates mutation data from curated Rust repos and agent trajectories.
Skip this step if you have pre-generated data on Drive (`skip_data_generation=True`).

### 1.1 Generate Mutation Data

Runs `cargo-mutants` on curated Rust repos to produce bug-fix training pairs.
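
The `--jobs` value passed to the script comes from the CPU- and RAM-aware limit computed in Step 0.3. That calculation can be sketched as a standalone function; `mutation_jobs` and its `gb_per_worker` parameter are illustrative names, with the 20 GB default taken from the config cell:

```python
def mutation_jobs(cpu_count: int, total_ram_gb: float, gb_per_worker: float = 20.0) -> int:
    """Parallel cargo-mutants jobs, bounded by both CPU count and RAM."""
    ram_based = max(1, int(total_ram_gb / gb_per_worker))
    cpu_based = max(1, cpu_count - 2)  # leave two cores for the notebook itself
    return min(cpu_based, ram_based)


print(mutation_jobs(cpu_count=8, total_ram_gb=50))    # RAM-bound
print(mutation_jobs(cpu_count=4, total_ram_gb=200))   # CPU-bound
```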


In [None]:
if CONFIG["skip_data_generation"]:
    print("Skipping data generation (using pre-generated data from Drive)")
elif CONFIG["training_scope"] in ("skip_to_rl",):
    print("Skipping — not needed for this training scope")
else:
    max_muts = CONFIG["max_mutations_per_repo"]
    jobs = CONFIG["mutation_jobs"]

    print(f"Generating mutations (max {max_muts}/repo, {jobs} parallel jobs)...")
    print("=" * 60)

    !python scripts/16_generate_mutations.py \
        --max_mutations_per_repo {max_muts} \
        --jobs {jobs}

    drive_helper.backup("data/rust/mutations", "data/rust/mutations")
    if DRIVE_MODE != "local":
        print("\nBacked up mutations to Drive.")


Generating mutations (max 50/repo, 11 parallel jobs)...

Generating Mutation Training Data

Loaded 21 repos

[1/21] Processing https://github.com/BurntSushi/bstr...
  Cloned https://github.com/BurntSushi/bstr -> /tmp/rust_repos/bstr
  Running: cargo mutants --timeout 300 --jobs 11 --output /tmp/mutants_vcspirs4 --json


### 1.2 Generate Agent Trajectories

Generates multi-turn agent trajectories from mutations + Strandset in Harmony format.


In [None]:
if CONFIG["skip_data_generation"]:
    print("Skipping data generation (using pre-generated data from Drive)")
elif CONFIG["training_scope"] in ("skip_to_rl",):
    print("Skipping — not needed for this training scope")
else:
    max_samples = 500 if CONFIG["training_scope"] == "quick_test" else 5000

    print(f"Generating trajectories (max {max_samples} per source)...")
    print("=" * 60)

    cmd = f"python scripts/15_generate_trajectories.py --max_samples {max_samples}"

    mutations_path = "data/rust/mutations/mutations.jsonl"
    if os.path.exists(mutations_path):
        cmd += f" --mutations_path {mutations_path}"

    !{cmd}

    drive_helper.backup("data/rust/core_agent", "data/rust/core_agent")
    if DRIVE_MODE != "local":
        print("\nBacked up trajectories to Drive.")


### 1.3 Verify Data
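
Beyond checking that paths exist, it can help to confirm that a generated file parses. A minimal sketch, assuming `data/rust/mutations/mutations.jsonl` holds one JSON object per line (implied by the extension but not confirmed here); `count_jsonl_records` is a hypothetical helper:

```python
import json
from pathlib import Path


def count_jsonl_records(path) -> tuple[int, int]:
    """Return (valid, invalid) counts of non-empty lines in a JSONL file."""
    valid = invalid = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                json.loads(line)
                valid += 1
            except json.JSONDecodeError:
                invalid += 1
    return valid, invalid


mutations_path = Path("data/rust/mutations/mutations.jsonl")
if mutations_path.exists():
    ok, bad = count_jsonl_records(mutations_path)
    print(f"{mutations_path}: {ok} valid records, {bad} unparseable lines")
else:
    print(f"{mutations_path} not found")
```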


In [None]:
data_checks = [
    ("Mutations", "data/rust/mutations"),
    ("Lang Rust train", "data/rust/lang_rust/train"),
    ("Core Agent train", "data/rust/core_agent/train"),
    ("IPO train", "data/rust/ipo/train"),
    ("GRPO tasks", "data/rust/grpo"),
    ("Eval tasks", "data/rust/eval"),
]

print("Data Verification:")
print("=" * 60)
for name, path in data_checks:
    exists = os.path.exists(path)
    if exists and os.path.isdir(path):
        items = os.listdir(path)
        print(f"  \u2713 {name}: {path} ({len(items)} items)")
    elif exists:
        size_mb = os.path.getsize(path) / (1024 * 1024)
        print(f"  \u2713 {name}: {path} ({size_mb:.1f} MB)")
    else:
        needed = True
        if CONFIG["training_scope"] == "skip_to_rl" and name in ("Mutations", "Lang Rust train", "Core Agent train"):
            needed = False
        if CONFIG["training_scope"] == "lang_adapter_only" and name in ("IPO train", "GRPO tasks"):
            needed = False
        sym = "\u2717" if needed else "\u2014"
        label = "MISSING" if needed else "not needed"
        print(f"  {sym} {name}: {label}")
print("=" * 60)

---
## Step 2: Lang Adapter Training

Train a QLoRA adapter (rank 64) to specialise GPT-OSS 20B on Rust syntax, stdlib, and idioms.
Then merge the adapter into the base weights for downstream training.

**v2:** Split LoRA backend auto-enabled for 7-12x faster MoE training.
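
The batch settings printed by the training cells combine into an effective batch size per optimizer step. A short sketch of that arithmetic, using the `lang_rust` values from `GPU_CONFIGS` in Step 0.3; the token figure is an upper bound that assumes every sequence is filled to `seq_len`:

```python
def effective_batch(batch: int, grad_accum: int) -> int:
    """Sequences contributing to one optimizer step on a single device."""
    return batch * grad_accum


def max_tokens_per_step(batch: int, grad_accum: int, seq_len: int) -> int:
    """Upper bound on tokens per optimizer step (all sequences at full length)."""
    return effective_batch(batch, grad_accum) * seq_len


# lang_rust settings copied from GPU_CONFIGS
lang_rust = {
    "a100_40gb": {"batch": 1, "grad_accum": 8, "seq_len": 8192},
    "h100_80gb": {"batch": 2, "grad_accum": 4, "seq_len": 8192},
}
for tier, cfg in lang_rust.items():
    print(tier, effective_batch(cfg["batch"], cfg["grad_accum"]),
          max_tokens_per_step(cfg["batch"], cfg["grad_accum"], cfg["seq_len"]))
```

Both tiers reach the same effective batch of 8 sequences; the H100 tier gets there with a larger per-device batch and fewer accumulation steps.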

### 2.1 Train lang_rust Adapter

v2: Split LoRA enabled via UNSLOTH_MOE_BACKEND env var (set in 0.5).


In [None]:
if CONFIG["training_scope"] == "skip_to_rl":
    print("Skipping — scope is skip_to_rl")
else:
    batch = CONFIG["lang_rust_batch"]
    grad_accum = CONFIG["lang_rust_grad_accum"]
    max_steps = CONFIG["lang_rust_max_steps"]
    seq_len = CONFIG["lang_rust_seq_len"]

    cmd = f"python scripts/13_train_lang_adapter.py"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"

    print(f"Training lang_rust adapter...")
    print(f"  Batch: {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Max steps: {max_steps}")
    print(f"  Seq length: {seq_len} (from config)")
    print(f"  Split LoRA backend: {CONFIG['moe_backend']}")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/lang_rust", "checkpoints/lang_rust")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")


### 2.2 Merge lang_rust into Base


In [None]:
if CONFIG["training_scope"] == "skip_to_rl":
    print("Skipping — scope is skip_to_rl")
else:
    print("Merging lang_rust adapter into base model...")
    print("=" * 60)

    !python scripts/19_merge_adapter.py \
        --adapter_path checkpoints/lang_rust/final \
        --output_dir checkpoints/gpt-oss-20b-rust-merged \
        --export_formats hf

    drive_helper.backup("checkpoints/gpt-oss-20b-rust-merged", "checkpoints/gpt-oss-20b-rust-merged")
    if DRIVE_MODE != "local":
        print("\nMerged model backed up to Drive.")


### 2.3 Verify Merge


In [None]:
if CONFIG["training_scope"] == "skip_to_rl":
    print("Skipping \u2014 scope is skip_to_rl")
else:
    merged_path = "checkpoints/gpt-oss-20b-rust-merged"
    adapter_path = "checkpoints/lang_rust/final"

    print("Merge Verification:")
    print("=" * 60)

    if os.path.exists(merged_path):
        files = os.listdir(merged_path)
        safetensors = [f for f in files if f.endswith(".safetensors")]
        print(f"  \u2713 Merged model: {merged_path}")
        print(f"    {len(safetensors)} safetensors shard(s), {len(files)} total files")
    else:
        print(f"  \u2717 Merged model not found at {merged_path}")

    if os.path.exists(adapter_path):
        adapter_files = os.listdir(adapter_path)
        print(f"  \u2713 Adapter: {adapter_path} ({len(adapter_files)} files)")
    else:
        print(f"  \u2717 Adapter not found at {adapter_path}")

    if CONFIG["training_scope"] == "lang_adapter_only":
        print("\n\u2713 lang_adapter_only scope complete. Stopping here.")

    print("=" * 60)

---
## Step 3: Core Agent SFT

Train a higher-rank LoRA adapter (rank 128) on agent trajectories with tool use.
Uses the merged lang_rust model as the base.

**v2:** Auto uncontaminated packing (3x faster, zero-config). Flex Attention for long context.
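
To illustrate what packing does, the sketch below places several short examples into rows of at most `max_len` tokens and records a per-token segment id. It assumes that "uncontaminated" means tokens from different examples in the same row never attend to each other. This is a generic greedy first-fit illustration, not Unsloth's implementation, and all names are hypothetical:

```python
def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Greedy first-fit: group example indices into rows with total length <= max_len."""
    rows: list[list[int]] = []
    free: list[int] = []  # remaining capacity of each row
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"example {idx} ({n} tokens) exceeds max_len={max_len}")
        for r, cap in enumerate(free):
            if n <= cap:
                rows[r].append(idx)
                free[r] -= n
                break
        else:
            # No existing row has room: open a new one
            rows.append([idx])
            free.append(max_len - n)
    return rows


def segment_ids(row: list[int], lengths: list[int]) -> list[int]:
    """Per-token segment id for one packed row; attention is allowed only within equal ids."""
    ids: list[int] = []
    for seg, idx in enumerate(row):
        ids.extend([seg] * lengths[idx])
    return ids


lengths = [5, 3, 4, 2]
rows = pack_sequences(lengths, max_len=8)
print(rows)                           # which examples share a row
print(segment_ids(rows[0], lengths))  # segment boundaries inside the first row
```

A mask built from the segment ids (a token attends to another only if their ids match, in addition to the causal constraint) keeps examples in a row independent while avoiding padding.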

### 3.1 Train core_agent Adapter

v2: Auto packing (3x faster) + Split LoRA backend enabled.


In [None]:
if CONFIG["training_scope"] in ("lang_adapter_only", "skip_to_rl"):
    print(f"Skipping — scope is {CONFIG['training_scope']}")
else:
    batch = CONFIG["core_agent_batch"]
    grad_accum = CONFIG["core_agent_grad_accum"]
    max_steps = CONFIG["core_agent_max_steps"]
    seq_len = CONFIG["core_agent_seq_len"]

    cmd = f"python scripts/14_train_core_agent.py"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"

    print(f"Training core_agent adapter...")
    print(f"  Batch: {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Max steps: {max_steps}")
    print(f"  Seq length: {seq_len} (from config)")
    print(f"  LoRA rank: 128")
    print(f"  Split LoRA backend: {CONFIG['moe_backend']}")
    print(f"  Auto packing: enabled (uncontaminated)")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/core_agent", "checkpoints/core_agent")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")


### 3.2 Verify core_agent


In [None]:
if CONFIG["training_scope"] in ("lang_adapter_only", "skip_to_rl"):
    print(f"Skipping \u2014 scope is {CONFIG['training_scope']}")
else:
    ckpt_path = "checkpoints/core_agent/final"

    print("Core Agent Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  \u2713 Checkpoint: {ckpt_path} ({len(files)} files)")

        adapter_config = os.path.join(ckpt_path, "adapter_config.json")
        if os.path.exists(adapter_config):
            import json
            with open(adapter_config) as f:
                cfg = json.load(f)
            print(f"    LoRA rank: {cfg.get('r', '?')}")
            print(f"    Alpha: {cfg.get('lora_alpha', '?')}")
            print(f"    Target modules: {cfg.get('target_modules', '?')}")
    else:
        print(f"  \u2717 Checkpoint not found at {ckpt_path}")

    print("=" * 60)

---
## Step 4: Preference Optimisation (IPO)

Train with Identity Preference Optimisation on ranked pairs.
Uses a very low learning rate (5e-7) and a single epoch to avoid collapse.

**v2:** FP8 weights on H100 (60% less VRAM). Expert utilisation monitoring.
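
For reference, the IPO objective for a single preference pair can be sketched in a few lines. It regresses the policy-versus-reference log-ratio margin towards `1 / (2 * beta)`. The value `beta=0.1` matches the setting printed by the training cell; how script 17 aggregates log-probabilities over tokens is not shown here, so the inputs are assumed to be per-sequence log-probabilities:

```python
def ipo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Squared-error IPO loss for one (chosen, rejected) pair."""
    # How much more the policy prefers the chosen answer than the reference does
    margin = (policy_chosen_logp - policy_rejected_logp) - (ref_chosen_logp - ref_rejected_logp)
    target = 1.0 / (2.0 * beta)
    return (margin - target) ** 2


# Policy identical to the reference: margin is 0, loss is target squared
print(ipo_loss(-2.0, -2.0, -2.0, -2.0))
# Margin exactly at the target of 5: loss is zero
print(ipo_loss(-1.0, -6.0, -2.0, -2.0))
```

Because the loss is a squared distance to a fixed target, it penalises margins that overshoot as well as those that fall short, which is what limits drift from the reference model.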

### 4.1 Train with IPO

v2: FP8 on H100, expert utilisation monitoring, Split LoRA.


In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping — scope is lang_adapter_only")
else:
    batch = CONFIG["ipo_batch"]
    grad_accum = CONFIG["ipo_grad_accum"]
    max_steps = CONFIG["ipo_max_steps"]

    # Every scope that reaches this point starts from the core_agent checkpoint;
    # skip_to_rl reuses one produced by an earlier run.
    ipo_checkpoint = "checkpoints/core_agent/final"
    if CONFIG["training_scope"] == "skip_to_rl":
        print("Using existing core_agent checkpoint (skip_to_rl mode)")

    cmd = "python scripts/17_ipo_preference.py"
    cmd += f" --checkpoint {ipo_checkpoint}"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"

    print(f"Training with IPO...")
    print(f"  Checkpoint: {ipo_checkpoint}")
    print(f"  Batch: {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Max steps: {max_steps}")
    print(f"  Loss: IPO (beta=0.1)")
    print(f"  Load mode: {CONFIG['load_mode']}")
    print(f"  Split LoRA backend: {CONFIG['moe_backend']}")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/core_agent_ipo", "checkpoints/core_agent_ipo")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")


### 4.2 Verify IPO


In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
else:
    ckpt_path = "checkpoints/core_agent_ipo/final"

    print("IPO Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  \u2713 IPO checkpoint: {ckpt_path} ({len(files)} files)")
    else:
        print(f"  \u2717 IPO checkpoint not found at {ckpt_path}")

    # Confirm TensorBoard event files exist; the KL divergence itself must be inspected in TensorBoard
    tb_dir = "checkpoints/core_agent_ipo"
    tb_files = []
    if os.path.exists(tb_dir):
        for root, dirs, fnames in os.walk(tb_dir):
            for fn in fnames:
                if fn.startswith("events.out.tfevents"):
                    tb_files.append(os.path.join(root, fn))
    if tb_files:
        print(f"  \u2713 TensorBoard logs found ({len(tb_files)} event files)")
        print(f"    Monitor KL divergence: warn >0.3, abort >0.5")
    else:
        print(f"  \u2014 No TensorBoard logs found")

    print("=" * 60)
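
The KL thresholds printed by this cell can be turned into a small helper for use once KL values have been read from the logs. This is a sketch only: `classify_kl` is a hypothetical name, the sample values are made up, and how the KL scalar is tagged in the event files depends on the training script.

```python
def classify_kl(kl, warn_at=0.3, abort_at=0.5):
    """Map a KL divergence value to an action using the thresholds above."""
    if kl > abort_at:
        return "abort"
    if kl > warn_at:
        return "warn"
    return "ok"


# Illustrative values, not real training logs.
for step, kl in [(100, 0.12), (200, 0.34), (300, 0.61)]:
    print(f"step {step}: KL={kl:.2f} -> {classify_kl(kl)}")
```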

---
## Step 5: GRPO RL

Group Relative Policy Optimisation with execution-based rewards.
For each prompt, the trainer generates N completions, runs `cargo check`, `cargo test`, and `cargo clippy` on each one, and computes group-relative advantages from the resulting rewards.
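
The group-relative step can be sketched as follows. The reward weights and outcome records are hypothetical and are not taken from `scripts/18_grpo_rl.py`. The normalisation uses the population standard deviation, a detail on which implementations differ.

```python
from statistics import mean, pstdev

# Hypothetical weights for illustration only; not taken from the training script.
REWARD_WEIGHTS = {"check": 0.3, "test": 0.5, "clippy": 0.2}


def execution_reward(outcome):
    """Weighted sum of pass/fail results for one completion."""
    return sum(w for name, w in REWARD_WEIGHTS.items() if outcome.get(name))


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one prompt's group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for a single prompt (illustrative outcomes).
group = [
    {"check": True, "test": True, "clippy": True},
    {"check": True, "test": False, "clippy": True},
    {"check": True, "test": False, "clippy": False},
    {"check": False, "test": False, "clippy": False},
]
rewards = [execution_reward(o) for o in group]
advantages = group_relative_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f}  advantage={a:+.2f}")
```

Completions that beat their group's average receive positive advantages and the rest negative ones, so no separate value model is needed. When every completion in a group scores the same, all advantages are zero and that prompt contributes no gradient.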

**v2 Optimisations:**
- FP8 RL with vLLM inference on H100 (1.6x throughput)
- Chunked batching for 7x longer context
- Extended curriculum: 65K context on H100 (up from 32K)
- Harmony format compliance reward to prevent infinite reasoning loops
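
As a rough illustration of the format-compliance idea in the last bullet, a reward term can check that a completion actually reaches a final answer rather than reasoning indefinitely. The sketch below assumes completions are decoded with Harmony special tokens left in place. The function name, the 0/1 scoring, and the character budget are all hypothetical and are not taken from the training script.

```python
FINAL_MARKER = "<|channel|>final<|message|>"
STOP_TOKENS = ("<|return|>", "<|end|>")


def harmony_format_reward(completion, max_analysis_chars=8000):
    """Return 1.0 for a well-formed completion, 0.0 otherwise (hypothetical scoring)."""
    text = completion.strip()
    # Exactly one final-channel message must be present.
    if text.count(FINAL_MARKER) != 1:
        return 0.0
    # The completion must terminate rather than being cut off mid-generation.
    if not text.endswith(STOP_TOKENS):
        return 0.0
    # Reject runaway reasoning before the final answer.
    analysis = text.split(FINAL_MARKER)[0]
    if len(analysis) > max_analysis_chars:
        return 0.0
    return 1.0


good = ("<|channel|>analysis<|message|>Need a merge loop.<|end|>"
        "<|start|>assistant<|channel|>final<|message|>fn main() {}<|return|>")
looping = "<|channel|>analysis<|message|>" + "Let me reconsider. " * 50
print(harmony_format_reward(good), harmony_format_reward(looping))
```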

**This step is optional** (`include_grpo=False` to skip).

### 5.1 Train with GRPO

v2: FP8 RL + vLLM (H100), chunked batching, extended curriculum.


In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping — scope is lang_adapter_only")
elif not CONFIG["include_grpo"]:
    print("Skipping — GRPO disabled (include_grpo=False)")
else:
    batch = CONFIG["grpo_batch"]
    grad_accum = CONFIG["grpo_grad_accum"]
    max_steps = CONFIG["grpo_max_steps"]
    max_seq = CONFIG["grpo_seq_len"]

    grpo_checkpoint = "checkpoints/core_agent_ipo/final"

    cmd = f"python scripts/18_grpo_rl.py"
    cmd += f" --checkpoint {grpo_checkpoint}"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"

    # v2: Note which optimisations are active
    v2_features = []
    v2_features.append(f"Split LoRA ({CONFIG['moe_backend']})")
    if CONFIG["load_mode"] == "fp8":
        v2_features.append("FP8 weights")
    if CONFIG["fast_inference"]:
        v2_features.append("vLLM inference")
    v2_features.append("Chunked batching (auto)")
    v2_features.append("Auto packing")

    if CONFIG["gpu_tier"] == "a100_40gb":
        print("NOTE: 40GB GPU — GRPO sequence length capped at 16384")

    print(f"Training with GRPO (v2)...")
    print(f"  Checkpoint: {grpo_checkpoint}")
    print(f"  Batch: {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Max steps: {max_steps}")
    print(f"  Max seq length: {max_seq}")
    print(f"  Generations per prompt: {CONFIG['grpo_num_gen']}")
    print(f"\n  v2 features active:")
    for feat in v2_features:
        print(f"    ✓ {feat}")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/core_agent_grpo", "checkpoints/core_agent_grpo")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")


### 5.2 Verify GRPO


In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
elif not CONFIG["include_grpo"]:
    print("Skipping \u2014 GRPO disabled")
else:
    ckpt_path = "checkpoints/core_agent_grpo/final"

    print("GRPO Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  \u2713 GRPO checkpoint: {ckpt_path} ({len(files)} files)")
    else:
        print(f"  \u2717 GRPO checkpoint not found at {ckpt_path}")

    print("=" * 60)

---
## Step 6: Evaluation

Evaluate the best checkpoint on held-out Rust tasks using execution-based metrics
(cargo check, cargo test, clippy).
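
To make the metric definitions concrete, the sketch below aggregates per-task results into pass rates. The record layout (`check`, `test`, `clippy`, `iterations`) is a hypothetical stand-in for whatever `scripts/eval_rust_agent.py` records; only the output key names match those displayed in Step 6.3.

```python
from statistics import median


def summarise(results):
    """Aggregate per-task booleans into rates and a median iteration count."""
    n = len(results)
    return {
        "cargo_check_pass_rate": sum(r["check"] for r in results) / n,
        "cargo_test_pass_rate": sum(r["test"] for r in results) / n,
        "clippy_clean_rate": sum(r["clippy"] for r in results) / n,
        "iterations_to_green_median": median(r["iterations"] for r in results),
    }


# Illustrative records, not real evaluation output.
results = [
    {"check": True, "test": True, "clippy": True, "iterations": 1},
    {"check": True, "test": False, "clippy": True, "iterations": 4},
    {"check": True, "test": True, "clippy": False, "iterations": 2},
    {"check": False, "test": False, "clippy": False, "iterations": 5},
]
for name, value in summarise(results).items():
    print(f"{name}: {value}")
```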

### 6.1 Run Rust Evaluation


In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping — scope is lang_adapter_only")
else:
    # Determine best checkpoint
    if CONFIG["include_grpo"] and os.path.exists("checkpoints/core_agent_grpo/final"):
        eval_checkpoint = "checkpoints/core_agent_grpo/final"
    elif os.path.exists("checkpoints/core_agent_ipo/final"):
        eval_checkpoint = "checkpoints/core_agent_ipo/final"
    elif os.path.exists("checkpoints/core_agent/final"):
        eval_checkpoint = "checkpoints/core_agent/final"
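    else:
        # Nothing exists on disk yet; keep the IPO path as the default and say so.
        eval_checkpoint = "checkpoints/core_agent_ipo/final"
        print(f"\u2717 No checkpoint found on disk; defaulting to {eval_checkpoint}")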

    num_samples = CONFIG["eval_num_samples"]

    print(f"Evaluating checkpoint: {eval_checkpoint}")
    print(f"Samples: {num_samples}")
    print("=" * 60)

    !python scripts/eval_rust_agent.py \
        --checkpoint {eval_checkpoint} \
        --num_samples {num_samples}

    drive_helper.backup("evals/rust_agent", "evals/rust_agent")
    if DRIVE_MODE != "local":
        print("\nResults backed up to Drive.")


### 6.2 Check Promotion Gates


In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
else:
    print("Checking promotion gates...")
    print("=" * 60)

    !python scripts/12_check_gates.py rust_agent

### 6.3 Display Results


In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
else:
    import json

    metrics_path = "evals/rust_agent/metrics.json"

    if os.path.exists(metrics_path):
        with open(metrics_path) as f:
            metrics = json.load(f)

        targets = {
            "cargo_check_pass_rate": (0.85, "higher"),
            "cargo_test_pass_rate": (0.70, "higher"),
            "clippy_clean_rate": (0.80, "higher"),
            "iterations_to_green_median": (3, "lower"),
            "diff_size_median": (50, "lower"),
            "tool_call_format_accuracy": (0.99, "higher"),
            "hallucinated_api_rate": (0.05, "lower"),
        }

        print("=" * 60)
        print("EVALUATION RESULTS")
        print("=" * 60)
        print(f"{'Metric':<32} {'Value':>8} {'Target':>8} {'Status':>8}")
        print("-" * 60)

        for key, (target, direction) in targets.items():
            value = metrics.get(key)
            # Rates (floats in [0, 1]) print as percentages; counts print as-is.
            fmt_tgt = f"{target:.0%}" if isinstance(target, float) and target <= 1 else f"{target}"
            if value is None:
                # Keep the dash out of the f-string expression: backslash escapes
                # inside {...} are a SyntaxError before Python 3.12.
                dash = "\u2014"
                print(f"{key:<32} {'N/A':>8} {fmt_tgt:>8} {dash:>8}")
                continue

            if direction == "higher":
                passed = value >= target
            else:
                passed = value <= target

            status = "\u2713 PASS" if passed else "\u2717 FAIL"
            fmt_val = f"{value:.2%}" if isinstance(value, float) and value <= 1 else f"{value}"
            print(f"{key:<32} {fmt_val:>8} {fmt_tgt:>8} {status:>8}")

        print("=" * 60)
    else:
        print(f"\u2717 Metrics file not found at {metrics_path}")
        print("Run evaluation (6.1) first.")

---
## Step 7: Test Model

Load the trained model and generate Rust code interactively.

**v2:** FP8 loading on H100 for faster inference. `fast_inference=True` enables vLLM backend.

### 7.1 Load Model

v2: FP8 loading on H100, vLLM-backed inference.


In [None]:
from unsloth import FastLanguageModel
import torch

CHECKPOINT_PRIORITY = [
    "checkpoints/core_agent_grpo/final",
    "checkpoints/core_agent_ipo/final",
    "checkpoints/core_agent/final",
    "checkpoints/gpt-oss-20b-rust-merged",
]

MODEL_PATH = None
for path in CHECKPOINT_PRIORITY:
    if os.path.exists(path):
        MODEL_PATH = path
        break

if MODEL_PATH is None:
    print("\u2717 No checkpoint found. Train the model first.")
else:
    print(f"Loading model from: {MODEL_PATH}")

    # v2: Use FP8 on H100, 4-bit otherwise
    load_kwargs = {
        "max_seq_length": 4096,
        "dtype": torch.bfloat16,
    }
    if CONFIG.get("load_mode") == "fp8" and CONFIG.get("use_fp8"):
        load_kwargs["load_in_fp8"] = True
        print("  Mode: FP8 (H100)")
    else:
        load_kwargs["load_in_4bit"] = True
        print("  Mode: 4-bit QLoRA")

    if CONFIG.get("fast_inference"):
        load_kwargs["fast_inference"] = True
        print("  Inference: vLLM backend")

    print("=" * 60)

    model, tokenizer = FastLanguageModel.from_pretrained(MODEL_PATH, **load_kwargs)
    FastLanguageModel.for_inference(model)

    print("\u2713 Model loaded!")

### 7.2 Generate Rust Code

Runs the model on three predefined Rust prompts, formatted with the Harmony chat format.
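
For readers unfamiliar with Harmony, the sketch below shows roughly what a rendered prompt looks like. It is a simplified stand-in, not the repository's `encode_harmony_messages`: it omits the system message and assumes the developer instructions are wrapped in an `# Instructions` section, following OpenAI's published format. `render_harmony_prompt` is a hypothetical name.

```python
def render_harmony_prompt(user_prompt, developer_instructions):
    """Build a minimal Harmony-style prompt string (simplified sketch)."""
    parts = [
        f"<|start|>developer<|message|># Instructions\n\n{developer_instructions}<|end|>",
        f"<|start|>user<|message|>{user_prompt}<|end|>",
        # Leave the assistant turn open so the model generates the reply.
        "<|start|>assistant",
    ]
    return "".join(parts)


print(render_harmony_prompt(
    "Write a Rust function that reverses a string.",
    "You are a Rust programming expert.",
))
```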


In [None]:
import sys
sys.path.insert(0, "scripts")
from dataset_formatters.harmony import encode_harmony_messages

TEST_PROMPTS = [
    "Write a Rust function `fn merge_sorted(a: &[i32], b: &[i32]) -> Vec<i32>` that merges two sorted slices into a single sorted vector.",
    "This Rust code fails the borrow checker. Fix it:\n```rust\nfn main() {\n    let mut v = vec![1, 2, 3];\n    let first = &v[0];\n    v.push(4);\n    println!(\"{}\", first);\n}\n```",
    "Write an async Rust function using tokio that fetches a URL with reqwest, retries up to 3 times on failure, and returns the response body as a String.",
]

def generate_rust(prompt, max_tokens=1024):
    messages = [{"role": "user", "content": prompt}]
    formatted = encode_harmony_messages(
        messages,
        developer_instructions="You are a Rust programming expert. Write correct, idiomatic code.",
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.3,
            do_sample=True,
            top_p=0.9,
        )
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

for i, prompt in enumerate(TEST_PROMPTS, 1):
    print(f"\n{'=' * 60}")
    print(f"Test {i}: {prompt[:80]}...")
    print("=" * 60)
    response = generate_rust(prompt)
    print(response)
    print()

### 7.3 Custom Prompt


In [None]:
CUSTOM_PROMPT = "Write a Rust function that reads a CSV file and returns the sum of a specified column."

print(f"Prompt: {CUSTOM_PROMPT}")
print("=" * 60)
print(generate_rust(CUSTOM_PROMPT))

---
## Step 8: Export

Merge the final adapter and export to HuggingFace + GGUF formats.

**v2:** Optional QAT export for 97-100% MXFP4 quality retention (vs 59-89% with PTQ).

### 8.1 Export to GGUF

Merges the best adapter and exports as HF safetensors + GGUF Q4.
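
Merging folds the low-rank update into the base weights so that no adapter is needed at inference time. The sketch below shows the arithmetic on a toy matrix, assuming the usual LoRA parameterisation `W + (alpha / r) * B @ A`. It is pure Python for illustration and is not what `scripts/19_merge_adapter.py` executes.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy example: 2x2 base weight with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]        # shape (rank, in_features)
lora_b = [[0.5], [0.25]]     # shape (out_features, rank)
print(merge_lora(weight, lora_a, lora_b, alpha=2, rank=1))
```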


In [None]:
ADAPTER_PRIORITY = [
    "checkpoints/core_agent_grpo/final",
    "checkpoints/core_agent_ipo/final",
    "checkpoints/core_agent/final",
    "checkpoints/lang_rust/final",
]

adapter_path = None
for path in ADAPTER_PRIORITY:
    if os.path.exists(path):
        adapter_path = path
        break

if adapter_path is None:
    print("✗ No adapter checkpoint found.")
else:
    export_dir = "checkpoints/gpt-oss-20b-rust-export-v2"
    print(f"Exporting adapter: {adapter_path}")
    print(f"Output: {export_dir}")
    print("=" * 60)

    !python scripts/19_merge_adapter.py \
        --adapter_path {adapter_path} \
        --output_dir {export_dir} \
        --export_formats hf gguf_q4

    drive_helper.backup(export_dir, "checkpoints/gpt-oss-20b-rust-export-v2")
    if DRIVE_MODE != "local":
        print("\nExport backed up to Drive.")


### 8.2 QAT Export (Optional)

v2: Quantisation-Aware Training for MXFP4 deployment.
Recovers 97-100% quality vs 59-89% with post-training quantisation.
Requires: `pip install nvidia-modelopt`
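
To show what "MXFP4-aware" means in practice, the sketch below fake-quantises one block of weights: values are snapped to the FP4 (E2M1) grid under a shared power-of-two scale, then returned as ordinary floats. During QAT the forward pass sees these rounded values, so the model learns weights that survive quantisation. The layout assumes the OCP Microscaling format, where a real block holds 32 elements; the toy block here holds 4. This is not nvidia-modelopt's implementation, and the function name is hypothetical.

```python
import math

# Magnitudes representable by an FP4 E2M1 element.
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_EMAX = 2  # largest exponent of an E2M1 element (6.0 = 1.5 * 2**2)


def fake_quantize_mxfp4_block(block):
    """Quantise then dequantise one block that shares a power-of-two scale."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0] * len(block)
    scale = 2.0 ** (math.floor(math.log2(max_abs)) - FP4_EMAX)
    out = []
    for x in block:
        scaled = min(abs(x) / scale, FP4_GRID[-1])  # clamp to the largest magnitude
        nearest = min(FP4_GRID, key=lambda g: abs(g - scaled))
        out.append(math.copysign(nearest * scale, x))
    return out


print(fake_quantize_mxfp4_block([6.0, 1.0, 0.3, -2.4]))
```

The coarse grid is why small values next to a large outlier lose precision, and why training with the rounding in the loop can retain more quality than quantising after the fact.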


In [None]:
if not CONFIG.get("enable_qat_export"):
    print("QAT export disabled. Enable via enable_qat_export=True in Step 0.3.")
    print("\nQAT recovers 97-100% quality when deploying to MXFP4,")
    print("vs 59-89% with standard post-training quantisation (PTQ).")
else:
    export_dir = "checkpoints/gpt-oss-20b-rust-export-v2"
    qat_dir = "checkpoints/gpt-oss-20b-rust-qat"

    if not os.path.exists(export_dir):
        print("\u2717 Run standard export (8.1) first.")
    else:
        print("Running QAT pass on merged model...")
        print("  This fine-tunes with MXFP4-aware quantisation at reduced LR (1e-5).")
        print("=" * 60)

        try:
            import modelopt.torch.quantization as mtq
            print("\u2713 nvidia-modelopt available")

            # QAT would be run here via mtq.quantize()
            # For now, document the expected command:
            print("\nQAT pipeline (manual steps):")
            print(f"  1. Load merged BF16 model from {export_dir}")
            print(f"  2. mtq.quantize(model, config=mtq.MXFP4_DEFAULT_CFG)")
            print(f"  3. Fine-tune for ~100 steps at LR 1e-5")
            print(f"  4. Export to {qat_dir}")
        except ImportError:
            print("\u2717 nvidia-modelopt not installed.")
            print("  Install: pip install nvidia-modelopt")
            print("  See: https://developer.nvidia.com/blog/fine-tuning-gpt-oss-for-accuracy-and-performance-with-quantization-aware-training/")

### 8.3 Download GGUF


In [None]:
if IN_COLAB:
    from google.colab import files
    import glob

    export_dir = "checkpoints/gpt-oss-20b-rust-export-v2"
    gguf_files = glob.glob(os.path.join(export_dir, "*.gguf"))

    if gguf_files:
        gguf_path = gguf_files[0]
        size_gb = os.path.getsize(gguf_path) / (1024**3)
        print(f"Downloading: {os.path.basename(gguf_path)} ({size_gb:.1f} GB)")
        files.download(gguf_path)
    else:
        print("\u2717 No GGUF file found. Run export (8.1) first.")
else:
    print("Download not available outside Colab.")
    print("GGUF file is at: checkpoints/gpt-oss-20b-rust-export-v2/")

---
## Training Complete!

Your GPT-OSS 20B Rust coding agent (v2) is trained and ready to use.

**v2 Optimisations Applied:**
- Split LoRA: 7-12x faster MoE training
- FP8 RL: 1.6x throughput on H100 (60% less VRAM)
- Auto packing: 3x faster SFT
- Chunked GRPO: 65K context on H100 (up from 32K)
- QAT export: 97-100% MXFP4 quality (if enabled)

**Outputs:**
- Checkpoints: `checkpoints/core_agent_{ipo,grpo}/final`
- Evaluation: `evals/rust_agent/metrics.json`
- Exported model: `checkpoints/gpt-oss-20b-rust-export-v2/`
- All backed up to Google Drive: `gpt-oss-20b-rust-agent-v2/`

**Next steps:**
- Review evaluation metrics in Step 6.3
- Test interactively in Step 7
- Deploy the GGUF file with llama.cpp or Ollama
- For MXFP4 deployment, enable QAT export in Step 8.2

**References:**
- [V2 Optimization Plan](../docs/V2_OPTIMIZATION_PLAN.md)
- [Unsloth Split LoRA](https://unsloth.ai/docs/new/faster-moe)
- [Unsloth FP8 RL](https://unsloth.ai/docs/get-started/reinforcement-learning-rl-guide/fp8-reinforcement-learning)
- [NVIDIA QAT for GPT-OSS](https://developer.nvidia.com/blog/fine-tuning-gpt-oss-for-accuracy-and-performance-with-quantization-aware-training/)