# Train GPT-OSS 20B → Rust Coding Agent (v3 — Strandset)

Simplified pipeline using [Strandset-Rust-v1](https://huggingface.co/datasets/Fortytwo-Network/Strandset-Rust-v1) (191K verified Rust examples, Apache 2.0) as the sole data source.

**Key differences from v2:**
- **No Rust toolchain** — no `rustup`, `cargo-mutants`, or compilation needed
- **No mutation/trajectory generation** — data comes entirely from Strandset
- **No GRPO** — no execution-based rewards without cargo
- **IPO from synthetic preferences** — bug_detection pairs (fixed=chosen, buggy=rejected)

**3-Phase Pipeline:**
1. **Lang Adapter** — Rust domain specialisation via QLoRA (script 13 + 19)
2. **Core Agent SFT** — Debug/review training from Strandset (script 14)
3. **IPO Preference** — Synthetic preference pairs from bug_detection (script 17)

**Requirements:**
- **GPU**: A100 40GB+ (H100 80GB recommended for FP8)
- **Storage**: Google Drive for persistent checkpoints
- **No Rust toolchain required**

---
## Step 0: Environment Setup

### 0.1 Mount Google Drive & Clone Repository

**PyCharm / headless users:** If `drive.mount()` doesn't work, set `use_service_account = True`
and provide your service-account JSON key in Step 0.3.

In [1]:
import os
import sys

IN_COLAB = 'google.colab' in sys.modules

use_service_account = True

DRIVE_MOUNTED = False

if IN_COLAB and not use_service_account:
    try:
        from google.colab import drive
        drive.mount('/content/drive')
        DRIVE_MOUNTED = True
        print("Google Drive mounted")
    except Exception as e:
        print(f"drive.mount() failed: {e}")
        print("Falling back to local-only mode.")
        print("Tip: set use_service_account=True and provide a JSON key in Step 0.3.")
elif IN_COLAB and use_service_account:
    print("Service-account mode selected \u2014 skipping drive.mount()")
    print("Configure credentials in Step 0.3.")
else:
    print("Running locally")

REPO_URL = "https://github.com/rmarnold/llm-training-pipeline.git"
BRANCH = "main"

REPO_DIR = "/content/llm-training-pipeline"

if IN_COLAB:
    if os.path.exists(REPO_DIR):
        %cd {REPO_DIR}
        !git pull origin {BRANCH}
    else:
        !git clone -b {BRANCH} {REPO_URL} {REPO_DIR}
        %cd {REPO_DIR}

    PROJECT_ROOT = REPO_DIR
else:
    PROJECT_ROOT = os.getcwd()

os.chdir(PROJECT_ROOT)
print(f"\nProject root: {PROJECT_ROOT}")

Service-account mode selected — skipping drive.mount()
Configure credentials in Step 0.3.
Cloning into '/content/llm-training-pipeline'...
remote: Enumerating objects: 1112, done.
remote: Counting objects: 100% (283/283), done.
remote: Compressing objects: 100% (197/197), done.
remote: Total 1112 (delta 167), reused 184 (delta 85), pack-reused 829 (from 1)
Receiving objects: 100% (1112/1112), 1.82 MiB | 31.50 MiB/s, done.
Resolving deltas: 100% (708/708), done.
/content/llm-training-pipeline

Project root: /content/llm-training-pipeline


### 0.2 Install Dependencies

Installs pipeline deps and latest Unsloth. **No Rust toolchain needed** — all training data
comes from Strandset.

**Note:** flash-attn is intentionally NOT installed. FA3 is incompatible with GPT-OSS
backward passes. Unsloth's Flex Attention replaces it automatically.

In [2]:
if IN_COLAB:
    print("Installing Python dependencies...")
    print("=" * 60)
    !pip install -q -e ".[gpt_oss,colab]"

    # Fix pyarrow binary incompatibility with datasets 4.x on Colab
    !pip install -q --force-reinstall pyarrow

    # Force latest Unsloth with Split LoRA + FP8 RL
    print("\nInstalling latest Unsloth (Split LoRA + Flex Attention)...")
    !pip install -q --upgrade --force-reinstall --no-cache-dir --no-deps unsloth unsloth_zoo
    !pip install -q "unsloth[colab-new]"

    # vLLM for FP8 inference (H100 only, optional)
    # The version spec is quoted so the shell does not treat ">=" as an output redirect
    !pip install -q "vllm>=0.12.0" 2>/dev/null || true

    # Verification
    from importlib.metadata import version, PackageNotFoundError
    print("\n" + "=" * 60)
    print("Dependency Verification:")
    print("=" * 60)

    for pkg in ["unsloth", "trl", "peft", "datasets", "tiktoken", "vllm"]:
        try:
            ver = version(pkg)
            print(f"\u2713 {pkg}: {ver}")
        except PackageNotFoundError:
            if pkg == "vllm":
                print(f"\u2014 {pkg}: not installed (optional, H100 FP8 only)")
            else:
                print(f"\u2717 {pkg}: not installed")

    print("\nNote: No Rust toolchain needed for v3 (Strandset-only pipeline)")
    print("=" * 60)
else:
    print("Running locally \u2014 ensure deps are installed:")
    print("  pip install -e '.[gpt_oss]'")
    print("  pip install --upgrade unsloth unsloth_zoo")

Installing Python dependencies...

### 0.3 Configure Pipeline

**Training Scope** (`training_scope`):
- `full`: All 3 phases (Lang Adapter + Core Agent + IPO)
- `quick_test`: Short runs (100 steps each) to verify setup
- `lang_adapter_only`: Only train lang_rust adapter + merge

**Service Account Setup** (for Drive backup):
1. Set `use_service_account = True` in cell 0.1
2. Run cell 0.3: it checks for an existing valid key file at `/content/service_account.json`, then Colab Secrets, then prompts you to paste the key
3. Set `DRIVE_FOLDER_ID` in Colab Secrets, or set `drive_folder_id` below
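
The folder ID is the part of the Google Drive URL after `/folders/`. As a minimal sketch, the helper below accepts either a bare ID or a full folder URL and rejects pasted JSON, which is the mistake the configuration cell guards against. The function name `extract_folder_id` and the example URL are illustrative assumptions and are not part of the pipeline.

```python
import re


def extract_folder_id(value: str) -> str:
    """Return a Drive folder ID from a bare ID or a .../folders/<id> URL (hypothetical helper)."""
    value = value.strip()
    match = re.search(r"/folders/([A-Za-z0-9_-]+)", value)
    if match:
        return match.group(1)
    # Anything that looks like JSON or a path without /folders/ is not a folder ID.
    if not value or value.startswith("{") or "/" in value:
        raise ValueError("expected a Drive folder ID or a folder URL")
    return value


# Example URL is made up for illustration.
print(extract_folder_id("https://drive.google.com/drive/folders/abc123_XYZ?usp=sharing"))
```

The returned string can be assigned to `drive_folder_id` in the configuration cell or stored as the `DRIVE_FOLDER_ID` secret.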

In [None]:
import json
training_scope = "quick_test"  # "full", "quick_test", "lang_adapter_only"

gpu_tier = "h100_80gb"  # "a100_40gb", "a100_80gb", "h100_80gb"

max_steps_override = 0  # Set >0 to cap all stages (0 = use defaults)

include_ipo = True  # False to skip IPO preference training

enable_qat_export = False  # True for MXFP4 QAT export

# ============================================================
# SERVICE ACCOUNT CREDENTIALS
# ============================================================
# Priority order:
#   1. Existing VALID file at /content/service_account.json (instant, no timeout)
#   2. Colab Secrets (only if no valid file — may timeout outside browser UI)
#   3. Paste JSON key via input() prompt
#   4. Fall back to local mode (no Drive backup)

drive_folder_id = "18UpFpUhiNrs2Etha0uFjSGWmj1Ee1SnX"  # Google Drive folder ID

_SA_VM_PATH = "/content/service_account.json"
_FOLDER_ID_PATH = "/content/drive_folder_id.txt"
service_account_key = ""

def _is_json(s):
    """Check if string looks like JSON (not a folder ID)."""
    return s.strip().startswith("{")

def _validate_sa_file(path):
    """Check that a service account JSON file exists and contains valid JSON."""
    try:
        with open(path) as f:
            data = json.load(f)
        return isinstance(data, dict) and "type" in data
    except (json.JSONDecodeError, OSError, ValueError):
        return False

if use_service_account and IN_COLAB:
    # 1. Check for existing VALID file first (avoids Colab Secrets timeout on re-runs)
    if os.path.exists(_SA_VM_PATH) and _validate_sa_file(_SA_VM_PATH):
        service_account_key = _SA_VM_PATH
        print(f"Using existing key file: {_SA_VM_PATH}")
    else:
        if os.path.exists(_SA_VM_PATH):
            os.remove(_SA_VM_PATH)
            print(f"Removed invalid/empty key file: {_SA_VM_PATH}")

        # 2. Try Colab Secrets (may timeout if not running in browser UI)
        try:
            from google.colab import userdata
            _key_json = userdata.get("SERVICE_ACCOUNT_KEY")
            if _key_json:
                # Validate before saving
                json.loads(_key_json)
                with open(_SA_VM_PATH, "w") as _f:
                    _f.write(_key_json)
                service_account_key = _SA_VM_PATH
                print("Service account key loaded from Colab Secrets.")
        except json.JSONDecodeError:
            print("  Colab Secret SERVICE_ACCOUNT_KEY contains invalid JSON.")
        except Exception as _e:
            print(f"  Colab Secrets lookup failed: {type(_e).__name__}: {_e}")

        # 3. Fall back to paste prompt
        if not service_account_key:
            try:
                print("No service account key found.")
                _key_text = input("Paste service account JSON (entire content in one go): ")
                _key_text = _key_text.strip()
                if _key_text:
                    json.loads(_key_text)
                    with open(_SA_VM_PATH, "w") as _f:
                        _f.write(_key_text)
                    service_account_key = _SA_VM_PATH
                    print(f"Saved to {_SA_VM_PATH}")
            except json.JSONDecodeError:
                print("  Invalid JSON — key not saved.")
            except EOFError:
                pass

    # Resolve drive_folder_id: saved file > Colab Secrets > input prompt
    if not drive_folder_id and os.path.exists(_FOLDER_ID_PATH):
        with open(_FOLDER_ID_PATH) as _f:
            drive_folder_id = _f.read().strip()
        if drive_folder_id:
            print(f"Using saved folder ID from {_FOLDER_ID_PATH}")

    if not drive_folder_id:
        try:
            from google.colab import userdata
            _fid = userdata.get("DRIVE_FOLDER_ID") or ""
            if _fid and not _is_json(_fid):
                drive_folder_id = _fid
                print(f"Drive folder ID loaded from Colab Secrets.")
            elif _fid:
                print("WARNING: DRIVE_FOLDER_ID Colab Secret contains JSON, not a folder ID. Ignoring.")
        except Exception:
            pass

    if not drive_folder_id and service_account_key:
        _fid = input("Enter Google Drive folder ID (from URL): ").strip()
        if _fid and not _is_json(_fid):
            drive_folder_id = _fid
            # Persist so we don't have to re-enter on re-runs
            with open(_FOLDER_ID_PATH, "w") as _f:
                _f.write(drive_folder_id)
            print(f"Saved folder ID to {_FOLDER_ID_PATH}")
        elif _fid:
            print("ERROR: That looks like JSON, not a folder ID.")
            print("The folder ID is the part after /folders/ in the Google Drive URL.")

    if not service_account_key:
        print("No service account key — Drive backup disabled.")

elif use_service_account:
    for _path in [_SA_VM_PATH, "service_account.json"]:
        if os.path.exists(_path):
            service_account_key = _path
            print(f"Using key file: {_path}")
            break
    if not service_account_key:
        print("Running locally — set service_account_key to your JSON key path.")

# ============================================================
# DRIVE MODE
# ============================================================
from scripts.pipeline_lib.drive_utils import DriveHelper

DRIVE_BASE = "/content/drive/MyDrive/gpt-oss-20b-rust-agent-v3"

if DRIVE_MOUNTED:
    DRIVE_MODE = "mounted"
elif use_service_account and service_account_key and drive_folder_id:
    DRIVE_MODE = "service_account"
else:
    DRIVE_MODE = "local"

drive_helper = DriveHelper(
    mode=DRIVE_MODE,
    drive_base=DRIVE_BASE,
    credentials_path=service_account_key or None,
    folder_id=drive_folder_id or None,
)

# ============================================================
# GPU TIER CONFIGS
# ============================================================

GPU_CONFIGS = {
    "a100_40gb": {
        "moe_backend": "unsloth_triton",
        "load_mode": "4bit",
        "fast_inference": False,
        "lang_rust": {"batch": 1, "grad_accum": 8, "seq_len": 8192, "max_steps": 3000},
        "core_agent": {"batch": 1, "grad_accum": 4, "seq_len": 12288, "max_steps": 2000},
        "ipo": {"batch": 1, "grad_accum": 8, "seq_len": 12288, "max_steps": 1000},
    },
    "a100_80gb": {
        "moe_backend": "unsloth_triton",
        "load_mode": "4bit",
        "fast_inference": False,
        "lang_rust": {"batch": 1, "grad_accum": 8, "seq_len": 8192, "max_steps": 5000},
        "core_agent": {"batch": 1, "grad_accum": 4, "seq_len": 16384, "max_steps": 3000},
        "ipo": {"batch": 1, "grad_accum": 16, "seq_len": 16384, "max_steps": 2000},
    },
    "h100_80gb": {
        "moe_backend": "grouped_mm",
        "load_mode": "fp8",
        "fast_inference": True,
        "lang_rust": {"batch": 2, "grad_accum": 4, "seq_len": 8192, "max_steps": 5000},
        "core_agent": {"batch": 1, "grad_accum": 4, "seq_len": 16384, "max_steps": 3000},
        "ipo": {"batch": 1, "grad_accum": 16, "seq_len": 16384, "max_steps": 2000},
    },
}

if training_scope == "quick_test":
    max_steps_override = 100

gpu_cfg = GPU_CONFIGS[gpu_tier]

CONFIG = {
    "training_scope": training_scope,
    "gpu_tier": gpu_tier,
    "include_ipo": include_ipo,
    "enable_qat_export": enable_qat_export,
    "moe_backend": gpu_cfg["moe_backend"],
    "load_mode": gpu_cfg["load_mode"],
    "fast_inference": gpu_cfg["fast_inference"],
    # Lang adapter
    "lang_rust_batch": gpu_cfg["lang_rust"]["batch"],
    "lang_rust_grad_accum": gpu_cfg["lang_rust"]["grad_accum"],
    "lang_rust_seq_len": gpu_cfg["lang_rust"]["seq_len"],
    "lang_rust_max_steps": max_steps_override or gpu_cfg["lang_rust"]["max_steps"],
    # Core agent
    "core_agent_batch": gpu_cfg["core_agent"]["batch"],
    "core_agent_grad_accum": gpu_cfg["core_agent"]["grad_accum"],
    "core_agent_seq_len": gpu_cfg["core_agent"]["seq_len"],
    "core_agent_max_steps": max_steps_override or gpu_cfg["core_agent"]["max_steps"],
    # IPO
    "ipo_batch": gpu_cfg["ipo"]["batch"],
    "ipo_grad_accum": gpu_cfg["ipo"]["grad_accum"],
    "ipo_seq_len": gpu_cfg["ipo"]["seq_len"],
    "ipo_max_steps": max_steps_override or gpu_cfg["ipo"]["max_steps"],
    # Eval
    "eval_num_samples": 10 if training_scope == "quick_test" else 50,
}

print("=" * 60)
print("PIPELINE CONFIGURATION (v3 \u2014 Strandset)")
print("=" * 60)
print(f"\nScope: {training_scope.upper()}")
print(f"GPU tier: {gpu_tier}")
print(f"MoE backend: {CONFIG['moe_backend']}")
print(f"Load mode: {CONFIG['load_mode']}")
print(f"Fast inference (vLLM): {CONFIG['fast_inference']}")
print(f"Include IPO: {include_ipo}")
print(f"QAT export: {enable_qat_export}")
print(f"Drive mode: {DRIVE_MODE}")
if max_steps_override:
    print(f"Max steps override: {max_steps_override}")
print(f"\nLang Adapter:  batch={CONFIG['lang_rust_batch']} x grad_accum={CONFIG['lang_rust_grad_accum']}, seq={CONFIG['lang_rust_seq_len']}, steps={CONFIG['lang_rust_max_steps']}")
print(f"Core Agent:    batch={CONFIG['core_agent_batch']} x grad_accum={CONFIG['core_agent_grad_accum']}, seq={CONFIG['core_agent_seq_len']}, steps={CONFIG['core_agent_max_steps']}")
if include_ipo:
    print(f"IPO:           batch={CONFIG['ipo_batch']} x grad_accum={CONFIG['ipo_grad_accum']}, seq={CONFIG['ipo_seq_len']}, steps={CONFIG['ipo_max_steps']}")
print("=" * 60)
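
Each stage's effective batch size is `batch x grad_accum`, and the upper bound on tokens per optimizer step is that product times `seq_len`. The sketch below recomputes both for two of the tiers defined above. The tuples are copied from `GPU_CONFIGS`; the helper names are illustrative and are not part of the pipeline.

```python
# (batch, grad_accum, seq_len) copied from the GPU_CONFIGS table in this cell.
TIER_STAGES = {
    "a100_40gb": {
        "lang_rust": (1, 8, 8192),
        "core_agent": (1, 4, 12288),
        "ipo": (1, 8, 12288),
    },
    "h100_80gb": {
        "lang_rust": (2, 4, 8192),
        "core_agent": (1, 4, 16384),
        "ipo": (1, 16, 16384),
    },
}


def effective_batch(batch: int, grad_accum: int) -> int:
    """Sequences contributing to one optimizer step."""
    return batch * grad_accum


def max_tokens_per_step(batch: int, grad_accum: int, seq_len: int) -> int:
    """Upper bound on tokens per optimizer step (reached only if every sequence is full)."""
    return batch * grad_accum * seq_len


for tier, stages in TIER_STAGES.items():
    for stage, (batch, accum, seq_len) in stages.items():
        print(tier, stage, effective_batch(batch, accum), max_tokens_per_step(batch, accum, seq_len))
```

Note that `lang_rust` has the same effective batch of 8 on both tiers; the H100 tier reaches it with a larger per-device batch and fewer accumulation steps.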

### 0.4 Set Up Persistent Storage

In [None]:
DRIVE_SUBDIRS = [
    "checkpoints/lang_rust",
    "checkpoints/core_agent",
    "checkpoints/core_agent_ipo",
    "checkpoints/gpt-oss-20b-rust-merged",
    "data/rust/strandset",
    "logs",
]

if DRIVE_MODE == "mounted":
    print(f"Setting up storage at: {DRIVE_BASE}")
    for subdir in DRIVE_SUBDIRS:
        os.makedirs(os.path.join(DRIVE_BASE, subdir), exist_ok=True)

    for dir_name in ["checkpoints", "data", "logs"]:
        local_path = os.path.join(PROJECT_ROOT, dir_name)
        drive_path = os.path.join(DRIVE_BASE, dir_name)

        if os.path.exists(local_path) and not os.path.islink(local_path):
            !cp -r {local_path}/* {drive_path}/ 2>/dev/null || true
            !rm -rf {local_path}
        elif os.path.islink(local_path):
            os.unlink(local_path)

        os.symlink(drive_path, local_path)
        print(f"  {dir_name} -> Drive (mounted)")

elif DRIVE_MODE == "service_account":
    print("Setting up local storage + Drive API restore...")
    for subdir in DRIVE_SUBDIRS:
        os.makedirs(os.path.join(PROJECT_ROOT, subdir), exist_ok=True)
        drive_helper.ensure_dir(subdir)

    for dir_name in ["checkpoints", "data", "logs"]:
        local_path = os.path.join(PROJECT_ROOT, dir_name)
        if os.path.islink(local_path):
            os.unlink(local_path)
            os.makedirs(local_path, exist_ok=True)
        print(f"  {dir_name} -> local (backed up via Drive API)")

    print("\nRestoring existing data from Drive...")
    for subdir in DRIVE_SUBDIRS:
        local_target = os.path.join(PROJECT_ROOT, subdir)
        drive_helper.restore(subdir, local_target)
    print("Restore complete.")

else:
    for d in ["checkpoints", "data/rust", "logs"]:
        os.makedirs(d, exist_ok=True)
    print("Local directories created (no Drive backup).")

print("\nStorage ready!")
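
In mounted mode, the cell above moves each local directory's contents to Drive and replaces the directory with a symlink. The same pattern can be sketched with the standard library only, using a temporary directory in place of Drive. The function name `redirect_to_persistent` is an illustrative assumption. Unlike the `cp -r {local_path}/*` shell call, `shutil.copytree` also copies dotfiles.

```python
import os
import shutil
import tempfile


def redirect_to_persistent(local_path: str, persistent_path: str) -> None:
    """Move local contents into persistent_path, then symlink local -> persistent."""
    os.makedirs(persistent_path, exist_ok=True)
    if os.path.islink(local_path):
        os.unlink(local_path)  # stale link left over from an earlier session
    elif os.path.isdir(local_path):
        shutil.copytree(local_path, persistent_path, dirs_exist_ok=True)
        shutil.rmtree(local_path)
    os.symlink(persistent_path, local_path)


with tempfile.TemporaryDirectory() as root:
    local = os.path.join(root, "project", "checkpoints")
    persistent = os.path.join(root, "drive", "checkpoints")
    os.makedirs(local)
    with open(os.path.join(local, "step_100.txt"), "w") as f:
        f.write("weights")
    redirect_to_persistent(local, persistent)
    print(os.path.islink(local), os.listdir(persistent))
```

Calling the function a second time is safe: the existing link is removed and recreated, which matches the `os.path.islink` branch in the cell above.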

### 0.5 Check GPU & Configure MoE Backend

In [None]:
import torch

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / (1024**3)
    capability = torch.cuda.get_device_capability()
    is_h100 = "H100" in gpu_name or "H200" in gpu_name or "B200" in gpu_name

    CONFIG["use_fp8"] = capability[0] >= 9 and is_h100

    if is_h100:
        detected_tier = "h100_80gb"
    elif gpu_memory >= 70:
        detected_tier = "a100_80gb"
    else:
        detected_tier = "a100_40gb"

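    if detected_tier != CONFIG["gpu_tier"]:
        print(f"NOTE: Auto-detected {detected_tier}, overriding configured {CONFIG['gpu_tier']}")
        CONFIG["gpu_tier"] = detected_tier
        gpu_cfg = GPU_CONFIGS[detected_tier]
        CONFIG["moe_backend"] = gpu_cfg["moe_backend"]
        CONFIG["load_mode"] = gpu_cfg["load_mode"]
        CONFIG["fast_inference"] = gpu_cfg["fast_inference"]
        # Refresh the per-stage settings too, so batch size, sequence length,
        # and step counts match the detected tier rather than the configured one.
        for _stage in ("lang_rust", "core_agent", "ipo"):
            for _key in ("batch", "grad_accum", "seq_len"):
                CONFIG[f"{_stage}_{_key}"] = gpu_cfg[_stage][_key]
            CONFIG[f"{_stage}_max_steps"] = max_steps_override or gpu_cfg[_stage]["max_steps"]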

    os.environ["UNSLOTH_MOE_BACKEND"] = CONFIG["moe_backend"]

    print("=" * 60)
    print(f"GPU: {gpu_name} ({gpu_memory:.0f} GB)")
    print(f"Compute capability: {capability[0]}.{capability[1]}")
    print(f"Tier: {CONFIG['gpu_tier']}")
    print(f"\nSplit LoRA backend: {CONFIG['moe_backend']}")
    print(f"Load mode: {CONFIG['load_mode']}")
    print(f"FP8 available: {CONFIG['use_fp8']}")
    print(f"Fast inference (vLLM): {CONFIG['fast_inference']}")

    if gpu_memory < 40:
        print("\nWARNING: <40 GB VRAM. Long-context training (16K+) may OOM.")
    print("=" * 60)
else:
    print("No GPU detected!")
    CONFIG["use_fp8"] = False
    os.environ["UNSLOTH_MOE_BACKEND"] = "native_torch"

---
## Step 1: Data Preparation

Downloads Strandset-Rust-v1 from HuggingFace, parses the 15 task categories,
and formats everything in Harmony for each training stage.
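
To illustrate how task categories map to training stages, the sketch below routes a category name to the stages that consume it. The category names are the ones mentioned in this notebook and may not match the dataset's exact labels. The function `stages_for` is hypothetical; `scripts/20_prepare_strandset.py` remains the authoritative implementation.

```python
# Category names as mentioned in this notebook (assumed, not read from the dataset).
LANG_RUST = {"code_generation", "code_completion", "docstring", "comment", "naming"}
CORE_AGENT = {"bug_detection", "code_review"}
IPO = {"bug_detection"}


def stages_for(category: str, include_ipo: bool = True) -> list:
    """Return the training stages that would use an example of this category."""
    stages = []
    if category in LANG_RUST:
        stages.append("lang_rust")
    if category in CORE_AGENT:
        stages.append("core_agent")
    if include_ipo and category in IPO:
        stages.append("ipo")
    return stages


for cat in ("code_generation", "bug_detection", "code_review", "something_else"):
    print(cat, stages_for(cat))
```

Passing `include_ipo=False` corresponds to the `--no-preferences` flag used below: `bug_detection` examples still feed the core agent stage, but no preference pairs are produced.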

### 1.1 Download & Format Strandset

In [None]:
max_samples = 500 if CONFIG["training_scope"] == "quick_test" else 0  # 0 = all

print("Downloading & formatting Strandset-Rust-v1...")
print("=" * 60)

cmd = "python scripts/20_prepare_strandset.py"
if max_samples:
    cmd += f" --max_samples {max_samples}"
if not CONFIG["include_ipo"]:
    cmd += " --no-preferences"

!{cmd}

drive_helper.backup("data/rust/strandset", "data/rust/strandset")
if DRIVE_MODE != "local":
    print("\nBacked up Strandset data to Drive.")

### 1.2 Verify Data

In [None]:
data_checks = [
    ("Strandset lang_rust", "data/rust/strandset/lang_rust/train"),
    ("Strandset core_agent", "data/rust/strandset/core_agent/train"),
    ("Strandset IPO", "data/rust/strandset/ipo/train"),
    ("Strandset eval", "data/rust/strandset/eval/test"),
    ("Stats", "data/rust/strandset/stats.json"),
]

print("Data Verification:")
print("=" * 60)
for name, path in data_checks:
    exists = os.path.exists(path)
    if exists and os.path.isdir(path):
        items = os.listdir(path)
        print(f"  \u2713 {name}: {path} ({len(items)} items)")
    elif exists:
        size_kb = os.path.getsize(path) / 1024
        print(f"  \u2713 {name}: {path} ({size_kb:.1f} KB)")
    else:
        needed = True
        if not CONFIG["include_ipo"] and "IPO" in name:
            needed = False
        if CONFIG["training_scope"] == "lang_adapter_only" and name in ("Strandset core_agent", "Strandset IPO"):
            needed = False
        sym = "\u2717" if needed else "\u2014"
        label = "MISSING" if needed else "not needed"
        print(f"  {sym} {name}: {label}")

# Show stats if available
stats_path = "data/rust/strandset/stats.json"
if os.path.exists(stats_path):
    import json
    with open(stats_path) as f:
        stats = json.load(f)
    # Default to 0, not "?": the "," format spec raises ValueError on strings
    print(f"\n  Total processed: {stats.get('total_processed', 0):,}")
    print(f"  Lang adapter: {stats.get('lang_rust', 0):,}")
    print(f"  Core agent: {stats.get('core_agent_debug', 0) + stats.get('core_agent_review', 0):,}")
    print(f"  IPO pairs: {stats.get('ipo', 0):,}")
print("=" * 60)

---
## Step 2: Lang Adapter Training

Train a QLoRA adapter (rank 64) to specialise GPT-OSS 20B on Rust syntax, stdlib, and idioms.
Uses Strandset's code_generation, code_completion, docstring, comment, and naming examples.
Then merge the adapter into the base weights for downstream training.

**Split LoRA** backend auto-enabled for 7-12x faster MoE training.
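
Merging a LoRA adapter folds the low-rank update into the base weight: `W' = W + (alpha / r) * B @ A`. The sketch below shows that arithmetic on a tiny matrix in pure Python. The `alpha` value is a made-up example, and the real merge in `scripts/19_merge_adapter.py` may differ, for example by dequantizing 4-bit weights first.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A); B is (out x r) and A is (r x in)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # base weight (2 x 2)
A = [[1.0, 2.0]]              # rank-1 down-projection (1 x 2)
B = [[0.5], [0.0]]            # rank-1 up-projection (2 x 1)

print(merge_lora(W, A, B, alpha=2, rank=1))  # [[2.0, 2.0], [0.0, 1.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the later stages can treat the merged checkpoint as an ordinary base model.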

### 2.1 Train lang_rust Adapter

In [None]:
batch = CONFIG["lang_rust_batch"]
grad_accum = CONFIG["lang_rust_grad_accum"]
max_steps = CONFIG["lang_rust_max_steps"]
seq_len = CONFIG["lang_rust_seq_len"]

cmd = f"python scripts/13_train_lang_adapter.py"
cmd += f" --train_data_path data/rust/strandset/lang_rust/train"
cmd += f" --per_device_train_batch_size {batch}"
cmd += f" --gradient_accumulation_steps {grad_accum}"
cmd += f" --max_steps {max_steps}"

print(f"Training lang_rust adapter...")
print(f"  Data: data/rust/strandset/lang_rust/train")
print(f"  Batch: {batch} x {grad_accum} = {batch * grad_accum}")
print(f"  Max steps: {max_steps}")
print(f"  Seq length: {seq_len} (from config)")
print(f"  Split LoRA backend: {CONFIG['moe_backend']}")
print("=" * 60)

!{cmd}

drive_helper.backup("checkpoints/lang_rust", "checkpoints/lang_rust")
if DRIVE_MODE != "local":
    print("\nCheckpoint backed up to Drive.")

### 2.2 Merge lang_rust into Base

In [None]:
print("Merging lang_rust adapter into base model...")
print("=" * 60)

!python scripts/19_merge_adapter.py \
    --adapter_path checkpoints/lang_rust/final \
    --output_dir checkpoints/gpt-oss-20b-rust-merged \
    --export_formats hf

drive_helper.backup("checkpoints/gpt-oss-20b-rust-merged", "checkpoints/gpt-oss-20b-rust-merged")
if DRIVE_MODE != "local":
    print("\nMerged model backed up to Drive.")

### 2.3 Verify Merge

In [None]:
merged_path = "checkpoints/gpt-oss-20b-rust-merged"
adapter_path = "checkpoints/lang_rust/final"

print("Merge Verification:")
print("=" * 60)

if os.path.exists(merged_path):
    files = os.listdir(merged_path)
    safetensors = [f for f in files if f.endswith(".safetensors")]
    print(f"  \u2713 Merged model: {merged_path}")
    print(f"    {len(safetensors)} safetensors shard(s), {len(files)} total files")
else:
    print(f"  \u2717 Merged model not found at {merged_path}")

if os.path.exists(adapter_path):
    adapter_files = os.listdir(adapter_path)
    print(f"  \u2713 Adapter: {adapter_path} ({len(adapter_files)} files)")
else:
    print(f"  \u2717 Adapter not found at {adapter_path}")

if CONFIG["training_scope"] == "lang_adapter_only":
    print("\n\u2713 lang_adapter_only scope complete. Stopping here.")

print("=" * 60)

---
## Step 3: Core Agent SFT

Train a higher-rank LoRA adapter (rank 128) on Strandset's debug/review examples.
Uses the merged lang_rust model as the base.

**Split LoRA** + **Auto packing** (3x faster, zero-config).
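
Packing places several short examples in one training sequence so that fewer positions are spent on padding. The sketch below uses a greedy first-fit-decreasing assignment of example lengths to sequences. It is an illustration of the idea only and is not Unsloth's algorithm. It also omits the attention masking that keeps packed examples from attending to each other, which is the "uncontaminated" part.

```python
def pack_lengths(lengths, max_len):
    """Assign example indices to sequences so each sequence holds at most max_len tokens."""
    bins = []   # bins[i] is the list of example indices in sequence i
    free = []   # free[i] is the remaining token capacity of sequence i
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for idx in order:
        n = lengths[idx]
        if n > max_len:
            raise ValueError(f"example {idx} has {n} tokens, more than max_len={max_len}")
        for b in range(len(bins)):
            if n <= free[b]:
                bins[b].append(idx)
                free[b] -= n
                break
        else:
            # No existing sequence has room, so start a new one.
            bins.append([idx])
            free.append(max_len - n)
    return bins


token_counts = [7000, 3000, 9000, 5000, 2000]  # made-up example lengths
print(pack_lengths(token_counts, max_len=12288))  # [[2, 1], [0, 3], [4]]
```

Here five examples fit into three sequences of up to 12288 tokens, so each optimizer step sees more real tokens for the same batch settings.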

### 3.1 Train core_agent Adapter

In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
else:
    batch = CONFIG["core_agent_batch"]
    grad_accum = CONFIG["core_agent_grad_accum"]
    max_steps = CONFIG["core_agent_max_steps"]
    seq_len = CONFIG["core_agent_seq_len"]

    cmd = f"python scripts/14_train_core_agent.py"
    cmd += f" --train_data_path data/rust/strandset/core_agent/train"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"

    print(f"Training core_agent adapter...")
    print(f"  Data: data/rust/strandset/core_agent/train")
    print(f"  Batch: {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Max steps: {max_steps}")
    print(f"  Seq length: {seq_len} (from config)")
    print(f"  LoRA rank: 128")
    print(f"  Split LoRA backend: {CONFIG['moe_backend']}")
    print(f"  Auto packing: enabled (uncontaminated)")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/core_agent", "checkpoints/core_agent")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")

### 3.2 Verify core_agent

In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
else:
    ckpt_path = "checkpoints/core_agent/final"

    print("Core Agent Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  \u2713 Checkpoint: {ckpt_path} ({len(files)} files)")

        adapter_config = os.path.join(ckpt_path, "adapter_config.json")
        if os.path.exists(adapter_config):
            import json
            with open(adapter_config) as f:
                cfg = json.load(f)
            print(f"    LoRA rank: {cfg.get('r', '?')}")
            print(f"    Alpha: {cfg.get('lora_alpha', '?')}")
            print(f"    Target modules: {cfg.get('target_modules', '?')}")
    else:
        print(f"  \u2717 Checkpoint not found at {ckpt_path}")

    print("=" * 60)

---
## Step 4: IPO Preference Training (Optional)

Train with Identity Preference Optimisation on synthetic preference pairs
from Strandset's bug_detection category (fixed=chosen, buggy=rejected).
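
A preference pair of this kind can be sketched as a small conversion function. The input field names `prompt`, `buggy_code`, and `fixed_code` are assumptions made for illustration; the real Strandset schema and the logic in the preparation script may differ. The output keys `prompt`, `chosen`, and `rejected` follow the common preference-dataset layout.

```python
def to_preference_pair(example, prompt_key="prompt", buggy_key="buggy_code", fixed_key="fixed_code"):
    """Build a {prompt, chosen, rejected} record, or return None if the pair is unusable."""
    buggy = example.get(buggy_key, "").strip()
    fixed = example.get(fixed_key, "").strip()
    # A pair carries no preference signal if either side is empty or both are identical.
    if not buggy or not fixed or buggy == fixed:
        return None
    return {"prompt": example[prompt_key], "chosen": fixed, "rejected": buggy}


sample = {
    "prompt": "Fix the off-by-one error in this loop.",
    "buggy_code": "for i in 0..=v.len() { println!(\"{}\", v[i]); }",
    "fixed_code": "for i in 0..v.len() { println!(\"{}\", v[i]); }",
}
pair = to_preference_pair(sample)
print(pair["chosen"])
```

Dropping identical or empty pairs keeps examples without a usable preference signal out of the IPO training set.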

This stage uses a very low learning rate (5e-7) and runs for only 1 epoch to avoid collapse.

Set `include_ipo=False` in Step 0.3 to skip.
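
For orientation, the IPO objective regresses the policy-versus-reference log-ratio margin between the chosen and rejected responses toward `1 / (2 * beta)`. The sketch below computes the per-pair loss from four log-probabilities. Whether `scripts/17_ipo_preference.py` computes it exactly this way is an assumption, including any length normalisation of the log-probabilities. The `beta=0.1` default matches the value printed by the training cell.

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp, ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Squared distance between the log-ratio margin and the target 1 / (2 * beta)."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    target = 1.0 / (2.0 * beta)
    return (margin - target) ** 2


# Policy identical to the reference: the margin is 0, so the loss is target**2.
print(ipo_loss(-10.0, -12.0, -10.0, -12.0))

# Policy has moved the margin exactly to the target of 5.0: the loss is 0.
print(ipo_loss(-8.0, -13.0, -10.0, -10.0))
```

Because the loss is a squared error around a finite target, it penalises margins that overshoot as well as margins that fall short.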

### 4.1 Train with IPO

In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
elif not CONFIG["include_ipo"]:
    print("Skipping \u2014 IPO disabled (include_ipo=False)")
else:
    batch = CONFIG["ipo_batch"]
    grad_accum = CONFIG["ipo_grad_accum"]
    max_steps = CONFIG["ipo_max_steps"]

    ipo_checkpoint = "checkpoints/core_agent/final"

    cmd = f"python scripts/17_ipo_preference.py"
    cmd += f" --checkpoint {ipo_checkpoint}"
    cmd += f" --train_data_path data/rust/strandset/ipo/train"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"

    print(f"Training with IPO (synthetic preferences)...")
    print(f"  Checkpoint: {ipo_checkpoint}")
    print(f"  Data: data/rust/strandset/ipo/train")
    print(f"  Batch: {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Max steps: {max_steps}")
    print(f"  Loss: IPO (beta=0.1)")
    print(f"  Load mode: {CONFIG['load_mode']}")
    print(f"  Split LoRA backend: {CONFIG['moe_backend']}")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/core_agent_ipo", "checkpoints/core_agent_ipo")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")

### 4.2 Verify IPO

In [None]:
if CONFIG["training_scope"] == "lang_adapter_only":
    print("Skipping \u2014 scope is lang_adapter_only")
elif not CONFIG["include_ipo"]:
    print("Skipping \u2014 IPO disabled")
else:
    ckpt_path = "checkpoints/core_agent_ipo/final"

    print("IPO Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  \u2713 IPO checkpoint: {ckpt_path} ({len(files)} files)")
    else:
        print(f"  \u2717 IPO checkpoint not found at {ckpt_path}")

    tb_dir = "checkpoints/core_agent_ipo"
    tb_files = []
    if os.path.exists(tb_dir):
        for root, dirs, fnames in os.walk(tb_dir):
            for fn in fnames:
                if fn.startswith("events.out.tfevents"):
                    tb_files.append(os.path.join(root, fn))
    if tb_files:
        print(f"  \u2713 TensorBoard logs found ({len(tb_files)} event files)")
        print(f"    Monitor KL divergence: warn >0.3, abort >0.5")
    else:
        print(f"  \u2014 No TensorBoard logs found")

    print("=" * 60)
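
The KL thresholds printed above (warn above 0.3, abort above 0.5) can be expressed as a small check for use while reading the logs. This is a sketch only: the function name `kl_status` is hypothetical, and how KL values are extracted from the TensorBoard event files is left out.

```python
def kl_status(kl: float, warn: float = 0.3, abort: float = 0.5) -> str:
    """Classify a KL divergence reading against the warn/abort thresholds."""
    if kl > abort:
        return "abort"
    if kl > warn:
        return "warn"
    return "ok"


for value in (0.12, 0.35, 0.61):  # made-up readings
    print(value, kl_status(value))
```

A reading exactly at a threshold is reported at the lower level, because both comparisons are strict.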

---
## Step 5: Test Model

Load the trained model and generate Rust code interactively.

### 5.1 Load Model

In [None]:
from unsloth import FastLanguageModel
import torch

CHECKPOINT_PRIORITY = [
    "checkpoints/core_agent_ipo/final",
    "checkpoints/core_agent/final",
    "checkpoints/gpt-oss-20b-rust-merged",
]

MODEL_PATH = None
for path in CHECKPOINT_PRIORITY:
    if os.path.exists(path):
        MODEL_PATH = path
        break

if MODEL_PATH is None:
    print("\u2717 No checkpoint found. Train the model first.")
else:
    print(f"Loading model from: {MODEL_PATH}")

    load_kwargs = {
        "max_seq_length": 4096,
        "dtype": torch.bfloat16,
    }
    if CONFIG.get("load_mode") == "fp8" and CONFIG.get("use_fp8"):
        load_kwargs["load_in_fp8"] = True
        print("  Mode: FP8 (H100)")
    else:
        load_kwargs["load_in_4bit"] = True
        print("  Mode: 4-bit QLoRA")

    if CONFIG.get("fast_inference"):
        load_kwargs["fast_inference"] = True
        print("  Inference: vLLM backend")

    print("=" * 60)

    model, tokenizer = FastLanguageModel.from_pretrained(MODEL_PATH, **load_kwargs)
    FastLanguageModel.for_inference(model)

    print("\u2713 Model loaded!")

### 5.2 Generate Rust Code

In [None]:
import sys
sys.path.insert(0, "scripts")
from dataset_formatters.harmony import encode_harmony_messages

TEST_PROMPTS = [
    "Write a Rust function `fn merge_sorted(a: &[i32], b: &[i32]) -> Vec<i32>` that merges two sorted slices into a single sorted vector.",
    "This Rust code fails the borrow checker. Fix it:\n```rust\nfn main() {\n    let mut v = vec![1, 2, 3];\n    let first = &v[0];\n    v.push(4);\n    println!(\"{}\", first);\n}\n```",
    "Write an async Rust function using tokio that fetches a URL with reqwest, retries up to 3 times on failure, and returns the response body as a String.",
]

def generate_rust(prompt, max_tokens=1024):
    messages = [{"role": "user", "content": prompt}]
    formatted = encode_harmony_messages(
        messages,
        developer_instructions="You are a Rust programming expert. Write correct, idiomatic code.",
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.3,
            do_sample=True,
            top_p=0.9,
        )
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

for i, prompt in enumerate(TEST_PROMPTS, 1):
    print(f"\n{'=' * 60}")
    print(f"Test {i}: {prompt[:80]}...")
    print("=" * 60)
    response = generate_rust(prompt)
    print(response)
    print()

### 5.3 Custom Prompt

In [None]:
CUSTOM_PROMPT = "Write a Rust function that reads a CSV file and returns the sum of a specified column."

print(f"Prompt: {CUSTOM_PROMPT}")
print("=" * 60)
print(generate_rust(CUSTOM_PROMPT))

---
## Step 6: Export

Merge the final adapter and export to HuggingFace + GGUF formats.

### 6.1 Export to GGUF

In [None]:
ADAPTER_PRIORITY = [
    "checkpoints/core_agent_ipo/final",
    "checkpoints/core_agent/final",
    "checkpoints/lang_rust/final",
]

adapter_path = None
for path in ADAPTER_PRIORITY:
    if os.path.exists(path):
        adapter_path = path
        break

if adapter_path is None:
    print("\u2717 No adapter checkpoint found.")
else:
    export_dir = "checkpoints/gpt-oss-20b-rust-export-v3"
    print(f"Exporting adapter: {adapter_path}")
    print(f"Output: {export_dir}")
    print("=" * 60)

    !python scripts/19_merge_adapter.py \
        --adapter_path {adapter_path} \
        --output_dir {export_dir} \
        --export_formats hf gguf_q4

    drive_helper.backup(export_dir, "checkpoints/gpt-oss-20b-rust-export-v3")
    if DRIVE_MODE != "local":
        print("\nExport backed up to Drive.")

### 6.2 Download GGUF

In [None]:
if IN_COLAB:
    from google.colab import files
    import glob

    export_dir = "checkpoints/gpt-oss-20b-rust-export-v3"
    gguf_files = glob.glob(os.path.join(export_dir, "*.gguf"))

    if gguf_files:
        gguf_path = gguf_files[0]
        size_gb = os.path.getsize(gguf_path) / (1024**3)
        print(f"Downloading: {os.path.basename(gguf_path)} ({size_gb:.1f} GB)")
        files.download(gguf_path)
    else:
        print("\u2717 No GGUF file found. Run export (6.1) first.")
else:
    print("Download not available outside Colab.")
    print("GGUF file is at: checkpoints/gpt-oss-20b-rust-export-v3/")

---
## Training Complete!

Your GPT-OSS 20B Rust coding agent (v3, Strandset) is trained and ready to use.

**Data source:** [Strandset-Rust-v1](https://huggingface.co/datasets/Fortytwo-Network/Strandset-Rust-v1) (191K examples, Apache 2.0)

**Pipeline:**
1. Lang Adapter: Rust domain specialisation from code generation/completion examples
2. Core Agent SFT: Debug and review training from bug_detection/code_review examples
3. IPO: Synthetic preference pairs from bug_detection (if enabled)

**Outputs:**
- Checkpoints: `checkpoints/core_agent_ipo/final` (or `checkpoints/core_agent/final` if IPO was skipped)
- Exported model: `checkpoints/gpt-oss-20b-rust-export-v3/`
- All backed up to Google Drive: `gpt-oss-20b-rust-agent-v3/`

**Compared to v2:**
- No Rust toolchain required: runs on any Colab instance that meets the GPU requirement (A100 40GB+)
- No cargo-mutants or trajectory generation: faster setup
- No GRPO RL: no execution-based rewards
- For better results, consider the v2 pipeline, which adds mutation data and GRPO