# Train GPT-OSS 20B Coding TUI Agent

**Combined pipeline** — Tool calling SFT + agent trajectory training + proxy log extraction + IPO preference + GRPO RL.

**Target failure modes to fix:**
- Tool calling errors (invalid params, non-existent MCP servers)
- No follow-through (analysis loops, never writes code)
- Circular reasoning (repeating the same analysis)
- Context loss (forgetting task state mid-session)
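
To make the first failure mode concrete, here is a minimal sketch of a check that catches calls to non-existent tools and calls with invalid parameters. It assumes tools are described with JSON-Schema-style `parameters` (`properties` and `required`), and the `read_file` schema and the sample calls are hypothetical. It is not part of the pipeline's code.

```python
def validate_tool_call(call: dict, tools: list) -> list:
    """Return a list of problems with a proposed tool call (empty list = valid)."""
    # Index the advertised tools by name; the schema layout is an assumption.
    schemas = {t["name"]: t.get("parameters", {}) for t in tools}
    name = call.get("name")
    if name not in schemas:
        return [f"unknown tool: {name}"]

    schema = schemas[name]
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    given = set(call.get("arguments", {}))

    problems = [f"missing required parameter: {p}" for p in sorted(required - given)]
    problems += [f"unexpected parameter: {p}" for p in sorted(given - allowed)]
    return problems


# Hypothetical tool schema, for illustration only.
TOOLS = [{
    "name": "read_file",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}]

print(validate_tool_call({"name": "read_file", "arguments": {"path": "main.py"}}, TOOLS))
print(validate_tool_call({"name": "read_file", "arguments": {"file": "main.py"}}, TOOLS))
print(validate_tool_call({"name": "search_web", "arguments": {}}, TOOLS))
```

A check of this kind could be used to filter training examples or to score generated calls during evaluation.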

**Pipeline:**
1. Tool calling SFT (rank 64 LoRA)
2. Merge → Agent SFT from proxy log trajectories (rank 128 LoRA)
3. IPO preference optimisation (decisive action > endless analysis)
4. GRPO RL (execution-grounded: compiles / tests pass / no loops)
5. Eval → Export
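
The "execution-grounded" signal in step 4 can be pictured with a small sketch. The pipeline's actual reward function is not shown in this section, so the function names, the weights (0.25, 0.5, -0.5), and the representation of tool calls as `(tool_name, arguments)` tuples are all assumptions made for illustration.

```python
from collections import Counter


def compiles(source: str) -> bool:
    """Return True if the Python source compiles without a syntax error."""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def has_loop(tool_calls: list, max_repeats: int = 2) -> bool:
    """Flag a trajectory that repeats the same (tool, arguments) call too often."""
    counts = Counter(tool_calls)
    return any(n > max_repeats for n in counts.values())


def execution_reward(source: str, tests_passed: bool, tool_calls: list) -> float:
    """Combine the three signals into one scalar (weights are hypothetical)."""
    score = 0.0
    if compiles(source):
        score += 0.25
    if tests_passed:
        score += 0.5
    if has_loop(tool_calls):
        score -= 0.5
    return score


# A clean run: the code compiles, tests pass, and no call is repeated.
print(execution_reward("x = 1", True, [("read_file", "a.py"), ("run_tests", "")]))
# A stuck run: broken code and the same read issued three times.
print(execution_reward("def f(:", False, [("read_file", "a.py")] * 3))
```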

**Base model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) (20.9B MoE, 3.6B active)

**Data sources:**
- [glaiveai/glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) (113K)
- [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) (60K)
- [NousResearch/hermes-function-calling-v1](https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1)
- [xingyaoww/code-act](https://huggingface.co/datasets/xingyaoww/code-act)
- [bigcode/commitpack](https://huggingface.co/datasets/bigcode/commitpack) (50K subsample)
- [bigcode/editpackft](https://huggingface.co/datasets/bigcode/editpackft) (50K subsample)
- [Anthropic/hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf)
- [m-a-p/CodeFeedback-Filtered-Instruction](https://huggingface.co/datasets/m-a-p/CodeFeedback-Filtered-Instruction)
- MacLean AI proxy logs (real Codex CLI agent sessions — most valuable source)

## Step 0: Environment Setup

### 0.1 Mount Google Drive & Clone Repository

In [None]:
import os

IN_COLAB = "COLAB_GPU" in os.environ or os.path.exists("/content")

DRIVE_BASE = ""
DRIVE_MODE = "local"

if IN_COLAB:
    from google.colab import drive
    drive.mount("/content/drive")
    DRIVE_BASE = "/content/drive/MyDrive/gpt-oss-20b-coding-tui"
    DRIVE_MODE = "mounted"
    os.makedirs(DRIVE_BASE, exist_ok=True)

    if not os.path.exists("llm-training-pipeline"):
        !git clone https://github.com/rmarnold/llm-training-pipeline.git
    os.chdir("llm-training-pipeline")
    !git pull --ff-only
    print(f"Working directory: {os.getcwd()}")
else:
    print("Running locally (not in Colab).")
    print(f"Working directory: {os.getcwd()}")

### 0.2 Install Dependencies

In [None]:
import subprocess, sys, os

IN_COLAB = "COLAB_GPU" in os.environ or os.path.exists("/content")

if IN_COLAB:
    # Core + GPT-OSS deps
    !pip install -q -e ".[gpt_oss]"

    # Unsloth (Colab optimised)
    !pip install -q unsloth

    # vLLM for fast inference in GRPO
    !pip install -q vllm

    # ipywidgets for config UI
    !pip install -q ipywidgets

    # Datasets for HuggingFace downloads
    !pip install -q datasets huggingface_hub

    # Code execution evaluation deps
    !pip install -q pyflakes astunparse

    print("\nDependencies installed.")
else:
    print("Assuming local dependencies are already installed.")
    print("Run: pip install -e '.[gpt_oss]'")

### 0.3 Configure Pipeline

Toggle the form view (click the "..." menu on this cell) to see the interactive configuration panel.
Adjust settings, then **run this cell** to apply them.
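
The way the step settings interact can be summarised as a small function. This is a sketch of the rule the configuration cell applies (override first, then the `quick_test` cap of 50); the name `resolve_max_steps` is hypothetical and does not exist in the repository.

```python
def resolve_max_steps(tier_default: int, override: int, scope: str, quick_cap: int = 50) -> int:
    """Return the step count a stage will run for."""
    # A positive override replaces the GPU tier default for every stage.
    steps = override if override > 0 else tier_default
    # quick_test then caps the result, whatever its source.
    if scope == "quick_test":
        steps = min(steps, quick_cap)
    return steps


print(resolve_max_steps(3000, 0, "full"))          # tier default
print(resolve_max_steps(3000, 200, "full"))        # override wins
print(resolve_max_steps(3000, 200, "quick_test"))  # capped by quick_test
print(resolve_max_steps(3000, 20, "quick_test"))   # override already below the cap
```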

In [None]:
#@title ### Pipeline Configuration { display-mode: "form" }

#@markdown ---
#@markdown #### Core Settings

training_scope = "quick_test"  #@param ["full", "quick_test", "tool_calling_only", "skip_to_rl"] {type: "string"}
gpu_tier = "h100_80gb"  #@param ["a100_40gb", "a100_80gb", "h100_80gb"] {type: "string"}
max_steps_override = 0  #@param {type: "integer"}

#@markdown > *Max Steps Override: 0 = use GPU tier defaults. Set > 0 to use that step count for every stage (quick_test still caps each stage at 50).*

#@markdown ---
#@markdown #### Data Sources

include_proxy_logs = True  #@param {type: "boolean"}
proxy_log_dir = ""  #@param {type: "string"}

#@markdown > *proxy_log_dir: path to MacLean AI proxy log directory (contains per-request JSON files).*

include_tool_calling = True  #@param {type: "boolean"}
include_agent_trajectories = True  #@param {type: "boolean"}

#@markdown ---
#@markdown #### Pipeline Phases

include_preference = True  #@param {type: "boolean"}
include_grpo = True  #@param {type: "boolean"}
skip_data_generation = False  #@param {type: "boolean"}

#@markdown ---
#@markdown #### Export

enable_qat_export = False  #@param {type: "boolean"}

#@markdown ---
#@markdown #### Advanced

use_service_account = False  #@param {type: "boolean"}
drive_folder_id = ""  #@param {type: "string"}

# ======================================================================
# GPU tier presets (auto-selected based on gpu_tier above)
# ======================================================================
import os, sys, json

GPU_CONFIGS = {
    "a100_40gb": {
        "tool_calling_batch": 2, "tool_calling_grad_accum": 16, "tool_calling_max_steps": 3000, "tool_calling_seq_len": 4096,
        "agent_sft_batch": 1, "agent_sft_grad_accum": 8, "agent_sft_max_steps": 2000, "agent_sft_seq_len": 8192,
        "ipo_batch": 1, "ipo_grad_accum": 16, "ipo_max_steps": 1000, "ipo_seq_len": 4096,
        "grpo_batch": 1, "grpo_grad_accum": 8, "grpo_max_steps": 3000, "grpo_seq_len": 16384, "grpo_num_gen": 4,
        "eval_num_samples": 100,
        "load_mode": "4bit", "moe_backend": "triton", "fast_inference": False,
    },
    "a100_80gb": {
        "tool_calling_batch": 4, "tool_calling_grad_accum": 8, "tool_calling_max_steps": 3000, "tool_calling_seq_len": 8192,
        "agent_sft_batch": 2, "agent_sft_grad_accum": 4, "agent_sft_max_steps": 2000, "agent_sft_seq_len": 16384,
        "ipo_batch": 1, "ipo_grad_accum": 16, "ipo_max_steps": 1000, "ipo_seq_len": 8192,
        "grpo_batch": 1, "grpo_grad_accum": 8, "grpo_max_steps": 5000, "grpo_seq_len": 32768, "grpo_num_gen": 4,
        "eval_num_samples": 200,
        "load_mode": "4bit", "moe_backend": "triton", "fast_inference": False,
    },
    "h100_80gb": {
        "tool_calling_batch": 6, "tool_calling_grad_accum": 8, "tool_calling_max_steps": 3000, "tool_calling_seq_len": 8192,
        "agent_sft_batch": 6, "agent_sft_grad_accum": 4, "agent_sft_max_steps": 2000, "agent_sft_seq_len": 16384,
        "ipo_batch": 2, "ipo_grad_accum": 16, "ipo_max_steps": 1000, "ipo_seq_len": 8192,
        "grpo_batch": 2, "grpo_grad_accum": 8, "grpo_max_steps": 5000, "grpo_seq_len": 65536, "grpo_num_gen": 4,
        "eval_num_samples": 200,
        "load_mode": "fp8", "moe_backend": "triton", "fast_inference": True,
    },
}

tier = GPU_CONFIGS[gpu_tier]

# ======================================================================
# Build CONFIG dict from form values
# ======================================================================
CONFIG = {
    "training_scope": training_scope,
    "gpu_tier": gpu_tier,
    **tier,
    # Data sources
    "include_proxy_logs": include_proxy_logs,
    "proxy_log_dir": proxy_log_dir,
    "include_tool_calling": include_tool_calling,
    "include_agent_trajectories": include_agent_trajectories,
    # Pipeline phases
    "include_preference": include_preference,
    "include_grpo": include_grpo,
    "enable_qat_export": enable_qat_export,
    "skip_data_generation": skip_data_generation,
    # Advanced
    "use_service_account": use_service_account,
    "drive_folder_id": drive_folder_id,
}

# Apply max_steps override
if max_steps_override > 0:
    for key in list(CONFIG.keys()):
        if key.endswith("_max_steps"):
            CONFIG[key] = max_steps_override

# Quick test caps
if CONFIG["training_scope"] == "quick_test":
    for key in list(CONFIG.keys()):
        if key.endswith("_max_steps"):
            CONFIG[key] = min(CONFIG[key], 50)
    CONFIG["eval_num_samples"] = 10

# Scope-based overrides
if CONFIG["training_scope"] == "tool_calling_only":
    CONFIG["include_preference"] = False
    CONFIG["include_grpo"] = False
    CONFIG["include_agent_trajectories"] = False
elif CONFIG["training_scope"] == "skip_to_rl":
    CONFIG["include_tool_calling"] = False
    CONFIG["include_agent_trajectories"] = False

# ======================================================================
# Set up DriveHelper
# ======================================================================
sys.path.insert(0, "scripts")
from pipeline_lib.drive_utils import DriveHelper

if "DRIVE_BASE" not in dir():
    DRIVE_BASE = ""
if "DRIVE_MODE" not in dir():
    DRIVE_MODE = "local"

if CONFIG["use_service_account"] and CONFIG["drive_folder_id"]:
    sa_path = "service_account.json"
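    try:
        from google.colab import userdata
        sa_json = userdata.get("SERVICE_ACCOUNT_JSON")
        # Only write when the secret is set, so an existing credentials file is not truncated.
        if sa_json:
            with open(sa_path, "w") as f:
                f.write(sa_json)
    except Exception as e:
        # Not in Colab, or the secret is unavailable; an existing service_account.json is still used.
        print(f"Could not read SERVICE_ACCOUNT_JSON secret: {e}")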

    if os.path.exists(sa_path) and os.path.getsize(sa_path) > 10:
        try:
            drive_helper = DriveHelper(
                mode="service_account",
                credentials_path=sa_path,
                folder_id=CONFIG["drive_folder_id"],
            )
            DRIVE_MODE = "service_account"
        except Exception as e:
            print(f"Service account failed: {e}")
            drive_helper = DriveHelper(mode="local")
            DRIVE_MODE = "local"
    else:
        drive_helper = DriveHelper(mode="local")
        DRIVE_MODE = "local"
elif DRIVE_BASE:
    drive_helper = DriveHelper(mode="mounted", drive_base=DRIVE_BASE)
    DRIVE_MODE = "mounted"
else:
    drive_helper = DriveHelper(mode="local")
    DRIVE_MODE = "local"

# Save for persistence across restarts
os.makedirs("data", exist_ok=True)
with open("data/config_coding_tui.json", "w") as f:
    json.dump(CONFIG, f, indent=2)

# ======================================================================
# Print summary
# ======================================================================
print("=" * 58)
print("  PIPELINE CONFIGURATION (Coding TUI Agent)")
print("=" * 58)
print(f"  Scope:           {CONFIG['training_scope'].upper()}")
print(f"  GPU tier:        {CONFIG['gpu_tier']}")
print(f"  MoE backend:     {CONFIG['moe_backend']}")
print(f"  Load mode:       {CONFIG['load_mode']}")
print(f"  Drive mode:      {DRIVE_MODE}")
print()
print(f"  Proxy logs:      {CONFIG['include_proxy_logs']}")
if CONFIG["include_proxy_logs"] and CONFIG["proxy_log_dir"]:
    print(f"    Log dir:       {CONFIG['proxy_log_dir']}")
print(f"  Tool calling:    {CONFIG['include_tool_calling']}")
print(f"  Agent traj:      {CONFIG['include_agent_trajectories']}")
print(f"  IPO preference:  {CONFIG['include_preference']}")
print(f"  GRPO:            {CONFIG['include_grpo']}")
print(f"  QAT export:      {CONFIG['enable_qat_export']}")
if max_steps_override > 0:
    print(f"  Max steps:       {max_steps_override} (override)")
print()
print(f"  Tool Calling SFT: batch={CONFIG['tool_calling_batch']} x grad={CONFIG['tool_calling_grad_accum']}, seq={CONFIG['tool_calling_seq_len']}, steps={CONFIG['tool_calling_max_steps']}")
print(f"  Agent SFT:        batch={CONFIG['agent_sft_batch']} x grad={CONFIG['agent_sft_grad_accum']}, seq={CONFIG['agent_sft_seq_len']}, steps={CONFIG['agent_sft_max_steps']}")
if CONFIG["include_preference"]:
    print(f"  IPO:              batch={CONFIG['ipo_batch']} x grad={CONFIG['ipo_grad_accum']}, seq={CONFIG['ipo_seq_len']}, steps={CONFIG['ipo_max_steps']}")
if CONFIG["include_grpo"]:
    print(f"  GRPO:             batch={CONFIG['grpo_batch']} x grad={CONFIG['grpo_grad_accum']}, seq={CONFIG['grpo_seq_len']}, steps={CONFIG['grpo_max_steps']}")
print("=" * 58)

### 0.4 Pipeline Dashboard

In [None]:
import ipywidgets as widgets
from IPython.display import display

class PipelineTracker:
    """Track pipeline progress with visual indicators."""

    PHASES = [
        ("tool_calling_data", "Tool Calling Data"),
        ("agent_traj_data", "Agent Trajectory Data"),
        ("proxy_log_extract", "Proxy Log Extraction"),
        ("tool_calling_sft", "Tool Calling SFT"),
        ("merge", "Merge Adapter"),
        ("agent_sft", "Agent SFT"),
        ("ipo", "IPO Preference"),
        ("grpo", "GRPO RL"),
        ("eval", "Evaluation"),
        ("export", "Export"),
    ]

    def __init__(self):
        self._bars = {}
        self._labels = {}
        rows = []
        for key, name in self.PHASES:
            label = widgets.HTML(
                value=f"<span style='color:#888'>&#x25CB; {name}</span>",
                layout=widgets.Layout(width="240px"),
            )
            bar = widgets.FloatProgress(
                value=0, min=0, max=1.0,
                bar_style="info",
                layout=widgets.Layout(width="300px", height="18px"),
            )
            self._bars[key] = bar
            self._labels[key] = label
            rows.append(widgets.HBox([label, bar]))
        self._container = widgets.VBox(rows)
        display(widgets.HTML("<b>Pipeline Progress</b>"))
        display(self._container)

    def start(self, phase):
        self._labels[phase].value = (
            f"<span style='color:#2196F3'>&#x25B6; {dict(self.PHASES)[phase]}</span>"
        )
        self._bars[phase].value = 0.1
        self._bars[phase].bar_style = "info"

    def complete(self, phase):
        self._labels[phase].value = (
            f"<span style='color:#4CAF50'>&#x2714; {dict(self.PHASES)[phase]}</span>"
        )
        self._bars[phase].value = 1.0
        self._bars[phase].bar_style = "success"

    def skip(self, phase):
        self._labels[phase].value = (
            f"<span style='color:#9E9E9E'>&#x2014; {dict(self.PHASES)[phase]} (skipped)</span>"
        )
        self._bars[phase].value = 1.0
        self._bars[phase].bar_style = ""

    def fail(self, phase):
        self._labels[phase].value = (
            f"<span style='color:#F44336'>&#x2718; {dict(self.PHASES)[phase]}</span>"
        )
        self._bars[phase].bar_style = "danger"

tracker = PipelineTracker()

### 0.5 Set Up Persistent Storage
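
In mounted mode, the cell below symlinks each local working directory to its counterpart on Drive. The following sketch shows one way to make that linking step idempotent and tolerant of stale links, for example after `DRIVE_BASE` changes. The helper name `link_to_drive` is hypothetical, and the demo uses temporary directories instead of a real Drive mount.

```python
import os
import tempfile


def link_to_drive(local_path: str, drive_path: str) -> str:
    """Point local_path at drive_path via a symlink; safe to call repeatedly."""
    os.makedirs(drive_path, exist_ok=True)
    if os.path.islink(local_path):
        if os.readlink(local_path) == drive_path:
            return "exists"
        os.unlink(local_path)  # stale link from an earlier Drive location
    elif os.path.exists(local_path):
        return "exists"  # a real directory is already here; leave it alone
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    os.symlink(drive_path, local_path)
    return "linked"


with tempfile.TemporaryDirectory() as root:
    drive_a = os.path.join(root, "drive_a", "evals")
    drive_b = os.path.join(root, "drive_b", "evals")
    local = os.path.join(root, "work", "evals")
    print(link_to_drive(local, drive_a))  # first call creates the link
    print(link_to_drive(local, drive_a))  # second call is a no-op
    print(link_to_drive(local, drive_b))  # a changed target replaces the link
```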

In [None]:
import os

DRIVE_SUBDIRS = [
    "data/coding_tui/tool_calling",
    "data/coding_tui/agent_traj",
    "data/coding_tui/proxy_logs",
    "data/coding_tui/preference",
    "data/coding_tui/grpo",
    "data/coding_tui/eval",
    "checkpoints/tool_calling_sft",
    "checkpoints/gpt-oss-20b-coding-tui-merged",
    "checkpoints/agent_sft",
    "checkpoints/agent_sft_ipo",
    "checkpoints/agent_sft_grpo",
    "evals",
]

if DRIVE_MODE == "mounted":
    for subdir in DRIVE_SUBDIRS:
        drive_path = os.path.join(DRIVE_BASE, subdir)
        os.makedirs(drive_path, exist_ok=True)
        local_path = subdir
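        # lexists also sees dangling symlinks, which would otherwise make os.symlink raise FileExistsError.
        if not os.path.lexists(local_path):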
            os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
            os.symlink(drive_path, local_path)
            print(f"  Linked: {local_path} -> {drive_path}")
        else:
            print(f"  Exists: {local_path}")
    print(f"\nDrive base: {DRIVE_BASE}")
elif DRIVE_MODE == "service_account":
    for subdir in DRIVE_SUBDIRS:
        os.makedirs(subdir, exist_ok=True)
        drive_helper.ensure_dir(subdir)
    print("Drive directories created (service account mode).")
else:
    for subdir in DRIVE_SUBDIRS:
        os.makedirs(subdir, exist_ok=True)
    print("Local directories created (no Drive backup).")

### 0.6 Check GPU & Configure MoE Backend

In [None]:
import torch, os

if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / (1024**3)

    print(f"GPU: {gpu_name}")
    print(f"VRAM: {gpu_mem:.1f} GB")

    # Auto-detect GPU tier override
    detected_tier = None
    if "H100" in gpu_name or "H200" in gpu_name:
        detected_tier = "h100_80gb"
    elif "A100" in gpu_name:
        detected_tier = "a100_80gb" if gpu_mem > 45 else "a100_40gb"

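    if detected_tier and detected_tier != CONFIG["gpu_tier"]:
        print(f"\n  Auto-override: {CONFIG['gpu_tier']} -> {detected_tier}")
        CONFIG["gpu_tier"] = detected_tier
        tier = GPU_CONFIGS[detected_tier]
        for k, v in tier.items():
            CONFIG[k] = v
        # Re-apply the step settings from 0.3; the tier presets above would otherwise overwrite them.
        for k in [k for k in CONFIG if k.endswith("_max_steps")]:
            if max_steps_override > 0:
                CONFIG[k] = max_steps_override
            if CONFIG["training_scope"] == "quick_test":
                CONFIG[k] = min(CONFIG[k], 50)
        if CONFIG["training_scope"] == "quick_test":
            CONFIG["eval_num_samples"] = 10
        print(f"  Updated CONFIG with {detected_tier} presets.")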

    # Set MoE backend
    os.environ["UNSLOTH_MOE_BACKEND"] = CONFIG.get("moe_backend", "triton")
    print(f"\n  MoE backend: {os.environ['UNSLOTH_MOE_BACKEND']}")
    print(f"  Load mode: {CONFIG['load_mode']}")
    print(f"  Fast inference: {CONFIG.get('fast_inference', False)}")

    # FP8 detection
    if CONFIG["load_mode"] == "fp8":
        try:
            import transformer_engine
            print("  FP8: transformer-engine available")
        except ImportError:
            print("  FP8: transformer-engine not found, falling back to 4bit")
            CONFIG["load_mode"] = "4bit"
else:
    print("No GPU detected! Training will fail.")
    print("Enable GPU: Runtime -> Change runtime type -> GPU")

print(f"\nFinal config: scope={CONFIG['training_scope']}, tier={CONFIG['gpu_tier']}")

## Step 1: Data Preparation

### 1.1 Download Tool Calling Datasets

Downloads and formats three tool/function calling datasets into Harmony format:
- **Glaive v2** (113K): high-quality synthetic function calling conversations
- **xLAM-60K** (60K): diverse function calling from Salesforce
- **Hermes v1**: NousResearch curated function calling data
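
As an illustration of the conversion, the sketch below turns one xLAM-style record into a list of OpenAI-style `tool_calls`, without the final Harmony encoding step. The sample record is made up for this example, and `xlam_to_tool_calls` is a hypothetical name; the formatter in the cell below is the one the pipeline actually uses.

```python
import json


def xlam_to_tool_calls(record: dict) -> list:
    """Convert the JSON-encoded `answers` field into a list of tool_call dicts."""
    answers = record.get("answers", "[]")
    if isinstance(answers, str):
        try:
            answers = json.loads(answers)
        except json.JSONDecodeError:
            return []

    calls = []
    for i, ans in enumerate(answers):
        name = ans.get("name")
        if not name:
            continue  # skip malformed entries
        calls.append({
            "id": f"call_{i}",
            "type": "function",
            "function": {
                "name": name,
                # Arguments are stored as a JSON string, as in the OpenAI tool call layout.
                "arguments": json.dumps(ans.get("arguments", {})),
            },
        })
    return calls


# Made-up record in the xLAM layout: query, tools, and answers (the last two JSON-encoded).
sample = {
    "query": "What is the weather in Paris?",
    "tools": json.dumps([{"name": "get_weather", "description": "Look up weather",
                          "parameters": {"city": {"type": "str"}}}]),
    "answers": json.dumps([{"name": "get_weather", "arguments": {"city": "Paris"}}]),
}

print(json.dumps(xlam_to_tool_calls(sample), indent=2))
```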

In [None]:
import os, json, sys

sys.path.insert(0, "scripts")
from dataset_formatters.function_calling import (
    format_glaive_function_calling,
    format_hermes_function_calling,
)
from dataset_formatters.harmony import encode_harmony_messages

TOOL_CALLING_DATASETS = [
    ("glaiveai/glaive-function-calling-v2", "glaive", 113000),
    ("Salesforce/xlam-function-calling-60k", "xlam", 60000),
    ("NousResearch/hermes-function-calling-v1", "hermes", None),
]

# ──────────────────────────────────────────────────────────────────────────────
# xLAM inline formatter
# xLAM format: {"query": str, "tools": str (JSON array), "answers": str (JSON array)}
# Output: Harmony tool call format
# ──────────────────────────────────────────────────────────────────────────────
def format_xlam_function_calling(example):
    """Format Salesforce xLAM function calling data into Harmony format.

    xLAM schema:
        query   - natural language user request
        tools   - JSON-encoded list of tool schemas (name/description/parameters)
        answers - JSON-encoded list of call dicts [{name: ..., arguments: {...}}]

    Returns Harmony-encoded text with developer context, user query, and
    one tool_call per answer entry in the assistant turn.
    """
    query = example.get("query", "").strip()
    tools_raw = example.get("tools", "[]")
    answers_raw = example.get("answers", "[]")

    if not query:
        return {"text": ""}

    try:
        tools = json.loads(tools_raw) if isinstance(tools_raw, str) else tools_raw
    except (json.JSONDecodeError, TypeError):
        tools = []

    try:
        answers = json.loads(answers_raw) if isinstance(answers_raw, str) else answers_raw
    except (json.JSONDecodeError, TypeError):
        answers = []

    if not answers:
        return {"text": ""}

    # Build tool schema description for developer context
    tool_descriptions = []
    for t in tools:
        name = t.get("name", "unknown")
        desc = t.get("description", "")
        params = t.get("parameters", {})
        tool_descriptions.append(
            f"  {name}: {desc}\n    Parameters: {json.dumps(params, separators=(',', ':'))}"
        )
    tool_ctx = "\n".join(tool_descriptions) if tool_descriptions else "No tools defined."

    dev_instructions = (
        "You are a helpful assistant with access to tools. "
        "Call the appropriate tool with valid parameters based on the user's request.\n\n"
        f"Available tools:\n{tool_ctx}"
    )

    # Build tool_calls list from answers
    tool_calls = []
    for i, ans in enumerate(answers):
        tc_name = ans.get("name", "")
        tc_args = ans.get("arguments", ans.get("parameters", {}))
        if not tc_name:
            continue
        tool_calls.append({
            "id": f"call_{i}",
            "type": "function",
            "function": {
                "name": tc_name,
                "arguments": json.dumps(tc_args) if isinstance(tc_args, dict) else str(tc_args),
            },
        })

    if not tool_calls:
        return {"text": ""}

    messages = [
        {"role": "user", "content": query},
        {"role": "assistant", "tool_calls": tool_calls},
    ]

    return {"text": encode_harmony_messages(messages, developer_instructions=dev_instructions)}


if CONFIG["skip_data_generation"] or not CONFIG["include_tool_calling"]:
    print("Skipping tool calling data download.")
    tracker.skip("tool_calling_data")
else:
    tracker.start("tool_calling_data")
    from datasets import load_dataset, Dataset, concatenate_datasets

    all_tool_calling = []
    stats = {}

    for ds_name, ds_key, max_samples in TOOL_CALLING_DATASETS:
        print(f"\nDownloading {ds_name}...")
        try:
            if ds_key == "glaive":
                raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(500, len(raw))))
                elif max_samples:
                    raw = raw.select(range(min(max_samples, len(raw))))
                formatted = raw.map(format_glaive_function_calling, remove_columns=raw.column_names)

            elif ds_key == "xlam":
                raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(200, len(raw))))
                elif max_samples:
                    raw = raw.select(range(min(max_samples, len(raw))))
                formatted = raw.map(format_xlam_function_calling, remove_columns=raw.column_names)

            elif ds_key == "hermes":
                raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(300, len(raw))))
                formatted = raw.map(format_hermes_function_calling, remove_columns=raw.column_names)

            else:
                continue

            # Filter empty examples
            formatted = formatted.filter(lambda x: bool(x.get("text", "").strip()))
            stats[ds_key] = len(formatted)
            all_tool_calling.append(formatted)
            print(f"  Formatted: {len(formatted):,} examples")

        except Exception as e:
            print(f"  WARNING: failed to load {ds_name}: {e}")
            stats[ds_key] = 0

    if all_tool_calling:
        combined = concatenate_datasets(all_tool_calling)
        combined = combined.shuffle(seed=42)
        out_path = "data/coding_tui/tool_calling/train"
        combined.save_to_disk(out_path)
        print(f"\nTotal tool calling examples: {len(combined):,} -> {out_path}")
    else:
        print("WARNING: no tool calling data collected.")

    drive_helper.backup("data/coding_tui/tool_calling", "data/coding_tui/tool_calling")
    if DRIVE_MODE != "local":
        print("Backed up to Drive.")

    tracker.complete("tool_calling_data")

### 1.2 Download Agent Trajectory Datasets

Downloads and formats agent trajectory datasets into Harmony agent format:
- **code-act**: multi-step code execution agent trajectories
- **commitpack**: commit-based code change examples
- **editpackft**: instruction-following code edits
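
Trajectory datasets label their turns inconsistently (`human`/`gpt`, `user`/`assistant`, `observation`), so the first step is to normalise roles. The sketch below shows that step on a made-up conversation; `ROLE_MAP` and `normalize_turns` are illustrative names, not part of the repository.

```python
# Map the role labels seen in trajectory datasets onto three canonical roles.
ROLE_MAP = {
    "human": "user", "user": "user",
    "gpt": "assistant", "assistant": "assistant",
    "tool": "tool", "function": "tool", "observation": "tool",
}


def normalize_turns(conversations: list) -> list:
    """Return messages with canonical roles, dropping empty or unknown turns."""
    messages = []
    for turn in conversations:
        role = ROLE_MAP.get(turn.get("role") or turn.get("from") or "")
        content = turn.get("content") or turn.get("value") or ""
        if role and content:
            messages.append({"role": role, "content": content})
    return messages


# Made-up conversation mixing both key styles.
raw = [
    {"from": "human", "value": "Add a unit test for parse()."},
    {"from": "gpt", "value": "Running the existing tests first."},
    {"role": "observation", "content": "3 passed"},
    {"role": "system", "content": "ignored"},
]

for message in normalize_turns(raw):
    print(message["role"], "->", message["content"])
```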

In [None]:
import os, sys, json

sys.path.insert(0, "scripts")
from dataset_formatters.harmony import format_harmony_agent, encode_harmony_messages

AGENT_TRAJECTORY_DATASETS = [
    ("xingyaoww/code-act", "code_act", None),
    ("bigcode/commitpack", "commitpack", 50000),
    ("bigcode/editpackft", "editpackft", 50000),
]

CODING_AGENT_DEV_PROMPT = (
    "You are a coding agent. Use tools to read files, write code, run tests, and "
    "complete programming tasks. Do not just analyze — always take action and produce "
    "working code. After making changes, verify they work by running the relevant tests. "
    "If a tool call fails, diagnose and retry with corrected parameters."
)


def format_code_act_example(example):
    """Format code-act dataset into Harmony agent trajectory format.

    code-act format: conversations list with role/content turns, where
    tool calls appear as bash/python execution blocks in assistant content.
    """
    conversations = example.get("conversations", [])
    if not conversations:
        return {"text": ""}

    messages = []
    for turn in conversations:
        role = turn.get("role", turn.get("from", ""))
        content = turn.get("content", turn.get("value", ""))
        if not content:
            continue
        if role in ["human", "user"]:
            messages.append({"role": "user", "content": content})
        elif role in ["gpt", "assistant"]:
            messages.append({"role": "assistant", "content": content})
        elif role in ["tool", "function", "observation"]:
            messages.append({"role": "tool", "content": content})

    if len(messages) < 2:
        return {"text": ""}

    return {"text": encode_harmony_messages(
        messages,
        developer_instructions=CODING_AGENT_DEV_PROMPT,
        reasoning_effort="high",
    )}


def format_commitpack_example(example):
    """Format commitpack (code change from commit message) into Harmony format.

    commitpack format: {subject, message, old_contents, new_contents, lang}
    Task: given old code + commit message, produce the new code.
    """
    subject = example.get("subject", "")
    # Fall back to the subject when the message field is missing or empty.
    message = example.get("message") or subject
    old_code = example.get("old_contents", "")
    new_code = example.get("new_contents", "")
    lang = example.get("lang", "")

    if not old_code or not new_code or not message:
        return {"text": ""}

    # Skip trivially identical or very large examples
    if old_code.strip() == new_code.strip():
        return {"text": ""}
    if len(old_code) > 8000 or len(new_code) > 8000:
        return {"text": ""}

    fence = lang.lower() if lang else ""
    user_content = (
        f"Apply the following change to the code:\n\n"
        f"Commit message: {message}\n\n"
        f"Current code:\n```{fence}\n{old_code}\n```"
    )
    assistant_content = f"```{fence}\n{new_code}\n```"

    messages = [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": assistant_content},
    ]

    return {"text": encode_harmony_messages(
        messages,
        developer_instructions=CODING_AGENT_DEV_PROMPT,
        reasoning_effort="medium",
    )}


def format_editpackft_example(example):
    """Format editpackft (instruction-following code edits) into Harmony format.

    editpackft format: {instruction, old_code, new_code, lang}
    Task: given code + instruction, apply the edit.
    """
    instruction = example.get("instruction", "")
    old_code = example.get("old_code", example.get("input", ""))
    new_code = example.get("new_code", example.get("output", ""))
    lang = example.get("lang", "")

    if not instruction or not old_code or not new_code:
        return {"text": ""}
    if old_code.strip() == new_code.strip():
        return {"text": ""}
    if len(old_code) > 8000 or len(new_code) > 8000:
        return {"text": ""}

    fence = lang.lower() if lang else ""
    user_content = (
        f"{instruction}\n\n"
        f"Code:\n```{fence}\n{old_code}\n```"
    )
    assistant_content = f"```{fence}\n{new_code}\n```"

    messages = [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": assistant_content},
    ]

    return {"text": encode_harmony_messages(
        messages,
        developer_instructions=CODING_AGENT_DEV_PROMPT,
        reasoning_effort="medium",
    )}


if CONFIG["skip_data_generation"] or not CONFIG["include_agent_trajectories"]:
    print("Skipping agent trajectory data download.")
    tracker.skip("agent_traj_data")
else:
    tracker.start("agent_traj_data")
    from datasets import load_dataset, Dataset, concatenate_datasets

    all_agent_traj = []

    for ds_name, ds_key, max_samples in AGENT_TRAJECTORY_DATASETS:
        print(f"\nDownloading {ds_name}...")
        try:
            if ds_key == "code_act":
                raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(200, len(raw))))
                elif max_samples:
                    raw = raw.select(range(min(max_samples, len(raw))))
                formatted = raw.map(format_code_act_example, remove_columns=raw.column_names)

            elif ds_key == "commitpack":
                # commitpack has many language subsets; load the 'all' config or default
                try:
                    raw = load_dataset(ds_name, "all", split="train")
                except Exception:
                    raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(300, len(raw))))
                elif max_samples:
                    raw = raw.select(range(min(max_samples, len(raw))))
                formatted = raw.map(format_commitpack_example, remove_columns=raw.column_names)

            elif ds_key == "editpackft":
                raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(300, len(raw))))
                elif max_samples:
                    raw = raw.select(range(min(max_samples, len(raw))))
                formatted = raw.map(format_editpackft_example, remove_columns=raw.column_names)

            else:
                continue

            formatted = formatted.filter(lambda x: bool(x.get("text", "").strip()))
            all_agent_traj.append(formatted)
            print(f"  Formatted: {len(formatted):,} examples")

        except Exception as e:
            print(f"  WARNING: failed to load {ds_name}: {e}")

    if all_agent_traj:
        combined = concatenate_datasets(all_agent_traj)
        combined = combined.shuffle(seed=42)
        out_path = "data/coding_tui/agent_traj/train"
        combined.save_to_disk(out_path)
        print(f"\nTotal agent trajectory examples: {len(combined):,} -> {out_path}")
    else:
        print("WARNING: no agent trajectory data collected.")

    drive_helper.backup("data/coding_tui/agent_traj", "data/coding_tui/agent_traj")
    if DRIVE_MODE != "local":
        print("Backed up to Drive.")

    tracker.complete("agent_traj_data")

### 1.3 Download Preference Datasets

Downloads and formats preference datasets for IPO training.
Good responses (task completed, decisive action) are paired against bad ones (circular analysis, incomplete).
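
For datasets that store each side of a pair as one transcript string, the pairing works by parsing both transcripts and comparing their final assistant turns. The sketch below assumes the `\n\nHuman:` / `\n\nAssistant:` layout; the function names and the sample transcripts are illustrative only.

```python
import re

ROLE_NAMES = {"Human": "user", "Assistant": "assistant"}


def parse_hh_conversation(raw: str) -> list:
    """Split a Human/Assistant transcript into a list of role/content messages."""
    # re.split with a capture group returns [prefix, role, text, role, text, ...].
    parts = re.split(r"\n\n(Human|Assistant):", raw)
    messages = []
    for role, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            messages.append({"role": ROLE_NAMES[role], "content": text})
    return messages


def to_preference_pair(chosen_raw: str, rejected_raw: str):
    """Return prompt/chosen/rejected, or None when the pair is unusable."""
    chosen = parse_hh_conversation(chosen_raw)
    rejected = parse_hh_conversation(rejected_raw)
    if not chosen or not rejected:
        return None
    # Both transcripts must end on an assistant turn to have a response to compare.
    if chosen[-1]["role"] != "assistant" or rejected[-1]["role"] != "assistant":
        return None
    # The prompts must match and the final responses must differ.
    if chosen[:-1] != rejected[:-1] or chosen[-1] == rejected[-1]:
        return None
    return {
        "prompt": chosen[:-1],
        "chosen": chosen[-1]["content"],
        "rejected": rejected[-1]["content"],
    }


# Made-up transcripts in the expected layout.
chosen_raw = "\n\nHuman: How do I reverse a list in Python?\n\nAssistant: Use xs[::-1]."
rejected_raw = "\n\nHuman: How do I reverse a list in Python?\n\nAssistant: Lists are a data structure."

print(to_preference_pair(chosen_raw, rejected_raw))
```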

In [None]:
import os, sys, json

sys.path.insert(0, "scripts")
from dataset_formatters.harmony import format_harmony_preference, encode_harmony_messages

PREFERENCE_DATASETS = [
    ("Anthropic/hh-rlhf", "hh_rlhf", None),
    ("m-a-p/CodeFeedback-Filtered-Instruction", "code_feedback", None),
]

CODING_PREF_DEV = "You are a helpful coding assistant. Provide complete, working code solutions."


def format_hh_rlhf_example(example):
    """Format Anthropic HH-RLHF into Harmony preference pairs.

    hh-rlhf format: {chosen: str, rejected: str}
    Both are full conversation strings with \\nHuman: / \\nAssistant: turns.
    """
    chosen_raw = example.get("chosen", "")
    rejected_raw = example.get("rejected", "")

    # Every return path carries the same keys: datasets.map() expects each
    # example to produce the same columns.
    empty = {"text": "", "prompt": "", "chosen": "", "rejected": ""}

    if not chosen_raw or not rejected_raw:
        return empty

    def parse_conversation(raw):
        """Parse Human/Assistant turn format into message list."""
        messages = []
        # Split on role markers
        import re
        parts = re.split(r'\n(Human|Assistant):\s*', raw)
        current_role = None
        for part in parts:
            part = part.strip()
            if part == "Human":
                current_role = "user"
            elif part == "Assistant":
                current_role = "assistant"
            elif part and current_role:
                messages.append({"role": current_role, "content": part})
                current_role = None
        return messages

    chosen_msgs = parse_conversation(chosen_raw)
    rejected_msgs = parse_conversation(rejected_raw)

    if not chosen_msgs or not rejected_msgs:
        return empty

    # An empty final assistant turn is dropped by the parser, which would leave
    # a user turn last; skip those pairs instead of training on them.
    if chosen_msgs[-1]["role"] != "assistant" or rejected_msgs[-1]["role"] != "assistant":
        return empty

    # Extract the shared prompt (all turns up to last assistant turn)
    prompt_msgs = chosen_msgs[:-1]
    chosen_content = chosen_msgs[-1].get("content", "")
    rejected_content = rejected_msgs[-1].get("content", "")

    if not chosen_content or not rejected_content or chosen_content == rejected_content:
        return empty

    # Encode chosen and rejected with full context
    chosen_full = encode_harmony_messages(
        prompt_msgs + [{"role": "assistant", "content": chosen_content}],
        developer_instructions=CODING_PREF_DEV,
    )
    rejected_full = encode_harmony_messages(
        prompt_msgs + [{"role": "assistant", "content": rejected_content}],
        developer_instructions=CODING_PREF_DEV,
    )

    prompt_text = ""
    if prompt_msgs:
        prompt_text = prompt_msgs[-1].get("content", "")

    return {
        "text": chosen_full,
        "prompt": prompt_text,
        "chosen": chosen_full,
        "rejected": rejected_full,
    }


def format_code_feedback_example(example):
    """Format CodeFeedback-Filtered-Instruction into Harmony preference pairs.

    Format: {query: str, answer: str} — high quality coding Q&A.
    We use these as positive examples; we generate synthetic rejected responses
    by truncating the chosen answer at 30% and appending a non-committal ending.
    """
    query = example.get("query", example.get("instruction", "")).strip()
    answer = example.get("answer", example.get("output", "")).strip()

    if not query or not answer or len(answer) < 100:
        return {"text": ""}

    # Synthetic rejected: truncate answer at 30% and add a non-committal ending
    cutoff = max(50, int(len(answer) * 0.3))
    rejected = answer[:cutoff] + "\n\n(I would need to analyze this further before proceeding.)"

    return format_harmony_preference({
        "prompt": query,
        "chosen": answer,
        "rejected": rejected,
    })


if CONFIG["skip_data_generation"] or not CONFIG["include_preference"]:
    print("Skipping preference data download.")
else:
    from datasets import load_dataset, Dataset, concatenate_datasets

    all_pref = []

    for ds_name, ds_key, max_samples in PREFERENCE_DATASETS:
        print(f"\nDownloading {ds_name}...")
        try:
            if ds_key == "hh_rlhf":
                raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(300, len(raw))))
                elif max_samples:
                    raw = raw.select(range(min(max_samples, len(raw))))
                formatted = raw.map(format_hh_rlhf_example, remove_columns=raw.column_names)

            elif ds_key == "code_feedback":
                raw = load_dataset(ds_name, split="train")
                if CONFIG["training_scope"] == "quick_test":
                    raw = raw.select(range(min(200, len(raw))))
                elif max_samples:
                    raw = raw.select(range(min(max_samples, len(raw))))
                formatted = raw.map(format_code_feedback_example, remove_columns=raw.column_names)

            else:
                continue

            formatted = formatted.filter(lambda x: bool(x.get("text", "").strip()))
            all_pref.append(formatted)
            print(f"  Formatted: {len(formatted):,} examples")

        except Exception as e:
            print(f"  WARNING: failed to load {ds_name}: {e}")

    if all_pref:
        combined = concatenate_datasets(all_pref)
        combined = combined.shuffle(seed=42)
        out_path = "data/coding_tui/preference/train"
        combined.save_to_disk(out_path)
        print(f"\nTotal preference examples: {len(combined):,} -> {out_path}")
    else:
        print("WARNING: no preference data collected.")

    drive_helper.backup("data/coding_tui/preference", "data/coding_tui/preference")
    if DRIVE_MODE != "local":
        print("Backed up to Drive.")

### 1.4 Extract Training Data from Proxy Logs (Optional)

Scans the MacLean AI proxy log directory for real Codex CLI agent sessions.
Each session is a JSON file written by `claude-proxy-v2` containing the full
request/response. These are the highest-value training examples because they
are real agent tasks, not synthetic data.

Also scans for `*_cot.txt` files (chain-of-thought logs from the streaming filter)
and associates them with the corresponding requests for thinking data.

**To enable, set `include_proxy_logs=True` and `proxy_log_dir` in Step 0.3.**
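
The sidecar pairing described above can be sketched in isolation as follows. The file names (`0001.json` and so on) and the helper name `pair_logs_with_cot` are hypothetical; only the `<stem>_cot.txt` naming convention is taken from the extraction code in this step.

```python
import json
import tempfile
from pathlib import Path


def pair_logs_with_cot(log_dir):
    """Yield (log_path, cot_text_or_None) for every JSON log under log_dir."""
    for log_path in sorted(Path(log_dir).rglob("*.json")):
        # The sidecar shares the log's stem and adds a _cot.txt suffix.
        cot_path = log_path.with_name(log_path.stem + "_cot.txt")
        cot_text = None
        if cot_path.exists():
            cot_text = cot_path.read_text(encoding="utf-8").strip() or None
        yield log_path, cot_text


# Build a throwaway directory with two logs, one of which has a sidecar.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "0001.json").write_text(json.dumps({"request_num": 1, "output_tokens": 12}))
    (root / "0001_cot.txt").write_text("Plan: read the file, then patch it.")
    (root / "0002.json").write_text(json.dumps({"request_num": 2, "output_tokens": 7}))
    pairs = [(path.name, cot) for path, cot in pair_logs_with_cot(root)]

print(pairs)
```

Logs without a sidecar simply yield `None` for the chain-of-thought, so they can still be used as plain request/response examples.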

In [None]:
import os, sys, json, glob, re
from pathlib import Path

sys.path.insert(0, "scripts")
from dataset_formatters.harmony import encode_harmony_messages

CODING_AGENT_DEV_PROMPT = (
    "You are a coding agent. Use tools to read files, write code, run tests, and "
    "complete programming tasks. Do not just analyze — always take action and produce "
    "working code. After making changes, verify they work by running the relevant tests. "
    "If a tool call fails, diagnose and retry with corrected parameters."
)


def load_proxy_log(log_path):
    """Load and validate a single proxy log JSON file.

    Returns the parsed dict if valid, None otherwise.
    Expected fields: request_num, path, translated, original_request,
                     response, input_tokens, output_tokens, latency_ms
    """
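    try:
        with open(log_path, encoding="utf-8") as f:
            data = json.load(f)
    except (json.JSONDecodeError, UnicodeDecodeError, OSError):
        return None

    # A log must be a JSON object, not a bare list or scalar
    if not isinstance(data, dict):
        return None

    # Must have a response with output (output_tokens may be missing or null)
    if (data.get("output_tokens") or 0) <= 0:
        return None

    resp = data.get("response", {})
    if not resp:
        return None

    return data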


def extract_messages_from_request(req_data, cot_text=None):
    """Extract Harmony-ready messages from a proxy log entry.

    Handles both:
    - /v1/messages (Anthropic format, possibly translated)
    - /v1/chat/completions (OpenAI format)

    Args:
        req_data: parsed proxy log dict
        cot_text: optional chain-of-thought text from *_cot.txt sidecar

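    Returns a (messages, developer_instructions) tuple, where messages is a
    list of message dicts suitable for encode_harmony_messages, or
    (None, None) if the entry cannot be parsed.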
    """
    path = req_data.get("path", "")
    original_req = req_data.get("original_request", {})
    response = req_data.get("response", {})

    messages = []

    if "/messages" in path:
        # Anthropic Messages API format
        sys_content = original_req.get("system", "")
        if isinstance(sys_content, list):
            # system can be a list of content blocks
            sys_text = " ".join(
                block.get("text", "") for block in sys_content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        else:
            sys_text = str(sys_content) if sys_content else ""

        raw_msgs = original_req.get("messages", [])
        for m in raw_msgs:
            role = m.get("role", "")
            content = m.get("content", "")
            if isinstance(content, list):
                # content can be a list of blocks (text, tool_use, tool_result)
                text_parts = []
                tool_calls_out = []
                for block in content:
                    btype = block.get("type", "")
                    if btype == "text":
                        text_parts.append(block.get("text", ""))
                    elif btype == "tool_use":
                        tool_calls_out.append({
                            "id": block.get("id", ""),
                            "type": "function",
                            "function": {
                                "name": block.get("name", ""),
                                "arguments": json.dumps(block.get("input", {})),
                            },
                        })
                    elif btype == "tool_result":
                        # tool results come back as user messages in Anthropic format
                        result_content = block.get("content", "")
                        if isinstance(result_content, list):
                            result_content = " ".join(
                                rb.get("text", "") for rb in result_content
                                if isinstance(rb, dict)
                            )
                        messages.append({
                            "role": "tool",
                            "tool_call_id": block.get("tool_use_id", ""),
                            "content": result_content,
                        })

                msg_entry = {"role": role}
                if text_parts:
                    msg_entry["content"] = "\n".join(text_parts)
                if tool_calls_out:
                    msg_entry["tool_calls"] = tool_calls_out
                if role == "assistant" and cot_text:
                    msg_entry["thinking"] = cot_text
                    cot_text = None  # use only for the first assistant turn

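                # Skip user turns that held only tool results; those were
                # already emitted above as separate tool messages.
                if role != "user" or text_parts:
                    messages.append(msg_entry)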
            else:
                entry = {"role": role, "content": str(content) if content else ""}
                if role == "assistant" and cot_text:
                    entry["thinking"] = cot_text
                    cot_text = None
                messages.append(entry)

        # Extract assistant response
        resp_content = response.get("content", [])
        if isinstance(resp_content, list):
            resp_text_parts = []
            resp_tool_calls = []
            for block in resp_content:
                btype = block.get("type", "")
                if btype == "text":
                    resp_text_parts.append(block.get("text", ""))
                elif btype == "tool_use":
                    resp_tool_calls.append({
                        "id": block.get("id", ""),
                        "type": "function",
                        "function": {
                            "name": block.get("name", ""),
                            "arguments": json.dumps(block.get("input", {})),
                        },
                    })
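            resp_entry = {"role": "assistant"}
            if resp_text_parts:
                resp_entry["content"] = "\n".join(resp_text_parts)
            if resp_tool_calls:
                resp_entry["tool_calls"] = resp_tool_calls
            if cot_text:
                # No earlier assistant turn consumed the CoT, so attach it here
                # rather than silently dropping it.
                resp_entry["thinking"] = cot_text
                cot_text = None
            messages.append(resp_entry)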
        elif isinstance(resp_content, str) and resp_content:
            messages.append({"role": "assistant", "content": resp_content})

        dev_instructions = sys_text if sys_text else CODING_AGENT_DEV_PROMPT

    elif "/chat/completions" in path:
        # OpenAI Chat Completions format
        raw_msgs = original_req.get("messages", [])
        dev_instructions = CODING_AGENT_DEV_PROMPT

        for m in raw_msgs:
            role = m.get("role", "")
            content = m.get("content", "")
            tool_calls = m.get("tool_calls", [])
            tool_call_id = m.get("tool_call_id")

            if role == "system":
                dev_instructions = content
                continue

            entry = {"role": role}
            if content:
                entry["content"] = content
            if tool_calls:
                entry["tool_calls"] = tool_calls
            if tool_call_id:
                entry["tool_call_id"] = tool_call_id
            if role == "assistant" and cot_text:
                entry["thinking"] = cot_text
                cot_text = None
            messages.append(entry)

        # Extract response from choices
        choices = response.get("choices", [])
        if choices:
            resp_msg = choices[0].get("message", {})
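            resp_entry = {"role": "assistant"}
            if resp_msg.get("content"):
                resp_entry["content"] = resp_msg["content"]
            if resp_msg.get("tool_calls"):
                resp_entry["tool_calls"] = resp_msg["tool_calls"]
            if cot_text:
                # No earlier assistant turn consumed the CoT, so attach it here
                # rather than silently dropping it.
                resp_entry["thinking"] = cot_text
                cot_text = None
            messages.append(resp_entry)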
    else:
        return None, None

    if len(messages) < 2:
        return None, None

    return messages, dev_instructions


def extract_proxy_log_trajectories(log_dir, max_samples=None, quick_test=False):
    """Scan proxy log directory and extract training examples.

    Args:
        log_dir: path to MacLean AI proxy log directory
        max_samples: cap on number of examples to extract
        quick_test: if True, stop after 20 examples

    Returns a list of {"text": ...} dicts, one Harmony-encoded example each.
    """
    log_dir = Path(log_dir)
    if not log_dir.exists():
        print(f"  WARNING: proxy log dir not found: {log_dir}")
        return []

    # Find all JSON log files (exclude *_cot.txt sidecars)
    log_files = sorted(log_dir.glob("*.json"))
    if not log_files:
        # Try subdirectories (MacLean AI organises logs by date)
        log_files = sorted(log_dir.glob("**/*.json"))

    print(f"  Found {len(log_files)} log files in {log_dir}")

    examples = []
    skipped = 0

    for log_path in log_files:
        if quick_test and len(examples) >= 20:
            break
        if max_samples and len(examples) >= max_samples:
            break

        req_data = load_proxy_log(log_path)
        if req_data is None:
            skipped += 1
            continue

        # Check for CoT sidecar: same stem but _cot.txt suffix
        cot_path = log_path.with_name(log_path.stem + "_cot.txt")
        cot_text = None
        if cot_path.exists():
            try:
                cot_text = cot_path.read_text(encoding="utf-8").strip() or None
            except OSError:
                pass

        messages, dev_instructions = extract_messages_from_request(req_data, cot_text)
        if messages is None:
            skipped += 1
            continue

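        try:
            text = encode_harmony_messages(
                messages,
                developer_instructions=dev_instructions,
                reasoning_effort="high",
            )
        except Exception:
            # Count encoding failures as skipped instead of aborting the scan
            skipped += 1
            continue

        if text and len(text.strip()) > 200:
            examples.append({"text": text})
        else:
            # Too short to be a useful training example
            skipped += 1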

    print(f"  Extracted: {len(examples):,} valid examples ({skipped} skipped)")
    return examples


# ── Run extraction ─────────────────────────────────────────────────────────────
log_dir = CONFIG.get("proxy_log_dir", "").strip()

if not CONFIG["include_proxy_logs"] or not log_dir:
    print("Proxy log extraction disabled (include_proxy_logs=False or proxy_log_dir not set).")
    print("To enable: set proxy_log_dir to the MacLean AI log directory in Step 0.3.")
    tracker.skip("proxy_log_extract")
elif CONFIG["skip_data_generation"]:
    print("Skipping proxy log extraction (skip_data_generation=True).")
    tracker.skip("proxy_log_extract")
else:
    tracker.start("proxy_log_extract")

    quick = CONFIG["training_scope"] == "quick_test"
    proxy_examples = extract_proxy_log_trajectories(
        log_dir,
        max_samples=5000 if not quick else None,
        quick_test=quick,
    )

    if proxy_examples:
        from datasets import Dataset, load_from_disk, concatenate_datasets

        proxy_ds = Dataset.from_list(proxy_examples)
        proxy_ds = proxy_ds.shuffle(seed=42)

        # Save standalone proxy dataset
        proxy_out = "data/coding_tui/proxy_logs/train"
        proxy_ds.save_to_disk(proxy_out)
        print(f"\nProxy log dataset: {len(proxy_ds):,} examples -> {proxy_out}")

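        # Merge into agent_sft data (proxy logs are highest value)
        agent_path = "data/coding_tui/agent_traj/train"
        if os.path.exists(agent_path):
            import shutil

            base_ds = load_from_disk(agent_path)
            merged = concatenate_datasets([base_ds, proxy_ds])
            merged = merged.shuffle(seed=42)
            merged_count = len(merged)
            # A dataset loaded from disk cannot be saved over its own directory,
            # so write to a temporary path and swap it into place.
            tmp_path = agent_path + "_merged_tmp"
            merged.save_to_disk(tmp_path)
            del merged, base_ds
            shutil.rmtree(agent_path)
            shutil.move(tmp_path, agent_path)
            print(f"Merged into agent_traj/train: {merged_count:,} total")
        else:
            proxy_ds.save_to_disk(agent_path)
            print(f"Saved as agent_traj/train: {len(proxy_ds):,} examples")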

        drive_helper.backup("data/coding_tui/proxy_logs", "data/coding_tui/proxy_logs")
        if DRIVE_MODE != "local":
            print("Backed up to Drive.")

        tracker.complete("proxy_log_extract")
    else:
        print("No valid proxy log examples extracted.")
        tracker.fail("proxy_log_extract")

### 1.5 Verify Data

In [None]:
import os

data_checks = [
    ("Tool Calling train", "data/coding_tui/tool_calling/train"),
    ("Agent Trajectory train", "data/coding_tui/agent_traj/train"),
    ("Proxy Log train", "data/coding_tui/proxy_logs/train"),
    ("Preference train", "data/coding_tui/preference/train"),
]

print("Data Verification Summary")
print("=" * 60)
print(f"  {'Dataset':<30} {'Examples':>12}")
print("-" * 60)

total = 0
for name, path in data_checks:
    if os.path.exists(path):
        try:
            from datasets import load_from_disk
            ds = load_from_disk(path)
            count = len(ds)
            total += count
            print(f"  {name:<30} {count:>12,}")
        except Exception as e:
            print(f"  {name:<30} {'ERROR':>12}  ({e})")
    else:
        print(f"  {name:<30} {'not found':>12}")

print("-" * 60)
print(f"  {'TOTAL':<30} {total:>12,}")
print("=" * 60)

## Step 2: Tool Calling SFT (Phase 1)

Train a LoRA adapter (rank 64) focused on correct tool/function calling.

**Goals:**
- Learn parameter accuracy for tool calls
- Learn when NOT to call tools
- Learn valid tool schemas and JSON formatting
- Reduce hallucinated MCP server calls

The rank is kept low (64) to limit catastrophic forgetting of general coding ability.
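
To give a feel for what the rank controls: LoRA adds two small matrices per adapted weight, one of shape `rank x d_in` and one of shape `d_out x rank`, so the number of trainable parameters grows linearly with the rank. The sketch below works through that arithmetic for a single square projection. The dimension 4096 is a hypothetical placeholder, not a value read from the GPT-OSS 20B config.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return rank * (d_in + d_out)


d_in = d_out = 4096  # hypothetical projection size, for illustration only
full = d_in * d_out

for rank in (64, 128):
    added = lora_params(d_in, d_out, rank)
    print(f"rank {rank:>3}: {added:,} trainable params "
          f"({added / full:.2%} of the full matrix)")
```

Doubling the rank from 64 (this phase) to 128 (the agent SFT phase) doubles the adapter's capacity, and with it the room to drift away from the base model.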

### 2.1 Train Tool Calling Adapter

In [None]:
if CONFIG["training_scope"] in ("skip_to_rl",):
    print(f"Skipping — scope is {CONFIG['training_scope']}")
    tracker.skip("tool_calling_sft")
elif not CONFIG["include_tool_calling"]:
    print("Skipping — include_tool_calling=False")
    tracker.skip("tool_calling_sft")
else:
    tracker.start("tool_calling_sft")

    batch = CONFIG["tool_calling_batch"]
    grad_accum = CONFIG["tool_calling_grad_accum"]
    max_steps = CONFIG["tool_calling_max_steps"]
    seq_len = CONFIG["tool_calling_seq_len"]

    cmd = "python scripts/13_train_lang_adapter.py"
    cmd += " --train_data_path data/coding_tui/tool_calling/train"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"
    cmd += f" --output_dir checkpoints/tool_calling_sft"
    cmd += " --lora_rank 64"
    cmd += " --developer_prompt 'You are a coding agent with tool access. Call tools accurately with valid parameters. Never call tools that do not exist.'"

    print("Training tool calling SFT adapter...")
    print(f"  Data:     data/coding_tui/tool_calling/train")
    print(f"  Batch:    {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Steps:    {max_steps}")
    print(f"  Seq len:  {seq_len}")
    print(f"  LoRA rank: 64 (low rank to preserve general ability)")
    print(f"  MoE backend: {CONFIG['moe_backend']}")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/tool_calling_sft", "checkpoints/tool_calling_sft")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")

    tracker.complete("tool_calling_sft")

### 2.2 Merge Tool Calling Adapter into Base

In [None]:
if CONFIG["training_scope"] in ("skip_to_rl",):
    print(f"Skipping — scope is {CONFIG['training_scope']}")
    tracker.skip("merge")
elif not CONFIG["include_tool_calling"]:
    print("Skipping — include_tool_calling=False")
    tracker.skip("merge")
else:
    tracker.start("merge")

    print("Merging tool calling adapter into base model...")
    print("=" * 60)

    !python scripts/19_merge_adapter.py \
        --adapter_path checkpoints/tool_calling_sft/final \
        --output_dir checkpoints/gpt-oss-20b-coding-tui-merged

    drive_helper.backup(
        "checkpoints/gpt-oss-20b-coding-tui-merged",
        "checkpoints/gpt-oss-20b-coding-tui-merged",
    )
    if DRIVE_MODE != "local":
        print("\nMerged model backed up to Drive.")

    tracker.complete("merge")

### 2.3 Verify Merge

In [None]:
import os

if CONFIG["training_scope"] not in ("skip_to_rl",) and CONFIG["include_tool_calling"]:
    merged_path = "checkpoints/gpt-oss-20b-coding-tui-merged"

    print("Merge Verification:")
    print("=" * 60)

    if os.path.exists(merged_path):
        files = os.listdir(merged_path)
        total_size = sum(
            os.path.getsize(os.path.join(merged_path, f))
            for f in files if os.path.isfile(os.path.join(merged_path, f))
        )
        print(f"  ✓ Merged model: {merged_path}")
        print(f"    Files: {len(files)}")
        print(f"    Total size: {total_size / (1024**3):.1f} GB")
    else:
        print(f"  ✗ Merged model not found at {merged_path}")

    print("=" * 60)
else:
    print("Merge skipped for this training scope.")

## Step 3: Agent SFT (Phase 2)

Train a higher-rank LoRA (rank 128) on agent trajectories using the merged
tool-calling model as the base.

**Data includes:**
- Multi-turn code agent sessions (code-act, commitpack, editpackft)
- Real proxy log trajectories from live Codex CLI sessions (most valuable)

**Goals:**
- Learn complete read → plan → edit → verify cycles
- Learn to actually write code after planning
- Learn context tracking across long sessions
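
One way to check that a trajectory actually follows through is to look at the order of its tool calls. The sketch below is illustrative only: the tool names in `EDIT_TOOLS` and `VERIFY_TOOLS` are hypothetical placeholders rather than the tools the agent really exposes, and the message layout assumes the OpenAI-style `tool_calls` entries produced in Step 1.4.

```python
# Hypothetical tool names; substitute the ones your agent exposes.
EDIT_TOOLS = {"apply_patch", "write_file"}
VERIFY_TOOLS = {"run_tests"}


def tool_sequence(messages):
    """Return tool names in call order across all turns."""
    return [
        call["function"]["name"]
        for msg in messages
        for call in msg.get("tool_calls", [])
    ]


def completes_cycle(messages):
    """True if some edit call is followed later by a verify call."""
    seen_edit = False
    for name in tool_sequence(messages):
        if name in EDIT_TOOLS:
            seen_edit = True
        elif seen_edit and name in VERIFY_TOOLS:
            return True
    return False


def call(name):
    """Build a minimal OpenAI-style tool call entry."""
    return {"type": "function", "function": {"name": name, "arguments": "{}"}}


follow_through = [
    {"role": "assistant", "tool_calls": [call("read_file")]},
    {"role": "assistant", "tool_calls": [call("apply_patch")]},
    {"role": "assistant", "tool_calls": [call("run_tests")]},
]
analysis_loop = [
    {"role": "assistant", "tool_calls": [call("read_file")]},
    {"role": "assistant", "content": "Let me look at this again."},
]
print(completes_cycle(follow_through), completes_cycle(analysis_loop))
```

A filter like this could be used to inspect how many training trajectories contain a full edit and verify step before committing GPU time to the run.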

### 3.1 Train Agent SFT Adapter

In [None]:
if CONFIG["training_scope"] in ("tool_calling_only", "skip_to_rl"):
    print(f"Skipping — scope is {CONFIG['training_scope']}")
    tracker.skip("agent_sft")
elif not CONFIG["include_agent_trajectories"]:
    print("Skipping — include_agent_trajectories=False")
    tracker.skip("agent_sft")
else:
    tracker.start("agent_sft")

    # Use merged model as base if available, otherwise base model
    import os
    if os.path.exists("checkpoints/gpt-oss-20b-coding-tui-merged"):
        base_model = "checkpoints/gpt-oss-20b-coding-tui-merged"
        print("Using merged tool-calling model as base.")
    else:
        base_model = "openai/gpt-oss-20b"
        print("Merged model not found, using original base model.")

    batch = CONFIG["agent_sft_batch"]
    grad_accum = CONFIG["agent_sft_grad_accum"]
    max_steps = CONFIG["agent_sft_max_steps"]
    seq_len = CONFIG["agent_sft_seq_len"]

    cmd = "python scripts/14_train_core_agent.py"
    cmd += " --train_data_path data/coding_tui/agent_traj/train"
    cmd += f" --base_model_path {base_model}"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"
    cmd += f" --output_dir checkpoints/agent_sft"
    cmd += " --lora_rank 128"
    cmd += " --developer_prompt 'You are a coding agent. Use tools to read files, write code, run tests, and complete programming tasks. Do not just analyze — always take action and produce working code.'"

    print("Training agent SFT adapter (rank 128)...")
    print(f"  Base:     {base_model}")
    print(f"  Data:     data/coding_tui/agent_traj/train")
    print(f"  Batch:    {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Steps:    {max_steps}")
    print(f"  Seq len:  {seq_len}")
    print(f"  LoRA rank: 128")
    print(f"  MoE backend: {CONFIG['moe_backend']}")
    print(f"  Auto packing: enabled")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/agent_sft", "checkpoints/agent_sft")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")

    tracker.complete("agent_sft")

In [None]:
import os, json

if CONFIG["training_scope"] not in ("tool_calling_only", "skip_to_rl") and CONFIG["include_agent_trajectories"]:
    ckpt_path = "checkpoints/agent_sft/final"

    print("Agent SFT Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  ✓ Checkpoint: {ckpt_path} ({len(files)} files)")

        adapter_config = os.path.join(ckpt_path, "adapter_config.json")
        if os.path.exists(adapter_config):
            with open(adapter_config) as f:
                cfg = json.load(f)
            print(f"    LoRA rank:       {cfg.get('r', '?')}")
            print(f"    Alpha:           {cfg.get('lora_alpha', '?')}")
            print(f"    Target modules:  {cfg.get('target_modules', '?')}")
    else:
        print(f"  ✗ Checkpoint not found at {ckpt_path}")

    print("=" * 60)
else:
    print("Agent SFT skipped for this training scope.")

## Step 4: IPO Preference Optimisation (Phase 3)

Train with IPO on preference pairs targeting the key failure modes:

- **Good (chosen):** Task completed, code written, tests pass, decisive action
- **Bad (rejected):** Circular analysis, "I would need to look at this more", no code written, wrong tool params

Training uses a very low learning rate (5e-7) and a single epoch to avoid collapse.
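
For orientation, the standard IPO objective regresses the policy-versus-reference log-ratio margin between chosen and rejected responses toward `1 / (2 * beta)`. The sketch below computes that loss for a single pair, assuming `scripts/17_ipo_preference.py` follows the standard formulation (the script itself is not shown in this notebook). The log-probability values are invented for illustration.

```python
def ipo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Squared-error IPO loss for one preference pair.

    Arguments are sequence log-probabilities under the policy (pi_*)
    and under the frozen reference model (ref_*).
    """
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    target = 1.0 / (2.0 * beta)
    return (margin - target) ** 2


# At initialisation the policy equals the reference, so the margin is 0.
print(ipo_loss(-10.0, -12.0, -10.0, -12.0))

# Once the policy prefers the chosen answer by exactly the target margin,
# the loss reaches its minimum.
print(ipo_loss(-8.0, -13.0, -10.0, -10.0))
```

With `beta=0.1` the target margin is 5. Because the loss is a bounded regression target rather than an open-ended push, the policy is not rewarded for widening the gap indefinitely.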

### 4.1 Train with IPO

In [None]:
import os

if CONFIG["training_scope"] == "tool_calling_only":
    print("Skipping — scope is tool_calling_only")
    tracker.skip("ipo")
elif not CONFIG["include_preference"]:
    print("Skipping — include_preference=False")
    tracker.skip("ipo")
else:
    tracker.start("ipo")

    batch = CONFIG["ipo_batch"]
    grad_accum = CONFIG["ipo_grad_accum"]
    max_steps = CONFIG["ipo_max_steps"]

    # Determine best checkpoint to train from
    if CONFIG["training_scope"] == "skip_to_rl":
        ipo_base = "checkpoints/agent_sft/final"
        print("skip_to_rl: starting IPO from agent_sft checkpoint")
    elif os.path.exists("checkpoints/agent_sft/final"):
        ipo_base = "checkpoints/agent_sft/final"
    else:
        ipo_base = "checkpoints/tool_calling_sft/final"
        print("agent_sft not found, falling back to tool_calling_sft")

    # Check data exists
    pref_path = "data/coding_tui/preference/train"
    if not os.path.exists(pref_path):
        print(f"WARNING: preference data not found at {pref_path}")
        print("Run Step 1.3 first.")
        tracker.fail("ipo")
    else:
        cmd = "python scripts/17_ipo_preference.py"
        cmd += f" --checkpoint {ipo_base}"
        cmd += f" --train_data_path {pref_path}"
        cmd += f" --per_device_train_batch_size {batch}"
        cmd += f" --gradient_accumulation_steps {grad_accum}"
        cmd += f" --max_steps {max_steps}"
        cmd += " --output_dir checkpoints/agent_sft_ipo"
        cmd += " --beta 0.1"

        print("Training with IPO (preference optimisation)...")
        print(f"  Base checkpoint: {ipo_base}")
        print(f"  Data:            {pref_path}")
        print(f"  Batch:           {batch} x {grad_accum} = {batch * grad_accum}")
        print(f"  Steps:           {max_steps}")
        print(f"  Loss:            IPO (beta=0.1)")
        print(f"  Load mode:       {CONFIG['load_mode']}")
        print(f"  MoE backend:     {CONFIG['moe_backend']}")
        print("=" * 60)

        !{cmd}

        drive_helper.backup("checkpoints/agent_sft_ipo", "checkpoints/agent_sft_ipo")
        if DRIVE_MODE != "local":
            print("\nCheckpoint backed up to Drive.")

        tracker.complete("ipo")

In [None]:
import os

if CONFIG["training_scope"] not in ("tool_calling_only",) and CONFIG["include_preference"]:
    ckpt_path = "checkpoints/agent_sft_ipo/final"

    print("IPO Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  ✓ IPO checkpoint: {ckpt_path} ({len(files)} files)")
    else:
        print(f"  ✗ IPO checkpoint not found at {ckpt_path}")

    # Check TensorBoard logs for KL divergence
    import glob
    tb_files = glob.glob("checkpoints/agent_sft_ipo/**/events.out.tfevents*", recursive=True)
    if tb_files:
        print(f"  ✓ TensorBoard logs: {len(tb_files)} event files")
        print("    Monitor KL divergence: warn >0.3, abort >0.5")
    else:
        print("  — No TensorBoard logs found")

    print("=" * 60)
else:
    print("IPO skipped for this training scope.")

## Step 5: GRPO RL (Phase 4)

Execution-grounded RL with Codex-style evaluation.

**Reward function:**
- `+1.0` for code that compiles / passes syntax check
- `+2.0` for passing test cases
- `+0.5` for clean linting (no obvious errors)
- `-1.0` for circular/no-action responses (no code written)
- `-0.5` for tool calls with malformed JSON parameters
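
A minimal sketch of how these components might combine is shown below. It rests on several assumptions that the notebook does not confirm: the components are simply summed, a Python syntax check stands in for "compiles", test and lint outcomes are supplied by the caller, and the malformed-JSON penalty is applied once per completion. The actual logic lives in `scripts/18_grpo_rl.py` under `--reward_mode coding_tui` and may differ.

```python
import ast
import json


def syntax_ok(code):
    """True if the code parses as Python (stand-in for a compile check)."""
    try:
        ast.parse(code)
        return True
    except (SyntaxError, ValueError):
        return False


def json_ok(arguments):
    """True if a tool call's arguments string is valid JSON."""
    try:
        json.loads(arguments)
        return True
    except (TypeError, ValueError):
        return False


def coding_reward(code, tests_passed, lint_clean, tool_call_args):
    """Sum the reward components for one completion (assumed additive)."""
    reward = 0.0
    if not code.strip():
        reward -= 1.0  # circular / no-action response: no code written
    else:
        if syntax_ok(code):
            reward += 1.0
        if tests_passed:
            reward += 2.0
        if lint_clean:
            reward += 0.5
    if any(not json_ok(args) for args in tool_call_args):
        reward -= 0.5  # at least one tool call had malformed JSON
    return reward


good = coding_reward("def f():\n    return 1\n", True, True, ['{"path": "a.py"}'])
bad = coding_reward("", False, False, ["{bad"])
print(good, bad)
```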

**Goals:**
- Reinforce follow-through and complete code generation
- Penalise looping analysis without action
- Reinforce correct tool parameter formatting

**Optimisations:**
- FP8 RL with vLLM inference on H100 (1.6x throughput)
- Chunked batching for longer context
- Harmony format compliance reward

### 5.1 Train with GRPO

In [None]:
import os

if CONFIG["training_scope"] == "tool_calling_only":
    print("Skipping — scope is tool_calling_only")
    tracker.skip("grpo")
elif not CONFIG["include_grpo"]:
    print("Skipping — include_grpo=False")
    tracker.skip("grpo")
else:
    tracker.start("grpo")

    batch = CONFIG["grpo_batch"]
    grad_accum = CONFIG["grpo_grad_accum"]
    max_steps = CONFIG["grpo_max_steps"]
    max_seq = CONFIG["grpo_seq_len"]
    num_gen = CONFIG["grpo_num_gen"]

    # Determine best checkpoint
    if os.path.exists("checkpoints/agent_sft_ipo/final"):
        grpo_base = "checkpoints/agent_sft_ipo/final"
    elif os.path.exists("checkpoints/agent_sft/final"):
        grpo_base = "checkpoints/agent_sft/final"
        print("IPO checkpoint not found, using agent_sft.")
    elif os.path.exists("checkpoints/tool_calling_sft/final"):
        grpo_base = "checkpoints/tool_calling_sft/final"
        print("agent_sft not found, using tool_calling_sft.")
    else:
        grpo_base = "openai/gpt-oss-20b"
        print("No fine-tuned checkpoint found, using base model.")

    cmd = "python scripts/18_grpo_rl.py"
    cmd += f" --checkpoint {grpo_base}"
    cmd += f" --per_device_train_batch_size {batch}"
    cmd += f" --gradient_accumulation_steps {grad_accum}"
    cmd += f" --max_steps {max_steps}"
    cmd += f" --num_generations {num_gen}"
    cmd += " --output_dir checkpoints/agent_sft_grpo"
    cmd += " --reward_mode coding_tui"
    cmd += " --developer_prompt 'You are a coding agent. Use tools to read files, write code, run tests, and complete programming tasks. Do not just analyze — always take action and produce working code.'"

    v4_features = [f"Split LoRA ({CONFIG['moe_backend']})"]
    if CONFIG["load_mode"] == "fp8":
        v4_features.append("FP8 weights")
    if CONFIG.get("fast_inference"):
        v4_features.append("vLLM inference")
    v4_features += ["Chunked batching (auto)", "Auto packing"]

    if CONFIG["gpu_tier"] == "a100_40gb":
        print("NOTE: 40GB GPU — GRPO sequence length capped at 16384")

    print("Training with GRPO (execution-grounded RL)...")
    print(f"  Base:          {grpo_base}")
    print(f"  Batch:         {batch} x {grad_accum} = {batch * grad_accum}")
    print(f"  Steps:         {max_steps}")
    print(f"  Seq length:    {max_seq}")
    print(f"  Generations:   {num_gen} per prompt")
    print()
    print("  Reward signals:")
    print("    +1.0 code compiles / passes syntax check")
    print("    +2.0 test cases pass")
    print("    +0.5 clean linting")
    print("    -1.0 circular/no-action response")
    print("    -0.5 malformed tool call JSON")
    print()
    print("  Active features:")
    for feat in v4_features:
        print(f"    ✓ {feat}")
    print("=" * 60)

    !{cmd}

    drive_helper.backup("checkpoints/agent_sft_grpo", "checkpoints/agent_sft_grpo")
    if DRIVE_MODE != "local":
        print("\nCheckpoint backed up to Drive.")

    tracker.complete("grpo")
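
The reward table printed by the training cell can be read as an additive score that GRPO then normalises within each group of `num_generations` samples for a prompt. The sketch below is illustrative only: `RolloutChecks`, `score_rollout`, and `group_advantages` are hypothetical names, each check is assumed to be a boolean computed elsewhere, and the actual logic in `scripts/18_grpo_rl.py` may weight or combine the signals differently.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RolloutChecks:
    """Outcome of the execution checks for one sampled response (hypothetical)."""
    compiles: bool = False
    tests_pass: bool = False
    lint_clean: bool = False
    circular_or_no_action: bool = False
    malformed_tool_call: bool = False


def score_rollout(c: RolloutChecks) -> float:
    """Sum the reward signals listed in the training cell's printout."""
    reward = 0.0
    if c.compiles:
        reward += 1.0
    if c.tests_pass:
        reward += 2.0
    if c.lint_clean:
        reward += 0.5
    if c.circular_or_no_action:
        reward -= 1.0
    if c.malformed_tool_call:
        reward -= 0.5
    return reward


def group_advantages(rewards, eps=1e-6):
    """Centre and scale rewards within one prompt's group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled responses to the same prompt.
group = [
    RolloutChecks(compiles=True, tests_pass=True, lint_clean=True),
    RolloutChecks(circular_or_no_action=True),
    RolloutChecks(compiles=True, malformed_tool_call=True),
]
rewards = [score_rollout(c) for c in group]
advantages = group_advantages(rewards)
```

Because the advantage is relative to the group, a response is only reinforced when it beats its siblings; if every sample in a group earns the same reward, all advantages are zero and that prompt contributes no gradient.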

In [None]:
import os

if CONFIG["training_scope"] not in ("tool_calling_only",) and CONFIG["include_grpo"]:
    ckpt_path = "checkpoints/agent_sft_grpo/final"

    print("GRPO Verification:")
    print("=" * 60)

    if os.path.exists(ckpt_path):
        files = os.listdir(ckpt_path)
        print(f"  ✓ GRPO checkpoint: {ckpt_path} ({len(files)} files)")
    else:
        print(f"  ✗ GRPO checkpoint not found at {ckpt_path}")

    print("=" * 60)
else:
    print("GRPO skipped for this training scope.")

## Step 6: Evaluation

Evaluate on coding agent tasks targeting the four failure modes:
1. **Tool call format accuracy** — JSON schema compliance, valid tool names
2. **Task completion rate** — did it actually produce a code change?
3. **Circular detection rate** — does it loop the same analysis?
4. **Code correctness** — compiles, passes tests
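
As a rough illustration of how metrics 1 and 3 could be scored, the sketch below validates a tool call against a set of known tool names and flags a session as circular when consecutive turns overlap heavily. The function names, the `{"name": ..., "arguments": ...}` call shape, and the 0.8 similarity threshold are assumptions made for this example; the evaluation script may define these metrics differently.

```python
import json


def is_valid_tool_call(raw: str, known_tools: set) -> bool:
    """True if raw parses as a JSON object naming a known tool with dict arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict):
        return False
    name = call.get("name")
    return (
        isinstance(name, str)
        and name in known_tools
        and isinstance(call.get("arguments"), dict)
    )


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two texts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def is_circular(turns, threshold=0.8) -> bool:
    """True if any two consecutive turns are near-duplicates of each other."""
    return any(jaccard(x, y) >= threshold for x, y in zip(turns, turns[1:]))


TOOLS = {"read_file", "write_file", "run_command"}
ok = is_valid_tool_call(
    '{"name": "read_file", "arguments": {"path": "src/main.rs"}}', TOOLS
)
```

Averaging `is_valid_tool_call` over all emitted calls would give a format-accuracy figure, and averaging `is_circular` over sessions would give a loop rate, where lower is better.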

### 6.1 Run Coding Agent Evaluation

In [None]:
import os

# Evaluation runs for every training scope, including "tool_calling_only":
# the checkpoint search below falls back to the tool-calling adapter.

tracker.start("eval")

# Determine best checkpoint
CHECKPOINT_PRIORITY = [
    "checkpoints/agent_sft_grpo/final",
    "checkpoints/agent_sft_ipo/final",
    "checkpoints/agent_sft/final",
    "checkpoints/tool_calling_sft/final",
]

eval_checkpoint = None
for path in CHECKPOINT_PRIORITY:
    if os.path.exists(path):
        eval_checkpoint = path
        break

if eval_checkpoint is None:
    print("✗ No checkpoint found. Train the model first.")
    tracker.fail("eval")
else:
    num_samples = CONFIG["eval_num_samples"]

    print(f"Evaluating checkpoint: {eval_checkpoint}")
    print(f"Samples: {num_samples}")
    print("=" * 60)

    !python scripts/eval_rust_agent.py \
        --checkpoint {eval_checkpoint} \
        --num_samples {num_samples} \
        --eval_mode coding_tui \
        --output_dir evals/coding_tui_agent

    drive_helper.backup("evals/coding_tui_agent", "evals/coding_tui_agent")
    if DRIVE_MODE != "local":
        print("\nResults backed up to Drive.")

    tracker.complete("eval")

### 6.2 Check Promotion Gates

In [None]:
!python scripts/12_check_gates.py coding_tui_agent

In [None]:
import os, json

metrics_path = "evals/coding_tui_agent/metrics.json"

if os.path.exists(metrics_path):
    with open(metrics_path) as f:
        metrics = json.load(f)

    # Targets tuned for a coding TUI agent
    targets = {
        "tool_call_format_accuracy": (0.95, "higher"),   # Valid JSON + known tool names
        "task_completion_rate": (0.70, "higher"),         # Actually wrote/modified code
        "circular_detection_rate": (0.10, "lower"),       # Loop rate should be low
        "code_correctness_rate": (0.65, "higher"),        # Compiles / passes tests
        "follow_through_rate": (0.80, "higher"),          # Takes action after planning
    }

    print("=" * 62)
    print("EVALUATION RESULTS — Coding TUI Agent")
    print("=" * 62)
    print(f"  {'Metric':<34} {'Value':>8} {'Target':>8} {'Status':>8}")
    print("-" * 62)

    all_pass = True
    for key, (target, direction) in targets.items():
        value = metrics.get(key)
        if value is None:
            # A metric missing from metrics.json cannot pass its gate.
            all_pass = False
            print(f"  {key:<34} {'N/A':>8} {target:>8.0%} {'✗ FAIL':>8}")
            continue

        if direction == "higher":
            passed = value >= target
        else:
            passed = value <= target

        if not passed:
            all_pass = False

        status = "✓ PASS" if passed else "✗ FAIL"
        fmt_val = f"{value:.1%}" if isinstance(value, float) and value <= 1 else f"{value}"
        fmt_tgt = f"{target:.0%}" if isinstance(target, float) and target <= 1 else f"{target}"
        print(f"  {key:<34} {fmt_val:>8} {fmt_tgt:>8} {status:>8}")

    print("=" * 62)
    if all_pass:
        print("  ALL GATES PASSED ✓ — Model ready for export")
    else:
        print("  SOME GATES FAILED ✗ — Consider additional training")
    print("=" * 62)
else:
    print(f"✗ Metrics file not found at {metrics_path}")
    print("Run evaluation (6.1) first.")

## Step 7: Test Model

Load the trained model and test it interactively against the specific coding TUI
agent failure modes that this pipeline targets.

### 7.1 Load Model

In [None]:
from unsloth import FastLanguageModel
from peft import PeftModel
import torch, os

CHECKPOINT_PRIORITY = [
    "checkpoints/agent_sft_grpo/final",
    "checkpoints/agent_sft_ipo/final",
    "checkpoints/agent_sft/final",
    "checkpoints/tool_calling_sft/final",
]

MERGED_PATH = "checkpoints/gpt-oss-20b-coding-tui-merged"

MODEL_PATH = None
is_adapter = False
for path in CHECKPOINT_PRIORITY:
    if os.path.exists(path) and os.path.exists(os.path.join(path, "adapter_config.json")):
        MODEL_PATH = path
        is_adapter = True
        break

if MODEL_PATH is None and os.path.exists(MERGED_PATH):
    MODEL_PATH = MERGED_PATH
    is_adapter = False

if MODEL_PATH is None:
    print("✗ No checkpoint found. Train the model first.")
else:
    print(f"Loading model from: {MODEL_PATH}")
    print(f"  Type: {'LoRA adapter' if is_adapter else 'merged model'}")

    model = None
    # LoRA adapters sit on top of the base model; a merged checkpoint already
    # contains the fine-tuned weights and must be loaded from MODEL_PATH itself.
    base_name = "openai/gpt-oss-20b" if is_adapter else MODEL_PATH

    # Try 1 (adapters only): Pre-quantized BNB 4-bit (avoids GptOssExperts BNB traversal issue)
    if is_adapter:
        try:
            print("  Loading pre-quantized BNB 4-bit model...")
            model, tokenizer = FastLanguageModel.from_pretrained(
                "unsloth/gpt-oss-20b-unsloth-bnb-4bit",
                max_seq_length=8192,
                dtype=None,
                load_in_4bit=False,
            )
            print("  Mode: BNB 4-bit (pre-quantized)")
        except Exception as e:
            print(f"  Pre-quantized BNB failed: {e}")

    # Try 2: bfloat16 without quantization
    if model is None:
        print(f"  Loading {base_name} in bfloat16 (no quantization)...")
        model, tokenizer = FastLanguageModel.from_pretrained(
            base_name,
            max_seq_length=8192,
            dtype=torch.bfloat16,
            load_in_4bit=False,
        )
        print("  Mode: bfloat16 (no quantization)")

    if is_adapter:
        print(f"  Applying LoRA adapter from {MODEL_PATH}...")
        model = PeftModel.from_pretrained(model, MODEL_PATH)

    FastLanguageModel.for_inference(model)
    print("✓ Model loaded!")

### 7.2 Test Against Failure Modes

In [None]:
import sys, torch
sys.path.insert(0, "scripts")
from dataset_formatters.harmony import encode_harmony_messages

CODING_AGENT_DEV = (
    "You are a coding agent. Use tools to read files, write code, run tests, and "
    "complete programming tasks. Do not just analyze — always take action and produce "
    "working code. After making changes, verify they work by running the relevant tests. "
    "If a tool call fails, diagnose and retry with corrected parameters."
)

# Test prompts designed to expose each failure mode
TEST_PROMPTS = [
    # Test 1: Tool calling accuracy — should call read_file with valid path, NOT a made-up tool
    (
        "Failure Mode: Tool Calling",
        "Read the file at src/main.rs and fix any compilation errors you find.",
        ["read_file", "write_file", "run_command"],
    ),
    # Test 2: Follow-through — should NOT just say "I would need to look at..."
    (
        "Failure Mode: No Follow-Through",
        "Write a Python function called `binary_search(arr, target)` that returns the index of target in sorted arr, or -1 if not found. Add it to utils.py and write a pytest test for it.",
        None,
    ),
    # Test 3: Circular reasoning — model should take action, not loop
    (
        "Failure Mode: Circular Reasoning",
        "Analyze the codebase and suggest improvements. Then implement the most impactful one.",
        None,
    ),
    # Test 4: Context tracking — should remember the task mid-session
    (
        "Failure Mode: Context Loss",
        "I need you to refactor the authentication module. Start by reading auth.py, then identify the issues, then fix them one by one.",
        None,
    ),
]

def generate_response(prompt, tools=None, max_tokens=512):
    """Generate a response using Harmony format."""
    messages = [{"role": "user", "content": prompt}]
    if tools:
        tool_ctx = "\n".join(f"  - {t}(path: str)" for t in tools)
        messages[0]["content"] = (
            f"Available tools:\n{tool_ctx}\n\n" + messages[0]["content"]
        )
    formatted = encode_harmony_messages(
        messages,
        developer_instructions=CODING_AGENT_DEV,
        add_generation_prompt=True,
    )
    inputs = tokenizer(formatted, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_tokens,
            temperature=0.3,
            do_sample=True,
            top_p=0.9,
        )
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)


for label, prompt, tools in TEST_PROMPTS:
    print(f"\n{'=' * 64}")
    print(f"TEST: {label}")
    print(f"{'=' * 64}")
    # Only show an ellipsis when the prompt was actually truncated
    print(f"Prompt: {prompt[:120]}{'...' if len(prompt) > 120 else ''}")
    print("-" * 64)
    response = generate_response(prompt, tools, max_tokens=384)
    print(response)
    print()

In [None]:
# ── Custom Prompt ─────────────────────────────────────────────────────────────
CUSTOM_PROMPT = (
    "Read requirements.txt and install any missing packages, "
    "then run the test suite and fix any failing tests."
)

print(f"Custom prompt: {CUSTOM_PROMPT}")
print("=" * 64)
print(generate_response(CUSTOM_PROMPT, max_tokens=512))

## Step 8: Export

Merge the final adapter and export to HuggingFace safetensors + GGUF formats.

The GGUF file can be loaded directly into MacLean AI via llama-server
for Codex CLI integration testing.
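
Once the GGUF is being served, a quick smoke test can be sent over llama-server's OpenAI-compatible chat endpoint. The sketch below assumes the server is listening on `http://localhost:8080` and exposes `/v1/chat/completions`; `build_chat_request` and `send` are hypothetical helper names, and the sampling values simply mirror those used in Step 7.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8080", max_tokens=256):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.3,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the assistant message text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


req = build_chat_request("Read src/main.rs and fix any compilation errors.")
# print(send(req))  # uncomment once llama-server is running with the exported GGUF
```

Keeping request construction separate from the network call makes the payload easy to inspect before a server is available.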

### 8.1 Export to GGUF

In [None]:
import os

tracker.start("export")

ADAPTER_PRIORITY = [
    "checkpoints/agent_sft_grpo/final",
    "checkpoints/agent_sft_ipo/final",
    "checkpoints/agent_sft/final",
    "checkpoints/tool_calling_sft/final",
]

adapter_path = None
for path in ADAPTER_PRIORITY:
    if os.path.exists(path):
        adapter_path = path
        break

if adapter_path is None:
    print("✗ No adapter checkpoint found.")
    tracker.fail("export")
else:
    export_dir = "checkpoints/gpt-oss-20b-coding-tui-export"
    print(f"Exporting adapter: {adapter_path}")
    print(f"Output: {export_dir}")
    print("=" * 60)

    !python scripts/19_merge_adapter.py \
        --adapter_path {adapter_path} \
        --output_dir {export_dir} \
        --export_formats hf gguf_q4

    drive_helper.backup(export_dir, "checkpoints/gpt-oss-20b-coding-tui-export")
    if DRIVE_MODE != "local":
        print("\nExport backed up to Drive.")

    tracker.complete("export")
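
Before downloading or uploading the export, it can be worth confirming that the file is a well-formed GGUF rather than a truncated or mislabelled artifact. The sketch below reads only the first eight bytes: the four-byte `GGUF` magic followed by a 32-bit version number, assumed here to be little-endian. `read_gguf_version` is a hypothetical helper, not part of the export script.

```python
import struct


def read_gguf_version(path):
    """Return the GGUF format version, or None if the file lacks the GGUF magic."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    # The version is stored as an unsigned 32-bit integer right after the magic.
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling `read_gguf_version` on each path found under the export directory and skipping any that return `None` avoids shipping a corrupt file.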

### 8.2 QAT Export (Optional)

Optional quantisation-aware training (QAT) pass for MXFP4 deployment.
QAT recovers 97-100% of model quality, versus 59-89% with post-training quantisation (PTQ).
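
To make the rounding that QAT trains against concrete, the sketch below fake-quantises a list of floats in the MXFP4 style: blocks of 32 values share one power-of-two scale, and each value is snapped to the nearest 4-bit E2M1 magnitude. This is a plain-Python illustration based on the Microscaling format as understood here, not how nvidia-modelopt implements it; the function names are hypothetical and tie-breaking is simplified.

```python
import math

# Magnitudes representable by an FP4 E2M1 element.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2  # exponent of the largest power of two in the grid (4.0 = 2**2)


def fake_quant_mxfp4_block(block):
    """Quantise then dequantise one block using a shared power-of-two scale."""
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return [0.0 for _ in block]
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2(max_abs)) = e - 1.
    _, e = math.frexp(max_abs)
    scale = 2.0 ** ((e - 1) - E2M1_EMAX)
    out = []
    for x in block:
        # Pick the nearest grid magnitude; anything above 6.0 * scale saturates.
        # min() resolves ties to the smaller magnitude, a simplification.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        out.append(math.copysign(mag * scale, x))
    return out


def fake_quant_mxfp4(values, block_size=32):
    """Apply block-wise fake quantisation to a flat list of floats."""
    out = []
    for i in range(0, len(values), block_size):
        out.extend(fake_quant_mxfp4_block(values[i:i + block_size]))
    return out


weights = [6.0, 1.0, 0.3, -2.4]
quantised = fake_quant_mxfp4(weights)
```

With post-training quantisation this rounding error is applied to weights that never saw it; QAT inserts the same rounding into the forward pass during fine-tuning so the weights can adapt to the grid.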

In [None]:
if not CONFIG.get("enable_qat_export"):
    print("QAT export disabled. Enable via widget toggle in Step 0.3.")
    print("\nQAT recovers 97-100% quality when deploying to MXFP4,")
    print("vs 59-89% with standard post-training quantisation (PTQ).")
else:
    import os
    export_dir = "checkpoints/gpt-oss-20b-coding-tui-export"
    qat_dir = "checkpoints/gpt-oss-20b-coding-tui-qat"

    if not os.path.exists(export_dir):
        print("✗ Run standard export (8.1) first.")
    else:
        print("QAT is not automated in this notebook; follow the manual steps below.")
        print("  QAT fine-tunes with MXFP4-aware quantisation at reduced LR (1e-5).")
        print("=" * 60)

        try:
            # Imported only to check that nvidia-modelopt is installed.
            import modelopt.torch.quantization as mtq
            print("✓ nvidia-modelopt available")
            print("\nQAT pipeline (manual steps):")
            print(f"  1. Load merged BF16 model from {export_dir}")
            print("  2. mtq.quantize(model, config=mtq.MXFP4_DEFAULT_CFG)")
            print("  3. Fine-tune for ~100 steps at LR 1e-5")
            print(f"  4. Export to {qat_dir}")
        except ImportError:
            print("✗ nvidia-modelopt not installed.")
            print("  Install: pip install nvidia-modelopt")

### 8.3 Download GGUF

In [None]:
import glob, os  # imported first: os is needed for the Colab check below

IN_COLAB = "COLAB_GPU" in os.environ or os.path.exists("/content")

if IN_COLAB:
    from google.colab import files

    export_dir = "checkpoints/gpt-oss-20b-coding-tui-export"
    gguf_files = glob.glob(os.path.join(export_dir, "**/*.gguf"), recursive=True)

    if gguf_files:
        gguf_path = gguf_files[0]
        size_gb = os.path.getsize(gguf_path) / (1024**3)
        print(f"Downloading: {os.path.basename(gguf_path)} ({size_gb:.1f} GB)")
        files.download(gguf_path)
    else:
        print("✗ No GGUF file found. Run export (8.1) first.")
else:
    print("Download not available outside Colab.")
    print("GGUF file is at: checkpoints/gpt-oss-20b-coding-tui-export/")

### 8.4 Upload to HuggingFace Hub

In [None]:
# --- Configuration ---
HF_REPO_ID = ""  # e.g. "your-username/gpt-oss-20b-coding-tui-agent"
HF_PRIVATE = True

assert HF_REPO_ID, "Set HF_REPO_ID above before running this cell."

import os, glob
from huggingface_hub import HfApi

# Authenticate: try Colab Secrets first, then interactive login
try:
    from google.colab import userdata
    hf_token = userdata.get("HF_TOKEN")
    print("Using HF_TOKEN from Colab Secrets.")
except Exception:
    from huggingface_hub import login
    login()
    hf_token = None

api = HfApi(token=hf_token)
api.create_repo(repo_id=HF_REPO_ID, private=HF_PRIVATE, exist_ok=True)
print(f"Repo ready: https://huggingface.co/{HF_REPO_ID}")

# --- Model card ---
export_dir = "checkpoints/gpt-oss-20b-coding-tui-export"
hf_dir = os.path.join(export_dir, "hf")

model_card = """\
---
base_model: openai/gpt-oss-20b
tags:
  - coding-agent
  - tool-calling
  - codex-cli
  - gpt-oss
  - qlora
  - unsloth
  - grpo
  - tui
license: apache-2.0
pipeline_tag: text-generation
---

# GPT-OSS 20B Coding TUI Agent

Fine-tuned from [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) for
use as a coding TUI agent (Codex CLI integration via MacLean AI + llama-server).

## Problem Statement

GPT-OSS 20B exhibits four failure modes when used as a coding agent:
1. Tool calling errors (invalid params, non-existent MCP servers)
2. No follow-through (analysis loops, never writes code)
3. Circular reasoning (repeating the same analysis)
4. Context loss (forgetting task state mid-session)

## Training Pipeline

1. **Tool Calling SFT** (rank 64) \u2014 Glaive + xLAM + Hermes in Harmony format
2. **Merge** \u2014 tool-calling adapter merged into base
3. **Agent SFT** (rank 128) \u2014 code-act, commitpack, editpackft + proxy log trajectories
4. **IPO** \u2014 decisive action preferred over circular analysis
5. **GRPO RL** \u2014 execution-grounded: code compiles, tests pass, no loops

Trained with [Unsloth](https://github.com/unslothai/unsloth) QLoRA.

## Deployment

Designed for deployment via [llama-server](https://github.com/ggerganov/llama.cpp)
with the [claude-proxy-v2](https://github.com/rmarnold/claude-proxy-v2) translation layer
for Codex CLI (OpenAI Responses API).

## GGUF

A quantised GGUF file is included for use with llama.cpp.
"""

# Check for the export before writing anything: creating hf_dir first would make
# this assertion pass even when export (8.1) was never run.
assert os.path.isdir(hf_dir), f"HF export dir not found: {hf_dir}. Run export (8.1) first."

readme_path = os.path.join(hf_dir, "README.md")
with open(readme_path, "w", encoding="utf-8") as f:
    f.write(model_card)
print(f"Wrote model card to {readme_path}")

# --- Upload HF safetensors model ---
print(f"Uploading HF model from {hf_dir} ...")
api.upload_folder(
    folder_path=hf_dir,
    repo_id=HF_REPO_ID,
    commit_message="Upload GPT-OSS 20B Coding TUI Agent (tool-calling + agent-SFT + IPO + GRPO)",
    token=hf_token,
)
print("HF model uploaded.")

# --- Upload GGUF file ---
gguf_files = glob.glob(os.path.join(export_dir, "**/*.gguf"), recursive=True)
if gguf_files:
    gguf_path = gguf_files[0]
    gguf_name = os.path.basename(gguf_path)
    size_gb = os.path.getsize(gguf_path) / (1024**3)
    print(f"Uploading GGUF: {gguf_name} ({size_gb:.1f} GB) ...")
    api.upload_file(
        path_or_fileobj=gguf_path,
        path_in_repo=gguf_name,
        repo_id=HF_REPO_ID,
        commit_message=f"Upload GGUF quantisation ({gguf_name})",
        token=hf_token,
    )
    print("GGUF uploaded.")
else:
    print("No GGUF file found \u2014 skipping. Run export (8.1) to generate one.")

print(f"\nDone! View your model at: https://huggingface.co/{HF_REPO_ID}")

---
## Training Complete!

Your GPT-OSS 20B Coding TUI Agent is trained and ready for Codex CLI integration.

**Pipeline summary:**
1. Tool Calling SFT: Glaive (113K) + xLAM (60K) + Hermes — correct tool schemas and parameter formatting
2. Merge: tool-calling adapter fused into base weights
3. Agent SFT: code-act + commitpack + editpackft + real proxy log trajectories (most valuable)
4. IPO: decisive action preferred over circular analysis (hh-rlhf + code feedback)
5. GRPO RL: execution-grounded — rewards for compiling code and passing tests, penalises loops

**Outputs:**
- Checkpoints: `checkpoints/agent_sft_{ipo,grpo}/final`
- Evaluation: `evals/coding_tui_agent/metrics.json`
- Exported model: `checkpoints/gpt-oss-20b-coding-tui-export/`
- All backed up to Google Drive: `gpt-oss-20b-coding-tui/`

**MacLean AI integration:**
- Copy the exported GGUF to your MacLean AI model directory
- Select it in the Model Browser
- Enable Codex CLI support in Settings
- The proxy translation layer (`--translate-anthropic`) handles the Anthropic↔OpenAI format conversion
- Test with: `codex "Read src/main.rs and fix any compilation errors"`

**Next steps:**
- Review evaluation metrics in Step 6
- Test against each failure mode in Step 7
- If circular reasoning persists: increase GRPO steps or add more no-action penalisation
- If tool calling is still poor: increase tool_calling_sft steps or add more xLAM data

In [None]:
# Disconnect and release GPU runtime to stop billing
try:
    from google.colab import runtime
    runtime.unassign()
except ImportError:
    print("Not in Colab — no runtime to release.")