# Week 6 — Part 04: End-to-end capstone runner (one command)

**Estimated time:** 60–90 minutes

## What success looks like (end of Part 04)

- You can describe a stable CLI interface for the capstone runner.
- Running the runner with the same input and flags always produces `output/report.json` and `output/report.md` at the same paths.
- When something fails, you still have intermediate artifacts in `output/` to debug.

### Checkpoint

After reading/running the skeleton, you should be able to point to:

- the CLI flags (`--input`, `--output_dir`, `--model`)
- the output contract (`report.json`, `report.md`)

## Learning Objectives

- Design a stable CLI interface for the capstone
- Define a clear output contract (report + intermediate artifacts)
- Build a runner skeleton with argparse
- Capture failure evidence for debugging

## Overview

Your capstone should run with **one command**. That means:

- clear CLI flags
- predictable outputs
- stable artifact locations

---

## Underlying theory: the runner is your system’s public interface

Week 1 framed reproducibility as an interface. The runner is the concrete version of that idea: its outputs depend only on the input and the config.

$$
\text{outputs} = r(\text{input},\ \text{config})
$$

Practical implication:

- if the runner is stable, testing and demos become easy
- if the runner requires manual steps, failures become non-reproducible
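
One way to make the equation concrete is to check that two runs with the same input and config produce the same report. The sketch below is illustrative only: `toy_runner`, `fingerprint`, and `is_reproducible` are hypothetical names, and `toy_runner` stands in for the real pipeline. A real LLM stage would likely need pinned sampling settings or cached responses before a check like this could pass.

```python
import hashlib
import json
from typing import Any, Callable, Dict

Runner = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def toy_runner(input_text: str, config: Dict[str, Any]) -> Dict[str, Any]:
    # Hypothetical stand-in for the real pipeline: a pure function of its arguments.
    return {"model": config["model"], "n_chars": len(input_text)}


def fingerprint(report: Dict[str, Any]) -> str:
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(report, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_reproducible(runner: Runner, input_text: str, config: Dict[str, Any]) -> bool:
    # Same input and config should yield the same report fingerprint.
    first = fingerprint(runner(input_text, config))
    second = fingerprint(runner(input_text, config))
    return first == second
```

Comparing fingerprints rather than raw dictionaries also works for reports that have already been written to disk by separate runs.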

```python
import argparse
import json
from pathlib import Path
from typing import Any, Dict


def run_capstone(input_path: Path, output_dir: Path, model: str) -> Dict[str, Any]:
    output_dir.mkdir(parents=True, exist_ok=True)

    # TODO: implement pipeline stages (load -> profile -> compress -> llm -> report)
    report: Dict[str, Any] = {
        "model": model,
        "input": str(input_path),
        "summary": "placeholder",
    }

    (output_dir / "report.json").write_text(json.dumps(report, indent=2), encoding="utf-8")
    (output_dir / "report.md").write_text("# Report\n\nPlaceholder report", encoding="utf-8")
    return report


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output_dir", default="output")
    parser.add_argument("--model", required=True)
    return parser


# Example CLI usage:
# python run_capstone.py --input data.csv --output_dir output --model llama3.1
```

## Suggested CLI

```bash
python run_capstone.py --input data.csv --output_dir output --model <MODEL_NAME>
```
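
For that command to work, the script needs an entry point that connects the parsed flags to the pipeline. The following is a minimal sketch, not the required implementation: `main`, its `argv` parameter, the input-file check, and the exit code `2` are assumptions made for illustration, and the report-writing lines are a placeholder where the call to `run_capstone(...)` would go.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Same three flags as the skeleton; the description text is an assumption.
    parser = argparse.ArgumentParser(description="Capstone runner")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output_dir", default="output")
    parser.add_argument("--model", required=True)
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    # argv=None makes argparse read sys.argv[1:]; an explicit list helps in tests and notebooks.
    args = build_parser().parse_args(argv)

    input_path = Path(args.input)
    if not input_path.is_file():
        # Fail early with a clear message (exit code 2 is an assumed convention).
        print(f"input not found: {input_path}", file=sys.stderr)
        return 2

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder for run_capstone(input_path, output_dir, args.model).
    report = {"model": args.model, "input": str(input_path), "summary": "placeholder"}
    (output_dir / "report.json").write_text(json.dumps(report, indent=2), encoding="utf-8")
    (output_dir / "report.md").write_text("# Report\n\nPlaceholder report", encoding="utf-8")
    return 0
```

In a script file, ending with `if __name__ == "__main__": sys.exit(main())` turns the return value into the process exit code. In a notebook, calling `main([...])` with an explicit list avoids argparse picking up the kernel's own arguments.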

## Output contract

The command should write:

- `output/report.json`
- `output/report.md`

Optionally:

- `output/profile.json`
- `output/compressed_input.json`

Failure-mode design tip:

- write intermediate artifacts *before* the LLM call, so they are still on disk to inspect if the call fails
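
The tip above can be sketched as follows. This is one possible shape, under stated assumptions: `run_with_evidence`, `write_json`, and the `call_llm` callable are hypothetical, and `error.json` is an assumed file name that is not part of the output contract.

```python
import json
import traceback
from pathlib import Path
from typing import Any, Callable, Dict


def write_json(path: Path, payload: Dict[str, Any]) -> None:
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def run_with_evidence(
    profile: Dict[str, Any],
    compressed: Dict[str, Any],
    call_llm: Callable[[Dict[str, Any]], str],  # hypothetical LLM stage
    output_dir: Path,
) -> bool:
    output_dir.mkdir(parents=True, exist_ok=True)

    # Persist intermediate artifacts first, so they survive a failed model call.
    write_json(output_dir / "profile.json", profile)
    write_json(output_dir / "compressed_input.json", compressed)

    try:
        summary = call_llm(compressed)
    except Exception as exc:
        # Record what failed and where (error.json is an assumed name).
        write_json(output_dir / "error.json", {
            "stage": "llm",
            "error": repr(exc),
            "traceback": traceback.format_exc(),
        })
        return False

    write_json(output_dir / "report.json", {"summary": summary})
    (output_dir / "report.md").write_text(f"# Report\n\n{summary}\n", encoding="utf-8")
    return True
```

Returning a boolean instead of re-raising lets the caller choose the exit code. Re-raising after writing `error.json` would be an equally reasonable design.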

```python
from pathlib import Path


def validate_outputs(output_dir: Path) -> None:
    # Both files in the output contract must exist after a run.
    required = [output_dir / "report.json", output_dir / "report.md"]
    missing = [p for p in required if not p.exists()]
    if missing:
        raise FileNotFoundError(f"missing outputs: {missing}")


print("Implement validate_outputs() with extra checks if needed.")
```
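
As an example of what "extra checks" could mean, the sketch below also verifies that `report.json` parses and that `report.md` is not empty. The function name `validate_outputs_strict` is hypothetical, and the required keys are an assumption taken from the placeholder report in the skeleton; adjust them to your own report schema.

```python
import json
from pathlib import Path
from typing import List

# Assumed from the placeholder report; not a fixed schema.
REQUIRED_KEYS = ("model", "input", "summary")


def validate_outputs_strict(output_dir: Path) -> None:
    """Hypothetical stricter variant of validate_outputs()."""
    problems: List[str] = []
    report_json = output_dir / "report.json"
    report_md = output_dir / "report.md"

    if not report_json.is_file():
        problems.append("report.json is missing")
    else:
        try:
            report = json.loads(report_json.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"report.json is not valid JSON: {exc}")
        else:
            if not isinstance(report, dict):
                problems.append("report.json is not a JSON object")
            else:
                for key in REQUIRED_KEYS:
                    if key not in report:
                        problems.append(f"report.json lacks key '{key}'")

    if not report_md.is_file():
        problems.append("report.md is missing")
    elif not report_md.read_text(encoding="utf-8").strip():
        problems.append("report.md is empty")

    # Report every problem at once rather than stopping at the first.
    if problems:
        raise ValueError("; ".join(problems))
```

Collecting all problems before raising gives a fuller picture after a failed run than stopping at the first missing file.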

## Self-check

- Can you run the command from a fresh folder after following the README steps?
- If the model call fails, do you still get intermediate outputs?

## References

- Python `argparse`: https://docs.python.org/3/library/argparse.html