# Week 7 — Part 01: CLI design (argparse) + good defaults

**Estimated time:** 60–90 minutes

---

## Pre-study (Self-learn)

The Fundamental Course assumes the Self-learn material is complete. If you need a refresher on modules, exceptions, and project habits:

- [Fundamental Course Pre-study index](../PRESTUDY.md)
- [Self-learn — Modules and exception handling](../self_learn/Chapters/2/02_modules_exceptions.md)
- [Self-learn — Chapter 2: Python and Environment Management](../self_learn/Chapters/2/Chapter2.md)

---

## What success looks like (end of Part 01)

- `--help` clearly documents required inputs, defaults, and outputs.
- Incorrect usage fails fast with an actionable error message.
- You can normalize and validate key args (e.g., `output_dir`, `model`, `seed`).

### Checkpoint

After running this notebook, you should be able to:

- build a parser with stable flag names (`--input`, `--output_dir`, `--model`, `--seed`)
- show `normalize_args(...)` and `validate_args(...)` running without crashing

## Learning Objectives

- Implement a clean `argparse` interface with good defaults
- Validate required inputs and fail with helpful errors
- Keep flags consistent across scripts
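
One way to approach the third objective is argparse's `parents=` mechanism, which lets several scripts inherit the same flag definitions. The sketch below is only an illustration: `common_flags`, `train`, and `evaluate` are hypothetical names, not part of the capstone.

```python
import argparse


def common_flags() -> argparse.ArgumentParser:
    # add_help=False avoids a duplicate -h/--help when used as a parent.
    base = argparse.ArgumentParser(add_help=False)
    base.add_argument("--output_dir", default="output", help="Artifact directory")
    base.add_argument("--seed", type=int, default=42, help="Random seed")
    return base


# Two hypothetical scripts that inherit identical flag names and defaults.
train_parser = argparse.ArgumentParser(prog="train", parents=[common_flags()])
eval_parser = argparse.ArgumentParser(prog="evaluate", parents=[common_flags()])

train_args = train_parser.parse_args(["--seed", "7"])
eval_args = eval_parser.parse_args([])
print(train_args.seed, eval_args.seed, eval_args.output_dir)
```

Because both parsers are built from the same helper, renaming a flag or changing a default happens in one place.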

### What this part covers
This notebook covers **CLI design** — building a clean command-line interface for your capstone so it's easy to use correctly and hard to use incorrectly.

**A good CLI has four properties:**
1. **Descriptive `--help`** — users know what flags exist without reading the source code
2. **Sensible defaults** — the README example command works without modification
3. **Explicit inputs/outputs** — no hidden assumptions about file locations
4. **Clear errors on invalid input** — users fix mistakes fast without guessing

**Why this matters:** Your CLI is the "front door" to your capstone. If it's confusing, teammates can't use it. If it has no defaults, the README becomes a list of required flags. If errors are cryptic, debugging takes hours instead of minutes.
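
As a small illustration of the first two properties, argparse can interpolate each default into its help text with `%(default)s`, so `--help` and the real defaults cannot drift apart. The parser below is a hypothetical demo that reuses this notebook's flag names; the help wording is an assumption.

```python
import argparse

# Hypothetical demo parser; flag names mirror the ones used in this notebook.
demo = argparse.ArgumentParser(
    prog="run_capstone",
    description="Run the capstone pipeline with stable outputs",
)
demo.add_argument("--input", required=True, help="Input CSV file path (required)")
demo.add_argument(
    "--output_dir",
    default="output",
    # %(default)s is filled in by argparse when the help text is rendered.
    help="Artifact directory (default: %(default)s)",
)
demo.add_argument(
    "--seed",
    type=int,
    default=42,
    help="Random seed (default: %(default)s)",
)

# format_help() returns the same text that --help would print.
help_text = demo.format_help()
print(help_text)
```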

## Overview

Your CLI is the “front door” to your capstone.

In this lab you will:

- build a stable parser (`--input`, `--output_dir`, `--model`, `--seed`)
- normalize and validate args
- fail fast with actionable errors

If you want the deeper contract/UX rationale, use the Self-learn links at the top of the notebook.

### What this cell does
Defines `build_parser()` — the argparse configuration — and `require_file()` — an input validator that fails fast with a helpful error message.

**Walk through `build_parser()`:**
- `prog="run_capstone"`: the name shown in `--help` output
- `--input` (required): no default, must be provided
- `--output_dir` (default: `"output"`): safe default, works without specifying
- `--model` (default: `"llama3.1"`): sensible default for local inference
- `--seed` (type=int, default=42): explicit type conversion, fixed default for reproducibility
- `--config` (default: `None`): optional config file path, unset unless you pass it

**Walk through `require_file()`:**
- `path.expanduser()` — handles `~` in paths
- Checks existence before checking size — avoids a confusing `FileNotFoundError` from `stat()`
- Each error message includes the path AND a next step (for example, `"Try --help for usage."`), so users know what to do next

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(
        prog="run_capstone",
        description="Run the capstone pipeline with stable outputs",
    )
    p.add_argument("--input", required=True, help="Input CSV file path")
    p.add_argument("--output_dir", default="output", help="Artifact directory")
    p.add_argument("--model", default="llama3.1", help="Model name")
    p.add_argument("--seed", type=int, default=42, help="Random seed for reproducibility")
    p.add_argument("--config", default=None, help="Optional config file")
    return p


def require_file(path: str) -> Path:
    # Expand "~" so home-relative paths work.
    p = Path(path).expanduser()
    # Check existence first; stat() on a missing path raises a less helpful error.
    if not p.exists():
        raise FileNotFoundError(f"Input file not found: {p}. Try --help for usage.")
    if p.stat().st_size == 0:
        raise ValueError(f"Input file is empty: {p}. Provide a non-empty CSV.")
    return p


parser = build_parser()
print("CLI parser ready")
```
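
To see the defaults and the fail-fast behavior in action, you can pass an explicit argument list to `parse_args()` instead of letting it read `sys.argv` (which in a notebook holds the kernel's own arguments). The sketch below uses a stand-in parser that mirrors `build_parser()` so it runs on its own; `data.csv` is a made-up file name and is never opened.

```python
import argparse

# Hypothetical stand-in that mirrors build_parser() so this cell runs on its own.
demo_parser = argparse.ArgumentParser(prog="run_capstone")
demo_parser.add_argument("--input", required=True, help="Input CSV file path")
demo_parser.add_argument("--output_dir", default="output", help="Artifact directory")
demo_parser.add_argument("--seed", type=int, default=42, help="Random seed")

# Only --input is given, so the other flags fall back to their defaults.
args = demo_parser.parse_args(["--input", "data.csv"])
print(args.input, args.output_dir, args.seed)

# A missing required flag makes argparse print usage to stderr and exit with status 2.
try:
    demo_parser.parse_args([])
except SystemExit as exc:
    missing_input_code = exc.code
    print("exit code:", missing_input_code)

# type=int rejects a non-integer seed the same way.
try:
    demo_parser.parse_args(["--input", "data.csv", "--seed", "abc"])
except SystemExit as exc:
    bad_seed_code = exc.code
    print("exit code:", bad_seed_code)
```

Catching `SystemExit` here is only for demonstration in a notebook; in a real script you would normally let argparse exit.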

### What this cell does
Defines `normalize_args()` and `validate_args()` — two functions for cleaning and validating parsed arguments.

**`normalize_args()`** should strip whitespace and normalize the model name (e.g., lowercase). This prevents subtle bugs where `" output "` (with spaces) creates a directory with spaces in the name, or `"LLaMA3.1"` doesn't match the model name Ollama expects.

**`validate_args()`** should check that `output_dir` is non-empty and `seed` is non-negative. These are preconditions — if they fail, the pipeline should stop immediately with a clear error rather than failing mysteriously later.

**Your task:** Implement both functions properly. The current stubs do minimal work. The solution is in the Appendix.

```python
from typing import Tuple


def normalize_args(output_dir: str, model: str) -> Tuple[str, str]:
    # TODO: implement normalization (e.g., strip whitespace, enforce lowercase model).
    out = (output_dir or "").strip()
    mdl = (model or "").strip()
    return out, mdl


def validate_args(output_dir: str, seed: int) -> None:
    # TODO: add validation (e.g., seed must be >= 0, output_dir not empty).
    if not output_dir:
        raise ValueError("output_dir must be non-empty")


out_dir_norm, model_norm = normalize_args(" output ", " LLaMA3.1 ")
validate_args(out_dir_norm, 42)
print("normalized:", out_dir_norm, model_norm)

print("Implement normalize_args() and validate_args().")
```

## Self-check

- Can a teammate run `python run_capstone.py --help` and understand how to use it?

## References

- Python `argparse`: https://docs.python.org/3/library/argparse.html
- Click: https://click.palletsprojects.com/

## Appendix: Solutions (peek only after trying)

Reference implementations for `normalize_args` and `validate_args`.

```python
from typing import Tuple


def normalize_args(output_dir: str, model: str) -> Tuple[str, str]:
    # Strip stray whitespace; lowercase the model name so it matches consistently.
    out = (output_dir or "").strip()
    mdl = (model or "").strip().lower()
    return out, mdl


def validate_args(output_dir: str, seed: int) -> None:
    # Preconditions: stop immediately with a clear message if either fails.
    if not output_dir or not str(output_dir).strip():
        raise ValueError("output_dir must be non-empty")
    if int(seed) < 0:
        raise ValueError("seed must be >= 0")


print("solution_example:", normalize_args(" output ", " LLaMA3.1 "))
validate_args("output", 0)
```
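
For context, here is one possible way the pieces could fit together in a script entry point. This is a sketch under assumptions, not the capstone's required structure: `main` is a hypothetical name, the checks are inlined so the cell is self-contained, and failures are routed through `parser.error()` so the user sees the usage line alongside the message. The temporary CSV contents are made up for the demo.

```python
import argparse
import tempfile
from pathlib import Path
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical entry point; the parser mirrors build_parser() from earlier.
    parser = argparse.ArgumentParser(prog="run_capstone")
    parser.add_argument("--input", required=True, help="Input CSV file path")
    parser.add_argument("--output_dir", default="output", help="Artifact directory")
    parser.add_argument("--model", default="llama3.1", help="Model name")
    parser.add_argument("--seed", type=int, default=42, help="Random seed")
    args = parser.parse_args(argv)

    # Normalize first so validation sees the cleaned values.
    args.output_dir = args.output_dir.strip()
    args.model = args.model.strip().lower()

    # parser.error() prints usage plus the message to stderr and exits with status 2.
    input_path = Path(args.input).expanduser()
    if not input_path.exists():
        parser.error(f"--input file not found: {input_path}")
    if input_path.stat().st_size == 0:
        parser.error(f"--input file is empty: {input_path}")
    if not args.output_dir:
        parser.error("--output_dir must be non-empty")
    if args.seed < 0:
        parser.error("--seed must be >= 0")

    args.input = input_path
    return args


# Usage with a temporary, non-empty CSV so the example does not depend on local files.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("id,text\n1,hello\n")

    ok = main(["--input", str(csv_path), "--model", " LLaMA3.1 ", "--seed", "0"])
    print(ok.model, ok.seed, ok.output_dir)

    # A negative seed passes type=int but fails the explicit validation step.
    try:
        main(["--input", str(csv_path), "--seed", "-1"])
    except SystemExit as exc:
        negative_seed_code = exc.code
        print("exit code:", negative_seed_code)
```

Accepting `argv` as a parameter keeps `main` testable from a notebook or a unit test, while `main()` with no arguments still reads the real command line.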