# Week 34 — Observability for LLM Systems

*Last updated:* 2025-09-09

## Objectives
- [ ] Understand observability for LLM systems
- [ ] Complete the guided exercises (theory → code → evaluation)
- [ ] Apply what you learned in a small project or lab
- [ ] Reflect using the self-assessment checklist

## Mini-Theory (Deep Dive)
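- Traces, token metrics, prompt drift: a trace records each step of a request (for example retrieval, model call, tool call) with its timing; token metrics count prompt and completion tokens per call; prompt drift is change over time in the prompts the system sends.
- p95/p99 latency; SLOs & SLIs; cost budgets: p95 and p99 are the latencies within which 95% and 99% of requests complete; an SLI is a measured indicator and an SLO is the target set for it; a cost budget caps spend, which for LLM calls is usually driven by token counts.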
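
The concepts above can be tied together in a few lines of standard-library Python. This is a minimal sketch, not a production monitoring setup: the `CallRecord` schema, the `summarize` helper, the token prices, and the SLO and budget thresholds are all hypothetical values chosen for illustration. It uses the nearest-rank definition of a percentile, and it treats "number of distinct prompt template hashes" as a crude proxy for prompt drift.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One traced LLM call (hypothetical schema, for illustration only)."""
    trace_id: str
    prompt_template: str
    latency_ms: float
    prompt_tokens: int
    completion_tokens: int


def percentile(values, q):
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def template_fingerprint(template):
    """Short hash of a prompt template; a new fingerprint means the prompt changed."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def summarize(records, price_in_per_1k, price_out_per_1k):
    """Aggregate latency SLIs, token counts, cost, and template versions."""
    latencies = [r.latency_ms for r in records]
    tokens_in = sum(r.prompt_tokens for r in records)
    tokens_out = sum(r.completion_tokens for r in records)
    cost = tokens_in / 1000 * price_in_per_1k + tokens_out / 1000 * price_out_per_1k
    fingerprints = {template_fingerprint(r.prompt_template) for r in records}
    return {
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "prompt_tokens": tokens_in,
        "completion_tokens": tokens_out,
        "cost_usd": cost,
        "template_versions": len(fingerprints),
    }


# Synthetic data: 100 calls with latencies 10, 20, ..., 1000 ms.
# The last 10 calls use a modified template to simulate a prompt change.
records = [
    CallRecord(
        trace_id=f"t{i}",
        prompt_template="Summarize: {text}" if i < 90 else "Summarize briefly: {text}",
        latency_ms=10.0 * (i + 1),
        prompt_tokens=200,
        completion_tokens=50,
    )
    for i in range(100)
]

# Assumed prices (USD per 1,000 tokens) and targets; replace with your own.
summary = summarize(records, price_in_per_1k=0.5, price_out_per_1k=1.5)
SLO_P95_MS = 1000.0
BUDGET_USD = 20.0

print(summary)
print("p95 SLO met:", summary["p95_ms"] <= SLO_P95_MS)
print("within budget:", summary["cost_usd"] <= BUDGET_USD)
```

On this synthetic data the p95 latency is 950 ms, the p99 latency is 990 ms, and the total cost is 17.50 USD, so both checks pass. In a real system the records would come from your tracing backend rather than a list comprehension, and drift detection would usually look at more than template hashes.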

## Guided Exercises
The following exercises are structured to help you learn by doing. Each has **starter code**, **hints**, and **checks**.

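```python
# Exercise: Wire up MLflow tracking for a small experiment
# NOTE: If MLflow isn't installed in this environment, run `pip install mlflow` locally.
import random
import time

try:
    import mlflow
except ImportError as e:
    # Only a missing package is reported here; other errors should surface normally.
    mlflow = None
    print("Note: install mlflow in your local env to run this cell. Error:", e)

if mlflow is not None:
    mlflow.set_experiment("demo")
    with mlflow.start_run():
        # Sample a learning rate log-uniformly between 1e-4 and 1e-2.
        lr = 10 ** random.uniform(-4, -2)
        epochs = 5
        mlflow.log_params({"lr": lr, "epochs": epochs})
        for epoch in range(epochs):
            # Simulated validation loss: decays as 1/(epoch+1) plus small noise.
            val_loss = 1.0 / (epoch + 1) + random.random() * 0.01
            mlflow.log_metric("val_loss", val_loss, step=epoch)
            time.sleep(0.05)
    print("Logged run. Open MLflow UI to inspect.")
```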

## Project Work
- This week connects to: `syllabus/phase-06-systems-mlops.md`
- Implement the **Build** task described in the project README. Tie your notebook experiments into that code (e.g., import your module or save artifacts for the project).

### Deliverable
- A short write-up (5–10 bullets) on **what worked, what didn’t, and what you’ll try next**.

## Self-Assessment Checklist
- [ ] I can explain the key concepts of **Observability for LLM Systems** in my own words.
- [ ] I completed the guided exercises and validated outputs.
- [ ] I produced a small artifact (code, plot, or report) and linked it to the project.
- [ ] I captured 3–5 learnings and 2 next steps.

---
**Tip:** Keep each week to ~10 hours: ~3h study, ~3h coding, ~3h project, ~1h reflection.