# Week 8 — Part 03: Preparing for Level 2 (what changes)

**Estimated time:** 30–45 minutes

## What success looks like (end of Part 03)

- You can describe at least 3 ways Level 2 differs from Level 1.
- You can name at least 2 new failure surfaces (e.g., retrieval quality, prompt injection).
- You write a short readiness checklist artifact under `output/`.

### Checkpoint

After running this notebook, you should have:

- `output/LEVEL2_SELF_CHECK.md`

## Learning Objectives

- Identify the shift from scripts to systems in Level 2
- Understand new failure surfaces (retrieval, evaluation, agents)
- Capture practical mindset shifts for Level 2 work

## Overview

Level 1 is mostly:

- a single-project pipeline
- offline and script-based
- focused on reproducibility and reliability basics

Level 2 shifts toward **systems thinking**:

- retrieval (RAG)
- evaluation loops
- multi-step agent workflows
- knowledge bases
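
To make the first item concrete, here is a toy retrieval step. It is only a sketch: real RAG systems typically rank chunks by embedding similarity, whereas this stand-in ranks by word overlap so that it runs without extra dependencies. The names `DOCS`, `tokenize`, and `retrieve`, the sample documents, and the scoring rule are all illustrative assumptions, not part of the course code.

```python
from typing import List, Set

# Hypothetical mini knowledge base; a real system would store many chunks.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Shipping is free for orders over 50 dollars.",
    "Support is available by email on weekdays.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k docs that share the most words with the query.

    Word overlap is a stand-in for embedding similarity (an assumption
    made to keep the sketch dependency-free).
    """
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:k]


top = retrieve("How long do refunds take?", DOCS)
print(top)
```

If a step like this returns the wrong chunk, the generator receives the wrong context, so the final answer can be wrong even when the prompt itself is fine.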

---

## Underlying theory: Level 2 adds feedback loops and new failure surfaces

In Level 1, many workflows are “run once and inspect outputs”.

In Level 2, you build systems with feedback loops:

- retrieval quality affects generation quality
- evaluation metrics guide iteration
- multi-step workflows introduce compounding failure probability
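
The last point can be made concrete with a little arithmetic. As a simplifying assumption (not a claim about any real system), suppose each step succeeds independently with the same probability; the whole chain then succeeds with that probability raised to the number of steps. The helper name `chain_success_probability` is hypothetical.

```python
def chain_success_probability(step_success: float, n_steps: int) -> float:
    """Probability that all n_steps succeed, assuming independent steps."""
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return step_success ** n_steps


for n in (1, 3, 5, 10):
    p = chain_success_probability(0.95, n)
    print(f"{n:>2} steps at 95% each -> {p:.1%} end-to-end")
```

Under these assumptions, five steps that each succeed 95% of the time succeed end to end only about 77% of the time, and ten steps drop to about 60%.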

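Practical implications:

- you need observability (traces/logs) to debug why the system gave a particular answer
- you need eval sets to catch “prompt overfitting”, where a prompt is tuned to a few hand-checked examples and then fails on new inputs
- you need trust boundaries to resist prompt injection: external data (retrieved documents, web pages, user uploads) is treated as content, not as instructions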
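
One lightweight way to get the observability described above is to record what each step received and returned. The sketch below is only an illustration: `run_traced`, the `TRACE` list, and the two stand-in steps are hypothetical names, and a real project would more likely use a logging or tracing library.

```python
import json
import time
from typing import Any, Callable, Dict, List

# In-memory trace; each entry describes one step of one run.
TRACE: List[Dict[str, Any]] = []


def run_traced(name: str, fn: Callable[[Any], Any], value: Any) -> Any:
    """Call fn(value) and append the step's input, output, and duration to TRACE."""
    start = time.perf_counter()
    result = fn(value)
    TRACE.append(
        {
            "step": name,
            "input": value,
            "output": result,
            "seconds": round(time.perf_counter() - start, 6),
        }
    )
    return result


# Stand-in steps (hypothetical); real ones would call a retriever and a model.
def fake_retrieve(question: str) -> str:
    return "Refunds are processed within 5 business days."


def fake_generate(context: str) -> str:
    return f"Based on our policy: {context}"


context = run_traced("retrieve", fake_retrieve, "How long do refunds take?")
answer = run_traced("generate", fake_generate, context)

# One JSON object per line keeps the trace easy to search or reload later.
for entry in TRACE:
    print(json.dumps(entry))
```

With a trace like this, a bad answer can be attributed to a specific step: either the retrieved context was wrong, or the context was right and the generation step misused it.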

## Practical mindset shifts

- From “one script works” → “the system is observable and testable”.
- From “prompting” → “prompt + retrieval + evaluation”.
- From “manual checking” → “repeatable eval sets”.
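
The last shift, from manual checking to repeatable eval sets, can be sketched in a few lines. Everything here is illustrative: `EVAL_SET`, `run_eval`, the substring-match metric, and the `toy_system` stand-in are assumptions chosen to keep the example small, not a recommended metric.

```python
from typing import Callable, Dict, List

# Hypothetical eval set: each case pairs a question with text the answer must contain.
EVAL_SET: List[Dict[str, str]] = [
    {"question": "How long do refunds take?", "must_contain": "5 business days"},
    {"question": "Is shipping free?", "must_contain": "over 50 dollars"},
]


def run_eval(system: Callable[[str], str], cases: List[Dict[str, str]]) -> float:
    """Return the fraction of cases whose answer contains the expected text."""
    if not cases:
        raise ValueError("eval set is empty")
    passed = 0
    for case in cases:
        answer = system(case["question"])
        if case["must_contain"].lower() in answer.lower():
            passed += 1
    return passed / len(cases)


def toy_system(question: str) -> str:
    """Stand-in for the real pipeline; it only knows about refunds."""
    if "refund" in question.lower():
        return "Refunds are processed within 5 business days."
    return "I am not sure."


score = run_eval(toy_system, EVAL_SET)
print(f"pass rate: {score:.0%}")
```

Rerunning the same set after every prompt or retrieval change shows whether the change helped overall or only fixed the example you happened to be looking at.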

```python
from pathlib import Path
from typing import List


OUTPUT_DIR = Path("output")
OUTPUT_DIR.mkdir(exist_ok=True)


def level2_self_check_todo() -> List[str]:
    # TODO: add your own readiness checklist.
    return [
        "<todo: I can explain what RAG is and why it helps>",
        "<todo: I know how to build a small evaluation set>",
        "<todo: I understand prompt injection risk in retrieval systems>",
    ]


items = level2_self_check_todo()

out_path = OUTPUT_DIR / "LEVEL2_SELF_CHECK.md"
lines = ["# Level 2 Self-Check", ""] + ["- " + x for x in items]
out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

print("Level 2 self-check:")
for item in items:
    print("-", item)
print("wrote:", out_path)
```

## References

- RAG overview: https://www.pinecone.io/learn/retrieval-augmented-generation/

## Self-check

- Can you explain how retrieval quality affects generation quality?
- Do you have a plan for eval sets and metrics?
- Do you know how to handle prompt injection risks?

## Appendix: Solutions (peek only after trying)

Reference implementation for `level2_self_check_todo`.

```python
# Reuses List and OUTPUT_DIR from the earlier cell.
def level2_self_check_todo() -> List[str]:
    return [
        "I can explain what RAG is and why it helps.",
        "I can describe a minimal chunking + embedding + retrieval pipeline.",
        "I know how to build a small evaluation set and choose at least one metric.",
        "I understand prompt injection risk and can name at least one mitigation.",
        "I know why observability (logs/traces) matters for multi-step systems.",
    ]


solution_path = OUTPUT_DIR / "LEVEL2_SELF_CHECK_solution.md"
items = level2_self_check_todo()
lines = ["# Level 2 Self-Check", ""] + ["- " + x for x in items]
solution_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
print("wrote:", solution_path)
```