# Level 2 - Week 4 - 01 Context Assembly and Grounding

**Estimated time:** 60-90 minutes

## Learning Objectives

- Assemble context blocks from hits
- Control context length
- Keep doc_id and chunk_id visible


## Overview

In RAG, you are controlling *what the model is allowed to use*.

Your goal:

- the model answers from retrieved context
- it does not invent unsupported claims

## Grounding theory (what you are doing mathematically)

### Grounding as constrained generation

In an ideal grounded system, the model's answer should be a function of:

- the user question $q$
- the retrieved evidence $E = \{e_1, \ldots, e_k\}$

Conceptually:

$$
\text{answer} \approx g(q, E)
$$

Practical implication:

- if $E$ is empty or irrelevant, the system should not guess
- refusal/clarification is a decision based on retrieval signals

### Context budget constraint

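Even before the model starts answering, you spend tokens on structure. The total token count $C$ of a request, which must fit within the model's context window, is roughly: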

$$
C \approx T_{\text{system}} + T_{\text{context}} + T_{\text{question}} + T_{\text{answer}}
$$

So your context packing should keep $T_{\text{context}}$ high-signal:

- keep metadata small but stable (`chunk_id`, `doc_id`, `url`)
- truncate chunk text if needed, but keep enough to preserve key sentences
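
The budget equation can be turned into a quick calculation. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `context_char_budget`, the example token counts, and the ratio of 4 characters per token are all hypothetical. Measure with your model's tokenizer when precision matters.

```python
def context_char_budget(
    window_tokens: int,
    system_tokens: int,
    question_tokens: int,
    answer_tokens: int,
    chars_per_token: float = 4.0,  # assumption: rough average, not a measured value
) -> int:
    """Return an approximate character budget for the CONTEXT block."""
    # Rearranged budget: T_context = C - T_system - T_question - T_answer
    context_tokens = window_tokens - system_tokens - question_tokens - answer_tokens
    # Clamp at zero so an over-committed budget never goes negative.
    return max(0, int(context_tokens * chars_per_token))


# Hypothetical numbers: a 4096-token window, 200 tokens of rules,
# 50 for the question, and 500 reserved for the answer.
print(context_char_budget(4096, 200, 50, 500))
```

The result could be passed as `max_chars` when packing context, which is why a character cap works as a proxy for a token cap.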

## Practice Steps

- Assemble a `CONTEXT` block from ranked hits.
- Keep ordering stable (rank order).
- Enforce a max length (chars as a proxy for tokens).

### Sample code

Structured context assembly with metadata.


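```python
def assemble_context(hits: list[dict], max_chars: int = 800, per_chunk_max_chars: int = 300) -> str:
    """Pack ranked hits into a CONTEXT block, stopping before max_chars is exceeded."""
    parts: list[str] = ["CONTEXT:"]
    total = len(parts[0])

    # Hits are expected in rank order; numbering them keeps that order visible.
    for idx, h in enumerate(hits, start=1):
        doc_id = h.get("doc_id", "")
        chunk_id = h.get("chunk_id", "")
        url = h.get("url", "")
        text = h.get("text", "") or ""

        # Truncate long chunks; the "..." marker adds 3 chars beyond the per-chunk cap.
        if per_chunk_max_chars > 0 and len(text) > per_chunk_max_chars:
            text = text[:per_chunk_max_chars] + "..."

        # Keep metadata small but stable so citations can reference chunk_id.
        meta = f"doc_id={doc_id} chunk_id={chunk_id}"
        if url:
            meta = meta + f" url={url}"

        entry = f"[{idx}] {meta} text=\"{text}\""
        # The +1 accounts for the newline that joins this entry to the previous part.
        if total + 1 + len(entry) > max_chars:
            break  # Stop at the first entry that does not fit, so no rank is skipped.

        parts.append(entry)
        total += 1 + len(entry)

    return "\n".join(parts)
```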

### Exercise 1: Assemble a context block

Provide a list of ranked hits and inspect the packed `CONTEXT` output.

Verify:

- `chunk_id` values are preserved (for citations)
- order is stable (rank order)
- output respects your `max_chars` budget

```python
hits = [
    {"doc_id": "fastapi", "chunk_id": "fastapi#00001", "url": "https://fastapi.tiangolo.com/", "text": "FastAPI is a modern, fast web framework for building APIs with Python."},
    {"doc_id": "openapi", "chunk_id": "openapi#00002", "url": "https://spec.openapis.org/oas/latest.html", "text": "OpenAPI is a specification for describing REST APIs in a machine-readable format."},
]

print(assemble_context(hits, max_chars=260, per_chunk_max_chars=80))
```
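
The three checks listed under "Verify" can also be run mechanically instead of by eye. The sketch below is one possible approach, assuming each entry line has the shape `[n] doc_id=... chunk_id=... text="..."` and that ids contain no spaces. The helper names `extract_chunk_ids` and `check_context`, and the sample string, are hypothetical.

```python
import re


def extract_chunk_ids(context_block: str) -> list[str]:
    """Return chunk_id values in the order they appear in a CONTEXT block."""
    # Assumption: every entry starts with "[n] doc_id=<id> chunk_id=<id>".
    pattern = r"^\[\d+\] doc_id=\S* chunk_id=(\S*)"
    return re.findall(pattern, context_block, flags=re.MULTILINE)


def check_context(context_block: str, ranked_chunk_ids: list[str], max_chars: int) -> dict[str, bool]:
    """Run the three Exercise 1 checks against a packed CONTEXT block."""
    packed = extract_chunk_ids(context_block)
    return {
        # Every packed id must be one of the retrieved ids, character for character.
        "ids_preserved": all(cid in ranked_chunk_ids for cid in packed),
        # Packed ids must be a prefix of the ranked list (no reordering, no gaps).
        "rank_order": packed == ranked_chunk_ids[: len(packed)],
        # The whole block, header included, must fit the budget.
        "within_budget": len(context_block) <= max_chars,
    }


# Hand-written sample in the assumed entry shape.
sample_context = (
    "CONTEXT:\n"
    '[1] doc_id=fastapi chunk_id=fastapi#00001 text="FastAPI is a web framework."\n'
    '[2] doc_id=openapi chunk_id=openapi#00002 text="OpenAPI describes REST APIs."'
)
print(check_context(sample_context, ["fastapi#00001", "openapi#00002"], max_chars=260))
```

The prefix comparison for `rank_order` relies on the packer stopping at the first entry that does not fit; a packer that skips oversized entries and continues would need a subsequence check instead.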

## Self-check

- Are `chunk_id` values preserved in the context?
- Is ordering stable (rank order)?
- Is the context length capped?
- Do you log the final set of `chunk_id`s passed to the model (not just the retrieved set)?

## Practical usage: what to log for grounding failures

To debug hallucinations and citation issues, log:

- the search query
- retrieved chunk ids + scores
- the exact chunk ids passed into the prompt (after any truncation)
- the exact prompt (or a hash + saved prompt artifact)

Logging the final prompt (or an artifact you can reproduce) is often the fastest way to debug grounding failures.
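
A log record covering the four items above might look like the following. This is a sketch, not a prescribed schema: the function name `build_grounding_log`, the field names, and the presence of a `score` key on each hit are assumptions for illustration.

```python
import hashlib
import json


def build_grounding_log(query: str, retrieved: list[dict], passed_chunk_ids: list[str], prompt: str) -> dict:
    """Build a JSON-serializable record of what was retrieved and what reached the prompt."""
    passed = set(passed_chunk_ids)
    retrieved_ids = [h.get("chunk_id", "") for h in retrieved]
    return {
        "query": query,
        # Assumption: each hit carries a "score"; None is logged when it is missing.
        "retrieved": [{"chunk_id": h.get("chunk_id", ""), "score": h.get("score")} for h in retrieved],
        "passed_chunk_ids": list(passed_chunk_ids),
        # Chunks that were retrieved but cut by the length budget.
        "dropped_chunk_ids": [cid for cid in retrieved_ids if cid not in passed],
        # A hash lets you compare prompts across runs without storing them inline.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }


# Hypothetical example values.
record = build_grounding_log(
    query="What is FastAPI?",
    retrieved=[{"chunk_id": "a#1", "score": 0.82}, {"chunk_id": "b#2", "score": 0.41}],
    passed_chunk_ids=["a#1"],
    prompt="CONTEXT:\n[1] doc_id=a chunk_id=a#1 text=\"...\"\n\nQuestion: What is FastAPI?\n",
)
print(json.dumps(record, indent=2))
```

Recording `dropped_chunk_ids` separately makes it easy to tell a retrieval miss (the right chunk was never retrieved) from a packing miss (it was retrieved but cut).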

### Exercise 2: Record a prompt artifact shape

Write a minimal function that produces a prompt string with:

- short rules
- a CONTEXT block
- the question

Then print it (or compute a hash) so you can compare prompt changes across runs.

Note: a runnable reference implementation is provided in a later code cell.

```python
def format_citations(hits: list[dict], snippet_chars: int = 200) -> list[dict]:
    out: list[dict] = []
    for h in hits:
        text = h.get("text", "") or ""
        out.append(
            {
                "doc_id": h.get("doc_id", ""),
                "chunk_id": h.get("chunk_id", ""),
                "snippet": text[:snippet_chars],
            }
        )
    return out


print(format_citations(hits))
```

Keep citations mechanically checkable:

- preserve `chunk_id` exactly
- copy the snippet from the chunk text (verbatim, or a substring of it)
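
"Mechanically checkable" means a program can confirm both properties without reading the answer. A minimal sketch, assuming citations have the `chunk_id` and `snippet` keys used in this lesson; the function name `verify_citations` and the sample data are hypothetical:

```python
def verify_citations(citations: list[dict], hits: list[dict]) -> list[dict]:
    """Check each citation against the chunks that were actually available."""
    text_by_chunk = {h.get("chunk_id", ""): (h.get("text", "") or "") for h in hits}
    results: list[dict] = []
    for c in citations:
        chunk_id = c.get("chunk_id", "")
        snippet = c.get("snippet", "") or ""
        known = chunk_id in text_by_chunk
        results.append(
            {
                "chunk_id": chunk_id,
                # The id must match a known chunk exactly (no normalization).
                "chunk_id_known": known,
                # The snippet must be a non-empty substring of that chunk's text.
                "snippet_in_chunk": known and bool(snippet) and snippet in text_by_chunk[chunk_id],
            }
        )
    return results


# Hypothetical sample data.
sample_hits = [{"chunk_id": "fastapi#00001", "text": "FastAPI is a modern, fast web framework."}]
sample_citations = [
    {"chunk_id": "fastapi#00001", "snippet": "modern, fast web framework"},  # valid
    {"chunk_id": "fastapi#00001", "snippet": "the fastest framework ever"},  # not in the text
    {"chunk_id": "fastapi#99999", "snippet": "FastAPI"},                     # unknown id
]
for row in verify_citations(sample_citations, sample_hits):
    print(row)
```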

```python
def decide_mode_minimal(hits: list[dict]) -> str:
    return "clarify" if not hits else "answer"


print(decide_mode_minimal(hits))
print(decide_mode_minimal([]))
```

Next step (Week 4 Part 03): replace `decide_mode_minimal` with a score-based threshold rule, and calibrate the threshold on labeled in-KB vs. out-of-KB questions.
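
As a preview of what a score-based rule could look like, here is a rough sketch. It is not the Part 03 implementation: the function name `decide_mode_by_score`, the `score` key on each hit, the assumption that higher scores mean better matches, and the placeholder threshold of `0.5` are all hypothetical, and the threshold is uncalibrated.

```python
def decide_mode_by_score(hits: list[dict], threshold: float = 0.5) -> str:
    """Answer only when the best retrieval score clears the threshold."""
    if not hits:
        return "clarify"
    # Assumption: each hit has a numeric "score" where higher means more relevant.
    top_score = max(float(h.get("score", 0.0)) for h in hits)
    return "answer" if top_score >= threshold else "clarify"


# Hypothetical scores for illustration only.
print(decide_mode_by_score([{"chunk_id": "a#1", "score": 0.82}]))
print(decide_mode_by_score([{"chunk_id": "b#2", "score": 0.12}]))
print(decide_mode_by_score([]))
```

Unlike the minimal rule, this version can decline to answer even when retrieval returns hits, which matters because most retrievers return their nearest neighbors regardless of relevance.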

```python
# Student fill-in
#
# - Change max_chars and observe which chunks get dropped.
# - Add a third hit with long text and confirm per-chunk truncation works.
# - Decide which fields you want to include for citations (e.g., url) and update the context/citation shape accordingly.
```

```python
import hashlib


def build_prompt(question: str, context_block: str) -> str:
    return (
        "You are a grounded assistant.\n\n"
        "Rules:\n"
        "- Use only the CONTEXT.\n"
        "- If the CONTEXT is insufficient, respond with mode=clarify or mode=refuse.\n"
        "- For every factual claim, include a citation referencing one of the chunk_id values.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}\n"
    )


context_block = assemble_context(hits, max_chars=400, per_chunk_max_chars=120)
prompt = build_prompt("What is FastAPI?", context_block)
print(prompt)
print("prompt_sha256:", hashlib.sha256(prompt.encode("utf-8")).hexdigest())
```