# Level 2 - Week 4 - 04 Chat Endpoint Contract

**Estimated time:** 60-90 minutes

## Learning Objectives

- Define request and response schemas for `/chat`
- Include citations and a `mode` field in every response
- Keep `/chat` thin and testable


## Overview

Your `/chat` endpoint should be a thin layer that does only the following, in order:

- validate the request
- call retrieval (the same logic as `/search`)
- assemble the context
- call the model
- validate the output (citations and `mode` rules)
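The five steps above can be sketched as a single handler that receives retrieval and generation as injected callables. This is a minimal sketch, not a reference implementation: `handle_chat`, the hit shape (`chunk_id`, `text`), and the prompt layout are all assumptions made for illustration.

```python
from typing import Callable

# Assumed hit shape (hypothetical): {"doc_id": str, "chunk_id": str, "text": str}
Hit = dict


def handle_chat(
    request: dict,
    retrieve: Callable[[str, int, dict], list[Hit]],
    generate: Callable[[str], dict],
) -> dict:
    # 1. Validate the request.
    question = str(request.get("question", "")).strip()
    if not question:
        raise ValueError("question must be non-empty")
    top_k = int(request.get("top_k", 5))
    filters = request.get("filters") or {}

    # 2. Call retrieval (the same callable that would back /search).
    hits = retrieve(question, top_k, filters)

    # 3. Assemble the context (hypothetical prompt layout).
    context = "\n\n".join(f"[{h['chunk_id']}] {h['text']}" for h in hits)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"

    # 4. Call the model; it is assumed to return a response-shaped dict.
    output = generate(prompt)

    # 5. Validate the output: mode is known, citations were retrieved.
    if output.get("mode") not in {"answer", "clarify", "refuse"}:
        raise ValueError(f"invalid mode: {output.get('mode')}")
    retrieved_ids = {h["chunk_id"] for h in hits}
    for citation in output.get("citations", []):
        if citation.get("chunk_id") not in retrieved_ids:
            raise ValueError(f"citation not retrieved: {citation.get('chunk_id')}")
    return output
```

Because both dependencies are parameters, a test can pass in fakes and exercise the handler without a vector store or a model.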

## Contract theory (why a “thin /chat” works)

### Separation of concerns

Conceptually, your system has three distinct functions:

- retrieval: $R(q, \theta_r) \rightarrow E$ (get evidence)
- prompting/packing: $P(q, E, \theta_p) \rightarrow \text{prompt}$ (format evidence)
- generation: $G(\text{prompt}, \theta_g) \rightarrow y$ (produce answer)

Keeping `/chat` thin means:

- retrieval stays testable without generation
- failures have a single obvious location (retrieval vs packing vs generation)
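One way to make the failure location explicit is to wrap each of the three functions in a helper that tags any exception with its stage. The sketch below assumes hypothetical names (`StageError`, `run_stage`, `answer_question`); the three stage callables stand in for $R$, $P$, and $G$.

```python
from typing import Any, Callable


class StageError(RuntimeError):
    """Wraps an exception with the name of the stage that raised it."""

    def __init__(self, stage: str, cause: Exception):
        super().__init__(f"{stage} failed: {cause}")
        self.stage = stage


def run_stage(stage: str, fn: Callable[..., Any], *args: Any) -> Any:
    # Run one stage and re-raise any failure tagged with the stage name.
    try:
        return fn(*args)
    except Exception as exc:
        raise StageError(stage, exc) from exc


def answer_question(
    q: str,
    retrieve: Callable[[str], list],
    pack: Callable[[str, list], str],
    generate: Callable[[str], str],
) -> str:
    evidence = run_stage("retrieval", retrieve, q)    # R(q) -> E
    prompt = run_stage("packing", pack, q, evidence)  # P(q, E) -> prompt
    return run_stage("generation", generate, prompt)  # G(prompt) -> y
```

Since `retrieve` is an ordinary callable, it can also be called and asserted on directly, with no generation step involved.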

### Invariants you want the contract to enforce

- **Traceability**: every citation points to a retrieved `chunk_id`.
- **Determinism of modes**: `mode` is driven by retrieval signals (empty hits, low score, conflicts), not by how the model happens to phrase its answer.
- **Observability**: you can log request inputs and retrieved outputs.
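To illustrate the determinism invariant, here is a minimal sketch of a `mode` decision that looks only at retrieval signals. The function name `decide_mode`, the `score` field, the default threshold of `0.35`, and the mapping of each signal to a mode are assumptions for illustration; choose the mapping that fits your own policy.

```python
def decide_mode(
    hits: list[dict],
    min_score: float = 0.35,  # hypothetical threshold; tune for your metric
    has_conflict: bool = False,
) -> str:
    # No evidence at all: assumed policy is to refuse.
    if not hits:
        return "refuse"
    # Weak or conflicting evidence: assumed policy is to ask for clarification.
    best_score = max(h["score"] for h in hits)
    if best_score < min_score or has_conflict:
        return "clarify"
    # Otherwise there is enough evidence to answer.
    return "answer"
```

The same hits and threshold always produce the same `mode`, so the decision can be unit-tested without calling a model.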

## Suggested `/chat` request/response

Request:

```json
{"question": "...", "top_k": 5, "filters": {}}
```

Response:

```json
{
  "answer": "...",
  "citations": [{"doc_id": "...", "chunk_id": "...", "snippet": "..."}],
  "mode": "answer"
}
```

Where `mode` is one of: `answer`, `clarify`, `refuse`.
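As a rough illustration of the response shape, the sketch below checks a payload against the fields shown above using only the standard library. The helper name `check_response_shape` and the choice to return a list of problems (rather than raise) are assumptions.

```python
from typing import Literal, get_args

Mode = Literal["answer", "clarify", "refuse"]
MODES = frozenset(get_args(Mode))
CITATION_KEYS = {"doc_id", "chunk_id", "snippet"}


def check_response_shape(payload: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the shape is OK."""
    problems: list[str] = []
    if payload.get("mode") not in MODES:
        problems.append(f"mode must be one of {sorted(MODES)}")
    if "answer" not in payload:
        problems.append("answer is missing")
    citations = payload.get("citations")
    if not isinstance(citations, list):
        problems.append("citations must be a list")
        return problems
    for i, citation in enumerate(citations):
        # A non-dict citation is treated as missing every required key.
        present = set(citation) if isinstance(citation, dict) else set()
        missing = CITATION_KEYS - present
        if missing:
            problems.append(f"citations[{i}] is missing {sorted(missing)}")
    return problems
```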

## Failure modes (and what the contract prevents)

- Silent hallucination: returning `mode=answer` without valid citations.
- Un-debuggable behavior: retrieval and chat merged so you can’t inspect $E$.
- Threshold drift: changing the embedding model or similarity metric invalidates old score thresholds, so record the thresholds in run artifacts.
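For the threshold-drift case, one option is to store the threshold together with the settings it was tuned for, then compare on load. This is a sketch under assumptions: `RetrievalConfig`, its field names, and the JSON artifact layout are hypothetical, not a prescribed format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    embedding_model: str  # hypothetical identifier for the embedding model
    metric: str           # e.g. "cosine"
    min_score: float      # threshold tuned for this model and metric


def to_artifact(config: RetrievalConfig) -> str:
    # Serialize the config so it can be saved alongside a run's outputs.
    return json.dumps(asdict(config), sort_keys=True)


def thresholds_still_valid(recorded: dict, current: RetrievalConfig) -> bool:
    # A recorded threshold is only comparable if model and metric are unchanged.
    return (
        recorded.get("embedding_model") == current.embedding_model
        and recorded.get("metric") == current.metric
    )
```

If the check fails, the old `min_score` should be re-tuned rather than reused.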

## Practice Steps

- Create `ChatRequest` and `ChatResponse` models.
- Add a helper to build a response payload.
- Ensure every response includes `mode` and structured citations.

### Sample code

Chat request and response schema.


```python
from typing import List, Optional

from pydantic import BaseModel, Field


class ChatRequest(BaseModel):
    question: str = Field(min_length=1)
    top_k: int = Field(default=5, ge=1, le=50)
    filters: dict | None = None


class Citation(BaseModel):
    doc_id: str
    chunk_id: str
    snippet: str


class ChatResponse(BaseModel):
    answer: Optional[str] = None
    citations: List[Citation] = Field(default_factory=list)
    mode: str
```

### Student fill-in

Build a response payload based on mode and citations.

Then decide what you want to validate before returning the response:

- `mode` is one of `answer|clarify|refuse`
- citations reference retrieved `chunk_id` values for this request
- if `mode=answer`, you should usually require at least one valid citation

```python
ALLOWED_MODES = {"answer", "clarify", "refuse"}


def build_chat_response(answer: str | None, citations: list[dict], mode: str) -> dict:
    if mode not in ALLOWED_MODES:
        raise ValueError(f"invalid mode: {mode}")
    return {"answer": answer, "citations": citations, "mode": mode}


def citations_reference_retrieved(citations: list[dict], retrieved_chunk_ids: set[str]) -> bool:
    for c in citations:
        chunk_id = c.get("chunk_id")
        if not chunk_id or chunk_id not in retrieved_chunk_ids:
            return False
    return True


print(build_chat_response(answer=None, citations=[], mode="clarify"))
```
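One possible way to act on the validation rules, shown only as a sketch: drop citations that were not retrieved for this request, and downgrade an `answer` with no remaining citations. The function name `enforce_citation_policy` and the choice to downgrade to `refuse` (rather than raise or return `clarify`) are assumptions; your own fill-in may reasonably differ.

```python
def enforce_citation_policy(payload: dict, retrieved_chunk_ids: set[str]) -> dict:
    # Keep only citations whose chunk_id was retrieved for this request.
    valid = [
        c for c in payload.get("citations", [])
        if c.get("chunk_id") in retrieved_chunk_ids
    ]
    # Assumed policy: an answer with no valid citation is downgraded to refuse.
    if payload.get("mode") == "answer" and not valid:
        return {"answer": None, "citations": [], "mode": "refuse"}
    # Otherwise return a copy of the payload with the filtered citations.
    return {**payload, "citations": valid}
```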

## Self-check

- Is `mode` always one of `answer`, `clarify`, or `refuse`?
- Does every citation point to a `chunk_id` retrieved for this request?
