# Level 2 - Week 9 - 01 Observability and Tracing

**Estimated time:** 60-90 minutes

## Learning Objectives

- Define a minimal set of structured log fields
- Identify the spans on a request's critical path
- Track latency percentiles and error rates


## Overview

- Logs + request IDs answer: “what happened?”
- Traces answer: “where did time go?”

Instrument:

- retrieval queries
- model calls
- agent steps
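
One way to instrument these call sites is a small timing helper that emits one structured log line per span. The sketch below is illustrative rather than a tracing library: `span`, `emit`, `EMITTED`, and the field names are hypothetical, and a production system would more likely rely on a tracing SDK.

```python
import json
import time
import uuid
from contextlib import contextmanager

# Hypothetical in-memory sink so the example is self-contained;
# a real service would write to its logger or tracing backend.
EMITTED: list[dict] = []


def emit(record: dict) -> None:
    """Store the record and print it as one JSON line."""
    EMITTED.append(record)
    print(json.dumps(record))


@contextmanager
def span(name: str, request_id: str):
    """Time the wrapped block and emit its duration, even if it raises."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = type(exc).__name__  # record the failure, then let it propagate
        raise
    finally:
        emit({
            "request_id": request_id,
            "span": name,
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            "error": error,
        })


request_id = str(uuid.uuid4())
with span("retrieval", request_id):
    time.sleep(0.01)  # stand-in for a retrieval query
with span("generation", request_id):
    time.sleep(0.02)  # stand-in for a model call
```

Because every line carries the same `request_id`, the log answers "what happened?" for one request, and the per-span `latency_ms` values answer "where did time go?".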

## Underlying theory: SLIs, SLOs, and why observability exists

### SLIs (what you measure)

An SLI (service level indicator) is a measurable quantity like:

- request latency
- error rate
- availability

### SLOs (what you promise)

An SLO is a target on an SLI, for example:

- 99% of `/chat` requests complete within 2 seconds
- error rate under 1% over a day
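
The two example SLOs above can be checked mechanically once the SLIs are computed. The following is a minimal sketch on made-up data; the `Request` type and both helper functions are hypothetical names introduced for illustration, and the latency SLI counts every request regardless of success, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_s: float
    ok: bool  # False means the request ended in an error


def latency_sli(requests: list[Request], threshold_s: float) -> float:
    """Fraction of requests that completed within threshold_s."""
    if not requests:
        return 1.0  # assumption: no traffic counts as meeting the target
    fast = sum(1 for r in requests if r.latency_s <= threshold_s)
    return fast / len(requests)


def error_rate_sli(requests: list[Request]) -> float:
    """Fraction of requests that failed."""
    if not requests:
        return 0.0
    return sum(1 for r in requests if not r.ok) / len(requests)


# Made-up sample: 200 requests, 3 of them slow, 1 of them failed.
sample = (
    [Request(0.8, True)] * 196
    + [Request(3.5, True)] * 3
    + [Request(0.5, False)] * 1
)

latency_ok = latency_sli(sample, threshold_s=2.0) >= 0.99
errors_ok = error_rate_sli(sample) < 0.01

print("latency SLO met:", latency_ok)
print("error-rate SLO met:", errors_ok)
```

In this sample 197 of 200 requests finish within 2 seconds (98.5%), so the latency SLO is missed even though the error rate (0.5%) is within its target.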

### Why percentiles (p50/p95) matter

Average latency hides tail problems.

- p50 tells you the typical case
- p95 is the latency that 95% of requests stay at or under; if it is high, roughly 1 in 20 requests is noticeably slow

In LLM systems, tail latency is common (network calls + retries + variable model times), so percentiles are more informative than means.
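
A small worked example makes the gap visible. The latencies below are made up, and the p95 here uses the nearest-rank definition, which is only one of several percentile definitions in common use.

```python
import statistics

# Made-up latencies: most requests are fast, a few hit retries.
latencies_ms = [200] * 18 + [4000] * 2

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)

# Nearest-rank p95: the smallest sample with at least 95% of samples at or below it.
ordered = sorted(latencies_ms)
rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) using integer math
p95_ms = ordered[rank - 1]

print("mean_ms:", mean_ms)
print("p50_ms:", median_ms)
print("p95_ms:", p95_ms)
```

The mean comes out at 580 ms, which describes neither the typical request (200 ms) nor the slow ones (4000 ms); p50 and p95 together show both.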

## Practice Steps

- Define a minimal set of required log fields.
- Define core spans for `/chat` and agent loops.
- Add RAG-specific fields (top_k, context size, model name).

### Sample code

Minimal fields and spans.


In [None]:
LOG_FIELDS = ['request_id', 'path', 'status_code', 'latency_ms', 'component']
SPANS = ['ingestion', 'retrieval', 'generation', 'validation']

print(LOG_FIELDS)
print(SPANS)


### Student fill-in

Add RAG-specific fields (for example `top_k`, chunks returned, context length, model name).


In [None]:
LOG_FIELDS = ["request_id", "path", "status_code", "latency_ms", "component"]

RAG_FIELDS = [
    "top_k",
    "n_chunks_returned",
    "context_chars",
    "model_name",
]

print("base_log_fields:", LOG_FIELDS)
print("rag_fields:", RAG_FIELDS)


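def percentile(values: list[float], p: float) -> float:
    """Return the sorted sample nearest to the p-th percentile (no interpolation)."""
    if not values:
        return 0.0  # no samples yet; callers should treat this as "no data"
    xs = sorted(values)
    # Map p in [0, 100] onto an index in [0, len(xs) - 1].
    k = int(round((p / 100.0) * (len(xs) - 1)))
    return float(xs[k])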


latencies_ms = [110, 120, 140, 180, 250, 310, 900]
print("p50_ms:", percentile(latencies_ms, 50))
print("p95_ms:", percentile(latencies_ms, 95))
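
To make the field lists enforceable rather than aspirational, a log helper can refuse to emit a record that lacks a required field. This is a minimal sketch: `build_log_line` is a hypothetical helper, and every value in the sample record is made up.

```python
import json

# Field lists repeated here so the block runs on its own.
LOG_FIELDS = ["request_id", "path", "status_code", "latency_ms", "component"]
RAG_FIELDS = ["top_k", "n_chunks_returned", "context_chars", "model_name"]


def build_log_line(record: dict, required: list[str]) -> str:
    """Return the record as one JSON line, or raise if a required field is missing."""
    missing = [f for f in required if f not in record]
    if missing:
        raise ValueError(f"missing log fields: {missing}")
    return json.dumps(record, sort_keys=True)


# All values below are made up for illustration.
record = {
    "request_id": "req-123",
    "path": "/chat",
    "status_code": 200,
    "latency_ms": 840,
    "component": "retrieval",
    "top_k": 5,
    "n_chunks_returned": 4,
    "context_chars": 3120,
    "model_name": "example-model",
}

line = build_log_line(record, LOG_FIELDS + RAG_FIELDS)
print(line)
```

Logging `top_k` next to `n_chunks_returned` makes it easy to spot requests where retrieval returned fewer chunks than requested.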

## Self-check

- Do logs include `request_id`?
- Are spans defined for the critical path?


### Exercise 1: Reliability targets (timeouts + retries)

Pick target values for:

- ingestion timeout
- retrieval timeout
- generation timeout
- max retries

Keep them conservative (bounded retries, explicit timeouts).

Why targets matter:

- timeouts prevent hanging requests
- bounded retries prevent infinite loops and retry storms

A good target is specific enough that you can code it into configs and logs.
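
As an illustration of how such targets end up in code, here is a minimal sketch of a bounded retry loop. It assumes the wrapped call enforces its own timeout and raises `TimeoutError` when it expires, as many HTTP clients do when given a timeout argument; `call_with_retries` and `flaky_retrieval` are hypothetical names.

```python
import time


def call_with_retries(fn, timeout_s, max_retries, backoff_s=0.05):
    """Call fn(timeout_s) at most max_retries + 1 times, backing off between attempts."""
    last_error = None
    for attempt in range(max_retries + 1):
        try:
            return fn(timeout_s)
        except TimeoutError as exc:  # retry only on timeouts
            last_error = exc
            if attempt < max_retries:
                time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise last_error


# Hypothetical flaky dependency: times out twice, then succeeds.
attempts = {"n": 0}


def flaky_retrieval(timeout_s):
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise TimeoutError(f"no response within {timeout_s}s")
    return ["chunk-a", "chunk-b"]


result = call_with_retries(flaky_retrieval, timeout_s=2, max_retries=2)
print(result, "after", attempts["n"], "attempts")
```

The loop can never run more than `max_retries + 1` times, and the backoff spaces attempts out so that many clients failing at once do not retry in lockstep.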

### Exercise 1a: Fill target values

Adjust the values to match your system and deployment environment.

In [None]:
RELIABILITY_TARGETS = {
    "ingestion_timeout_s": 10,
    "retrieval_timeout_s": 2,
    "generation_timeout_s": 15,
    "max_retries": 2,
}

print(RELIABILITY_TARGETS)

### Exercise 2: Observability spans (critical path)

List the spans you will instrument.

Rule of thumb:

- make `/chat` latency decomposable into: retrieval vs generation vs validation
- log agent loops as step spans so runaway behavior is obvious

In [None]:
SPANS = [
    "http.request",
    "retrieval.search",
    "context.assembly",
    "llm.generate",
    "citations.validate",
]

print(SPANS)
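
Once spans are recorded, the rule of thumb above can be checked by grouping them by name. The sketch below uses made-up durations and assumes the child spans run sequentially inside one `http.request` span; `agent.step` is a hypothetical span name for one iteration of an agent loop, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

# Made-up span records for one /chat request.
records = [
    {"span": "retrieval.search", "latency_ms": 120},
    {"span": "context.assembly", "latency_ms": 15},
    {"span": "agent.step", "latency_ms": 400},  # hypothetical step span
    {"span": "agent.step", "latency_ms": 380},
    {"span": "agent.step", "latency_ms": 410},
    {"span": "citations.validate", "latency_ms": 25},
]
total_ms = 1400  # duration of the enclosing http.request span


def summarize(records, total_ms):
    """Group spans by name: count, total time, and share of the whole request."""
    totals = defaultdict(lambda: {"count": 0, "latency_ms": 0})
    for r in records:
        totals[r["span"]]["count"] += 1
        totals[r["span"]]["latency_ms"] += r["latency_ms"]
    summary = {
        name: {**t, "share": round(t["latency_ms"] / total_ms, 3)}
        for name, t in totals.items()
    }
    # Time not covered by any child span points at missing instrumentation.
    untracked_ms = total_ms - sum(t["latency_ms"] for t in totals.values())
    summary["untracked"] = {
        "count": 0,
        "latency_ms": untracked_ms,
        "share": round(untracked_ms / total_ms, 3),
    }
    return summary


summary = summarize(records, total_ms)
for name, stats in summary.items():
    print(name, stats)
```

A step count that keeps growing for a single request is the signal of a runaway agent loop, and a large `untracked` share means part of the critical path has no span yet.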

## Self-check

- Do logs include `request_id` and `component`?
- Can you break total `/chat` latency down into retrieval, generation, and validation time?
- Are span names stable enough that you can grep for them in logs?