# Week 4 — Part 04: Caching and observability (logging)

**Estimated time:** 75–120 minutes

---

## Pre-study (Level 0)

Level 1 assumes Level 0 is complete. If you need a refresher on production constraints or on debugging and observability habits, see:

- [Level 1 Pre-study index](../PRESTUDY.md)
- [Level 0 — Chapter 5: Resource Monitoring and Containerization](../../level_0/Chapters/5/Chapter5.md)

---

## What success looks like (end of Part 04)

- You can construct a cache key that changes when any relevant request field changes.
- You can demonstrate a cache hit vs cache miss.
- You can emit a minimal request log line with request id, model, latency, and success/failure.

### Checkpoint

After running this notebook, you should be able to:

- show one cache hit log event
- show at least one request log line that includes request id + latency

## Learning Objectives

- Design safe cache keys for pure-ish LLM calls
- Implement an in-memory cache and a simple file cache
- Add minimum viable request logging (request id, latency, success/failure, attempt)

## Overview

Two practical realities of LLM APIs:

- calls can be expensive
- failures are hard to debug without logs

In this lab you will:

- build a safe cache key (include every field that can change output)
- demonstrate cache hit vs miss
- emit a minimal request log line (request_id, latency, success/failure)

If you want the deeper caching theory, use the Level 0 links at the top of the notebook.

## Caching

Cache when:

- the same request repeats
- you are iterating on downstream code

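The cache key must include every field that can change the output:

- model name
- system prompt
- user prompt
- temperature
- any other sampling parameters you send (e.g. `top_p`, `max_tokens`), plus tool definitions and few-shot examples if you use them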
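A quick way to check this rule is to hash a request, change one field at a time, and confirm that every variant gets a different key. This sketch re-declares a small hypothetical stand-in dataclass (`Req`) so it runs on its own; it mirrors the notebook's `LLMRequest` and key function.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Req:  # hypothetical stand-in for the notebook's LLMRequest
    model: str
    system_prompt: str
    user_prompt: str
    temperature: float = 0.0


def key_of(req: Req) -> str:
    # Deterministic serialization (sorted keys), then SHA-256.
    raw = json.dumps(asdict(req), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


base = Req("demo", "You are helpful.", "Hello", 0.0)
variants = {
    "model": replace(base, model="demo-2"),
    "system_prompt": replace(base, system_prompt="You are terse."),
    "user_prompt": replace(base, user_prompt="Hi"),
    "temperature": replace(base, temperature=0.7),
}
for field_name, variant in variants.items():
    # Every single-field change must produce a different key.
    print(field_name, key_of(variant) != key_of(base))
```

If any line prints `False`, that field is missing from the key and two different requests would share one cached answer.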

Common cache pitfalls:

- forgetting system prompt / tool context in the key
- caching when temperature is above 0 (outputs are intentionally stochastic, so replaying one stored sample hides that variation)
- caching errors (you accidentally “remember” a failure)
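The last two pitfalls can both be handled in the cache wrapper itself. A minimal sketch, assuming a plain dict as the store and a zero-argument callable for the model call (`cached_or_call` and `flaky` are hypothetical names): only temperature-0 results are read or written, and because the store happens after the call returns, an exception skips it entirely.

```python
from typing import Callable, Dict


def cached_or_call(cache: Dict[str, str], key: str, temperature: float, call: Callable[[], str]) -> str:
    # Policy: only deterministic settings (temperature == 0) are read from or written to the cache.
    cacheable = temperature == 0.0
    if cacheable and key in cache:
        return cache[key]
    out = call()  # if this raises, the store below is skipped, so failures are never cached
    if cacheable:
        cache[key] = out
    return out


store: Dict[str, str] = {}
calls = {"n": 0}


def flaky() -> str:
    # Fails on the first call only, to simulate a transient network error.
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated network failure")
    return "ok"


try:
    cached_or_call(store, "k", 0.0, flaky)
except TimeoutError:
    pass
print(store)                                   # {}
print(cached_or_call(store, "k", 0.0, flaky))  # ok (now stored)
print(cached_or_call(store, "k", 0.7, flaky))  # ok (recomputed, not read from cache)
print(calls["n"])                              # 3
```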

In [None]:
from __future__ import annotations

import hashlib
import json
import logging
import time
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class LLMRequest:
    model: str
    system_prompt: str
    user_prompt: str
    temperature: float = 0.0


def make_cache_key(req: LLMRequest) -> str:
    # Key must include every field that can change the output.
    raw = json.dumps(asdict(req), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


req = LLMRequest(model="demo", system_prompt="You are helpful.", user_prompt="Hello", temperature=0.0)
print("cache_key=", make_cache_key(req)[:12])
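One subtle gap in this key: dataclass annotations are not enforced, so `LLMRequest(..., temperature=0)` stores the integer `0`, and `json.dumps` serializes it as `0` rather than `0.0`. The two requests mean the same thing but hash to different keys. A minimal sketch, assuming you want numerically equal settings to share a key, is to coerce numeric fields before hashing (`normalized_key` is a hypothetical name that works on a plain dict of fields):

```python
import hashlib
import json
from typing import Any, Dict


def normalized_key(fields: Dict[str, Any]) -> str:
    # Coerce the sampling parameter to float so 0 and 0.0 serialize identically.
    norm = dict(fields)
    norm["temperature"] = float(norm.get("temperature", 0.0))
    raw = json.dumps(norm, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


a = {"model": "demo", "system_prompt": "You are helpful.", "user_prompt": "Hello", "temperature": 0}
b = dict(a, temperature=0.0)
print(json.dumps(a) == json.dumps(b))          # False: "0" vs "0.0"
print(normalized_key(a) == normalized_key(b))  # True
```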

In [None]:
from typing import Dict, Optional


class SimpleMemoryCache:
    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value


cache = SimpleMemoryCache()
key = "k1"
print(cache.get(key))
cache.set(key, "value")
print(cache.get(key))
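`SimpleMemoryCache` grows without limit for the life of the kernel. If that matters, a bounded least-recently-used variant is a small change. This sketch (hypothetical `LRUMemoryCache`, with an arbitrary `max_items`) uses `collections.OrderedDict` and keeps the same `get`/`set` interface:

```python
from collections import OrderedDict
from typing import Optional


class LRUMemoryCache:
    def __init__(self, max_items: int = 128) -> None:
        self.max_items = max_items
        self._store: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def set(self, key: str, value: str) -> None:
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_items:
            self._store.popitem(last=False)  # evict the least recently used entry


lru = LRUMemoryCache(max_items=2)
lru.set("a", "1")
lru.set("b", "2")
lru.get("a")        # "a" becomes most recent
lru.set("c", "3")   # evicts "b"
print(lru.get("b"), lru.get("a"), lru.get("c"))  # None 1 3
```

For caching a pure Python function directly, `functools.lru_cache` (see References) gives the same eviction policy; a hand-rolled class like this is only needed when you want explicit `get`/`set` calls you can log around.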

In [None]:
from typing import Dict, Optional


class SimpleFileCache:
    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)
        if not self.path.exists():
            self.path.write_text("{}", encoding="utf-8")

    def _read(self) -> Dict[str, str]:
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _write(self, data: Dict[str, str]) -> None:
        self.path.write_text(json.dumps(data, ensure_ascii=False, sort_keys=True, indent=2), encoding="utf-8")

    def get(self, key: str) -> Optional[str]:
        data = self._read()
        return data.get(key)

    def set(self, key: str, value: str) -> None:
        data = self._read()
        data[key] = value
        self._write(data)


file_cache = SimpleFileCache(Path("output/cache/llm_cache.json"))
k = "demo"
print(file_cache.get(k))
file_cache.set(k, "hello")
print(file_cache.get(k))
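`SimpleFileCache._write` overwrites the JSON file in place, so a crash or kernel interrupt mid-write can leave a truncated file that `_read` then fails to parse. A common mitigation, sketched here as a hypothetical standalone helper, is to write to a temporary file in the same directory and swap it in with `os.replace`, which the Python docs describe as atomic on POSIX when both paths are on the same filesystem:

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


def atomic_write_json(path: Path, data: Dict[str, str]) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, sort_keys=True, indent=2)
        os.replace(tmp_name, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp_name)  # remove the partial temp file, then re-raise
        raise


target = Path("output/cache/atomic_demo.json")
atomic_write_json(target, {"demo": "hello"})
print(json.loads(target.read_text(encoding="utf-8")))  # {'demo': 'hello'}
```

This protects against torn writes, not against two processes updating the cache concurrently; that would still need a lock or a real key-value store.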

## Logging (minimum viable request log)

A minimal request log should include:

- request id
- model
- latency
- success/failure
- failure location (network vs parsing vs validation)

Two extra fields that help later:

- prompt length (or token estimate)
- retry attempt count
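One way to carry all of these fields on a single grep-able line is to emit one JSON object per request. The sketch below is an assumption about shape, not a standard: the field names and the stage values (`network`, `parsing`, `validation`) simply mirror the list above, `run_request` is a hypothetical helper, and the "network" step is a stand-in that returns a canned response.

```python
import json
import time
import uuid
from typing import Any, Dict, Optional


def log_line(**fields: Any) -> str:
    # One JSON object per line: easy to grep, easy to load into a dataframe later.
    return json.dumps(fields, sort_keys=True)


def run_request(prompt: str, raw_response: str, attempt: int = 0) -> str:
    request_id = uuid.uuid4().hex[:8]
    t0 = time.perf_counter()
    stage = "network"  # advanced after each step succeeds, so it names the step that failed
    error: Optional[str] = None
    try:
        text = raw_response  # stand-in for the network call
        stage = "parsing"
        parsed: Dict[str, Any] = json.loads(text)
        stage = "validation"
        if "answer" not in parsed:
            raise ValueError("missing 'answer' field")
        stage = "done"
    except Exception as e:
        error = type(e).__name__
    return log_line(
        request_id=request_id,
        model="demo",
        latency_s=round(time.perf_counter() - t0, 4),
        ok=error is None,
        failed_stage=None if error is None else stage,
        error_type=error,
        prompt_len=len(prompt),
        attempt=attempt,
    )


print(run_request("Hello", '{"answer": "HELLO"}'))
print(run_request("Hello", "not json"))     # failed_stage: parsing
print(run_request("Hello", '{"other": 1}'))  # failed_stage: validation
```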

In [None]:
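logger = logging.getLogger("demo")
# force=True (Python 3.8+) replaces handlers the notebook kernel may already have
# installed, so the INFO level and this format actually take effect.
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s %(message)s", force=True)


def log_event(level: int, event: str, **fields: object) -> None:
    # The default format string does not print `extra=...` fields, so put them
    # in the message itself as one JSON object.
    logger.log(level, "%s %s", event, json.dumps(fields, sort_keys=True))


def fake_llm_call(text: str) -> str:
    time.sleep(0.05)  # simulate network latency
    return text.upper()


def logged_call(request_id: str, req: LLMRequest) -> str:
    t0 = time.perf_counter()  # monotonic clock, better suited to durations than time.time()
    try:
        out = fake_llm_call(req.user_prompt)
    except Exception as e:
        log_event(
            logging.WARNING,
            "llm_call_failed",
            request_id=request_id,
            model=req.model,
            latency_s=round(time.perf_counter() - t0, 4),
            attempt=0,
            error_type=type(e).__name__,
        )
        raise
    log_event(
        logging.INFO,
        "llm_call_ok",
        request_id=request_id,
        model=req.model,
        latency_s=round(time.perf_counter() - t0, 4),
        prompt_len=len(req.system_prompt) + len(req.user_prompt),
        attempt=0,
    )
    return out


print(logged_call("req_001", req))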
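`logged_call` always logs attempt 0 because it never retries. Once you add retries, the attempt number is what ties the log lines of one request together. A minimal sketch, assuming a simple linear-backoff loop; `call_with_retries` and the `flaky` function are hypothetical, and the fields are written into the message text because the default log format does not print `extra=` values:

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("demo.retry")


def call_with_retries(request_id: str, fn: Callable[[], T], max_attempts: int = 3, backoff_s: float = 0.01) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        t0 = time.perf_counter()
        try:
            out = fn()
        except Exception as e:
            log.warning(
                "llm_call_failed request_id=%s attempt=%d latency_s=%.4f error_type=%s",
                request_id, attempt, time.perf_counter() - t0, type(e).__name__,
            )
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            time.sleep(backoff_s * (attempt + 1))  # simple linear backoff
        else:
            log.info(
                "llm_call_ok request_id=%s attempt=%d latency_s=%.4f",
                request_id, attempt, time.perf_counter() - t0,
            )
            return out
    raise AssertionError("unreachable: the loop always returns or raises")


logging.basicConfig(level=logging.INFO)
state = {"n": 0}


def flaky() -> str:
    # Hypothetical model call that fails twice, then succeeds.
    state["n"] += 1
    if state["n"] < 3:
        raise ConnectionError("simulated")
    return "OK"


print(call_with_retries("req_002", flaky))  # two warning lines (attempt=0, 1), then ok at attempt=2
```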

In [None]:
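def cached_call(cache_obj: SimpleMemoryCache, req: LLMRequest) -> str:
    key = make_cache_key(req)
    hit = cache_obj.get(key)
    if hit is not None:
        # Fields go in the message text: the default log format does not print `extra=...`.
        logger.info("llm_cache_hit model=%s key=%s", req.model, key[:12])
        return hit

    # Cache miss: call the model. If the call raises, cache_obj.set is never
    # reached, so failures are not cached.
    out = fake_llm_call(req.user_prompt)
    cache_obj.set(key, out)
    logger.info("llm_cache_set model=%s key=%s", req.model, key[:12])
    return out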


cache2 = SimpleMemoryCache()
print(cached_call(cache2, req))
print(cached_call(cache2, req))
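Two calls are enough to show a single hit; over a longer session a hit rate is more informative. A sketch, assuming in-process counters are good enough (`CountingCache`, its `stats()` method, and the `DictCache` stand-in are all hypothetical), wraps any object with `get`/`set`, so it would also accept `SimpleMemoryCache`:

```python
from typing import Dict, Optional, Protocol


class KVCache(Protocol):
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class DictCache:
    # Minimal stand-in with the same interface as SimpleMemoryCache.
    def __init__(self) -> None:
        self._d: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._d.get(key)

    def set(self, key: str, value: str) -> None:
        self._d[key] = value


class CountingCache:
    def __init__(self, inner: KVCache) -> None:
        self.inner = inner
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        value = self.inner.get(key)
        if value is None:
            self.misses += 1
        else:
            self.hits += 1
        return value

    def set(self, key: str, value: str) -> None:
        self.inner.set(key, value)

    def stats(self) -> Dict[str, float]:
        total = self.hits + self.misses
        return {"hits": self.hits, "misses": self.misses, "hit_rate": self.hits / total if total else 0.0}


c = CountingCache(DictCache())
for prompt in ["Hello", "Hello", "Bye", "Hello"]:
    if c.get(prompt) is None:
        c.set(prompt, prompt.upper())
print(c.stats())  # {'hits': 2, 'misses': 2, 'hit_rate': 0.5}
```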

In [None]:
def make_cache_key_todo(req: LLMRequest) -> str:
    # TODO: extend the key so it would remain correct if you add fields like:
    # - top_p
    # - max_tokens
    # - tool schema / tool definitions
    # - few-shot examples
    return make_cache_key(req)


def should_cache(req: LLMRequest) -> bool:
    # TODO: refine this starter policy, e.g.
    # - cache only if temperature == 0.0
    # - avoid caching very large prompts
    return req.temperature == 0.0 and (len(req.system_prompt) + len(req.user_prompt)) <= 10_000


print("make_cache_key_todo:", make_cache_key_todo(req)[:12])
print("should_cache:", should_cache(req))

## References

- `functools.lru_cache`: https://docs.python.org/3/library/functools.html#functools.lru_cache
- Python logging: https://docs.python.org/3/library/logging.html

## Appendix: Solutions (peek only after trying)

Reference implementations for `make_cache_key_todo` and `should_cache`.

In [None]:
def make_cache_key_todo(req: LLMRequest) -> str:
    # Safe approach: serialize all known fields deterministically.
    raw = json.dumps(asdict(req), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def should_cache(req: LLMRequest) -> bool:
    # Conservative policy for reproducibility + correctness.
    if req.temperature != 0.0:
        return False
    prompt_len = len(req.system_prompt) + len(req.user_prompt)
    return prompt_len <= 10_000


print("solution key:", make_cache_key_todo(req)[:12])
print("solution should_cache:", should_cache(req))