# Week 4 — Part 05: A reusable `llm_client.py` skeleton

**Estimated time:** 90–150 minutes

## Learning Objectives

- Understand the client module as a reliability boundary
- Define a provider-agnostic request payload + stable cache key
- Implement a minimal `LLMClient` skeleton (timeouts, retries, caching, logging)
- Leave clear extension points for provider-specific calls


## Overview

Your goal is a single module, reusable across projects, that provides:

- timeouts
- retries + backoff
- basic rate limit handling
- basic caching
- logging

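This is a provider-agnostic Level 1 skeleton that you can adapt to OpenAI, Anthropic, or another provider. The skeleton itself covers timeouts, retries with backoff, caching, and logging; rate-limit handling is left to the practice exercises.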

---

## Underlying theory: the client is a reliability boundary

Your application code wants a simple contract:

- “given a request, return text or raise a clear error”

The client module enforces reliability invariants:

- bounded waiting (timeouts)
- bounded retries (caps)
- debuggability (logs with request id / attempt)
- cost control (caching)

Keeping these concerns centralized prevents every script from reinventing them inconsistently.
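
The debuggability invariant only holds if the request id and attempt number actually reach your log output. With Python's `logging`, values passed through `extra=` become attributes on the log record, but a default format string does not print them. The sketch below shows one way to surface them as JSON lines. `ExtraFieldsFormatter`, `FIELDS`, `build_logger`, and the logger name are hypothetical names chosen for illustration, not part of any library.

```python
import io
import json
import logging

# Hypothetical field list: the keys a client might pass via `extra=`.
FIELDS = ("request_id", "model", "attempt", "latency_s", "error_type")


class ExtraFieldsFormatter(logging.Formatter):
    """Render each record as one JSON line, including selected `extra` fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "event": record.getMessage()}
        for name in FIELDS:
            # Values passed via `extra=` are set as attributes on the record.
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload, sort_keys=True)


def build_logger(stream: io.TextIOBase) -> logging.Logger:
    """Create an isolated demo logger that writes JSON lines to `stream`."""
    log = logging.getLogger("llm_client_demo")
    log.setLevel(logging.INFO)
    log.propagate = False  # keep demo output out of the root logger
    log.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(ExtraFieldsFormatter())
    log.addHandler(handler)
    return log


buffer = io.StringIO()
demo_logger = build_logger(buffer)
demo_logger.info(
    "llm_call_ok",
    extra={"request_id": "abc123", "model": "demo-model", "attempt": 0},
)
log_line = buffer.getvalue().strip()
print(log_line)
```

Keys passed in `extra` must not collide with built-in `LogRecord` attributes such as `message` or `module`, which is one reason to prefer specific names like `request_id`.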

## Skeleton design

We’ll define:

- a request payload (model + prompt + settings)
- a stable cache key
- a `call()` method

The cache key must represent the “effective input” to the model. If two requests differ in any setting that can change output, they must not share a key.
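
As a minimal sketch of that rule, the snippet below hashes the canonical JSON form of a plain dictionary. The `cache_key` helper and the sample payloads are hypothetical and exist only to show the two properties you want: field order is irrelevant, and any changed setting produces a new key.

```python
import hashlib
import json


def cache_key(payload: dict) -> str:
    """Hash the canonical (sorted-key) JSON form of a request payload."""
    raw = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


base = {"model": "demo-model", "prompt": "Summarize this.", "temperature": 0.0}
reordered = {"temperature": 0.0, "prompt": "Summarize this.", "model": "demo-model"}
warmer = {**base, "temperature": 0.7}

same_key = cache_key(base) == cache_key(reordered)    # field order does not matter
different_key = cache_key(base) != cache_key(warmer)  # a changed setting does

print(same_key, different_key)  # prints: True True
```

One caveat with JSON-based keys: `0` and `0.0` serialize differently, so `temperature=0` and `temperature=0.0` would get different keys unless you normalize types before hashing.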

Next you’ll implement a provider-agnostic skeleton you can adapt later.

In [None]:
from __future__ import annotations

import hashlib
import json
import logging
import time
import uuid
from dataclasses import asdict, dataclass


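logger = logging.getLogger(__name__)
# Notebook convenience only: in a reusable module, leave logging
# configuration to the application that imports it.
logging.basicConfig(level=logging.INFO)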


@dataclass(frozen=True)
class LLMRequest:
    model: str
    prompt: str
    temperature: float = 0.0


def make_cache_key(req: LLMRequest) -> str:
    # Canonical JSON (sorted keys) so the key depends only on field values.
    raw = json.dumps(asdict(req), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class SimpleMemoryCache:
    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    def get(self, key: str) -> str | None:
        return self._store.get(key)

    def set(self, key: str, value: str) -> None:
        self._store[key] = value


class LLMClient:
    def __init__(self, cache: SimpleMemoryCache | None = None) -> None:
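        # Compare against None explicitly so a falsy-but-valid cache object is not replaced.
        self._cache = cache if cache is not None else SimpleMemoryCache()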

    def _provider_call(self, req: LLMRequest, *, timeout_s: float) -> str:
        # TODO: Implement provider-specific HTTP/API call.
        # This method must:
        # - respect timeout_s
        # - raise clear exceptions for retry classification
        raise NotImplementedError("Implement provider-specific call")

    def call(self, req: LLMRequest, *, timeout_s: float = 30.0, max_retries: int = 2) -> str:
        request_id = str(uuid.uuid4())
        cache_key = make_cache_key(req)

        cached = self._cache.get(cache_key)
        if cached is not None:
            logger.info("llm_cache_hit", extra={"request_id": request_id, "model": req.model})
            return cached

        last_err: Exception | None = None
        for attempt in range(max_retries + 1):
            t0 = time.monotonic()  # monotonic clock: latency is unaffected by wall-clock adjustments
            try:
                text = self._provider_call(req, timeout_s=timeout_s)
                logger.info(
                    "llm_call_ok",
                    extra={"request_id": request_id, "model": req.model, "latency_s": time.monotonic() - t0, "attempt": attempt},
                )
                self._cache.set(cache_key, text)
                return text
            except Exception as e:
                # NOTE: this retries every exception type, including programming errors.
                # Narrow it once _provider_call() raises classified exceptions.
                last_err = e
                logger.warning(
                    "llm_call_failed",
                    extra={"request_id": request_id, "model": req.model, "latency_s": time.monotonic() - t0, "attempt": attempt, "error_type": type(e).__name__},
                )
                if attempt < max_retries:
                    time.sleep(min(2 ** attempt, 4))

        # Chain the original exception so the traceback shows the root cause.
        raise RuntimeError(f"LLM call failed after retries: {last_err}") from last_err
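
The skeleton's `except Exception` retries every failure, including ones a retry cannot fix, such as the `NotImplementedError` raised by the unimplemented `_provider_call()`. The sketch below shows one possible shape for the retry classification that the TODO asks for. The exception names `RetryableLLMError` and `FatalLLMError` and the helper `call_with_retries` are hypothetical; the `sleep` parameter is injectable only so the demo runs without waiting.

```python
from __future__ import annotations

import time
from typing import Callable


class RetryableLLMError(Exception):
    """Hypothetical: transient failure (timeout, 429, 5xx) worth retrying."""


class FatalLLMError(Exception):
    """Hypothetical: failure a retry cannot fix (bad request, auth error)."""


def call_with_retries(
    fn: Callable[[], str],
    *,
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn(), retrying only on RetryableLLMError with capped backoff."""
    last_err: Exception | None = None
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RetryableLLMError as e:
            last_err = e
            if attempt < max_retries:
                sleep(min(2 ** attempt, 4))  # same capped backoff as the skeleton
        # Any other exception, FatalLLMError included, propagates immediately.
    raise RuntimeError(f"LLM call failed after {max_retries + 1} attempts") from last_err


# Demo: a fake provider call that fails twice, then succeeds.
attempts = {"n": 0}


def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RetryableLLMError("simulated timeout")
    return "ok"


delays: list[float] = []
result = call_with_retries(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, attempts["n"], delays)  # prints: ok 3 [1, 2]
```

To use this idea in the client, `_provider_call()` would translate provider-specific errors into these two classes, and `call()` would catch only the retryable one.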

## Practice exercises

1) Extend `LLMRequest` to include `system_prompt` and update `make_cache_key()` accordingly.

2) Add jitter to the backoff so that many clients do not retry at the same time.

3) Add a simple handler for HTTP 429 (Too Many Requests) responses:

- if `Retry-After` is present and small enough, sleep that long
- otherwise, fall back to exponential backoff

4) Add structured output validation (from Week 3) as an optional mode.
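
As a starting point for exercise 3, here is one possible shape for the delay decision, not a complete handler. A `Retry-After` header can carry either a number of seconds or an HTTP date, so the sketch handles both. The function names `retry_after_seconds` and `delay_for_429` and the `max_wait_s` default of 10 seconds are assumptions made for illustration; how you read the header from a response depends on your provider SDK.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value: str | None, *, now: datetime | None = None) -> float | None:
    """Parse a Retry-After header value into seconds, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():  # delay-seconds form, e.g. "3"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # treat naive dates as UTC
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def delay_for_429(retry_after: str | None, fallback_s: float, *, max_wait_s: float = 10.0) -> float:
    """Honor Retry-After when present and small enough; otherwise use the fallback backoff."""
    hinted = retry_after_seconds(retry_after)
    if hinted is not None and hinted <= max_wait_s:
        return hinted
    return fallback_s


print(delay_for_429("3", fallback_s=1.0))    # header small enough: 3.0
print(delay_for_429("120", fallback_s=1.0))  # header too large, use fallback: 1.0
print(delay_for_429(None, fallback_s=1.0))   # no header, use fallback: 1.0
```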

In [None]:
def add_jitter(delay_s: float) -> float:
    # TODO: implement jitter.
    # Example: "full jitter" uniform(0, delay_s).
    raise NotImplementedError


def backoff_delay(attempt: int, *, base: float = 0.5, cap: float = 8.0) -> float:
    # TODO: implement exponential backoff with cap.
    raise NotImplementedError


print("Implement add_jitter() and backoff_delay().")

## Next steps

- Implement `_provider_call()` using your chosen provider SDK.
- Add structured output validation from Week 3.
- Use this client in later pipeline/capstone work.

## References

- Python logging: https://docs.python.org/3/library/logging.html
- Tenacity (for more robust retries): https://tenacity.readthedocs.io/