# Week 4 — Part 02: Retries, Backoff, and Idempotency

**Estimated time:** 90–120 minutes

**Prerequisites:** Python, basic HTTP concepts, exceptions

## Learning Objectives

By the end of this lab, you will be able to:

- Explain when retries help and when they hurt
- Implement retry policies with selective error handling
- Apply exponential backoff with jitter and caps
- Use idempotency keys to avoid duplicate side effects
- Design safe retryable operations with stable request identifiers

---

## Overview

Retries are for **transient** failures. Backoff prevents overload from getting worse. Idempotency ensures retries do not create duplicate side effects.

In this lab, we’ll build practical retry helpers, compare backoff strategies, and implement simple idempotency guards you can reuse in API or LLM workflows.

## Part 1: Retries are a control policy under uncertainty

If a single attempt succeeds with probability $q$, and attempts fail independently of one another, then with up to $R$ retries (so $R+1$ total attempts) the probability of eventual success is:

$$
P(\text{success}) = 1 - (1-q)^{R+1}
$$
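To make the formula concrete, the short sketch below tabulates it for an illustrative per-attempt success rate of $q = 0.25$ (the same rate the simulated call uses later in this lab). It assumes attempts fail independently; real outages often make failures correlated, so treat these numbers as an optimistic bound. The helper name `p_eventual_success` is made up for illustration.

```python
def p_eventual_success(q: float, retries: int) -> float:
    """Probability that at least one of (retries + 1) independent attempts succeeds."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability in [0, 1]")
    if retries < 0:
        raise ValueError("retries must be non-negative")
    return 1.0 - (1.0 - q) ** (retries + 1)


# Illustrative per-attempt success rate (an assumption, not a measured value).
q = 0.25
for R in range(0, 6):
    print(f"R={R} retries -> P(success)={p_eventual_success(q, R):.3f}")
```

Each extra retry buys a smaller gain than the one before (0.25 to 0.4375 for the first retry, then to about 0.578 for the second), while cost and latency keep growing with every attempt.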

Retries improve success probability, but they also increase:

- load on the provider
- cost (tokens, time, money)
- tail latency (some requests will take much longer)

Backoff and caps exist because you want to improve success probability **without amplifying overload**.

---

## Part 2: What to retry (and what not to retry)

Good candidates (usually transient):

- network timeouts
- temporary DNS failures
- HTTP 429 (rate-limited)
- HTTP 503 (service unavailable)
- occasional malformed JSON if you have a repair strategy

Bad candidates (usually permanent):

- invalid API key / auth failures
- “model not found” / wrong endpoint
- deterministic schema mismatch caused by a broken prompt
- invalid parameters (e.g., negative `max_tokens`)

Rule of thumb:

- retry **transient** failures
- do not retry **permanent** failures
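For HTTP APIs, the transient/permanent split often comes down to the status code. The sketch below is one possible mapping, not a standard: 429 and 503 come from the list above, while 408, 500, 502, and 504 are added as an assumption because they commonly signal temporary trouble. Auth failures (401/403), not-found (404), and validation errors (400/422) fall through as permanent.

```python
# Assumed mapping for illustration; tune it per provider and per endpoint.
TRANSIENT_STATUS = frozenset({408, 429, 500, 502, 503, 504})


def is_transient_status(status: int) -> bool:
    """Return True if a response with this HTTP status is worth retrying."""
    return status in TRANSIENT_STATUS


for status in (400, 401, 404, 429, 503):
    kind = "retry" if is_transient_status(status) else "do not retry"
    print(status, kind)
```

HTTP also defines a `Retry-After` response header that servers may send with 503 (and commonly with 429); when it is present, honoring it is usually better than guessing with your own backoff.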

Next, you’ll implement retry policies that encode this distinction.

In [None]:
from __future__ import annotations

import hashlib
import random
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence, Type


class TransientError(Exception):
    pass


class PermanentError(Exception):
    pass


@dataclass(frozen=True)
class RetryDecision:
    should_retry: bool
    reason: str


def classify_exception(exc: BaseException) -> RetryDecision:
    if isinstance(exc, TransientError):
        return RetryDecision(True, "transient")
    if isinstance(exc, PermanentError):
        return RetryDecision(False, "permanent")
    return RetryDecision(False, "unknown")


def simulated_call(p_success: float = 0.3) -> str:
    r = random.random()
    if r < p_success:
        return "ok"
    if r < p_success + 0.5:
        raise TransientError("temporary failure")
    raise PermanentError("bad request")

## Part 3: Implementing retries (selective, capped)

A retry loop needs:

- a maximum number of attempts
- classification of errors (retryable vs not)
- a delay policy between attempts (we’ll add backoff next)
- observability (log attempts, reasons)

In production you would also:

- propagate a request id
- record metrics (attempt count, success rate, reasons)
- consider per-endpoint policies (not one-size-fits-all)
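One lightweight way to keep per-endpoint policies explicit is a small lookup table. Everything in this sketch (`RetryPolicy`, the endpoint names, and the numbers) is hypothetical and only illustrates the shape; the right values depend on each operation's cost and side effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int   # total attempts, including the first
    base_delay: float   # seconds before the first retry
    max_delay: float    # cap on any single delay, in seconds


# Hypothetical endpoints and values, for illustration only.
POLICIES: dict[str, RetryPolicy] = {
    # No side effects on our end: retrying costs time/tokens but cannot duplicate anything.
    "chat_completion": RetryPolicy(max_attempts=5, base_delay=0.5, max_delay=8.0),
    # Side effect without an idempotency key: a retry could send the email twice.
    "send_email": RetryPolicy(max_attempts=1, base_delay=0.0, max_delay=0.0),
}
DEFAULT_POLICY = RetryPolicy(max_attempts=3, base_delay=0.5, max_delay=4.0)


def policy_for(endpoint: str) -> RetryPolicy:
    """Look up the policy for an endpoint, falling back to a conservative default."""
    return POLICIES.get(endpoint, DEFAULT_POLICY)


print(policy_for("send_email"))
print(policy_for("unknown_endpoint"))
```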

Now implement a small, dependency-free retry helper.

In [None]:
def retry(
    fn: Callable[[], str],
    *,
    max_attempts: int,
    is_retryable: Callable[[BaseException], bool],
    sleep_seconds: Callable[[int], float],
) -> str:
    last: Optional[BaseException] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:  # not BaseException: KeyboardInterrupt/SystemExit should propagate untouched
            last = exc
            decision = classify_exception(exc)
            print(f"attempt={attempt} error={type(exc).__name__} class={decision.reason}")
            if attempt >= max_attempts:
                break
            if not is_retryable(exc):
                raise
            delay = float(sleep_seconds(attempt))
            if delay > 0:
                time.sleep(delay)
    assert last is not None
    raise last


def is_retryable_transient(exc: BaseException) -> bool:
    return classify_exception(exc).should_retry


def fixed_delay(delay: float) -> Callable[[int], float]:
    return lambda attempt: delay


try:
    out = retry(
        lambda: simulated_call(p_success=0.25),
        max_attempts=6,
        is_retryable=is_retryable_transient,
        sleep_seconds=fixed_delay(0.1),
    )
    print("final=", out)
except Exception as e:
    print("failed=", type(e).__name__, str(e))

## Part 4: Backoff (and why jitter matters)

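Backoff means waiting longer before each successive retry. A common policy is capped exponential backoff:

$$
t_k = \min(t_{\max},\ t_0\cdot 2^k)
$$

where $t_k$ is the delay before the $(k+1)$-th retry ($k = 0, 1, 2, \dots$), $t_0$ is the base delay, and $t_{\max}$ caps any single delay. In the code below, `attempt` is 1-based, so $k$ = `attempt - 1`.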

Even a simple progression helps. With $t_0 = 0.5$ s and the cap not yet reached, the first four delays are:

- 0.5s
- 1s
- 2s
- 4s

Always cap both the number of attempts and the per-retry delay ($t_{\max}$).

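If your system has multiple layers (your code retries and the provider also retries), be careful: retries multiply. Three layers that each make up to 3 attempts can turn one logical request into $3^3 = 27$ upstream calls.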
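Caps also bound how long a caller can be stuck. A loop with `max_attempts` attempts sleeps at most `max_attempts - 1` times, so the worst-case total sleep is the sum of the capped delays. The sketch restates the same formula as the `exp_backoff_delay` helper in the next cell so it runs on its own; the names `capped_exp_delay` and `worst_case_total_sleep` are made up for illustration. It counts sleep time only, not time spent inside the calls themselves (for example, waiting on timeouts).

```python
def capped_exp_delay(attempt: int, *, base: float, cap: float) -> float:
    """Capped exponential delay after failed attempt `attempt` (1-based)."""
    return min(cap, base * (2 ** (attempt - 1)))


def worst_case_total_sleep(max_attempts: int, *, base: float, cap: float) -> float:
    """Upper bound on time spent sleeping: one delay between each pair of attempts."""
    return sum(capped_exp_delay(a, base=base, cap=cap) for a in range(1, max_attempts))


# Same settings as the jittered retry run later in this lab: 7 attempts, base 0.2s, cap 2.0s.
total = worst_case_total_sleep(7, base=0.2, cap=2.0)
print(f"worst-case sleep without jitter: {total:.1f}s")
# With full jitter each delay is uniform on [0, capped delay], so the expected total is half.
print(f"expected sleep with full jitter: {total / 2:.1f}s")
```

Here the six delays are 0.2 + 0.4 + 0.8 + 1.6 + 2.0 + 2.0 = 7.0 s; the cap is what keeps the last two from growing to 3.2 s and 6.4 s.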

### Jitter

If many clients retry at the same time (e.g., after a brief outage), synchronized retries can create a second spike.

Adding **jitter** means each client waits a slightly different amount of time, spreading load more smoothly.
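A tiny simulation makes the effect visible. Assume (hypothetically) 1,000 clients all fail at the same instant and each schedules one retry: without jitter they all wait exactly 1 s, while with full jitter each waits a uniform random time in $[0, 1)$ s. Counting retries per 100 ms window shows the peak load the server sees. The helper `peak_per_window` is made up for this illustration.

```python
import random
from collections import Counter


def peak_per_window(delays: list[float], window: float = 0.1) -> int:
    """Largest number of retries that land in the same time window."""
    counts = Counter(int(d / window) for d in delays)
    return max(counts.values())


rng = random.Random(0)  # seeded so the run is reproducible
n_clients = 1000
no_jitter = [1.0] * n_clients                           # everyone retries at t = 1.0 s
full_jitter = [rng.random() for _ in range(n_clients)]  # uniform in [0, 1) s

print("peak retries per 100 ms, no jitter:  ", peak_per_window(no_jitter))
print("peak retries per 100 ms, full jitter:", peak_per_window(full_jitter))
```

The total number of retries is the same in both cases; jitter spreads them over ten windows, so the peak falls to roughly one tenth (plus some random fluctuation).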

In [None]:
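def exp_backoff_delay(
    attempt: int,
    *,
    base: float = 0.5,
    cap: float = 8.0,
) -> float:
    """Capped exponential delay after failed attempt `attempt` (1-based)."""
    # attempt=1 -> base, attempt=2 -> 2*base, ... never more than cap
    return min(cap, base * (2 ** (attempt - 1)))


def exp_backoff_with_full_jitter(
    attempt: int,
    *,
    base: float = 0.5,
    cap: float = 8.0,
) -> float:
    """'Full jitter': a uniform random delay between 0 and the capped exponential delay."""
    raw = exp_backoff_delay(attempt, base=base, cap=cap)
    return random.uniform(0.0, raw)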


for a in range(1, 8):
    d = exp_backoff_delay(a, base=0.5, cap=8.0)
    j = exp_backoff_with_full_jitter(a, base=0.5, cap=8.0)
    print(f"attempt={a} exp={d:.2f}s jittered={j:.2f}s")

In [None]:
try:
    out = retry(
        lambda: simulated_call(p_success=0.25),
        max_attempts=7,
        is_retryable=is_retryable_transient,
        sleep_seconds=lambda attempt: exp_backoff_with_full_jitter(attempt, base=0.2, cap=2.0),
    )
    print("final=", out)
except Exception as e:
    print("failed=", type(e).__name__, str(e))

## Part 5: Idempotency

Retries are only safe if your operation is **idempotent** (or you make it idempotent).

Even if you are “just calling an LLM”, idempotency becomes critical as soon as you add side effects:

- saving to a database
- creating tickets
- sending emails
- charging money

Mental model:

- “idempotent” means “doing it twice has the same effect as doing it once”
- you often implement this by deduplicating on a stable key (request id / idempotency key)
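A minimal illustration of the definition, using a plain dict as a stand-in for an account table (the function names are hypothetical): setting a balance is idempotent, while incrementing it is not, so a blindly retried increment is applied twice.

```python
def set_balance(accounts: dict[str, int], user: str, amount: int) -> None:
    """Idempotent: applying it twice leaves the same state as applying it once."""
    accounts[user] = amount


def add_to_balance(accounts: dict[str, int], user: str, amount: int) -> None:
    """Not idempotent: every application changes the state again."""
    accounts[user] = accounts.get(user, 0) + amount


a = {"alice": 0}
set_balance(a, "alice", 100)
set_balance(a, "alice", 100)  # simulated retry
print("after set twice:", a)

b = {"alice": 0}
add_to_balance(b, "alice", 100)
add_to_balance(b, "alice", 100)  # simulated retry
print("after add twice:", b)
```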

Best practice:

- generate a request id once per logical operation, log it, and reuse it on every retry of that operation
- where supported (e.g., Stripe and other payment APIs), send it as an **Idempotency-Key** header
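A sketch of the header pattern using only the standard library. The endpoint URL and payload are hypothetical, and the request is only built, not sent. The detail that matters is that the key is generated once per logical operation and reused on every retry; a fresh key per attempt would defeat deduplication.

```python
import json
import urllib.request
import uuid


def build_transfer_request(payload: dict, idempotency_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying an Idempotency-Key header."""
    return urllib.request.Request(
        "https://api.example.com/v1/transfers",  # hypothetical endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Idempotency-Key": idempotency_key},
        method="POST",
    )


key = str(uuid.uuid4())  # generated ONCE for this logical transfer
attempts = [build_transfer_request({"user": "alice", "amount": 100}, key) for _ in range(3)]

# urllib normalizes header names with str.capitalize(), hence "Idempotency-key" here.
print("all attempts share the key:", all(r.get_header("Idempotency-key") == key for r in attempts))
```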

Below we build a tiny idempotency guard you can reuse in scripts, services, or notebooks.

In [None]:
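class IdempotencyStore:
    """In-memory key -> result map. Production systems use a shared store (database, cache) with expiry."""

    def __init__(self) -> None:
        self._store: dict[str, object] = {}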

    def get(self, key: str) -> object:
        return self._store[key]

    def has(self, key: str) -> bool:
        return key in self._store

    def set(self, key: str, value: object) -> None:
        self._store[key] = value


def make_idempotency_key(operation: str, payload: dict) -> str:
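    # Deterministic for flat payloads: top-level items are sorted by key.
    # Nested dicts are rendered in insertion order, so equal nested payloads can hash
    # differently (canonicalize_payload() in Part 6 is meant to fix this).
    data = f"{operation}:{sorted(payload.items())}".encode("utf-8")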
    return hashlib.sha256(data).hexdigest()


def run_idempotent(
    *,
    store: IdempotencyStore,
    operation: str,
    payload: dict,
    fn: Callable[[], object],
) -> object:
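    key = make_idempotency_key(operation, payload)
    # Note: this check-then-set is not atomic. With concurrent callers, use a store that
    # supports an atomic "insert if absent" so two workers cannot both run fn.
    if store.has(key):
        print("dedupe hit:", key[:12])
        return store.get(key)

    print("executing:", operation, "key=", key[:12])
    result = fn()  # if fn raises, nothing is stored, so a later retry re-executes it
    store.set(key, result)
    return result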


store = IdempotencyStore()

payload = {"user": "alice", "amount": 100}

def side_effectful_transfer() -> dict:
    # Simulate a real side effect (creating a transaction id)
    # If this runs twice without idempotency, you'd duplicate a transfer.
    txn = f"txn_{random.randint(10000, 99999)}"
    return {"transaction_id": txn, "status": "ok"}

r1 = run_idempotent(store=store, operation="transfer", payload=payload, fn=side_effectful_transfer)
r2 = run_idempotent(store=store, operation="transfer", payload=payload, fn=side_effectful_transfer)
print("first:", r1)
print("second:", r2)
print("second call returned the stored result?", r1 == r2)

### Idempotency key choice

In real systems, you usually want one of these:

- **Client-provided key** (recommended): caller generates a UUID and sends it
- **Derived key**: server derives it from stable request content

Tradeoffs:

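- Client-provided keys express intent directly: a retry reuses the key, while a deliberate second operation with an identical payload (say, two separate transfers of `amount=100` to alice) gets a new key. They also don't depend on perfect payload canonicalization.
- Derived keys are convenient in scripts, but they merge identical-looking yet intentionally separate operations into one, and they break if the payload contains timestamps or other nondeterministic fields.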

A very common approach:

- Accept an explicit `request_id` / `idempotency_key` parameter.
- If absent, generate one and return it to the caller.
- Log the key on every attempt and every retry.
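A minimal sketch of that approach, using an in-memory dict as the dedupe store (a real service would use a shared database or cache with an expiry). The function `submit` and its signature are hypothetical. As one reasonable extra check, it rejects reuse of a key with a different payload, since that usually signals a client bug rather than a retry.

```python
import uuid
from typing import Callable, Optional

# key -> (payload, result); in-memory only, for illustration.
_records: dict[str, tuple[dict, object]] = {}


def submit(
    payload: dict,
    fn: Callable[[], object],
    *,
    idempotency_key: Optional[str] = None,
) -> tuple[str, object]:
    """Run fn at most once per key; return (key, result) so callers can retry with the same key."""
    key = idempotency_key or str(uuid.uuid4())
    print(f"idempotency_key={key}")  # log the key on every attempt
    if key in _records:
        stored_payload, result = _records[key]
        if stored_payload != payload:
            raise ValueError("idempotency key reused with a different payload")
        return key, result
    result = fn()
    _records[key] = (payload, result)
    return key, result


executions = []


def transfer() -> str:
    executions.append(1)
    return f"txn_{len(executions)}"


payload = {"user": "alice", "amount": 100}
key, r1 = submit(payload, transfer)                     # first attempt: key generated
_, r2 = submit(payload, transfer, idempotency_key=key)  # retry: same key, deduplicated
_, r3 = submit(payload, transfer)                       # deliberate second transfer: new key
print(r1, r2, r3, "executions:", len(executions))
```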

---

## Part 6: Practice exercises

These are intentionally incomplete. Implement them to practice the patterns.

In [None]:
def retry_with_policy(
    fn: Callable[[], object],
    *,
    max_attempts: int,
    classify: Callable[[BaseException], RetryDecision],
    delay_for_attempt: Callable[[int], float],
) -> object:
    # TODO: implement
    # Requirements:
    # - Attempt up to max_attempts
    # - If classify(exc).should_retry is False, raise immediately
    # - Otherwise sleep for delay_for_attempt(attempt) and retry
    # - Print attempt number + delay + classification reason
    pass


# TODO: write a quick test that exercises:
# - transient failures get retried
# - permanent failures do not get retried
# - delays are capped (if your delay function caps)
print("Implement retry_with_policy() and tests.")

In [None]:
def canonicalize_payload(payload: dict) -> tuple[tuple[str, object], ...]:
    # TODO: implement a stable canonical form
    # Hint: sort by key; ensure nested dict/list are handled deterministically
    # Return a fully hashable structure.
    pass


def make_idempotency_key_v2(operation: str, payload: dict, *, client_key: str | None = None) -> str:
    # TODO:
    # - If client_key is provided, incorporate it (or return it directly)
    # - Otherwise use canonicalize_payload(payload)
    # - Return a hex digest
    pass


print("Implement canonicalize_payload() and make_idempotency_key_v2().")

## Part 7: Common pitfalls and best practices

- Retry budgets: cap attempts, cap maximum total time, and cap maximum delay.
- Avoid retry storms: add jitter, and consider circuit breaking when error rates spike.
- Watch for retry multiplication: nested retries (client + gateway + provider) can explode attempts.
- Log a stable request id: make debugging and deduplication possible.
- Keep retry policy close to the call site: different operations have different risk profiles.
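The "cap maximum total time" item above can be sketched as a deadline check before each sleep: if the next delay would overshoot the remaining budget, give up instead of sleeping. The names `retry_with_deadline` and `RetryBudgetExceeded` are hypothetical; `time.monotonic()` is used because, unlike wall-clock time, it cannot jump backwards.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryBudgetExceeded(Exception):
    """Raised when the next backoff delay would exceed the total time budget."""


def retry_with_deadline(
    fn: Callable[[], T],
    *,
    max_attempts: int,
    max_total_seconds: float,
    delay_for_attempt: Callable[[int], float],
    is_retryable: Callable[[Exception], bool],
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    deadline = time.monotonic() + max_total_seconds
    attempt = 1
    while True:
        try:
            return fn()
        except Exception as exc:
            if attempt >= max_attempts or not is_retryable(exc):
                raise
            delay = delay_for_attempt(attempt)
            remaining = deadline - time.monotonic()
            if delay >= remaining:
                raise RetryBudgetExceeded(
                    f"attempt {attempt}: next delay {delay:.2f}s exceeds remaining {remaining:.2f}s"
                ) from exc
            time.sleep(delay)
            attempt += 1


# Usage: a flaky call that succeeds on its third attempt, well within a 1-second budget.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


print(retry_with_deadline(
    flaky,
    max_attempts=5,
    max_total_seconds=1.0,
    delay_for_attempt=lambda attempt: 0.05,
    is_retryable=lambda exc: isinstance(exc, TimeoutError),
))
```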

---

## Summary

You practiced:

- classifying failures into transient vs permanent
- implementing retry loops with caps and policies
- applying exponential backoff with jitter
- ensuring idempotency using a stable key + dedupe store

## References

- Tenacity (Python retry library): https://tenacity.readthedocs.io/
- Stripe, idempotent requests: https://stripe.com/docs/idempotency

In [None]:
"""Optional: Using a retry library (Tenacity)

In production, you’ll often prefer a battle-tested library rather than maintaining retry logic yourself.

If you choose a library, verify you still control:
- which errors are retryable
- maximum attempts and/or total retry time
- backoff + jitter behavior
- logging/metrics hooks

Tenacity:
https://tenacity.readthedocs.io/
"""