# Level 2 - Week 9 - 02 Rate Limiting and Retry Policies

**Estimated time:** 60-90 minutes

## Learning Objectives

- Define rate limit policy
- Set retry bounds
- Clarify retryable errors


## Overview

Rate limiting protects the service and controls cost.

Retries are not free:

- every retry is an extra request, so retries amplify load exactly when the downstream service is already struggling

Use:

- bounded retries
- exponential backoff
- idempotency where possible

## Underlying theory: retries amplify load

If your system issues $n$ attempts per user request on average, downstream load multiplies by $n$.

Approximation:

$$
\text{effective QPS} \approx \text{incoming QPS} \cdot \mathbb{E}[\text{attempts}]
$$

So a “small” retry policy can create large load spikes during outages: with `max_attempts = 3`, a full outage in which every attempt fails triples downstream load.
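To put numbers on the approximation, here is a minimal sketch under a simplifying assumption: each attempt fails independently with probability `p_fail`, and every failure is retried until `max_attempts` is reached. Real failures are usually correlated, so treat this as an estimate. The helper name `expected_attempts` is hypothetical.

```python
def expected_attempts(p_fail: float, max_attempts: int) -> float:
    """Expected attempts per request, assuming independent failures
    and that every failure is retried up to max_attempts."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # Attempt k (0-based) happens only if the previous k attempts all failed.
    return sum(p_fail ** k for k in range(max_attempts))


incoming_qps = 100
for p_fail in (0.01, 0.5, 1.0):
    attempts = expected_attempts(p_fail, max_attempts=3)
    print(f"p_fail={p_fail}: ~{incoming_qps * attempts:.0f} effective QPS")
# p_fail=0.01: ~101 effective QPS
# p_fail=0.5: ~175 effective QPS
# p_fail=1.0: ~300 effective QPS
```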

## Rate limiting intuition (token bucket)

- tokens refill at rate $r$
- bucket capacity $b$ allows short bursts
- each request consumes 1 token; a request that finds the bucket empty is rejected or delayed

Practical implication:

- allow bursts while enforcing an average rate
- rate limit expensive endpoints more aggressively (`/ingest`, agent loops)
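The token bucket described above can be sketched in a few lines. This is an illustrative single-process version, not a definitive implementation: the class name `TokenBucket`, the injectable `clock`, and the `cost` parameter (one possible way to charge expensive endpoints more than 1 token) are assumptions introduced for the example.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True if available, else return False."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Deterministic demo with a fake clock: r = 1 token/s, b = 3.
fake_now = [0.0]
bucket = TokenBucket(rate=1.0, capacity=3.0, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]  # four requests at t = 0
fake_now[0] = 2.0                           # two seconds later
later = [bucket.allow() for _ in range(3)]
print(burst)  # [True, True, True, False]
print(later)  # [True, True, False]
```

The burst of three is admitted immediately because the bucket starts full; after that, throughput is bounded by the refill rate.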

## Practice Steps

- Write a minimal policy dict.
- Decide which errors are retryable.
- Implement `should_retry` and a backoff schedule.

### Sample code

Policy template and retry decision.


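```python
# Minimal policy: retry bounds plus a per-client rate limit.
POLICY = {
    'max_attempts': 3,       # total attempts, including the first call
    'base_delay_ms': 200,    # delay before the first retry
    'max_delay_ms': 2000,    # cap on any single backoff delay
    'rate_limit_rpm': 60,    # requests per minute
}

# Error types that are worth retrying; everything else fails fast.
RETRYABLE = {'timeout', '429', '5xx'}

print(POLICY)
print(RETRYABLE)
```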


### Student fill-in

Add a `should_retry` helper and a backoff schedule.


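```python
def should_retry(error_type: str) -> bool:
    """Return True only for error types listed in RETRYABLE (defined in the policy cell above)."""
    return error_type in RETRYABLE


def backoff_delay_ms(attempt_index: int, base_delay_ms: int, max_delay_ms: int) -> int:
    """Exponential backoff: base * 2^attempt_index, capped at max_delay_ms."""
    if attempt_index < 0:
        raise ValueError("attempt_index must be >= 0")
    delay = base_delay_ms * (2 ** attempt_index)
    return min(max_delay_ms, delay)


# Client errors such as validation_error and 401 should not be retried.
for e in ["timeout", "validation_error", "429", "401", "5xx"]:
    print(e, "->", should_retry(e))

# With the sample policy the delays are 200, 400, 800, 1600, then capped at 2000.
for k in range(5):
    print("attempt", k, "delay_ms",
          backoff_delay_ms(k, POLICY['base_delay_ms'], POLICY['max_delay_ms']))
```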
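The two helpers only decide whether and how long to wait; something still has to drive the attempts. Below is one possible sketch of a bounded retry loop, self-contained so it can run on its own. `TransientError`, `call_with_retries`, and the injectable `sleep` and `rng` parameters are hypothetical names for illustration. The random jitter is an addition that the schedule above does not include; it is a common way to keep many clients from retrying in lockstep.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error carrying an error_type such as 'timeout' or '429'."""

    def __init__(self, error_type: str):
        super().__init__(error_type)
        self.error_type = error_type


def call_with_retries(fn, *, max_attempts=3, base_delay_ms=200, max_delay_ms=2000,
                      retryable=frozenset({"timeout", "429", "5xx"}),
                      sleep=time.sleep, rng=random.random):
    """Call fn() up to max_attempts times, backing off between retryable failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt_index in range(max_attempts):
        try:
            return fn()
        except TransientError as err:
            is_last = attempt_index == max_attempts - 1
            if err.error_type not in retryable or is_last:
                raise  # non-retryable, or the retry budget is spent
            cap = min(max_delay_ms, base_delay_ms * (2 ** attempt_index))
            # Jitter (an assumption, not part of the lesson's schedule):
            # sleep a random fraction of the capped delay, in seconds.
            sleep(cap * rng() / 1000.0)


# Demo: fail twice with a timeout, then succeed. Sleeping is stubbed out
# and rng is pinned to 1.0 so the recorded delays are deterministic.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise TransientError("timeout")
    return "ok"


delays = []
result = call_with_retries(flaky, sleep=delays.append, rng=lambda: 1.0)
print(result, len(calls), delays)  # ok 3 [0.2, 0.4]
```

Because the loop re-raises on the final attempt, the caller always sees either a result or the last error, never a silent `None`.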

## Self-check

- Are retries bounded?
- Are rate limits documented?
