```{contents}
```
## Rate Limiting


**Rate limiting** is a control mechanism that **restricts how many requests a client can make** to a system within a defined time window.

It protects systems from:

* Abuse and misuse
* Traffic spikes
* Resource exhaustion
* Cost overruns (API / LLM usage)

---

### Where Rate Limiting Fits

```
Client Request
      ↓
Rate Limiter ── allowed → Continue Processing
      ↓ blocked
Return 429 Too Many Requests
```
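The flow above can be sketched as a small wrapper that either runs the handler or returns a 429 response. This is a minimal illustration, not a framework integration: `handle`, `is_allowed`, `process`, and the `retry_after` default are hypothetical names chosen for this example. The `Retry-After` header is the standard way to tell a client how long to wait.

```python
import math

def handle(client_id, is_allowed, process, retry_after=1.0):
    """Run `process` only if the limiter admits the request.

    `is_allowed` and `process` are hypothetical callables supplied by the
    caller. Returns a (status_code, headers, body) tuple.
    """
    if is_allowed(client_id):
        return 200, {}, process(client_id)

    # Round up so a well-behaved client never retries too early
    headers = {"Retry-After": str(math.ceil(retry_after))}
    return 429, headers, "Too Many Requests"
```

Any of the limiters in the following sections could be passed in as `is_allowed`.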

---

### Core Rate Limiting Models

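| Model          | Description                                                                                  |
| -------------- | -------------------------------------------------------------------------------------------- |
| Fixed Window   | Allow at most N requests per fixed time window; the count resets at each window boundary     |
| Sliding Window | Count requests over the trailing window, which avoids bursts at window boundaries            |
| Token Bucket   | Tokens refill at a steady rate up to a capacity; each request spends one, so bursts can pass |
| Leaky Bucket   | Requests drain at a constant rate; excess is queued or dropped                               |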
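The leaky bucket is the one model in the table without a demonstration further down, so here is a rough sketch of it. It assumes the "meter" variant, which rejects overflowing requests instead of queueing them. The class name, its parameters, and the injectable `clock` argument (added only to make the behaviour easy to test) are assumptions for this example.

```python
import time

class LeakyBucket:
    def __init__(self, leak_rate, capacity, clock=time.monotonic):
        self.leak_rate = leak_rate    # requests drained per second
        self.capacity = capacity      # how much the bucket can hold
        self.level = 0.0              # current fill level
        self.clock = clock            # hypothetical hook for a fake clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Drain at a constant rate for the time elapsed since the last call
        self.level = max(0.0, self.level - (now - self.last) * self.leak_rate)
        self.last = now

        # Admit the request only if it fits without overflowing
        if self.level + 1 <= self.capacity:
            self.level += 1
            return True
        return False
```

With `leak_rate=1.0` and `capacity=2`, two back-to-back requests are admitted, a third is rejected, and one more is admitted after a second has passed.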

---

### Fixed Window Rate Limiting

#### Demonstration (In-Memory)

```python
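import time

WINDOW = 60          # window length in seconds
LIMIT = 5            # max requests per client per window

# client_id -> (window index, request count in that window)
counters = {}

def allow_request(client_id):
    # All timestamps in the same WINDOW-second span share one index
    window_index = int(time.time() // WINDOW)
    index, count = counters.get(client_id, (window_index, 0))

    # A new window has started: reset the counter
    if index != window_index:
        count = 0

    if count < LIMIT:
        counters[client_id] = (window_index, count + 1)
        return True

    counters[client_id] = (window_index, count)
    return False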
```
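A fixed window has a known weakness: a client can send `LIMIT` requests at the end of one window and `LIMIT` more at the start of the next. The sliding window model from the table smooths this out. One common approximation is the sliding window counter, which weights the previous window's count by how much of it still overlaps the trailing window. The sketch below is illustrative only; `allow_request_sliding`, `state`, and the optional `now` parameter are names assumed for this example.

```python
import time

WINDOW = 60          # window length in seconds
LIMIT = 5            # max requests per client per trailing window

# client_id -> (window index, count in that window, count in previous window)
state = {}

def allow_request_sliding(client_id, now=None):
    now = time.time() if now is None else now
    index = int(now // WINDOW)
    cur_index, cur, prev = state.get(client_id, (index, 0, 0))

    if index == cur_index + 1:
        prev, cur = cur, 0        # rolled over into the next window
    elif index != cur_index:
        prev, cur = 0, 0          # idle for more than one full window

    # Fraction of the current window that has already elapsed
    elapsed = (now % WINDOW) / WINDOW
    # Weighted estimate of requests in the trailing WINDOW seconds
    estimate = prev * (1 - elapsed) + cur

    allowed = estimate + 1 <= LIMIT
    if allowed:
        cur += 1
    state[client_id] = (index, cur, prev)
    return allowed
```

The estimate assumes requests in the previous window were evenly spread, so it is an approximation; an exact limiter would store a timestamp per request at a higher memory cost.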

---

### Token Bucket Rate Limiting

#### Demonstration

```python
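import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start with a full bucket
        self.last = time.monotonic()  # monotonic clock ignores wall-clock jumps

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

        # Each request spends one token
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False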
```
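The two parameters bound traffic in a simple way. Starting from a full bucket, a client can spend `capacity` tokens immediately and then at most `rate` tokens per second afterwards, so the number of requests $N(T)$ admitted in any interval of $T$ seconds satisfies:

$$
N(T) \le \text{capacity} + \text{rate} \cdot T
$$

As an illustration with assumed values, `rate = 2` and `capacity = 10` allow at most $10 + 2 \cdot 60 = 130$ requests in any 60-second interval, while the long-run average cannot exceed 2 requests per second.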

---

### Redis-Based Distributed Rate Limiter

#### Demonstration

```python
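import time

import redis

r = redis.Redis()

def allow_request(key, limit, window):
    # Fixed-window counter: every request in the same window shares one key
    bucket = int(time.time()) // window
    redis_key = f"rate:{key}:{bucket}"

    # Send INCR and EXPIRE in one MULTI/EXEC transaction so a crash between
    # the two calls cannot leave a counter without a TTL
    pipe = r.pipeline()
    pipe.incr(redis_key)
    pipe.expire(redis_key, window)
    count, _ = pipe.execute()

    return count <= limit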
```

---

### LLM API Protection Example

```python
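# Assumes `allow_request` is the Redis-based limiter above and `llm` is an
# already-configured chat model client whose invoke() result has `.content`
def handle_request(user_id, prompt):
    # Check the limit first so rejected requests never reach the paid LLM call
    # (10 requests per user per 60-second window)
    if not allow_request(user_id, 10, 60):
        return "429 Too Many Requests"

    return llm.invoke(prompt).content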
```

---

### Mental Model

```
Rate Limiting = Traffic control for your system
```

---

### Key Takeaways

* Essential for protecting APIs and LLM systems
* Prevents abuse and runaway costs
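* Redis-based limiters are a common production choice because all application servers share the same counters
* Combine with caching so repeated requests are served without consuming quota or backend capacity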