
---

# 🚧 Rate Limiting (Middleware) & Throttling

> **Intent** → Protect your API and upstream dependencies by **controlling request rates** per client, route, or resource.

---

## 🧭 Why Rate Limit

* Prevent **abuse** & brute force
* Protect **DB/external APIs** from overload
* Ensure **fair use** across tenants/keys
* Stabilize latency during traffic spikes

---

## 🧮 Core Algorithms

* **Token Bucket** → steady average with short bursts (most common)
* **Leaky Bucket** → smooth, constant drain (good for shaping)
* **Fixed Window** → simple, risk of edge bursts
* **Sliding Window** → fairer than fixed, slightly heavier
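
To make the token-bucket idea concrete, here is a minimal single-process sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are illustrative assumptions, not part of any particular library; a production limiter would normally keep this state in a shared store.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # steady average (tokens per second)
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With `rate=1.0` and `capacity=2.0`, two back-to-back requests pass, a third is denied, and one more is allowed after a second has elapsed.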

---

## 🎯 Keys & Dimensions

* **Who**: API key, user ID, IP, tenant, OAuth client
* **What**: route/path, method, resource (e.g., `org_id`)
* **Where**: region/edge vs core API
* Combine dimensions for granular control (e.g., `tenant + route`)
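
One way to combine dimensions is to build a single counter key from them. The helper below is a sketch; the `rl:` prefix, the function name, and the argument layout are assumptions chosen for illustration.

```python
from typing import Optional


def rate_limit_key(who: str, route: Optional[str] = None,
                   method: Optional[str] = None) -> str:
    """Join the chosen dimensions into one counter key.

    `who` is the identity dimension (e.g. "tenant:42" or "ip:203.0.113.7").
    `route` should be a route template such as "/orgs/{org_id}", not the raw
    path, so the number of distinct keys stays bounded.
    """
    parts = ["rl", who]
    if method:
        parts.append(method.upper())
    if route:
        parts.append(route)
    return ":".join(parts)
```

For example, `rate_limit_key("tenant:42", "/search", "get")` yields `rl:tenant:42:GET:/search`, while omitting the route gives a tenant-wide key.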

---

## 🗄️ State & Storage

* **In-memory** → single-instance demos; not distributed
* **Redis** → production-grade, atomic ops, expirations
* **CDN/Edge** → offload at the edge (per-PoP limits)
* Store **counters** with TTLs; in Redis, use Lua scripts to keep multi-step updates atomic
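
The counter-with-TTL pattern can be sketched without a real store. The in-memory `TtlCounterStore` below is a hypothetical stand-in: in Redis, the increment-and-set-expiry pair would typically run as one atomic step (for instance inside a Lua script), whereas this sketch is only safe within a single thread.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCounterStore:
    """In-memory stand-in for a counter store with per-key expirations."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def incr_with_ttl(self, key: str, ttl_seconds: float) -> int:
        """Increment `key`; the TTL is set only when the key is (re)created."""
        now = self._clock()
        count, expires_at = self._data.get(key, (0, 0.0))
        if count == 0 or now >= expires_at:
            # New or expired key: start a fresh window.
            count, expires_at = 0, now + ttl_seconds
        count += 1
        self._data[key] = (count, expires_at)
        return count


def allow_fixed_window(store: TtlCounterStore, key: str,
                       limit: int, window_seconds: float) -> bool:
    """Fixed-window check: allow while the window's counter is within `limit`."""
    return store.incr_with_ttl(key, window_seconds) <= limit
```

Because the window starts at the first request for a key, the counter resets once `window_seconds` have passed since that request.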

---

## 🔁 Policies & Tiers

* Default global limit (e.g., **100 req/min**)
* Per-route stricter limits (e.g., login, search)
* Premium tiers with higher quotas/burst sizes
* **Exempt** internal healthchecks, webhooks from trusted partners (carefully)
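
A policy table along these lines might look like the sketch below. Only the 100 req/min default comes from the list above; the route names, the other numbers, the tier multipliers, and `resolve_policy` itself are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    limit: int            # requests allowed per window
    window_seconds: int
    burst: int            # extra short-term allowance


DEFAULT = Policy(limit=100, window_seconds=60, burst=20)
# Stricter limits for sensitive or expensive routes (illustrative values).
ROUTE_OVERRIDES = {
    "/login": Policy(limit=5, window_seconds=60, burst=5),
    "/search": Policy(limit=30, window_seconds=60, burst=10),
}
TIER_MULTIPLIER = {"free": 1, "premium": 5}   # premium gets higher quotas
EXEMPT_ROUTES = {"/healthz"}                  # keep this set small and audited


def resolve_policy(route: str, tier: str = "free") -> Optional[Policy]:
    """Return the effective policy, or None when the route is exempt."""
    if route in EXEMPT_ROUTES:
        return None
    base = ROUTE_OVERRIDES.get(route, DEFAULT)
    factor = TIER_MULTIPLIER.get(tier, 1)     # unknown tiers fall back to 1x
    return Policy(base.limit * factor, base.window_seconds, base.burst * factor)
```

Keeping the table as plain data makes it straightforward to load from configuration per environment or tenant.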

---

## 🚦 Responses & Headers

* On limit breach: **429 Too Many Requests** with retry guidance (e.g., a `Retry-After` header)
* Return the conventional rate-limit headers (widely used, though not a formal standard):
  * `X-RateLimit-Limit`
  * `X-RateLimit-Remaining`
  * `X-RateLimit-Reset` (epoch seconds)
* Keep error bodies **machine-readable** (e.g., `code`, `retry_after`)
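
As an illustration, a framework-agnostic helper could assemble the 429 response like this. The function name, the `rate_limited` error code, and the `(status, headers, body)` return shape are assumptions made for the sketch.

```python
import json
import math
from typing import Dict, Tuple


def too_many_requests(limit: int, reset_epoch: float,
                      now_epoch: float) -> Tuple[int, Dict[str, str], str]:
    """Build status, headers, and JSON body for a rate-limited request."""
    # Round up so clients never retry slightly too early.
    retry_after = max(0, math.ceil(reset_epoch - now_epoch))
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),           # delay in seconds
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_epoch)),  # epoch seconds
    }
    body = json.dumps({"code": "rate_limited", "retry_after": retry_after})
    return 429, headers, body
```

The same `retry_after` value appears in both the header and the body, so clients can use whichever they parse more easily.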

---

## 🔐 Security & Fairness

* Normalize client **IPs** behind proxies (trust `X-Forwarded-For` only when it was set by known proxies)
* Limit **authenticated users** separately from anonymous, IP-keyed traffic
* Block obvious **bot patterns**; route suspicious clients to a slower, stricter path
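
A common way to normalize client IPs is to walk `X-Forwarded-For` from right to left and stop at the first hop that is not one of your own proxies. The sketch below assumes a hypothetical `TRUSTED_PROXIES` range of `10.0.0.0/8`; substitute the ranges your load balancers actually use.

```python
import ipaddress

# Hypothetical: address ranges of proxies/load balancers you operate.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def _is_trusted(addr: str) -> bool:
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False  # not a valid IP, so certainly not one of our proxies
    return any(ip in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr: str, x_forwarded_for: str = "") -> str:
    """Pick the address to rate-limit on.

    The header is honored only when the direct peer is a trusted proxy;
    otherwise a client could spoof it to dodge or poison limits.
    """
    if not _is_trusted(remote_addr):
        return remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # The rightmost entries were appended by our own proxies; the first
    # untrusted hop from the right is the closest address we can vouch for.
    for hop in reversed(hops):
        if not _is_trusted(hop):
            return hop
    return remote_addr
```

Entries to the left of the chosen hop are client-supplied and are deliberately ignored.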

---

## ⚙️ Implementation Tips

* Place limiter **before** heavy work (auth/DB) when possible
* Use **idempotency keys** so a retried request is not applied or charged twice
* Bound **burst** size to protect downstream pools
* Make policies **config-driven** (per env/tenant)
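
To show the limiter sitting in front of heavy work, here is a minimal WSGI middleware sketch. `rate_limit_middleware`, the `allow` callback, and keying on `REMOTE_ADDR` are illustrative assumptions; a real deployment would plug in its own key function and policy lookup.

```python
from typing import Callable, Iterable, List, Tuple

StartResponse = Callable[[str, List[Tuple[str, str]]], object]


def rate_limit_middleware(app: Callable, allow: Callable[[str], bool],
                          retry_after_seconds: int = 1) -> Callable:
    """Wrap a WSGI app so denied requests never reach it."""

    def wrapped(environ: dict, start_response: StartResponse) -> Iterable[bytes]:
        key = environ.get("REMOTE_ADDR", "unknown")
        if not allow(key):
            # Reject early: no auth lookup, no DB work, no downstream calls.
            start_response("429 Too Many Requests", [
                ("Content-Type", "text/plain"),
                ("Retry-After", str(retry_after_seconds)),
            ])
            return [b"rate limited\n"]
        return app(environ, start_response)

    return wrapped
```

The `allow` callback can be any limiter decision function, which keeps the middleware independent of the algorithm and storage behind it.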

---

## 📊 Observability

* Track **allow/deny counts**, **p95 latency**, **hot routes**, **top keys**
* Alert on **deny spikes** and **sustained saturation**
* Correlate with **upstream errors** (DB errors/timeouts, external 429s)

---

## 🧪 Testing

* Simulate burst & steady traffic; assert 429 timing
* Verify headers and **reset semantics**
* Test **distributed** behavior (multiple instances with shared Redis)
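
Burst and steady-traffic tests are easier to write when time is passed in explicitly instead of read from the wall clock. The sketch below uses a small sliding-window-log limiter defined inline; `SlidingWindowLog`, `simulate`, and the test names are hypothetical, and the same pattern applies to whichever limiter you actually ship.

```python
from collections import deque
from typing import Deque, List


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits: Deque[float] = deque()  # timestamps of allowed requests

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._hits and self._hits[0] <= now - self.window:
            self._hits.popleft()
        if len(self._hits) < self.limit:
            self._hits.append(now)
            return True
        return False


def simulate(limiter: SlidingWindowLog, arrivals: List[float]) -> List[bool]:
    """Feed arrival timestamps to the limiter and collect its decisions."""
    return [limiter.allow(t) for t in arrivals]


def test_burst_is_cut_off_at_limit() -> None:
    decisions = simulate(SlidingWindowLog(3, 1.0), [0.0, 0.1, 0.2, 0.3, 0.4])
    assert decisions == [True, True, True, False, False]


def test_steady_traffic_under_limit_is_never_denied() -> None:
    arrivals = [i * 0.5 for i in range(10)]  # 2 req/s against a 3 req/s limit
    assert all(simulate(SlidingWindowLog(3, 1.0), arrivals))


def test_capacity_returns_after_window() -> None:
    assert simulate(SlidingWindowLog(1, 1.0), [0.0, 0.5, 1.0]) == [True, False, True]
```

The functions follow pytest naming conventions, but they are plain functions and can also be called directly.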

---

## ✅ Outcome

A **predictable, resilient** API under load: abusive traffic is **throttled**, legitimate clients get **fair access**, and downstream systems stay **healthy**.
