Rate Limiting

Matt Dula edited this page Apr 18, 2026 · 1 revision

Per-API-key fixed-window throttling. Default: off. Opt-in globally, override per key.

Why

Agents loop. Agents retry. Agents have off-by-one bugs. Without a cap, a misbehaving agent can OOM your Postgres or DDoS your own /contacts list.

How it works

Every API-key request runs a single atomic UPDATE:

UPDATE api_keys
   SET usage_window_start = CASE
         WHEN usage_window_start IS NULL
              OR now() - usage_window_start >= interval '60 seconds'
         THEN now()
         ELSE usage_window_start END,
       usage_count = CASE
         WHEN usage_window_start IS NULL
              OR now() - usage_window_start >= interval '60 seconds'
         THEN 1
         ELSE usage_count + 1 END,
       last_used_at = now()
 WHERE id = :key_id
RETURNING usage_count, usage_window_start;

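If the returned usage_count exceeds the key's limit, the request is rejected with 429 and a Retry-After header giving the seconds remaining in the current window. Rejected requests still increment usage_count, because the UPDATE runs before the check.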
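
The UPDATE and the limit check can be mirrored in a few lines of Python for reasoning or unit tests. This is an in-memory sketch, not Nakatomi's code: KeyState, record_hit, and WINDOW_SECONDS are hypothetical names, and the Retry-After arithmetic (window length minus elapsed time, rounded up) is an assumption about how "seconds remaining" is derived.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

WINDOW_SECONDS = 60.0  # matches interval '60 seconds' in the UPDATE


@dataclass
class KeyState:
    """Hypothetical in-memory stand-in for one api_keys row."""
    usage_window_start: Optional[float] = None  # epoch seconds
    usage_count: int = 0


def record_hit(state: KeyState, now: float, limit: int) -> Tuple[bool, int]:
    """Apply the same CASE logic as the UPDATE, then check the limit.

    Returns (allowed, retry_after_seconds); retry_after is 0 when allowed.
    A limit of 0 is treated as "off", as with the global default.
    """
    start = state.usage_window_start
    if start is None or now - start >= WINDOW_SECONDS:
        # Window missing or expired: open a new one and count this request.
        state.usage_window_start = now
        state.usage_count = 1
    else:
        # Still inside the window: increment, even if the request is rejected.
        state.usage_count += 1

    if limit > 0 and state.usage_count > limit:
        # Assumed derivation of Retry-After: time left until the window expires.
        remaining = WINDOW_SECONDS - (now - state.usage_window_start)
        return False, max(1, math.ceil(remaining))
    return True, 0
```

With a limit of 2, hits at t=0 and t=1 pass, a hit at t=2 is rejected with a retry of 58 seconds, and a hit at t=60 opens a fresh window.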

Fixed-window means a burst just before the window resets plus a burst just after can let through up to 2× the limit within a few seconds. True sliding windows are planned for v2. For almost every use case, fixed-window is fine.
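
The boundary effect is easy to reproduce in a small simulation. The sketch below is illustrative only: allowed_timestamps and the limit of 10 are made up for the demo, and the counter follows the UPDATE's rule that a new window starts at the first request after the old one expires.

```python
WINDOW = 60.0  # seconds, as in the UPDATE above
LIMIT = 10     # hypothetical per-minute limit for the demo


def allowed_timestamps(request_times, limit=LIMIT, window=WINDOW):
    """Return the timestamps a fixed-window counter would let through."""
    start, count, allowed = None, 0, []
    for t in request_times:
        if start is None or t - start >= window:
            start, count = t, 1  # new window begins at this request
        else:
            count += 1
        if count <= limit:
            allowed.append(t)
    return allowed


# One request opens the window at t=0, nine more arrive just before it
# expires, and ten more arrive just after the reset at t=60.
times = (
    [0.0]
    + [59.0 + i * 0.1 for i in range(9)]
    + [60.0 + i * 0.1 for i in range(10)]
)
ok = allowed_timestamps(times)

# Accepted requests inside the 2-second span from t=59 to t=61.
burst = [t for t in ok if 59.0 <= t <= 61.0]
print(len(ok), len(burst))
```

All 20 requests are accepted, and 19 of them land inside a two-second span, which is nearly twice the nominal limit of 10 per minute.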

Configure

Global default

API_KEY_RATE_LIMIT_PER_MINUTE=120   # requests / minute / key

0 (default) disables rate limiting globally.

Per-key override

At creation time:

curl -X POST http://localhost:8000/workspace/api-keys \
  -H "Authorization: Bearer nk_<admin>" \
  -H "Content-Type: application/json" \
  -d '{"name":"sdr-agent","role":"member","rate_limit_per_minute":60}'

A PATCH endpoint for changing the limit on an existing key is planned for v2. For now, revoke the key and re-create it with the new limit.

Resolution order: per-key rate_limit_per_minute → global default → off.
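
The resolution order can be sketched as a small function. This is an illustration under assumptions, not the server's code: global_limit and effective_limit are hypothetical names, a key without an override is assumed to store NULL (None here), and how a per-key value of 0 is treated is not documented, so the sketch simply treats it as off for that key.

```python
import os
from typing import Mapping, Optional


def global_limit(environ: Mapping[str, str] = os.environ) -> int:
    """Read API_KEY_RATE_LIMIT_PER_MINUTE; unset or empty is treated as 0 (off)."""
    raw = environ.get("API_KEY_RATE_LIMIT_PER_MINUTE", "").strip()
    return int(raw) if raw else 0


def effective_limit(per_key: Optional[int], global_default: int) -> Optional[int]:
    """Resolve per-key override -> global default -> off.

    Returns the requests-per-minute cap, or None when limiting is off.
    Assumption: None means "no override"; a per-key 0 is treated as off.
    """
    if per_key is not None:
        return per_key if per_key > 0 else None
    return global_default if global_default > 0 else None
```

For example, effective_limit(60, 120) gives 60, effective_limit(None, 120) gives 120, and effective_limit(None, 0) gives None.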

Response on limit

HTTP/1.1 429 Too Many Requests
Retry-After: 42
Content-Type: application/json

{"error":"rate limit exceeded (60/min); try again shortly"}

Agents should respect Retry-After. Claude's MCP client does this automatically; other clients vary.
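
For clients that do not handle this themselves, a retry wrapper along the following lines is one option. It is a minimal sketch: call_with_retry, the Response tuple shape, and fallback_delay are hypothetical, and the transport is passed in as a callable so the wrapper stays independent of any particular HTTP library.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    fallback_delay: float = 1.0,
) -> Response:
    """Call send() and, on 429, wait Retry-After seconds before retrying.

    Returns the first non-429 response, or the last 429 once max_attempts
    is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Header names are case-insensitive; normalize before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            delay = float(lowered.get("retry-after", ""))
        except ValueError:
            delay = fallback_delay  # header missing or not a number of seconds
        sleep(max(0.0, delay))
        response = send()
    return response
```

Injecting sleep makes the wrapper easy to test without real waiting; in production code the default time.sleep applies.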

Observing usage

The api_keys.usage_count and usage_window_start columns reflect only the current window. For historical data, mirror requests into your logging layer (Nakatomi logs to stdout). Dedicated rate-limit metrics are on the v2 roadmap.

Recommended limits

| Use case | Suggested limit |
|---|---|
| Interactive agent (Claude Desktop, Cursor) | 60-120 / min |
| Batch import / bulk upsert | 600 / min on a dedicated key |
| Webhook relay agent | 30 / min (most loads are low) |
| Read-only dashboard user | 30 / min |
| Service account for integrations | 300-600 / min |

When in doubt: set a conservative number and bump it when you see 429s in logs. Cheaper than over-provisioning Postgres.

What's NOT rate-limited

  • User JWT requests (no counter on the User row in v1). Protect these at the load balancer if it matters.
  • Discovery endpoints (/health, /schema, /llms.txt, /.well-known/) — they're cheap and unauthenticated.

Testing

Set a low limit on a test key, then send requests in a loop:

for i in $(seq 1 100); do
  curl -o /dev/null -s -w "%{http_code}\n" \
    http://localhost:8000/contacts \
    -H "Authorization: Bearer nk_..."
done | sort | uniq -c

With a limit below 100, expect a mix of 200s and 429s.
