Rate Limiting

Per-API-key fixed-window throttling. Default: off. Opt-in globally, override per key.

Why

Agents loop. Agents retry. Agents have off-by-one bugs. Without a cap, a misbehaving agent can OOM your Postgres or DDoS your own /contacts list.

How it works

Every API-key request runs a single atomic UPDATE:

UPDATE api_keys
   SET usage_window_start = CASE
         WHEN usage_window_start IS NULL
              OR now() - usage_window_start >= interval '60 seconds'
         THEN now()
         ELSE usage_window_start END,
       usage_count = CASE
         WHEN usage_window_start IS NULL
              OR now() - usage_window_start >= interval '60 seconds'
         THEN 1
         ELSE usage_count + 1 END,
       last_used_at = now()
 WHERE id = :key_id
RETURNING usage_count, usage_window_start;

If the returned usage_count exceeds the limit, the request gets 429 with a Retry-After header carrying seconds remaining in the current window.
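
To make the window logic easier to follow, here is a minimal in-memory sketch of what the UPDATE and the 429 check do together. It is an illustration, not Nakatomi's implementation: `KeyState`, `record_request`, and the choice to round Retry-After up to at least 1 second are assumptions made for this example.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

WINDOW_SECONDS = 60  # matches the interval in the UPDATE above


@dataclass
class KeyState:
    """Hypothetical in-memory stand-in for the usage columns on an api_keys row."""
    usage_window_start: Optional[float] = None  # epoch seconds
    usage_count: int = 0


def record_request(state: KeyState, now: float, limit: int) -> Tuple[bool, int]:
    """Count one request and return (allowed, retry_after_seconds)."""
    expired = (
        state.usage_window_start is None
        or now - state.usage_window_start >= WINDOW_SECONDS
    )
    if expired:
        # Start a new window; this request is the first one in it.
        state.usage_window_start = now
        state.usage_count = 1
    else:
        state.usage_count += 1

    if state.usage_count > limit:
        remaining = WINDOW_SECONDS - (now - state.usage_window_start)
        # Assumption: round up and never report less than 1 second.
        return False, max(1, math.ceil(remaining))
    return True, 0


state = KeyState()
results = [record_request(state, now=1000.0 + i, limit=2) for i in range(3)]
print(results)  # [(True, 0), (True, 0), (False, 58)]
```

Note that rejected requests still increment the counter, just as the UPDATE runs before the limit check.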

Fixed-window means a burst right before the reset plus a burst right after it can let through up to roughly 2× the limit within a second or two. If you need true sliding windows, that's v2. For almost every use case, fixed-window is fine.
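
The boundary effect described above can be reproduced with a small simulation. This is a sketch under the same window rule as the UPDATE (a new window starts on the first request at least 60 seconds after the previous window start); `accepted_timestamps` and the timestamps are made up for illustration. Because the window is anchored on its first request, the sketch lets through 2 × limit − 1 requests inside one second.

```python
WINDOW_SECONDS = 60


def accepted_timestamps(request_times, limit):
    """Return the timestamps a fixed-window limiter would accept."""
    window_start = None
    count = 0
    accepted = []
    for t in sorted(request_times):
        if window_start is None or t - window_start >= WINDOW_SECONDS:
            # Window expired (or never started): reset the counter.
            window_start, count = t, 0
        count += 1
        if count <= limit:
            accepted.append(t)
    return accepted


limit = 5
# One request opens the window at t=0, four more arrive just before it
# expires, and six arrive right after the reset at t=60.
times = [0.0, 59.5, 59.6, 59.7, 59.8, 60.0, 60.1, 60.2, 60.3, 60.4, 60.5]

accepted = accepted_timestamps(times, limit)
burst = [t for t in accepted if t >= 59.5]
print(len(accepted), len(burst))  # 10 9
```

Only the request at t=60.5 is rejected, even though nine requests landed between t=59.5 and t=60.4 against a limit of five per minute.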

Configure

Global default

API_KEY_RATE_LIMIT_PER_MINUTE=120   # requests / minute / key

0 (default) disables rate limiting globally.

Per-key override

At creation time:

curl -X POST http://localhost:8000/workspace/api-keys \
  -H "Authorization: Bearer nk_<admin>" \
  -H "Content-Type: application/json" \
  -d '{"name":"sdr-agent","role":"member","rate_limit_per_minute":60}'

Changing the limit on an existing key via PATCH is planned for v2. For now, revoke the key and re-create it with the new limit.

Resolution order: per-key rate_limit_per_minute → global default → off.
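
The resolution order can be sketched as a small function. The name `effective_limit` is hypothetical, and two details are assumptions the text above does not settle: that an unset per-key override is represented as None, and that a per-key value of 0 turns limiting off for that key rather than falling through to the global default.

```python
from typing import Optional


def effective_limit(per_key: Optional[int], global_default: int = 0) -> Optional[int]:
    """Return the requests/minute to enforce, or None when limiting is off."""
    if per_key is not None:
        # Per-key override wins whenever it is set (assumption: None means unset).
        limit = per_key
    else:
        # Otherwise fall back to API_KEY_RATE_LIMIT_PER_MINUTE.
        limit = global_default
    # 0 disables limiting, matching the documented global default.
    return limit if limit > 0 else None


print(effective_limit(60, 120))    # 60
print(effective_limit(None, 120))  # 120
print(effective_limit(None, 0))    # None
```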

Response on limit

HTTP/1.1 429 Too Many Requests
Retry-After: 42
Content-Type: application/json

{"error":"rate limit exceeded (60/min); try again shortly"}

Agents should respect Retry-After. Claude's MCP client does this automatically; other clients vary.
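
For clients that do not handle this on their own, a retry loop along these lines is one way to honor the header. It is a sketch, not part of Nakatomi: `call_with_retry`, the `send` callable, and the 1-second fallback are assumptions, and it only understands the integer-seconds form of Retry-After shown above. Header lookup here is case-sensitive, so adapt it to whatever your HTTP client returns.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
    fallback_delay: float = 1.0,
) -> Response:
    """Call send(); on 429, wait for Retry-After seconds and try again."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers = response
        if status != 429:
            break
        raw = headers.get("Retry-After", "")
        # Use the server's hint when it is a plain number of seconds.
        delay = float(raw) if raw.isdigit() else fallback_delay
        sleep(delay)
        response = send()
    return response


# Usage with a fake transport: first a 429, then a success.
responses = iter([(429, {"Retry-After": "42"}), (200, {})])
slept = []
result = call_with_retry(lambda: next(responses), sleep=slept.append)
print(result[0], slept)  # 200 [42.0]
```

Injecting `send` and `sleep` keeps the loop testable without a network or real waiting; in practice `send` would wrap the actual HTTP request.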

Observing usage

The api_keys.usage_count and usage_window_start columns reflect the current window only. For historical data, mirror requests into your logging layer (Nakatomi logs to stdout); dedicated rate-limit metrics are on the v2 roadmap.

Recommended limits

Use case	Suggested limit
Interactive agent (Claude Desktop, Cursor)	60-120 / min
Batch import / bulk upsert	600 / min on a dedicated key
Webhook relay agent	30 / min (most loads are low)
Read-only dashboard user	30 / min
Service account for integrations	300-600 / min

When in doubt: set a conservative number and bump it when you see 429s in logs. Cheaper than over-provisioning Postgres.

What's NOT rate-limited

User JWT requests (no counter on the User row in v1). Protect these at the load balancer if it matters.
Discovery endpoints (/health, /schema, /llms.txt, /.well-known/) — they're cheap and unauthenticated.

Testing

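Set a low limit on a test key, then loop and tally the status codes. If the loop finishes within one 60-second window, expect as many successful responses as the limit and 429s for the rest: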

for i in $(seq 1 100); do
  curl -o /dev/null -s -w "%{http_code}\n" \
    http://localhost:8000/contacts \
    -H "Authorization: Bearer nk_..."
done | sort | uniq -c
