Rate Limit

wody edited this page May 24, 2026 · 2 revisions

Rate Limit (per-IP)

Introduced in v0.56.0 as Phase 35. Per-IP token bucket guarding mutating + WebSocket + login endpoints against credential stuffing and runaway clients. Zero external deps — in-memory state, lock-free refill.

When to use

  • LAN-only deployment: the default enabled: true is harmless; legitimate browser traffic stays well within the 120-token api bucket (≥1 token per request, 2 tok/s refill).
  • External / public-internet deployment: leave it on. The auth bucket (cap 10, 0.2 tok/s) allows a burst of 10 /login attempts per IP, then a sustained 12/min (one every 5 s), which makes credential stuffing impractical.
  • Behind nginx / Cloudflare with their own rate limit: disable it in server.yml (security.rateLimit.enabled: false) to avoid double-counting.
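
The bucket figures above translate into per-IP request budgets with simple arithmetic. The following is a rough sketch, assuming each request costs exactly 1 token and the client starts with a full bucket; max_requests is a hypothetical helper for illustration, not part of the project.

```python
def max_requests(capacity: float, refill_per_second: float, window_seconds: float) -> float:
    """Upper bound on requests one IP can make in a window,
    starting from a full bucket and spending 1 token per request."""
    return capacity + refill_per_second * window_seconds


# auth bucket: burst of 10, then 0.2 tok/s
auth_first_minute = max_requests(10, 0.2, 60)      # burst plus one minute of refill
auth_sustained_per_minute = 0.2 * 60               # steady state once the burst is spent

# api bucket: burst of 120, then 2 tok/s
api_first_minute = max_requests(120, 2.0, 60)
api_sustained_per_minute = 2.0 * 60
```

Under these assumptions the worst case for /login is about 22 attempts in the first minute and 12 per minute afterwards, which is why the burst capacity matters as much as the refill rate when tuning the auth bucket.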

Configuration

# server.yml
security:
  rateLimit:
    enabled: true
    apiCapacity: 120        # /api/ + /ws/ buckets per IP
    apiRefillPerSecond: 2.0
    authCapacity: 10        # /login + /api/auth/login per IP
    authRefillPerSecond: 0.2

Env override:

  • VIBECODER_SECURITY_RATE_LIMIT_ENABLED=false (etc.)

Restart the container to apply.

What's covered

Path prefix      Bucket  Notes
/api/            api     Every JSON endpoint
/ws/             api     WebSocket handshake only: a long-lived stream costs 1 token; a closed connection pays again when it re-opens
/login (POST)    auth    SSR login form
/api/auth/login  auth    JSON login

Everything else (/static/*, SSR pages, /setup, /health) skips the limiter.
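
The routing described above can be sketched as a small classifier. This is an illustration only: bucket_for is a hypothetical name, and matching the login paths exactly (rather than by prefix) and ignoring the method for /api/auth/login are assumptions, since the page does not specify either.

```python
from typing import Optional


def bucket_for(method: str, path: str) -> Optional[str]:
    """Return "auth", "api", or None (limiter skipped) for a request."""
    # Login endpoints are checked first because /api/auth/login
    # would otherwise match the generic /api/ prefix.
    if path == "/api/auth/login":
        return "auth"
    if path == "/login" and method.upper() == "POST":
        return "auth"
    if path.startswith("/api/") or path.startswith("/ws/"):
        return "api"
    # /static/*, SSR pages, /setup, /health and anything else are not limited.
    return None
```

Checking the auth paths before the generic prefixes is the one ordering detail that matters here; with the order reversed, JSON logins would draw from the much larger api bucket.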

Admin bypass

When a request carries a Bearer token or cookie whose user has the admin role, the limiter short-circuits with null and the rate counter is not bumped for that request. This keeps backup, install, sub-agent fanout, and similar operations from accidentally throttling the operator.

Detection happens in the limiter phase, before installAuth runs its full lookup. It uses the same device-row read, so the cost is one SHA-256 hash plus one PG row read per request, which is negligible in practice.
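
As a rough illustration of the "one hash plus one row read" cost, the check could look like the sketch below. Everything here is hypothetical: the dictionary stands in for the PG device table, the token values and the "viewer" role are invented for the example, and keying the lookup by the SHA-256 digest of the token is an assumption about how the device row is found.

```python
import hashlib
from typing import Dict, Optional

# Stand-in for the device table: SHA-256 hex digest of the token -> role.
DEVICE_ROLES: Dict[str, str] = {
    hashlib.sha256(b"admin-token").hexdigest(): "admin",
    hashlib.sha256(b"viewer-token").hexdigest(): "viewer",
}


def is_admin(token: Optional[str]) -> bool:
    """One hash plus one lookup; missing or unknown tokens are not admin."""
    if not token:
        return False
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    return DEVICE_ROLES.get(digest) == "admin"


def should_rate_limit(token: Optional[str]) -> bool:
    """Admins bypass the limiter, so their requests never touch a bucket."""
    return not is_admin(token)
```

Unauthenticated requests (no token) always go through the limiter in this sketch, which matters for /login: the endpoint most exposed to abuse cannot benefit from the bypass.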

429 response

HTTP/1.1 429 Too Many Requests
Retry-After: 4
Content-Type: application/json

{"code":"rate_limited","message":"too many requests","retryAfter":4}

The Retry-After value is integer seconds, rounded up — the limiter computes how long until 1 token becomes available given the current bucket deficit + the refill rate.
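
The rounding described above can be sketched as follows. The function names are hypothetical, and the sketch assumes the limiter needs exactly 1 token to admit the next request.

```python
import json
import math


def retry_after_seconds(tokens: float, refill_per_second: float) -> int:
    """Whole seconds until the bucket holds at least 1 token, rounded up."""
    deficit = max(0.0, 1.0 - tokens)
    return math.ceil(deficit / refill_per_second)


def rate_limited_body(retry_after: int) -> str:
    """JSON body in the same shape as the 429 response shown above."""
    return json.dumps(
        {"code": "rate_limited", "message": "too many requests", "retryAfter": retry_after},
        separators=(",", ":"),
    )


# An auth bucket holding 0.25 tokens at 0.2 tok/s needs 3.75 s, reported as 4.
wait = retry_after_seconds(0.25, 0.2)
body = rate_limited_body(wait)
```

Rounding up rather than to the nearest second means a client that honors Retry-After never retries while the bucket is still short of a full token.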

Metrics

Visible via Metrics (v0.55.0+):

Metric                                               Description
vibe_rate_limit_buckets_active{bucket="api"|"auth"}  Active IPs in the bucket map
vibe_rate_limit_429_total{path_bucket="api"|"auth"}  Cumulative 429 responses

Useful PromQL:

# 429 responses per second, averaged over the last 5 minutes
rate(vibe_rate_limit_429_total[5m])

# How many unique IPs are currently tracked?
vibe_rate_limit_buckets_active

Implementation notes

RateLimiter keeps a ConcurrentHashMap<ip, Bucket> in which each bucket stores tokens: Double and lastNanos: Long. Refill is lock-free in the hot path; only the tryAcquire block briefly holds synchronized(bucket) to subtract a token atomically.
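
For readers unfamiliar with token buckets, here is a minimal sketch of the refill-then-subtract step. It is not the project's code: it is written in Python, takes one lock around both refill and subtraction for simplicity (the real limiter is described as refilling lock-free), and assumes each request costs 1 token. The optional now_nanos parameter is an addition that makes the behavior reproducible.

```python
import threading
import time
from typing import Optional


class Bucket:
    """Token bucket holding the same two fields: tokens and last_nanos."""

    def __init__(self, capacity: float, refill_per_second: float,
                 now_nanos: Optional[int] = None) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # a new IP starts with a full bucket
        self.last_nanos = time.monotonic_ns() if now_nanos is None else now_nanos
        self._lock = threading.Lock()

    def try_acquire(self, now_nanos: Optional[int] = None) -> bool:
        now = time.monotonic_ns() if now_nanos is None else now_nanos
        with self._lock:
            # Lazy refill: credit tokens for the time elapsed since the last call.
            elapsed_seconds = max(0, now - self.last_nanos) / 1e9
            self.tokens = min(self.capacity,
                              self.tokens + elapsed_seconds * self.refill_per_second)
            self.last_nanos = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False
```

Because refill is computed lazily from the timestamp, no background timer is needed; an idle bucket costs nothing until its IP sends another request.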

There's a hard MAX_IPS = 10_000 safety cap. If it is exceeded (which, at single-user dev server scale, would take an attacker rotating IPs aggressively), the oldest half of the buckets is evicted. In normal single-user operation the map holds roughly 1–5 IPs.
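
The eviction rule can be sketched as below. This assumes "oldest" means the smallest last-seen timestamp, which the page does not state explicitly; track and evict_oldest_half are hypothetical names, and a plain dict of timestamps stands in for the bucket map.

```python
from typing import Dict

MAX_IPS = 10_000


def evict_oldest_half(last_seen_nanos: Dict[str, int]) -> None:
    """Drop the half of the entries with the smallest last-seen timestamp."""
    by_age = sorted(last_seen_nanos, key=lambda ip: last_seen_nanos[ip])
    for ip in by_age[: len(by_age) // 2]:
        del last_seen_nanos[ip]


def track(last_seen_nanos: Dict[str, int], ip: str, now_nanos: int,
          max_ips: int = MAX_IPS) -> None:
    """Record activity for an IP and evict once the map exceeds the cap."""
    last_seen_nanos[ip] = now_nanos
    if len(last_seen_nanos) > max_ips:
        evict_oldest_half(last_seen_nanos)
```

Evicting half the map at once, rather than one entry per insert, keeps the sort off the hot path: after an eviction, thousands of new IPs can be added before the cap is hit again.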

State is in-memory only — restarting the server resets all buckets. For a single-user dev server this is fine; the burst window is short anyway.

Trade-offs

  • No persistent ban: a repeat abuser from the same IP keeps getting 429 until they slow down. Hard IP blocking is handled separately by the AuthService.ipFailures tracker (v0.12.4+), which catches multiple-account credential stuffing over 24 h.
  • No cluster sync: a k8s pod restart loses state, and a multi-replica deployment would need an external store (Redis). A single-user, single-replica dev server does not.
