FWS-10 — Rate limit configurability + orchestration-friendly defaults + cancel-bucket exemption (follow-up to #31, #88) #110

@initializ-mk

Part of the Forge backlog. Effort: S (2–3 engineer-days). Risk: low (additive config; behavior change opt-in via config). Depends on: nothing. Follow-up to FWS-4 (#88) — surfaced during manual testing of cancellation scenarios.

Scope

Three orthogonal fixes to the per-IP A2A rate limiter built in issue #31:

  1. Make RateLimitConfig configurable via forge.yaml + CLI flags. Today it's a struct in forge-cli/server/a2a_server.go with no surface — operators can't override without forking.
  2. Raise the default WriteBurst so that bursty orchestrator dispatch (5 parallel tasks, a cron firing several jobs at once) doesn't immediately throttle.
  3. Exempt tasks/cancel from the write bucket (or give it its own permissive bucket). Cancellation is the most rate-limit-sensitive surface in the whole protocol — throttling it amplifies the problem it's trying to solve.

The current code (per defaultRateLimitConfig in forge-cli/server/a2a_server.go:413):

return &RateLimitConfig{
    ReadRPS:    1.0,         // ~60/min
    ReadBurst:  10,
    WriteRPS:   10.0 / 60.0, // ~10/min
    WriteBurst: 3,
}

These defaults match the design intent from #31 (60 req/min reads, 10 req/min writes) and the read side is fine. The write side defaults are too aggressive once you consider orchestrated workloads — see "Why this matters" below.

Why this matters

1. Parallel workflows

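A platform orchestrator firing N parallel agent calls (initializ WS-3) blows past WriteBurst=3 after the third dispatch. Subsequent calls wait 1/WriteRPS = 6 seconds each. A 10-step parallel stage becomes a serialized stage of roughly 42 seconds (3 immediate dispatches, then 7 more at 6-second intervals), and longer once client retry backoff is added.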

2. Cron bursts

A forge.yaml schedule with multiple cron entries that share a firing minute will see the 4th+ task throttled. The agent silently drops tasks the operator scheduled.

3. Cost-ceiling cancel bursts (the FWS-4 case)

When a workflow's cost ceiling is hit, the orchestrator wants to fire tasks/cancel against every in-flight agent in the workflow — possibly dozens. With tasks/cancel sharing the same bucket as tasks/send, the cancels are throttled at exactly the moment cancellation matters most. The FWS-4 manual test surfaced this: after running 4 cancellation scenarios in ~10 seconds, the 5th got -32603: rate limit exceeded from the middleware before reaching the cancel handler.

This is the most concerning of the three because cancellation is the recovery mechanism. Throttling the recovery mechanism turns a recoverable cost overrun into an extended one.

4. Per-IP grouping breaks behind a service IP

In k8s, multiple orchestrator pods sit behind a single service IP (or all hit the agent through one ingress IP). With per-IP rate limiting, the entire orchestrator fleet shares one bucket. The agent's effective dispatch capacity is 10 req/min total, regardless of how many orchestrator replicas are running.

Deliverables

1. Configurability via forge.yaml

New top-level block (alongside cors_origins):

server:
  rate_limit:
    read_rps: 1.0           # default 1.0 (60/min)
    read_burst: 10          # default 10
    write_rps: 1.0          # NEW DEFAULT 60/min (was 10/min)
    write_burst: 20         # NEW DEFAULT 20 (was 3)
    cancel_exempt: true     # default true — tasks/cancel skips the write limiter
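
One way the schema side could model this block in Go is sketched below. The type name RateLimitYAML and the effective() helper are hypothetical, and the pointer fields are an assumption chosen so that an absent key can be told apart from an explicit 0 or false. The fallback values are the proposed defaults from this issue. No YAML library is imported here, so the struct tags only document the intended key names.

```go
package main

import "fmt"

// RateLimitYAML is a hypothetical mirror of the server.rate_limit block.
// Pointer fields distinguish "key absent" from an explicit zero or false.
type RateLimitYAML struct {
	ReadRPS      *float64 `yaml:"read_rps"`
	ReadBurst    *int     `yaml:"read_burst"`
	WriteRPS     *float64 `yaml:"write_rps"`
	WriteBurst   *int     `yaml:"write_burst"`
	CancelExempt *bool    `yaml:"cancel_exempt"`
}

// RateLimitConfig holds the effective values handed to the limiter.
// CancelExempt is the field this issue proposes to add.
type RateLimitConfig struct {
	ReadRPS      float64
	ReadBurst    int
	WriteRPS     float64
	WriteBurst   int
	CancelExempt bool
}

// effective starts from the proposed defaults and overlays any keys
// that were actually present in the YAML block.
func (y RateLimitYAML) effective() RateLimitConfig {
	cfg := RateLimitConfig{ReadRPS: 1.0, ReadBurst: 10, WriteRPS: 1.0, WriteBurst: 20, CancelExempt: true}
	if y.ReadRPS != nil {
		cfg.ReadRPS = *y.ReadRPS
	}
	if y.ReadBurst != nil {
		cfg.ReadBurst = *y.ReadBurst
	}
	if y.WriteRPS != nil {
		cfg.WriteRPS = *y.WriteRPS
	}
	if y.WriteBurst != nil {
		cfg.WriteBurst = *y.WriteBurst
	}
	if y.CancelExempt != nil {
		cfg.CancelExempt = *y.CancelExempt
	}
	return cfg
}

func main() {
	// An operator tightening only the write burst and disabling the exemption.
	burst, exempt := 3, false
	cfg := RateLimitYAML{WriteBurst: &burst, CancelExempt: &exempt}.effective()
	fmt.Printf("%+v\n", cfg)
}
```

Using pointers keeps "cancel_exempt: false" distinguishable from an omitted key, which matters because the proposed default is true.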

Resolution order (mirror cors_origins):

  1. CLI flags: --rate-limit-write-rps, --rate-limit-write-burst, --rate-limit-read-rps, --rate-limit-read-burst, --rate-limit-cancel-exempt
  2. Env: FORGE_RATE_LIMIT_WRITE_RPS, FORGE_RATE_LIMIT_WRITE_BURST, etc.
  3. server.rate_limit in forge.yaml
  4. Defaults
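
The resolution order above can be sketched as a first-match-wins lookup. The helper names firstFloat, envFloat and resolveWriteRPS are hypothetical, and passing the CLI and YAML values as pointers (nil meaning "not set") is an assumption about how the runner would track which sources were provided. Only the environment variable name and the default of 1.0 come from this issue.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// firstFloat returns the first non-nil candidate, or def if all are nil.
func firstFloat(def float64, candidates ...*float64) float64 {
	for _, c := range candidates {
		if c != nil {
			return *c
		}
	}
	return def
}

// envFloat reads a float from the named environment variable.
// It returns nil when the variable is unset or unparsable; a real
// implementation would probably report the parse error instead.
func envFloat(name string) *float64 {
	raw, ok := os.LookupEnv(name)
	if !ok {
		return nil
	}
	v, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return nil
	}
	return &v
}

// resolveWriteRPS applies CLI flag > env > forge.yaml > default.
func resolveWriteRPS(cliFlag, yamlValue *float64) float64 {
	const defaultWriteRPS = 1.0 // proposed new default (60/min)
	return firstFloat(defaultWriteRPS, cliFlag, envFloat("FORGE_RATE_LIMIT_WRITE_RPS"), yamlValue)
}

func main() {
	yamlValue := 0.5
	// No CLI flag given; the result depends on whether the env var is set.
	fmt.Println(resolveWriteRPS(nil, &yamlValue))
}
```

The same pattern would repeat for the burst and cancel-exempt settings with int and bool variants of the helper.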

2. Bumped defaults

| Field | Old default | New default | Reason |
|---|---|---|---|
| ReadRPS | 1.0 (60/min) | 1.0 (60/min) | unchanged; fine for status polling |
| ReadBurst | 10 | 10 | unchanged |
| WriteRPS | 10/60 (10/min) | 1.0 (60/min) | parallel workflow dispatch needs 1/sec sustained |
| WriteBurst | 3 | 20 | absorbs orchestrator dispatch bursts + cron-fire bursts without silent drops |
| CancelExempt | n/a (bug) | true | see deliverable 3 |

These still protect against unauthenticated-client DoS (60/min is one task per second sustained) while not breaking normal orchestrated use. Operators can lock down further via config if their threat model is stricter.

3. Cancel exemption

The rate-limiter middleware classifies methods as read vs. write today. Add a third class: cancel. tasks/cancel goes through a separate (much more permissive) bucket — or skips the limiter entirely.

Rationale: cancel is "stop doing work." It's idempotent (FWS-4 made it so), it's cheap (no LLM dispatch, just signal a registered cancel func), and it's most needed exactly when something is going wrong. Sharing the budget with tasks/send turns cost-ceiling enforcement into a serialized 6-second-per-cancel scan, which is a footgun.

Implement as either:

  • (a) cancel_exempt: true skips the limiter for tasks/cancel entirely (simplest, recommended).
  • (b) Separate cancel_rps / cancel_burst bucket (more knobs, less footgun-protection).

(a) is simpler and matches the threat model: cancel is internally rate-limited by the registry (an unknown task ID returns instantly without doing work), so DoS via cancel-spam is bounded by the cost of looking up a map entry — not a real concern.
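
Option (a) can be sketched as a three-way method classification in front of the buckets. Everything here is illustrative: the limiter, bucket and classify names are hypothetical, the bucket omits refill for brevity, and treating every method other than tasks/send and tasks/cancel as a read is an assumption, since the existing middleware already has its own read/write classification.

```go
package main

import "fmt"

type methodClass int

const (
	classRead methodClass = iota
	classWrite
	classCancel
)

// classify maps a JSON-RPC method name to a limiter class.
// Assumption for this sketch: anything not listed counts as a read.
func classify(method string) methodClass {
	switch method {
	case "tasks/cancel":
		return classCancel
	case "tasks/send":
		return classWrite
	default:
		return classRead
	}
}

// bucket is a token counter without refill, enough to show depletion.
type bucket struct{ tokens int }

func (b *bucket) take() bool {
	if b.tokens == 0 {
		return false
	}
	b.tokens--
	return true
}

type limiter struct {
	read, write  bucket
	cancelExempt bool
}

// allow reports whether a request for the given method may proceed.
func (l *limiter) allow(method string) bool {
	switch classify(method) {
	case classCancel:
		if l.cancelExempt {
			return true // option (a): tasks/cancel bypasses the limiter
		}
		return l.write.take() // exemption off: shares the write bucket
	case classWrite:
		return l.write.take()
	default:
		return l.read.take()
	}
}

func main() {
	l := &limiter{read: bucket{tokens: 10}, write: bucket{tokens: 20}, cancelExempt: true}
	for i := 0; i < 20; i++ {
		l.allow("tasks/send") // drain the write bucket
	}
	fmt.Println(l.allow("tasks/send"), l.allow("tasks/cancel")) // prints "false true"
}
```

Keeping the cancelExempt=false branch on the write bucket preserves today's behavior for operators who opt out.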

Architectural notes

  • No protocol change. The A2A wire format is unchanged; this is purely server-side resource control.
  • The 429 response shape stays the same. Existing clients that retry on 429 keep working.
  • Per-IP grouping limitation (behind service IPs) is NOT in scope for this issue. The right fix there is auth-aware rate limiting (authenticated callers get their own bucket keyed by auth.user_id or similar), a larger redesign that should reference issue #31's original threat model. File separately if/when the per-IP behavior becomes a practical problem.

Out of scope

  • Auth-aware rate limiting (per-user buckets instead of per-IP). Larger redesign; file separately.
  • Distributed rate limiting (Redis-backed limiter for multi-pod agents). Out of scope for the single-binary runtime.
  • Adaptive rate limiting that responds to system load. Premature optimization.

Files expected to change

File Change
forge-cli/server/a2a_server.go Bump defaults in defaultRateLimitConfig; add CancelExempt field; thread cancel-method classification through the limiter middleware
forge-core/types/forge_yaml.go (or wherever ForgeConfig lives) Add Server.RateLimit block to the YAML schema
forge-cli/runtime/runner.go Resolve RateLimitConfig from CLI > env > yaml > defaults; pass into ServerConfig.RateLimit
forge-cli/cmd/run.go Add --rate-limit-* flags
docs/reference/forge-yaml-schema.md Document the new server.rate_limit block
CHANGELOG.md Changed entry for the new defaults + Added for the config surface

Tests

  • forge-cli/server/a2a_server_test.go — extend existing TestRateLimitMiddleware_* tests to cover:
    • CancelExempt=true: tasks/cancel always passes regardless of write bucket state
    • Config-from-forge.yaml round-trip
    • CLI flag overrides yaml
  • New: TestRateLimitMiddleware_NewDefaults_AllowsBurst20 to lock in the new defaults so an accidental future change doesn't silently re-tighten.

Acceptance criteria

  1. An agent started with no config keeps today's read behavior and uses the new write defaults (60/min, burst 20, cancel exempt).
  2. forge.yaml server.rate_limit block round-trips through restart.
  3. The FWS-4 manual test (/tmp/forge-fws4/test.sh from issue #88's verification), run without the 20-second warm-up sleep and the 13-second inter-iteration sleep, still passes all 4 scenarios. This is the operational regression check: the test should not need to dodge the rate limiter.
  4. tasks/cancel fires successfully even after the write bucket is fully depleted by tasks/send.
  5. Documentation explains the trade-off and shows the most common stricter-than-default config (e.g. for a public-facing agent on the open internet).

Anti-patterns to avoid

  • Removing the rate limiter entirely. It exists for a reason — anonymous public-facing agents need DoS protection.
  • Auth-aware buckets in this issue. Out of scope; separate redesign.
  • Making tasks/cancel totally unlimited without any internal protection. The registry lookup is the natural rate limit (O(1) map access on unknown ID).
  • Bumping defaults so high that they offer no protection (e.g. 1000/sec). 60/min is a reasonable balance.

Background

This was surfaced during manual testing of issue #88 (FWS-4, cancellation signal handling). The FWS-4 test script ran 4 scenarios of tasks/send + tasks/cancel pairs and hit -32603: rate limit exceeded from the limiter middleware on the cancel side — exactly the case where rate-limiting cancellation amplifies the problem cancellation is trying to solve.

Original rate limiter design: issue #31 (defaults at the time: 60 req/min reads, 10 req/min writes). Those defaults predate parallel workflow execution (WS-3) and cost-ceiling cancellation (FWS-4); they're now out of step with how Forge is operationally driven.
