FWS-10 — Rate limit configurability + orchestration-friendly defaults + cancel-bucket exemption (follow-up to #31, #88) #110

> Part of the Forge backlog. Effort: **S (2–3 engineer-days)**. Risk: **low** (additive config; the new defaults loosen write limits, and operators who want the old behavior can restore it via config). Depends on: nothing. **Follow-up to FWS-4 (#88)**, surfaced during manual testing of cancellation scenarios.

## Scope

Three orthogonal fixes to the per-IP A2A rate limiter built in **issue #31**:

1. Make `RateLimitConfig` configurable via `forge.yaml` + CLI flags. Today it's a struct in `forge-cli/server/a2a_server.go` with no surface — operators can't override without forking.
2. Raise the default `WriteRPS` and `WriteBurst` so that bursty orchestrator dispatch (5 parallel tasks, a cron firing several jobs at once) isn't throttled immediately.
3. **Exempt `tasks/cancel` from the write bucket** (or give it its own permissive bucket). Cancellation is the most rate-limit-sensitive surface in the whole protocol — throttling it amplifies the problem it's trying to solve.

The current code (per `defaultRateLimitConfig` in `forge-cli/server/a2a_server.go:413`):

```go
return &RateLimitConfig{
    ReadRPS:    1.0,         // ~60/min
    ReadBurst:  10,
    WriteRPS:   10.0 / 60.0, // ~10/min
    WriteBurst: 3,
}
```

These defaults match the design intent from **#31** (60 req/min reads, 10 req/min writes) and the read side is fine. The write side defaults are too aggressive once you consider orchestrated workloads — see "Why this matters" below.

## Why this matters

### 1. Parallel workflows

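A platform orchestrator firing **N parallel agent calls** (initializ WS-3) exhausts `WriteBurst=3` after the third dispatch. Every later call has to wait for a refilled token, and tokens come back only once every `1/WriteRPS = 6 seconds`. A 10-step parallel stage therefore becomes a serialized one: the last 7 dispatches trickle in 6 seconds apart, so the stage needs at least 42 seconds just to get all of its calls admitted.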

### 2. Cron bursts

A `forge.yaml` schedule with multiple cron entries that share a firing minute will see the 4th+ task throttled. The agent silently drops tasks the operator scheduled.

### 3. Cost-ceiling cancel bursts (the FWS-4 case)

When a workflow's cost ceiling is hit, the orchestrator wants to fire `tasks/cancel` against every in-flight agent in the workflow — possibly dozens. With `tasks/cancel` sharing the same bucket as `tasks/send`, **the cancels are throttled at exactly the moment cancellation matters most**. The FWS-4 manual test surfaced this: after running 4 cancellation scenarios in ~10 seconds, the 5th got `-32603: rate limit exceeded` from the middleware before reaching the cancel handler.

This is the most concerning of the three because cancellation is the recovery mechanism. Throttling the recovery mechanism turns a recoverable cost overrun into an extended one.

### 4. Per-IP grouping breaks behind a service IP

In k8s, multiple orchestrator pods sit behind a single service IP (or all hit the agent through one ingress IP). With per-IP rate limiting, the entire orchestrator fleet shares one bucket, so under today's defaults the agent's effective dispatch capacity is `10 req/min` (burst 3) total, regardless of how many orchestrator replicas are running.

## Deliverables

### 1. Configurability via `forge.yaml`

New top-level block (alongside `cors_origins`):

```yaml
server:
  rate_limit:
    read_rps: 1.0           # default 1.0 (60/min)
    read_burst: 10          # default 10
    write_rps: 1.0          # NEW DEFAULT 60/min (was 10/min)
    write_burst: 20         # NEW DEFAULT 20 (was 3)
    cancel_exempt: true     # default true — tasks/cancel skips the write limiter
```

Resolution order (mirror `cors_origins`):

1. CLI flags: `--rate-limit-write-rps`, `--rate-limit-write-burst`, `--rate-limit-read-rps`, `--rate-limit-read-burst`, `--rate-limit-cancel-exempt`
2. Env: `FORGE_RATE_LIMIT_WRITE_RPS`, `FORGE_RATE_LIMIT_WRITE_BURST`, etc.
3. `server.rate_limit` in `forge.yaml`
4. Defaults

### 2. Bumped defaults

| Field | Old default | New default | Reason |
|---|---|---|---|
| `ReadRPS` | 1.0 (60/min) | 1.0 (60/min) | unchanged — fine for status polling |
| `ReadBurst` | 10 | 10 | unchanged |
| `WriteRPS` | 10/60 (10/min) | **1.0 (60/min)** | parallel workflow dispatch needs 1/sec sustained |
| `WriteBurst` | 3 | **20** | absorbs orchestrator dispatch bursts + cron-fire bursts without silent drops |
| `CancelExempt` | (n/a — bug) | **true** | see deliverable 3 |

These still protect against unauthenticated-client DoS (60/min is one task per second sustained) while not breaking normal orchestrated use. Operators can lock down further via config if their threat model is stricter.

### 3. Cancel exemption

The rate-limiter middleware classifies methods as read vs. write today. Add a third class: **cancel**. `tasks/cancel` goes through a separate (much more permissive) bucket — or skips the limiter entirely.

Rationale: cancel is "stop doing work." It's idempotent (FWS-4 made it so), it's cheap (no LLM dispatch, just signal a registered cancel func), and it's most needed exactly when something is going wrong. Sharing the budget with `tasks/send` turns cost-ceiling enforcement into a serialized 6-second-per-cancel scan, which is a footgun.

Implement as either:
- (a) `cancel_exempt: true` skips the limiter for `tasks/cancel` entirely (simplest, recommended).
- (b) Separate `cancel_rps` / `cancel_burst` bucket (more knobs, less footgun-protection).

(a) is simpler and matches the threat model: the cost of a cancel is already bounded by the registry (an unknown task ID returns immediately without doing any work), so each spammed cancel costs roughly one map lookup beyond normal request handling. DoS via cancel-spam is not a real concern.

## Architectural notes

- **No protocol change.** The A2A wire format is unchanged; this is purely server-side resource control.
- **The `429` response shape stays the same.** Existing clients that retry on 429 keep working.
- **Per-IP grouping limitation** (behind service IPs) is **NOT** in scope for this issue — the right fix there is auth-aware rate limiting (authenticated callers get their own bucket keyed by `auth.user_id` or similar), which is a larger redesign that should reference issue #31's original threat model. File separately if/when the per-IP behavior becomes a practical problem.

## Out of scope

- Auth-aware rate limiting (per-user buckets instead of per-IP). Larger redesign; file separately.
- Distributed rate limiting (Redis-backed limiter for multi-pod agents). Out of scope for the single-binary runtime.
- Adaptive rate limiting that responds to system load. Premature optimization.

## Files expected to change

| File | Change |
|---|---|
| `forge-cli/server/a2a_server.go` | Bump defaults in `defaultRateLimitConfig`; add `CancelExempt` field; thread `cancel`-method classification through the limiter middleware |
| `forge-core/types/forge_yaml.go` (or wherever `ForgeConfig` lives) | Add `Server.RateLimit` block to the YAML schema |
| `forge-cli/runtime/runner.go` | Resolve `RateLimitConfig` from CLI > env > yaml > defaults; pass into `ServerConfig.RateLimit` |
| `forge-cli/cmd/run.go` | Add `--rate-limit-*` flags |
| `docs/reference/forge-yaml-schema.md` | Document the new `server.rate_limit` block |
| `CHANGELOG.md` | `Changed` entry for the new defaults + `Added` for the config surface |

## Tests

- `forge-cli/server/a2a_server_test.go` — extend existing `TestRateLimitMiddleware_*` tests to cover:
  - `CancelExempt=true` → `tasks/cancel` always passes regardless of write bucket state
  - Config-from-forge.yaml round-trip
  - CLI flag overrides yaml
- New: `TestRateLimitMiddleware_NewDefaults_AllowsBurst20` to lock in the new defaults so an accidental future change doesn't silently re-tighten.

## Acceptance criteria

1. An agent started with no config matches today's read behavior but new write defaults (60/min, burst 20, cancel exempt).
2. `forge.yaml` `server.rate_limit` block round-trips through restart.
3. Running the FWS-4 manual test (`/tmp/forge-fws4/test.sh` from issue #88's verification) **without** the 20-second warm-up sleep and 13-second inter-iteration sleep — all 4 scenarios still PASS. This is the operational regression check: the test should not need to dodge the rate limiter.
4. `tasks/cancel` fires successfully even after the write bucket is fully depleted by `tasks/send`.
5. Documentation explains the trade-off and shows the most common stricter-than-default config (e.g. for a public-facing agent on the open internet).

## Anti-patterns to avoid

- Removing the rate limiter entirely. It exists for a reason — anonymous public-facing agents need DoS protection.
- Auth-aware buckets in this issue. Out of scope; separate redesign.
- Making `tasks/cancel` totally unlimited without any internal protection. Under option (a), the registry lookup (an O(1) map access that returns immediately on an unknown ID) is what bounds the cost of cancel traffic.
- Bumping defaults so high that they offer no protection (e.g. 1000/sec). 60/min is a reasonable balance.

## Background

This was surfaced during manual testing of issue #88 (FWS-4, cancellation signal handling). The FWS-4 test script ran 4 scenarios of `tasks/send` + `tasks/cancel` pairs and hit `-32603: rate limit exceeded` from the limiter middleware on the cancel side — exactly the case where rate-limiting cancellation amplifies the problem cancellation is trying to solve.

Original rate limiter design: issue **#31** (defaults at the time: 60 req/min reads, 10 req/min writes). Those defaults predate parallel workflow execution (WS-3) and cost-ceiling cancellation (FWS-4); they're now out of step with how Forge is operationally driven.

