Rate-limit 429 is not honored — drop window with no Retry-After, no typed event #18

@AndresL230

Rate-limit 429 is not surfaced distinctly in extension or SDKs

Severity: Medium
Affected repos: extension, middleware-node, middleware-python
Component boundary: clients ↔ API rate-limit middleware

Symptom

The API enforces:

  • POST /projects/:id/scans — 10/60s per project.
  • POST /projects/:id/telemetry — 1000/hour per project.
  • POST /projects — 5/hour per user.
  • GET /auth/google — 20/hour per IP.

When a client trips a limit and receives a 429, none of the clients distinguish that response from any other failed request. The extension shows "Remote analysis failed" with no actionable text. The SDKs apply their generic rule that 4xx responses are not retried (a sound rule for other 4xx codes), so on a 429 they drop the window and log a generic warning.
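The missing distinction can be sketched as a small status classifier. This is an illustrative sketch only: `Outcome` and `classify_status` are hypothetical names, not part of any of the three repos. The point it shows is that the 429 check has to come before the generic 4xx branch.

```python
from enum import Enum


class Outcome(Enum):
    """Hypothetical handling paths for an HTTP response."""
    OK = "ok"
    RATE_LIMITED = "rate_limited"   # 429: defer the work and try again later
    CLIENT_ERROR = "client_error"   # other 4xx: retrying will not help
    SERVER_ERROR = "server_error"   # 5xx: eligible for the normal retry loop


def classify_status(status: int) -> Outcome:
    """Map an HTTP status code to the handling path a client should take."""
    if status == 429:
        # Checked first so the generic 4xx branch below never swallows it.
        return Outcome.RATE_LIMITED
    if 400 <= status < 500:
        return Outcome.CLIENT_ERROR
    if status >= 500:
        return Outcome.SERVER_ERROR
    return Outcome.OK
```

With a classifier like this, the extension and both SDKs could route `Outcome.RATE_LIMITED` to a dedicated message or event and leave the existing 4xx and 5xx paths unchanged.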

Evidence

  • extension/src/webview-provider.ts — scan submission error handler is a generic catch with no status-specific branch.
  • middleware-node/src/core/transport.ts — retry loop's 4xx skip-retry branch covers 429 by accident, with no distinct logging.
  • api/src/middleware/rate-limit.ts — defines the limits; the body of a 429 response carries a "Retry-After" hint that nobody reads.

Impact

  • A user running a tight scan loop sees a cryptic error and doesn't know to wait 60 seconds.
  • A heavy-traffic application running the SDK never gets a clear "you're being rate-limited, increase your flush interval" signal.
  • The intentional Retry-After value is wasted.

Fix recommendation

In each client, branch on 429:

Extension (webview-provider.ts):

// Assumes the caught error exposes the HTTP status code as err.status.
if (err.status === 429) {
  vscode.window.showWarningMessage(
    "ReCost: scan rate limit reached (10 per minute). Try again in a moment."
  );
  return;
}

SDKs: on 429, honor Retry-After (parsed as a number of seconds from now), reschedule the next flush for that time, and emit a typed onError({ kind: "rate_limited", retryAfterMs }). Don't drop the window; defer it and resend it once the wait has elapsed.
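For the Python SDK, the SDK-side behavior might look roughly like the sketch below. Everything named here is an assumption made for illustration: `RateLimitedError`, `parse_retry_after`, `Flusher`, the `send` callable returning a `(status, retry_after)` pair, and the 60-second fallback are not taken from middleware-python. The parser also accepts an HTTP-date in addition to plain seconds, which goes slightly beyond what the recommendation asks for.

```python
import time
from collections import deque
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from typing import Callable, Optional, Tuple


@dataclass
class RateLimitedError:
    """Hypothetical typed event handed to on_error."""
    retry_after_ms: int
    kind: str = "rate_limited"


def parse_retry_after(value: Optional[str], now: float, default_s: float = 60.0) -> float:
    """Return the wait in seconds implied by a Retry-After value.

    Accepts delta-seconds ("30") or an HTTP-date. Falls back to default_s
    (an assumed value) when the hint is missing or unparseable.
    """
    if value is None:
        return default_s
    text = value.strip()
    try:
        return max(0.0, float(int(text)))
    except ValueError:
        pass
    try:
        return max(0.0, parsedate_to_datetime(text).timestamp() - now)
    except (TypeError, ValueError):
        return default_s


class Flusher:
    """Queues windows and defers them on 429 instead of dropping them."""

    def __init__(
        self,
        send: Callable[[object], Tuple[int, Optional[str]]],
        on_error: Callable[[RateLimitedError], None],
        flush_interval_s: float = 30.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.send = send
        self.on_error = on_error
        self.flush_interval_s = flush_interval_s
        self.clock = clock
        self.pending: deque = deque()
        self.next_flush_at = clock()

    def enqueue(self, window: object) -> None:
        self.pending.append(window)

    def flush(self) -> None:
        now = self.clock()
        if now < self.next_flush_at:
            return  # still waiting out a previous Retry-After or interval
        while self.pending:
            status, retry_after = self.send(self.pending[0])
            if status == 429:
                delay_s = parse_retry_after(retry_after, now)
                self.next_flush_at = now + delay_s
                self.on_error(RateLimitedError(retry_after_ms=int(delay_s * 1000)))
                return  # the window stays at the head of the queue
            # Any other status is left to the transport's existing handling.
            self.pending.popleft()
        self.next_flush_at = now + self.flush_interval_s
```

Keeping the deferred window at the head of the queue preserves ordering, and injecting `clock` makes the back-off testable without sleeping.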

Verification

  • Submit 11 scans for one project within 60 seconds, confirm the 11th shows a clear rate-limit message.
  • Send 1100 telemetry requests (POST /projects/:id/telemetry, which is what the limit counts) for one project within an hour with the SDK, confirm the SDK pauses and reports rate_limited.
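To exercise these checks without hitting the real API, a local stand-in for the limiter can help. The sketch below assumes a fixed-window algorithm and a 200 success status, neither of which is confirmed by this issue (rate-limit.ts may use a different scheme); `FixedWindowLimiter` is a hypothetical test double.

```python
import math
from typing import Optional, Tuple


class FixedWindowLimiter:
    """Test double for a limit such as 10 requests per 60 seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.window_start: Optional[float] = None
        self.count = 0

    def hit(self, now: float) -> Tuple[int, Optional[int]]:
        """Return (status, retry_after_seconds) for one request at time `now`."""
        if self.window_start is None or now - self.window_start >= self.window_s:
            # Start a fresh window on the first request or after expiry.
            self.window_start = now
            self.count = 0
        if self.count >= self.limit:
            # Over the limit: report how long until the window resets.
            retry_after = math.ceil(self.window_start + self.window_s - now)
            return 429, retry_after
        self.count += 1
        return 200, None


# Eleven scans, one per second, against a 10-per-60s limit.
limiter = FixedWindowLimiter(limit=10, window_s=60)
results = [limiter.hit(now=float(i)) for i in range(11)]
print(results[10])  # (429, 50)
```

Feeding these `(status, retry_after)` pairs into the client under test gives a deterministic way to confirm that the 11th request produces the rate-limit path.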

Scope on this repo

This tracks the Python SDK behavior: honor Retry-After, reschedule the next flush, emit a typed on_error event. The extension UI surface is tracked separately on recost-dev/extension#100.
