Responses API swallows upstream HTTP errors (429) — returns 200 with status:failed #511

@Evrard-Nil

Bug

The /v1/responses endpoint swallows upstream HTTP errors (including 429 rate limits) and returns HTTP 200 with status: "failed", empty content, and 0 usage tokens. It should propagate the upstream error as the corresponding HTTP status code.

Reproduction

# Chat completions correctly returns 429 (-w prints the HTTP status after the body):
curl -s -w '\nHTTP %{http_code}\n' 'https://cloud-api.near.ai/v1/chat/completions' \
  -H 'Authorization: Bearer <key>' \
  -H 'Content-Type: application/json' \
  -d '{"model":"google/gemini-3-pro","messages":[{"role":"user","content":"hi"}],"max_tokens":32}'
# → HTTP 429: {"error":{"message":"Rate limit exceeded..."}}

# Responses API swallows the 429:
curl -s -w '\nHTTP %{http_code}\n' 'https://cloud-api.near.ai/v1/responses' \
  -H 'Authorization: Bearer <key>' \
  -H 'Content-Type: application/json' \
  -d '{"model":"google/gemini-3-pro","input":"hi","max_output_tokens":32}'
# → HTTP 200: {"status":"failed","output":[{"content":[{"text":""}]}],"usage":{"total_tokens":0}}

Impact

  • Clients cannot detect rate limits and retry appropriately
  • infra-tests test_responses[gemini-3-pro] persistently fails because request_with_retry() sees HTTP 200 and doesn't retry, while the equivalent chat completion test retries on 429 and eventually succeeds
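
Until the endpoint is fixed, a client could work around this by treating HTTP 200 with status: "failed" as retryable. The sketch below is only an illustration: retry_until_ok, is_retryable, and the injected send callable are hypothetical names, not the actual request_with_retry() from infra-tests, and the set of retryable status codes is an assumption.

```python
import time
from typing import Callable, Tuple

# Assumed set of transient statuses; adjust to whatever the real client retries on.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def is_retryable(http_status: int, body: dict) -> bool:
    """Return True if the response looks like a transient failure."""
    if http_status in RETRYABLE_STATUS:
        return True
    # Workaround for this bug: /v1/responses may report a swallowed
    # upstream error as HTTP 200 with status "failed" in the body.
    return http_status == 200 and body.get("status") == "failed"


def retry_until_ok(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() with exponential backoff while the result is retryable."""
    status, body = send()
    for attempt in range(1, max_attempts):
        if not is_retryable(status, body):
            break
        # Delays grow as base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

One caveat: the body does not say why the response failed, so this workaround also retries failures that are not transient. That is another reason to fix the status code on the server side.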

Root Cause

Traced through the code:

  1. Gemini backend (inference_providers/src/external/gemini/mod.rs:216): Returns CompletionError::HttpError { status_code: 429 } — correct
  2. Completion stream (services/src/responses/service.rs:696-703): Catches the error, sets stream_error = true, breaks — no usage captured
  3. Service (services/src/responses/service.rs:1184-1232): Emits response.failed event but never response.completed — no final response object
  4. Route handler fallback (api/src/routes/responses.rs:500-560): No final_response from completed event → falls through to fallback with Usage::new(0, 0) hardcoded → returns HTTP 200

Suggested Fix

When the completion stream errors with an HTTP error, propagate it as the corresponding HTTP status code from the /v1/responses endpoint, rather than wrapping it in status: "failed" with HTTP 200. At minimum, 429 rate limit errors should be propagated as HTTP 429 so clients can implement retry logic.
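
The decision the route handler would need to make can be sketched as follows. This is a language-agnostic illustration written in Python, not the service's Rust code: StreamOutcome and to_http_response are hypothetical names, and the 502 fallback for errors without a usable upstream status is an assumption rather than something the codebase specifies.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StreamOutcome:
    """Hypothetical summary of what the completion stream produced."""
    final_response: Optional[dict]   # set when response.completed was seen
    upstream_status: Optional[int]   # set when the provider returned an HTTP error
    error_message: str = ""


def to_http_response(outcome: StreamOutcome) -> Tuple[int, dict]:
    """Map a stream outcome to (HTTP status, JSON body)."""
    if outcome.final_response is not None:
        return 200, outcome.final_response

    status = outcome.upstream_status
    if status is not None and 400 <= status <= 599:
        # Propagate the upstream error code (e.g. 429) instead of HTTP 200.
        http_status = status
    else:
        # Assumed fallback when no usable upstream status is available.
        http_status = 502

    message = outcome.error_message or "Upstream provider error"
    return http_status, {"error": {"message": message}}
```

For example, to_http_response(StreamOutcome(None, 429, "Rate limit exceeded")) yields status 429 with an error body in the same shape chat completions returns. This mapping only applies when the failure is known before the response headers are sent; once a streamed response has started with HTTP 200, the status code can no longer be changed.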