
feat(agents): structured FallbackSummaryError with human-friendly rate limit messages#45763

Closed
ToneLoke wants to merge 2 commits into openclaw:main from ToneLoke:feat/fallback-summary-error

Conversation

@ToneLoke

Problem

When all model fallback candidates are exhausted due to rate limits, users see raw error dumps like:

⚠️ Agent failed before reply: All models failed (3): anthropic/claude-sonnet-4: 429 ...

This is confusing and doesn't tell the user when the service will resume.

Solution

1. Structured FallbackSummaryError

Replace the generic Error thrown by throwFallbackFailureSummary() with a typed FallbackSummaryError that carries:

  • attempts: structured metadata per attempt (model, reason, error)
  • soonestCooldownExpiry: Unix ms timestamp of the earliest profile cooldown expiry
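Based on the two fields listed above, the error class could look roughly like this. This is a sketch only: the constructor shape, the attempt type, and the `cause` handling are assumptions, not the PR's actual code.

```typescript
// Hypothetical sketch of the structured error described above; the real
// implementation in model-fallback.ts may differ in constructor details.
type FallbackAttempt = {
  model: string; // e.g. "anthropic/claude-sonnet-4"
  reason: string; // e.g. "rate_limit"
  error: string; // original error text for this attempt
};

class FallbackSummaryError extends Error {
  readonly attempts: FallbackAttempt[];
  readonly soonestCooldownExpiry: number | null; // Unix ms, or null if unknown

  constructor(
    message: string,
    opts: {
      attempts: FallbackAttempt[];
      soonestCooldownExpiry: number | null;
      cause?: Error;
    },
  ) {
    super(message);
    this.name = "FallbackSummaryError";
    this.attempts = opts.attempts;
    this.soonestCooldownExpiry = opts.soonestCooldownExpiry;
    if (opts.cause) (this as any).cause = opts.cause;
  }
}
```

Because the class extends `Error`, existing `catch` blocks and `instanceof Error` checks keep working unchanged.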

2. Human-friendly rate limit messages

The agent runner detects rate limit errors (via isRateLimitErrorMessage + /rate_limit/ pattern) and uses the structured error to show:

⚡ Temporarily unavailable (sonnet rate limited) — back in ~3 min.

Time estimates are derived from actual profile cooldown expiry data, formatted as seconds/minutes/hours as appropriate.
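The tiered formatting can be sketched as a small helper. The thresholds (60 s, 3600 s) and ceiling-based rounding mirror the description above; the function name is illustrative, not the PR's actual helper.

```typescript
// Sketch of the seconds/minutes/hours formatting described above.
// Rounds up with Math.ceil so the estimate never undershoots.
function formatBackIn(expiryMs: number, nowMs: number): string {
  const secs = Math.ceil((expiryMs - nowMs) / 1000);
  if (secs <= 0) return "any moment";
  if (secs < 60) return `~${secs}s`;
  if (secs < 3600) return `~${Math.ceil(secs / 60)} min`;
  return `~${Math.ceil(secs / 3600)} hr`;
}
```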

Impact

  • Changes in model-fallback.ts and agent-runner-execution.ts
  • No breaking changes — FallbackSummaryError extends Error
  • Callers not using the typed check continue to work unchanged
  • isFallbackSummaryError() type guard exported for downstream use
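A downstream type guard along these lines would let callers opt in safely. The structural check shown here is an assumption; the exported `isFallbackSummaryError()` may use `instanceof` or a name brand instead.

```typescript
// Hypothetical structural guard; illustrates how downstream code can
// narrow an unknown error to the structured shape without importing
// the class itself.
interface FallbackSummaryErrorLike extends Error {
  attempts: { model: string; reason: string; error: string }[];
  soonestCooldownExpiry: number | null;
}

function isFallbackSummaryError(err: unknown): err is FallbackSummaryErrorLike {
  return (
    err instanceof Error &&
    Array.isArray((err as Partial<FallbackSummaryErrorLike>).attempts) &&
    "soonestCooldownExpiry" in err
  );
}
```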

openclaw-barnacle bot added the "agents (Agent runtime and tooling)" and "size: S" labels on Mar 14, 2026
greptile-apps bot (Contributor) commented Mar 14, 2026

Greptile Summary

This PR replaces the generic Error thrown when all model fallback candidates are exhausted with a typed FallbackSummaryError that carries structured attempt metadata and the soonest profile cooldown expiry, then uses that data in the agent runner to show human-friendly rate-limit messages ("⚡ Temporarily unavailable (model rate limited) — back in ~3 min.") instead of raw error dumps.

Key observations:

  • Non-breaking design: FallbackSummaryError extends Error, and the exported isFallbackSummaryError() type guard means existing callers that don't opt in continue to work unchanged.
  • Single-attempt gap: throwFallbackFailureSummary re-throws the bare lastError when attempts.length <= 1, bypassing FallbackSummaryError. For single-model configurations, isFallbackSummaryError(err) returns false in the new rate-limit handler, so the displayed message falls back to "~60 seconds" with no model name and no accurate timing.
  • Image model fallback missing timing: runWithImageModelFallback never computes or threads soonestCooldownExpiry into throwFallbackFailureSummary, so image-model rate-limit errors always carry null for that field.
  • Readability — the rate-limit message construction is implemented as an IIFE inside a multi-level ternary in agent-runner-execution.ts; extracting it to a named helper function would significantly improve maintainability.
  • Cooldown expiry computation is correct — the post-loop loop over candidates correctly finds the minimum (soonest) expiry across all providers using <, which is the right strategy for "when will any provider be available again?"
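The minimum-expiry scan the last bullet describes can be sketched as follows. The profile shape and field name `unusableUntil` are assumptions for illustration.

```typescript
// Sketch of the "soonest expiry" scan: track the minimum with <,
// skipping profiles that have no cooldown data.
type Cooldown = { unusableUntil: number | null };

function soonestExpiry(profiles: Cooldown[]): number | null {
  let soonest: number | null = null;
  for (const p of profiles) {
    if (p.unusableUntil === null) continue;
    if (soonest === null || p.unusableUntil < soonest) {
      soonest = p.unusableUntil;
    }
  }
  return soonest;
}
```

Using the minimum answers "when will any provider be available again?", which is the right question for a retry hint.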

Confidence Score: 4/5

  • Safe to merge — no breaking changes, logic is sound, issues found are UX gaps and style concerns rather than correctness bugs.
  • The core implementation is correct and well-structured. The two functional gaps (single-attempt path not wrapping in FallbackSummaryError, image model fallback missing soonestCooldownExpiry) degrade the feature for those specific cases but don't cause errors or regressions — they silently fall back to less informative messages. The IIFE style issue is cosmetic. No existing behaviour is broken.
  • Minor attention on src/agents/model-fallback.ts — specifically the throwFallbackFailureSummary single-attempt early-exit and the runWithImageModelFallback omission.

Comments Outside Diff (1)

  1. src/agents/model-fallback.ts, lines 863-869

    runWithImageModelFallback omits soonestCooldownExpiry

    runWithImageModelFallback calls throwFallbackFailureSummary without computing or passing soonestCooldownExpiry, so its FallbackSummaryError will always have soonestCooldownExpiry === null. If a caller surfaces rate-limit messages from image model failures using the same isFallbackSummaryError path, the displayed time will always fall back to the default "~60 seconds" rather than showing an accurate estimate.

    If image models can hit rate limits (they use the same provider API keys), the same soonestCooldownExpiry calculation that was added before the runWithModelFallback summary throw should be applied here too. The function currently receives no authStore reference, so it would need to be threaded in (or at minimum the omission should be documented with a // TODO).


Last reviewed commit: 25adf11

Comment on lines +629 to +658
? (() => {
    // Build a human-friendly rate limit message with model info and time estimate.
    let backIn = "~60 seconds";
    let modelInfo = "";
    if (isFallbackSummaryError(err)) {
      if (err.soonestCooldownExpiry !== null) {
        const secsRemaining = Math.ceil(
          (err.soonestCooldownExpiry - Date.now()) / 1000,
        );
        if (secsRemaining > 0) {
          if (secsRemaining < 60) {
            backIn = `~${secsRemaining}s`;
          } else if (secsRemaining < 3600) {
            backIn = `~${Math.ceil(secsRemaining / 60)} min`;
          } else {
            backIn = `~${Math.ceil(secsRemaining / 3600)} hr`;
          }
        } else {
          backIn = "any moment";
        }
      }
      const limitedModels = err.attempts
        .filter((a) => a.reason === "rate_limit")
        .map((a) => a.model);
      if (limitedModels.length > 0) {
        modelInfo = ` (${limitedModels.join(", ")} rate limited)`;
      }
    }
    return `⚡ Temporarily unavailable${modelInfo} — back in ${backIn}.`;
  })()

IIFE inside ternary makes this hard to maintain

The immediately-invoked function expression inside the multi-level ternary chain significantly reduces readability. If the rate limit message logic ever needs to grow (e.g. adding support for overloaded sub-cases), it will be difficult to locate and modify here.

Consider extracting the logic into a named helper:

function buildRateLimitFallbackText(err: unknown): string {
  let backIn = "~60 seconds";
  let modelInfo = "";
  if (isFallbackSummaryError(err)) {
    if (err.soonestCooldownExpiry !== null) {
      const secsRemaining = Math.ceil((err.soonestCooldownExpiry - Date.now()) / 1000);
      if (secsRemaining > 0) {
        if (secsRemaining < 60) {
          backIn = `~${secsRemaining}s`;
        } else if (secsRemaining < 3600) {
          backIn = `~${Math.ceil(secsRemaining / 60)} min`;
        } else {
          backIn = `~${Math.ceil(secsRemaining / 3600)} hr`;
        }
      } else {
        backIn = "any moment";
      }
    }
    const limitedModels = err.attempts
      .filter((a) => a.reason === "rate_limit")
      .map((a) => a.model);
    if (limitedModels.length > 0) {
      modelInfo = ` (${limitedModels.join(", ")} rate limited)`;
    }
  }
  return `⚡ Temporarily unavailable${modelInfo} — back in ${backIn}.`;
}

Then the ternary simply becomes:

: isRateLimit
  ? buildRateLimitFallbackText(err)
  : `⚠️ Agent failed before reply: …`



Comment on lines 217 to 219
if (params.attempts.length <= 1 && params.lastError) {
  throw params.lastError;
}

Single-attempt path bypasses FallbackSummaryError

When params.attempts.length <= 1 && params.lastError, the raw lastError (a plain Error) is re-thrown. This means for users with only a single model configured, isFallbackSummaryError(err) in agent-runner-execution.ts will return false, and the rate-limit handler will display "⚡ Temporarily unavailable — back in ~60 seconds." with no model name and no accurate cooldown estimate — the same defaults used when no structured data is available.

The PR description acknowledges this implicitly ("callers not using the typed check continue to work unchanged"), but since the new rate-limit branch in agent-runner-execution.ts relies on the typed check, this is a real UX gap for single-model setups. Consider wrapping params.lastError in a FallbackSummaryError here too (passing the single attempt and the known soonestCooldownExpiry) rather than re-throwing it bare:

if (params.attempts.length <= 1 && params.lastError) {
  if (params.attempts.length === 0) {
    throw params.lastError;
  }
  throw new FallbackSummaryError(
    params.attempts[0]
      ? `${params.label} failed: ${params.formatAttempt(params.attempts[0])}`
      : String(params.lastError),
    {
      attempts: params.attempts,
      soonestCooldownExpiry: params.soonestCooldownExpiry ?? null,
      cause: params.lastError instanceof Error ? params.lastError : undefined,
    },
  );
}


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 25adf11056

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const isSessionCorruption = /function call turn comes immediately after/i.test(message);
const isRoleOrderingError = /incorrect role information|roles must alternate/i.test(message);
const isTransientHttp = isTransientHttpError(message);
const isRateLimit = isRateLimitErrorMessage(message) || /\(rate_limit\)/i.test(message);


P2: Restrict rate-limit fallback messaging to rate-limit-only errors

This condition marks the run as rate-limited whenever the summary text contains "(rate_limit)", but throwFallbackFailureSummary() includes reasons from all attempts in one message; in a mixed failure chain (e.g., first candidate rate-limited, later candidate fails with auth/model-not-found), users will still get ⚡ Temporarily unavailable ... back in ... instead of the real terminal error. That hides actionable failures and can send users down the wrong recovery path, so this branch should require structured FallbackSummaryError evidence that the blocking failure set is actually rate-limit driven.
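The stricter classification this comment asks for could be sketched like so: only treat the run as rate-limit-driven when every structured attempt actually failed for that reason. The attempt shape is an assumption based on the PR description, and the helper name is illustrative.

```typescript
// Sketch of the structured-evidence check suggested above: a mixed
// failure chain (rate_limit + auth, say) should NOT trigger the
// friendly rate-limit message.
type Attempt = { model: string; reason: string };

function isRateLimitDriven(attempts: Attempt[]): boolean {
  return attempts.length > 0 && attempts.every((a) => a.reason === "rate_limit");
}
```

Gating on this instead of a substring match over the summary text would let genuine auth or model-not-found errors surface as terminal failures.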

Useful? React with 👍 / 👎.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 080a724cd7


Comment on lines +803 to +807
for (const candidate of candidates) {
  const profileIds = resolveAuthProfileOrder({
    cfg: params.cfg,
    store: authStore,
    provider: candidate.provider,


P2: Derive cooldown ETA from rate-limited attempts only

runWithModelFallback currently computes soonestCooldownExpiry by scanning every candidate provider, not just the attempts that actually failed with reason === "rate_limit". Because auth/billing-disabled profiles also contribute unusableUntil timestamps, the value passed to buildRateLimitFallbackText() can come from an unrelated non-rate-limit path, which under mixed failures can show an overly optimistic “back in …” estimate and prompt users to retry long before the rate-limited model can recover.
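The narrowing this comment proposes could look roughly like the following: restrict the expiry scan to providers whose attempt actually failed with reason "rate_limit". All names and shapes here are assumptions for illustration.

```typescript
// Sketch: compute the "back in ..." ETA only from providers that were
// actually rate limited, so auth/billing cooldowns can't skew the estimate.
type AttemptInfo = { provider: string; reason: string };
type ProfileState = { provider: string; unusableUntil: number | null };

function rateLimitEta(
  attempts: AttemptInfo[],
  profiles: ProfileState[],
): number | null {
  const limited = new Set(
    attempts.filter((a) => a.reason === "rate_limit").map((a) => a.provider),
  );
  let soonest: number | null = null;
  for (const p of profiles) {
    if (!limited.has(p.provider) || p.unusableUntil === null) continue;
    if (soonest === null || p.unusableUntil < soonest) soonest = p.unusableUntil;
  }
  return soonest;
}
```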



kiranvk-2011 added a commit to kiranvk-2011/openclaw that referenced this pull request Mar 18, 2026
…mit message

Combines ideas from PRs openclaw#45113, openclaw#31962, and openclaw#45763 to address three
cooldown-related issues:

1. Stepped cooldown (30s → 1m → 5m cap) replaces the aggressive
   exponential formula (1m → 5m → 25m → 1h) that locked out providers
   for far longer than the actual API rate-limit window.

2. Per-model cooldown scoping: rate_limit cooldowns now record which
   model triggered them. When a different model on the same auth profile
   is requested, the cooldown is bypassed — so one model hitting a 429
   no longer blocks all other models on the same provider.

3. FallbackSummaryError with soonest-expiry countdown: when all
   candidates are exhausted, the user sees a clear message like
   '⚠️ Rate-limited — ready in ~28s' instead of a generic failure.

Files changed:
- types.ts: add cooldownReason/cooldownModel to ProfileUsageStats
- usage.ts: stepped formula, model-aware isProfileInCooldown, modelId
  threading through computeNextProfileUsageStats/markAuthProfileFailure
- model-fallback.ts: FallbackSummaryError class, model-aware availability
  check, soonestCooldownExpiry computation
- pi-embedded-runner/run.ts: thread modelId into failure recording
- agent-runner-execution.ts: buildCopilotCooldownMessage helper, rate-limit
  detection branch in error handler
- usage.test.ts: update expected cooldown value (60s → 30s)
…e limit messages

Replace generic Error with FallbackSummaryError when all model fallback
candidates are exhausted. The new error carries:
- Structured attempt metadata (model, reason, error per attempt)
- Soonest cooldown expiry timestamp across all candidate providers

The agent runner uses this to show users a friendly message instead of
raw error dumps:
  ⚡ Temporarily unavailable (sonnet rate limited) — back in ~3 min.

Instead of:
  ⚠️ Agent failed before reply: All models failed (3): anthropic/...

Also adds isRateLimitErrorMessage detection to the error classifier so
rate limit errors are caught even when not wrapped in FallbackSummaryError.
…lper

- Wrap single-attempt failures in FallbackSummaryError so single-model
  configs get accurate cooldown timing and model names in rate limit UX
- Add explicit TODO + null pass for image model fallback cooldown expiry
- Extract IIFE rate limit message builder to named buildRateLimitFallbackText()
  helper for maintainability
ToneLoke force-pushed the feat/fallback-summary-error branch from 080a724 to 53628d2 on March 19, 2026 at 07:41
@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53628d2343


Comment on lines +226 to +229
if (params.attempts.length === 1 && params.lastError) {
  throw new FallbackSummaryError(
    `${params.label} failed: ${params.formatAttempt(params.attempts[0])}`,
    {


P1: Preserve raw transient HTTP errors on single-candidate failure

Wrapping the single-attempt failure in FallbackSummaryError changes messages like "521 ..."/"503 ..." into "models failed: ...", which prevents the retry path in runAgentTurnWithFallback from recognizing transient HTTP failures (isTransientHttpError only matches leading HTTP status text). In single-model setups (no fallbacks), this turns a previously retried transient outage into an immediate user-facing failure, reducing reliability for short-lived provider incidents.
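One mitigation, sketched under the assumption that the retry classifier matches a leading HTTP status code: keep the raw message when wrapping a lone failure if it starts with a status. The regex and wrapping policy here are illustrative, not the repo's actual logic.

```typescript
// Sketch: when wrapping a single attempt's failure, preserve the raw
// message if it begins with an HTTP status (e.g. "503 ...") so a
// leading-status transient-error check can still recognize it.
function summaryMessage(label: string, rawMessage: string): string {
  const startsWithHttpStatus = /^\s*\d{3}\b/.test(rawMessage);
  return startsWithHttpStatus ? rawMessage : `${label} failed: ${rawMessage}`;
}
```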


kiranvk-2011 added a commit to kiranvk-2011/openclaw that referenced this pull request Mar 19, 2026
altaywtf pushed a commit to kiranvk-2011/openclaw that referenced this pull request Mar 24, 2026
altaywtf pushed a commit to kiranvk-2011/openclaw that referenced this pull request Mar 24, 2026
altaywtf pushed a commit to kiranvk-2011/openclaw that referenced this pull request Mar 24, 2026
altaywtf pushed a commit to kiranvk-2011/openclaw that referenced this pull request Mar 25, 2026
altaywtf pushed a commit to kiranvk-2011/openclaw that referenced this pull request Mar 25, 2026
altaywtf pushed a commit to kiranvk-2011/openclaw that referenced this pull request Mar 25, 2026
@altaywtf
Member

Closing this as superseded by the merged implementation.

What shipped instead:

Why this is superseded:

If there is any remaining UX detail here that you think the landed path missed, please point to it on #49834 and we can evaluate a focused follow-up.

Thank you for the contribution, @ToneLoke.

@altaywtf closed this Mar 25, 2026