feat(agents): structured FallbackSummaryError with human-friendly rate limit messages #45763

ToneLoke wants to merge 2 commits into openclaw:main
Conversation
Greptile Summary

This PR replaces the generic `Error` thrown on model-fallback exhaustion with a structured `FallbackSummaryError`.
Confidence Score: 4/5
```typescript
? (() => {
    // Build a human-friendly rate limit message with model info and time estimate.
    let backIn = "~60 seconds";
    let modelInfo = "";
    if (isFallbackSummaryError(err)) {
      if (err.soonestCooldownExpiry !== null) {
        const secsRemaining = Math.ceil(
          (err.soonestCooldownExpiry - Date.now()) / 1000,
        );
        if (secsRemaining > 0) {
          if (secsRemaining < 60) {
            backIn = `~${secsRemaining}s`;
          } else if (secsRemaining < 3600) {
            backIn = `~${Math.ceil(secsRemaining / 60)} min`;
          } else {
            backIn = `~${Math.ceil(secsRemaining / 3600)} hr`;
          }
        } else {
          backIn = "any moment";
        }
      }
      const limitedModels = err.attempts
        .filter((a) => a.reason === "rate_limit")
        .map((a) => a.model);
      if (limitedModels.length > 0) {
        modelInfo = ` (${limitedModels.join(", ")} rate limited)`;
      }
    }
    return `⚡ Temporarily unavailable${modelInfo} — back in ${backIn}.`;
  })()
```
IIFE inside ternary makes this hard to maintain
The immediately-invoked function expression inside the multi-level ternary chain significantly reduces readability. If the rate limit message logic ever needs to grow (e.g. adding support for overloaded sub-cases), it will be difficult to locate and modify here.
Consider extracting the logic into a named helper:
```typescript
function buildRateLimitFallbackText(err: unknown): string {
  let backIn = "~60 seconds";
  let modelInfo = "";
  if (isFallbackSummaryError(err)) {
    if (err.soonestCooldownExpiry !== null) {
      const secsRemaining = Math.ceil((err.soonestCooldownExpiry - Date.now()) / 1000);
      if (secsRemaining > 0) {
        if (secsRemaining < 60) {
          backIn = `~${secsRemaining}s`;
        } else if (secsRemaining < 3600) {
          backIn = `~${Math.ceil(secsRemaining / 60)} min`;
        } else {
          backIn = `~${Math.ceil(secsRemaining / 3600)} hr`;
        }
      } else {
        backIn = "any moment";
      }
    }
    const limitedModels = err.attempts
      .filter((a) => a.reason === "rate_limit")
      .map((a) => a.model);
    if (limitedModels.length > 0) {
      modelInfo = ` (${limitedModels.join(", ")} rate limited)`;
    }
  }
  return `⚡ Temporarily unavailable${modelInfo} — back in ${backIn}.`;
}
```
Then the ternary simply becomes:
```typescript
: isRateLimit
  ? buildRateLimitFallbackText(err)
  : `⚠️ Agent failed before reply: …`
```
src/agents/model-fallback.ts
Outdated
```typescript
if (params.attempts.length <= 1 && params.lastError) {
  throw params.lastError;
}
```
Single-attempt path bypasses FallbackSummaryError
When params.attempts.length <= 1 && params.lastError, the raw lastError (a plain Error) is re-thrown. This means for users with only a single model configured, isFallbackSummaryError(err) in agent-runner-execution.ts will return false, and the rate-limit handler will display "⚡ Temporarily unavailable — back in ~60 seconds." with no model name and no accurate cooldown estimate — the same defaults used when no structured data is available.
The PR description acknowledges this implicitly ("callers not using the typed check continue to work unchanged"), but since the new rate-limit branch in agent-runner-execution.ts relies on the typed check, this is a real UX gap for single-model setups. Consider wrapping params.lastError in a FallbackSummaryError here too (passing the single attempt and the known soonestCooldownExpiry) rather than re-throwing it bare:
```typescript
if (params.attempts.length <= 1 && params.lastError) {
  if (params.attempts.length === 0) {
    throw params.lastError;
  }
  throw new FallbackSummaryError(
    params.attempts[0]
      ? `${params.label} failed: ${params.formatAttempt(params.attempts[0])}`
      : String(params.lastError),
    {
      attempts: params.attempts,
      soonestCooldownExpiry: params.soonestCooldownExpiry ?? null,
      cause: params.lastError instanceof Error ? params.lastError : undefined,
    },
  );
}
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 25adf11056
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```typescript
const isSessionCorruption = /function call turn comes immediately after/i.test(message);
const isRoleOrderingError = /incorrect role information|roles must alternate/i.test(message);
const isTransientHttp = isTransientHttpError(message);
const isRateLimit = isRateLimitErrorMessage(message) || /\(rate_limit\)/i.test(message);
```
Restrict rate-limit fallback messaging to rate-limit-only errors
This condition marks the run as rate-limited whenever the summary text contains "(rate_limit)", but throwFallbackFailureSummary() includes reasons from all attempts in one message; in a mixed failure chain (e.g., first candidate rate-limited, later candidate fails with auth/model-not-found), users will still get ⚡ Temporarily unavailable ... back in ... instead of the real terminal error. That hides actionable failures and can send users down the wrong recovery path, so this branch should require structured FallbackSummaryError evidence that the blocking failure set is actually rate-limit driven.
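One way to implement the suggested tightening is sketched below. This is a hedged illustration, not the landed code: the `attempts` shape is assumed from the PR description, and `isPureRateLimitFailure` is a hypothetical helper name.

```typescript
// Hypothetical sketch: treat the run as rate-limited only when structured
// FallbackSummaryError evidence shows every attempt failed for that reason,
// not when the flattened summary string merely contains "(rate_limit)".
type Attempt = { model: string; reason: string; error: string };

function isPureRateLimitFailure(err: unknown): boolean {
  // Assumed field from the PR description; plain Errors have no `attempts`.
  const summary = err as { attempts?: Attempt[] } | null;
  const attempts = summary?.attempts;
  if (!Array.isArray(attempts) || attempts.length === 0) return false;
  return attempts.every((a) => a.reason === "rate_limit");
}
```

A mixed chain (rate limit, then auth failure) would then fall through to the real terminal error instead of the cooldown message.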
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 080a724cd7
```typescript
for (const candidate of candidates) {
  const profileIds = resolveAuthProfileOrder({
    cfg: params.cfg,
    store: authStore,
    provider: candidate.provider,
```
Derive cooldown ETA from rate-limited attempts only
runWithModelFallback currently computes soonestCooldownExpiry by scanning every candidate provider, not just the attempts that actually failed with reason === "rate_limit". Because auth/billing-disabled profiles also contribute unusableUntil timestamps, the value passed to buildRateLimitFallbackText() can come from an unrelated non-rate-limit path, which under mixed failures can show an overly optimistic “back in …” estimate and prompt users to retry long before the rate-limited model can recover.
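The fix the reviewer describes can be sketched as follows. This is an assumption-laden illustration (`cooldownExpiry` per attempt and the helper name are hypothetical), showing the ETA computed only from attempts that actually failed with `rate_limit`:

```typescript
// Hypothetical sketch: derive the countdown from rate-limited attempts only,
// so auth/billing-disabled cooldowns on unrelated profiles cannot produce an
// overly optimistic "back in …" estimate.
type AttemptInfo = { reason: string; cooldownExpiry?: number };

function soonestRateLimitExpiry(attempts: AttemptInfo[]): number | null {
  const expiries = attempts
    .filter((a) => a.reason === "rate_limit" && typeof a.cooldownExpiry === "number")
    .map((a) => a.cooldownExpiry as number);
  return expiries.length > 0 ? Math.min(...expiries) : null;
}
```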
…mit message

Combines ideas from PRs openclaw#45113, openclaw#31962, and openclaw#45763 to address three cooldown-related issues:

1. Stepped cooldown (30s → 1m → 5m cap) replaces the aggressive exponential formula (1m → 5m → 25m → 1h) that locked out providers for far longer than the actual API rate-limit window.
2. Per-model cooldown scoping: rate_limit cooldowns now record which model triggered them. When a different model on the same auth profile is requested, the cooldown is bypassed — so one model hitting a 429 no longer blocks all other models on the same provider.
3. FallbackSummaryError with soonest-expiry countdown: when all candidates are exhausted, the user sees a clear message like '⚠️ Rate-limited — ready in ~28s' instead of a generic failure.

Files changed:
- types.ts: add cooldownReason/cooldownModel to ProfileUsageStats
- usage.ts: stepped formula, model-aware isProfileInCooldown, modelId threading through computeNextProfileUsageStats/markAuthProfileFailure
- model-fallback.ts: FallbackSummaryError class, model-aware availability check, soonestCooldownExpiry computation
- pi-embedded-runner/run.ts: thread modelId into failure recording
- agent-runner-execution.ts: buildCopilotCooldownMessage helper, rate-limit detection branch in error handler
- usage.test.ts: update expected cooldown value (60s → 30s)
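The stepped cooldown described in point 1 can be sketched as below. This is a minimal illustration of the 30s → 1m → 5m-cap schedule; the constant and function names are hypothetical (the real logic lives in `usage.ts`):

```typescript
// Hypothetical sketch of the stepped cooldown: 30s after the first
// rate-limit failure, 1m after the second, capped at 5m thereafter.
const COOLDOWN_STEPS_MS = [30_000, 60_000, 300_000];

function cooldownForFailure(consecutiveFailures: number): number {
  // Clamp to [1, steps.length] so repeat failures stay at the cap.
  const idx = Math.min(Math.max(consecutiveFailures, 1), COOLDOWN_STEPS_MS.length) - 1;
  return COOLDOWN_STEPS_MS[idx];
}
```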
…e limit messages

Replace generic Error with FallbackSummaryError when all model fallback candidates are exhausted. The new error carries:
- Structured attempt metadata (model, reason, error per attempt)
- Soonest cooldown expiry timestamp across all candidate providers

The agent runner uses this to show users a friendly message instead of raw error dumps:

⚡ Temporarily unavailable (sonnet rate limited) — back in ~3 min.

Instead of:

⚠️ Agent failed before reply: All models failed (3): anthropic/...

Also adds isRateLimitErrorMessage detection to the error classifier so rate limit errors are caught even when not wrapped in FallbackSummaryError.
…lper

- Wrap single-attempt failures in FallbackSummaryError so single-model configs get accurate cooldown timing and model names in rate limit UX
- Add explicit TODO + null pass for image model fallback cooldown expiry
- Extract IIFE rate limit message builder to named buildRateLimitFallbackText() helper for maintainability
Force-pushed 080a724 to 53628d2 (Compare)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53628d2343
```typescript
if (params.attempts.length === 1 && params.lastError) {
  throw new FallbackSummaryError(
    `${params.label} failed: ${params.formatAttempt(params.attempts[0])}`,
    {
```
Preserve raw transient HTTP errors on single-candidate failure
Wrapping the single-attempt failure in FallbackSummaryError changes messages like "521 ..."/"503 ..." into "models failed: ...", which prevents the retry path in runAgentTurnWithFallback from recognizing transient HTTP failures (isTransientHttpError only matches leading HTTP status text). In single-model setups (no fallbacks), this turns a previously retried transient outage into an immediate user-facing failure, reducing reliability for short-lived provider incidents.
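A sketch of the carve-out the reviewer is asking for, under the stated assumption that the transient-HTTP check matches leading status text (`isTransientHttpText` and the regex are illustrative stand-ins, not the project's actual `isTransientHttpError`):

```typescript
// Hypothetical sketch: keep the raw error when it looks like a transient
// HTTP failure so the retry path can still recognize it by its leading
// status text; wrap everything else in the structured summary error.
function isTransientHttpText(message: string): boolean {
  return /^(429|5\d\d)\b/.test(message); // assumed leading-status match
}

function rethrowSingleAttempt(lastError: Error): never {
  if (isTransientHttpText(lastError.message)) {
    throw lastError; // preserve "503 …" / "521 …" for the retry path
  }
  // Stand-in for throwing FallbackSummaryError with structured metadata.
  throw new Error(`models failed: ${lastError.message}`);
}
```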
Closing this as superseded by the merged implementation. What shipped instead:
Why this is superseded:
If there is any remaining UX detail here that you think the landed path missed, please point to it on #49834 and we can evaluate a focused follow-up. Thank you for the contribution, @ToneLoke.
Problem
When all model fallback candidates are exhausted due to rate limits, users see raw error dumps like:
This is confusing and doesn't tell the user when service will resume.
Solution
1. Structured `FallbackSummaryError`

Replace the generic `Error` thrown by `throwFallbackFailureSummary()` with a typed `FallbackSummaryError` that carries:
- `attempts`: structured metadata per attempt (model, reason, error)
- `soonestCooldownExpiry`: Unix ms timestamp of the earliest profile cooldown expiry

2. Human-friendly rate limit messages

The agent runner detects rate limit errors (via `isRateLimitErrorMessage` plus a `/rate_limit/` pattern) and uses the structured error to show a friendly countdown message. Time estimates are derived from actual profile cooldown expiry data, formatted as seconds/minutes/hours as appropriate.
Impact

- Touches `model-fallback.ts` and `agent-runner-execution.ts`
- `FallbackSummaryError` extends `Error`
- `isFallbackSummaryError()` type guard exported for downstream use
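The typed error and guard described above might look roughly like the following. This is a sketch assembled from the field names in this description; the actual class in `model-fallback.ts` may differ in details:

```typescript
type FallbackAttempt = { model: string; reason: string; error: string };

// Sketch of the structured error: carries per-attempt metadata plus the
// earliest cooldown expiry across candidate providers (Unix ms, or null).
class FallbackSummaryError extends Error {
  readonly attempts: FallbackAttempt[];
  readonly soonestCooldownExpiry: number | null;

  constructor(
    message: string,
    opts: { attempts: FallbackAttempt[]; soonestCooldownExpiry: number | null },
  ) {
    super(message);
    this.name = "FallbackSummaryError";
    this.attempts = opts.attempts;
    this.soonestCooldownExpiry = opts.soonestCooldownExpiry;
  }
}

// Type guard so downstream callers can narrow unknown errors safely.
function isFallbackSummaryError(err: unknown): err is FallbackSummaryError {
  return err instanceof FallbackSummaryError;
}
```

Callers not using the typed check continue to treat it as a plain `Error`; those that do can read `attempts` and `soonestCooldownExpiry` to build the friendly message.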