fix(agents): announce auto model fallback transitions#82676
Conversation
|
Codex review: needs changes before merge. Summary Reproducibility: yes. from source: the PR head builds fallback notices only in runReplyAgent, while createFollowupRunner still performs model fallback and sends followup payloads without any fallback notice construction. I did not run a failing test in this read-only review. Real behavior proof Next step before merge Security Review findings
Review detailsBest possible solution: Keep the merged recovery behavior from #82707 and land fallback/recovery notices after the same notice contract covers normal replies, side-effect-only replies, ACP/TTS exclusions, and queued followup delivery. Do we have a high-confidence way to reproduce the issue? Yes, from source: the PR head builds fallback notices only in runReplyAgent, while createFollowupRunner still performs model fallback and sends followup payloads without any fallback notice construction. I did not run a failing test in this read-only review. Is this the best way to solve the issue? No, not as patched. The notice approach fits the merged primary-probe recovery baseline, but it should be shared with or mirrored in the queued followup path before merge. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 532e42213d03. |
6711684 to
0302eba
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0302eba4f4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const shouldClearDirectAutoFallbackOverride = | ||
| hasDirectAutoFallbackOverride && | ||
| params.clearDirectAutoFallbackOverride !== false && | ||
| !hasOneTurnModelOverride; |
There was a problem hiding this comment.
Preserve read-only status lookups from clearing auto overrides
createModelSelectionState now clears direct auto fallback overrides unless callers explicitly pass clearDirectAutoFallbackOverride: false. That default is safe for real reply runs, but read-only callers like resolveDirectStatusReplyForSession (in src/plugin-sdk/command-status.runtime.ts) call this helper without that flag, so a /status-style inspection can silently persist a session mutation and drop the active fallback model. This changes a read operation into a write and can unexpectedly switch subsequent turns back to primary even when no reply turn occurred.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d75195817f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| modelOverrideFallbackOriginProvider: params.originProvider, | ||
| modelOverrideFallbackOriginModel: params.originModel, | ||
| modelOverrideFallbackLastProbeAt: params.now, | ||
| authProfileOverride: params.authProfileId, |
There was a problem hiding this comment.
Keep primary-probe timestamp tied to actual primary attempts
modelOverrideFallbackLastProbeAt is set unconditionally whenever fallback persistence writes a new auto override, even when the run started from an already-fallback model and only moved to a different fallback candidate. In that case resolveFallbackSelectionOrigin keeps the original primary in modelOverrideFallbackOrigin*, but this new timestamp still refreshes the cooldown used by shouldProbeAutoFallbackPrimary, so repeated fallback-to-fallback transitions can keep postponing primary recovery probes without ever retrying the primary. This makes primary recovery slower or effectively starved under unstable fallback chains.
Useful? React with 👍 / 👎.
1a9b17a to
0441b24
Compare
|
Landed on main as 4b0f16d. Gate:
|
Summary
Behavior addressed
Auto fallback was effectively sticky and mostly invisible: after a selected/default model failed, OpenClaw could keep using the fallback model without a clear per-session chat notice. The PR makes fallback/recovery visible and probes the selected model again every five minutes while auto fallback is pinned.
Real environment tested
Local OpenClaw source checkout on macOS, rebased onto current
origin/main.Exact steps or command run after this patch
pnpm docs:listOPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=180000 OPENCLAW_TEST_PROJECTS_SERIAL=1 node scripts/run-vitest.mjs src/agents/agent-scope.test.ts src/auto-reply/reply/reply-utils.test.ts src/auto-reply/reply/dispatch-acp-delivery.test.ts src/auto-reply/reply/dispatch-from-config.test.tsOPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=180000 OPENCLAW_TEST_PROJECTS_SERIAL=1 node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts src/auto-reply/reply/agent-runner.runreplyagent.e2e.test.tspnpm tsgo:coregit diff --check origin/main...HEADcodex-review --mode branch --base origin/mainEvidence after fix
Observed result after fix
When auto fallback changes model state, OpenClaw emits visible status notices once per transition:
↪️ Model Fallback: <fallback> (selected <primary>; <reason>)↪️ Model Fallback cleared: <primary> (was <fallback>)The fallback notice bypasses source-reply suppression, does not consume the first assistant reply slot, and is excluded from ACP transcript accumulation and TTS.
What was not tested
No live provider outage was forced against production credentials; the behavior is covered by focused unit/E2E tests with simulated model failure and fallback recovery.