fix(model-selection): clear auto-failover overrides so primary is retried on each turn#69365
Conversation
Greptile SummaryThis PR fixes a session-pinning bug where a transient primary-model failure would permanently route all subsequent turns through the fallback provider until the user manually ran Confidence Score: 5/5Safe to merge — the logic is correct, well-guarded, and follows existing patterns in the codebase. The fix correctly targets only auto-sourced direct session overrides (source === 'session' && modelOverrideSource === 'auto'), preserves user and legacy overrides unchanged, and leaves parent-session overrides untouched. The auth-profile clearing that occurs via applyModelOverrideToSessionEntry with isDefault: true is benign because applyFallbackCandidateSelectionToEntry already replaces or deletes any user auth-profile override when the fallback fires. The persistent-store update follows the identical pattern used in the existing allowedModelKeys reset block. All four new test cases exercise the discriminating conditions. No files require special attention. Reviews (1): Last reviewed commit: "fix(model-selection): clear auto-failove..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7006763d11
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa9ce6a43b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
aa9ce6a to
08ac24f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08ac24ffd8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
08ac24f to
71fe22a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1622c8ab02
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
fe1b1f8 to
bbb1c7c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bbb1c7c34a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| provider = autoHealProvider; | ||
| model = autoHealModel; |
There was a problem hiding this comment.
Use self-healed model for status surfaces in same request
This branch now rewrites the in-memory selection to autoHealProvider/autoHealModel after clearing an auto failover override, but the /status path still snapshots selectedProvider/selectedModel before calling createModelSelectionState and renders those stale values (src/plugin-sdk/command-status.runtime.ts:51-57 and :105-114). In sessions with modelOverrideSource: "auto", the first /status after recovery will report the fallback model even though this code has already cleared the override in session storage, so operators get incorrect model state for that turn.
Useful? React with 👍 / 👎.
bbb1c7c to
9605e45
Compare
|
Thanks for the P2 feedback — all valid. Addressed in 9605e45:
|
|
@Chevron7Locked I was preparing the MR for the merge, doing some refactoring. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9605e45098
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (isAutoSessionOverride && sessionEntry && sessionStore && sessionKey && !resetModelOverride) { | ||
| const { updated } = applyModelOverrideToSessionEntry({ |
There was a problem hiding this comment.
Skip auto-heal reset when heartbeat model override is active
This branch runs even when hasResolvedHeartbeatModelOverride is true (skipStoredOverride), and it forces provider/model back to defaultProvider/defaultModel. In heartbeat turns with an explicit heartbeat.model, that means the first run after an auto-failover override no longer uses the configured heartbeat model and instead executes on the default model, which contradicts the explicit heartbeat selection path and can change cost/behavior for that run.
Useful? React with 👍 / 👎.
| const { updated } = applyModelOverrideToSessionEntry({ | ||
| entry: sessionEntry, | ||
| selection: { provider: defaultProvider, model: defaultModel, isDefault: true }, | ||
| }); |
There was a problem hiding this comment.
Keep user auth profile when clearing auto model override
Calling applyModelOverrideToSessionEntry here without profileOverride clears authProfileOverride fields as part of its default-reset behavior. If failover only switched models within the same provider, fallback persistence intentionally keeps the user's scoped auth profile; this auto-heal path then drops it on the next turn, causing an unintended credentials/profile reset after a transient fallback.
Useful? React with 👍 / 👎.
9605e45 to
9bd50c1
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bd50c1944
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const storedOverride = hadDirectAutoSessionOverride | ||
| ? undefined | ||
| : resolveStoredModelOverride({ |
There was a problem hiding this comment.
Reapply parent override after clearing direct auto override
When a child session has a direct modelOverrideSource: "auto", this branch forces storedOverride to undefined after clearing the direct override, so the same turn cannot fall back to a parent-session override. In get-reply.ts, child sessions can inherit a parent /model selection, but after transient failover the next turn now runs on defaultProvider/defaultModel instead of the parent-selected model because resolveStoredModelOverride is bypassed here; that causes at least one request on the wrong model and temporarily ignores the user’s parent override.
Useful? React with 👍 / 👎.
9bd50c1 to
f7ea39b
Compare
f7ea39b to
d70cc44
Compare
…ried on each turn When runWithModelFallback falls back to a secondary provider it writes providerOverride/modelOverride/modelOverrideSource:"auto" to the session. On subsequent turns createModelSelectionState read this stored override and passed the fallback provider directly to runWithModelFallback, so the configured primary was never retried — the session was permanently pinned to the fallback even after the primary recovered. Fix: at model-selection ingress, when the direct session override has modelOverrideSource "auto" (set by a previous automatic fallback, not a user /model command), clear the override and retry the configured primary. If the primary is still down runWithModelFallback will fall back and re-set the auto override for that turn. Once the primary recovers the override stays clear. User-selected overrides (modelOverrideSource "user" or legacy undefined+model) are preserved unchanged. Covered by four new unit tests in model-selection.test.ts: - auto-failover override cleared and primary retried - user-selected override preserved - legacy override without source field preserved - parent-session auto-override applied to child (not cleared by child logic)
…verride clearing Three corrections to the auto-failover self-healing introduced in the prior commit: 1. Reset in-memory provider/model to configured primary after clearing auto override. get-reply-directives.ts preloads provider/model from the stored override before calling createModelSelectionState, so clearing only session state still ran the current turn on the fallback. Now provider/model are reset to defaultProvider/ defaultModel so this turn retries the primary immediately, not on the next turn. 2. Remove resetModelOverride = true from the auto-heal path. That flag triggers a "Model override not allowed for this agent" system event in applyInlineDirectiveOverrides, which is incorrect: the override was valid and set by the fallback loop — it just expired once the primary recovered. Auto-heal is not an allowlist violation. 3. Add a test case that verifies the in-memory reset when the caller pre-loads the fallback provider/model (simulating the get-reply-directives.ts preload path). Known limitation (noted in comment): channel model overrides (channels.modelByChannel) are skipped on the recovery turn because hasSessionModelOverride was true when they were evaluated at preload time. They resume on the following turn once session state is clear. Fixing this cleanly requires changes to the get-reply-directives preload flow and is out of scope for this PR.
d70cc44 to
fcbf830
Compare
|
Landed after maintainer review and focused regression coverage.
Thanks @Chevron7Locked! |
Summary
runWithModelFallbackfalls back to a secondary provider it writesproviderOverride/modelOverride/modelOverrideSource:"auto"to the session. On subsequent turnscreateModelSelectionStatere-applies the stored override and passes the fallback provider directly torunWithModelFallback— the configured primary is never tried again. The session is permanently pinned to the fallback even after the primary recovers./model default. For local/self-hosted primaries that recover in seconds, this silently burns paid API quota on fallbacks for the rest of the session.createModelSelectionState, when the direct session override hasmodelOverrideSource: "auto"(set by a previous automatic fallback, not a user/modelcommand), the override is cleared and the configured primary is retried. If the primary is still downrunWithModelFallbackfalls back and re-sets the override for that turn. Once the primary recovers the override stays clear.modelOverrideSource: "user") and legacy overrides (no source field, backward-compat treated as user) are preserved unchanged. Parent-session auto-overrides are applied to children as before.Change Type
Scope
Root cause trace
Fix
Tests
Four new cases in
model-selection.test.tscovering the happy path, user-override preservation, legacy backward-compat, and parent-session auto-override pass-through.