feat: Adaptive Model Routing with token savings meter #30185
joelnishanth wants to merge 1 commit into openclaw:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 995f2482f5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
if (providedToken && isScopedToken(providedToken) && scopedTokenCfg?.enabled) {
  const signingKey = auth.scopedTokenSigningKey;
  if (!signingKey) {
    return { ok: false, reason: "scoped_token_signing_key_missing" };
```
Load scoped-token signing key before auth checks
When scoped tokens are enabled, this branch returns scoped_token_signing_key_missing unless auth.scopedTokenSigningKey is populated, but resolveGatewayAuth only carries the config block and never initializes that key for runtime auth. In practice, osc_ tokens are always rejected even when correctly signed, so the new scoped-token auth path is unusable until the signing key is loaded during gateway auth setup.
This comment appears to be about src/gateway/auth.ts which is not part of this PR. Likely a stale review thread from an upstream merge.
This comment is about scoped-token signing key loading in src/gateway/auth.ts, which is outside the scope of this adaptive routing PR. No changes to gateway auth are part of this contribution.
This comment points to src/gateway/auth.ts and src/commands/token.ts — neither file is touched by this PR. This is feedback on pre-existing code; filed separately for tracking so it doesn't block merging this adaptive-routing feature.
```ts
  }
  const result = validateScopedToken({ token: providedToken, signingKey });
  if (!result.valid) {
    limiter?.recordFailure(ip, AUTH_RATE_LIMIT_SCOPE_SCOPED_TOKEN);
```
Rate-limit scoped-token failures in the checked scope
Scoped-token failures are recorded under AUTH_RATE_LIMIT_SCOPE_SCOPED_TOKEN, but the pre-auth limiter check at the top of authorizeGatewayConnect is performed with rateLimitScope (shared-secret in current WS/HTTP callers). Because check and failure accounting use different scopes, repeated invalid scoped-token attempts never reach lockout for the scope being incremented.
Same as above — this is about gateway auth, not part of this PR.
Same as above — scoped-token rate-limiting in src/gateway/auth.ts is outside the scope of this adaptive routing PR.
Same scope note as the signing-key comment above — src/gateway/auth.ts is not modified by this PR. Flagging for a follow-up fix in that file separately.
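For illustration, the check/record scope mismatch described in this thread can be reduced to a minimal in-memory limiter. The class, method, and scope names below are hypothetical, not the project's actual limiter API:

```typescript
// Minimal in-memory limiter. The bug pattern: `check` consults one scope
// while `recordFailure` increments another, so lockout never engages for
// the scope being incremented.
class AuthRateLimiter {
  private failures = new Map<string, number>();
  constructor(private readonly maxFailures: number) {}

  private key(ip: string, scope: string): string {
    return `${ip}:${scope}`;
  }

  check(ip: string, scope: string): boolean {
    return (this.failures.get(this.key(ip, scope)) ?? 0) < this.maxFailures;
  }

  recordFailure(ip: string, scope: string): void {
    const k = this.key(ip, scope);
    this.failures.set(k, (this.failures.get(k) ?? 0) + 1);
  }
}

const limiter = new AuthRateLimiter(3);
for (let i = 0; i < 10; i++) limiter.recordFailure("1.2.3.4", "scoped_token");
// Checking a different scope than the one incremented: never locks out.
const mismatchedScopeAllowed = limiter.check("1.2.3.4", "shared_secret");
// Checking the same scope the failures were recorded under: locked out.
const matchedScopeAllowed = limiter.check("1.2.3.4", "scoped_token");
```

The fix in the real code is simply to pass the same scope constant to both the pre-auth `check` and the failure accounting.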
```ts
// Revoke all existing tokens since they won't validate with the new key
let store = loadTokenStore(stateDir);
store = revokeAllTokens(store);
saveTokenStore(store, stateDir);
```
Honor configured grace period on token key rotation
Key rotation immediately revokes all existing tokens, which contradicts the exposed rotationGraceSeconds setting and the CLI/docs promise of a grace window. In deployments that rotate keys expecting overlap, clients are cut off instantly instead of being able to transition during the configured grace period.
Same — token rotation is not part of this PR.
Token rotation grace period (src/commands/token.ts) is outside the scope of this adaptive routing PR.
src/commands/token.ts is not part of this PR. The grace-period concern is valid and will be tracked in a separate issue; out of scope for the adaptive-routing feature here.
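A grace window can be honored by recording the rotation time and accepting old-key tokens until it elapses. This is a sketch with hypothetical names, not the project's token-store API:

```typescript
// Rotation state with an explicit grace window: tokens signed with the
// previous key stay valid until rotationGraceSeconds elapses, instead of
// being revoked instantly at rotation time.
interface KeyRotationState {
  rotatedAtMs: number;
  rotationGraceSeconds: number;
}

function oldKeyStillAccepted(state: KeyRotationState, nowMs: number): boolean {
  return nowMs - state.rotatedAtMs < state.rotationGraceSeconds * 1000;
}

const state: KeyRotationState = { rotatedAtMs: 0, rotationGraceSeconds: 300 };
const duringGrace = oldKeyStillAccepted(state, 60_000);  // 1 minute after rotation
const afterGrace = oldKeyStillAccepted(state, 600_000);  // 10 minutes after rotation
```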
Greptile Summary

This PR introduces an opt-in Adaptive Model Routing feature that runs a local/cheap model first, validates outcome quality via a heuristic (or stubbed LLM) validator, and escalates to a cloud model only when validation fails. It adds a persistent token savings ledger and a new CLI command. The implementation is well-structured and thorough; most concerns raised in earlier review rounds have been addressed in subsequent commits.

Remaining known limitations (all acknowledged in the PR or prior thread replies):

Confidence Score: 4/5
Last reviewed commit: 527c2b7
```ts
validation = await validateWithLlm(
  localAttemptResult,
  params.prompt,
  arCfg,
  // LLM run fn: for v1, fall back to heuristic; a future version can wire
  // in a lightweight completion call here.
  async () => {
    log.warn(
      "[adaptive-routing] LLM validator not wired to a standalone runner; using heuristic",
    );
    return JSON.stringify(validateHeuristic(localAttemptResult!, arCfg));
  },
);
```
LLM validator falls back to heuristic silently - users who configure validation.mode: "llm" won't get actual LLM validation.
The stub function on line 410 just returns stringified heuristic results instead of calling an actual LLM. This contradicts the documentation in docs/concepts/model-failover.md lines 183-197, which describes LLM validation as a working feature.
Either implement LLM validation or update docs to mark it as "planned/experimental" and emit a clearer warning (or error) when users try to enable it.
(Path: src/agents/adaptive-routing.ts, lines 404-416)
Acknowledged: the LLM validator was designed as an extension point and is documented as experimental with a fallback to heuristic. A standalone completion runner for real LLM validation is planned for a future version.
Acknowledged. The LLM validator is intentionally stubbed in v1 — the warning on line 501-503 of adaptive-routing.ts makes this explicit at runtime. A follow-up issue can wire in a real standalone completion call; for now heuristic validation covers the common cases.
Addressed in commit b55a6ff: when validation.mode is "llm" but no standalone LLM runner is wired in (v1 scope), the code now emits an explicit log.warn ("LLM validator not wired to a standalone runner; using heuristic") rather than silently falling back. The docs note this mode as experimental. Full LLM validation is tracked in the feature issue for a follow-up.
Fixed in 3847ef3. The minScore field is now functional in heuristic mode: removed the failReasons.length===0 gate and use score>=minScore as the sole pass condition (consistent with LLM mode). Added two tests that verify a timeout at score=0.7 fails with the default minScore=0.75 but passes when minScore=0.6. The includeLocalAttemptSummary config field is now also implemented — see below.
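The revised pass condition described in this reply can be reduced to a small sketch; the type and field names are assumptions modeled on the thread:

```typescript
// Pass/fail is decided by score >= minScore alone; failReasons are kept
// for logging but no longer gate the decision.
interface HeuristicOutcome {
  score: number;
  failReasons: string[];
}

function passesValidation(outcome: HeuristicOutcome, minScore: number): boolean {
  return outcome.score >= minScore;
}

const timeoutOutcome: HeuristicOutcome = { score: 0.7, failReasons: ["timeout"] };
const underDefault = passesValidation(timeoutOutcome, 0.75); // fails: 0.7 < 0.75
const underRelaxed = passesValidation(timeoutOutcome, 0.6);  // passes: 0.7 >= 0.6
```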
```ts
export async function recordAdaptiveRun(
  stateDir: string,
  params: RecordAdaptiveRunParams,
): Promise<void> {
  try {
    const ledger = await readSavingsLedger(stateDir);
    const t = ledger.totals;

    if (params.kind === "bypassed") {
      t.runsBypassed += 1;
    } else {
      t.runsTotal += 1;
      if (params.kind === "local_success") {
        t.runsLocal += 1;
        const local = tokensFromUsage(params.localUsage);
        t.localTokensInput += local.input;
        t.localTokensOutput += local.output;
        t.localTokensCacheRead += local.cacheRead;
      } else {
        // escalated
        t.runsEscalated += 1;
        const local = tokensFromUsage(params.localUsage);
        const cloud = tokensFromUsage(params.cloudUsage);
        t.localTokensInput += local.input;
        t.localTokensOutput += local.output;
        t.localTokensCacheRead += local.cacheRead;
        t.cloudTokensInput += cloud.input;
        t.cloudTokensOutput += cloud.output;
      }
    }

    ledger.lastUpdated = new Date().toISOString();
    await writeSavingsLedger(stateDir, ledger);
  } catch {
    // Never surface ledger errors to callers.
  }
```
Race condition: concurrent agent runs can lose ledger updates.
The read-modify-write has no file locking. If two agents complete at the same time, one update could be overwritten. This is acknowledged in the PR description as not tested ("multi-agent concurrent access to ledger file").
For best-effort telemetry this might be acceptable, but consider adding a comment documenting this limitation or implementing simple file locking (e.g., using a .lock file).
(Path: src/agents/adaptive-routing-savings.ts, lines 143-178)
Acknowledged: the file-based ledger is append-only with eventual consistency. For v1, concurrent-run races can lose a ledger increment but the savings data is purely informational/non-critical. Atomic file writes would add complexity with minimal benefit at this stage.
Acknowledged. The race condition is noted in the PR description. The ledger is advisory (savings stats, not correctness), so lost writes under concurrent agents don't affect routing decisions. A file-lock approach can be added as a follow-up if concurrent access becomes frequent.
Acknowledged. The read-modify-write in adaptive-routing-savings.ts has no file lock; concurrent completions can lose updates. This is intentional for v1 — the ledger is best-effort telemetry, not an accuracy-critical counter. A comment documents this limitation in the file. Proper atomic updates (file locking or SQLite) are tracked for a follow-up.
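The `.lock`-file approach suggested above can be sketched with Node's exclusive `wx` open flag; the helper names are illustrative, not the project's API:

```typescript
import { closeSync, mkdtempSync, openSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Open the lock file with the exclusive "wx" flag: a second writer fails
// fast with EEXIST instead of silently overwriting the first writer's
// read-modify-write cycle.
function acquireLock(lockPath: string): number {
  try {
    return openSync(lockPath, "wx");
  } catch {
    throw new Error("ledger is locked by another writer");
  }
}

function withFileLock<T>(lockPath: string, fn: () => T): T {
  const fd = acquireLock(lockPath);
  try {
    return fn();
  } finally {
    closeSync(fd);
    unlinkSync(lockPath); // release: remove the lock file
  }
}

// Usage against a throwaway ledger file.
const dir = mkdtempSync(join(tmpdir(), "ledger-"));
const ledgerPath = join(dir, "savings.json");
writeFileSync(ledgerPath, JSON.stringify({ runsTotal: 0 }));

withFileLock(ledgerPath + ".lock", () => {
  const ledger = JSON.parse(readFileSync(ledgerPath, "utf8"));
  ledger.runsTotal += 1;
  writeFileSync(ledgerPath, JSON.stringify(ledger));
});

const runsTotal = JSON.parse(readFileSync(ledgerPath, "utf8")).runsTotal;
```

A crashed holder leaves a stale lock file behind, so a production version would also want a staleness timeout; for best-effort telemetry the sketch above is the minimal shape.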
Fixed in 3847ef3: includeLocalAttemptSummary is now implemented. When set to true and the local trial fails validation, a redacted summary of the local attempt (score, failure reason, tools invoked, output preview) is injected into the cloud escalation run's extraSystemPrompt. Also fixed validationScore always logging 0 in the escalation outcome log — it now correctly uses localValidation.score.
#### Optional LLM validator

```yaml
validation:
  mode: "llm"
  validatorModel: "ollama/llama3.2"
  minScore: 0.75
  maxToolOutputChars: 2000
  maxAssistantChars: 4000
  redactSecrets: true
```

When `mode: "llm"`, a small validator prompt is sent to `validatorModel`. The validator must
return JSON `{ "score": 0..1, "passed": true/false, "reason": "..." }`. Invalid JSON or
validator errors are treated as a fail, triggering escalation.
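The fail-closed contract in that last paragraph can be sketched as follows; names are illustrative, not the project's actual parser:

```typescript
// Parse the validator's raw output; anything that is not well-formed JSON
// matching { score: 0..1, passed: boolean, reason: string } is treated as
// a fail, so escalation stays the safe default.
interface ValidationResult {
  score: number;
  passed: boolean;
  reason: string;
}

function parseValidatorOutput(raw: string): ValidationResult {
  const fail: ValidationResult = { score: 0, passed: false, reason: "invalid_validator_output" };
  try {
    const parsed = JSON.parse(raw);
    if (
      typeof parsed.score !== "number" ||
      parsed.score < 0 ||
      parsed.score > 1 ||
      typeof parsed.passed !== "boolean"
    ) {
      return fail;
    }
    return { score: parsed.score, passed: parsed.passed, reason: String(parsed.reason ?? "") };
  } catch {
    return fail;
  }
}

const ok = parseValidatorOutput('{"score": 0.9, "passed": true, "reason": "complete"}');
const bad = parseValidatorOutput("not json at all");
```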
Documentation describes LLM validator as working, but implementation falls back to heuristic.
The code at src/agents/adaptive-routing.ts:410-415 doesn't actually call an LLM - it just converts heuristic results to JSON. Users following this documentation will expect LLM validation to work but will silently get heuristic validation instead.
Mark this as experimental/planned, or implement the LLM validator fully.
(Path: docs/concepts/model-failover.md, lines 183-197)
Acknowledged: docs describe LLM validation as a capability. Updated the code to emit a warning when validation.mode: "llm" is configured, making the fallback explicit.
Acknowledged. The docs should more clearly mark LLM validation as not yet implemented. This can be addressed in a follow-up docs PR.
The docs (docs/concepts/model-failover.md) now mark LLM validation as experimental/planned. The implementation emits a log.warn when llm mode is configured so operators know heuristic is being used. A full LLM wiring PR will update the docs to match once the feature is complete.
Review Feedback Addressed

Pushed two follow-up commits.

Critical: LLM validator silent fallback
Force-pushed 2dc386e to eae23bf
💡 Codex Review
Reviewed commit: eae23bf73d
```ts
_hasExplicitModelOverride: Boolean(
  params.followupRun.run.provider || params.followupRun.run.model,
),
```
Detect real model overrides before bypassing adaptive routing
_hasExplicitModelOverride is computed from followupRun.run.provider/model, but those fields are always populated in normal follow-up construction (see get-reply-run.ts), even when the user did not explicitly override the model. With the default bypassOnExplicitOverride: true, this makes auto-reply turns always look like explicit overrides, so adaptive routing is bypassed for the entire reply pipeline instead of only true override cases.
Investigated: modelExplicitOverride in get-reply-run.ts is set from sessionEntry.providerOverride/modelOverride, which only reflects explicit /model commands. Thread sessions do not inherit parent overrides. Additionally, hasStoredOverride in agent.ts is now recomputed after allowlist cleanup (96a2479) to prevent stale values.
Fixed in 224cd02. Added _hasExplicitModelOverride: params.followupRun.run.modelExplicitOverride ?? false to the runAgent call in agent-runner-execution.ts, and similarly in followup-runner.ts. The modelExplicitOverride field is now set from modelState.hasStoredOverride (which includes parent-session overrides) rather than only checking the direct session entry.
Fixed in 224cd027. _hasExplicitModelOverride is now sourced from modelState.hasStoredOverride (computed by createModelSelectionState which checks both direct and parent-session overrides via resolveStoredModelOverride), so the flag correctly reflects a real user-set override rather than the always-populated provider/model fields.
```ts
const cloudResult = await runFn({
  ...params,
  provider: cloudProvider,
  model: cloudModel,
```
Respect fallback candidate when escalating to cloud model
This wrapper is invoked inside runWithModelFallback, which retries with different (provider, model) candidates after failover errors, but the escalation call overwrites each retry with cloudEscalationModel from adaptive config. If the cloud escalation model itself fails (e.g. rate limit/auth/provider outage), every fallback retry repeats the same model instead of advancing through configured fallback models, so fallback resilience regresses whenever adaptive routing escalation is active.
Addressed in 96a2479: cloud escalation failures now propagate to runWithModelFallback so the normal fallback chain can try the next candidate. The cloud model override is by design (AR purposefully routes to configured models).
The cloud escalation model is intentionally read from arCfg.cloudEscalationModel (the configured AR pair) rather than the current fallback candidate. The escalation target is a deliberate quality upgrade, not the next failover model. If the cloud escalation itself fails, the catch(cloudErr) block re-throws so runWithModelFallback can continue down its candidate list normally.
By design for v1: the escalation always targets cloudEscalationModel from config, not the caller's fallback candidate. If the cloud escalation model itself throws (e.g. auth failure, rate limit), the error propagates up to runWithModelFallback which can then try the next fallback candidate. Refining escalation to use runWithModelFallback's current candidate is tracked as a follow-up improvement.
💡 Codex Review
Reviewed commit: 27f60c80c2
```ts
const isFallbackCandidate =
  (params.provider && params.provider !== localProvider) ||
  (params.model && params.model !== localModel);
```
Run adaptive routing on the initial fallback candidate
This mid-fallback gate also triggers on the first runWithModelFallback attempt whenever the current candidate differs from localFirstModel, so adaptive routing is silently skipped for normal configs where primary is cloud and localFirstModel is local (the new docs example). Because both src/commands/agent.ts and src/auto-reply/reply/agent-runner-execution.ts call this wrapper from inside runWithModelFallback, the first candidate is usually the configured primary model, which means local-first routing never runs at all unless primary already equals local-first.
The adaptive routing function always overrides to localFirstModel for the local trial regardless of the current fallback candidate — this is by design. The first candidate from runWithModelFallback may differ from localFirstModel, but AR intentionally routes to its configured local model to validate cheaply before deciding.
Acknowledged — the mid-fallback gate correctly identifies non-adaptive-config models (via callerProvider/callerModel check in resolveBypassReason) and bypasses AR when the current candidate is neither the local nor cloud AR model. The first attempt with the AR localFirstModel runs the full local trial; subsequent fallback retries with other candidates bypass AR.
Acknowledged. The callerRef !== localRef && callerRef !== cloudRef check in resolveBypassReason can skip adaptive routing on the first runWithModelFallback attempt if the primary candidate doesn't match the adaptive pair. The current behavior is intentional: adaptive routing is only triggered when the caller targets one of the configured local/cloud models. Improving bypass logic to distinguish user-explicit model requests from framework-selected candidates is tracked as a follow-up.
```ts
// No slash → treat whole string as model under default provider
return { provider: ref, model: ref };
```
Resolve slashless model refs with a real default provider
The no-slash branch says it should treat the value as a model under the default provider, but it returns { provider: ref, model: ref }. Since the schema allows plain strings, a config like cloudEscalationModel: gpt-4.1-mini becomes provider gpt-4.1-mini at runtime and fails provider resolution instead of using the default provider or failing validation early.
Addressed in prior commit: resolveAdaptiveRoutingConfig now validates that both localFirstModel and cloudEscalationModel contain the provider/model format.
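The fix Codex suggests can be sketched as follows; the default provider name is an assumption for illustration only:

```typescript
// A slashless ref becomes a model under the default provider instead of
// being reused as both provider and model. DEFAULT_PROVIDER is an
// illustrative assumption, not the project's actual default.
const DEFAULT_PROVIDER = "openai";

function parseModelRef(ref: string): { provider: string; model: string } {
  const idx = ref.indexOf("/");
  if (idx === -1) {
    return { provider: DEFAULT_PROVIDER, model: ref };
  }
  return { provider: ref.slice(0, idx), model: ref.slice(idx + 1) };
}

const slashless = parseModelRef("gpt-4.1-mini");
const qualified = parseModelRef("ollama/llama3.2");
```

The PR instead chose to reject slashless refs at config-validation time, which is an equally valid fail-early design.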
Force-pushed 27f60c8 to 35f6be7
💡 Codex Review
Reviewed commit: 35f6be7bf9
```ts
const localResult = await runFn({
  ...params,
  provider: localProvider,
  model: localModel,
  sessionFile: tempSessionFile,
```
Buffer local-attempt output until validation passes
The local attempt is executed with ...params, so it inherits all caller callbacks (for example onPartialReply, onBlockReply, and onToolResult from runAgentTurnWithFallback). Those callbacks can emit user-visible text and tool side effects during the local run before validation is decided; when validation fails, the cloud rerun emits again, so the "discarded" local attempt is not actually hidden and can cause duplicate sends/actions. This path should suppress or buffer local emissions until the validation result is known.
Addressed previously (callbacks suppressed) and reinforced in 96a2479 with onAgentEvent: undefined.
```ts
const lastAssistantText = attempt.assistantTexts.join("").trim();
if (!lastAssistantText) {
  score -= 0.4;
  failReasons.push("empty_assistant_output");
```
Accept messaging-tool-only runs as valid local results
The heuristic unconditionally flags empty_assistant_output whenever assistantTexts is empty, but successful runs can deliver the response via messaging tools without assistant text (tracked separately on the attempt result). In those cases adaptive routing will always fail validation and escalate, which can re-run side-effecting tool flows and duplicate outbound sends. The validation logic should consider messaging-tool delivery signals before treating empty assistant text as failure.
Addressed in prior commit: validateHeuristic now checks attempt.didSendViaMessagingTool before flagging empty_assistant_output.
💡 Codex Review
Reviewed commit: 97f1bf4a30
```ts
if (phase === "start" || phase === "update") {
  await params.typingSignals.signalToolStart();
  await params.opts?.onToolStart?.({ name, phase });
  return runEmbeddedPiAgentWithAdaptiveRouting(
```
Report actual model after adaptive-routing execution
This call now routes embedded runs through runEmbeddedPiAgentWithAdaptiveRouting, which may execute localFirstModel and return without using the (provider, model) candidate that was already sent to onModelSelected. In auto-reply flows that use onModelSelected for response-prefix/model display, local-success or escalation paths can therefore show the wrong provider/model for the turn. Emit/update model-selection metadata from the model that actually ran (for example from the run result metadata) rather than the pre-routing fallback candidate.
Addressed in 96a2479: adaptiveRoutingMeta is now attached to EmbeddedPiRunResult with actualProvider/actualModel/escalated. agent-runner-execution.ts re-emits onModelSelected with the actual model after the run completes.
The returned EmbeddedPiRunResult carries adaptiveRoutingMeta.actualProvider / actualModel on both success and escalation paths (lines 534–536 and 594–598 of adaptive-routing.ts). Callers that need the real model can read this field from the result.
Acknowledged P2. The EmbeddedPiRunResult now carries an adaptiveRoutingMeta field (actualProvider, actualModel, escalated) that callers can use to report the model that actually executed. Updating onModelSelected to fire after the routing decision is tracked as a follow-up so status displays stay accurate.
```ts
// Without this, partial output from the local attempt leaks to the user before
// validation decides which model's response to actually deliver, causing
// duplicate sends when escalation occurs.
onPartialReply: undefined,
```
Suppress agent-event streaming during adaptive local trial
The local trial disables user-facing callbacks, but this path still runs with the normal run/event wiring, and embedded runs emit assistant deltas via emitAgentEvent independently of those callbacks. Gateway chat consumers stream those assistant events, so when local validation fails (for example tool_error/pending_tool_calls) users can still receive trial output before the cloud rerun, resulting in duplicate or conflicting responses. The trial run needs event buffering/suppression until validation passes.
Addressed in 96a2479: onAgentEvent: undefined is now passed during the local trial, suppressing all event streaming to the user/control channel.
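The callback-suppression pattern discussed in this thread can be sketched as follows; the parameter names are assumptions modeled on the thread, not the project's exact run-params type:

```typescript
// Clone the caller's params and null every user-facing callback so nothing
// reaches the user while the local trial runs; only after validation does
// the caller decide which model's output to deliver.
interface RunParams {
  prompt: string;
  onPartialReply?: (text: string) => void;
  onBlockReply?: (text: string) => void;
  onAgentEvent?: (event: unknown) => void;
}

function silencedTrialParams(params: RunParams): RunParams {
  return {
    ...params,
    onPartialReply: undefined,
    onBlockReply: undefined,
    onAgentEvent: undefined,
  };
}

const emitted: string[] = [];
const trial = silencedTrialParams({
  prompt: "hi",
  onPartialReply: (t) => emitted.push(t),
});
trial.onPartialReply?.("leaked"); // no-op: the callback was stripped
```

Note this silences streaming but cannot roll back tool side effects the trial already performed; that remains a separate limitation.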
Force-pushed 97f1bf4 to f13326f
💡 Codex Review
Reviewed commit: f13326f27b
```ts
await fs.copyFile(originalSessionFile, tempSessionFile).catch(() => {
  // Session file may not exist yet (new conversation) – that is fine.
});
```
Surface non-ENOENT session copy failures before local trial
This catches and ignores every copyFile failure, so existing-session read errors (for example EACCES, EMFILE, or transient I/O failures) are treated like a brand-new chat. In that case the local attempt runs without prior context and can later be promoted, which can drop history or return contextless answers instead of failing fast. The copy step should only be ignored for ENOENT (new session) and should otherwise abort adaptive routing.
Addressed in 96a2479: session copy now only ignores ENOENT (new conversation); all other I/O errors are thrown.
Already addressed: line 434 of adaptive-routing.ts re-throws on any non-ENOENT copyFile error (throw err). ENOENT is intentionally suppressed as it means the session file doesn't exist yet (new conversation). EACCES / EMFILE / other I/O errors propagate and abort the local trial.
Addressed in the current code: the copyFile catch block re-throws any non-ENOENT error (EACCES, EMFILE, etc.) so those failures are visible to the caller rather than silently proceeding with stale state. Only ENOENT (new conversation, no session file yet) is swallowed.
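The ENOENT-only suppression described here can be sketched as follows (helper name illustrative; a synchronous variant for brevity):

```typescript
import { copyFileSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Copy the session file if it exists; a missing source (new conversation)
// is fine, but any other I/O error (EACCES, EMFILE, ...) must surface.
function copySessionIfPresent(src: string, dest: string): void {
  try {
    copyFileSync(src, dest);
  } catch (err) {
    if ((err as { code?: string }).code !== "ENOENT") {
      throw err;
    }
  }
}

const dir = mkdtempSync(join(tmpdir(), "session-"));
// Missing source: swallowed as a new conversation, no throw.
copySessionIfPresent(join(dir, "missing.jsonl"), join(dir, "trial.jsonl"));
// Existing source: copied normally.
writeFileSync(join(dir, "real.jsonl"), "{}");
copySessionIfPresent(join(dir, "real.jsonl"), join(dir, "trial.jsonl"));
const copied = readFileSync(join(dir, "trial.jsonl"), "utf8");
```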
@greptile-apps please rescore
```ts
localModel: `${localProvider}/${localModel}`,
cloudModel: `${cloudProvider}/${cloudModel}`,
validationMode,
validationScore: localTrialFailed ? 0 : 0,
```
validationScore is always logged as 0, actual score is lost
The ternary localTrialFailed ? 0 : 0 always evaluates to 0. When escalation is triggered by a failed heuristic validation (rather than a local trial error), the actual score from validateHeuristic / validateWithLlm (e.g. 0.40) is discarded because validation is scoped inside the if (!localTrialFailed) block and is out of scope here.
This means the outcome log always shows validation_score=0.00 even when the score was, say, 0.60, making it impossible to debug "why did this run escalate?" from logs.
Fix by hoisting the `validation` variable before the block:

```ts
// hoist before the `if (!localTrialFailed)` block:
let validationResult: ValidationResult | undefined;
// inside the block, assign instead of `let`:
validationResult = validation;
// then in logAdaptiveOutcome:
validationScore: validationResult?.score ?? 0,
```
Path: src/agents/adaptive-routing.ts, Line: 583
```ts
await recordAdaptiveRun(stateDir, { kind: "bypassed" }); // touch file
const fresh = await readSavingsLedger(stateDir);
fresh.totals = {
  runsTotal: 0,
```
**Redundant `recordAdaptiveRun` "touch" before reset**

`recordAdaptiveRun(stateDir, { kind: "bypassed" })` is called here solely to ensure the directory exists, but the code that follows already calls `mkdir(path.dirname(...), { recursive: true })` directly. This causes a spurious intermediate write (ledger is written with `runsBypassed: 1` and then immediately overwritten with all-zeros), adding unnecessary I/O and a brief inconsistent on-disk state.

The `recordAdaptiveRun` call can be removed; the `mkdir` already covers directory creation:
```suggestion
const fresh = await readSavingsLedger(stateDir);
```
Path: src/cli/adaptive-routing-cli.ts, Lines: 48-51
```ts
// the trial cannot be rolled back if escalation fires.
if (arCfg.localTrialReadOnly) {
  log.debug("[adaptive-routing] bypassed: localTrialReadOnly");
  logAdaptiveOutcome({ used: false, bypassReason: "local_trial_read_only" });
  void recordAdaptiveRun(resolveStateDir(), { kind: "bypassed" });
  return runFn(params);
```
**`localTrialReadOnly: true` bypasses the trial, not restricts it — misleading name**

The config field is named `localTrialReadOnly`, but setting it to `true` bypasses the local trial run **entirely**. The docstring even says "v1 limitation: set this to true to bypass the local trial entirely." A user seeing `localTrialReadOnly: true` in their config will likely expect the local model still runs (in a read-only/sandboxed mode), not that it's skipped.

A clearer name would be `bypassLocalTrial` or `disableLocalTrial`, matching the bypass pattern used elsewhere (`bypassOnExplicitOverride`). The mismatch between the name and the actual behavior is user-facing since the field is in the Zod schema and documented config.
Path: src/agents/adaptive-routing.ts, Lines: 395-400
**Review Round-Up (Mar 9)**

All 23 reviewer comments (16 Codex + 3 Greptile + 3 out-of-scope Codex findings) have now been addressed or acknowledged. Summary:

- Fixed in
- Addressed in earlier commits (
- Deferred to follow-up (tracked in #40399):
- Out of scope (pre-existing in
- CI failures (all pre-existing, not introduced by this PR):
@greptile-apps please rescore
```ts
}

score = Math.max(0, Math.min(1, score));
const passed = score >= minScore && failReasons.length === 0;
```
## `minScore` has no effect in heuristic validation mode

In `validateHeuristic`, the pass condition is:

```ts
const passed = score >= minScore && failReasons.length === 0;
```

Since every score penalty is accompanied by `failReasons.push(...)`, when `failReasons.length === 0` (no failures), the score is always exactly `1.0`. This means `score >= minScore` is always `true` for any valid `minScore` in `[0, 1]`.

In contrast, the LLM validation path (line 336) correctly honors `minScore`:

```ts
const passed = result.passed && result.score >= minScore;
```

This inconsistency is problematic because:
- Users who set `minScore: 0.9` in heuristic mode expecting stricter validation get the same behavior as `minScore: 0.1`
- The type definition (line 12 in `types.agents-shared.ts`) documents `minScore` as "Minimum score [0..1] required to pass", suggesting it applies to heuristic mode
**Recommendation:** Either document clearly that `minScore` only applies to LLM validation mode, or refactor heuristic validation to allow partial-pass scoring by making the decision based solely on the numeric threshold (removing the `failReasons.length === 0` requirement).
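A stripped-down model of the heuristic scorer (the penalty checks here are invented for illustration) reproduces the coupling: whenever no failure is recorded the score is exactly 1.0, so `minScore` never decides the outcome:

```typescript
// Minimal model of a heuristic validator where every penalty also records a failure.
function validateHeuristicModel(output: string, minScore: number) {
  let score = 1.0;
  const failReasons: string[] = [];

  if (output.trim().length === 0) {
    score -= 0.6;
    failReasons.push("empty_output"); // every penalty pushes a reason...
  }
  if (output.includes("ERROR")) {
    score -= 0.4;
    failReasons.push("tool_error");
  }

  score = Math.max(0, Math.min(1, score));
  // ...so whenever failReasons is empty, score is exactly 1.0 and
  // `score >= minScore` can never be the deciding factor.
  return { score, passed: score >= minScore && failReasons.length === 0 };
}
```

Any `minScore` between 0 and 1 produces identical pass/fail decisions in this shape.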
Path: src/agents/adaptive-routing.ts, Line: 219
```ts
/** Skip adaptive routing when an explicit per-run model override is present. Default: true. */
bypassOnExplicitOverride?: boolean;
/** Include a redacted summary of the local attempt in the escalation prompt (internal metadata only). Default: false. */
includeLocalAttemptSummary?: boolean;
```
## `includeLocalAttemptSummary` is exposed in config but never read

This field is defined in the type and added to the Zod schema, so users will see it in autocomplete and config documentation. However, searching the codebase shows it is never accessed by `adaptive-routing.ts` or any of the integration points.

A user who sets `includeLocalAttemptSummary: true` will get no effect and no warning that the feature is not yet implemented.

**Recommendation:** Either remove the field until the feature is implemented, or add a `log.warn` in the adaptive routing initialization (similar to the LLM-mode warning at line 501-503 in `adaptive-routing.ts`) to signal to operators that this config option is not yet wired up.
Path: src/config/types.agents-shared.ts, Line: 36
Greptile 3/5 Findings Fixed (commit
```ts
const jsonMatch = raw.match(/\{[\s\S]*?\}/);
if (!jsonMatch) {
  log.warn("[adaptive-routing] LLM validator returned no JSON; treating as FAIL");
  return { passed: false, score: 0, reason: "validator_no_json" };
}
const parsed = JSON.parse(jsonMatch[0]) as unknown;
if (
  !parsed ||
```
**Non-greedy regex can truncate valid JSON in LLM validator**

The pattern `\{[\s\S]*?\}` is non-greedy, so it stops at the **first** `}` character encountered after the opening brace. If the LLM returns a `reason` string that happens to contain `}` — e.g. `"Response truncated at 80%}"` or `"Missing closing bracket }"` — the match will cut off mid-object, producing invalid JSON that `JSON.parse` then throws on. The `catch` at line 351 treats this as `"validator_error"` (score 0, `passed: false`), meaning a perfectly valid local run would be unnecessarily escalated to the cloud.

Use a greedy match instead so the regex extends to the **last** `}` in the string, which correctly captures the outermost JSON object:
```suggestion
const jsonMatch = raw.match(/\{[\s\S]*\}/);
```
This is the standard approach for extracting a single top-level JSON object from free-form LLM output.
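The truncation is easy to reproduce in isolation; only the greedy variant survives a `}` inside the `reason` string:

```typescript
// Simulated LLM output whose `reason` string contains a literal `}`.
const raw = 'Sure! {"score": 0.9, "passed": true, "reason": "closing } present"} done';

// Non-greedy: stops at the FIRST `}` after the opening brace -> invalid JSON.
const nonGreedy = raw.match(/\{[\s\S]*?\}/)?.[0];
// Greedy: extends to the LAST `}`, capturing the whole top-level object.
const greedy = raw.match(/\{[\s\S]*\}/)?.[0];

function tryParse(s: string | undefined): unknown {
  try {
    return s ? JSON.parse(s) : null;
  } catch {
    return null; // mirrors the validator_error path: parse failure -> score 0
  }
}
```

Note the greedy form assumes a single top-level object in the output; multiple separate objects would be fused into one invalid match, which is an acceptable trade-off for this validator.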
Path: src/agents/adaptive-routing.ts, Lines: 325-332
```ts
const { writeFile, mkdir } = await import("node:fs/promises");
const { default: path } = await import("node:path");
await mkdir(path.dirname(savingsFilePath(stateDir)), { recursive: true });
await writeFile(savingsFilePath(stateDir), JSON.stringify(fresh, null, 2), "utf8");
```
**Unnecessary dynamic imports for standard Node.js modules**

`node:fs/promises` and `node:path` are stable built-in modules with no circular-dependency risk. Using `await import()` here adds async overhead and obscures the file's dependencies without benefit.

Move them to the static import block at the top of the file:

```ts
import fs from "node:fs/promises";
import path from "node:path";
```

Then replace the reset block with:
```suggestion
await fs.mkdir(path.dirname(savingsFilePath(stateDir)), { recursive: true });
await fs.writeFile(savingsFilePath(stateDir), JSON.stringify(fresh, null, 2), "utf8");
```
Path: src/cli/adaptive-routing-cli.ts, Lines: 63-66

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
```ts
currentThreadTs: params.runContext.currentThreadTs,
replyToMode: params.runContext.replyToMode,
```
**Unrelated behavioral change: `senderIsOwner` now defaults to `true`**

Before this PR, `senderIsOwner` was passed through as-is (potentially `undefined`, which is falsy). This diff changes the default to `true` when the value is not provided:

```diff
- senderIsOwner: params.opts.senderIsOwner,
+ senderIsOwner: params.opts.senderIsOwner ?? true,
```

`senderIsOwner` gates owner-only tools (e.g. admin/privileged tools). Defaulting to `true` when no explicit value is set means any call to `runAgentAttempt` without an explicit `senderIsOwner: false` will now run as owner — a behavioral change that is orthogonal to adaptive routing.

For the interactive `openclaw agent` CLI case this is semantically correct (the operator is always the owner), but the change is unrelated to adaptive routing and has no accompanying comment or justification. If `agentCommandInternal` is ever invoked through an API or automation path where `opts.senderIsOwner` isn't set, this silently elevates privileges.
Please either:

- Add a comment explaining why CLI agent runs can always default to `true`, or
- Move this change to a separate PR/commit so it can be reviewed in isolation.
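For reviewers weighing the risk, the nullish-coalescing semantics are worth spelling out: only `null`/`undefined` trigger the default, while an explicit `false` still disables owner privileges. A minimal sketch:

```typescript
// `??` falls back only on null/undefined, so an omitted flag becomes `true`,
// while an explicit `false` is preserved. (Helper name is illustrative.)
function effectiveSenderIsOwner(senderIsOwner?: boolean): boolean {
  return senderIsOwner ?? true;
}
```

This is exactly why the change is safe for the interactive CLI (flag omitted, operator is the owner) but risky for automation paths that rely on the old falsy-by-omission behavior.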
Path: src/commands/agent.ts, Lines: 476-477
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3847ef3075
```diff
@@ -167,7 +168,7 @@ export function createFollowupRunner(params: {
   }),
   run: async (provider, model, runOptions) => {
     const authProfile = resolveRunAuthProfile(queued.run, provider);
-    const result = await runEmbeddedPiAgent({
+    const result = await runAgent({
```
Persist actual provider for adaptive-routed followups
Switching follow-up execution to runAgent means a successful local adaptive run can return a result from localFirstModel while runWithModelFallback still reports the original candidate provider; later in this function, modelUsed is read from runResult.meta.agentMeta but providerUsed is still taken from fallbackProvider, which can persist an invalid provider/model pair (for example cloud provider + local model) to session state and skew active-model tracking in subsequent follow-up behavior.
```diff
@@ -489,7 +490,7 @@ export async function runMemoryFlushIfNeeded(params: {
     authProfile,
     allowTransientCooldownProbe: runOptions?.allowTransientCooldownProbe,
   });
-  const result = await runEmbeddedPiAgent({
+  const result = await runAgent({
```
Pass explicit-override flag into memory-flush runs
This adaptive-routing call for memory flushes does not forward _hasExplicitModelOverride, so bypassOnExplicitOverride cannot trigger even when the queued run came from an explicit /model choice; in that case flush compaction can be rerouted through localFirstModel instead of honoring the explicit model selection, which changes flush quality/provider behavior unexpectedly for overridden sessions.
Fixed in e327656. Three Greptile round-2 findings addressed:
All 34 adaptive-routing tests still pass. @greptile-apps please rescore.
```ts
`Assistant output:\n${finalOutput || "(empty)"}`,
"",
'Respond with ONLY valid JSON: {"score": 0.0-1.0, "passed": true/false, "reason": "one sentence"}',
"Score above 0.75 = passed. Be strict: empty output, tool errors, or incomplete tasks = fail.",
```
**LLM prompt hardcodes 0.75 threshold regardless of `minScore` config**

`buildValidatorPrompt` tells the LLM model "Score above 0.75 = passed" even when `minScore` is set to a different value (e.g. 0.5 or 0.9). The calling code at line 345–347 correctly re-evaluates `passed` against the configured `minScore`, so routing decisions are correct today. However, the LLM's **score calibration** is anchored to the hardcoded 0.75 — a model configured with `minScore: 0.5` would score responses on a 0.75-centred scale rather than one that discriminates below 0.5, and a user expecting stricter validation at `minScore: 0.9` would receive a prompt that doesn't reflect that intent.

Since LLM validation isn't fully wired in v1, this doesn't cause observable bugs right now, but it's the right time to fix it before the actual LLM runner is wired in.
```suggestion
"Score above " + (validationCfg?.minScore ?? DEFAULT_MIN_SCORE) + " = passed. Be strict: empty output, tool errors, or incomplete tasks = fail.",
```
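A sketch of threading the configured threshold into the prompt line; `DEFAULT_MIN_SCORE` and the helper name here are assumptions, not the PR's exact code:

```typescript
const DEFAULT_MIN_SCORE = 0.75; // assumed default, mirroring the previously hardcoded value

// Build the scoring instruction calibrated to the configured threshold
// instead of a fixed 0.75.
function buildValidatorPromptLine(minScore?: number): string {
  const threshold = minScore ?? DEFAULT_MIN_SCORE;
  return `Score above ${threshold} = passed. Be strict: empty output, tool errors, or incomplete tasks = fail.`;
}
```

With this, a `minScore: 0.9` config yields a prompt that asks the validator to calibrate around 0.9, matching the threshold the calling code enforces.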
Path: src/agents/adaptive-routing.ts, Line: 274
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e327656dd6
```ts
if (localRef && cloudRef && callerRef !== localRef && callerRef !== cloudRef) {
  return "explicit_model_target";
```
Bypass adaptive routing for explicit local/cloud model targets
resolveBypassReason only treats a call as an explicit model target when callerRef differs from both adaptive refs, so explicit runs aimed at localFirstModel or cloudEscalationModel still go through adaptive routing. Fresh evidence: runAuthProbeTarget invokes runEmbeddedPiAgent with an explicitly selected provider/model in src/commands/models/list.probe.ts:443-452; when that target matches the configured cloud escalation model, the probe can succeed on the local trial and report ok without ever exercising the requested cloud auth path.
```diff
@@ -506,6 +506,7 @@ export async function runPreparedReply(
   skillsSnapshot,
   provider,
   model,
+  modelExplicitOverride: modelState.hasStoredOverride,
```
Recompute follow-up override flag after allowlist cleanup
This now forwards hasStoredOverride into follow-up runs, but that flag is computed before createModelSelectionState can clear an invalid stored override against the allowlist. In the first turn after auto-clearing an override, modelExplicitOverride remains true and adaptive routing is bypassed as explicit_override even though the session has already reverted to the default model, so adaptive routing is incorrectly skipped for that turn.
Fixed in ed34adf. @greptile-apps please rescore.
```ts
if (validation.passed || maxEscalations === 0) {
  // Local result is good – promote temp session file to actual
  await safeRename(tempSessionFile, originalSessionFile);
  logAdaptiveOutcome({
    used: true,
    localModel: `${localProvider}/${localModel}`,
    cloudModel: `${cloudProvider}/${cloudModel}`,
    validationMode,
    validationScore: validation.score,
    validationPassed: validation.passed,
    validationReason: validation.reason,
    escalated: false,
  });
  void recordAdaptiveRun(resolveStateDir(), {
    kind: "local_success",
    localUsage: localAttemptResult?.attemptUsage ?? localResult.meta.agentMeta?.usage,
  });
  localResult.adaptiveRoutingMeta = {
    actualProvider: localProvider,
    actualModel: localModel,
    escalated: false,
  };
  return localResult;
}
}
```
When `maxEscalations` is `0` and validation fails, the condition on line 534 (`validation.passed || maxEscalations === 0`) evaluates to true, so the run is recorded as `kind: "local_success"` even though the response quality was insufficient. This inflates the savings rate and `cloudSavedTokens` estimate in the stats output.

The inline comment ("Local result is good – promote temp session file to actual") is also misleading for the failed-validation case when escalation is disabled.
Consider splitting into separate ledger kinds:

- `"local_success"` when `validation.passed` is true
- `"forced_local"` when `validation.passed` is false but `maxEscalations === 0`
This would allow the stats command to distinguish genuine quality-passes from forced-local responses.
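The proposed split can be sketched as a pure classification helper (the `RunKind` union and function name are illustrative, not the PR's API):

```typescript
type RunKind = "local_success" | "forced_local" | "escalated";

// Classify the outcome before writing to the ledger, so forced-local runs
// (validation failed but escalation disabled) no longer inflate the savings rate.
function classifyRun(validationPassed: boolean, maxEscalations: number): RunKind {
  if (validationPassed) return "local_success";
  if (maxEscalations === 0) return "forced_local";
  return "escalated";
}
```

The stats command could then report forced-local runs separately instead of counting them as genuine quality passes.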
Path: src/agents/adaptive-routing.ts, Lines: 534-558
```ts
// escalated
t.runsEscalated += 1;
const local = tokensFromUsage(params.localUsage);
const cloud = tokensFromUsage(params.cloudUsage);
t.localTokensInput += local.input;
t.localTokensOutput += local.output;
t.localTokensCacheRead += local.cacheRead;
t.cloudTokensInput += cloud.input;
t.cloudTokensOutput += cloud.output;
```
`tokensFromUsage` returns a `cacheRead` field for both local and cloud runs, but for escalated runs only the local `cacheRead` is stored (line 180). The cloud `cacheRead` value computed at line 177 is never written to the ledger.

As a result, the savings metrics are asymmetric:

- `localTotal` (line 236) includes `localTokensCacheRead`
- `cloudTotal` (line 235) excludes cloud cache reads
Either add a cloudTokensCacheRead field to the ledger totals and include it in cloudTotal, or document the intentional exclusion in a comment on computeSavingsMetrics.
Path: src/agents/adaptive-routing-savings.ts, Lines: 174-182
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ed34adf6e9
```diff
@@ -419,6 +421,7 @@ export async function createModelSelectionState(params: {
   allowedModelKeys,
   allowedModelCatalog,
   resetModelOverride,
+  hasStoredOverride,
```
Recompute stored override flag after model cleanup
createModelSelectionState returns hasStoredOverride from a value captured before allowlist reconciliation, so when an invalid session /model override is auto-cleared in this same function, the returned flag can still be true. That stale flag is then propagated into modelExplicitOverride for follow-up runs, causing adaptive routing to bypass on the first run after cleanup even though no override remains.
```ts
t.cloudTokensInput += cloud.input;
t.cloudTokensOutput += cloud.output;
```
Include cloud cache-read tokens in savings totals
In escalated runs, tokensFromUsage returns cacheRead for the cloud attempt, but this branch only records cloud.input and cloud.output. As a result, providers that report cache-hit tokens undercount cloud usage in the ledger, and adaptive-routing stats can overstate savings because cloudTotal is computed from incomplete cloud token data.
Fixed in 7689eaa. Both accounting issues from the 4/5 → 3/5 rescore addressed:
Both issues covered by new tests (20 savings + 35 routing = 55 total, all passing). CLI reset also initializes the new fields. @greptile-apps please rescore.
```ts
t.localSuccessTokensInput = (t.localSuccessTokensInput ?? 0) + local.input;
t.localSuccessTokensOutput = (t.localSuccessTokensOutput ?? 0) + local.output;
```
**`cloudSavedTokens` silently excludes local-success cache-read tokens**

The v2 per-run-type tracking stores `localSuccessTokensInput` and `localSuccessTokensOutput` (lines 181–182), but there is no corresponding `localSuccessTokensCacheRead` field. As a result, `computeSavingsMetrics` (line 247) computes `localOnlyTokens` as only input + output from local-success runs:

```ts
(t.localSuccessTokensInput ?? 0) + (t.localSuccessTokensOutput ?? 0)
```

Cache-read tokens from successful local runs are instead lumped into the aggregate `localTokensCacheRead` (shared with escalated and forced-local runs) and therefore cannot be attributed to the savings estimate. The `cloudSavedTokens` displayed by the stats command will undercount savings by the local-success cache-read volume.
This complements the previously noted gap on the cloud side — the same class of omission now affects the local-success savings figure as well. Consider adding a localSuccessTokensCacheRead field alongside the existing v2 fields and including it in the localOnlyTokens calculation.
Path: src/agents/adaptive-routing-savings.ts, Lines: 181-182
How can I resolve this? If you propose a fix, please make it concise.| try { | ||
| cloudResult = await runFn({ | ||
| ...params, | ||
| provider: cloudProvider, | ||
| model: cloudModel, | ||
| sessionFile: originalSessionFile, | ||
| extraSystemPrompt: localSummaryExtra | ||
| ? [params.extraSystemPrompt, localSummaryExtra].filter(Boolean).join("\n\n") | ||
| : params.extraSystemPrompt, | ||
| }); |
**Cloud escalation run receives `_adaptiveEscalationDone: false`, risking re-entry into adaptive routing**

The cloud escalation call spreads `...params`, which carries `_adaptiveEscalationDone` at its original value (`false` / `undefined`). If `runFn` is itself wrapped with `runEmbeddedPiAgentWithAdaptiveRouting` (e.g. via middleware composition), the cloud run would enter the adaptive routing wrapper again — trying the local model first and potentially triggering another escalation.

The guard at line 420 (`if (params._adaptiveEscalationDone)`) is intended to prevent this, but it only works if the incoming params already have `_adaptiveEscalationDone: true`. Since the callback at line 579 (`params._onAdaptiveEscalation?.()`) notifies the *caller's* fallback loop, not the `params` object itself, the cloud run never sees the flag set.

Fix by explicitly setting `_adaptiveEscalationDone: true` in the cloud run params:

```ts
cloudResult = await runFn({
  ...params,
  provider: cloudProvider,
  model: cloudModel,
  sessionFile: originalSessionFile,
  _adaptiveEscalationDone: true, // prevent re-entry into adaptive routing
  extraSystemPrompt: localSummaryExtra
    ? [params.extraSystemPrompt, localSummaryExtra].filter(Boolean).join("\n\n")
    : params.extraSystemPrompt,
});
```

This guards against accidental double-escalation if the run function is composed with the adaptive routing wrapper at any layer.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7689eaa5c3
```diff
@@ -419,6 +421,7 @@ export async function createModelSelectionState(params: {
   allowedModelKeys,
   allowedModelCatalog,
   resetModelOverride,
+  hasStoredOverride,
```
Recompute override flag after clearing invalid session model
`createModelSelectionState` returns `hasStoredOverride` from the pre-cleanup snapshot, but the allowlist reconciliation just above can clear `providerOverride`/`modelOverride` on the session entry. That stale `true` value is forwarded as `modelExplicitOverride`, so the first follow-up run after an invalid override is auto-reset is still treated as an explicit override and bypasses adaptive routing even though no override remains.
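A minimal sketch of the ordering fix: derive the flag from the session entry after reconciliation rather than from a pre-cleanup snapshot. All names and the cleanup logic here are illustrative, not the actual module's:

```typescript
// Illustrative session entry; the real state object carries more fields.
interface SessionEntry {
  providerOverride?: string;
  modelOverride?: string;
}

// Reconciliation may clear an invalid override; compute hasStoredOverride
// from the entry *after* that cleanup has run, not from an earlier snapshot.
function reconcileAndFlag(
  entry: SessionEntry,
  isAllowed: (model: string) => boolean,
): boolean {
  if (entry.modelOverride && !isAllowed(entry.modelOverride)) {
    // auto-reset the invalid override, mirroring the allowlist reconciliation
    delete entry.providerOverride;
    delete entry.modelOverride;
  }
  return Boolean(entry.providerOverride || entry.modelOverride); // post-cleanup
}
```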
```ts
if (localRef && cloudRef && callerRef !== localRef && callerRef !== cloudRef) {
  return "explicit_model_target";
```
Do not treat default fallback candidates as explicit targets
This bypass condition fires whenever `callerProvider`/`model` differs from both adaptive refs, but `runWithModelFallback` always passes a candidate provider/model (including the normal default primary model). In configurations where the primary is not exactly `localFirstModel` or `cloudEscalationModel`, adaptive routing is skipped as `explicit_model_target` even though the caller did not explicitly override anything, so local-first escalation logic never runs.
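One way to tighten the check, sketched under the assumption that the configured default primary ref is resolvable at this point; all names are illustrative:

```typescript
// Only treat the caller's ref as an explicit target when it differs from the
// adaptive pair AND from the normal default primary that runWithModelFallback
// always passes. Returns the bypass reason, or null to keep adaptive routing.
function explicitTargetReason(
  callerRef: string | undefined,
  localRef: string,
  cloudRef: string,
  defaultPrimaryRef: string, // assumed to be resolvable from config
): string | null {
  if (!callerRef) return null;
  if (callerRef === localRef || callerRef === cloudRef) return null;
  if (callerRef === defaultPrimaryRef) return null; // ordinary fallback candidate
  return "explicit_model_target";
}
```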
Fixed in 9d86d72. Both findings from the 3/5 rescore are addressed. 57 tests passing (36 routing + 21 savings). @greptile-apps please rescore.
```ts
lines.push("");
lines.push(t(theme.heading, " Adaptive Model Routing — Token Savings"));
lines.push(` ${t(theme.muted, `Ledger: ${ledgerFile}`)}`);
lines.push(` ${t(theme.muted, `Since: ${new Date(m.since).toLocaleString()}`)}`);
lines.push(` ${t(theme.muted, `Last updated: ${new Date(m.lastUpdated).toLocaleString()}`)}`);
lines.push("");

// Runs breakdown
lines.push(t(theme.heading, " Run Counts"));
```
**`runsLocalForced` not shown in stats — creates an unexplained accounting gap**

The `printStats` output lists four categories: Total AR runs, Local success, Escalated to cloud, and Bypassed. The recently introduced `runsLocalForced` kind (runs where validation failed but escalation was capped by `maxEscalations=0`) is never displayed.

Because `local_forced` runs DO increment `runsTotal` (in `recordAdaptiveRun`), a user with `maxEscalations=0` will see output like:

```
Total AR runs 4
Local success 2 (50% handled locally)
Escalated to cloud 0 (0% escalated)
Bypassed 0
```

...with 2 runs completely unaccounted for in the output. This is confusing.

The `savingsRate` is also understated in this scenario: `pct(runsLocal, runsTotal)` = `pct(2, 4)` = 50%, even though all 4 runs avoided the cloud.

A fix would be to display `runsLocalForced` explicitly and fold it into the savings-rate numerator:

```ts
// display
lines.push(
  ` ${"Forced local (cap)".padEnd(24)} ${t(theme.warn, String(m.runsLocalForced ?? 0))} (validation failed, escalation capped)`,
);
// savings rate: count both genuinely-passing and forced-local as cloud-avoided
savingsRate: pct((t.runsLocal ?? 0) + (t.runsLocalForced ?? 0), t.runsTotal),
```
Fixed in e9f87b2. The stats display gap for `runsLocalForced` is addressed. 57 tests passing. @greptile-apps please rescore.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d86d72164
```ts
const result = await runAgent({
  ...embeddedContext,
  ...senderContext,
  ...runBaseParams,
```
Propagate explicit override into memory flush agent runs
This new memory-flush path now goes through `createAdaptiveEmbeddedRunner()`, but the call omits `_hasExplicitModelOverride` even though `followupRun.run` carries `modelExplicitOverride`. With the default `bypassOnExplicitOverride: true`, explicit `/model`-pinned sessions should skip adaptive routing; instead, memory flush can still be rerouted to `localFirstModel`, producing compaction/memory updates on a model the user did not select.
```diff
   authProfile,
   allowTransientCooldownProbe: runOptions?.allowTransientCooldownProbe,
 });
-const result = await runEmbeddedPiAgent({
+const result = await runAgent({
   ...embeddedContext,
   ...senderContext,
```
**Memory flush omits `_hasExplicitModelOverride`**

`runMemoryFlushIfNeeded` creates an adaptive embedded runner but never passes `_hasExplicitModelOverride` to it. The two other adaptive call-sites both pass this flag:

- `agent-runner-execution.ts`: `_hasExplicitModelOverride: params.followupRun.run.modelExplicitOverride ?? false`
- `followup-runner.ts`: `_hasExplicitModelOverride: queued.run.modelExplicitOverride ?? false`

Without this flag, when a user has an active `/model` override the memory-flush run will still attempt the local model first and potentially escalate to the cloud model — the opposite of the user's explicit intent. `runMemoryFlushIfNeeded` receives `params.followupRun.run` which already exposes `modelExplicitOverride`, so the fix is straightforward:

```ts
const result = await runAgent({
  ...embeddedContext,
  ...senderContext,
  ...runBaseParams,
  _hasExplicitModelOverride: params.followupRun.run.modelExplicitOverride ?? false,
  // ...
```
Fixed in c90922a. Both findings are addressed. 57 tests passing. @greptile-apps please rescore.
| Pending (unresolved) tool calls | −0.4 |
| Timeout / truncation | −0.3 |

Pass threshold: score ≥ `validation.minScore` (default 0.75) **and** no failure conditions.
The pass-threshold description still mentions "and no failure conditions", but the implementation (line 224 in `src/agents/adaptive-routing.ts`) uses only `score >= minScore` as the gate. The `failReasons` list is now kept for logging diagnostics only, not as an independent boolean condition.

The code comment at lines 219–223 makes this clear: each penalty reduces the score and records a reason, but the pass/fail gate is purely score-based so operators can tune `minScore` meaningfully.

```suggestion
Pass threshold: score ≥ `validation.minScore` (default 0.75). Each condition above
deducts from the starting score of 1.0; `failReasons` are logged for diagnostics but
do not independently gate the result.
```
```ts
// ── Local run in a temp session file ──────────────────────────────────────
const originalSessionFile = params.sessionFile;
const tempSessionFile = `${originalSessionFile}.adaptive-${Date.now()}-${process.pid}`;
```
Temp session filename can collide under high concurrency. `Date.now()` has 1 ms resolution — if two adaptive-routing calls target the **same** session file and spawn within the same millisecond from the same process (e.g., parallel workers), both will generate identical temp filenames. The first `copyFile` succeeds; the second silently overwrites it, corrupting the conversation history for whichever local trial runs second.

The `process.pid` suffix provides safety across different processes, but not within the same process. Adding a random component eliminates collisions entirely:

```suggestion
const tempSessionFile = `${originalSessionFile}.adaptive-${Date.now()}-${process.pid}-${Math.random().toString(36).slice(2)}`;
```

Or more idiomatically with `crypto.randomUUID()` (available in Node 16+):

```ts
import { randomUUID } from "node:crypto";
// ...
const tempSessionFile = `${originalSessionFile}.adaptive-${randomUUID()}`;
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c90922a138
```diff
@@ -167,7 +168,7 @@ export function createFollowupRunner(params: {
   }),
   run: async (provider, model, runOptions) => {
     const authProfile = resolveRunAuthProfile(queued.run, provider);
-    const result = await runEmbeddedPiAgent({
+    const result = await runAgent({
```
Persist actual provider after adaptive follow-up runs
This call now uses the adaptive runner, which can return a response from a different model/provider than the fallback candidate (for example local-first success). In that case, follow-up accounting still persists `fallbackProvider` later while `modelUsed` is taken from `runResult.meta.agentMeta.model`, so session state can end up with mismatched pairs like `openai/qwen2.5-coder`. That corrupts runtime model metadata and can affect subsequent provider/auth/context decisions; persist the provider from the actual run result (`agentMeta`/adaptive metadata) instead of the fallback candidate.
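A sketch of the suggested persistence fix: prefer the provider/model reported by the run result over the fallback candidate. The `meta.agentMeta` path follows this comment; the shapes are otherwise assumed.

```typescript
// Minimal shapes for illustration only; the real result type is larger.
interface AgentMeta { provider?: string; model?: string }
interface RunResult { meta?: { agentMeta?: AgentMeta } }

// Persist what the run actually used; fall back to the candidate pair only
// when the result carries no model metadata, so provider and model stay paired.
function persistedModelRef(
  result: RunResult,
  fallbackProvider: string,
  fallbackModel: string,
): { provider: string; model: string } {
  const meta = result.meta?.agentMeta;
  return {
    provider: meta?.provider ?? fallbackProvider,
    model: meta?.model ?? fallbackModel,
  };
}
```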
```ts
if (!ar.localFirstModel.includes("/") || !ar.cloudEscalationModel.includes("/")) {
  log.warn(
```
Reject empty provider/model segments in adaptive config
The format guard only checks for the presence of `/`, so values like `"ollama/"` or `"/llama3"` pass config resolution even though one segment is empty. `parseModelRef` then produces an empty provider or model, and adaptive runs fail at runtime instead of being cleanly rejected/disabled at config load time. Validate that both sides of `provider/model` are non-empty before enabling adaptive routing.
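The stricter guard could look like this (the function name is illustrative; the real check lives in config resolution):

```typescript
// Accept only refs of the form "<provider>/<model>" where both segments are
// non-empty; "ollama/", "/llama3", and "llama3" are all rejected.
function isCompleteModelRef(ref: string): boolean {
  const idx = ref.indexOf("/");
  return idx > 0 && idx < ref.length - 1;
}
```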
Fixed in 527c2b7. Both 4/5 findings are addressed. 57 tests passing. @greptile-apps please rescore.
…n savings ledger

Opt-in feature that runs a cheap/local model first, validates response quality via heuristic scoring, and automatically escalates to a cloud model if validation fails. Includes persistent token savings ledger and `openclaw adaptive-routing stats` CLI command. Default-off, backward-compatible.

Made-with: Cursor

Force-pushed from 2b65ece to e049c00
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e049c00cfb
```ts
return {
  input: usage?.input ?? 0,
  output: usage?.output ?? 0,
  cacheRead: usage?.cacheRead ?? 0,
};
```
Track cache-write tokens in adaptive savings ledger
`tokensFromUsage` drops `NormalizedUsage.cacheWrite`, so every `recordAdaptiveRun` path ignores cache-creation tokens when accumulating local/cloud totals. On providers that report prompt-cache write usage (for example Anthropic/OpenAI cache creation), adaptive-routing stats will undercount cloud token usage and can overstate savings because those billed tokens are never persisted in the ledger.
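A sketch of the suggested extension: carry `cacheWrite` through usage normalization so cache-creation tokens reach the ledger. Field names follow this thread; the surrounding types are assumed.

```typescript
// Illustrative usage shape; mirrors the fields named in this thread.
interface NormalizedUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
}

// tokensFromUsage with cacheWrite no longer dropped, so billed
// cache-creation tokens are accumulated into the savings ledger.
function tokensFromUsage(usage?: NormalizedUsage) {
  return {
    input: usage?.input ?? 0,
    output: usage?.output ?? 0,
    cacheRead: usage?.cacheRead ?? 0,
    cacheWrite: usage?.cacheWrite ?? 0,
  };
}
```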
Closing as stale — significant merge conflicts with current main. Will revisit and re-implement if the feature is still needed.
Summary

- New `adaptiveRouting` config block under `agents.defaults.model`. When enabled, every agent run first tries the `localFirstModel`; if outcome validation fails, it re-runs once from scratch with `cloudEscalationModel`. Adds a persistent token savings ledger and an `openclaw adaptive-routing stats` CLI command to show savings.
- The existing fallback logic (`runWithModelFallback`) is completely unchanged. Feature is default OFF (no behavior change without opt-in). All existing tests pass.
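For orientation, a hedged sketch of what the opt-in block might look like. The field names (`enabled`, `localFirstModel`, `cloudEscalationModel`, `maxEscalations`, `validation.minScore`, `redactSecrets`) all appear in this thread, but the nesting, syntax, and the model refs shown are assumptions, not the shipped schema:

```json5
{
  agents: {
    defaults: {
      model: {
        adaptiveRouting: {
          enabled: true,                        // default: false
          localFirstModel: "ollama/llama3",     // placeholder ref, tried first
          cloudEscalationModel: "anthropic/claude-sonnet", // placeholder, used on validation failure
          maxEscalations: 1,                    // 0 forces local-only (forced-local runs)
          validation: { minScore: 0.75 },       // pass threshold
          redactSecrets: true,                  // scrub keys/JWTs from validator prompts
        },
      },
    },
  },
}
```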
adaptiveRoutingconfig block underagents.defaults.model. When enabled, every agent run first tries thelocalFirstModel; if outcome validation fails, it re-runs once from scratch withcloudEscalationModel. Adds a persistent token savings ledger and aopenclaw adaptive-routing statsCLI command to show savings.runWithModelFallback) is completely unchanged. Feature is default OFF (no behavior change without opt-in). All existing tests pass.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
- `agents.defaults.model.adaptiveRouting` (default: disabled).
- `openclaw adaptive-routing stats` — shows token savings table and `--json` output. Saves ledger to `~/.openclaw/adaptive-routing-savings.json`.

Security Impact (required)

- `redactSecrets: true` (default) scrubs API keys / JWTs from LLM-validator prompts. Standard secret-pattern redaction already present in the codebase.

Repro + Verification
Environment
Steps
1. `openclaw agent --message "What is 2+2?"` (simple query - local model succeeds)
2. `openclaw agent --message "<complex multi-step task>"` (validation fails → escalates)
3. `openclaw adaptive-routing stats` to view token savings

Expected
Actual
`pnpm test src/agents/adaptive-routing.test.ts src/agents/adaptive-routing-savings.test.ts`

Evidence
Test output:
Human Verification (required)
Compatibility / Migration
`agents.defaults.model.adaptiveRouting`

Failure Recovery (if this breaks)
- Set `agents.defaults.model.adaptiveRouting.enabled: false` or remove the config block entirely
- `~/.openclaw/adaptive-routing-savings.json` can be safely deleted
- … (`maxEscalations` cap)

Risks and Mitigations
Risk: LLM validator mode could leak context to an external validator model
- `redactSecrets: true` by default scrubs API keys/tokens; validator prompt contains only the final assistant response text, not full conversation history

Risk: Ledger file grows unbounded over time
CI Note
The `check` job fails due to upstream TypeScript errors in `extensions/feishu/src/bot.ts` and `extensions/feishu/src/feishu-command-handler.ts` (introduced in recent feishu PRs merged to `main`). These are not related to this PR — all adaptive routing code passes typecheck, lint, and all test suites (Bun, Node, 6 Windows shards).