[codex] Add Pi/Codex harness extension seams#70772
Conversation
Greptile SummaryThis PR adds Pi/Codex harness extension seams: provider-owned hooks for transport extra-param patches, prompt overlays, auth profile preference, and followup delivery routing; a Confidence Score: 5/5Safe to merge — all open findings are P2 (behavioral/design notes, no definitive runtime breakage). The three comments are P2: (1) the expanded orphan-merge trigger scope is a behavioral change that may be intentional, (2) removeLeaf: false creating consecutive user turns is a provider-contract footgun rather than a core defect, and (3) the global strategy singleton is correctly guarded for the documented startup-registration use case. Core correctness — WS pool lifecycle, fallback rollback pairing, auth alias propagation, and the three-way prompt-overlay merge — all look sound. src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts (trigger guard removal), src/agents/pi-embedded-runner/run/message-merge-strategy.ts (global singleton), src/agents/pi-embedded-runner/run/attempt.ts (removeLeaf=false contract)
|
There was a problem hiding this comment.
Pull request overview
This PR advances the Pi/Codex harness plan by adding additive, provider-owned “seams” (extra param patching, prompt overlays, auth profile preference, and follow-up routing), improving operator observability, and centralizing a few provider-family behaviors (API-family detection, fallback result classification, message merge strategy, and optional WS session pooling).
Changes:
- Add new provider plugin hook surfaces: transport-scoped extra param patching, prompt overlay composition, follow-up fallback routing override, and auth profile preference.
- Improve runtime behavior/observability: resolved provider/model refs in hook events, shared GPT Responses-family detection helper, and GPT-5 embedded-run result classification for fallback eligibility.
- Refine cross-channel message tool discovery by allowing schema contributions to declare which actions they gate, and hide only those actions cross-channel.
Reviewed changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/plugins/types.ts | Adds new provider-owned hook types and contexts for transport params, overlays, auth profile resolution, and follow-up routing. |
| src/plugins/provider-runtime.ts | Exposes new provider hook runtime entrypoints and composes prompt overlays with the built-in GPT-5 overlay. |
| src/plugins/provider-runtime.test.ts | Adds coverage for the new provider runtime seams and prompt overlay composition behavior. |
| src/plugins/provider-hook-runtime.ts | Implements hook-runtime dispatch for the new provider seams. |
| src/plugins/hook-types.ts | Adds llm_output.resolvedRef to improve operator visibility into actual provider/model refs. |
| src/plugin-sdk/agent-harness-runtime.ts | Exports the new harness result-classification type via the plugin SDK surface. |
| src/channels/plugins/types.core.ts | Extends message tool schema contributions with optional actions scoping metadata. |
| src/channels/plugins/message-actions.test.ts | Tests cross-channel filtering behavior based on current-channel-only schema dependencies. |
| src/channels/plugins/message-action-discovery.ts | Adds listCrossChannelSchemaSupportedMessageActions to filter actions gated by channel-scoped schema. |
| src/auto-reply/reply/followup-runner.ts | Adds provider-owned follow-up routing seam and uses embedded-result classifier for fallback decisions. |
| src/auto-reply/reply/followup-runner.test.ts | Covers new routing behavior (dispatcher forcing, drop, and cross-channel route-failure notice). |
| src/auto-reply/reply/agent-runner-execution.ts | Integrates embedded-result classification into fallback flow and adds rollback handling for classified candidates. |
| src/auto-reply/reply/agent-runner-execution.test.ts | Adds tests for GPT-5 terminal classification behaviors and rollback of candidate persistence. |
| src/auto-reply/reply/agent-runner-auth-profile.ts | Uses provider-auth alias resolution when scoping session auth profiles across provider aliases. |
| src/agents/tools/message-tool.ts | Uses the new cross-channel schema-aware action listing for message tool discovery/description. |
| src/agents/tools/message-tool.test.ts | Ensures cross-channel message tool does not advertise actions whose params are hidden by channel-scoped schema. |
| src/agents/provider-auth-aliases.ts | Refactors alias priority selection and treats deprecated auth choice IDs as aliases. |
| src/agents/provider-auth-aliases.test.ts | Verifies deprecated auth choice IDs resolve to canonical providers for auth scoping. |
| src/agents/provider-api-families.ts | Introduces shared helper to identify GPT Responses-family APIs. |
| src/agents/provider-api-families.test.ts | Tests API-family classification helper. |
| src/agents/pi-embedded-runner/run/message-merge-strategy.ts | Adds a registry for message merge strategy selection (default: orphan trailing user prompt merge). |
| src/agents/pi-embedded-runner/run/message-merge-strategy.test.ts | Tests default strategy resolution and override/restore behavior. |
| src/agents/pi-embedded-runner/run/attempt.ts | Uses message merge strategy, emits resolvedRef, and gates WS pooling eligibility on clean completion. |
| src/agents/pi-embedded-runner/run/attempt.test.ts | Expands orphaned-user prompt merge tests (structured media summarization, removal semantics). |
| src/agents/pi-embedded-runner/run/attempt.subscription-cleanup.ts | Extends WS session release to optionally allow pooling. |
| src/agents/pi-embedded-runner/run/attempt.spawn-workspace.context-engine.test.ts | Updates expectations for the new releaseWsSession call signature/options. |
| src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts | Enhances orphaned user prompt merging to stringify/summarize structured content safely (no base64/data-uri embedding). |
| src/agents/pi-embedded-runner/run.ts | Adds provider-owned auth profile preference seam to reorder auth profile candidates. |
| src/agents/pi-embedded-runner/run.overflow-compaction.harness.ts | Updates harness mocks for new provider-runtime exports. |
| src/agents/pi-embedded-runner/result-fallback-classifier.ts | Adds shared GPT-5 embedded-run result classifier for model-fallback eligibility. |
| src/agents/pi-embedded-runner/extra-params.ts | Adds transport-scoped provider param patching and centralizes GPT Responses-family API checks. |
| src/agents/pi-embedded-runner/aliases.test.ts | Validates embedded-runner compatibility aliases remain bound to PI exports. |
| src/agents/pi-embedded-runner-extraparams.test.ts | Adds coverage for codex-responses and transport extra-param seam composition. |
| src/agents/openai-ws-stream.ts | Adds disabled-by-default WS session pooling with idle eviction and lifecycle safeguards. |
| src/agents/openai-ws-stream.test.ts | Tests session pooling behavior behind explicit env gating. |
| src/agents/openai-transport-stream.ts | Centralizes strict-tool downgrade diagnostics and always normalizes tool parameters for Responses/completions. |
| src/agents/openai-transport-stream.test.ts | Adds assertions for tool-parameter normalization behavior when strict is omitted/downgraded. |
| src/agents/openai-tool-schema.ts | Adds strict schema diagnostics collection to support downgrade logging. |
| src/agents/model-fallback.ts | Adds optional result classification hook to treat “successful but unusable” results as fallback-eligible errors. |
| src/agents/model-fallback.test.ts | Tests model-fallback result classification behavior (continue fallback / surface terminal classification). |
| src/agents/harness/types.ts | Adds optional AgentHarness.classify? surface and classification type. |
| src/agents/gpt5-prompt-overlay.ts | Scopes OpenAI plugin personality fallback to OpenAI-family providers and threads providerId through overlay mode resolution. |
| src/agents/embedded-runner.ts | Adds neutral embedded-runner re-export surface. |
| src/agents/command/attempt-execution.ts | Uses canonical auth-provider resolution to decide whether to forward session auth profile overrides. |
| src/agents/command/attempt-execution.cli.test.ts | Adds test coverage for codex-cli auth alias forwarding of session-bound auth profile. |
| src/agents/cli-runner.ts | Emits resolvedRef in LLM output hook events for CLI runs. |
| src/agents/auth-profiles/session-override.ts | Validates stored session auth overrides against canonical auth-provider identity (alias-aware). |
| src/agents/auth-profiles/session-override.test.ts | Tests preserving session override when CLI provider aliases the stored profile provider. |
| src/agents/auth-profiles/order.ts | Falls back to canonical auth-provider key when resolving stored/configured auth order for alias providers. |
| src/agents/auth-profiles/order.test.ts | Adds coverage for canonical auth order resolution under aliased provider IDs. |
| src/agents/agent-command.ts | Makes auth profile provider validation alias-aware when checking stored override/provider mismatch. |
| src/agents/agent-command.live-model-switch.test.ts | Adds coverage ensuring aliased session auth profiles aren’t cleared for codex-cli runs. |
| extensions/msteams/src/channel.ts | Declares schema-to-action dependency for MSTeams message-tool schema (unpin). |
| extensions/msteams/src/actions.ts | Declares schema-to-action dependency for MSTeams message-tool schema (unpin). |
| extensions/matrix/src/actions.ts | Declares schema-to-action dependency for Matrix profile tool schema (set-profile). |
| extensions/codex/src/app-server/run-attempt.ts | Emits resolvedRef in LLM output hook events for Codex app server attempts. |
| docs/.generated/plugin-sdk-api-baseline.sha256 | Updates plugin SDK API baseline checksums to reflect the new exported surfaces. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 39d56f6bbf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
39d56f6 to
ab39479
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab39479846
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
763da24 to
ec0572f
Compare
|
@steipete next gpt 5.4 |
ec0572f to
029f387
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 029f387738
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
029f387 to
53683a4
Compare
|
Addressed the remaining Greptile top-level/outside-diff P2 note in |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 436efcdfe8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
561efb2 to
ac5e05f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f72d93e8f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| extraParams: prepared, | ||
| thinkingLevel: params.thinkingLevel, | ||
| model: params.model, | ||
| transport: params.resolvedTransport ?? resolveSupportedTransport(prepared.transport), |
There was a problem hiding this comment.
Resolve hook transport from the effective session transport
resolvePreparedExtraParams falls back to prepared.transport when resolvedTransport is not provided, so provider extraParamsForTransport hooks can still see auto/undefined even when runtime settings have forced a concrete transport. This is observable on call sites that now use the hook but do not pass resolvedTransport (for example compaction flows), which means transport-gated patches can be skipped on those turns and request shaping diverges from normal attempt routing under explicit global/project transport settings.
Useful? React with 👍 / 👎.
Summary
Stacked follow-up to #70743. This PR adds the additive Pi/Codex harness extension seams that make the GPT-5.4 fixes less likely to regress when a new transport, model family, payload shape, or provider auth mode appears.
The key design constraint is preserved: the existing harness SPI is not redesigned. Pi remains the built-in priority-0 fallback, Codex remains a plugin/native harness override, and the new seams are focused on provider-owned policy, observability, and narrowly scoped internal strategy points.
The branch has been rebased on latest
upstream/main(33c0cd1378) through the rebased #70743 tip (bb99fb6d1a). The current #70772 tip is0abfc8ddc4.Stack Shape And Review Scope
Reviewers should read #70772 as the architecture/seam layer on top of #70743. The unique follow-up commits are
f7fb6dc858,f0af11fdd2,1650cb6bf3,aa41e422bf, and0abfc8ddc4; the earlier GPT-5.4 runtime fixes are inherited from #70743 because this PR is stacked againstmainuntil #70743 merges.Runtime Routing Map
Seam Matrix
AgentHarness.classify?okclassifications are annotated onto the attempt result and surfaced through run metadata so model fallback can consume them.extraParamsForTransportpatchafter model/transport resolution.agentDir/workspaceDirand can driveparallel_tool_callspayload injection.resolvePromptOverlayfollowupFallbackRouteorigin,dispatcher, ordrop.resolveAuthProfileIdMessageMergeStrategyllm_output.resolvedRefllm_outputevent still emits.openai-codex/gpt-5.4vsgpt-5.4backend ambiguity easier to debug without renaming every symbol.OPENCLAW_OPENAI_WS_POOL=1can retain clean sessions until idle TTL.Fallback Classification Sequence
sequenceDiagram participant H as Selected Harness participant S as runAgentHarnessAttemptWithFallback participant R as runEmbeddedPiAgent participant C as Shared result classifier participant MF as runWithModelFallback H-->>S: attempt result S->>H: classify?(result, ctx) alt classification is ok/undefined S-->>R: result + harness id else classification is empty/reasoning-only/planning-only S-->>R: result + harness id + classification R-->>MF: run result meta includes classification and toolSummary MF->>C: classify run result C-->>MF: FailoverError(format) when no side effects endTransport Param And Schema Flow
The helper is intentionally behavior-named. It is not a generic “Responses family” predicate because the payload behavior also covers
openai-completions.Message Repair And Follow-Up Routing
The merge strategy seam is intentionally internal right now. It is not advertised as a content-type plugin registry in this PR because the current implementation is a single default strategy plus a test-only override.
WS Pool Lifecycle
flowchart TD Start["OpenAI WS attempt"] --> Key["session id + request/url/headers + auth signature"] Key --> Existing{"matching live session?"} Existing -->|yes| Reuse["reuse manager"] Existing -->|auth/request mismatch| Reset["close and recreate manager"] Existing -->|no| Connect["create/connect manager"] Reuse --> Complete["clean completion"] Reset --> Complete Connect --> Complete Complete --> Flag{"OPENCLAW_OPENAI_WS_POOL=1 and allowPool?"} Flag -->|no| Close["release closes session"] Flag -->|yes| Idle["retain until idle TTL"] Idle -->|next matching turn| Reuse Idle -->|TTL expires| CloseThe pool remains disabled by default. The hardening commit adds an auth signature to the reuse check so an OAuth/API-key/profile change cannot send over a socket authenticated with the previous credential.
Compatibility And Explicit Non-Goals
classify?method is added and now consumed.pi-embedded-runnerpaths continue working.resolvedRefattempt.tssplit remains a later pure-move phase.Related Work And Issue Map
This PR is the architectural seam layer after #70743. It is deliberately tied to nearby GPT-5.4/Codex work without claiming unrelated fixes.
codex-clihelper/embedded resolution work. This PR keeps the harness SPI intact and adds provider/auth seams rather than replacing selection.llm_output.resolvedRefobservability.extraParamsForTransporthook can support future provider-owned reasoning/param patches.supportsGptParallelToolCallsPayload.resolveAuthProfileId.attempt.tsmonolith concerns. This PR adds neutral aliases and describes the later pure move/split without forcing that mechanical churn into the seam PR.llm_output.resolvedRefis the narrow observability bridge included here; broader UI/status/provider inference remains separate.Latest Validation
Post-rebase verification on the final branch:
upstream/main(33c0cd1378) through the [codex] Harden GPT-5.4 runtime paths #70743 maintainer-note fix commitbb99fb6d1a.node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.tspassed2files /71tests after the final current-main restack.node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/model-fallback.test.ts src/agents/harness/selection.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/provider-api-families.test.ts src/agents/pi-embedded-runner/run/message-merge-strategy.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/auth-profiles/order.test.ts src/agents/auth-profiles.resolve-auth-profile-order.uses-stored-profiles-no-config-exists.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/provider-auth-aliases.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.tspassed9files /328tests after the final current-main restack.git diff --checkand thesrc/agents/embedded-runner.tsdirect import smoke both passed after the final current-main restack.node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.tspassed2files /71tests after the final restack on [codex] Harden GPT-5.4 runtime paths #70743.git diff --checkpassed after the final restack.node --import tsx -e "const m = await import('./src/agents/embedded-runner.ts'); if (typeof m.runEmbeddedAgent !== 'function') throw new Error('missing runEmbeddedAgent'); console.log('embedded-runner import ok')"passed after the final restack.pnpm plugin-sdk:api:checkpassed.git diff --checkpassed.node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/model-fallback.test.ts src/agents/harness/selection.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/provider-api-families.test.ts src/agents/pi-embedded-runner/run/message-merge-strategy.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/auth-profiles/order.test.ts src/agents/auth-profiles.resolve-auth-profile-order.uses-stored-profiles-no-config-exists.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/provider-auth-aliases.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.tspassed9files /328tests.node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openai-ws-stream.test.tspassed1file /109tests.node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/followup-runner.test.tspassed1file /25tests.node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.tspassed1file /27tests.node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts src/agents/model-fallback.run-embedded.e2e.test.tspassed1file /17tests.node --import tsx -e "const m = await import('./src/agents/embedded-runner.ts'); if (typeof m.runEmbeddedAgent !== 'function') throw new Error('missing runEmbeddedAgent'); console.log('embedded-runner import ok')"passed.Known local non-blocker:
pnpm tsgo:core:testcurrently fails before this PR's shim boundary on existing compat/dependency errors (supportsLongCacheRetentiontype shape,@vincentkoc/qrcode-tui, and related generated model compat typing). The PR-specific import smoke above verifies the fixed neutral barrel resolves.Earlier seam-specific verification also included:
node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openai-ws-stream.test.tspassed106tests.node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/pi-embedded-runner/run/attempt.test.ts --reporter=dotpassed119tests.node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.ts src/agents/provider-auth-aliases.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.tspassed16tests.vitest.unit-fast.config.tstest-project shard after 382s of no output. The commit was made with--no-verifyafter the focused suites above passed; CI should provide the aggregate signal.Bot And Adversarial Review Follow-Up
The current stack addresses the #70772 bot/adversarial review findings:
961567766apreserves user-lockedopenai-codexauth profiles acrosscodex-cliembedded-runner alias checks, with regression coverage in the auth-profile rotation e2e test.bf8be4c910suppresses fallback retries after generic tool execution, so empty GPT-5 terminal states do not replay side-effectful tool turns on another model.a6ef146586completes that guard by propagatingtoolSummarythrough all relevant terminal result branches.a6ef146586flips GPT-5 OpenAI WS warm-up default tofalse, matching the original Phase 0 stability plan while leaving explicit opt-in intact.10b74a4459addresses fresh bot review by keeping strippedNO_REPLYterminal turns out of fallback and preserving explicit empty auth-order overrides, including exact alias keys such ascodex-cli: [].f73022e4f4addresses fresh follow-up routing review by emitting a visible partial-delivery notice when any cross-channel payload fails, even if another payload in the same completion routes successfully.b6dd417712and37b0d9f549address fresh runtime-config auth-scope review by passing the execution config into fallback persistence and requiring auth-scope callers to pass an explicit execution config.35f7c348e9updates the rebased CLI attempt-execution test mock for upstream's provider auth alias-map export.bb99fb6d1aresponds to the maintainer GPT-5.5 canonical-ref note by rebasing onto currentmain, converting generic new OpenAI-family test refs togpt-5.5, and documenting remaininggpt-5.4/codex-clirefs as intentional regression or legacy-compat coverage.f0af11fdd2removes public mutable message-merge strategy registration and keeps override registration test-only.1650cb6bf3documents theremoveLeaf: falseorphan-merge contract so preserved leaves are treated as an explicit consecutive-user-turn risk, not an implicit provider-safe default.f0af11fdd2fixes misleading orphan-repair log wording for preserved leaves.f0af11fdd2renames the misleading Responses-family helper to behavior-basedsupportsGptParallelToolCallsPayload.f0af11fdd2wiresAgentHarness.classify?into fallback-visible metadata.f0af11fdd2makes transport hookparallel_tool_callspatches effective in request payload wrapping.f0af11fdd2forwardsagentDirandworkspaceDirinto extra-param provider hook contexts.f0af11fdd2prevents pooled/reused OpenAI WS sessions from crossing auth/API-key boundaries.aa41e422bfupdates pinned-profile auth-rotation e2e coverage to assert the current visible-error behavior while preserving the no-rotation guarantee.0abfc8ddc4fixes the neutralsrc/agents/embedded-runner.tsbarrel to re-export from the existingpi-embedded-runner.jscompatibility module.Original Plan Coverage
This PR intentionally covers the additive-seam portion of the Pi/Codex Harness plan, not the later pure move work.
llm_output.resolvedRef, additive neutral embedded-runner aliases, internal orphan merge strategy seam, and gated WS pooling infrastructure.src/agents/embedded-runner/directory move, fullattempt.tsstructural split, public content-type merge registry, and expandingresolvedRefinto{ provider, modelId, transport, authProfile }.