🐛 fix: OpenAI Codex reasoning and thinking - openai/gpt-5.1-codex currently breaks with 400 (#61982)
Conversation
Greptile Summary
Confidence Score: 5/5. Safe to merge: the fix is additive, matches established patterns, and is well-tested. The only remaining finding is P2 (a duplicated mapper function, already acknowledged in the PR description). The wrapper chain ordering is correct: as the outermost wrapper, `createOpenAIThinkingLevelWrapper`'s `patchPayload` runs after inner patches, giving it the final say on `reasoning.effort`. Tests are comprehensive. No files require special attention.
Path: src/agents/pi-embedded-runner/openai-stream-wrappers.ts
Line: 220-230
Comment:
**Duplicated mapper function**
The new `mapThinkingLevelToReasoningEffort` (lines 220–230) is byte-for-byte identical to `mapThinkingLevelToOpenRouterReasoningEffort` in `proxy-stream-wrappers.ts` (same `"off" → "none"`, `"adaptive" → "medium"`, pass-through mapping). The PR already acknowledges this and proposes a follow-up cleanup. Worth flagging so it doesn't get forgotten — extracting a shared util to `openai-stream-wrappers.ts` (or a new `reasoning-effort-utils.ts`) and re-exporting from `proxy-stream-wrappers.ts` would eliminate the drift risk.
How can I resolve this? If you propose a fix, please make it concise.
Last reviewed commit: "Merge branch 'main' into hotfix/openai-c..."
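A minimal sketch of the extraction the comment suggests. The file name and type aliases are assumptions; only the mapping itself (`"off" → "none"`, `"adaptive" → "medium"`, pass-through otherwise) comes from the review:

```typescript
// Hypothetical shared util (e.g. reasoning-effort-utils.ts), re-exported from
// proxy-stream-wrappers.ts so both wrappers import one copy instead of keeping
// byte-for-byte duplicates. Type names are assumed, not taken from the repo.
export type ThinkLevel = "off" | "minimal" | "low" | "medium" | "high" | "adaptive";
export type ReasoningEffort = "none" | "minimal" | "low" | "medium" | "high";

export function mapThinkingLevelToReasoningEffort(level: ThinkLevel): ReasoningEffort {
  if (level === "off") return "none"; // "off" → "none"
  if (level === "adaptive") return "medium"; // "adaptive" → "medium"
  return level; // all remaining levels pass through unchanged
}
```

With a single source of truth, a drift in one wrapper becomes a type error or test failure instead of a silent divergence.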
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d248d450fc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Reviewed commit: c203b479c6
Force-pushed c203b47 → 2f8ea2e
💡 Codex Review
Reviewed commit: 2f8ea2ecf8
Force-pushed 5182e9c → 6d9aac3
@codex review
💡 Codex Review
Reviewed commit: 5f622af5ed
@codex review
💡 Codex Review
Reviewed commit: c188699452
@codex review
💡 Codex Review
Reviewed commit: c188699452
@codex, review
Force-pushed fddcc01 → 88a7044
@codex review
Codex Review: Didn't find any major issues. Swish!
Please add
@G7CNF Done 💪🏻
Force-pushed 2e35fa2 → 6632171
💡 Codex Review
Reviewed commit: 6632171fca
(existingReasoning as Record<string, unknown>).effort =
  mapThinkingLevelToReasoningEffort(thinkingLevel);
Reapply OpenAI effort compat before overriding reasoning
This assignment writes mapThinkingLevelToReasoningEffort(thinkingLevel) straight into payload.reasoning.effort, which bypasses the existing OpenAI normalization/compat path. In practice, /think minimal now emits "minimal" instead of the normalized "low", and models with compatibility remaps (for example low→medium for Codex mini) can be overwritten back to unsupported values, causing request 400s that the transport layer currently avoids. Please run the same OpenAI effort normalization/model-compat mapping before setting payload.reasoning.effort in this wrapper.
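The fix this comment asks for can be sketched as below. `normalizeOpenAIReasoningEffort` is a hypothetical stand-in for the repo's existing normalization/compat path; the two concrete rules (`minimal` normalizes to `low`, and `low` remaps to `medium` on Codex mini) are taken from the comment, while the model-name check is an assumption:

```typescript
type Effort = "none" | "minimal" | "low" | "medium" | "high";

// Sketch: apply generic normalization, then the model-specific compat remap,
// before ever writing into payload.reasoning.effort.
function normalizeOpenAIReasoningEffort(model: string, effort: Effort): Effort {
  // /think minimal should emit the normalized "low" (per the review comment).
  const normalized: Effort = effort === "minimal" ? "low" : effort;
  // Compat remap example: Codex mini does not accept "low" (assumed model match).
  if (model.includes("codex-mini") && normalized === "low") return "medium";
  return normalized;
}
```

The wrapper would then assign `normalizeOpenAIReasoningEffort(model, mapThinkingLevelToReasoningEffort(thinkingLevel))` instead of the raw mapped value, so the transport-layer remaps are never bypassed.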
Force-pushed 6632171 → ceddfe2
💡 Codex Review
Reviewed commit: ceddfe2901
(existingReasoning as Record<string, unknown>).effort =
  mapThinkingLevelToReasoningEffort(thinkingLevel);
Preserve model effort remap before setting reasoning.effort
createOpenAIThinkingLevelWrapper writes the raw /think mapping directly into payload.reasoning.effort, which overrides the model-specific compatibility remap already applied in buildOpenAIResponsesParams (e.g. minimal|low -> medium for gpt-5.1-codex-mini / gpt-5.4-mini). On those models, /think low can now send "low" again and trigger OpenAI 400s that the transport remap path was avoiding. Fresh evidence in this commit: the new openai-stream-wrappers.test.ts case "passes through all thinking levels correctly" asserts passthrough of unsupported levels for Codex models instead of compatibility-normalized values.
Force-pushed ceddfe2 → 28b91ba
💡 Codex Review
Reviewed commit: 28b91ba7eb
if (thinkingLevel === "off") {
  if (existingReasoning !== undefined) {
    delete payloadObj.reasoning;
  }
Clear reasoning include when disabling thinking payload
In /think off mode this branch deletes payload.reasoning but does not clear include, so OpenAI Responses requests can retain include: ["reasoning.encrypted_content"] from buildOpenAIResponsesParams (src/agents/openai-transport-stream.ts) while reasoning is absent. That produces an inconsistent payload shape and can trigger 400s on strict endpoints or silently keep reasoning-specific response plumbing active even though thinking was explicitly disabled.
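One way to address this, as a sketch: when disabling thinking, clear both `reasoning` and the reasoning-specific `include` entry in one place. The payload shape is assumed from the comment; only `"reasoning.encrypted_content"` and the `/think off` semantics come from the source:

```typescript
// Sketch: keep the /think off payload internally consistent by removing the
// reasoning object AND any reasoning-specific include entries together.
function clearReasoningFromPayload(payload: Record<string, unknown>): void {
  delete payload.reasoning;
  if (Array.isArray(payload.include)) {
    const filtered = payload.include.filter(
      (item) => item !== "reasoning.encrypted_content",
    );
    if (filtered.length === 0) {
      delete payload.include; // avoid sending an empty include array
    } else {
      payload.include = filtered;
    }
  }
}
```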
const distRuntimeAddedPaths = distRuntimeAddedPathSummary.topologyAddedPaths.length;
const distRuntimeIgnoredDependencyAddedPaths =
  distRuntimeAddedPathSummary.ignoredDependencyAddedPaths.length;
const distRuntimeFileGrowth = distRuntimeAddedPaths;
Measure dist-runtime file growth from file counts
This now sets distRuntimeFileGrowth from added topology path count, but diff.added is built from listTreeEntries (same file), which includes directories and root entries, not just files. The distRuntimeFileGrowth gate can therefore overcount and fail on directory-only changes or rename churn where net file growth is unchanged. Use snapshot file deltas (or filter added entries to files) before comparing against distRuntimeFileGrowthMax.
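The "filter added entries to files" option the comment suggests could look like this. The `TreeEntry` shape is an assumption; the review only states that `listTreeEntries` output includes directories and root entries:

```typescript
// Assumed entry shape for listTreeEntries output (not taken from the repo).
interface TreeEntry {
  path: string;
  kind: "file" | "directory";
}

// Sketch: count only files, so directory-only changes and rename churn
// cannot inflate distRuntimeFileGrowth past distRuntimeFileGrowthMax.
function countAddedFiles(added: readonly TreeEntry[]): number {
  return added.filter((entry) => entry.kind === "file").length;
}
```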
💡 Codex Review
Reviewed commit: aaee702fc8
distRuntimeAddedPaths === 0
  ? 0
  : post.distRuntime.apparentBytes - pre.distRuntime.apparentBytes;
Keep byte-growth checks for rewritten dist-runtime files
This change zeroes distRuntimeByteGrowth whenever no new topology paths are added, so in-place rewrites of existing dist-runtime files (same path set, larger files) will bypass distRuntimeByteGrowthMax entirely. In that scenario the watch-regression gate can silently pass substantial dist-runtime churn that previously would have been caught by the snapshot byte delta.
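A sketch of what keeping the byte check unconditional might look like; the snapshot field names mirror the diff above, and the clamp-to-zero choice is an assumption (growth gates usually ignore shrinkage):

```typescript
// Sketch: always derive byte growth from the snapshot delta, independent of
// whether any new topology paths were added, so in-place rewrites of existing
// dist-runtime files still count against distRuntimeByteGrowthMax.
function computeDistRuntimeByteGrowth(preBytes: number, postBytes: number): number {
  return Math.max(0, postBytes - preBytes); // treat shrinkage as zero growth
}
```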
💡 Codex Review
Reviewed commit: 0de8d8743f
}

export function createBundledRuntimeDependencyInstallArgs(missingSpecs) {
  return ["install", "--ignore-scripts", ...missingSpecs];
Keep dev deps out of eager bundled-runtime installs
createBundledRuntimeDependencyInstallArgs now emits only npm install --ignore-scripts <specs>, removing the previous dev-omit behavior. In runBundledPluginPostinstall, that install runs with cwd: packageRoot, so when OPENCLAW_EAGER_BUNDLED_PLUGIN_DEPS=1 is enabled on packaged installs, npm can reify the root manifest and pull devDependencies in addition to the missing runtime specs, which can massively increase install size/time and fail in restricted or offline environments; this path should keep an explicit dev omit (--omit=dev or equivalent env).
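A sketch of the restored dev-omit behavior the comment asks for, using npm's `--omit=dev` flag; the typed signature is an assumption layered on the untyped function shown in the diff:

```typescript
// Sketch: keep devDependencies out of the eager bundled-runtime install path.
// npm's --omit=dev prevents reifying the root manifest's devDependencies when
// this runs with cwd set to the package root.
export function createBundledRuntimeDependencyInstallArgs(
  missingSpecs: readonly string[],
): string[] {
  return ["install", "--ignore-scripts", "--omit=dev", ...missingSpecs];
}
```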
Force-pushed 0de8d87 → 782bc55
Force-pushed 782bc55 → 40462cb
💡 Codex Review
Reviewed commit: 40462cb049
}

export function createBundledRuntimeDepsInstallArgs(missingSpecs: readonly string[]): string[] {
  return ["install", "--ignore-scripts", ...missingSpecs];
Keep dev deps excluded in bundled runtime dep installs
createBundledRuntimeDepsInstallArgs now runs npm install --ignore-scripts <specs> without any dev-omit flag, and installBundledRuntimeDeps executes that in each plugin root. On npm defaults (omit=[]), this can reify plugin devDependencies alongside the missing runtime specs, which significantly increases startup install size/time and can fail in constrained/offline environments when ensureBundledPluginRuntimeDeps runs during plugin load. Please restore an explicit dev exclusion for this install path.
Reviewed/reworked before landing. Key changes:
Validation:
Thanks @a-tokyo!
Landed via rebase on Source head: Thanks again @a-tokyo.
💡 Codex Review
Reviewed commit: c7f3958a2b
  .toSorted((left, right) => left.localeCompare(right));
const missingSpecs = deps
  .filter((dep) => !fs.existsSync(path.join(installRoot, dependencySentinelPath(dep.name))))
  .filter((dep) => !hasDependencySentinel(dependencySearchRoots, dep))
Check runtime dep versions before skipping plugin install
The new missingSpecs check now treats any parent node_modules/<dep>/package.json as “installed” via hasDependencySentinel, but it never validates that the resolved version satisfies the plugin’s declared dep.version. In the case where dist/node_modules or package-root node_modules contains the same package name at an incompatible version, ensureBundledPluginRuntimeDeps will skip installation and the plugin will run against that incompatible dependency. Please compare resolved versions (or semver-satisfy against dep.version) before considering parent-root dependencies as satisfied.
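A sketch of the version check the comment proposes. A real implementation would use `semver.satisfies` from the `semver` package; to stay self-contained, this illustrative helper handles only exact pins and caret ranges by major version, which is an admitted simplification (caret semantics for `0.x` versions differ):

```typescript
// Sketch: before treating a parent-root node_modules/<dep> as satisfying the
// plugin's declared dependency, compare its resolved version to the declared
// range. This minimal checker is an illustration, not a semver implementation.
function versionSatisfies(resolved: string, range: string): boolean {
  if (range.startsWith("^")) {
    // Caret range: same major version (simplified; real semver handles 0.x specially).
    const wantMajor = range.slice(1).split(".")[0];
    return resolved.split(".")[0] === wantMajor;
  }
  return resolved === range; // exact pin
}
```

The sentinel check would then require both that the `package.json` exists and that `versionSatisfies(resolvedVersion, dep.version)` holds before skipping installation.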
@steipete Awesome, Thank you! ✨
Closes #62045
Closes #64069
Summary
- The `openai-responses-defaults` stream family never maps `ctx.thinkingLevel` to `reasoning.effort` in the API payload. For models like `gpt-5.1-codex` and `gpt-5.3-codex` that only accept specific effort values (e.g. `"medium"`), the missing mapping causes `buildOpenAIResponsesParams` to fall back to `reasoning: { effort: "none" }`, which the API rejects with HTTP 400.
- The `/think` command has no effect on OpenAI providers despite working correctly for OpenRouter and Kilocode.
- Added `createOpenAIThinkingLevelWrapper` (mirrors the existing `createOpenRouterWrapper` pattern) and wired it as the innermost wrapper in the `openai-responses-defaults` stream family chain; its `onPayload` fires first, overriding the SDK's fallback before outer wrappers run. Gated by `shouldApplyOpenAIReasoningCompatibility(model)` (the same capability check as the compat wrapper) so proxy routes and non-OpenAI endpoints are excluded. The wrapper only mutates an existing `reasoning` object and never creates one from scratch, so non-reasoning models (which have no `reasoning` in the payload) are left untouched. Extracted the shared `mapThinkingLevelToReasoningEffort` to `reasoning-effort-utils.ts` to eliminate duplication with the OpenRouter/Kilocode wrapper. Added unit tests.
- The `buildOpenAIResponsesParams` fallback (`reasoning: { effort: "none" }`) is untouched; the wrapper overrides it via `onPayload`. No PI SDK changes. No config schema changes.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause (if applicable)
- The `openai-responses-defaults` stream family in `provider-stream-family.ts` (and the duplicate in `provider-stream.ts`) has `ctx.thinkingLevel` available on the context but never passes it to any wrapper. `createOpenAIReasoningCompatibilityWrapper` only applies payload policy (store mode), not reasoning effort injection. By contrast, the `openrouter-thinking` and `kilocode-thinking` families correctly extract `ctx.thinkingLevel` and pass it to their wrappers, which call `normalizeProxyReasoningPayload`.
- No code path in the `openai-responses-defaults` family propagates `thinkingLevel` to the API payload's `reasoning.effort` field.
- The Codex provider (`openai-codex`) uses `buildProviderStreamFamilyHooks("openai-responses-defaults")`, the same family as the regular `openai` provider. When `gpt-5.1-codex` was added with restricted reasoning effort support (only `"medium"`), the missing mapping became a hard failure instead of a silent default.

Regression Test Plan (if applicable)
- `src/agents/pi-embedded-runner/openai-stream-wrappers.test.ts`: when `createOpenAIThinkingLevelWrapper` is given a `thinkingLevel` and a reasoning-capable model (payload already has `reasoning`), the resulting `StreamFn` overrides `reasoning.effort` via `onPayload`. All `ThinkLevel` values map correctly (`"off"` → `"none"`, `"adaptive"` → `"medium"`, others pass through). Existing `reasoning.effort` values from upstream wrappers are overridden. Other reasoning properties (e.g. `summary`) are preserved. Non-reasoning models (no `reasoning` in the payload) are left untouched. Proxy routes and the completions API are skipped via the capability gate.
- `pi-embedded-runner-extraparams.test.ts` tests the OpenAI wrapper chain but did not assert `reasoning.effort` injection from `thinkingLevel`. Updated to wire the new wrapper into the test helper.
- New tests added (`openai-stream-wrappers.test.ts`).

User-visible / Behavior Changes
- `/think <level>` now correctly propagates to OpenAI and OpenAI Codex models. Previously it had no effect: the API payload always used the `buildOpenAIResponsesParams` fallback.
- Models like `gpt-5.1-codex` that only accept specific reasoning effort values will no longer fail with HTTP 400 when the resolved thinking level maps to a supported value.

Diagram (if applicable)
Security Impact (required)
Repro + Verification
Environment
- Provider `openai-codex`, model `gpt-5.1-codex` (`agent.model: "openai/gpt-5.1-codex"`)

Steps
- Set `agent.model` to `openai/gpt-5.1-codex` (or any Codex model with restricted reasoning effort)

Expected
- The request payload contains `reasoning: { effort: "medium" }` (or the user's `/think` level)

Actual (before fix)
- The request payload contains `reasoning: { effort: "none" }` (the hardcoded fallback), and the API rejects it with HTTP 400: `'none' is not supported with the 'gpt-5.1-codex' model. Supported values are: 'medium'`

Evidence
Gateway log before fix:
Human Verification (required)
- Verified wrapper behavior for: `thinkingLevel` undefined (wrapper is a no-op, returns the underlying stream directly), `thinkingLevel` `"off"` (maps to `"none"`), `thinkingLevel` `"adaptive"` (maps to `"medium"`), an existing `reasoning` object with other properties preserved, no `reasoning` object in the payload (wrapper is a no-op; non-reasoning models are safe), proxy routes with a custom `baseUrl` (skipped via the capability gate), and the completions API (skipped).
- Ran `pnpm build` and `pnpm test` (the build is very slow on this machine). Did not verify the WebSocket transport path end-to-end (only traced the code). Did not verify with a live `gpt-5.1-codex` API call post-fix.

Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Risks and Mitigations
- The wrapper unconditionally overrides `reasoning.effort` on any existing `reasoning` object.
- As the innermost wrapper, its `onPayload` fires first. Outer wrappers (`createOpenAIReasoningCompatibilityWrapper`, `createOpenAIResponsesContextManagementWrapper`) run after it and can still strip or adjust based on endpoint policy. The same `shouldApplyOpenAIReasoningCompatibility` capability gate excludes proxy routes and non-OpenAI endpoints. If a model doesn't support the requested effort level, the API returns a clear error (same behavior as OpenRouter).
- Non-reasoning models have no `reasoning` payload. The wrapper only mutates an existing `reasoning` object and never creates one. The SDK only adds `reasoning` for models with `model.reasoning === true`, so non-reasoning models are inherently excluded.
- `mapThinkingLevelToReasoningEffort` is shared across the OpenAI and OpenRouter/Kilocode wrappers via `reasoning-effort-utils.ts`; both `openai-stream-wrappers.ts` and `proxy-stream-wrappers.ts` import from the shared util.
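The mutation rules described in this PR can be sketched as follows. The function name and payload shapes are assumptions for illustration; the behavior (no-op when `thinkingLevel` is undefined, never creating a `reasoning` object, and the `"off"` → `"none"` / `"adaptive"` → `"medium"` mapping) comes from the description above:

```typescript
// Sketch of the wrapper's payload patch, not the actual source.
function patchReasoningEffort(
  payload: Record<string, unknown>,
  thinkingLevel: string | undefined,
): void {
  if (thinkingLevel === undefined) return; // wrapper is a no-op without a level
  const reasoning = payload.reasoning;
  // Only mutate an existing reasoning object; never create one, so
  // non-reasoning models (no reasoning in the payload) are left untouched.
  if (reasoning === undefined || reasoning === null) return;
  (reasoning as Record<string, unknown>).effort =
    thinkingLevel === "off"
      ? "none"
      : thinkingLevel === "adaptive"
        ? "medium"
        : thinkingLevel;
}
```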