[codex] Preserve raw code-mode exec output by default#23564

Merged
aibrahim-oai merged 11 commits into main from aibrahim/code-mode-exec-raw-output on May 20, 2026
Conversation

@aibrahim-oai
Collaborator

@aibrahim-oai aibrahim-oai commented May 19, 2026

Why

Code mode can use nested unified exec calls as data sources. When those calls omit max_output_tokens, code mode should receive raw command output so the script can parse or summarize it itself. When code mode does provide max_output_tokens, that explicit nested budget should be respected, including values above the default unified exec limit, rather than being capped before code mode sees the result.

What

  • Preserve direct unified exec truncation behavior, while letting code-mode exec/write_stdin keep max_output_tokens as None unless explicitly supplied.
  • Make code-mode tool results use raw output when no explicit limit is present, and use the explicit nested limit directly when one is specified.
  • Refactor unified exec output formatting so truncated_output takes the caller-selected token budget.
  • Add e2e integration coverage for explicit nested exec limits, omitted nested exec limits, outer exec limit propagation, omitted-limit outputs that exceed both the default and a small truncation policy, explicit nested limits above those caps, and high explicit limits that still compact larger command output.
  • Reuse the code-mode turn setup helper while directly asserting the exact exec output item in each test.
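
The budget rule described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `code_mode_output`, `truncate_to_budget`, and the 4-bytes-per-token estimate are hypothetical names and assumptions introduced here, and the real `truncated_output` may format its result differently.

```rust
/// Assumption for illustration only: roughly 4 bytes per token.
const APPROX_BYTES_PER_TOKEN: usize = 4;

/// Keep the head and tail of `text` within `max_tokens`, marking the gap.
/// Hypothetical helper; stands in for the caller-budgeted truncation path.
fn truncate_to_budget(text: &str, max_tokens: usize) -> String {
    let max_bytes = max_tokens.saturating_mul(APPROX_BYTES_PER_TOKEN);
    if text.len() <= max_bytes {
        return text.to_string();
    }
    let half = max_bytes / 2;
    // Move both cut points onto UTF-8 character boundaries.
    let mut head_end = half;
    while !text.is_char_boundary(head_end) {
        head_end -= 1;
    }
    let mut tail_start = text.len() - half;
    while !text.is_char_boundary(tail_start) {
        tail_start += 1;
    }
    format!("{}…truncated…{}", &text[..head_end], &text[tail_start..])
}

/// No explicit limit: return the raw output.
/// Explicit limit: apply exactly that budget, with no default cap in between.
fn code_mode_output(raw_output: &[u8], max_output_tokens: Option<usize>) -> String {
    let text = String::from_utf8_lossy(raw_output);
    match max_output_tokens {
        Some(max_tokens) => truncate_to_budget(&text, max_tokens),
        None => text.into_owned(),
    }
}
```

In this sketch, a nested call with `None` sees every byte the command produced, while `Some(n)` is honored even when `n` exceeds whatever default a direct unified exec call would use.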

Testing

  • just fmt
  • git diff --check
  • Not run locally per repo guidance; CI should validate the e2e integration tests.

@aibrahim-oai aibrahim-oai marked this pull request as ready for review May 19, 2026 22:28
@aibrahim-oai aibrahim-oai requested a review from a team as a code owner May 19, 2026 22:28
@aibrahim-oai
Collaborator Author

@codex review this

@chatgpt-codex-connector
Contributor

Codex Review: Didn't find any major issues. Keep them coming!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread codex-rs/core/tests/suite/code_mode.rs Outdated
"#
);

assert_code_mode_exec_output_snapshot(
Collaborator

just assert equals :D

let req = second_mock.single_request();
let items = custom_tool_output_items(&req, "call-1");
assert_eq!(items.len(), 2);
assert_regex_match(
Collaborator

can we keep this style of assertion? for all new tests?

As in capture exactly what the model will see

Comment thread codex-rs/core/src/tools/context.rs Outdated
original_token_count: self.original_token_count,
output: self.truncated_output(),
output: match self.max_output_tokens {
Some(_) => self.truncated_output(),
Collaborator

Should we:

  1. have an upper bound even for exec?
  2. allow exec to request a higher limit than the model default?

Contributor

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 48c903e48a

Comment on lines +387 to +389
output: match self.max_output_tokens {
Some(max_tokens) => self.truncated_output(max_tokens),
None => String::from_utf8_lossy(&self.raw_output).to_string(),
Contributor

P2: Preserve truncation notice for oversized raw exec output

When a code-mode exec omits max_output_tokens, this now returns raw_output verbatim. However, the bytes still come from the unified-exec HeadTailBuffer, which caps retained output at 1 MiB and drops the middle when a fast or short-lived command emits more than that before the collector drains it. In that scenario, code mode receives a head+tail concatenation with no …truncated… marker, so scripts can parse or summarize corrupted data as if it were complete. The previous formatted path at least surfaced that the result was truncated.
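
To make the concern concrete, here is a small sketch of a head+tail buffer that silently drops the middle of its input. It is a simplified, hypothetical stand-in and not the real HeadTailBuffer: the `HeadTail` type, its tiny capacity, the `omitted` counter, and the notice format are all assumptions made for illustration.

```rust
use std::collections::VecDeque;

/// Simplified, hypothetical head+tail buffer (not the unified-exec type).
struct HeadTail {
    head: Vec<u8>,
    tail: VecDeque<u8>,
    head_cap: usize,
    tail_cap: usize,
    /// Number of middle bytes that were dropped.
    omitted: usize,
}

impl HeadTail {
    /// `cap` is split evenly between head and tail; it must be at least 2.
    fn new(cap: usize) -> Self {
        assert!(cap >= 2, "capacity must leave room for head and tail");
        let head_cap = cap / 2;
        Self {
            head: Vec::new(),
            tail: VecDeque::new(),
            head_cap,
            tail_cap: cap - head_cap,
            omitted: 0,
        }
    }

    /// Fill the head first, then keep only the most recent bytes in the tail.
    fn push(&mut self, chunk: &[u8]) {
        for &byte in chunk {
            if self.head.len() < self.head_cap {
                self.head.push(byte);
            } else {
                if self.tail.len() == self.tail_cap {
                    self.tail.pop_front();
                    self.omitted += 1;
                }
                self.tail.push_back(byte);
            }
        }
    }

    /// Head and tail concatenated with no hint that anything is missing.
    fn raw(&self) -> Vec<u8> {
        let mut out = self.head.clone();
        out.extend(self.tail.iter().copied());
        out
    }

    /// Same bytes, but with a marker wherever the middle was dropped.
    fn with_notice(&self) -> String {
        if self.omitted == 0 {
            return String::from_utf8_lossy(&self.raw()).into_owned();
        }
        let tail: Vec<u8> = self.tail.iter().copied().collect();
        format!(
            "{}…{} bytes truncated…{}",
            String::from_utf8_lossy(&self.head),
            self.omitted,
            String::from_utf8_lossy(&tail)
        )
    }
}
```

With a capacity of 8, pushing `0123456789ABCDEF` leaves `raw()` as `0123CDEF`, which looks like complete output to a parsing script, whereas `with_notice()` makes the gap visible.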

let (_test, second_mock) = run_code_mode_turn_with_config(
&server,
"use exec_command from code mode",
r#"// @exec: {"max_output_tokens": 20000}
Collaborator

Can we add a test where the exec_command limit is high but compaction still works?

@aibrahim-oai aibrahim-oai enabled auto-merge (squash) May 20, 2026 03:53
@aibrahim-oai aibrahim-oai merged commit 5a4202a into main May 20, 2026
31 checks passed
@aibrahim-oai aibrahim-oai deleted the aibrahim/code-mode-exec-raw-output branch May 20, 2026 04:02
@github-actions github-actions Bot locked and limited conversation to collaborators May 20, 2026
2 participants