fix(shell): preserve lenient tokenization in chain segments #234
Merged
Conversation
Phase 1 (#233) used shell-quote for chain segmentation, but its strict POSIX tokenization split `--flag=1&2` into three tokens with `&` as a background operator — a regression vs. pre-Phase-1 behavior, where the existing lenient `tokenizeCommand` left embedded `&`/`|`/`;` chars as literal bytes inside larger tokens.

Replace it with a small whitespace-bounded chain splitter that matches the existing `detectShellOperator` convention: chain ops only count when they begin a whitespace-separated token. Each segment then runs through the existing `tokenizeCommand` (lenient) and `detectShellOperator` (rejects `>`, `<`, `&`, `2>&1`, `&>`).

Recovers:

- `cargo run -- --flag=1&2` — runs (was rejected post-#233)
- `git status ; cargo run -- --flag=1&2` — chain runs, second segment keeps `&`

Drops:

- `git status|grep main` (no spaces) — never worked pre-Phase-1; a shell-quote-only side effect. Aligns with the project's whitespace-bounded operator convention.

Also drops the `$VAR` / `$(...)` rejection — these now pass through as literal arguments, matching the existing single-command tokenizer's behavior. Models see the literal `$HOME` in echo's output and learn.
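The "models see the literal `$HOME`" point can be demonstrated directly with Node's no-shell spawn path, which is what this tool uses. This is a minimal illustration, not project code:

```typescript
import { spawnSync } from "node:child_process";

// Without a shell, "$HOME" is just bytes in an argv entry: echo prints it
// back literally instead of expanding it.
const direct = spawnSync("echo", ["$HOME"], { encoding: "utf8" });
console.log(direct.stdout.trim()); // prints the literal string $HOME

// For contrast, routing the same command through a shell does expand it.
const viaShell = spawnSync("echo $HOME", { shell: true, encoding: "utf8" });
console.log(viaShell.stdout.trim()); // e.g. /home/user
```

The shell-based variant is shown only for contrast; `run_command` never takes that path.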
esengine added a commit that referenced this pull request on May 5, 2026:
* feat(shell): support file redirects in run_command (`>`, `>>`, `<`, `2>`, `2>>`, `2>&1`, `&>`)

  Phase 2 of the chain-aware run_command rewrite. Each segment now parses trailing redirects via a single-pass walker (no shell), opens files with fs.openSync, and hands the resulting fds to spawn's stdio array. Pipe wiring still applies — `<` overrides pipe-stdin only when explicitly present; `>`/`>>`/`&>` override the pipe-out / capture-buffer wiring.

  Supported per segment (max one per fd):

  - `>` stdout truncate
  - `>>` stdout append
  - `<` stdin from file
  - `2>` stderr truncate
  - `2>>` stderr append
  - `2>&1` merge stderr → stdout (file fd, pipe, or our buffer, whatever stdout points to)
  - `&>` both → file (truncate)

  Targets resolve relative to the chain's cwd. Multiple redirects to the same fd are rejected with a clear conflict message.

  ChainSegment now carries `redirects: Redirect[]` alongside `argv`. The old `tokenizeCommand + detectShellOperator` per-segment combo is replaced with `parseSegment`, a quote-aware single-pass walker that emits both argv tokens and redirects at the same level — keeping the lenient embedded-`&`/`|` handling intact (the `cmd1 ; cmd2 -flag=1&2` regression fix from #234 still passes).

  Heredoc `<<` is rejected explicitly. Background `&` is rejected explicitly (use run_background, not run_command). `$VAR` and `$(…)` pass through as literal characters — no shell expansion, matching the existing single-command tokenizer.

  Adds 30 new tests in tests/shell-redirects.test.ts covering parser output and execution; updates the obsolete redirect-rejection tests in tests/shell-tools.test.ts to cover the new rejection set (background `&`, heredoc `<<`, missing target, multi-redirect-per-fd).

* fix(shell): hold child.stdin open until both prev streams end on mid-pipe `2>&1`

  `cmd1 2>&1 | cmd2` was wiring prev.stdout into child.stdin with default end-on-source-end, then piping prev.stderr in with `end: false`. When stdout finished first, child.stdin closed and any subsequent stderr writes hit a closed stream — output got truncated. Both branches now pipe with `end: false`, and a 2-counter closes stdin only after both source streams emit `end`.

  Adds a test that pipes stderr through to a downstream cat-like process and asserts both streams reach the destination.
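For illustration, the trailing-redirect walk might look like the sketch below. It operates on already-tokenized argv for brevity (the commit's `parseSegment` is described as a quote-aware walker over the raw string), and every name here is hypothetical:

```typescript
// Hypothetical sketch: peel recognized redirect operators plus their file
// targets out of a token stream, rejecting more than one redirect per fd.
// The real parseSegment is quote-aware and works on the raw string; this
// simplified version assumes pre-split tokens.
type Redirect =
  | { op: ">" | ">>" | "<" | "2>" | "2>>" | "&>"; target: string }
  | { op: "2>&1" };

const FILE_OPS = new Set([">", ">>", "<", "2>", "2>>", "&>"]);

function parseRedirects(tokens: string[]): { argv: string[]; redirects: Redirect[] } {
  const argv: string[] = [];
  const redirects: Redirect[] = [];
  const seenFd = new Set<number>();

  // &> claims both stdout and stderr; < claims stdin; 2> variants claim stderr.
  const fdsOf = (op: string): number[] =>
    op === "&>" ? [1, 2] : op === "<" ? [0] : op.startsWith("2") ? [2] : [1];

  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    if (t === "<<") throw new Error("heredoc << is not supported");
    if (t === "&") throw new Error("background & is not supported; use run_background");
    if (t === "2>&1" || FILE_OPS.has(t)) {
      for (const fd of fdsOf(t)) {
        if (seenFd.has(fd)) throw new Error(`conflicting redirects for fd ${fd}`);
        seenFd.add(fd);
      }
      if (t === "2>&1") {
        redirects.push({ op: "2>&1" });
      } else {
        const target = tokens[++i];
        if (target === undefined) throw new Error(`missing target after ${t}`);
        redirects.push({ op: t, target } as Redirect);
      }
    } else {
      argv.push(t); // ordinary argv token stays in the command line
    }
  }
  return { argv, redirects };
}
```

Note that an embedded operator such as `--flag=1&2` never matches a whole token here, so it survives as a literal argv entry, which is the lenient behavior #234 restored.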
esengine added a commit that referenced this pull request on May 5, 2026:
`run_command` learns the four common shell chain operators (`|`, `||`, `&&`, `;`) and the seven file redirect operators (`>`, `>>`, `<`, `2>`, `2>>`, `2>&1`, `&>`). Parsed and spawned natively — no shell is invoked, so semantics are identical on Windows / macOS / Linux; PowerShell 5.1's `&&` parse error and the object-vs-bytes pipe gap are sidestepped.

Each chain segment is allowlist-checked independently: `git status | grep main` now auto-runs when both halves are individually allowed — the granular-approval behaviour wviana asked for in discussion #231.

The chain parser stays consistent with the project's long-standing lenient tokenizer — `cargo run -- --flag=1&2` and similar embedded-operator args stay literal instead of getting POSIX-strict-rejected. shell-quote is no longer a dependency; `splitOnChainOps` is whitespace-bounded like the existing `detectShellOperator`.

Mid-pipe `2>&1` correctly merges stderr into the next segment's stdin without truncating on stdout-end (counter-based close-when-both-ended). Targets resolve relative to the project root. Heredoc `<<` and background `&` remain explicitly rejected with clear errors. `$VAR` and `$(...)` pass through as literal characters — no shell expansion.

Closes #232. Driven by discussion #231. PRs: #233 (chain ops), #234 (lenient-tokenizer regression fix), #235 (redirects + mid-pipe `2>&1`).
Follow-up to #233. Fixes the regression flagged in the PR description.
Problem
#233 used shell-quote for chain segmentation. shell-quote follows POSIX strictly, so it tokenizes `cargo run -- --flag=1&2` as three tokens with `&` flagged as a background operator and rejects the call. Pre-#233, `tokenizeCommand` (lenient by design) treated `--flag=1&2` as one literal argv.

That leniency is a deliberate, long-standing choice in this codebase — see the existing `detectShellOperator` test that documents it explicitly. shell-quote pulls in the opposite direction; mixing the two means fighting one with the other.

Fix
Drop shell-quote entirely. Replace the chain splitter with `splitOnChainOps` (~50 lines): a whitespace-bounded walker that mirrors `detectShellOperator`'s convention — a chain operator only counts when it begins a whitespace-separated token. Each segment then runs through the existing `tokenizeCommand` (lenient) and `detectShellOperator` (rejects `>`, `<`, `&`, `2>&1`, `&>`).

This way the chain layer is consistent with the rest of src/tools/shell.ts: same primitives, same lenient philosophy, same UX for model output that isn't perfectly POSIX.

What recovers

- `cargo run -- --flag=1&2` — runs again (was rejected post-#233)
- `git status ; cargo run -- --flag=1&2` — the chain runs, and the second segment keeps its embedded `&`

What changes

- `git status|grep main` (no spaces around `|`) is no longer treated as a chain. It was a shell-quote-only side effect — it never worked pre-#233 (feat(shell): support `|`, `&&`, `||`, `;` chain operators in run_command). This aligns the chain detector with `detectShellOperator`.
- `$VAR` and `$(…)` are no longer rejected. They pass through as literal argv (since we don't invoke a shell) — same as the existing single-command tokenizer's behavior. Models see `$HOME` in echo's output and learn.

Why not "keep shell-quote + post-process"
The post-process layer would need to walk the original string, find each operator's position, check whether it's at a whitespace boundary, and re-glue tokens that aren't. That's ~30 lines of fight-the-library code on top of the library, with more edge cases than just doing the splitting ourselves. The hand-rolled `splitOnChainOps` is the same complexity tier as the existing `tokenizeCommand` (also hand-rolled) and consistent with the project's lenient philosophy.

Tests
New:

- does NOT split on chain chars embedded inside larger tokens
- allows embedded chain chars inside chain segments
- rejects redirects/background discovered inside a chain segment
- rejects empty middle segments
- rejects unclosed quotes inside a chain segment

Removed: the `$VAR` and `$(…)` rejection tests.
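The whitespace-bounded convention those tests pin down can be sketched as follows. This is an illustrative reimplementation under the behavior described above, with a naive whitespace split standing in for the real quote-aware walker:

```typescript
// Hypothetical sketch of splitOnChainOps: a chain operator only counts
// when it is a whole whitespace-separated token. Embedded & / | / ;
// inside a larger token stay literal. The real walker is quote-aware
// (so "a b" stays one token) and also rejects unclosed quotes.
const CHAIN_OPS = new Set(["|", "||", "&&", ";"]);

function splitOnChainOps(input: string): { segments: string[]; ops: string[] } {
  const segments: string[] = [];
  const ops: string[] = [];
  let current: string[] = [];
  for (const token of input.split(/\s+/).filter(Boolean)) {
    if (CHAIN_OPS.has(token)) {
      if (current.length === 0) throw new Error(`empty segment before ${token}`);
      segments.push(current.join(" "));
      ops.push(token);
      current = [];
    } else {
      current.push(token); // embedded chain chars stay inside the token
    }
  }
  if (current.length === 0 && ops.length > 0) throw new Error("empty trailing segment");
  segments.push(current.join(" "));
  return { segments, ops };
}
```

Under this convention `cargo run -- --flag=1&2` is one segment, while `git status ; cargo run -- --flag=1&2` splits into exactly two.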