fix(lexer): backticks opaque when content is invalid #71
Merged
Bash treats the body of `` `...` `` as a single word token at the initial lexing stage — errors inside the backtick are runtime, not parse, concerns. Rable's fork-and-merge parser rejects bodies with reserved words in non-reserved positions, unbalanced conditionals, or unterminated ANSI-C quotes, producing `<error>` output or silently corrupting the outer AST where bash yields a plain word. Add `scan_backtick_opaque` as a fallback in `read_backtick_inner`: on `Err` from `parse_backtick_body`, re-scan the body as raw bytes, treating `\<x>` as a two-byte escape so `` \` `` does not terminate. The `Ok` path is unchanged, so valid backticks still produce the same tokens and spans as today. Unlocks all 6 `backtick_opaque N` cases in `bash_valid_divergences`. Closes #38
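The fallback described above can be sketched roughly as follows. This is a minimal illustration, not Rable's actual code: the signatures, the byte-index return value, and the `backtick_end` wrapper are assumptions made for the example, and `parse_body` is a hypothetical stand-in for `parse_backtick_body`, whose real signature is not shown in this PR.

```rust
/// Hypothetical sketch of an opaque backtick scan (not Rable's real code).
/// `start` is the byte index just past the opening backtick.
/// Returns the index of the closing backtick, or `None` if unterminated.
fn scan_backtick_opaque(src: &[u8], start: usize) -> Option<usize> {
    let mut i = start;
    while i < src.len() {
        match src[i] {
            // `\<x>` is consumed as a two-byte escape, so "\`" cannot terminate.
            b'\\' => i += 2,
            b'`' => return Some(i),
            _ => i += 1,
        }
    }
    None
}

/// Hypothetical fallback wiring: try the structured parse first and only
/// re-scan the body as raw bytes when that parse fails.
/// `parse_body` is an assumed stand-in for `parse_backtick_body`.
fn backtick_end(
    src: &[u8],
    start: usize,
    parse_body: impl Fn(&[u8], usize) -> Result<usize, String>,
) -> Option<usize> {
    match parse_body(src, start) {
        Ok(end) => Some(end),                       // Ok path is left untouched
        Err(_) => scan_backtick_opaque(src, start), // opaque fallback
    }
}
```

In this sketch a truly unterminated backtick still yields `None`, leaving the caller free to report an error, which matches the intent that only the body (not the delimiter matching) becomes opaque.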
mpecan pushed a commit that referenced this pull request on Apr 19, 2026
🤖 I have created a release *beep* *boop*

---

## [0.2.0](rable-v0.1.15...rable-v0.2.0) (2026-04-18)

### ⚠ BREAKING CHANGES

* tighten lexer API surface and relocate WordSpan to ast ([#70](#70))

### Bug Fixes

* **format:** align cmdsub reformatter with bash canonical form ([#49](#49)) ([c7a4411](c7a4411))
* **lexer:** accept sloppy heredoc terminator in cmdsub mode ([#50](#50)) ([40f394f](40f394f))
* **lexer:** backticks opaque when content is invalid ([#71](#71)) ([e72166f](e72166f)), closes [#38](#38)
* **lexer:** disable reserved-word recognition after assignment words ([#44](#44)) ([42e1fc0](42e1fc0))
* **lexer:** stop treating ]] and unbalanced [...] as special outside conditionals ([#45](#45)) ([4bf5a5c](4bf5a5c))
* **parser:** fall back from (( … )) arith to nested subshells ([#48](#48)) ([1437f00](1437f00))

### Code Refactoring

* **format:** introduce Formatter struct ([#65](#65)) ([d965a8f](d965a8f))
* **lexer:** drop Result<Token> wrapper from operator readers ([#62](#62)) ([d52a841](d52a841))
* **lexer:** split read_word_token into classify + advance + dispatch helpers ([#63](#63)) ([3ba09f5](3ba09f5))
* **parser:** extract fill_heredoc_contents visitor helpers ([#68](#68)) ([40e6165](40e6165))
* **parser:** extract helpers from three oversize parsers ([#69](#69)) ([25d0762](25d0762))
* **sexp:** dispatch NodeKind Display to per-category helpers ([#66](#66)) ([44b0330](44b0330))
* **sexp:** table-drive ANSI-C escape dispatch ([#67](#67)) ([91a5267](91a5267))
* tighten lexer API surface and relocate WordSpan to ast ([#70](#70)) ([5171d01](5171d01))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: repository-butler[bot] <166800726+repository-butler[bot]@users.noreply.github.com>
Summary

Bash treats the body of `` `...` `` as a single word token at the initial lexing stage; errors inside the backtick body are runtime errors, not parse errors. Rable's fork-and-merge parser was rejecting bodies with reserved words in non-reserved positions, unbalanced conditionals, or unterminated ANSI-C quotes, producing `<error>` output or (worse) silently corrupting the outer AST where bash would produce a plain word.

This adds a `scan_backtick_opaque` fallback in `Lexer::read_backtick_inner`: when `parse_backtick_body` errors, re-scan the body as raw bytes, treating `\<x>` as a two-byte escape so `` \` `` does not terminate. The `Ok` path is unchanged: valid backticks still produce identical tokens and spans.

Behavior change

- `` echo "`if true then echo a; fi` until " ``: unterminated double quote
- `` x=`fo<>r i in a b; do echo $i; if done` ``: unterminated backtick
- `` e else cho ` else echo "hello"` ``

Test plan

- `cargo fmt`
- `cargo clippy --all-targets -- -D warnings`: clean
- `cargo test`: 258 passed (was 256; +2 new edge-case tests)
- All 6 `backtick_opaque N` cases in `tests/oracle/bash_valid_divergences.tests` removed from `KNOWN_ORACLE_FAILURES` and now pass
- Unterminated backticks (`` `else echo ``) still surface `MatchedPair` errors
- Valid backticks (`` `echo hi` ``) still go through the fork path; `span_backtick` test unchanged

Closes #38