fix(lexer): backticks opaque when content is invalid #71

Merged
mpecan merged 1 commit into main from feat/38-backtick-opaque
Apr 18, 2026

@mpecan mpecan commented Apr 18, 2026

Summary

Bash treats the body of `...` as a single word token at the initial lexing stage: errors inside the backtick body are runtime errors, not parse errors. Rable's fork-and-merge parser was rejecting bodies with reserved words in non-reserved positions, unbalanced conditionals, or unterminated ANSI-C quotes, producing `<error>` output or (worse) silently corrupting the outer AST where bash would produce a plain word.

This adds a `scan_backtick_opaque` fallback in `Lexer::read_backtick_inner`: when `parse_backtick_body` errors, re-scan the body as raw bytes, treating `\<x>` as a two-byte escape so `` \` `` does not terminate. The `Ok` path is unchanged, so valid backticks still produce identical tokens and spans.
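The raw re-scan described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the function name, signature, and return type are assumptions, since the PR text does not show the real `scan_backtick_opaque`.

```rust
/// Hypothetical stand-in for the PR's `scan_backtick_opaque`; the real
/// signature and return type are not shown in the PR description.
///
/// `body_start` is the index just past the opening backtick. Returns the
/// index of the closing backtick, or `None` if the input ends first.
fn scan_backtick_opaque_sketch(src: &[u8], body_start: usize) -> Option<usize> {
    let mut i = body_start;
    while i < src.len() {
        match src[i] {
            // `\<x>` is consumed as a two-byte escape, so `\`` cannot
            // terminate the body.
            b'\\' => i += 2,
            // First unescaped backtick closes the body.
            b'`' => return Some(i),
            // Every other byte is skipped without interpretation.
            _ => i += 1,
        }
    }
    // Ran off the end: the body is unterminated.
    None
}
```

Because the scan only distinguishes backslash, backtick, and "anything else", reserved words or unbalanced quotes inside the body cannot affect where the word ends.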

Behavior change

| Input | Before | After (bash) |
| --- | --- | --- |
| `` echo "`if true then echo a; fi` until " `` | unterminated double quote | word preserved verbatim |
| `` x=`fo<>r i in a b; do echo $i; if done` `` | unterminated backtick | word preserved verbatim |
| `` e else cho ` else echo "hello"` `` | backtick silently dropped, inner tokens leaked to outer command | 4-word command matching bash |
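To make the "4-word command" outcome concrete, here is a toy word splitter that keeps backtick spans opaque. It is only a sketch under strong assumptions: the function name is hypothetical, it splits on spaces and tabs only, and it ignores quotes and all other shell syntax that the real lexer handles.

```rust
/// Toy illustration (not Rable's lexer): split a line into words on ASCII
/// spaces and tabs, treating each backtick span as opaque. Quotes, operators,
/// and every other piece of shell syntax are deliberately NOT handled.
fn split_words_opaque_backticks(line: &str) -> Vec<String> {
    let bytes = line.as_bytes();
    let mut words = Vec::new();
    let mut start: Option<usize> = None; // start index of the current word
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b' ' | b'\t' => {
                // Whitespace outside backticks ends the current word.
                if let Some(s) = start.take() {
                    words.push(line[s..i].to_string());
                }
                i += 1;
            }
            b'`' => {
                start = start.or(Some(i));
                i += 1;
                // Skip the body without interpreting it; `\<x>` is two bytes.
                while i < bytes.len() && bytes[i] != b'`' {
                    i += if bytes[i] == b'\\' { 2 } else { 1 };
                }
                i += 1; // step past the closing backtick, if there is one
            }
            _ => {
                start = start.or(Some(i));
                i += 1;
            }
        }
    }
    // Flush the last word. Unlike the PR, this toy does not report an
    // unterminated backtick as an error.
    if let Some(s) = start {
        words.push(line[s..].to_string());
    }
    words
}
```

Applied to the third input from the table, this sketch yields the words `e`, `else`, `cho`, and the whole backtick span, which is the four-word shape the table describes.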

Test plan

  • cargo fmt
  • cargo clippy --all-targets -- -D warnings — clean
  • cargo test — 258 passed (was 256; +2 new edge-case tests)
  • All 6 backtick_opaque N cases in tests/oracle/bash_valid_divergences.tests removed from KNOWN_ORACLE_FAILURES and now pass
  • Unterminated bodies (e.g. `else echo) still surface MatchedPair errors
  • Valid backticks (`echo hi`) still go through the fork path — span_backtick test unchanged

Closes #38

Bash treats the body of `` `...` `` as a single word token at the
initial lexing stage — errors inside the backtick are runtime, not
parse, concerns. Rable's fork-and-merge parser rejects bodies with
reserved words in non-reserved positions, unbalanced conditionals,
or unterminated ANSI-C quotes, producing `<error>` output or silently
corrupting the outer AST where bash yields a plain word.

Add `scan_backtick_opaque` as a fallback in `read_backtick_inner`:
on `Err` from `parse_backtick_body`, re-scan the body as raw bytes,
treating `\<x>` as a two-byte escape so `` \` `` does not terminate.
The `Ok` path is unchanged, so valid backticks still produce the
same tokens and spans as today.
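
The `Ok`/`Err` split in the paragraph above might be wired up along these lines. This is a sketch under assumptions: the `Path` enum, the function names, the closure standing in for `parse_backtick_body`, and the `String` error are all hypothetical, and the real code reports unterminated bodies as `MatchedPair` errors rather than a string.

```rust
/// Which path produced the backtick word. Hypothetical: the PR does not
/// show a type like this.
#[derive(Debug, PartialEq)]
enum Path {
    Fork,
    Opaque,
}

/// Raw scan: index of the closing backtick at or after `from`, skipping
/// `\<x>` pairs so an escaped backtick does not terminate the body.
fn find_closing_backtick(src: &str, from: usize) -> Option<usize> {
    let bytes = src.as_bytes();
    let mut i = from;
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => i += 2,
            b'`' => return Some(i),
            _ => i += 1,
        }
    }
    None
}

/// Try the structured parser first; only on `Err` fall back to the raw scan.
/// `parse` stands in for `parse_backtick_body` and returns the body length.
fn read_backtick_body<'a, P>(
    src: &'a str,
    body_start: usize,
    parse: P,
) -> Result<(Path, &'a str), String>
where
    P: Fn(&'a str) -> Result<usize, String>,
{
    // Ok path: untouched, so valid bodies keep their existing result.
    if let Ok(len) = parse(&src[body_start..]) {
        return Ok((Path::Fork, &src[body_start..body_start + len]));
    }
    // Err path: keep the body verbatim up to the first unescaped backtick.
    match find_closing_backtick(src, body_start) {
        Some(close) => Ok((Path::Opaque, &src[body_start..close])),
        None => Err("unterminated backtick".to_string()),
    }
}
```

Keeping the fallback strictly behind the `Err` branch is what lets valid backticks retain their existing tokens and spans, while an unterminated body still surfaces as an error.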

Unlocks all 6 `backtick_opaque N` cases in `bash_valid_divergences`.

Closes #38
@mpecan mpecan merged commit e72166f into main Apr 18, 2026
5 checks passed
@mpecan mpecan deleted the feat/38-backtick-opaque branch April 18, 2026 09:59
mpecan pushed a commit that referenced this pull request Apr 19, 2026
🤖 I have created a release *beep* *boop*
---


## [0.2.0](rable-v0.1.15...rable-v0.2.0) (2026-04-18)


### ⚠ BREAKING CHANGES

* tighten lexer API surface and relocate WordSpan to ast
([#70](#70))

### Bug Fixes

* **format:** align cmdsub reformatter with bash canonical form
([#49](#49))
([c7a4411](c7a4411))
* **lexer:** accept sloppy heredoc terminator in cmdsub mode
([#50](#50))
([40f394f](40f394f))
* **lexer:** backticks opaque when content is invalid
([#71](#71))
([e72166f](e72166f)),
closes [#38](#38)
* **lexer:** disable reserved-word recognition after assignment words
([#44](#44))
([42e1fc0](42e1fc0))
* **lexer:** stop treating ]] and unbalanced [...] as special outside
conditionals ([#45](#45))
([4bf5a5c](4bf5a5c))
* **parser:** fall back from (( … )) arith to nested subshells
([#48](#48))
([1437f00](1437f00))


### Code Refactoring

* **format:** introduce Formatter struct
([#65](#65))
([d965a8f](d965a8f))
* **lexer:** drop `Result<Token>` wrapper from operator readers
([#62](#62))
([d52a841](d52a841))
* **lexer:** split read_word_token into classify + advance + dispatch
helpers ([#63](#63))
([3ba09f5](3ba09f5))
* **parser:** extract fill_heredoc_contents visitor helpers
([#68](#68))
([40e6165](40e6165))
* **parser:** extract helpers from three oversize parsers
([#69](#69))
([25d0762](25d0762))
* **sexp:** dispatch NodeKind Display to per-category helpers
([#66](#66))
([44b0330](44b0330))
* **sexp:** table-drive ANSI-C escape dispatch
([#67](#67))
([91a5267](91a5267))
* tighten lexer API surface and relocate WordSpan to ast
([#70](#70))
([5171d01](5171d01))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: repository-butler[bot] <166800726+repository-butler[bot]@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

lexer: backticks should be opaque when content is not valid bash
