refactor(lexer): parse backticks via fork-and-merge #30

@mpecan

Problem

Backtick command substitution (`` `...` ``) is still parsed by `read_backtick_inner` in `src/lexer/expansions.rs`, which walks characters and only knows about `\` escapes. It has the same latent bug class that `$(...)` had before #29: anything inside the backticks that looks like a terminator but isn't (e.g. a heredoc body containing a backtick) can close the scan at the wrong position.

Fix

Apply the same fork-and-merge pattern that #29 uses for `$(...)`:

  1. In `Lexer::read_backtick`, fork a fresh parser over the shared source buffer.
  2. Have the inner parser consume the body until the matching closing backtick.
  3. Merge `pos`/`line` back.
  4. Copy the consumed source range into `wb.value`.

The fork infrastructure is already in place from #29:

  • `Lexer::fork(&self) -> Self` — cheap refcounted clone of the source buffer
  • `parser::parse_cmdsub_body(outer, outer_depth)` — model for the new `parse_backtick_body`
  • `parser_depth` field — inner parser inherits outer depth for `MAX_DEPTH`
  • `in_cmdsub` flag on forked lexers — reuse or add a sibling `in_backtick` flag if the heredoc delimiter rewind logic differs
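
The four steps above can be sketched as follows. This is a minimal, self-contained model, not the project's code: the `Lexer` struct, its fields, and `skip_backtick_body` (a stand-in for the real inner parser, which would be `parse_backtick_body`) are all hypothetical, and the actual `fork` signature and flag handling may differ.

```rust
use std::rc::Rc;

/// Hypothetical, stripped-down lexer: only the fields needed to show fork-and-merge.
#[derive(Clone)]
struct Lexer {
    src: Rc<str>,      // shared source buffer; cloning only bumps a refcount
    pos: usize,        // byte offset into `src`
    line: usize,       // 1-based line number
    in_backtick: bool, // assumed sibling of the `in_cmdsub` flag
}

impl Lexer {
    fn new(src: &str) -> Self {
        Lexer { src: Rc::from(src), pos: 0, line: 1, in_backtick: false }
    }

    /// Fork a lexer at the current position over the same buffer.
    fn fork(&self) -> Self {
        let mut inner = self.clone();
        inner.in_backtick = true;
        inner
    }

    /// Stand-in for the inner parser: advance to the closing unescaped backtick.
    /// Returns false if the input ends first.
    fn skip_backtick_body(&mut self) -> bool {
        let bytes = self.src.as_bytes();
        while self.pos < bytes.len() {
            match bytes[self.pos] {
                b'`' => return true,
                b'\\' if self.pos + 1 < bytes.len() => {
                    // Skip the backslash and the byte it escapes.
                    if bytes[self.pos + 1] == b'\n' {
                        self.line += 1;
                    }
                    self.pos += 2;
                }
                b'\n' => {
                    self.line += 1;
                    self.pos += 1;
                }
                _ => self.pos += 1,
            }
        }
        false
    }

    /// Called with `pos` on the opening backtick; returns the raw body text.
    fn read_backtick(&mut self) -> Option<String> {
        let mut inner = self.fork(); // step 1: fork over the shared buffer
        inner.pos += 1; // skip the opening backtick
        let start = inner.pos;
        if !inner.skip_backtick_body() {
            return None; // step 2 failed: unterminated, outer state untouched
        }
        let body = self.src[start..inner.pos].to_string(); // step 4: copy range
        self.pos = inner.pos + 1; // step 3: merge position past the closer
        self.line = inner.line;
        Some(body)
    }
}
```

Because the outer lexer's `pos` and `line` are only written after the inner scan succeeds, an unterminated backtick leaves the outer state unchanged, which keeps error reporting at the opening position.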

Complications specific to backticks

Backtick parsing is trickier than `$(...)` because of the escape rules:

  • `` \` `` inside a backtick is an escaped literal backtick
  • `\$` inside a backtick escapes a `$`
  • Nested backticks are escaped with backslashes: `` \` `` at the first level, `` \\\` `` at the second (every level of nesting adds a layer of backslashes)
  • `` "`...`" `` (backtick inside double quotes) has its own escape handling

The fork's inner lexer will need a mode that accounts for the backslash-escape stripping. Options:

  1. Pre-process the source range into a scratch buffer with escapes stripped, then parse that. Breaks the shared-buffer assumption. Not recommended.
  2. Add a mode flag to the forked `Lexer` that makes it treat `\$`, `` \` ``, and `\\` as single literal chars and the closing unescaped `` ` `` as a terminator. Symmetric to the `in_cmdsub` flag added in #29.
  3. Have the fork tokenize only as far as needed to find the closing backtick, then feed the backslash-stripped content to a second `crate::parse` call. Two-pass but simpler, and structurally similar to the pre-#29 world.

Option 2 is preferred for consistency with the `$(...)` path.
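
A rough sketch of what the option 2 mode could look like at the character level. The names `BtChar`, `next_in_backtick`, and `scan_backtick` are hypothetical and not part of the existing lexer; the real implementation would tokenize the logical characters rather than collect them into a `String`, which is done here only to make the folding visible.

```rust
/// What the forked lexer sees at one position while in backtick mode.
#[derive(Debug, PartialEq)]
enum BtChar {
    Literal(char), // one logical character, escape already folded
    Close,         // unescaped closing backtick
}

/// Hypothetical helper: read one logical char at byte offset `pos`.
/// Returns the logical char and the number of source bytes it spans.
fn next_in_backtick(src: &str, pos: usize) -> Option<(BtChar, usize)> {
    let mut chars = src[pos..].chars();
    let c = chars.next()?;
    match c {
        '`' => Some((BtChar::Close, 1)),
        '\\' => match chars.next() {
            // Only these three escapes are folded inside backticks.
            Some(e @ ('$' | '`' | '\\')) => Some((BtChar::Literal(e), 2)),
            // Any other backslash stays a literal backslash.
            _ => Some((BtChar::Literal('\\'), 1)),
        },
        _ => Some((BtChar::Literal(c), c.len_utf8())),
    }
}

/// Walk the body starting just after the opening backtick. Returns the
/// logical (escape-folded) text and the byte offset of the closing backtick,
/// or None if the input ends before an unescaped backtick is found.
fn scan_backtick(src: &str, start: usize) -> Option<(String, usize)> {
    let mut pos = start;
    let mut logical = String::new();
    while let Some((ch, n)) = next_in_backtick(src, pos) {
        match ch {
            BtChar::Close => return Some((logical, pos)),
            BtChar::Literal(c) => logical.push(c),
        }
        pos += n;
    }
    None
}
```

Since the helper reports how many source bytes each logical character spans, the outer lexer can still copy the raw source range into `wb.value` after the merge, so the shared-buffer assumption holds.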

Tests to add

  • Backtick containing a heredoc whose body ends with a backtick char
  • Nested backticks with appropriate backslash escaping
  • Backtick inside `"..."` (already covered by existing tests — confirm no regressions)

Scope

Independent of #29 and the process-sub follow-up. Keep the PR focused on backticks only.
