fix(command-parser)!: drop the char-level tokenizer in favor of an AST-only path #330
…REDOC + command substitution parsing
intent(command-parser): make `git commit -m "$(cat <<'EOF' ... EOF)"` evaluate
correctly — it was rejected with `unclosed quote` because the char-level
fallback tokenizer scanned the literal HEREDOC body as live shell, hit a
stray quote in the prose, and bailed
decision(command-parser): replace the char-level `tokenize` with an AST-only
`tokenize_command` and resolve quoting per tree-sitter node (`string`,
`raw_string`, `concatenation`, `command_name` recurses into its named
child) so a HEREDOC body never re-enters the shell scanner; fall back to
`shlex::split` only when tree-sitter cannot parse the input at all
rejected(command-parser): patching the existing fallback tokenizer to
understand HEREDOCs — it would re-implement bash's quoting rules a third
time on top of the AST walk it already does, and the two paths drifting
is what caused this bug in the first place
decision(command-parser): skip recursing into HEREDOC bodies whose delimiter
is quoted (`<<'EOF'`, `<<"EOF"`, `<<\EOF`); bash treats those bodies as
literal so a `$(cmd)` inside them is prose, not a real command — extracting
it caused false ask/deny on commit messages
constraint(command-parser): tree-sitter-bash refuses some real-world inputs
like `gh api graphql -f query=mutation{...}` whose flag value contains
unbalanced shell metacharacters; AstTokenizeOutcome distinguishes "parse
error" (use shlex) from "not a single command" (caller should have split
it first) so pipelines never silently flatten through the fallback
learned(command-parser): tree-sitter-bash exposes `string_content` separately
from interpolation children, so escape resolution can run on
`string_content` only and `$(...)` / `$X` stay as source text — wrapper
patterns like `bash -c <cmd>` still see the substitution intact
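The `learned` note above can be illustrated with a minimal, std-only Rust sketch. It assumes a double-quoted string has already been split into literal `string_content` pieces and interpolation pieces. The `Segment` enum and both function names are hypothetical and are not runok's API. Escape resolution follows bash's double-quote rule (a backslash is special only before `$`, a backtick, `"`, `\`, or a newline) and runs on the content pieces only, so substitutions pass through as source text.

```rust
/// Hypothetical stand-in for the children of a tree-sitter-bash `string` node.
enum Segment<'a> {
    /// Literal text (`string_content`): escapes are resolved here.
    Content(&'a str),
    /// `$(...)` or `$X`: kept verbatim as source text.
    Interpolation(&'a str),
}

/// Resolve backslash escapes using bash's double-quote rules.
fn unescape_double_quoted(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // Backslash quotes these four characters: drop it, keep the char.
            Some(next) if matches!(next, '$' | '`' | '"' | '\\') => {
                out.push(next);
                chars.next();
            }
            // Backslash-newline is a line continuation: drop both.
            Some('\n') => {
                chars.next();
            }
            // Anywhere else the backslash is an ordinary character.
            _ => out.push('\\'),
        }
    }
    out
}

/// Join the segments of one double-quoted string into a single token.
fn resolve_string(segments: &[Segment<'_>]) -> String {
    segments
        .iter()
        .map(|segment| match segment {
            Segment::Content(text) => unescape_double_quoted(text),
            Segment::Interpolation(source) => source.to_string(),
        })
        .collect()
}
```

With this shape, a wrapper pattern such as `bash -c <cmd>` would still receive `$(date)` untouched inside the resolved token, while `\"` in the surrounding text collapses to `"`.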
…ated section
intent(release-notes): make the release notes scannable for the common reader (CLI / runok.yml authors) by surfacing the breaking change that affects them first and pushing the library-API-only ones to the bottom
intent(release-notes): drop the `</content></invoke>` artefact left by a bad copy-paste so the page renders without trailing literal tags
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #330      +/-   ##
==========================================
- Coverage   89.12%   89.04%   -0.08%
==========================================
  Files          53       53
  Lines       11771    11871     +100
==========================================
+ Hits        10491    10571      +80
- Misses       1280     1300      +20
Code Review
This pull request migrates the command parser to a tree-sitter-bash AST-based implementation, improving tokenization accuracy and shell-quoting resolution. It introduces breaking changes by removing the CommandParseError::UnclosedQuote variant and changing how bare assignments and trailing backslashes are handled, while also fixing bugs related to quoted command names and HEREDOC processing. Review feedback recommends enhancing the HEREDOC delimiter check to detect quotes at any position within the string and simplifying the node retrieval logic for command names.
…t of the delimiter is quoted
intent(command-parser): match bash semantics, where `<<EO'F'`, `<<E\OF`, and similar partially-quoted delimiters disable body expansion the same way `<<'EOF'` does, so runok must not extract `$(cmd)` from inside them
learned(command-parser): tree-sitter-bash sets has_error on `<<EO'F'` but parses `<<E\OF` cleanly (heredoc_start = `E\OF`); previously the quote check only looked at the first byte, so the `\OF` case slipped through and runok extracted the body's `$(cmd)` as a real command
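A minimal sketch of the difference between the two checks, assuming the delimiter is available as the raw source text of the `heredoc_start` node. Both function names are hypothetical and only illustrate the rule that a quote or backslash anywhere in the delimiter word makes bash treat the body as literal.

```rust
/// First-byte-only check, modelling the earlier behaviour described above.
fn first_byte_is_quoted(delimiter: &str) -> bool {
    matches!(
        delimiter.as_bytes().first().copied(),
        Some(b'\'' | b'"' | b'\\')
    )
}

/// A quote or backslash at any position in the delimiter word disables
/// expansion of the HEREDOC body.
fn delimiter_is_quoted(delimiter: &str) -> bool {
    delimiter
        .bytes()
        .any(|b| matches!(b, b'\'' | b'"' | b'\\'))
}
```

For the delimiter `E\OF`, `first_byte_is_quoted` returns `false` while `delimiter_is_quoted` returns `true`, which is the case the first-byte check let through.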
…ed_child
intent(command-parser): use the tree-sitter helper that already exists for "first named child" instead of hand-rolling a `child_count` loop; same behaviour, easier to read
…itter-bash
# Conflicts:
# docs/src/content/docs/releases/next.md
intent(releases): the regression originated in #330 (AST-only tokenizer) which is itself unreleased, so users will never see the broken behaviour. A "fixed" entry in next.md would describe a state that does not exist from their perspective.
intent(releases): avoid documenting a fix for a regression whose source PR has not shipped yet — #330 itself is still in next.md, so listing the regression separately would describe a bug that no released version ever exhibited
Purpose
… `/commit` skill produces) were rejected with `command parse error: unclosed quote`
Reproduction steps
Piping the following command into `runok check --input-format claude-code-hook` is rejected with `command parse error: unclosed quote`, even though the existing `git [-C *] commit -m *` rule should allow it.
Approach
… `<<'EOF'`, `<<"EOF"`, `<<\EOF`) as literal: skip the `$(cmd)` extraction inside their bodies, matching bash semantics
Design decisions
Fallback for inputs tree-sitter cannot parse
… `shlex::split` only when tree-sitter returns `has_error`
… `has_error` from "not a single command" so pipelines and `&&` lists are not silently flattened by the fallback
… `SyntaxError`
… `gh api graphql -f query=mutation{...}`
Breaking changes
… `$(cmd)` from inside `<<'EOF'` / `<<"EOF"` / `<<\EOF` HEREDOC bodies. Bash treats these bodies as literal, so any rule that relied on matching a substitution inside one needs to be rewritten to target the command outside the HEREDOC.
`CommandParseError::UnclosedQuote` is removed and folded into `SyntaxError`.
… `FOO=bar` alone) and a trailing backslash (`echo \`) now return `SyntaxError`.
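The fallback rule under "Fallback for inputs tree-sitter cannot parse" can be sketched roughly as follows. This is an illustration under stated assumptions, not runok's code: the variant names of `AstTokenizeOutcome`, the `TokenizeError` type, and `finish_tokenize` are all hypothetical. The `fallback` closure stands in for `shlex::split`, which returns `Option<Vec<String>>`. Mapping a failed fallback to `SyntaxError` is likewise an assumption.

```rust
/// Hypothetical shape of `AstTokenizeOutcome`; variant names are assumptions.
#[derive(Debug, PartialEq)]
enum AstTokenizeOutcome {
    /// The AST walk produced the tokens of exactly one command.
    Tokens(Vec<String>),
    /// tree-sitter reported a parse error for the input.
    ParseError,
    /// The input parsed, but is a pipeline or list, not a single command.
    NotSingleCommand,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum TokenizeError {
    SyntaxError,
    NotSingleCommand,
}

/// Use the fallback splitter only for parse errors, never for compound input.
fn finish_tokenize<F>(
    outcome: AstTokenizeOutcome,
    input: &str,
    fallback: F,
) -> Result<Vec<String>, TokenizeError>
where
    F: Fn(&str) -> Option<Vec<String>>,
{
    match outcome {
        AstTokenizeOutcome::Tokens(tokens) => Ok(tokens),
        // Parse error: let the fallback try; if it also fails, report a
        // syntax error.
        AstTokenizeOutcome::ParseError => {
            fallback(input).ok_or(TokenizeError::SyntaxError)
        }
        // Compound input must be split by the caller first, so the fallback
        // is deliberately skipped to avoid flattening `a | b` into one
        // command.
        AstTokenizeOutcome::NotSingleCommand => {
            Err(TokenizeError::NotSingleCommand)
        }
    }
}
```

Keeping the two failure modes as separate variants makes the "do not flatten pipelines" rule a property of the `match` itself, not something each caller has to remember.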