feat(generators): token-level string interpolation metadata + string fixes #9
Hi @theoephraim,

yes - apologies, when I saw what was first created, I had it cranking away to make it more clear :)

The main branch has undergone some major restructuring. I don't have permission to push changes to this PR. Could you handle the merge conflicts or grant me push permission?

@johnsoncodehk - hm strange. everything seems correct over here 🤷 I added you as a maintainer as well to my fork just in case...
8b53b48 to
717167f
Rebuilt on top of the current
Verification:
Thanks for the clear reimplementation contract. The test files made this straightforward. Credited you as co-author on the commit.
The original PR was written against the pre-IR master (escape: RegExp, pattern.source); the codebase has since moved tokens to an algebra/IR, so this reimplements the same three behaviors on current master and keeps the regression tests as the contract.

- gen-tm: infer string-region delimiters generically, so an escaped backtick string keeps backtick delimiters instead of falling back to `"`
- gen-lexer: gate the YAML multiline quoted-scalar continuation check behind `indent.blockScalar`, so a plain indentation grammar accepts `KEY="a\nb"`
- types/api: a `string` token may declare `interpolation` regions
- gen-tm / gen-monarch / gen-treesitter: consume `interpolation` (nested TextMate regions, Monarch interpolation states, and a tree-sitter rule + an external `<tok>_chars` scanner + highlight captures)
- tests: env-spec-regressions + interpolation-metadata, ported to the IR API

All seven existing grammars regenerate byte-identically; TS conformance, the agnostic gate, and the tree-sitter accuracy gate are unchanged.

Co-authored-by: Theo Ephraim <theoephraim@users.noreply.github.com>
717167f to
25ee63a
Follow-up after a design review of the interpolation API:
Verification unchanged: env-spec-regressions 4/4, interpolation-metadata 19/19 (incl. a real tree-sitter generate + parse), all 7 existing grammars byte-identical, TS conformance == baseline, tree-sitter gate 96.0%.

🥳 thank you sir
Apologies in advance for the AI-authored PR. This was encountered while wiring up a parser for varlock - the language is called "@env-spec" and is a small DSL on top of familiar dotenv syntax, which includes decorator-style comments and function calls.
From what I understand, it was having some issues with handling backtick quotes correctly in the generated TextMate grammar, as well as string-template-style regions.
Why this exists
This PR documents and locks down specific behavior needed by env-spec-style DSL grammars. The implementation may be replaced; the important part is preserving these scenarios (the tests).
Behavior locked down (must-pass)
- TextMate backtick delimiter inference is correct for escaped backtick strings:
  - The TM region uses the backtick as its begin/end delimiter (begin = the backtick, end = the backtick followed by |$), with no fallback to a double-quote delimiter.
- `interpolation` metadata is first-class on string tokens and propagates to all three highlighters. `begin`/`end` are literal delimiters:
  - `<tok>_chars` scanner + `@punctuation.special` highlight captures
- YAML quoted-scalar continuation gating: for indentation grammars without `indent.blockScalar`, inline multiline quoted values such as `KEY="line1\nline2"` must parse (no YAML continuation indentation error).

Tests (the contract)
- `test/env-spec-regressions.ts`: backtick delimiter regression + block-scalar overreach regression
- `test/interpolation-metadata.ts`: interpolation metadata propagation across TextMate / Monarch / tree-sitter (incl. a real `tree-sitter generate` + parse)

If this PR is replaced with a cleaner implementation, keeping these tests (or equivalent) preserves the same user-facing behavior.
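For readers skimming the contract, here is a rough TypeScript sketch of the first and third behaviors. It is not the code in this PR: `inferDelimiter`, `stringRegion`, `IndentConfig`, and `checksQuotedContinuation` are hypothetical names, the region's `end` is simplified to the bare delimiter (the trailing part of the `end` pattern described above is not modeled), and the shape of `indent.blockScalar` is assumed.

```typescript
// Hypothetical sketch; names and shapes are assumptions, not this PR's code.

type Quote = '"' | "'" | '`';
const QUOTES: readonly Quote[] = ['"', "'", '`'];

/** A TextMate-style region, reduced to the two fields discussed above. */
interface TmRegion {
  begin: string;
  end: string;
}

/** Infer the delimiter from the first character of a sample lexeme
 *  instead of assuming a double quote. */
function inferDelimiter(sample: string): Quote | undefined {
  const first = sample.charAt(0);
  for (const q of QUOTES) {
    if (q === first) return q;
  }
  return undefined;
}

/** Build a simplified region: `end` is just the delimiter here. */
function stringRegion(sample: string): TmRegion | undefined {
  const q = inferDelimiter(sample);
  return q === undefined ? undefined : { begin: q, end: q };
}

/** Assumed shape of the indentation config. */
interface IndentConfig {
  blockScalar?: unknown;
}

/** The YAML continuation-indent check applies only when
 *  `blockScalar` is configured on the grammar. */
function checksQuotedContinuation(indent?: IndentConfig): boolean {
  return Boolean(indent?.blockScalar);
}

// An escaped backtick inside a backtick string keeps backtick delimiters.
console.log(stringRegion('`a \\` b`'));
// A plain indentation grammar skips the continuation check.
console.log(checksQuotedContinuation({}));
```

The point of the sketch is only that the delimiter comes from the token itself and that the continuation check is opt-in; the regression tests remain the actual contract.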
Summary of changes
- … `"` fallback)
- … `indent.blockScalar`
- … `interpolation` option (highlight-only; literal `begin`/`end` delimiters)
- … `interpolation` in TextMate, Monarch, and tree-sitter generation

`interpolation` token option
A `string` token may declare highlight-only interpolation regions. `begin`/`end` are literal delimiters (each generator escapes/uses them: TextMate/Monarch escape them into their regex dialect, tree-sitter uses them as literals).

Validation
- `node test/env-spec-regressions.ts`: 4/4
- `node test/interpolation-metadata.ts`: 19/19 (includes a real `tree-sitter generate` + parse)
- … `npm run gen`)
- `npm test` 15/15, agnostic 9/9, `gate:treesitter` 96.0% (beats official)
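To make the `interpolation` token option a bit more concrete, here is a minimal TypeScript sketch of literal `begin`/`end` delimiters being escaped for a regex-based target and passed through unchanged for a literal-based one. The option shape (`interpolation` as an array of `{ begin, end }`) and all function names are assumptions for illustration. The escaping shown targets JavaScript regex syntax, so a TextMate (Oniguruma) generator may need a slightly different character set.

```typescript
// Hypothetical sketch; the option shape below is an assumption.

/** One interpolation region, delimited by literal strings. */
interface InterpolationRegion {
  begin: string;
  end: string;
}

/** Assumed: a string token carries zero or more regions. */
interface StringTokenOptions {
  interpolation?: InterpolationRegion[];
}

/** Escape a literal so it matches itself inside a JavaScript regex. */
function escapeRegexLiteral(literal: string): string {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

/** Regex-based targets receive escaped pattern sources. */
function toRegexDelimiters(r: InterpolationRegion): InterpolationRegion {
  return { begin: escapeRegexLiteral(r.begin), end: escapeRegexLiteral(r.end) };
}

/** A literal-based target can use the delimiters unchanged. */
function toLiteralDelimiters(r: InterpolationRegion): InterpolationRegion {
  return { begin: r.begin, end: r.end };
}

// Example declaration: template-style `${ ... }` regions.
const templateString: StringTokenOptions = {
  interpolation: [{ begin: '${', end: '}' }],
};

for (const r of templateString.interpolation ?? []) {
  console.log(toRegexDelimiters(r), toLiteralDelimiters(r));
}
```

Keeping the declared delimiters literal means a grammar author never writes regex escapes; each generator decides how to encode them.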