Generative testing: grammar-derived inputs + by-construction consistency (#25) #28
Merged
…ncy (#25)

Walk the shared combinator IR to emit guaranteed-legal inputs for any Monogram grammar, replacing corpus sampling with systematic, bounded coverage. This is the lever a normal highlighter lacks (the source IS a grammar).

Two by-construction judges, no external oracle:

- round-trip: every generated derivation parses as the rule it was rooted at (parser self-consistency); the structured strategies are ~88% legal, fuzz is exploratory (random choices wander outside the IR's context constraints).
- scope ≡ role: the flat highlighter's scope at each parsed token must agree with the token's by-construction role (the scope the grammar declares). Where they disagree is the #23/#24 class: a value-leading `---` that the parser keeps as a plain scalar but a flat grammar mis-scopes as a marker; an inner sequence `-` that the parser knows is an indicator but a flat grammar folds into a string. Floor-blind (compares the punctuation class directly), so a `-` painted as string is caught.

The check independently re-surfaces both: a directed-nesting derivation produces `- - x\n - x` (#24); the anchored-marker scan catches a value-leading marker misfire (#23). Verified by reverting each fix (the gate fires), and depth-site coverage is asserted so generation can't silently stop exercising them.

Test-suite cleanup alongside:

- delete 9 dev-only scratch / superseded probes (each confirmed not a CI gate).
- fold the per-language scope-gap + src-coverage adapters into two data-driven drivers (scope-gap-run.ts / src-coverage-run.ts) + a config table, the per-language entry preserved as a `<lang>` parameter. Output byte-identical to the old adapters; coverage-table.ts and package.json rewired. The thicker html / yaml / vue adapters keep their files and are delegated to.

Adds: grammar-gen.ts (the walker), generative.ts (the judges), curated-corpora.ts. CI runs node test/generative.ts.
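The walk-then-round-trip idea in this commit can be sketched on a toy grammar. This is a minimal illustration under stated assumptions, not the code in grammar-gen.ts: the `Expr` type is a hypothetical stand-in for the real IR (whose shape is not shown here), and `derive`, `ends`, `generate`, `roundTrips` and `demo` are names invented for the example.

```typescript
// Hypothetical miniature IR; the real RuleExpr shape is not shown in the PR text.
type Expr =
  | { kind: "lit"; text: string }
  | { kind: "seq"; items: Expr[] }
  | { kind: "alt"; options: Expr[] }
  | { kind: "opt"; item: Expr }
  | { kind: "ref"; rule: string };
type Grammar = { [rule: string]: Expr };

// Every string derivable from `e` using at most `depth` nested rule expansions.
function derive(g: Grammar, e: Expr, depth: number): string[] {
  switch (e.kind) {
    case "lit":
      return [e.text];
    case "ref": {
      const body = g[e.rule];
      return body && depth > 0 ? derive(g, body, depth - 1) : [];
    }
    case "opt":
      return [""].concat(derive(g, e.item, depth));
    case "alt": {
      let out: string[] = [];
      for (const o of e.options) out = out.concat(derive(g, o, depth));
      return out;
    }
    case "seq": {
      let heads: string[] = [""];
      for (const item of e.items) {
        const tails = derive(g, item, depth);
        const next: string[] = [];
        for (const h of heads) for (const t of tails) next.push(h + t);
        heads = next;
      }
      return heads;
    }
  }
}

// End offsets at which `e` can stop when matched against `s` starting at `at`.
function ends(g: Grammar, e: Expr, s: string, at: number, depth: number): number[] {
  switch (e.kind) {
    case "lit":
      return s.slice(at, at + e.text.length) === e.text ? [at + e.text.length] : [];
    case "ref": {
      const body = g[e.rule];
      return body && depth > 0 ? ends(g, body, s, at, depth - 1) : [];
    }
    case "opt":
      return [at].concat(ends(g, e.item, s, at, depth));
    case "alt": {
      let out: number[] = [];
      for (const o of e.options) out = out.concat(ends(g, o, s, at, depth));
      return out;
    }
    case "seq": {
      let positions = [at];
      for (const item of e.items) {
        let next: number[] = [];
        for (const p of positions) next = next.concat(ends(g, item, s, p, depth));
        positions = next;
      }
      return positions;
    }
  }
}

// Generator and round-trip judge share one grammar object and one depth bound.
function generate(g: Grammar, rule: string, depth: number): string[] {
  return derive(g, { kind: "ref", rule }, depth);
}
function roundTrips(g: Grammar, rule: string, input: string, depth: number): boolean {
  return ends(g, { kind: "ref", rule }, input, 0, depth).indexOf(input.length) !== -1;
}

// list := "[" item? "]"    item := "x" | list
const demo: Grammar = {
  list: {
    kind: "seq",
    items: [
      { kind: "lit", text: "[" },
      { kind: "opt", item: { kind: "ref", rule: "item" } },
      { kind: "lit", text: "]" },
    ],
  },
  item: { kind: "alt", options: [{ kind: "lit", text: "x" }, { kind: "ref", rule: "list" }] },
};
const inputs = generate(demo, "list", 3); // ["[]", "[x]", "[[]]"]
const allLegal = inputs.every((s) => roundTrips(demo, "list", s, 3));
```

Because the generator and the matcher read the same grammar object, a round-trip failure can only mean the two disagree with each other. It says nothing about an external specification, which is the same self-consistency limit the commit message states.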
…overage

The generated legal corpus never reached whole scoped token classes that the scope≡role judge checks. For TypeScript these are the numerics (Hex/Octal/Binary/BigInt/Number): the legal corpus is shallow/structural and never lands on an expression-position literal (proven: raising cap/fuzz still yields zero numerics).

Add a 5th strategy `tokenCover`: for each scoped, samplable token, descend the SHORTEST path from the entry rule that references it (reusing the distTo/exprDist BFS), build a minimal legal context (fillContent/minExpand), and substitute sampleVariants. Deterministic and minimal-context, so it stays cheap on the large TS grammar (no depth strategies for token-stream).

Also sweep all top-level token-pattern `alt` branches in sampleVariants (so a Number emits hex/oct/bin/float/bigint, not just `0`), guarded against the interesting-literal embed for decimal-start / start()-anchored tokens (no `-0x1`, no broken column-0 anchor).

TS declared-scope tokens checked 157→326 (numerics now graded); generative 7/7 consistent, depth-site 2/2 (#23/#24 intact); agnostic 9/9.
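The "descend the shortest path from the entry rule" step can be illustrated with a plain breadth-first search over a rule-reference graph. This is a sketch under assumptions: `RefGraph`, `shortestPath` and the rule names below are hypothetical, and the real distTo/exprDist helpers may work quite differently.

```typescript
// Hypothetical reference graph: each rule lists the rules/tokens it mentions.
type RefGraph = { [name: string]: string[] };

// Breadth-first search from `entry`; returns the shortest chain of names that
// ends at `target`, or null when the target is unreachable.
function shortestPath(graph: RefGraph, entry: string, target: string): string[] | null {
  const parent: { [name: string]: string | null } = Object.create(null);
  parent[entry] = null;
  const queue = [entry];
  for (let head = 0; head < queue.length; head++) {
    const node = queue[head] as string;
    if (node === target) {
      // Walk the parent links back to the entry rule to recover the path.
      const path: string[] = [];
      for (let n: string | null = node; n !== null; n = parent[n] ?? null) path.unshift(n);
      return path;
    }
    for (const next of graph[node] || []) {
      if (!(next in parent)) {
        parent[next] = node; // first visit is the shortest route in an unweighted graph
        queue.push(next);
      }
    }
  }
  return null;
}

// Illustrative rule names only; they are not taken from the TS grammar.
const refs: RefGraph = {
  Program: ["Statement"],
  Statement: ["ExprStatement", "IfStatement"],
  IfStatement: ["Expression", "Statement"],
  ExprStatement: ["Expression"],
  Expression: ["Binary", "Literal"],
  Binary: ["Expression"],
  Literal: ["Number", "String"],
};
const toNumber = shortestPath(refs, "Program", "Number");
```

A generator following this path only has to fill in the minimal context around each step, which is why the strategy stays cheap even when the grammar is large.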
Generation was seed-dependent: different opts.seed → different fuzz outputs → different "discovered" divergences. That's fatal for a reproducible gap ledger (random testing shows presence, not absence, and can't be tracked across commits) and contradicts the project's own "systematic, not a representativeness bet" thesis.

The only random STRUCTURE was `fuzz` (this.rand for alt/quantifier choices); enum/nestChain/tokenCover already rotate on a variant index. Replace fuzz with `cover`: the same walk, but every production choice comes from a deterministic mixed-radix Chooser indexed by round i alone. The first few choice points form a full base-N cartesian (t-wise interaction coverage by construction: measured complete to 3-wise); the tail is perturbed by rotations. this.rand is seeded from a fixed constant; opts.seed is now a no-op.

generateInputs(grammar) is a pure function of the grammar: byte-identical across runs for all 7 languages. 7/7 consistent, depth-site 2/2 (#23/#24 intact); agnostic 9/9. Foundation for a deterministic, commit-trackable gap ledger.
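One way to picture a mixed-radix chooser indexed by the round number alone is the sketch below. It is an assumption-laden illustration, not the Chooser in grammar-gen.ts: `makeChooser`, the `exhaustivePoints` parameter and the particular tail rotation are invented for the example.

```typescript
// Returns a picker for one generation round. The first `exhaustivePoints`
// real choice points consume successive mixed-radix digits of `round`;
// later choice points are rotated by the round index instead.
function makeChooser(round: number, exhaustivePoints = 3): (arity: number) => number {
  let rest = round; // remaining high-order digits of the round index
  let seen = 0;     // number of real choice points visited so far
  return (arity: number): number => {
    if (arity <= 1) return 0; // nothing to choose, not counted as a choice point
    seen += 1;
    if (seen <= exhaustivePoints) {
      const digit = rest % arity;
      rest = Math.floor(rest / arity);
      return digit;
    }
    return (round + seen) % arity; // deterministic perturbation for the tail
  };
}

// Three choice points with 2, 3 and 2 alternatives: rounds 0..11 enumerate
// the full cartesian product, each combination exactly once.
const arities = [2, 3, 2];
const combos: { [key: string]: true } = {};
for (let round = 0; round < 12; round++) {
  const choose = makeChooser(round);
  combos[arities.map((a) => choose(a)).join(",")] = true;
}
const distinct = Object.keys(combos).length; // 12
```

Since the picker depends only on `round` and the order of the calls, two runs over the same grammar make identical choices, which is the property the commit relies on for byte-identical output.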
…ible

Deterministic generation found 0 divergences: the gaps random fuzz hit were luck, and the deterministic generator couldn't produce those shapes. Discovery is bounded by generator PRECISION, not luck, so make the known gap shape-classes producible (config-derived, no language names):

- markup: a NO-SPACE (tight) render variant + a directed `markupSelfCloseAttr` producer so `<img src="a"/>` (quoted attr flush against `/>`) forms. The HTML/Vue self-close `/` gap now surfaces deterministically under "discovered": «/» got «string.unquoted.html».
- indent: sample plain scalars from `blockPattern` + splice a flow bracket mid-token, plus a directed `indentExplicitKeyBracket` producer, so `? k [y : …` forms (round-trips).
- indent: `indentBlockScalar` synthesis for the `never()`-token block scalar `|`/`>` (introducer + deeper-indented body), so `string.unquoted.block` is covered (was 0%).

Determinism preserved (generateInputs pure); 7/7 gated-clean; depth-site 2/2 (#23/#24 intact); agnostic 9/9.

Honest finding: the YAML explicit-key `[` divergence is a `name`-bucket scope (entity.name.tag), which the scope≡role gates (literal→content, anchored-marker) structurally don't flag. That is a check-precision item for a follow-up, distinct from producibility (which is now done). The HTML `/` is unambiguously gate-1.
…WN-GAPS.md)

Operationalize the scope≡role check's "discovered" divergences into a committed, commit-trackable ledger instead of console output that vanishes.

test/gap-ledger.ts: for each language, collect the discovered divergences (reusing the EXACT detection, factored into generative-detect.ts so generative.ts's gate is unchanged), MINIMIZE each via delta-debugging to a stable minimal repro, CLASSIFY via the neutral oracle (typescript/yaml/parse5) keeping only oracle-VALID-input gaps (over-accepts dropped), and FINGERPRINT (content hash, stable across commits). Emits KNOWN-GAPS.md (human + machine-readable), regenerated with `--write`, gated up-to-date with `--check`. Deterministic: two runs → byte-identical ledger.

Currently 2 gaps, 0 dropped: the HTML/Vue self-close `/` mis-scope (`<aA aA = "a"/>` ddmin-minimized to `<A A=""/>`), the floor-blind divergence the corpus-bound scope-gap metric can't see. CI runs the selftest + `--check`. generative 7/7 unchanged; agnostic 9/9; deterministic.

The fixes for these gaps live on a separate branch (highlighter product changes), so the ledger here demonstrates the tool FINDING them; a later layer can reconcile the ledger into GitHub issues.
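The minimize-then-fingerprint steps can be sketched as below. Everything here is a stand-in chosen for illustration: `ddmin` is a simplified, complement-only delta-debugging loop over characters, `misScoped` is a hypothetical predicate that only imitates "the divergence still reproduces" (the real check runs the highlighter), and `fingerprint` uses a small djb2-style hash where the real ledger's content hash is unspecified.

```typescript
// Simplified ddmin: repeatedly drop chunks of characters while the failure
// predicate still holds, refining the chunk size down to single characters.
function ddmin(input: string, fails: (s: string) => boolean): string {
  let chars = input.split("");
  let n = 2; // current number of chunks
  while (chars.length >= 2) {
    const size = Math.ceil(chars.length / n);
    let reduced = false;
    for (let start = 0; start < chars.length; start += size) {
      const complement = chars.slice(0, start).concat(chars.slice(start + size));
      if (fails(complement.join(""))) {
        chars = complement;       // keep the smaller failing input
        n = Math.max(n - 1, 2);
        reduced = true;
        break;
      }
    }
    if (!reduced) {
      if (n >= chars.length) break; // single-character removals all tried: 1-minimal
      n = Math.min(chars.length, n * 2);
    }
  }
  return chars.join("");
}

// Hypothetical stand-in for "the mis-scope still reproduces": a tag with one
// quoted attribute whose closing quote sits flush against `/>`.
const misScoped = (s: string): boolean =>
  /^<[A-Za-z]+\s+[A-Za-z]+\s*=\s*"[^"]*"\/>$/.test(s);

// Stand-in content hash (djb2, xor variant, over UTF-16 code units).
function fingerprint(text: string): string {
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

const minimized = ddmin('<aA aA = "a"/>', misScoped);
const id = fingerprint(minimized);
```

With this stand-in predicate the sketch reduces `<aA aA = "a"/>` to `<A A=""/>`, the same shape the commit reports. Hashing the minimized repro rather than the raw generated input is what lets an identifier stay stable when unrelated generator changes alter the surrounding text.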
Comments are skip:true tokens: the parser drops them, so they are never CST leaves, so the scope≡role judge (which walks the parser's CST) never checked the highlighter's comment scopes (0% covered). Closing it needs a witness the GENERATOR records, not a parser leaf.

4a. Deterministic comment injection at one safe position per mode (config-derived, no `//`/`#`/`<!--` hardcoded): token-stream → a no-newline block comment at an inter-token space; indent → end-of-line `# c` outside flow; markup → `<!-- c -->` after a tagClose. A re-parse-and-drop net keeps round-trip clean; the injected comment is recorded as a witness in GenInput.tokens (its first consumer), inheriting the host's tier.

4b. The judge grades each witness span: the flat highlighter must paint `comment` somewhere in it (same scopeBucket partition + leniency); a comment painted non-comment is unambiguous, so it GATES.

Coverage hole closed 0→N graded per language (YAML 442, TS 46, …), all 0 uncolored today; proven non-trivial: mutating a comment scope makes every witness uncolored and the gate fail. Determinism preserved; 7/7 + depth-site 2/2 (#23/#24); gap-ledger --check clean; agnostic 9/9.
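The 4b rule ("the highlighter must paint `comment` somewhere in the witness span") amounts to an overlap test. The sketch below is illustrative only: the `Span` and `Witness` shapes, the function names and the scope strings are assumptions, and the real judge additionally applies its scopeBucket partition and leniency rules, which are not modeled here.

```typescript
// Half-open character ranges [start, end).
interface Span { start: number; end: number; scope: string }
interface Witness { start: number; end: number }

// A witness counts as colored when any painted span that overlaps it carries
// a scope whose first dotted segment is "comment".
function witnessColored(painted: Span[], w: Witness): boolean {
  return painted.some(
    (p) => p.start < w.end && p.end > w.start && p.scope.split(".")[0] === "comment",
  );
}

// The gate fails when this list is non-empty.
function uncoloredWitnesses(painted: Span[], witnesses: Witness[]): Witness[] {
  return witnesses.filter((w) => !witnessColored(painted, w));
}

// Input `a: 1 # c` with an injected end-of-line comment at offsets 5..8.
// The scope names are examples, not the grammar's actual output.
const painted: Span[] = [
  { start: 0, end: 1, scope: "entity.name.tag.yaml" },
  { start: 1, end: 2, scope: "punctuation.separator.key-value.yaml" },
  { start: 3, end: 4, scope: "constant.numeric.yaml" },
  { start: 5, end: 8, scope: "comment.line.number-sign.yaml" },
];
const witnesses: Witness[] = [{ start: 5, end: 8 }];
const missing = uncoloredWitnesses(painted, witnesses);

// Mutating the comment scope, as in the commit's non-triviality check,
// should leave the witness uncolored.
const mutated = painted.map((p) =>
  p.scope.split(".")[0] === "comment" ? { ...p, scope: "string.unquoted.yaml" } : p,
);
const missingAfterMutation = uncoloredWitnesses(mutated, witnesses);
```

Recording the witness on the generator side is what makes this check possible at all, since the parser's tree has no leaf for a skipped comment token.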
Closes #25.
The source IS a grammar, so the same combinator object the parser / highlighter / tree-sitter derive from is also a generator. Walking its rule IR emits guaranteed-legal inputs, replacing "hope the corpus contains the shape" (the blind spot that hid #23/#24 from a `monogramWrong=0` metric) with systematic, deterministic, grammar-derived coverage. This PR implements #25 plus a follow-on roadmap that pushes the generator to deterministic precision and operationalizes its findings.

## #25 core: the method
- `test/grammar-gen.ts`: a generic, language-agnostic walker over the shared `RuleExpr` IR. Every per-language fact (indent tokens, flow brackets, compact indicators, markup delimiters) is read from `grammar.indent` / `.markup`, never hardcoded.
- `test/generative.ts`: two by-construction judges, no external oracle: round-trip (every derivation parses as its rule) and scope ≡ role (the flat highlighter's scope at each parsed token must match the token's by-construction role; floor-blind, so a `-` mis-painted as string is caught, which is the blind spot the role-graded scope-gap metric had). It re-surfaces #23/#24 by construction (verified by reverting each fix), with `mustCover` asserting the corpus keeps containing both shape-classes.

## Roadmap (this PR, on top of the core)
- … `tokenCover`): directed descent to each scoped token via the `distTo` BFS, so numerics / regex / etc. (which the shallow corpus never reached) are graded. TS scope-checked tokens 157→326.
- … `fuzz`) with deterministic t-wise systematic coverage (complete to 3-wise); `generateInputs(grammar)` is now a pure function (seed eliminated). This is what makes the gap ledger commit-trackable, and it makes the tool faithful to its own "systematic, not a representativeness bet" thesis: discovery is bounded by generator PRECISION, not random luck.
- … `[`-in-scalar; block scalars `|`/`>`), so the deterministic check finds them on purpose.
- … `test/gap-ledger.ts` → `KNOWN-GAPS.md`): collects the discovered divergences, delta-debug-minimizes each (`<aA aA = "a"/>` → `<A A=""/>`), classifies via the neutral oracle (typescript / yaml / parse5; over-accepts dropped), and fingerprints for cross-commit identity. Deterministic, regenerated with `--write`, CI-gated up-to-date with `--check`. It currently lists the HTML/Vue self-close `/` gap, a real, valid-input divergence the corpus-bound scope-gap metric is blind to (the `/` is a lexical-floor punct role).
- … `skip` tokens (no CST leaf), so the judge couldn't see them; closed via deterministic comment injection recorded as witnesses in `GenInput.tokens` (its first consumer) + a judge arm grading each witness span. 0→N comment spans graded per language (YAML 442, TS 46, …), proven non-trivial (mutating a comment scope fails the gate).

## Test-suite cleanup (the #25 part-2)
Deleted 9 dev-only scratch probes (each confirmed not a CI gate). Folded the per-language `scope-gap` + `src-coverage` adapters into two data-driven drivers (`scope-gap-run.ts` / `src-coverage-run.ts`) + a config table, with `<lang>` preserved as a parameter and output byte-identical to the old adapters (the README coverage-table algorithm is unchanged; only the dispatch table was rewired). The thicker html / yaml / vue adapters are delegated to.

## Boundary
Round-trip proves only self-consistency, never that the parser matches an external semantic boundary, so conformance / scope-gap-vs-official / repo-compat and the negative tests all stay. The gap fixes for what the ledger finds are a separate concern (a `fix-highlighter-gaps` branch), so this PR demonstrates the tool FINDING them.

## Verification
`npm run gen` byte-identical · `tsc --noEmit` clean · sanity 15/15 · `yaml-depth-witnesses` 10/0 · `agnostic` 9/9 · `generative` 7/7 + depth-site 2/2 · `gap-ledger` deterministic + `--check` clean + selftest · `coverage-table` end-to-end. CI runs `node test/generative.ts`, the gap-ledger selftest, and `gap-ledger --check`.