From 06066c9e24622e726165d157e01685484e691473 Mon Sep 17 00:00:00 2001
From: Curtis Man
Date: Wed, 22 Apr 2026 12:23:50 -0700
Subject: [PATCH 01/16] actionGrammar: add opt-in compile-time grammar
 optimizations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds an optimizer pipeline (`optimizeGrammar`) gated by per-grammar
options, currently exposing two passes:

- inlineSingleAlternatives: collapse single-alternative RulesParts into
  their parent, removing a layer of backtracking nesting in the matcher.
  Handles child.value via a flat 3-case decision — substitute into
  parent.value, hoist onto a single-part parent without its own value,
  or drop when unobservable — and propagates captured variables onto
  direct-capture parts when the wrapper carries a binding.

- factorCommonPrefixes: factor common leading parts (including partial
  leading string tokens) shared by alternatives within a RulesPart,
  with conservative refusals on binding collisions, cross-scope value
  references, mixed value-presence, and multi-part suffixes that would
  lose the matcher's single-part default-value rule.

Both passes preserve shared-array identity via an identity memo so the
dedup invariant in grammarSerializer.ts holds.

Tests cover the spacing-mode tightening, value substitution/hoist/drop
cases, and the prefix-factoring guards.
---
 ts/docs/architecture/actionGrammar.md         |  133 ++
 .../actionGrammar/src/grammarCompiler.ts      |   10 +
 .../actionGrammar/src/grammarLoader.ts        |    4 +
 .../actionGrammar/src/grammarOptimizer.ts     | 1156 +++++++++++++++++
 ts/packages/actionGrammar/src/index.ts        |    1 +
 .../test/grammarOptimizerBenchmark.spec.ts    |  190 +++
 .../test/grammarOptimizerEquivalence.spec.ts  |   99 ++
 .../test/grammarOptimizerFactoring.spec.ts    |  120 ++
 .../grammarOptimizerFactoringRepro.spec.ts    |  113 ++
 .../test/grammarOptimizerInline.spec.ts       |  265 ++++
 .../test/grammarOptimizerSharing.spec.ts      |  138 ++
 ...grammarOptimizerSyntheticBenchmark.spec.ts |  231 ++++
 12 files changed, 2460 insertions(+)
 create mode 100644 ts/packages/actionGrammar/src/grammarOptimizer.ts
 create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts
 create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerEquivalence.spec.ts
 create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts
 create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts
 create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts
 create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts
 create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts

diff --git a/ts/docs/architecture/actionGrammar.md b/ts/docs/architecture/actionGrammar.md
index 725f7c8d9..dde268f75 100644
--- a/ts/docs/architecture/actionGrammar.md
+++ b/ts/docs/architecture/actionGrammar.md
@@ -382,6 +382,139 @@ GrammarParseResult {
 6. Type-checks value expressions in two passes (see [Validation architecture](#validation-architecture) below)
 7. Produces the flat `Grammar` structure ready for matching
+8. Optionally runs the [Compile-time optimizer](#compile-time-optimizations)
+   to reshape the AST without changing match semantics
+
+### Compile-time optimizations
+
+`grammarOptimizer.ts` exposes opt-in AST passes that reshape the
+compiled `Grammar` to reduce matcher work without changing match
+results. Both passes are off by default and individually controllable
+through `LoadGrammarRulesOptions.optimizations`:
+
+```typescript
+loadGrammarRules("agent.agr", text, {
+    optimizations: {
+        inlineSingleAlternatives: true,
+        factorCommonPrefixes: true,
+    },
+});
+```
+
+The optimizer runs after value-expression validation, so it operates on
+fully-compiled `CompiledValueNode`s.
+
+#### Pass 1 — Inline single-alternative `RulesPart`
+
+`inlineSingleAlternativeRules` walks all rule alternatives post-order
+and replaces an eligible `RulesPart` with the spread of its child
+rule's parts. This removes one layer of `ParentMatchState` push/pop and
+`finalizeNestedRule` in the matcher, which is common for named rules
+that simply delegate to a single sub-rule.
+
+A `RulesPart` is inlined only when **all** of the following hold:
+
+- `part.rules.length === 1`
+- `!part.repeat` and `!part.optional` (loop-back / optional semantics
+  must be preserved)
+- Any explicit `value` on the child rule can be handled by the flat
+  3-case decision (substitute into `parent.value`, hoist onto a
+  single-part parent with no value of its own, or drop when
+  unobservable) — an inlined value can no longer fire on its own under
+  the parent's value-tracking policy
+- The child rule's `spacingMode` equals the parent's exactly
+  (`undefined` is a distinct auto mode at the matcher level, not a
+  synonym for "inherit from parent")
+- If `part.variable` is set, the child must consist of a single
+  direct-capture part (`wildcard` or `number`) so the variable name
+  can be propagated onto it. Variable propagation is **never** pushed
+  onto a nested `RulesPart` — that scope is structurally distinct from
+  the parent's and would silently drop the binding for cases like
+  `$(x:)` where `` produces a nested object via its own
+  value expression.
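[Editorial aside, not part of this patch.] The inlining step above can be sketched with simplified stand-in types. `Part`, `Rule`, and `inlineOnce` below are hypothetical reductions of the real `GrammarPart`/`GrammarRule` shapes in `grammarTypes.ts`, covering only the simplest eligibility checks:

```typescript
// Reduced stand-ins for GrammarPart/GrammarRule (illustrative only).
type Part =
    | { type: "string"; value: string[] }
    | { type: "rules"; rules: Rule[]; variable?: string };
type Rule = { parts: Part[] };

// Before: a rule that merely delegates to a single sub-rule, costing
// one ParentMatchState push/pop per match attempt.
const child: Rule = { parts: [{ type: "string", value: ["the", "song"] }] };
const before: Rule = { parts: [{ type: "rules", rules: [child] }] };

// One level of inlining: replace an eligible single-alternative
// RulesPart (no variable binding here) with its child rule's parts.
function inlineOnce(rule: Rule): Rule {
    const parts = rule.parts.flatMap((p) =>
        p.type === "rules" && p.rules.length === 1 && p.variable === undefined
            ? p.rules[0].parts
            : [p],
    );
    return { parts };
}

const after = inlineOnce(before);
// after.parts is the child's flat string part: one less nesting level.
```

The real pass adds the repeat/optional, value, and spacing-mode checks described above and memoizes by rule-array identity.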
+
+#### Pass 2 — Common prefix factoring
+
+`factorCommonPrefixes` walks every `RulesPart` and groups alternatives
+that share a non-empty leading prefix. The shared prefix is hoisted
+into the alternative once, followed by a nested `RulesPart` containing
+the remaining suffixes:
+
+```
+play the song -> "song"        play the (song -> "song"
+play the track -> "track"   ⇒            | track -> "track"
+play the album -> "album"                | album -> "album")
+```
+
+**Prefix shape.** Two alternatives share a prefix of `(fullParts,
+stringTokens)` shape: `fullParts` parts that are structurally equal
+(via `partsEqualForFactoring`), optionally followed by `stringTokens`
+matching leading tokens of a shared `StringPart`. The partial-string
+case lets `play the song | play the track` factor even though
+`play the song` and `play the track` are each tokenized into a single
+multi-token `StringPart`.
+
+**Variable remapping.** `partsEqualForFactoring` treats variable parts
+(`wildcard`, `number`, `rules`) as equal when their type/shape matches
+even if the variable names differ. The lead alternative provides the
+canonical names; for each non-lead member a `remap: Map<string, string>`
+is built and applied to the suffix's parts and value
+expression via `remapPartVariables` and `remapValueVariables`. Object
+shorthand `{ x }` (compiled as `{ key: "x", value: null }`) is
+expanded to `{ key: "x", value: variable("renamed") }` during remap so
+the object field name stays the same.
+
+**Wrapper value capture.** When any suffix carries a value expression,
+the new wrapper rule has more than one part and the matcher's default
+single-part value-tracking policy no longer fires. The optimizer
+generates a fresh variable name (`__opt_factor`, `__opt_factor_1`, …),
+binds the suffix `RulesPart` to it, and produces a wrapper value
+`{ type: "variable", name: "__opt_factor" }` — preserving the suffix
+value through the new nesting level.
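[Editorial aside, not part of this patch.] The partial-string split described above can be sketched on plain token arrays. `sharedLeadingTokens` is a hypothetical helper, a much-reduced stand-in for the patch's `sharedPrefixShape`:

```typescript
// Count leading tokens shared by two tokenized string parts.
function sharedLeadingTokens(a: string[], b: string[]): number {
    let n = 0;
    while (n < a.length && n < b.length && a[n] === b[n]) n++;
    return n;
}

// Each alternative is a single multi-token StringPart in the real AST.
const alt1 = ["play", "the", "song"];
const alt2 = ["play", "the", "track"];

const k = sharedLeadingTokens(alt1, alt2);

// Hoist the shared prefix once; the remaining suffix tokens become
// alternatives of a nested rules part.
const prefix = alt1.slice(0, k);
const suffixes = [alt1.slice(k), alt2.slice(k)];
```

The real pass additionally intersects whole-part prefixes, remaps variables to the lead alternative's names, and applies the safety guards listed below before rewriting anything.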
+
+**Iteration.** Factoring is applied to a fixed point per `RulesPart`
+(capped at 8 rounds) since a freshly-factored suffix may itself share
+a new prefix among its members. When both passes are enabled, Pass 1
+runs again after factoring so that any single-alternative wrappers
+produced by factoring collapse away.
+
+**Shared-rule identity preservation.** Both passes memoize their
+output by `GrammarRule[]` array identity. The compiler points every
+reference to the same named rule (``) at the same underlying
+`rules` array so [grammarSerializer.ts](packages/actionGrammar/src/grammarSerializer.ts)
+can dedupe via `rulesToIndex.get(p.rules)`. The optimizer preserves
+that invariant: two `RulesPart`s that originally pointed at the same
+array still point at the same (possibly new) array after the pass —
+keeping `.ag.json` size proportional to unique rule bodies, and
+allowing `partsEqualForFactoring`'s `a.rules === b.rules` check to
+keep matching across multiple references.
+
+**Safety guards.** The optimizer refuses to factor when any of the
+following would change semantics:
+
+- **Mixed value presence.** Some members have an explicit value, others
+  rely on default-value semantics. Wrapping changes the parent shape
+  and would silently drop the implicit values.
+- **Multi-part defaulted suffix.** All members rely on default values
+  but at least one suffix would end up with more than one part, where
+  the matcher's single-part default-value policy no longer applies.
+- **Cross-scope value reference.** A suffix's value expression
+  references (after remap) a variable bound in the canonical prefix.
+  The matcher scopes value variables per nested rule, so the suffix
+  cannot see prefix bindings.
+- **Suffix–prefix variable collision.** A suffix-bound variable name
+  collides with a canonical-prefix name after remap — would shadow the
+  outer binding.
+- **Wholly-consumed alternative with explicit value.** The shared
+  prefix consumes every part of some alternative that also has an
+  explicit value — leaves an empty-parts suffix that cannot carry the
+  value cleanly.
+
+#### Equivalence and benchmarks
+
+The new `grammarOptimizer*` test specs cover unit behavior, structural
+equivalence (every flag combination produces identical `matchGrammar`
+output across a set of curated and real-agent grammars), and an
+informational `grammarOptimizerBenchmark.spec.ts` patterned on
+`dfaBenchmark.spec.ts` that prints matcher-time numbers per
+configuration. Set `TYPEAGENT_SKIP_BENCHMARKS=1` to skip the
+benchmark spec.
 
 ### Matching backend
 
diff --git a/ts/packages/actionGrammar/src/grammarCompiler.ts b/ts/packages/actionGrammar/src/grammarCompiler.ts
index 6dcb2c821..39d49f678 100644
--- a/ts/packages/actionGrammar/src/grammarCompiler.ts
+++ b/ts/packages/actionGrammar/src/grammarCompiler.ts
@@ -20,6 +20,10 @@ import {
     isObjectSpread,
 } from "./grammarRuleParser.js";
 import { getLineCol } from "./utils.js";
+import {
+    optimizeGrammar,
+    GrammarOptimizationOptions,
+} from "./grammarOptimizer.js";
 import { globalEntityRegistry } from "./entityRegistry.js";
 import { globalPhraseSetRegistry } from "./builtInPhraseMatchers.js";
 import { getBuiltInEntitiesGrammarContent } from "./builtInFileLoader.js";
@@ -417,6 +421,7 @@ export function compileGrammar(
     warnings?: string[],
     imports?: ImportStatement[],
     schemaLoader?: SchemaLoader,
+    optimizations?: GrammarOptimizationOptions,
 ): Grammar {
     const grammarFileMap = new Map();
     const context = createCompileContext(
@@ -511,6 +516,11 @@ export function compileGrammar(
     if (allEntities.size > 0) {
         grammar.entities = Array.from(allEntities);
     }
+    // Skip optimizations when there were errors — the AST may be partial
+    // and optimization invariants may not hold.
+    if (errors.length === 0 && optimizations !== undefined) {
+        return optimizeGrammar(grammar, optimizations);
+    }
     return grammar;
 }
 
diff --git a/ts/packages/actionGrammar/src/grammarLoader.ts b/ts/packages/actionGrammar/src/grammarLoader.ts
index 5f4c26b09..093f50a3e 100644
--- a/ts/packages/actionGrammar/src/grammarLoader.ts
+++ b/ts/packages/actionGrammar/src/grammarLoader.ts
@@ -3,6 +3,7 @@
 
 import { defaultFileLoader } from "./defaultFileLoader.js";
 import { compileGrammar, FileLoader, SchemaLoader } from "./grammarCompiler.js";
+import { GrammarOptimizationOptions } from "./grammarOptimizer.js";
 import { parseGrammarRules } from "./grammarRuleParser.js";
 import { Grammar } from "./grammarTypes.js";
 
@@ -11,6 +12,8 @@ export type LoadGrammarRulesOptions = {
     startValueRequired?: boolean; // Whether the start rule must produce a value (default: true)
     schemaLoader?: SchemaLoader; // Optional loader for resolving .ts type imports
     enableValueExpressions?: boolean; // Enable JavaScript-like value expressions (default: false)
+    /** Compile-time AST optimizations. All optimizations default to off. */
+    optimizations?: GrammarOptimizationOptions;
 };
 
 function parseAndCompileGrammar(
@@ -54,6 +57,7 @@ function parseAndCompileGrammar(
         warnings,
         parseResult.imports,
         options?.schemaLoader,
+        options?.optimizations,
     );
     return grammar;
 }
diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts
new file mode 100644
index 000000000..54e467b0f
--- /dev/null
+++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts
@@ -0,0 +1,1156 @@
+// Copyright (c) Microsoft Corporation.
+// Licensed under the MIT License.
+
+import registerDebug from "debug";
+import {
+    CompiledObjectElement,
+    CompiledValueNode,
+    Grammar,
+    GrammarPart,
+    GrammarRule,
+    RulesPart,
+} from "./grammarTypes.js";
+
+const debug = registerDebug("typeagent:grammar:opt");
+
+export type GrammarOptimizationOptions = {
+    /**
+     * Inline single-alternative RulesPart when the nesting carries no
+     * additional semantics (no repeat, no optional, no conflicting value
+     * binding). Removes a layer of backtracking nesting in the matcher.
+     */
+    inlineSingleAlternatives?: boolean;
+
+    /**
+     * Factor common leading parts shared across alternatives in a RulesPart.
+     * Avoids re-matching the shared prefix while exploring each alternative.
+     */
+    factorCommonPrefixes?: boolean;
+};
+
+/**
+ * Run enabled optimization passes against the compiled grammar AST.
+ * The returned grammar is semantically equivalent to the input — only the
+ * shape of the parts/rules tree changes.
+ *
+ * The optimizer is intentionally conservative: when in doubt about an
+ * eligibility check, it leaves the AST unchanged.
+ */
+export function optimizeGrammar(
+    grammar: Grammar,
+    options: GrammarOptimizationOptions | undefined,
+): Grammar {
+    if (!options) {
+        return grammar;
+    }
+    let rules = grammar.rules;
+    if (options.inlineSingleAlternatives) {
+        rules = inlineSingleAlternativeRules(rules);
+    }
+    if (options.factorCommonPrefixes) {
+        rules = factorCommonPrefixes(rules);
+        if (options.inlineSingleAlternatives) {
+            // Factoring can produce new single-alternative wrapper rules;
+            // run the inliner once more so they collapse.
+            rules = inlineSingleAlternativeRules(rules);
+        }
+    }
+    if (rules === grammar.rules) {
+        return grammar;
+    }
+    return { ...grammar, rules };
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Optimization #1: inline single-alternative RulesPart
+// ─────────────────────────────────────────────────────────────────────────────
+
+/**
+ * Walk all rule alternatives post-order and replace each eligible
+ * RulesPart with the spread of its child rule's parts.
+ *
+ * Uses an identity memo over `GrammarRule[]` arrays so that two
+ * `RulesPart`s that originally pointed to the same array (named-rule
+ * sharing established by the compiler) still point to the same array
+ * after the pass. Preserves the dedup invariant that
+ * `grammarSerializer.ts` relies on (`rulesToIndex.get(p.rules)`).
+ */
+export function inlineSingleAlternativeRules(
+    rules: GrammarRule[],
+): GrammarRule[] {
+    const counter = { inlined: 0 };
+    const memo: RulesArrayMemo = new Map();
+    const result = inlineRulesArray(rules, counter, memo);
+    if (counter.inlined > 0) {
+        debug(`inlined ${counter.inlined} single-alternative RulesParts`);
+    }
+    return result;
+}
+
+type RulesArrayMemo = Map<GrammarRule[], GrammarRule[]>;
+
+function inlineRulesArray(
+    rules: GrammarRule[],
+    counter: { inlined: number },
+    memo: RulesArrayMemo,
+): GrammarRule[] {
+    const cached = memo.get(rules);
+    if (cached !== undefined) return cached;
+    // Reserve the slot before recursing so cycles (if any) terminate.
+    memo.set(rules, rules);
+    const next = rules.map((r) => inlineRule(r, counter, memo));
+    const changed = next.some((r, i) => r !== rules[i]);
+    const result = changed ? next : rules;
+    memo.set(rules, result);
+    return result;
+}
+
+function inlineRule(
+    rule: GrammarRule,
+    counter: { inlined: number },
+    memo: RulesArrayMemo,
+): GrammarRule {
+    const { parts, changed, valueSubstitutions, valueAssignment } = inlineParts(
+        rule.parts,
+        rule,
+        counter,
+        memo,
+    );
+    if (!changed) {
+        return rule;
+    }
+    let value = rule.value;
+    if (value === undefined && valueAssignment !== undefined) {
+        value = valueAssignment;
+    }
+    if (valueSubstitutions.length > 0 && value !== undefined) {
+        for (const sub of valueSubstitutions) {
+            value = substituteValueVariable(
+                value,
+                sub.variable,
+                sub.replacement,
+            );
+        }
+    }
+    if (value === rule.value) {
+        return { ...rule, parts };
+    }
+    return { ...rule, parts, value };
+}
+
+type InlineValueSubstitution = {
+    variable: string;
+    replacement: CompiledValueNode;
+};
+
+type TryInlineResult = {
+    parts: GrammarPart[];
+    valueSubstitution?: InlineValueSubstitution;
+    /**
+     * When set, the parent rule had no value expression of its own and
+     * this inlining synthesizes one — copying what the matcher would
+     * have computed via the single-part default-value rule (i.e. the
+     * captured child rule's value). Only valid when the parent had a
+     * single part and no `value`; in that situation no other inlining
+     * decision in the same parent can collide.
+     */
+    valueAssignment?: CompiledValueNode;
+};
+
+function inlineParts(
+    parts: GrammarPart[],
+    parentRule: GrammarRule,
+    counter: { inlined: number },
+    memo: RulesArrayMemo,
+): {
+    parts: GrammarPart[];
+    changed: boolean;
+    valueSubstitutions: InlineValueSubstitution[];
+    valueAssignment: CompiledValueNode | undefined;
+} {
+    let changed = false;
+    const out: GrammarPart[] = [];
+    const valueSubstitutions: InlineValueSubstitution[] = [];
+    let valueAssignment: CompiledValueNode | undefined;
+    for (const p of parts) {
+        if (p.type !== "rules") {
+            out.push(p);
+            continue;
+        }
+        // Recurse into nested rules first (post-order), preserving
+        // shared-array identity via memo.
+        const inlinedRules = inlineRulesArray(p.rules, counter, memo);
+        const rewritten: RulesPart =
+            inlinedRules !== p.rules ? { ...p, rules: inlinedRules } : p;
+
+        const replacement = tryInlineRulesPart(rewritten, parentRule);
+        if (replacement !== undefined) {
+            counter.inlined++;
+            changed = true;
+            for (const np of replacement.parts) {
+                out.push(np);
+            }
+            if (replacement.valueSubstitution !== undefined) {
+                valueSubstitutions.push(replacement.valueSubstitution);
+            }
+            if (replacement.valueAssignment !== undefined) {
+                // valueAssignment is only produced when the parent had
+                // exactly one part (this RulesPart) and no value of its
+                // own — so at most one assignment is possible per
+                // parent rule.
+                valueAssignment = replacement.valueAssignment;
+            }
+        } else {
+            if (rewritten !== p) {
+                changed = true;
+            }
+            out.push(rewritten);
+        }
+    }
+    return {
+        parts: changed ? out : parts,
+        changed,
+        valueSubstitutions,
+        valueAssignment,
+    };
+}
+
+/**
+ * Decide whether `part` can be replaced by the spread of its single child
+ * rule's parts. Returns the replacement parts (and an optional value
+ * substitution to apply to the parent rule's value expression) on
+ * success, or `undefined` if the part must stay nested.
+ */
+function tryInlineRulesPart(
+    part: RulesPart,
+    parentRule: GrammarRule,
+): TryInlineResult | undefined {
+    if (part.repeat || part.optional) {
+        return undefined;
+    }
+    if (part.rules.length !== 1) {
+        return undefined;
+    }
+    const child = part.rules[0];
+    if (child.parts.length === 0) {
+        return undefined;
+    }
+
+    // Spacing mode: the child rule's spacing mode governs the boundaries
+    // *between* its own parts. When inlined, those boundaries are
+    // governed by the parent's spacing mode. Require exact equality:
+    // `undefined` (auto) is a distinct mode at the matcher level, not
+    // a synonym for "inherit from parent" — inlining a child with
+    // `undefined` into a parent with `"required"` would change boundary
+    // behavior at e.g. digit↔Latin transitions where auto resolves to
+    // `optionalSpacePunctuation` but required is always
+    // `spacePunctuation`.
+    if (child.spacingMode !== parentRule.spacingMode) {
+        return undefined;
+    }
+
+    // The child rule may carry its own value expression. After
+    // inlining, child.parts move into the parent and the explicit
+    // child.value can no longer fire on its own. child.value is
+    // observable to the matcher in exactly two ways — handle each,
+    // otherwise the value is dead and can be dropped:
+    //
+    // (1) Substitute: parent captures via `part.variable` AND
+    //     parent.value references that variable. Substitute
+    //     child.value for the variable in parent.value.
+    //
+    // (2) Hoist: parent has no value of its own and exactly one
+    //     part (this RulesPart). The matcher's single-part
+    //     default-value rule would have promoted the captured
+    //     child.value into the parent's value at runtime.
+    //     Synthesize that assignment explicitly on the parent.
+    //
+    // (3) Drop: child.value is unobservable; inline child.parts
+    //     and forget the value.
+    //
+    // child.value's references to child's own part bindings remain
+    // in scope after inlining since those bindings move from
+    // child.parts → parent.parts. Only case (1) needs an additional
+    // collision check against the parent's *other* parts.
+    if (child.value !== undefined) {
+        // (1) Substitution.
+        if (part.variable !== undefined && parentRule.value !== undefined) {
+            const parentRefs = collectVariableReferences(parentRule.value);
+            if (parentRefs.has(part.variable)) {
+                // Refuse if child's top-level bindings would collide
+                // with bindings already in parent's other parts.
+                const childBindings = collectVariableNames(child.parts);
+                for (const otherPart of parentRule.parts) {
+                    if (otherPart === part) continue;
+                    const v = bindingName(otherPart);
+                    if (v !== undefined && childBindings.has(v)) {
+                        return undefined;
+                    }
+                }
+                return {
+                    parts: child.parts,
+                    valueSubstitution: {
+                        variable: part.variable,
+                        replacement: child.value,
+                    },
+                };
+            }
+            // Parent has its own value and doesn't reference the
+            // captured variable — fall through to drop.
+        }
+
+        // (2) Hoist onto a single-part parent without its own value.
+        //     No collision check needed: parent has no other parts.
+        if (parentRule.value === undefined && parentRule.parts.length === 1) {
+            return {
+                parts: child.parts,
+                valueAssignment: child.value,
+            };
+        }
+
+        // (3) Drop: child.value is unobservable at runtime.
+        return { parts: child.parts };
+    }
+
+    // If the parent expects to capture this RulesPart into a variable, the
+    // child rule must provide a single binding-friendly part to take the
+    // variable name; otherwise we'd silently drop the binding.
+    if (part.variable !== undefined) {
+        if (child.parts.length !== 1) {
+            return undefined;
+        }
+        const only = child.parts[0];
+        const bound = withPropagatedVariable(only, part.variable);
+        if (bound === undefined) {
+            return undefined;
+        }
+        // Guard against duplicate variable names being introduced into the
+        // parent's parts list.
+        if (findExistingVariable(parentRule.parts, part.variable, part)) {
+            return undefined;
+        }
+        return { parts: [bound] };
+    }
+
+    return { parts: child.parts };
+}
+
+/**
+ * Return a clone of `part` with `variable` set, or undefined if the part
+ * cannot safely carry a variable binding via inlining.
+ *
+ * We only propagate onto direct-capture parts (wildcard/number). Pushing
+ * a variable onto a nested RulesPart is unsafe in the general case: the
+ * inner rule may compute its value via an expression that references
+ * names not reachable from the new parent scope, or it may provide no
+ * structural value at all, causing the parent's binding to miss.
+ */
+function withPropagatedVariable(
+    part: GrammarPart,
+    variable: string,
+): GrammarPart | undefined {
+    switch (part.type) {
+        case "wildcard":
+        case "number":
+            return { ...part, variable };
+        case "rules":
+        case "string":
+        case "phraseSet":
+            return undefined;
+    }
+}
+
+function findExistingVariable(
+    parts: GrammarPart[],
+    name: string,
+    skip: GrammarPart,
+): boolean {
+    for (const p of parts) {
+        if (p === skip) continue;
+        if (
+            (p.type === "wildcard" ||
+                p.type === "number" ||
+                p.type === "rules") &&
+            p.variable === name
+        ) {
+            return true;
+        }
+    }
+    return false;
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Optimization #2: factor common prefixes across alternatives
+// ─────────────────────────────────────────────────────────────────────────────
+
+/**
+ * Walk all RulesParts and factor common leading parts shared by two or
+ * more alternatives within the same RulesPart. The top-level
+ * Grammar.rules array is not factored because each top-level alternative
+ * is reported separately by the matcher.
+ *
+ * Uses an identity memo over `GrammarRule[]` arrays so shared named
+ * rules (multiple `RulesPart`s pointing at the same array) still share
+ * after the pass — see `inlineSingleAlternativeRules` for rationale.
+ */
+export function factorCommonPrefixes(rules: GrammarRule[]): GrammarRule[] {
+    const counter = { factored: 0 };
+    const memo: RulesArrayMemo = new Map();
+    const result = factorRulesArray(rules, counter, memo);
+    if (counter.factored > 0) {
+        debug(`factored ${counter.factored} common prefix groups`);
+    }
+    return result;
+}
+
+function factorRulesArray(
+    rules: GrammarRule[],
+    counter: { factored: number },
+    memo: RulesArrayMemo,
+): GrammarRule[] {
+    const cached = memo.get(rules);
+    if (cached !== undefined) return cached;
+    memo.set(rules, rules);
+    const next = rules.map((r) => factorRule(r, counter, memo));
+    const changed = next.some((r, i) => r !== rules[i]);
+    const result = changed ? next : rules;
+    memo.set(rules, result);
+    return result;
+}
+
+function factorRule(
+    rule: GrammarRule,
+    counter: { factored: number },
+    memo: RulesArrayMemo,
+): GrammarRule {
+    const { parts, changed } = factorParts(rule.parts, counter, memo);
+    if (!changed) return rule;
+    return { ...rule, parts };
+}
+
+function factorParts(
+    parts: GrammarPart[],
+    counter: { factored: number },
+    memo: RulesArrayMemo,
+): { parts: GrammarPart[]; changed: boolean } {
+    let changed = false;
+    const out: GrammarPart[] = [];
+    for (const p of parts) {
+        if (p.type !== "rules") {
+            out.push(p);
+            continue;
+        }
+        // Recurse into nested rules first, preserving shared-array
+        // identity via memo.
+        const recursedRules = factorRulesArray(p.rules, counter, memo);
+        let working: RulesPart =
+            recursedRules !== p.rules ? { ...p, rules: recursedRules } : p;
+
+        // Factor with bounded iteration to fixed point. Newly produced
+        // suffix `RulesPart`s are not shared by construction, so they
+        // don't need memo entries.
+        for (let i = 0; i < 8; i++) {
+            const next = factorRulesPart(working, counter);
+            if (next === working) break;
+            working = next;
+        }
+        if (working !== p) changed = true;
+        out.push(working);
+    }
+    return { parts: changed ? out : parts, changed };
+}
+
+/**
+ * One pass of common-prefix factoring inside a single RulesPart.
+ * Returns the same object if nothing changed.
+ */
+function factorRulesPart(
+    part: RulesPart,
+    counter: { factored: number },
+): RulesPart {
+    if (part.repeat || part.optional) {
+        // Repeat/optional change the matcher's loop-back semantics; leave
+        // such groups untouched to stay safe.
+        return part;
+    }
+    const rules = part.rules;
+    if (rules.length < 2) return part;
+
+    // Group alternatives that share at least one leading part (or at
+    // least one leading string token) with the group's lead alternative.
+    // Preserve original ordering.
+    const groups: { members: number[] }[] = [];
+    const consumed = new Set<number>();
+    for (let i = 0; i < rules.length; i++) {
+        if (consumed.has(i)) continue;
+        const group: { members: number[] } = { members: [i] };
+        consumed.add(i);
+        for (let j = i + 1; j < rules.length; j++) {
+            if (consumed.has(j)) continue;
+            const sp = sharedPrefixShape(rules[i], rules[j]);
+            if (sp.fullParts > 0 || sp.stringTokens > 0) {
+                group.members.push(j);
+                consumed.add(j);
+            }
+        }
+        groups.push(group);
+    }
+
+    if (groups.every((g) => g.members.length < 2)) return part;
+
+    const newRules: GrammarRule[] = [];
+    let didFactor = false;
+    for (const g of groups) {
+        if (g.members.length < 2) {
+            newRules.push(rules[g.members[0]]);
+            continue;
+        }
+        const members = g.members.map((i) => rules[i]);
+        // Intersect prefix shapes across all members (using member[0] as
+        // canonical reference).
+        let shape: PrefixShape = {
+            fullParts: members[0].parts.length,
+            stringTokens: 0,
+        };
+        for (let mi = 1; mi < members.length; mi++) {
+            const s = sharedPrefixShape(members[0], members[mi]);
+            if (s.fullParts < shape.fullParts) {
+                shape = {
+                    fullParts: s.fullParts,
+                    stringTokens: s.stringTokens,
+                };
+            } else if (
+                s.fullParts === shape.fullParts &&
+                s.stringTokens < shape.stringTokens
+            ) {
+                shape = {
+                    fullParts: s.fullParts,
+                    stringTokens: s.stringTokens,
+                };
+            }
+        }
+        if (shape.fullParts === 0 && shape.stringTokens === 0) {
+            for (const m of members) newRules.push(m);
+            continue;
+        }
+
+        // Refuse to factor if any alternative would be wholly consumed by
+        // the shared prefix AND has a value expression — the suffix
+        // alternative would become empty-parts.
+        const wholeConsumed = (m: GrammarRule): boolean => {
+            if (
+                m.parts.length !==
+                shape.fullParts + (shape.stringTokens > 0 ? 1 : 0)
+            ) {
+                return false;
+            }
+            if (shape.stringTokens === 0) {
+                return m.parts.length === shape.fullParts;
+            }
+            const last = m.parts[shape.fullParts];
+            return (
+                last.type === "string" &&
+                last.value.length === shape.stringTokens
+            );
+        };
+        if (members.some((m) => wholeConsumed(m) && m.value !== undefined)) {
+            for (const m of members) newRules.push(m);
+            continue;
+        }
+
+        // Build canonical prefix parts.
+        const canonicalParts: GrammarPart[] = members[0].parts.slice(
+            0,
+            shape.fullParts,
+        );
+        if (shape.stringTokens > 0) {
+            const lead = members[0].parts[shape.fullParts];
+            if (lead.type !== "string") {
+                // Shouldn't happen (shape guarantees string), bail safely.
+                for (const m of members) newRules.push(m);
+                continue;
+            }
+            canonicalParts.push({
+                type: "string",
+                value: lead.value.slice(0, shape.stringTokens),
+            });
+        }
+        const canonicalNames = collectVariableNames(canonicalParts);
+
+        // Build per-member variable remap from member-local prefix names
+        // to canonical names taken from the lead alternative. Only the
+        // full-parts range carries variables (partial string tokens have
+        // no variable bindings).
+        const memberRemaps: Map<string, string>[] = members.map((m) =>
+            buildPrefixRemap(canonicalParts, m.parts, shape.fullParts),
+        );
+
+        // Compute per-member suffix parts, splitting the partial
+        // StringPart if needed.
+        const memberSuffixParts: GrammarPart[][] = members.map((m) => {
+            if (shape.stringTokens === 0) {
+                return m.parts.slice(shape.fullParts);
+            }
+            const lead = m.parts[shape.fullParts];
+            if (lead.type !== "string") {
+                return m.parts.slice(shape.fullParts); // defensive
+            }
+            const remaining = lead.value.slice(shape.stringTokens);
+            const rest = m.parts.slice(shape.fullParts + 1);
+            if (remaining.length === 0) {
+                return rest;
+            }
+            return [
+                { type: "string", value: remaining } as GrammarPart,
+                ...rest,
+            ];
+        });
+
+        // Verify suffix bindings won't shadow shared canonical names.
+        let collision = false;
+        for (let mi = 0; mi < members.length && !collision; mi++) {
+            const suffixVars = collectVariableNames(memberSuffixParts[mi]);
+            const remap = memberRemaps[mi];
+            for (const v of suffixVars) {
+                const renamed = remap.get(v) ?? v;
+                if (canonicalNames.has(renamed)) {
+                    collision = true;
+                    break;
+                }
+            }
+        }
+        if (collision) {
+            for (const m of members) newRules.push(m);
+            continue;
+        }
+
+        // Refuse to factor when any member's value expression references
+        // a variable bound in the shared prefix. The matcher scopes
+        // value variables per nested rule, so the suffix's value cannot
+        // see canonical-prefix bindings — factoring would break match
+        // results.
+        let crossScopeRef = false;
+        for (let mi = 0; mi < members.length && !crossScopeRef; mi++) {
+            const m = members[mi];
+            if (m.value === undefined) continue;
+            const remap = memberRemaps[mi];
+            const referenced = collectVariableReferences(m.value);
+            for (const v of referenced) {
+                const renamed = remap.get(v) ?? v;
+                if (canonicalNames.has(renamed)) {
+                    crossScopeRef = true;
+                    break;
+                }
+            }
+        }
+        if (crossScopeRef) {
+            for (const m of members) newRules.push(m);
+            continue;
+        }
+
+        // Refuse to factor when value-presence pattern is mixed across
+        // members. Mixing explicit-value and implicit-default alternatives
+        // inside a new wrapper rule changes the matcher's default-value
+        // semantics for the implicit cases.
+        const valuePresence = members.map((m) => m.value !== undefined);
+        const allHaveValue = valuePresence.every((v) => v);
+        const noneHaveValue = valuePresence.every((v) => !v);
+        if (!allHaveValue && !noneHaveValue) {
+            for (const m of members) newRules.push(m);
+            continue;
+        }
+
+        // Refuse to factor when (no member has explicit value) and any
+        // suffix would end up with a multi-part shape: the matcher's
+        // single-part default-value rule no longer applies, silently
+        // turning a valid default into `undefined`.
+        if (noneHaveValue) {
+            const anySuffixMultipart = members.some((m) => {
+                const suffixLen =
+                    m.parts.length -
+                    shape.fullParts -
+                    (shape.stringTokens > 0 &&
+                    m.parts[shape.fullParts]?.type === "string" &&
+                    (m.parts[shape.fullParts] as any).value.length ===
+                        shape.stringTokens
+                        ? 1
+                        : 0);
+                return suffixLen > 1;
+            });
+            if (anySuffixMultipart) {
+                for (const m of members) newRules.push(m);
+                continue;
+            }
+        }
+
+        const suffixRules: GrammarRule[] = members.map((m, mi) => {
+            const remap = memberRemaps[mi];
+            const suffixParts = memberSuffixParts[mi].map((p) =>
+                remapPartVariables(p, remap),
+            );
+            const suffixValue =
+                m.value !== undefined
+                    ? remapValueVariables(m.value, remap)
+                    : undefined;
+            const out: GrammarRule = { parts: suffixParts };
+            if (suffixValue !== undefined) out.value = suffixValue;
+            if (m.spacingMode !== undefined) out.spacingMode = m.spacingMode;
+            return out;
+        });
+
+        // If any suffix carries a value expression, the factored wrapper
+        // rule must capture it — otherwise the matcher's value-tracking
+        // policy would drop the nested value (parent has > 1 part with no
+        // explicit value). Generate a fresh variable name that does not
+        // collide with the shared prefix or any suffix.
+        const anySuffixHasValue = suffixRules.some(
+            (r) => r.value !== undefined,
+        );
+        const suffixRulesPart: RulesPart = {
+            type: "rules",
+            rules: suffixRules,
+        };
+        const factoredAlt: GrammarRule = {
+            parts: [...canonicalParts, suffixRulesPart],
+        };
+        if (anySuffixHasValue) {
+            const reserved = new Set(canonicalNames);
+            for (const r of suffixRules) {
+                for (const v of collectVariableNames(r.parts)) reserved.add(v);
+            }
+            let gen = "__opt_factor";
+            let i = 0;
+            while (reserved.has(gen)) {
+                i++;
+                gen = `__opt_factor_${i}`;
+            }
+            suffixRulesPart.variable = gen;
+            factoredAlt.value = { type: "variable", name: gen };
+        }
+        const firstSpacing = members[0].spacingMode;
+        if (
+            members.every((m) => m.spacingMode === firstSpacing) &&
+            firstSpacing !== undefined
+        ) {
+            factoredAlt.spacingMode = firstSpacing;
+        }
+
+        newRules.push(factoredAlt);
+        didFactor = true;
+        counter.factored++;
+    }
+
+    if (!didFactor) return part;
+    return { ...part, rules: newRules };
+}
+
+// Compare two parts for "structurally equal modulo variable name".
+function partsEqualForFactoring(a: GrammarPart, b: GrammarPart): boolean { + if (a.type !== b.type) return false; + switch (a.type) { + case "string": { + const bs = b as typeof a; + if (a.value.length !== bs.value.length) return false; + for (let i = 0; i < a.value.length; i++) { + if (a.value[i] !== bs.value[i]) return false; + } + return true; + } + case "phraseSet": + return a.matcherName === (b as typeof a).matcherName; + case "wildcard": { + const bw = b as typeof a; + return ( + a.typeName === bw.typeName && + (a.optional ?? false) === (bw.optional ?? false) + ); + } + case "number": { + const bn = b as typeof a; + return (a.optional ?? false) === (bn.optional ?? false); + } + case "rules": { + const br = b as typeof a; + return ( + a.rules === br.rules && + (a.optional ?? false) === (br.optional ?? false) && + (a.repeat ?? false) === (br.repeat ?? false) + ); + } + } +} + +function sharedPrefixLength(a: GrammarRule, b: GrammarRule): number { + const max = Math.min(a.parts.length, b.parts.length); + let i = 0; + while (i < max && partsEqualForFactoring(a.parts[i], b.parts[i])) i++; + return i; +} + +type PrefixShape = { + // Number of leading parts where both rules match via + // partsEqualForFactoring. + fullParts: number; + // If the next part on both sides is a StringPart with a non-empty + // common leading token sequence, this records its length. 
+    stringTokens: number;
+};
+
+function sharedPrefixShape(a: GrammarRule, b: GrammarRule): PrefixShape {
+    const full = sharedPrefixLength(a, b);
+    let stringTokens = 0;
+    if (full < a.parts.length && full < b.parts.length) {
+        const pa = a.parts[full];
+        const pb = b.parts[full];
+        if (pa.type === "string" && pb.type === "string") {
+            const max = Math.min(pa.value.length, pb.value.length);
+            while (
+                stringTokens < max &&
+                pa.value[stringTokens] === pb.value[stringTokens]
+            ) {
+                stringTokens++;
+            }
+        }
+    }
+    return { fullParts: full, stringTokens };
+}
+
+function collectVariableNames(parts: GrammarPart[]): Set<string> {
+    const out = new Set<string>();
+    for (const p of parts) {
+        if (
+            (p.type === "wildcard" ||
+                p.type === "number" ||
+                p.type === "rules") &&
+            p.variable !== undefined
+        ) {
+            out.add(p.variable);
+        }
+    }
+    return out;
+}
+
+function bindingName(p: GrammarPart): string | undefined {
+    if (p.type === "wildcard" || p.type === "number" || p.type === "rules") {
+        return p.variable;
+    }
+    return undefined;
+}
+
+function collectVariableReferences(node: CompiledValueNode): Set<string> {
+    const out = new Set<string>();
+    const walk = (n: CompiledValueNode) => {
+        switch (n.type) {
+            case "literal":
+                return;
+            case "variable":
+                out.add(n.name);
+                return;
+            case "object":
+                for (const el of n.value) {
+                    if (el.type === "spread") {
+                        walk(el.argument);
+                    } else if (el.value === null) {
+                        // Shorthand { foo } = { foo: foo }
+                        out.add(el.key);
+                    } else {
+                        walk(el.value);
+                    }
+                }
+                return;
+            case "array":
+                for (const v of n.value) walk(v);
+                return;
+            case "binaryExpression":
+                walk(n.left);
+                walk(n.right);
+                return;
+            case "unaryExpression":
+                walk(n.operand);
+                return;
+            case "conditionalExpression":
+                walk(n.test);
+                walk(n.consequent);
+                walk(n.alternate);
+                return;
+            case "memberExpression":
+                walk(n.object);
+                if (typeof n.property !== "string") walk(n.property);
+                return;
+            case "callExpression":
+                walk(n.callee);
+                for (const a of n.arguments) walk(a);
+                return;
+            case
"spreadElement":
+                walk(n.argument);
+                return;
+            case "templateLiteral":
+                for (const e of n.expressions) walk(e);
+                return;
+        }
+    };
+    walk(node);
+    return out;
+}
+
+function buildPrefixRemap(
+    canonicalParts: GrammarPart[],
+    memberParts: GrammarPart[],
+    sharedLen: number,
+): Map<string, string> {
+    const remap = new Map<string, string>();
+    for (let i = 0; i < sharedLen; i++) {
+        const cv = bindingName(canonicalParts[i]);
+        const mv = bindingName(memberParts[i]);
+        if (cv !== undefined && mv !== undefined && cv !== mv) {
+            remap.set(mv, cv);
+        }
+    }
+    return remap;
+}
+
+function remapPartVariables(
+    part: GrammarPart,
+    remap: Map<string, string>,
+): GrammarPart {
+    if (remap.size === 0) return part;
+    switch (part.type) {
+        case "wildcard":
+        case "number":
+            if (part.variable && remap.has(part.variable)) {
+                return { ...part, variable: remap.get(part.variable)! };
+            }
+            return part;
+        case "rules":
+            // Rename this part's own variable; do NOT recurse into nested
+            // rules — those have their own scope.
+            if (part.variable && remap.has(part.variable)) {
+                return { ...part, variable: remap.get(part.variable)! };
+            }
+            return part;
+        case "string":
+        case "phraseSet":
+            return part;
+    }
+}
+
+function remapValueVariables(
+    node: CompiledValueNode,
+    remap: Map<string, string>,
+): CompiledValueNode {
+    if (remap.size === 0) return node;
+    switch (node.type) {
+        case "literal":
+            return node;
+        case "variable":
+            if (remap.has(node.name)) {
+                return { ...node, name: remap.get(node.name)! };
+            }
+            return node;
+        case "object": {
+            const value: CompiledObjectElement[] = node.value.map((el) => {
+                if (el.type === "spread") {
+                    return {
+                        ...el,
+                        argument: remapValueVariables(el.argument, remap),
+                    };
+                }
+                if (el.value === null) {
+                    // Shorthand { foo } = { foo: foo }. If the key is
+                    // being remapped, expand to a full property so the
+                    // key (object field name) stays the same while the
+                    // value references the new variable name.
+ if (remap.has(el.key)) { + return { + ...el, + value: { + type: "variable" as const, + name: remap.get(el.key)!, + }, + }; + } + return el; + } + return { + ...el, + value: remapValueVariables(el.value, remap), + }; + }); + return { ...node, value }; + } + case "array": + return { + ...node, + value: node.value.map((v) => remapValueVariables(v, remap)), + }; + case "binaryExpression": + return { + ...node, + left: remapValueVariables(node.left, remap), + right: remapValueVariables(node.right, remap), + }; + case "unaryExpression": + return { + ...node, + operand: remapValueVariables(node.operand, remap), + }; + case "conditionalExpression": + return { + ...node, + test: remapValueVariables(node.test, remap), + consequent: remapValueVariables(node.consequent, remap), + alternate: remapValueVariables(node.alternate, remap), + }; + case "memberExpression": + return { + ...node, + object: remapValueVariables(node.object, remap), + property: + typeof node.property === "string" + ? node.property + : remapValueVariables(node.property, remap), + }; + case "callExpression": + return { + ...node, + callee: remapValueVariables(node.callee, remap), + arguments: node.arguments.map((a) => + remapValueVariables(a, remap), + ), + }; + case "spreadElement": + return { + ...node, + argument: remapValueVariables(node.argument, remap), + }; + case "templateLiteral": + return { + ...node, + expressions: node.expressions.map((e) => + remapValueVariables(e, remap), + ), + }; + } +} + +/** + * Replace every reference to the variable `name` in `node` with a deep + * copy of `replacement`. Used by the inliner when a child rule with an + * explicit value expression is folded into its parent: the parent's + * value expression's reference to the captured variable is substituted + * with the child's own value expression. 
+ */ +function substituteValueVariable( + node: CompiledValueNode, + name: string, + replacement: CompiledValueNode, +): CompiledValueNode { + switch (node.type) { + case "literal": + return node; + case "variable": + return node.name === name ? replacement : node; + case "object": { + const value: CompiledObjectElement[] = node.value.map((el) => { + if (el.type === "spread") { + return { + ...el, + argument: substituteValueVariable( + el.argument, + name, + replacement, + ), + }; + } + if (el.value === null) { + // Shorthand { foo } = { foo: foo }. If the key is + // the variable being substituted, expand to the + // full property form { foo: }. + if (el.key === name) { + return { ...el, value: replacement }; + } + return el; + } + return { + ...el, + value: substituteValueVariable(el.value, name, replacement), + }; + }); + return { ...node, value }; + } + case "array": + return { + ...node, + value: node.value.map((v) => + substituteValueVariable(v, name, replacement), + ), + }; + case "binaryExpression": + return { + ...node, + left: substituteValueVariable(node.left, name, replacement), + right: substituteValueVariable(node.right, name, replacement), + }; + case "unaryExpression": + return { + ...node, + operand: substituteValueVariable( + node.operand, + name, + replacement, + ), + }; + case "conditionalExpression": + return { + ...node, + test: substituteValueVariable(node.test, name, replacement), + consequent: substituteValueVariable( + node.consequent, + name, + replacement, + ), + alternate: substituteValueVariable( + node.alternate, + name, + replacement, + ), + }; + case "memberExpression": + return { + ...node, + object: substituteValueVariable(node.object, name, replacement), + property: + typeof node.property === "string" + ? 
node.property + : substituteValueVariable( + node.property, + name, + replacement, + ), + }; + case "callExpression": + return { + ...node, + callee: substituteValueVariable(node.callee, name, replacement), + arguments: node.arguments.map((a) => + substituteValueVariable(a, name, replacement), + ), + }; + case "spreadElement": + return { + ...node, + argument: substituteValueVariable( + node.argument, + name, + replacement, + ), + }; + case "templateLiteral": + return { + ...node, + expressions: node.expressions.map((e) => + substituteValueVariable(e, name, replacement), + ), + }; + } +} diff --git a/ts/packages/actionGrammar/src/index.ts b/ts/packages/actionGrammar/src/index.ts index c0e6993cc..056e38ab0 100644 --- a/ts/packages/actionGrammar/src/index.ts +++ b/ts/packages/actionGrammar/src/index.ts @@ -10,6 +10,7 @@ export { grammarFromJson } from "./grammarDeserializer.js"; export { grammarToJson } from "./grammarSerializer.js"; export { loadGrammarRules, loadGrammarRulesNoThrow } from "./grammarLoader.js"; export type { LoadGrammarRulesOptions } from "./grammarLoader.js"; +export type { GrammarOptimizationOptions } from "./grammarOptimizer.js"; export type { SchemaLoader } from "./grammarCompiler.js"; // Parser (for tooling — formatter, linters, etc.) diff --git a/ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts new file mode 100644 index 000000000..1c04c3033 --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts @@ -0,0 +1,190 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Optimizer benchmark — informational only. + * + * Measures matcher-time impact of each grammar optimization pass on real + * grammars (player, list, calendar, browser, ...). Each configuration is + * compared against the unoptimized baseline. + * + * All assertions are informational. 
This spec runs as part of the normal + * test suite but produces no hard failures — only console output. + * + * To skip (e.g. on slow machines), set TYPEAGENT_SKIP_BENCHMARKS=1. + */ + +import * as path from "path"; +import * as fs from "fs"; +import { fileURLToPath } from "url"; +import { + loadGrammarRulesNoThrow, + LoadGrammarRulesOptions, +} from "../src/grammarLoader.js"; +import { matchGrammar } from "../src/grammarMatcher.js"; +import { registerBuiltInEntities } from "../src/builtInEntities.js"; + +const __filename = fileURLToPath(import.meta.url); +const __dirname = path.dirname(__filename); + +const ITERATIONS = 500; +const SHOULD_SKIP = process.env.TYPEAGENT_SKIP_BENCHMARKS === "1"; + +function fileExists(p: string): boolean { + try { + fs.accessSync(p, fs.constants.R_OK); + return true; + } catch { + return false; + } +} + +function timeMs(fn: () => void, iterations: number): number { + const start = performance.now(); + for (let i = 0; i < iterations; i++) fn(); + return performance.now() - start; +} + +function countRulesParts( + grammar: ReturnType, +): number { + if (!grammar) return 0; + let count = 0; + const visit = (parts: any[]) => { + for (const p of parts) { + if (p.type === "rules") { + count++; + for (const r of p.rules) visit(r.parts); + } + } + }; + for (const r of grammar.rules) visit(r.parts); + return count; +} + +function benchmark( + label: string, + grammarPath: string, + requests: string[], +): void { + if (!fileExists(grammarPath)) { + console.log(`[skip] ${label}: grammar not found at ${grammarPath}`); + return; + } + registerBuiltInEntities(); + const content = fs.readFileSync(grammarPath, "utf-8"); + const configs: { name: string; opts: LoadGrammarRulesOptions }[] = [ + { name: "baseline", opts: {} }, + { + name: "inline", + opts: { optimizations: { inlineSingleAlternatives: true } }, + }, + { + name: "factor", + opts: { optimizations: { factorCommonPrefixes: true } }, + }, + { + name: "both", + opts: { + optimizations: { + 
inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + }, + }, + ]; + + console.log(`\n=== ${label} ===`); + console.log( + `| config | RulesParts | match ms (${ITERATIONS}x) | speedup |`, + ); + console.log(`|-----------|------------|----------------|---------|`); + + let baselineMs = 0; + for (const cfg of configs) { + const errors: string[] = []; + const g = loadGrammarRulesNoThrow( + path.basename(grammarPath), + content, + errors, + undefined, + cfg.opts, + ); + if (!g) { + console.log(`[error] ${cfg.name}: ${errors.join("; ")}`); + continue; + } + const partCount = countRulesParts(g); + // Warm-up — also validates that the optimized grammar can run. + try { + for (const r of requests) matchGrammar(g, r); + } catch (e) { + console.log( + `[error] ${cfg.name} match failed: ${(e as Error).message}`, + ); + continue; + } + // Timed. + const ms = timeMs(() => { + for (const r of requests) matchGrammar(g, r); + }, ITERATIONS); + if (cfg.name === "baseline") baselineMs = ms; + const speedup = baselineMs > 0 ? baselineMs / ms : 1; + console.log( + `| ${cfg.name.padEnd(9)} | ${String(partCount).padStart(10)} | ${ms.toFixed(1).padStart(14)} | ${speedup.toFixed(2)}x`, + ); + } +} + +describe("Grammar Optimizer Benchmark", () => { + (SHOULD_SKIP ? it.skip : it)("player", () => { + benchmark( + "player", + path.resolve( + __dirname, + "../../../agents/player/src/agent/playerSchema.agr", + ), + [ + "pause", + "resume", + "play Shake It Off by Taylor Swift", + "select kitchen", + "set volume to 50", + "play the first track", + "skip to the next track", + "play some music", + ], + ); + expect(true).toBe(true); + }); + + (SHOULD_SKIP ? it.skip : it)("list", () => { + benchmark( + "list", + path.resolve(__dirname, "../../../agents/list/src/listSchema.agr"), + [ + "add apples to grocery list", + "remove milk from grocery list", + "create list shopping", + "clear grocery list", + ], + ); + expect(true).toBe(true); + }); + + (SHOULD_SKIP ? 
it.skip : it)("calendar", () => { + benchmark( + "calendar", + path.resolve( + __dirname, + "../../../agents/calendar/src/calendarSchema.agr", + ), + [ + "schedule meeting tomorrow at 3pm", + "cancel my 2pm meeting", + "show my calendar", + ], + ); + expect(true).toBe(true); + }); +}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerEquivalence.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerEquivalence.spec.ts new file mode 100644 index 000000000..feb24829e --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerEquivalence.spec.ts @@ -0,0 +1,99 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +import { loadGrammarRules } from "../src/grammarLoader.js"; +import { matchGrammar } from "../src/grammarMatcher.js"; + +const grammars: { name: string; text: string; inputs: string[] }[] = [ + { + name: "player-like", + text: ` = ; + = play $(track:string) -> { actionName: "play", track } + | pause -> { actionName: "pause" } + | stop -> { actionName: "stop" };`, + inputs: ["play hello", "pause", "stop", "unknown"], + }, + { + name: "shared-prefix-three-way", + text: ` = ; + = play the song -> "song" + | play the track -> "track" + | play the album -> "album";`, + inputs: [ + "play the song", + "play the track", + "play the album", + "play the", + ], + }, + { + name: "wrapper-rule", + text: ` = ; + = ; + = hello world;`, + inputs: ["hello world", "hello", "world"], + }, + { + name: "variable-rename-across-alternatives", + text: ` = ; + = play $(a:string) once -> { kind: "once", a } + | play $(b:string) twice -> { kind: "twice", v: b };`, + inputs: ["play hello once", "play hello twice", "play hello"], + }, + { + name: "value-on-wrapper", + text: ` = ; + = hello -> { greeting: true };`, + inputs: ["hello", "bye"], + }, +]; + +const flagCombos: { + name: string; + opts: { + inlineSingleAlternatives?: boolean; + factorCommonPrefixes?: boolean; + }; +}[] = [ + { name: "inline-only", opts: { 
inlineSingleAlternatives: true } },
+    { name: "factor-only", opts: { factorCommonPrefixes: true } },
+    {
+        name: "both",
+        opts: {
+            inlineSingleAlternatives: true,
+            factorCommonPrefixes: true,
+        },
+    },
+];
+
+function matchAll(
+    grammar: ReturnType<typeof loadGrammarRules>,
+    request: string,
+) {
+    const out = matchGrammar(grammar, request).map((m) => ({
+        match: m.match,
+    }));
+    // Sort for stable comparison — match order across optimizer combos
+    // may differ but the multi-set of results must agree.
+    return out.map((x) => JSON.stringify(x.match)).sort();
+}
+
+describe("Grammar Optimizer - Match equivalence", () => {
+    for (const g of grammars) {
+        describe(`grammar: ${g.name}`, () => {
+            const baseline = loadGrammarRules("t.grammar", g.text);
+            for (const combo of flagCombos) {
+                const optimized = loadGrammarRules("t.grammar", g.text, {
+                    optimizations: combo.opts,
+                });
+                for (const input of g.inputs) {
+                    it(`[${combo.name}] '${input}'`, () => {
+                        expect(matchAll(optimized, input)).toStrictEqual(
+                            matchAll(baseline, input),
+                        );
+                    });
+                }
+            }
+        });
+    }
+});
diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts
new file mode 100644
index 000000000..517a2c019
--- /dev/null
+++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts
@@ -0,0 +1,120 @@
+// Copyright (c) Microsoft Corporation.
+// Licensed under the MIT License.
+
+import { loadGrammarRules } from "../src/grammarLoader.js";
+import { matchGrammar } from "../src/grammarMatcher.js";
+import { GrammarPart, GrammarRule, RulesPart } from "../src/grammarTypes.js";
+
+function findFirstRulesPart(rules: GrammarRule[]): RulesPart | undefined {
+    const visit = (parts: GrammarPart[]): RulesPart | undefined => {
+        for (const p of parts) {
+            if (p.type === "rules") return p;
+        }
+        for (const p of parts) {
+            if (p.type === "rules") {
+                for (const r of p.rules) {
+                    const inner = visit(r.parts);
+                    if (inner) return inner;
+                }
+            }
+        }
+        return undefined;
+    };
+    for (const r of rules) {
+        const found = visit(r.parts);
+        if (found) return found;
+    }
+    return undefined;
+}
+
+function match(grammar: ReturnType<typeof loadGrammarRules>, request: string) {
+    return matchGrammar(grammar, request).map((m) => m.match);
+}
+
+describe("Grammar Optimizer - Common prefix factoring", () => {
+    it("factors a literal common prefix across alternatives", () => {
+        // Three alternatives all share "play the ".
+        const text = `<start> = <action>;
+            <action> = play the song -> "song" | play the track -> "track" | play the album -> "album";`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { factorCommonPrefixes: true },
+        });
+
+        // Match results unchanged.
+        for (const input of [
+            "play the song",
+            "play the track",
+            "play the album",
+        ]) {
+            expect(match(optimized, input)).toStrictEqual(
+                match(baseline, input),
+            );
+        }
+
+        // The optimized AST has fewer top-level alternatives in <action>.
+ const optChoice = findFirstRulesPart(optimized.rules); + const baseChoice = findFirstRulesPart(baseline.rules); + expect(optChoice).toBeDefined(); + expect(baseChoice).toBeDefined(); + expect(optChoice!.rules.length).toBeLessThan(baseChoice!.rules.length); + }); + + it("preserves match results when alternatives use different variable names", () => { + const text = ` = ; + = play $(a:string) -> { kind: "a", v: a } + | play $(b:string) -> { kind: "b", v: b };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + const baseRes = match(baseline, "play hello"); + const optRes = match(optimized, "play hello"); + // Should produce the same set of results (order may differ). + expect(optRes.length).toBe(baseRes.length); + expect(optRes).toEqual(expect.arrayContaining(baseRes)); + expect(baseRes).toEqual(expect.arrayContaining(optRes)); + }); + + it("no-op when there is only one alternative", () => { + const text = ` = play the song -> "song";`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + expect(JSON.stringify(optimized.rules)).toBe( + JSON.stringify(baseline.rules), + ); + }); + + it("no-op when alternatives share no leading parts", () => { + const text = ` = ; + = foo -> 1 | bar -> 2 | baz -> 3;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + // No shared first part, factoring has nothing to do. 
+ const optChoice = findFirstRulesPart(optimized.rules); + const baseChoice = findFirstRulesPart(baseline.rules); + expect(optChoice!.rules.length).toBe(baseChoice!.rules.length); + for (const input of ["foo", "bar", "baz"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("does not touch repeat groups", () => { + const text = ` = (a x | a y)+ -> true;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + // Repeat groups aren't factored — top-level RulesPart with + // repeat=true is left as-is. Match results must still agree. + expect(match(optimized, "a x a y")).toStrictEqual( + match(baseline, "a x a y"), + ); + }); +}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts new file mode 100644 index 000000000..02e32fe04 --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts @@ -0,0 +1,113 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Targeted reproduction tests for factoring edge cases that previously + * broke the player grammar. Keep these as regression tests. 
+ */
+
+import { loadGrammarRules } from "../src/grammarLoader.js";
+import { matchGrammar } from "../src/grammarMatcher.js";
+
+function match(grammar: ReturnType<typeof loadGrammarRules>, s: string) {
+    return matchGrammar(grammar, s).map((m) => m.match);
+}
+
+describe("Grammar Optimizer - Factoring Repro", () => {
+    it("handles alternatives that re-use the same variable name", () => {
+        const text = `<start> = <action>;
+            <action> = play $(trackName:string) -> { kind: "solo", trackName }
+                | play $(trackName:string) by $(artist:string) -> { kind: "duet", trackName, artist };`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { factorCommonPrefixes: true },
+        });
+        for (const input of [
+            "play Hello",
+            "play Shake It Off by Taylor Swift",
+        ]) {
+            expect(match(optimized, input)).toStrictEqual(
+                match(baseline, input),
+            );
+        }
+    });
+
+    it("handles a group that is fully consumed by the shared prefix", () => {
+        const text = `<start> = <action>;
+            <action> = play -> "just"
+                | play the song -> "song"
+                | play the track -> "track";`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { factorCommonPrefixes: true },
+        });
+        for (const input of ["play", "play the song", "play the track"]) {
+            expect(match(optimized, input)).toStrictEqual(
+                match(baseline, input),
+            );
+        }
+    });
+
+    it("handles mixed explicit / default value alternatives", () => {
+        const text = `<start> = <action>;
+            <action> = play the song
+                | play the track -> "custom";`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { factorCommonPrefixes: true },
+        });
+        for (const input of ["play the song", "play the track"]) {
+            expect(match(optimized, input)).toStrictEqual(
+                match(baseline, input),
+            );
+        }
+    });
+
+    it("handles shared literal prefix with distinct wrapped RulesParts (player-like)", () => {
+        const text = `<start> = <action>;
+            <trackTitle> = $(trackName:string) -> trackName
+                | the $(trackName:string) -> trackName;
+            <action> = play $(trackName:<trackTitle>) by $(artist:string) -> { kind: "byArtist", trackName, artist }
+                | play $(trackName:<trackTitle>) from album $(albumName:string) -> { kind: "fromAlbum", trackName, albumName };`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { factorCommonPrefixes: true },
+        });
+        for (const input of [
+            "play hello by taylor",
+            "play the hello by taylor",
+            "play hello from album unity",
+        ]) {
+            expect(match(optimized, input)).toStrictEqual(
+                match(baseline, input),
+            );
+        }
+    });
+
+    // Regression for the failure surfaced by the optimizer benchmark
+    // against the player grammar:
+    //
+    //     "Internal error: No value for variable 'trackName'.
+    //      Values: {"name":"artist","valueId":4}"
+    //
+    // Object shorthand `{ trackName }` compiles to a property element
+    // with `value: null` (key = "trackName", expanded at evaluation
+    // time to `trackName: trackName`). Variable-renaming during
+    // factoring must (a) detect that the key is a variable reference
+    // and (b) rewrite it without changing the object field name.
+    it("rewrites object shorthand keys when remapping variables", () => {
+        const text = `<start> = <action>;
+            <action> = greet $(name:string) -> { name }
+                | greet $(other:string) twice -> { other };`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { factorCommonPrefixes: true },
+        });
+        for (const input of ["greet alice", "greet bob twice"]) {
+            // No "Internal error" thrown, and matches identical.
+            expect(match(optimized, input)).toStrictEqual(
+                match(baseline, input),
+            );
+        }
+    });
+});
diff --git a/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts
new file mode 100644
index 000000000..1830956e9
--- /dev/null
+++ b/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts
@@ -0,0 +1,265 @@
+// Copyright (c) Microsoft Corporation.
+// Licensed under the MIT License.
+
+import { loadGrammarRules } from "../src/grammarLoader.js";
+import { matchGrammar } from "../src/grammarMatcher.js";
+import { GrammarPart, GrammarRule } from "../src/grammarTypes.js";
+
+function countRulesParts(rules: GrammarRule[]): number {
+    let n = 0;
+    const visit = (parts: GrammarPart[]) => {
+        for (const p of parts) {
+            if (p.type === "rules") {
+                n++;
+                for (const r of p.rules) visit(r.parts);
+            }
+        }
+    };
+    for (const r of rules) visit(r.parts);
+    return n;
+}
+
+function match(grammar: ReturnType<typeof loadGrammarRules>, request: string) {
+    return matchGrammar(grammar, request).map((m) => m.match);
+}
+
+describe("Grammar Optimizer - Inline single-alternative RulesPart", () => {
+    it("inlines a simple pass-through wrapper rule", () => {
+        const text = `<start> = <wrapper> -> true;
+            <wrapper> = play world;`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { inlineSingleAlternatives: true },
+        });
+
+        // Baseline has at least one wrapping RulesPart for <wrapper>.
+        expect(countRulesParts(baseline.rules)).toBeGreaterThan(
+            countRulesParts(optimized.rules),
+        );
+        expect(match(optimized, "play world")).toStrictEqual([true]);
+        expect(match(baseline, "play world")).toStrictEqual(
+            match(optimized, "play world"),
+        );
+    });
+
+    it("preserves variable binding when inlining a wildcard child", () => {
+        const text = `<start> = play $(t:<inner>) -> t;
+            <inner> = $(name:string) -> name;`;
+        // Note: <inner> has a value expression, so the inliner must
+        // refuse to inline it (would lose the value).
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { inlineSingleAlternatives: true },
+        });
+        expect(match(optimized, "play hello")).toStrictEqual(["hello"]);
+    });
+
+    it("inlines single-part parent without value by hoisting child's value", () => {
+        const text = `<start> = <child>;
+            <child> = hello -> { greeting: true };`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { inlineSingleAlternatives: true },
+        });
+        // The child rule has a value expression and the parent is a
+        // single-part wrapper with no value. The optimizer hoists
+        // child's value onto the parent and inlines child's parts.
+        expect(countRulesParts(optimized.rules)).toBeLessThan(
+            countRulesParts(baseline.rules),
+        );
+        expect(match(optimized, "hello")).toStrictEqual([{ greeting: true }]);
+        expect(match(optimized, "hello")).toStrictEqual(
+            match(baseline, "hello"),
+        );
+    });
+
+    it("skips inlining when the part is repeated", () => {
+        const text = `<start> = (<item>)+ -> true;
+            <item> = a;`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { inlineSingleAlternatives: true },
+        });
+        // The outer (...)+ is itself a RulesPart with repeat=true and
+        // cannot be flattened. The inner <item> reference, however,
+        // can collapse one level (it's a single-alternative wrapper).
+ expect(match(optimized, "a a a")).toStrictEqual( + match(baseline, "a a a"), + ); + }); + + it("inline + factor combined: still produces the same matches", () => { + const text = ` = play the $(t:) -> t; + = song -> "song" | track -> "track" | album -> "album";`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + }); + for (const input of [ + "play the song", + "play the track", + "play the album", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("optimizations off by default leave the AST unchanged", () => { + const text = ` = ; + = hello world -> true;`; + const baseline = loadGrammarRules("t.grammar", text); + const noOpt = loadGrammarRules("t.grammar", text, {}); + expect(JSON.stringify(noOpt.rules)).toBe( + JSON.stringify(baseline.rules), + ); + }); + + // Regression: child with auto (undefined) spacingMode must NOT be + // inlined into a parent with an explicit mode (e.g. "required"), + // because `undefined` is its own mode at runtime — boundaries + // resolve per character pair — and inlining changes how the matcher + // treats those boundaries. + it("skips inlining when child spacingMode (auto) differs from parent (required)", () => { + // Parent declares [spacing=required]; child + // inherits auto. The two are not equivalent at e.g. digit↔Latin + // boundaries, so the inliner must leave the wrapper in place. 
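The spacing regression above hinges on `undefined` (auto) being a distinct runtime mode. A minimal sketch of the guard, assuming the pass requires an exact mode match (mode names other than "required" are illustrative):

```typescript
// `undefined` stands for "auto" spacing, where boundaries resolve per
// character pair at match time. Auto is never interchangeable with an
// explicit mode, so only exact equality is safe for inlining.
type SpacingMode = string | undefined;

function spacingCompatible(parent: SpacingMode, child: SpacingMode): boolean {
    // No coercion: auto (undefined) never matches an explicit mode.
    return parent === child;
}
```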
+ const text = ` [spacing=required] = play -> true; + = hello world;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBe( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello world")).toStrictEqual( + match(baseline, "play hello world"), + ); + }); + + it("inlines child with value expression by substituting into parent's value", () => { + // Parent captures into `t` and references it in its + // value expression. The inliner can substitute Inner's value + // expression for `t` and inline Inner's parts. + const text = ` = play $(t:) -> { kind: "play", what: t }; + = $(name:string) loud -> { who: name };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello loud")).toStrictEqual( + match(baseline, "play hello loud"), + ); + expect(match(optimized, "play hello loud")).toStrictEqual([ + { kind: "play", what: { who: "hello" } }, + ]); + }); + + it("inlines and drops child value when child value is unobservable (no part.variable, multi-part parent)", () => { + // Child has a value expression but parent does not capture it + // via a variable AND parent has more than one part — so the + // matcher's single-part default-value rule never fires. The + // child's value is unobservable at runtime, so the inliner can + // safely drop it and inline the child's parts. 
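The value-substitution case the next test exercises amounts to replacing the captured variable inside the parent's value expression with the child's value expression. A self-contained sketch over a toy expression shape (not the real value-expression AST):

```typescript
// Toy value expressions: variable references, literals, object literals.
type Expr =
    | { type: "var"; name: string }
    | { type: "lit"; value: unknown }
    | { type: "obj"; fields: Record<string, Expr> };

// Replace every reference to `name` with `replacement` — the core of
// the "substitute into parent.value" inline case.
function substitute(e: Expr, name: string, replacement: Expr): Expr {
    switch (e.type) {
        case "var":
            return e.name === name ? replacement : e;
        case "obj": {
            const fields: Record<string, Expr> = {};
            for (const [k, v] of Object.entries(e.fields)) {
                fields[k] = substitute(v, name, replacement);
            }
            return { type: "obj", fields };
        }
        case "lit":
            return e;
    }
}

// parent value: { kind: "play", what: t }   child value: { who: name }
const parentValue: Expr = {
    type: "obj",
    fields: {
        kind: { type: "lit", value: "play" },
        what: { type: "var", name: "t" },
    },
};
const childValue: Expr = {
    type: "obj",
    fields: { who: { type: "var", name: "name" } },
};
const merged = substitute(parentValue, "t", childValue);
```

After substitution the wrapper and its binding are gone, but the merged expression still references the child's own bindings (`name`), which is why those must come along with the inlined parts.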
+ const text = ` = play now -> true; + = $(name:string) loud -> { who: name };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello loud now")).toStrictEqual( + match(baseline, "play hello loud now"), + ); + expect(match(optimized, "play hello loud now")).toStrictEqual([true]); + }); + + it("inlines single-part parent without value by hoisting child's value (with bindings)", () => { + // Parent has a single part (the RulesPart) and no + // explicit value of its own — so the matcher would use + // child.value as the parent's default. The optimizer hoists + // child.value onto the parent and inlines child.parts. The + // bindings child.value references (e.g. `name`) come along in + // the inlined parts, so they remain in scope for the hoisted + // expression. 
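The commit message describes a flat 3-case decision for `child.value`: substitute, hoist, or drop. A hedged sketch of that decision as a pure predicate — it assumes the child carries a value expression, and the field names are illustrative rather than the real optimizer's:

```typescript
type Decision = "substitute" | "hoist" | "drop" | "refuse";

interface InlineSite {
    parentHasValue: boolean; // parent rule has its own value expression
    parentPartCount: number; // number of parts in the parent rule
    parentCapturesChild: boolean; // wrapper part carries a variable binding
}

// Decide what happens to child.value when inlining a single-alternative
// wrapper whose sole child carries a value expression.
function decideChildValue(s: InlineSite): Decision {
    // Parent captures the child and has its own value: substitute the
    // child's expression for the captured variable.
    if (s.parentCapturesChild && s.parentHasValue) return "substitute";
    // Single-part parent with no value of its own: the matcher's
    // single-part default-value rule applies, so hoist child.value up.
    if (!s.parentHasValue && s.parentPartCount === 1) return "hoist";
    // Uncaptured child in a multi-part parent: the value can never be
    // observed at runtime, so it is safe to drop.
    if (!s.parentCapturesChild && s.parentPartCount > 1) return "drop";
    return "refuse";
}
```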
+ const text = ` = ; + = $(name:string) loud -> { who: name };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "hello loud")).toStrictEqual( + match(baseline, "hello loud"), + ); + expect(match(optimized, "hello loud")).toStrictEqual([ + { who: "hello" }, + ]); + }); + + it("inlines when child and parent share the same explicit spacingMode", () => { + const text = ` [spacing=required] = play -> true; + [spacing=required] = hello world;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello world")).toStrictEqual( + match(baseline, "play hello world"), + ); + }); + + it("skips value-substitution inline when child binding collides with parent binding", () => { + // Parent already has `name` as a binding; child also binds + // `name`. After inlining, two `name` bindings would collide in + // the same scope, so the inliner must refuse. + const text = ` = $(name:string) says $(t:) -> { speaker: name, said: t }; + = $(name:string) loud -> name;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + // No inline of the value-bearing child. 
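The collision refusal the next test checks is a scope-merge problem: inlining puts the child's bindings into the parent's scope. A minimal sketch of the detection step:

```typescript
// Inlining merges the child's bindings into the parent's scope, so any
// shared binding name forces the optimizer to refuse. Returning the
// colliding names (rather than a boolean) makes diagnostics cheap.
function collidingBindings(
    parentBindings: Iterable<string>,
    childBindings: Iterable<string>,
): string[] {
    const parent = new Set(parentBindings);
    return [...childBindings].filter((b) => parent.has(b));
}
```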
+ expect(countRulesParts(optimized.rules)).toBe( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "alice says bob loud")).toStrictEqual( + match(baseline, "alice says bob loud"), + ); + }); + + it("inlines and drops child value when parent value does not reference the captured variable", () => { + // Parent binds to `t` but never uses `t` in its value + // expression. The child's value is dead at runtime, so the + // inliner can drop it and inline child.parts. + const text = ` = play $(t:) -> { kind: "play" }; + = $(name:string) loud -> { who: name };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello loud")).toStrictEqual( + match(baseline, "play hello loud"), + ); + expect(match(optimized, "play hello loud")).toStrictEqual([ + { kind: "play" }, + ]); + }); +}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts new file mode 100644 index 000000000..3bfac02da --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts @@ -0,0 +1,138 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Verifies the optimizer preserves the compiler's shared-rules-array + * invariant: when two `RulesPart`s reference the same named rule, they + * must share the same `GrammarRule[]` array identity after optimization. + * + * `grammarSerializer.ts` keys its dedup map on that identity + * (`rulesToIndex.get(p.rules)`) so losing it would inflate + * serialized `.ag.json` size proportionally to the reference count. 
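The "dead value" case in the last test reduces to a free-variable scan: if the parent's value expression never mentions the variable that captures the child, the child's value is unobservable. A sketch over a toy expression shape:

```typescript
// Does the expression reference `name` anywhere? If the parent's value
// never references the captured variable, the child's value is dead and
// can be dropped during inlining. Toy expression shape only.
type Expr =
    | { type: "var"; name: string }
    | { type: "lit"; value: unknown }
    | { type: "obj"; fields: Record<string, Expr> };

function referencesVar(e: Expr, name: string): boolean {
    switch (e.type) {
        case "var":
            return e.name === name;
        case "obj":
            return Object.values(e.fields).some((f) => referencesVar(f, name));
        case "lit":
            return false;
    }
}

// A value like { kind: "play" } never mentions `t`, so a child captured
// as `t` contributes nothing observable.
const parentValue: Expr = {
    type: "obj",
    fields: { kind: { type: "lit", value: "play" } },
};
```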
+ */ + +import { loadGrammarRules } from "../src/grammarLoader.js"; +import { matchGrammar } from "../src/grammarMatcher.js"; +import { grammarToJson } from "../src/grammarSerializer.js"; +import { GrammarPart, GrammarRule, RulesPart } from "../src/grammarTypes.js"; + +function findAllRulesParts(rules: GrammarRule[]): RulesPart[] { + const out: RulesPart[] = []; + const seen = new Set(); + const visit = (parts: GrammarPart[]) => { + for (const p of parts) { + if (p.type !== "rules") continue; + out.push(p); + if (seen.has(p.rules)) continue; + seen.add(p.rules); + for (const r of p.rules) visit(r.parts); + } + }; + for (const r of rules) visit(r.parts); + return out; +} + +function match(grammar: ReturnType, s: string) { + return matchGrammar(grammar, s).map((m) => m.match); +} + +describe("Grammar Optimizer - Shared rule identity preservation", () => { + // Grammar with a named rule referenced from three different sites. + const text = ` = | | ; + = sing $(name:); + = play $(name:); + = hum $(name:); + = the song | a track | that tune;`; + + function commonRulesArrays( + grammar: ReturnType, + ): GrammarRule[][] { + const rps = findAllRulesParts(grammar.rules); + // Find every RulesPart whose body matches the shape. 
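The identity-preserving rewrite the spec verifies can be sketched as a memoized transform keyed on the input array's object identity — every site referencing the same array gets back the same output array:

```typescript
// Identity memo sketch: transform each shared input exactly once and
// hand back the identical result object for every referencing site.
// (For simplicity this memo re-runs fn if it returns undefined; the
// real pass needn't care since it always produces arrays.)
function makeMemoized<T extends object, R>(fn: (x: T) => R): (x: T) => R {
    const memo = new Map<T, R>();
    return (x: T) => {
        let r = memo.get(x);
        if (r === undefined) {
            r = fn(x);
            memo.set(x, r);
        }
        return r;
    };
}

const shared = [1, 2, 3];
const rewrite = makeMemoized((a: number[]) => a.map((n) => n * 2));
const first = rewrite(shared);
const second = rewrite(shared);
```

Without the memo, two sites referencing `shared` would each get a fresh output array, and any consumer keyed on array identity (like the serializer's dedup map) would treat them as distinct.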
+ return rps + .filter((p) => + p.rules.every( + (r) => + r.parts.length === 1 && + r.parts[0].type === "string" && + ["the song", "a track", "that tune"].some( + (s) => + (r.parts[0] as any).value.join(" ") === s, + ), + ), + ) + .map((p) => p.rules); + } + + it("baseline compiler produces a single shared array", () => { + const baseline = loadGrammarRules("t.grammar", text); + const arrays = commonRulesArrays(baseline); + expect(arrays.length).toBeGreaterThanOrEqual(3); + for (let i = 1; i < arrays.length; i++) { + expect(arrays[i]).toBe(arrays[0]); + } + }); + + for (const [name, opts] of [ + ["inline only", { inlineSingleAlternatives: true }], + ["factor only", { factorCommonPrefixes: true }], + [ + "both", + { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + ], + ] as const) { + it(`preserves shared array identity (${name})`, () => { + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: opts, + }); + const arrays = commonRulesArrays(optimized); + expect(arrays.length).toBeGreaterThanOrEqual(3); + for (let i = 1; i < arrays.length; i++) { + expect(arrays[i]).toBe(arrays[0]); + } + }); + + it(`serialized output dedupes rule (${name})`, () => { + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: opts, + }); + const baseJson = grammarToJson(baseline); + const optJson = grammarToJson(optimized); + // The body of should appear in exactly one + // GrammarRulesJson entry on both sides. 
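The serializer-side consequence being tested — one table entry per shared array — follows from a dedup map keyed on object identity, in the spirit of `rulesToIndex.get(p.rules)`. A self-contained sketch with plain arrays standing in for rule bodies:

```typescript
// Intern each rules array by identity: every site referencing the same
// array serializes to one table entry plus an index; a structurally
// equal but distinct array gets its own entry.
function serializeWithDedup(sites: number[][]): {
    table: number[][];
    refs: number[];
} {
    const rulesToIndex = new Map<number[], number>();
    const table: number[][] = [];
    const refs = sites.map((rules) => {
        let idx = rulesToIndex.get(rules);
        if (idx === undefined) {
            idx = table.length;
            table.push(rules);
            rulesToIndex.set(rules, idx);
        }
        return idx;
    });
    return { table, refs };
}

const common = [7, 8];
const out = serializeWithDedup([common, common, common, [9]]);
```

This is why an optimizer that clones a shared array (losing its identity) would inflate the serialized output proportionally to the reference count.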
+ const countCommonEntries = (json: typeof baseJson) => + json.filter( + (entry) => + Array.isArray(entry) && + entry.length === 3 && + entry.every( + (rule: any) => + rule.parts?.length === 1 && + rule.parts[0].type === "string", + ), + ).length; + expect(countCommonEntries(baseJson)).toBe(1); + expect(countCommonEntries(optJson)).toBe(1); + }); + + it(`match results unchanged (${name})`, () => { + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: opts, + }); + for (const input of [ + "sing the song", + "play a track", + "hum that tune", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + } +}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts new file mode 100644 index 000000000..4c3587d13 --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts @@ -0,0 +1,231 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Synthetic optimizer benchmark — informational only. + * + * Constructs grammars whose structure is *designed* to exercise each + * optimization in isolation and at varying scale, so the impact of + * each pass is visible without the noise of a real agent grammar. + * + * Three benchmarks are produced: + * + * 1. Pass-through chain — N levels of ` = ; = ; …` + * Targets `inlineSingleAlternatives`. + * + * 2. Wide common-prefix — N alternatives that all start with the + * same long literal prefix and diverge in the last token. + * Targets `factorCommonPrefixes`. + * + * 3. Combined — Pass-through wrappers around a wide + * common-prefix block. Targets both passes together. + * + * Set TYPEAGENT_SKIP_BENCHMARKS=1 to skip. 
+ */ + +import { + loadGrammarRulesNoThrow, + LoadGrammarRulesOptions, +} from "../src/grammarLoader.js"; +import { matchGrammar } from "../src/grammarMatcher.js"; +import { GrammarPart } from "../src/grammarTypes.js"; + +const ITERATIONS = 500; +const SHOULD_SKIP = process.env.TYPEAGENT_SKIP_BENCHMARKS === "1"; + +const CONFIGS: { name: string; opts: LoadGrammarRulesOptions }[] = [ + { name: "baseline", opts: {} }, + { + name: "inline", + opts: { optimizations: { inlineSingleAlternatives: true } }, + }, + { + name: "factor", + opts: { optimizations: { factorCommonPrefixes: true } }, + }, + { + name: "both", + opts: { + optimizations: { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + }, + }, +]; + +function timeMs(fn: () => void, iterations: number): number { + const start = performance.now(); + for (let i = 0; i < iterations; i++) fn(); + return performance.now() - start; +} + +function countRulesParts( + grammar: ReturnType, +): number { + if (!grammar) return 0; + let count = 0; + const visit = (parts: GrammarPart[]) => { + for (const p of parts) { + if (p.type === "rules") { + count++; + for (const r of p.rules) visit(r.parts); + } + } + }; + for (const r of grammar.rules) visit(r.parts); + return count; +} + +function runBenchmark( + label: string, + grammarText: string, + requests: string[], +): void { + console.log(`\n=== ${label} ===`); + console.log( + `| config | RulesParts | match ms (${ITERATIONS}x) | speedup |`, + ); + console.log(`|-----------|-----------:|---------------:|--------:|`); + let baselineMs = 0; + for (const cfg of CONFIGS) { + const errors: string[] = []; + const grammar = loadGrammarRulesNoThrow( + "synthetic.grammar", + grammarText, + errors, + undefined, + cfg.opts, + ); + if (!grammar) { + console.log(`[error] ${cfg.name}: ${errors.join("; ")}`); + continue; + } + const partCount = countRulesParts(grammar); + try { + for (const r of requests) matchGrammar(grammar, r); + } catch (e) { + console.log( + `[error] 
${cfg.name} match failed: ${(e as Error).message}`, + ); + continue; + } + const ms = timeMs(() => { + for (const r of requests) matchGrammar(grammar, r); + }, ITERATIONS); + if (cfg.name === "baseline") baselineMs = ms; + const speedup = baselineMs > 0 ? baselineMs / ms : 1; + console.log( + `| ${cfg.name.padEnd(9)} | ${String(partCount).padStart(10)} | ${ms.toFixed(1).padStart(14)} | ${speedup.toFixed(2).padStart(6)}x |`, + ); + } +} + +// ─── Synthetic grammar builders ──────────────────────────────────────────── + +/** + * Pass-through chain: ` = ; = ; …; = "target"`. + * Each `
  • ` adds one nested RulesPart with no other semantics — + * exactly the shape `inlineSingleAlternatives` collapses. + */ +function buildPassthroughChain(depth: number): string { + const lines: string[] = [` = ;`]; + for (let i = 0; i < depth; i++) { + lines.push(` = ;`); + } + lines.push(` = target word here;`); + return lines.join("\n"); +} + +/** + * Wide common prefix: N alternatives that share the same long literal + * prefix and differ only in the last word. + * + * = perform the action with item one + * | perform the action with item two + * | … + */ +function buildWideCommonPrefix(width: number): string { + const prefix = "perform the action with item"; + const alts: string[] = []; + for (let i = 0; i < width; i++) { + const word = `value${String.fromCharCode(97 + (i % 26))}${Math.floor( + i / 26, + )}`; + alts.push(`${prefix} ${word} -> "${word}"`); + } + return ` = ;\n = ${alts.join("\n | ")};`; +} + +/** + * Combined pattern: two layers of pass-through wrapping around a wide + * common-prefix block — exercises both passes together. + */ +function buildCombined(width: number): string { + // Rename in the wide-prefix grammar and wrap it + // in a chain of pass-through rules. Keep every line of the + // renamed inner grammar so is actually defined. + const inner = buildWideCommonPrefix(width).replace("", ""); + return [` = ;`, ` = ;`, ` = ;`, inner].join( + "\n", + ); +} + +describe("Grammar Optimizer - Synthetic Benchmarks", () => { + (SHOULD_SKIP ? it.skip : it)("pass-through chain (depth=8)", () => { + const grammarText = buildPassthroughChain(8); + runBenchmark(`pass-through chain (depth=8)`, grammarText, [ + "target word here", + "miss", + "target word", + "no match here", + ]); + expect(true).toBe(true); + }); + + (SHOULD_SKIP ? it.skip : it)("wide common prefix (width=20)", () => { + const grammarText = buildWideCommonPrefix(20); + // Mix of matching & non-matching requests. 
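The shape `factorCommonPrefixes` targets can be sketched on whole tokens (the real pass also splits partial leading string tokens, which this sketch omits): find the longest leading token run shared by every alternative, hoist it, and keep only the divergent suffixes as alternatives.

```typescript
// Factor the longest common leading token run out of a set of
// alternatives. Whole-token granularity only — a simplification of the
// real pass, which can also split inside a string token.
function factorPrefix(alts: string[][]): {
    prefix: string[];
    suffixes: string[][];
} {
    if (alts.length === 0) return { prefix: [], suffixes: [] };
    const prefix: string[] = [];
    for (let i = 0; ; i++) {
        const tok = alts[0][i];
        if (tok === undefined || alts.some((a) => a[i] !== tok)) break;
        prefix.push(tok);
    }
    return { prefix, suffixes: alts.map((a) => a.slice(prefix.length)) };
}

const alts = [
    "perform the action with item one".split(" "),
    "perform the action with item two".split(" "),
];
const factored = factorPrefix(alts);
```

After factoring, the matcher tries the shared prefix once instead of once per alternative, which is where the wide-common-prefix benchmark's speedup comes from.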
+ const requests = [ + "perform the action with item valuea0", + "perform the action with item valuet0", + "perform the action with item nothere", + "perform the action with", + "noise input that does not match", + ]; + runBenchmark(`wide common prefix (width=20)`, grammarText, requests); + expect(true).toBe(true); + }); + + (SHOULD_SKIP ? it.skip : it)("wide common prefix (width=50)", () => { + const grammarText = buildWideCommonPrefix(50); + const requests = [ + "perform the action with item valuea0", + "perform the action with item valuex0", + "perform the action with item valuew1", + "perform the action with item nothere", + "noise input", + ]; + runBenchmark(`wide common prefix (width=50)`, grammarText, requests); + expect(true).toBe(true); + }); + + (SHOULD_SKIP ? it.skip : it)( + "combined (depth=4 wrappers, width=20 prefix)", + () => { + const grammarText = buildCombined(20); + const requests = [ + "perform the action with item valuea0", + "perform the action with item valuek0", + "perform the action with item nothere", + "noise", + ]; + runBenchmark( + `combined (depth=4 wrappers, width=20 prefix)`, + grammarText, + requests, + ); + expect(true).toBe(true); + }, + ); +}); From 903e718767a996683a2ebc9eafa22ed9887d9f47 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Wed, 22 Apr 2026 13:10:21 -0700 Subject: [PATCH 02/16] actionGrammar: move optimizer benchmarks out of jest into standalone scripts - Move grammarOptimizerBenchmark and grammarOptimizerSyntheticBenchmark from test/*.spec.ts into src/bench/ as plain node entry points. - Extract shared CONFIGS, runBenchmark, timing, and colored-speedup helpers into src/bench/benchUtil.ts. - Color the speedup column with chalk: green when >1.10x, red when <0.90x, plain otherwise. - Add 'bench', 'bench:synthetic', and 'bench:real' npm scripts; exclude dist/bench from the published package. 
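The colored-speedup rule the second patch's commit message describes (green above 1.10x, red below 0.90x, plain otherwise) can be sketched without the chalk dependency by emitting raw ANSI escape codes:

```typescript
// Speedup formatting sketch: ANSI codes stand in for chalk.green/red so
// the example is dependency-free. Thresholds match the commit message:
// color only when the speedup moves more than 10% from baseline.
function colorSpeedup(speedup: number): string {
    const text = `${speedup.toFixed(2)}x`.padStart(6);
    if (speedup > 1.1) return `\x1b[32m${text}\x1b[0m`; // green
    if (speedup < 0.9) return `\x1b[31m${text}\x1b[0m`; // red
    return text;
}
```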
--- ts/packages/actionGrammar/package.json | 6 +- .../actionGrammar/src/bench/benchUtil.ts | 119 +++++++++ .../src/bench/grammarOptimizerBenchmark.ts | 90 +++++++ .../grammarOptimizerSyntheticBenchmark.ts | 125 ++++++++++ .../test/grammarOptimizerBenchmark.spec.ts | 190 -------------- ...grammarOptimizerSyntheticBenchmark.spec.ts | 231 ------------------ 6 files changed, 339 insertions(+), 422 deletions(-) create mode 100644 ts/packages/actionGrammar/src/bench/benchUtil.ts create mode 100644 ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts create mode 100644 ts/packages/actionGrammar/src/bench/grammarOptimizerSyntheticBenchmark.ts delete mode 100644 ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts delete mode 100644 ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts diff --git a/ts/packages/actionGrammar/package.json b/ts/packages/actionGrammar/package.json index 4ef40a9d5..68b027570 100644 --- a/ts/packages/actionGrammar/package.json +++ b/ts/packages/actionGrammar/package.json @@ -24,7 +24,8 @@ }, "files": [ "dist", - "!dist/test" + "!dist/test", + "!dist/bench" ], "scripts": { "build": "npm run tsc", @@ -36,6 +37,9 @@ "test:integration": "pnpm run jest-esm --testPathPattern=\"grammarGenerator.spec.js\"", "test:local": "pnpm run jest-esm --testPathPattern=\".*[.]spec[.]js\"", "test:local:debug": "node --inspect-brk --no-warnings --experimental-vm-modules ./node_modules/jest/bin/jest.js --testPathPattern=\".*\\.spec\\.js\" --testPathIgnorePatterns=\"grammarGenerator.spec.js\"", + "bench": "npm run bench:synthetic && npm run bench:real", + "bench:synthetic": "node ./dist/bench/grammarOptimizerSyntheticBenchmark.js", + "bench:real": "node ./dist/bench/grammarOptimizerBenchmark.js", "tsc": "tsc -b" }, "dependencies": { diff --git a/ts/packages/actionGrammar/src/bench/benchUtil.ts b/ts/packages/actionGrammar/src/bench/benchUtil.ts new file mode 100644 index 000000000..528a84c15 --- /dev/null +++ 
b/ts/packages/actionGrammar/src/bench/benchUtil.ts @@ -0,0 +1,119 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Shared helpers for grammar optimizer benchmarks. + */ + +import chalk from "chalk"; +import { + loadGrammarRulesNoThrow, + LoadGrammarRulesOptions, +} from "../grammarLoader.js"; +import { matchGrammar } from "../grammarMatcher.js"; +import { GrammarPart } from "../grammarTypes.js"; + +export const ITERATIONS = 500; + +export const CONFIGS: { name: string; opts: LoadGrammarRulesOptions }[] = [ + { name: "baseline", opts: {} }, + { + name: "inline", + opts: { optimizations: { inlineSingleAlternatives: true } }, + }, + { + name: "factor", + opts: { optimizations: { factorCommonPrefixes: true } }, + }, + { + name: "both", + opts: { + optimizations: { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + }, + }, +]; + +// Speedup is colored once it moves more than 10% from baseline. +export function colorSpeedup(speedup: number): string { + const text = `${speedup.toFixed(2)}x`.padStart(6); + if (speedup > 1.1) return chalk.green(text); + if (speedup < 0.9) return chalk.red(text); + return text; +} + +export function timeMs(fn: () => void, iterations: number): number { + const start = performance.now(); + for (let i = 0; i < iterations; i++) fn(); + return performance.now() - start; +} + +export function countRulesParts( + grammar: ReturnType, +): number { + if (!grammar) return 0; + let count = 0; + const visit = (parts: GrammarPart[]) => { + for (const p of parts) { + if (p.type === "rules") { + count++; + for (const r of p.rules) visit(r.parts); + } + } + }; + for (const r of grammar.rules) visit(r.parts); + return count; +} + +/** + * Run all CONFIGS against the given grammar text and print a comparison + * table. `label` is the section heading; `grammarName` is passed to the + * loader (used in error messages). 
+ */ +export function runBenchmark( + label: string, + grammarName: string, + grammarText: string, + requests: string[], +): void { + console.log(`\n=== ${label} ===`); + console.log( + `| config | RulesParts | match ms (${ITERATIONS}x) | speedup |`, + ); + console.log(`|-----------|-----------:|---------------:|--------:|`); + let baselineMs = 0; + for (const cfg of CONFIGS) { + const errors: string[] = []; + const grammar = loadGrammarRulesNoThrow( + grammarName, + grammarText, + errors, + undefined, + cfg.opts, + ); + if (!grammar) { + console.log(`[error] ${cfg.name}: ${errors.join("; ")}`); + continue; + } + const partCount = countRulesParts(grammar); + // Warm-up — also validates that the optimized grammar can run. + try { + for (const r of requests) matchGrammar(grammar, r); + } catch (e) { + console.log( + `[error] ${cfg.name} match failed: ${(e as Error).message}`, + ); + continue; + } + const ms = timeMs(() => { + for (const r of requests) matchGrammar(grammar, r); + }, ITERATIONS); + if (cfg.name === "baseline") baselineMs = ms; + const speedup = baselineMs > 0 ? baselineMs / ms : 1; + console.log( + `| ${cfg.name.padEnd(9)} | ${String(partCount).padStart(10)} | ${ms.toFixed(1).padStart(14)} | ${colorSpeedup(speedup)} |`, + ); + } +} diff --git a/ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts b/ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts new file mode 100644 index 000000000..71b4307c7 --- /dev/null +++ b/ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts @@ -0,0 +1,90 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Optimizer benchmark — informational only. + * + * Measures matcher-time impact of each grammar optimization pass on real + * grammars (player, list, calendar, browser, ...). Each configuration is + * compared against the unoptimized baseline. + * + * Run with: `pnpm run bench:real` (from this package directory). 
+ */ + +import * as path from "path"; +import * as fs from "fs"; +import { fileURLToPath } from "url"; +import { registerBuiltInEntities } from "../builtInEntities.js"; +import { runBenchmark } from "./benchUtil.js"; + +const __dirname = path.dirname(fileURLToPath(import.meta.url)); + +function fileExists(p: string): boolean { + try { + fs.accessSync(p, fs.constants.R_OK); + return true; + } catch { + return false; + } +} + +function benchmarkFile( + label: string, + grammarPath: string, + requests: string[], +): void { + if (!fileExists(grammarPath)) { + console.log(`[skip] ${label}: grammar not found at ${grammarPath}`); + return; + } + const content = fs.readFileSync(grammarPath, "utf-8"); + runBenchmark(label, path.basename(grammarPath), content, requests); +} + +function main(): void { + registerBuiltInEntities(); + + benchmarkFile( + "player", + path.resolve( + __dirname, + "../../../agents/player/src/agent/playerSchema.agr", + ), + [ + "pause", + "resume", + "play Shake It Off by Taylor Swift", + "select kitchen", + "set volume to 50", + "play the first track", + "skip to the next track", + "play some music", + ], + ); + + benchmarkFile( + "list", + path.resolve(__dirname, "../../../agents/list/src/listSchema.agr"), + [ + "add apples to grocery list", + "remove milk from grocery list", + "create list shopping", + "clear grocery list", + ], + ); + + benchmarkFile( + "calendar", + path.resolve( + __dirname, + "../../../agents/calendar/src/calendarSchema.agr", + ), + [ + "schedule meeting tomorrow at 3pm", + "cancel my 2pm meeting", + "show my calendar", + ], + ); +} + +main(); diff --git a/ts/packages/actionGrammar/src/bench/grammarOptimizerSyntheticBenchmark.ts b/ts/packages/actionGrammar/src/bench/grammarOptimizerSyntheticBenchmark.ts new file mode 100644 index 000000000..c7c82bde6 --- /dev/null +++ b/ts/packages/actionGrammar/src/bench/grammarOptimizerSyntheticBenchmark.ts @@ -0,0 +1,125 @@ +// Copyright (c) Microsoft Corporation. 
+// Licensed under the MIT License. + +/** + * Synthetic optimizer benchmark — informational only. + * + * Constructs grammars whose structure is *designed* to exercise each + * optimization in isolation and at varying scale, so the impact of + * each pass is visible without the noise of a real agent grammar. + * + * Three benchmarks are produced: + * + * 1. Pass-through chain — N levels of ` = ; = ; …` + * Targets `inlineSingleAlternatives`. + * + * 2. Wide common-prefix — N alternatives that all start with the + * same long literal prefix and diverge in the last token. + * Targets `factorCommonPrefixes`. + * + * 3. Combined — Pass-through wrappers around a wide + * common-prefix block. Targets both passes together. + * + * Run with: `pnpm run bench:synthetic` (from this package directory). + */ + +import { runBenchmark } from "./benchUtil.js"; + +// ─── Synthetic grammar builders ──────────────────────────────────────────── + +/** + * Pass-through chain: ` = ; = ; …; = "target"`. + * Each `
  • ` adds one nested RulesPart with no other semantics — + * exactly the shape `inlineSingleAlternatives` collapses. + */ +function buildPassthroughChain(depth: number): string { + const lines: string[] = [` = ;`]; + for (let i = 0; i < depth; i++) { + lines.push(` = ;`); + } + lines.push(` = target word here;`); + return lines.join("\n"); +} + +/** + * Wide common prefix: N alternatives that share the same long literal + * prefix and differ only in the last word. + * + * = perform the action with item one + * | perform the action with item two + * | … + */ +function buildWideCommonPrefix(width: number): string { + const prefix = "perform the action with item"; + const alts: string[] = []; + for (let i = 0; i < width; i++) { + const word = `value${String.fromCharCode(97 + (i % 26))}${Math.floor( + i / 26, + )}`; + alts.push(`${prefix} ${word} -> "${word}"`); + } + return ` = ;\n = ${alts.join("\n | ")};`; +} + +/** + * Combined pattern: two layers of pass-through wrapping around a wide + * common-prefix block — exercises both passes together. + */ +function buildCombined(width: number): string { + // Rename in the wide-prefix grammar and wrap it + // in a chain of pass-through rules. Keep every line of the + // renamed inner grammar so is actually defined. 
+ const inner = buildWideCommonPrefix(width).replace("", ""); + return [` = ;`, ` = ;`, ` = ;`, inner].join( + "\n", + ); +} + +function main(): void { + runBenchmark( + `pass-through chain (depth=8)`, + "synthetic.grammar", + buildPassthroughChain(8), + ["target word here", "miss", "target word", "no match here"], + ); + + runBenchmark( + `wide common prefix (width=20)`, + "synthetic.grammar", + buildWideCommonPrefix(20), + [ + "perform the action with item valuea0", + "perform the action with item valuet0", + "perform the action with item nothere", + "perform the action with", + "noise input that does not match", + ], + ); + + runBenchmark( + `wide common prefix (width=50)`, + "synthetic.grammar", + buildWideCommonPrefix(50), + [ + "perform the action with item valuea0", + "perform the action with item valuex0", + "perform the action with item valuew1", + "perform the action with item nothere", + "noise input", + ], + ); + + runBenchmark( + `combined (depth=4 wrappers, width=20 prefix)`, + "synthetic.grammar", + buildCombined(20), + [ + "perform the action with item valuea0", + "perform the action with item valuek0", + "perform the action with item nothere", + "noise", + ], + ); +} + +main(); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts deleted file mode 100644 index 1c04c3033..000000000 --- a/ts/packages/actionGrammar/test/grammarOptimizerBenchmark.spec.ts +++ /dev/null @@ -1,190 +0,0 @@ -// Copyright (c) Microsoft Corporation. -// Licensed under the MIT License. - -/** - * Optimizer benchmark — informational only. - * - * Measures matcher-time impact of each grammar optimization pass on real - * grammars (player, list, calendar, browser, ...). Each configuration is - * compared against the unoptimized baseline. - * - * All assertions are informational. This spec runs as part of the normal - * test suite but produces no hard failures — only console output. 
- * - * To skip (e.g. on slow machines), set TYPEAGENT_SKIP_BENCHMARKS=1. - */ - -import * as path from "path"; -import * as fs from "fs"; -import { fileURLToPath } from "url"; -import { - loadGrammarRulesNoThrow, - LoadGrammarRulesOptions, -} from "../src/grammarLoader.js"; -import { matchGrammar } from "../src/grammarMatcher.js"; -import { registerBuiltInEntities } from "../src/builtInEntities.js"; - -const __filename = fileURLToPath(import.meta.url); -const __dirname = path.dirname(__filename); - -const ITERATIONS = 500; -const SHOULD_SKIP = process.env.TYPEAGENT_SKIP_BENCHMARKS === "1"; - -function fileExists(p: string): boolean { - try { - fs.accessSync(p, fs.constants.R_OK); - return true; - } catch { - return false; - } -} - -function timeMs(fn: () => void, iterations: number): number { - const start = performance.now(); - for (let i = 0; i < iterations; i++) fn(); - return performance.now() - start; -} - -function countRulesParts( - grammar: ReturnType, -): number { - if (!grammar) return 0; - let count = 0; - const visit = (parts: any[]) => { - for (const p of parts) { - if (p.type === "rules") { - count++; - for (const r of p.rules) visit(r.parts); - } - } - }; - for (const r of grammar.rules) visit(r.parts); - return count; -} - -function benchmark( - label: string, - grammarPath: string, - requests: string[], -): void { - if (!fileExists(grammarPath)) { - console.log(`[skip] ${label}: grammar not found at ${grammarPath}`); - return; - } - registerBuiltInEntities(); - const content = fs.readFileSync(grammarPath, "utf-8"); - const configs: { name: string; opts: LoadGrammarRulesOptions }[] = [ - { name: "baseline", opts: {} }, - { - name: "inline", - opts: { optimizations: { inlineSingleAlternatives: true } }, - }, - { - name: "factor", - opts: { optimizations: { factorCommonPrefixes: true } }, - }, - { - name: "both", - opts: { - optimizations: { - inlineSingleAlternatives: true, - factorCommonPrefixes: true, - }, - }, - }, - ]; - - console.log(`\n=== 
${label} ===`); - console.log( - `| config | RulesParts | match ms (${ITERATIONS}x) | speedup |`, - ); - console.log(`|-----------|------------|----------------|---------|`); - - let baselineMs = 0; - for (const cfg of configs) { - const errors: string[] = []; - const g = loadGrammarRulesNoThrow( - path.basename(grammarPath), - content, - errors, - undefined, - cfg.opts, - ); - if (!g) { - console.log(`[error] ${cfg.name}: ${errors.join("; ")}`); - continue; - } - const partCount = countRulesParts(g); - // Warm-up — also validates that the optimized grammar can run. - try { - for (const r of requests) matchGrammar(g, r); - } catch (e) { - console.log( - `[error] ${cfg.name} match failed: ${(e as Error).message}`, - ); - continue; - } - // Timed. - const ms = timeMs(() => { - for (const r of requests) matchGrammar(g, r); - }, ITERATIONS); - if (cfg.name === "baseline") baselineMs = ms; - const speedup = baselineMs > 0 ? baselineMs / ms : 1; - console.log( - `| ${cfg.name.padEnd(9)} | ${String(partCount).padStart(10)} | ${ms.toFixed(1).padStart(14)} | ${speedup.toFixed(2)}x`, - ); - } -} - -describe("Grammar Optimizer Benchmark", () => { - (SHOULD_SKIP ? it.skip : it)("player", () => { - benchmark( - "player", - path.resolve( - __dirname, - "../../../agents/player/src/agent/playerSchema.agr", - ), - [ - "pause", - "resume", - "play Shake It Off by Taylor Swift", - "select kitchen", - "set volume to 50", - "play the first track", - "skip to the next track", - "play some music", - ], - ); - expect(true).toBe(true); - }); - - (SHOULD_SKIP ? it.skip : it)("list", () => { - benchmark( - "list", - path.resolve(__dirname, "../../../agents/list/src/listSchema.agr"), - [ - "add apples to grocery list", - "remove milk from grocery list", - "create list shopping", - "clear grocery list", - ], - ); - expect(true).toBe(true); - }); - - (SHOULD_SKIP ? 
it.skip : it)("calendar", () => { - benchmark( - "calendar", - path.resolve( - __dirname, - "../../../agents/calendar/src/calendarSchema.agr", - ), - [ - "schedule meeting tomorrow at 3pm", - "cancel my 2pm meeting", - "show my calendar", - ], - ); - expect(true).toBe(true); - }); -}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts deleted file mode 100644 index 4c3587d13..000000000 --- a/ts/packages/actionGrammar/test/grammarOptimizerSyntheticBenchmark.spec.ts +++ /dev/null @@ -1,231 +0,0 @@ -// Copyright (c) Microsoft Corporation. -// Licensed under the MIT License. - -/** - * Synthetic optimizer benchmark — informational only. - * - * Constructs grammars whose structure is *designed* to exercise each - * optimization in isolation and at varying scale, so the impact of - * each pass is visible without the noise of a real agent grammar. - * - * Three benchmarks are produced: - * - * 1. Pass-through chain — N levels of ` = ; = ; …` - * Targets `inlineSingleAlternatives`. - * - * 2. Wide common-prefix — N alternatives that all start with the - * same long literal prefix and diverge in the last token. - * Targets `factorCommonPrefixes`. - * - * 3. Combined — Pass-through wrappers around a wide - * common-prefix block. Targets both passes together. - * - * Set TYPEAGENT_SKIP_BENCHMARKS=1 to skip. 
- */ - -import { - loadGrammarRulesNoThrow, - LoadGrammarRulesOptions, -} from "../src/grammarLoader.js"; -import { matchGrammar } from "../src/grammarMatcher.js"; -import { GrammarPart } from "../src/grammarTypes.js"; - -const ITERATIONS = 500; -const SHOULD_SKIP = process.env.TYPEAGENT_SKIP_BENCHMARKS === "1"; - -const CONFIGS: { name: string; opts: LoadGrammarRulesOptions }[] = [ - { name: "baseline", opts: {} }, - { - name: "inline", - opts: { optimizations: { inlineSingleAlternatives: true } }, - }, - { - name: "factor", - opts: { optimizations: { factorCommonPrefixes: true } }, - }, - { - name: "both", - opts: { - optimizations: { - inlineSingleAlternatives: true, - factorCommonPrefixes: true, - }, - }, - }, -]; - -function timeMs(fn: () => void, iterations: number): number { - const start = performance.now(); - for (let i = 0; i < iterations; i++) fn(); - return performance.now() - start; -} - -function countRulesParts( - grammar: ReturnType, -): number { - if (!grammar) return 0; - let count = 0; - const visit = (parts: GrammarPart[]) => { - for (const p of parts) { - if (p.type === "rules") { - count++; - for (const r of p.rules) visit(r.parts); - } - } - }; - for (const r of grammar.rules) visit(r.parts); - return count; -} - -function runBenchmark( - label: string, - grammarText: string, - requests: string[], -): void { - console.log(`\n=== ${label} ===`); - console.log( - `| config | RulesParts | match ms (${ITERATIONS}x) | speedup |`, - ); - console.log(`|-----------|-----------:|---------------:|--------:|`); - let baselineMs = 0; - for (const cfg of CONFIGS) { - const errors: string[] = []; - const grammar = loadGrammarRulesNoThrow( - "synthetic.grammar", - grammarText, - errors, - undefined, - cfg.opts, - ); - if (!grammar) { - console.log(`[error] ${cfg.name}: ${errors.join("; ")}`); - continue; - } - const partCount = countRulesParts(grammar); - try { - for (const r of requests) matchGrammar(grammar, r); - } catch (e) { - console.log( - `[error] 
${cfg.name} match failed: ${(e as Error).message}`, - ); - continue; - } - const ms = timeMs(() => { - for (const r of requests) matchGrammar(grammar, r); - }, ITERATIONS); - if (cfg.name === "baseline") baselineMs = ms; - const speedup = baselineMs > 0 ? baselineMs / ms : 1; - console.log( - `| ${cfg.name.padEnd(9)} | ${String(partCount).padStart(10)} | ${ms.toFixed(1).padStart(14)} | ${speedup.toFixed(2).padStart(6)}x |`, - ); - } -} - -// ─── Synthetic grammar builders ──────────────────────────────────────────── - -/** - * Pass-through chain: ` = ; = ; …; = "target"`. - * Each `
  • ` adds one nested RulesPart with no other semantics — - * exactly the shape `inlineSingleAlternatives` collapses. - */ -function buildPassthroughChain(depth: number): string { - const lines: string[] = [` = ;`]; - for (let i = 0; i < depth; i++) { - lines.push(` = ;`); - } - lines.push(` = target word here;`); - return lines.join("\n"); -} - -/** - * Wide common prefix: N alternatives that share the same long literal - * prefix and differ only in the last word. - * - * = perform the action with item one - * | perform the action with item two - * | … - */ -function buildWideCommonPrefix(width: number): string { - const prefix = "perform the action with item"; - const alts: string[] = []; - for (let i = 0; i < width; i++) { - const word = `value${String.fromCharCode(97 + (i % 26))}${Math.floor( - i / 26, - )}`; - alts.push(`${prefix} ${word} -> "${word}"`); - } - return ` = ;\n = ${alts.join("\n | ")};`; -} - -/** - * Combined pattern: two layers of pass-through wrapping around a wide - * common-prefix block — exercises both passes together. - */ -function buildCombined(width: number): string { - // Rename in the wide-prefix grammar and wrap it - // in a chain of pass-through rules. Keep every line of the - // renamed inner grammar so is actually defined. - const inner = buildWideCommonPrefix(width).replace("", ""); - return [` = ;`, ` = ;`, ` = ;`, inner].join( - "\n", - ); -} - -describe("Grammar Optimizer - Synthetic Benchmarks", () => { - (SHOULD_SKIP ? it.skip : it)("pass-through chain (depth=8)", () => { - const grammarText = buildPassthroughChain(8); - runBenchmark(`pass-through chain (depth=8)`, grammarText, [ - "target word here", - "miss", - "target word", - "no match here", - ]); - expect(true).toBe(true); - }); - - (SHOULD_SKIP ? it.skip : it)("wide common prefix (width=20)", () => { - const grammarText = buildWideCommonPrefix(20); - // Mix of matching & non-matching requests. 
- const requests = [ - "perform the action with item valuea0", - "perform the action with item valuet0", - "perform the action with item nothere", - "perform the action with", - "noise input that does not match", - ]; - runBenchmark(`wide common prefix (width=20)`, grammarText, requests); - expect(true).toBe(true); - }); - - (SHOULD_SKIP ? it.skip : it)("wide common prefix (width=50)", () => { - const grammarText = buildWideCommonPrefix(50); - const requests = [ - "perform the action with item valuea0", - "perform the action with item valuex0", - "perform the action with item valuew1", - "perform the action with item nothere", - "noise input", - ]; - runBenchmark(`wide common prefix (width=50)`, grammarText, requests); - expect(true).toBe(true); - }); - - (SHOULD_SKIP ? it.skip : it)( - "combined (depth=4 wrappers, width=20 prefix)", - () => { - const grammarText = buildCombined(20); - const requests = [ - "perform the action with item valuea0", - "perform the action with item valuek0", - "perform the action with item nothere", - "noise", - ]; - runBenchmark( - `combined (depth=4 wrappers, width=20 prefix)`, - grammarText, - requests, - ); - expect(true).toBe(true); - }, - ); -}); From e78b24af354385ed20165436a133d58774878b96 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Wed, 22 Apr 2026 13:58:06 -0700 Subject: [PATCH 03/16] Skip inlining shared single-alternative rules in grammar optimizer The inliner spreads a child rule's parts into the parent at the call site. When the child's GrammarRule[] array was referenced from multiple RulesParts (named-rule sharing established by the compiler), inlining at each site duplicated the child's parts, defeating the serializer's identity-based dedup and bloating .ag.json output proportional to the reference count. Add a one-time reference-count pre-pass over the input AST and refuse to inline any RulesPart whose body has more than one incoming reference. Single-reference single-alternative rules continue to be inlined. 
---
 .../actionGrammar/src/grammarOptimizer.ts     | 58 ++++++++++++-
 .../test/grammarOptimizerSharing.spec.ts      | 87 ++++++++++++++++++-
 2 files changed, 139 insertions(+), 6 deletions(-)

diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts
index 54e467b0f..608a8fcfe 100644
--- a/ts/packages/actionGrammar/src/grammarOptimizer.ts
+++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts
@@ -80,7 +80,13 @@ export function inlineSingleAlternativeRules(
 ): GrammarRule[] {
     const counter = { inlined: 0 };
     const memo: RulesArrayMemo = new Map();
-    const result = inlineRulesArray(rules, counter, memo);
+    // Reference count over the input AST: how many `RulesPart`s point at
+    // each `GrammarRule[]` array. Used to refuse inlining a shared
+    // array, which would otherwise duplicate the child's parts at every
+    // call site and bloat the serialized grammar (the serializer dedups
+    // by array identity).
+    const refCounts = countRulesArrayRefs(rules);
+    const result = inlineRulesArray(rules, counter, memo, refCounts);
     if (counter.inlined > 0) {
         debug(`inlined ${counter.inlined} single-alternative RulesParts`);
     }
@@ -89,16 +95,41 @@
 type RulesArrayMemo = Map<GrammarRule[], GrammarRule[]>;

+/**
+ * Count how many `RulesPart` references each `GrammarRule[]` array has
+ * across the AST reachable from `rules`. The top-level array itself is
+ * counted as 1 (treated as if held by an implicit root reference) so
+ * single-alternative top-level rules are also protected from inlining
+ * if shared. Recurses each unique array exactly once via `visited`.
+ */
+function countRulesArrayRefs(rules: GrammarRule[]): Map<GrammarRule[], number> {
+    const counts = new Map<GrammarRule[], number>();
+    const visited = new Set<GrammarRule[]>();
+    function walk(arr: GrammarRule[]) {
+        counts.set(arr, (counts.get(arr) ?? 
0) + 1); + if (visited.has(arr)) return; + visited.add(arr); + for (const r of arr) { + for (const p of r.parts) { + if (p.type === "rules") walk(p.rules); + } + } + } + walk(rules); + return counts; +} + function inlineRulesArray( rules: GrammarRule[], counter: { inlined: number }, memo: RulesArrayMemo, + refCounts: Map, ): GrammarRule[] { const cached = memo.get(rules); if (cached !== undefined) return cached; // Reserve the slot before recursing so cycles (if any) terminate. memo.set(rules, rules); - const next = rules.map((r) => inlineRule(r, counter, memo)); + const next = rules.map((r) => inlineRule(r, counter, memo, refCounts)); const changed = next.some((r, i) => r !== rules[i]); const result = changed ? next : rules; memo.set(rules, result); @@ -109,12 +140,14 @@ function inlineRule( rule: GrammarRule, counter: { inlined: number }, memo: RulesArrayMemo, + refCounts: Map, ): GrammarRule { const { parts, changed, valueSubstitutions, valueAssignment } = inlineParts( rule.parts, rule, counter, memo, + refCounts, ); if (!changed) { return rule; @@ -162,6 +195,7 @@ function inlineParts( parentRule: GrammarRule, counter: { inlined: number }, memo: RulesArrayMemo, + refCounts: Map, ): { parts: GrammarPart[]; changed: boolean; @@ -179,11 +213,27 @@ function inlineParts( } // Recurse into nested rules first (post-order), preserving // shared-array identity via memo. - const inlinedRules = inlineRulesArray(p.rules, counter, memo); + const inlinedRules = inlineRulesArray( + p.rules, + counter, + memo, + refCounts, + ); const rewritten: RulesPart = inlinedRules !== p.rules ? 
{ ...p, rules: inlinedRules } : p; - const replacement = tryInlineRulesPart(rewritten, parentRule); + // Refuse to inline a RulesPart whose body is shared by more than + // one reference: inlining duplicates the child's parts at the + // call site, but the original array is still referenced from the + // other call sites — net effect is N copies in the serialized + // grammar instead of 1 dedup'd entry. Reference counts come + // from the *input* AST; the rewritten array shares identity with + // it via the memo when no nested change occurred, and otherwise + // is unique to this site (so inlining is safe). + const shared = (refCounts.get(p.rules) ?? 1) > 1; + const replacement = shared + ? undefined + : tryInlineRulesPart(rewritten, parentRule); if (replacement !== undefined) { counter.inlined++; changed = true; diff --git a/ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts index 3bfac02da..42c0bce11 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerSharing.spec.ts @@ -56,8 +56,7 @@ describe("Grammar Optimizer - Shared rule identity preservation", () => { r.parts.length === 1 && r.parts[0].type === "string" && ["the song", "a track", "that tune"].some( - (s) => - (r.parts[0] as any).value.join(" ") === s, + (s) => (r.parts[0] as any).value.join(" ") === s, ), ), ) @@ -136,3 +135,87 @@ describe("Grammar Optimizer - Shared rule identity preservation", () => { }); } }); + +describe("Grammar Optimizer - Shared single-alternative rule is not inlined", () => { + // has a single alternative AND is referenced from multiple + // call sites. Inlining it would duplicate "the song" at every call + // site in the serialized JSON; the optimizer must refuse based on + // the input reference count. 
+ const text = ` = | | ; + = sing $(x:) -> x; + = play $(x:) -> x; + = hum $(x:) -> x; + = the song -> "song";`; + + function innerRulesArrays( + grammar: ReturnType, + ): GrammarRule[][] { + return findAllRulesParts(grammar.rules) + .filter( + (p) => + p.rules.length === 1 && + p.rules[0].parts.length === 1 && + p.rules[0].parts[0].type === "string" && + (p.rules[0].parts[0] as any).value.join(" ") === "the song", + ) + .map((p) => p.rules); + } + + it("inliner preserves shared array identity", () => { + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + const arrays = innerRulesArrays(optimized); + expect(arrays.length).toBeGreaterThanOrEqual(3); + for (let i = 1; i < arrays.length; i++) { + expect(arrays[i]).toBe(arrays[0]); + } + }); + + it("serialized output dedupes shared single-alt rule", () => { + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + const json = grammarToJson(optimized); + // Exactly one GrammarRulesJson entry should hold the "the song" + // body (the shared ). + const entries = json.filter( + (entry) => + Array.isArray(entry) && + entry.length === 1 && + entry[0].parts?.length === 1 && + entry[0].parts[0].type === "string" && + (entry[0].parts[0] as any).value.join(" ") === "the song", + ); + expect(entries.length).toBe(1); + }); + + it("still inlines a single-alternative rule referenced only once", () => { + const single = ` = sing $(x:) -> x; + = the song -> "song";`; + const baseline = loadGrammarRules("t.grammar", single); + const optimized = loadGrammarRules("t.grammar", single, { + optimizations: { inlineSingleAlternatives: true }, + }); + // The single reference should be inlined → fewer RulesParts. 
+ const baseCount = findAllRulesParts(baseline.rules).length; + const optCount = findAllRulesParts(optimized.rules).length; + expect(optCount).toBeLessThan(baseCount); + }); + + it("match results unchanged for shared single-alt ", () => { + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + for (const input of [ + "sing the song", + "play the song", + "hum the song", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); +}); From b2e1dc431e1784d309dcb132983ae470415e0c13 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Wed, 22 Apr 2026 14:19:11 -0700 Subject: [PATCH 04/16] Factor common prefixes across top-level rules in grammar optimizer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The matcher treats top-level alternatives the same way it treats inner RulesPart alternatives — each is queued as its own MatchState and produces its own GrammarMatchResult. So factoring shared prefixes across top-level rules is semantically safe (cardinality and per-rule values are preserved via the existing __opt_factor capture). After nested factoring completes, wrap the top-level Grammar.rules in a synthetic RulesPart and reuse factorRulesPart with the same fixed-point iteration used for nested groups. This destroys the 1:1 correspondence between top-level rule indices and the original source. That mapping must be recovered via separate metadata if anything downstream depends on it. Also fuse the map+some pass in inlineRulesArray and factorRulesArray into a single loop that only allocates a new array once an element actually changes. 
--- .../actionGrammar/src/grammarOptimizer.ts | 61 ++++++++++++++++--- .../test/grammarOptimizerFactoring.spec.ts | 26 ++++++++ 2 files changed, 77 insertions(+), 10 deletions(-) diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 608a8fcfe..7efac104a 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -129,9 +129,20 @@ function inlineRulesArray( if (cached !== undefined) return cached; // Reserve the slot before recursing so cycles (if any) terminate. memo.set(rules, rules); - const next = rules.map((r) => inlineRule(r, counter, memo, refCounts)); - const changed = next.some((r, i) => r !== rules[i]); - const result = changed ? next : rules; + // Single-pass: only allocate `next` once an element actually changes, + // then back-fill prior unchanged entries. Avoids the wasted map+some + // walk when no rule in this array is rewritten. + let next: GrammarRule[] | undefined; + for (let i = 0; i < rules.length; i++) { + const r = inlineRule(rules[i], counter, memo, refCounts); + if (next !== undefined) { + next.push(r); + } else if (r !== rules[i]) { + next = rules.slice(0, i); + next.push(r); + } + } + const result = next ?? rules; memo.set(rules, result); return result; } @@ -435,9 +446,14 @@ function findExistingVariable( /** * Walk all RulesParts and factor common leading parts shared by two or - * more alternatives within the same RulesPart. The top-level - * Grammar.rules array is not factored because each top-level alternative - * is reported separately by the matcher. + * more alternatives within the same RulesPart. 
After nested factoring + * completes, the top-level `Grammar.rules` array is also factored against + * itself: the matcher treats top-level alternatives the same way it + * treats inner `RulesPart` alternatives (each is queued as its own + * `MatchState` and produces its own result), so factoring is semantically + * safe. This intentionally destroys the 1:1 correspondence between + * top-level rule indices and the original source — that mapping must be + * recovered via separate metadata if needed downstream. * * Uses an identity memo over `GrammarRule[]` arrays so shared named * rules (multiple `RulesPart`s pointing at the same array) still share @@ -446,7 +462,22 @@ function findExistingVariable( export function factorCommonPrefixes(rules: GrammarRule[]): GrammarRule[] { const counter = { factored: 0 }; const memo: RulesArrayMemo = new Map(); - const result = factorRulesArray(rules, counter, memo); + let result = factorRulesArray(rules, counter, memo); + + // Top-level factoring: wrap the (already nested-factored) top-level + // rules in a synthetic `RulesPart` so we can reuse `factorRulesPart` + // unchanged. Iterate to a fixed point exactly like `factorParts` + // does for nested groups. Newly synthesized suffix `RulesPart`s + // produced here are not themselves re-walked, matching the existing + // behavior for nested factoring. + let working: RulesPart = { type: "rules", rules: result }; + for (let i = 0; i < 8; i++) { + const next = factorRulesPart(working, counter); + if (next === working) break; + working = next; + } + result = working.rules; + if (counter.factored > 0) { debug(`factored ${counter.factored} common prefix groups`); } @@ -461,9 +492,19 @@ function factorRulesArray( const cached = memo.get(rules); if (cached !== undefined) return cached; memo.set(rules, rules); - const next = rules.map((r) => factorRule(r, counter, memo)); - const changed = next.some((r, i) => r !== rules[i]); - const result = changed ? 
next : rules; + // Single-pass: only allocate `next` once an element actually changes + // (see inlineRulesArray for rationale). + let next: GrammarRule[] | undefined; + for (let i = 0; i < rules.length; i++) { + const r = factorRule(rules[i], counter, memo); + if (next !== undefined) { + next.push(r); + } else if (r !== rules[i]) { + next = rules.slice(0, i); + next.push(r); + } + } + const result = next ?? rules; memo.set(rules, result); return result; } diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts index 517a2c019..16822c140 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts @@ -117,4 +117,30 @@ describe("Grammar Optimizer - Common prefix factoring", () => { match(baseline, "a x a y"), ); }); + + it("factors common prefixes across top-level rules", () => { + // Three top-level alternatives all share "play the ". + // Top-level factoring should reduce the rule count and preserve + // match results. + const text = ` = play the song -> "song" + | play the track -> "track" + | play the album -> "album";`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + // Factoring collapses the 3 top-level alternatives into 1 + // (a shared-prefix rule with a 3-alternative suffix RulesPart). 
+ expect(optimized.rules.length).toBeLessThan(baseline.rules.length); + for (const input of [ + "play the song", + "play the track", + "play the album", + ]) { + const baseRes = match(baseline, input); + const optRes = match(optimized, input); + expect(optRes.length).toBe(baseRes.length); + expect(optRes).toEqual(expect.arrayContaining(baseRes)); + } + }); }); From d14cf8b9711bc69cd905328f99ee082ad3ab0869 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Wed, 22 Apr 2026 14:35:37 -0700 Subject: [PATCH 05/16] Drop unused fixed-point loop in factorCommonPrefixes The loop existed defensively but the current grouping/intersection logic converges in a single pass: groups are seeded from rules that pairwise fail the share check, so each group's canonical prefix can't share with any other group's prefix on a second pass. Newly synthesized suffix RulesParts are intentionally not re-walked, and single-rule wrappers hit the early return. Call factorRulesPart once at each site; remove factorRulesPartToFixedPoint. --- .../actionGrammar/src/grammarOptimizer.ts | 27 +++++-------------- 1 file changed, 7 insertions(+), 20 deletions(-) diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 7efac104a..5e4c4fea0 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -466,17 +466,11 @@ export function factorCommonPrefixes(rules: GrammarRule[]): GrammarRule[] { // Top-level factoring: wrap the (already nested-factored) top-level // rules in a synthetic `RulesPart` so we can reuse `factorRulesPart` - // unchanged. Iterate to a fixed point exactly like `factorParts` - // does for nested groups. Newly synthesized suffix `RulesPart`s - // produced here are not themselves re-walked, matching the existing - // behavior for nested factoring. 
- let working: RulesPart = { type: "rules", rules: result }; - for (let i = 0; i < 8; i++) { - const next = factorRulesPart(working, counter); - if (next === working) break; - working = next; - } - result = working.rules; + // unchanged. Newly synthesized suffix `RulesPart`s produced here are + // not themselves re-walked, matching the existing behavior for nested + // factoring. + const wrapper: RulesPart = { type: "rules", rules: result }; + result = factorRulesPart(wrapper, counter).rules; if (counter.factored > 0) { debug(`factored ${counter.factored} common prefix groups`); @@ -534,17 +528,10 @@ function factorParts( // Recurse into nested rules first, preserving shared-array // identity via memo. const recursedRules = factorRulesArray(p.rules, counter, memo); - let working: RulesPart = + const recursed: RulesPart = recursedRules !== p.rules ? { ...p, rules: recursedRules } : p; - // Factor with bounded iteration to fixed point. Newly produced - // suffix `RulesPart`s are not shared by construction, so they - // don't need memo entries. - for (let i = 0; i < 8; i++) { - const next = factorRulesPart(working, counter); - if (next === working) break; - working = next; - } + const working = factorRulesPart(recursed, counter); if (working !== p) changed = true; out.push(working); } From dda9ea7abd026da01121671b65053c70ac4c63b6 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Wed, 22 Apr 2026 17:05:34 -0700 Subject: [PATCH 06/16] actionGrammar: rewrite factorRulesPart as trie + post-order emission Replaces the pairwise-shape-comparison factoring with an incremental trie: rules are inserted edge-by-edge (per-token for StringPart; conservative match by typeName / array identity / matcherName for other parts), and the trie is then walked post-order to emit factored rules. Variables on wildcard / number / rules edges are canonicalized to the first inserter's name, with later inserters' value expressions remapped at emission. 
Behavioral notes: - Suffix sub-prefixes are now factored (e.g. 'play song x | play song y | play album z' factors both 'play' and inner 'song'). - A failed eligibility check at one fork now triggers a local bailout (members are flattened with their prefix prepended) instead of refusing to factor the whole group, allowing factoring above the failing fork. - Strengthened the wholly-consumed check to refuse empty-parts members regardless of value-presence (the matcher's default-value resolver cannot handle 'parts:[], value: undefined' inside a wrapped RulesPart). Tests: existing 90 optimizer tests pass; added 10 new tests covering the suffix-sharing win and 9 risk categories (cross-scope bailout, RulesPart array-identity preservation, strict-prefix overlap with mixed values, multi-level factoring, variable canonicalization including object-shorthand, interleaved match-order preservation, nested wrapper-variable scoping). --- .../actionGrammar/src/grammarOptimizer.ts | 738 ++++++++++-------- .../test/grammarOptimizerFactoring.spec.ts | 39 + .../test/grammarOptimizerTrieRisks.spec.ts | 235 ++++++ 3 files changed, 665 insertions(+), 347 deletions(-) create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 5e4c4fea0..1b2a806a8 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -4,6 +4,7 @@ import registerDebug from "debug"; import { CompiledObjectElement, + CompiledSpacingMode, CompiledValueNode, Grammar, GrammarPart, @@ -539,8 +540,33 @@ function factorParts( } /** - * One pass of common-prefix factoring inside a single RulesPart. - * Returns the same object if nothing changed. + * Common-prefix factoring inside a single RulesPart, implemented as a + * trie-build + post-order emission. 
+ * + * Each rule is inserted as a sequence of "atomic" steps: + * - StringPart explodes into one ("string", token) edge per token in + * `value[]` (so `["play", "song"]` and `["play", "album"]` share the + * "play" edge but branch at "song"/"album"); + * - VarStringPart, VarNumberPart, RulesPart, PhraseSetPart each yield + * one edge. RulesPart edges key by `rules` array identity so that + * two `` references share the same edge — preserving the + * dedup invariant `grammarSerializer.ts` relies on. + * + * Variables on wildcard/number/rules edges are carried by the first + * inserter; later inserters with different names accumulate a per-rule + * remap (local→canonical) that is applied to the terminal's `value` at + * emission time. + * + * Emission walks the trie post-order: single-child / no-terminal chains + * are path-compressed back into a flat parts array (with adjacent + * StringParts re-merged), and multi-member nodes become wrapper + * `RulesPart`s. Per-fork eligibility checks are applied at each wrapper + * site; failure causes a *local* bailout — the would-be members are + * emitted as separate full rules with the canonical prefix prepended, + * losing factoring at that fork only (factoring above and below the + * fork still applies). + * + * Returns the same object if no factoring took place. */ function factorRulesPart( part: RulesPart, @@ -551,348 +577,407 @@ function factorRulesPart( // such groups untouched to stay safe. return part; } - const rules = part.rules; - if (rules.length < 2) return part; - - // Group alternatives that share at least one leading part (or at - // least one leading string token) with the group's lead alternative. - // Preserve original ordering. 
- const groups: { members: number[] }[] = []; - const consumed = new Set(); - for (let i = 0; i < rules.length; i++) { - if (consumed.has(i)) continue; - const group: { members: number[] } = { members: [i] }; - consumed.add(i); - for (let j = i + 1; j < rules.length; j++) { - if (consumed.has(j)) continue; - const sp = sharedPrefixShape(rules[i], rules[j]); - if (sp.fullParts > 0 || sp.stringTokens > 0) { - group.members.push(j); - consumed.add(j); - } - } - groups.push(group); - } + if (part.rules.length < 2) return part; - if (groups.every((g) => g.members.length < 2)) return part; + const root: TrieNode = { children: [], terminals: [], firstIdx: 0 }; + for (let i = 0; i < part.rules.length; i++) { + insertRuleIntoTrie(root, part.rules[i], i); + } + const state: EmitState = { didFactor: false }; + const items: { idx: number; rules: GrammarRule[] }[] = []; + for (const c of root.children) { + items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); + } + items.sort((a, b) => a.idx - b.idx); const newRules: GrammarRule[] = []; - let didFactor = false; - for (const g of groups) { - if (g.members.length < 2) { - newRules.push(rules[g.members[0]]); - continue; - } - const members = g.members.map((i) => rules[i]); - // Intersect prefix shapes across all members (using member[0] as - // canonical reference). 
- let shape: PrefixShape = { - fullParts: members[0].parts.length, - stringTokens: 0, - }; - for (let mi = 1; mi < members.length; mi++) { - const s = sharedPrefixShape(members[0], members[mi]); - if (s.fullParts < shape.fullParts) { - shape = { - fullParts: s.fullParts, - stringTokens: s.stringTokens, - }; - } else if ( - s.fullParts === shape.fullParts && - s.stringTokens < shape.stringTokens - ) { - shape = { - fullParts: s.fullParts, - stringTokens: s.stringTokens, - }; - } - } - if (shape.fullParts === 0 && shape.stringTokens === 0) { - for (const m of members) newRules.push(m); - continue; - } + for (const it of items) newRules.push(...it.rules); - // Refuse to factor if any alternative would be wholly consumed by - // the shared prefix AND has a value expression — the suffix - // alternative would become empty-parts. - const wholeConsumed = (m: GrammarRule): boolean => { - if ( - m.parts.length !== - shape.fullParts + (shape.stringTokens > 0 ? 1 : 0) - ) { - return false; - } - if (shape.stringTokens === 0) { - return m.parts.length === shape.fullParts; - } - const last = m.parts[shape.fullParts]; - return ( - last.type === "string" && - last.value.length === shape.stringTokens - ); - }; - if (members.some((m) => wholeConsumed(m) && m.value !== undefined)) { - for (const m of members) newRules.push(m); - continue; - } + if (!state.didFactor) return part; + counter.factored++; + return { ...part, rules: newRules }; +} - // Build canonical prefix parts. - const canonicalParts: GrammarPart[] = members[0].parts.slice( - 0, - shape.fullParts, - ); - if (shape.stringTokens > 0) { - const lead = members[0].parts[shape.fullParts]; - if (lead.type !== "string") { - // Shouldn't happen (shape guarantees string), bail safely. 
- for (const m of members) newRules.push(m); - continue; - } - canonicalParts.push({ - type: "string", - value: lead.value.slice(0, shape.stringTokens), - }); - } - const canonicalNames = collectVariableNames(canonicalParts); - - // Build per-member variable remap from member-local prefix names - // to canonical names taken from the lead alternative. Only the - // full-parts range carries variables (partial string tokens have - // no variable bindings). - const memberRemaps: Map[] = members.map((m) => - buildPrefixRemap(canonicalParts, m.parts, shape.fullParts), - ); +// ── Trie data structures ───────────────────────────────────────────────── - // Compute per-member suffix parts, splitting the partial - // StringPart if needed. - const memberSuffixParts: GrammarPart[][] = members.map((m) => { - if (shape.stringTokens === 0) { - return m.parts.slice(shape.fullParts); - } - const lead = m.parts[shape.fullParts]; - if (lead.type !== "string") { - return m.parts.slice(shape.fullParts); // defensive - } - const remaining = lead.value.slice(shape.stringTokens); - const rest = m.parts.slice(shape.fullParts + 1); - if (remaining.length === 0) { - return rest; - } - return [ - { type: "string", value: remaining } as GrammarPart, - ...rest, - ]; - }); - - // Verify suffix bindings won't shadow shared canonical names. - let collision = false; - for (let mi = 0; mi < members.length && !collision; mi++) { - const suffixVars = collectVariableNames(memberSuffixParts[mi]); - const remap = memberRemaps[mi]; - for (const v of suffixVars) { - const renamed = remap.get(v) ?? 
v; - if (canonicalNames.has(renamed)) { - collision = true; - break; - } - } - } - if (collision) { - for (const m of members) newRules.push(m); - continue; - } +type TrieEdge = + | { kind: "string"; token: string } + | { + kind: "wildcard"; + typeName: string; + optional: boolean; + canonicalVariable: string; + } + | { kind: "number"; optional: boolean; canonicalVariable: string } + | { + kind: "rules"; + rules: GrammarRule[]; + optional: boolean; + repeat: boolean; + name: string | undefined; + canonicalVariable: string | undefined; + } + | { kind: "phraseSet"; matcherName: string }; - // Refuse to factor when any member's value expression references - // a variable bound in the shared prefix. The matcher scopes - // value variables per nested rule, so the suffix's value cannot - // see canonical-prefix bindings — factoring would break match - // results. - let crossScopeRef = false; - for (let mi = 0; mi < members.length && !crossScopeRef; mi++) { - const m = members[mi]; - if (m.value === undefined) continue; - const remap = memberRemaps[mi]; - const referenced = collectVariableReferences(m.value); - for (const v of referenced) { - const renamed = remap.get(v) ?? v; - if (canonicalNames.has(renamed)) { - crossScopeRef = true; - break; - } +type Terminal = { + idx: number; + value: CompiledValueNode | undefined; + spacingMode: CompiledSpacingMode | undefined; + /** local→canonical variable rename accumulated along the path. */ + remap: Map; +}; + +type TrieNode = { + /** undefined only at the root. */ + edge?: TrieEdge; + children: TrieNode[]; + terminals: Terminal[]; + /** Lowest insertion index of any rule passing through this node. */ + firstIdx: number; +}; + +type EmitState = { didFactor: boolean }; + +/** A node is "linear" iff it has no terminals and exactly one child. 
*/ +function isLinearNode(n: TrieNode): boolean { + return n.terminals.length === 0 && n.children.length === 1; +} + +// ── Trie insertion ─────────────────────────────────────────────────────── + +function insertRuleIntoTrie( + root: TrieNode, + rule: GrammarRule, + idx: number, +): void { + let node = root; + const remap = new Map(); + for (const stepEdge of partsToEdgeSteps(rule.parts)) { + let matched: TrieNode | undefined; + for (const c of node.children) { + if (c.edge && edgeKeyMatches(c.edge, stepEdge)) { + matched = c; + break; } } - if (crossScopeRef) { - for (const m of members) newRules.push(m); - continue; + if (matched !== undefined) { + collectStepRemap(matched.edge!, stepEdge, remap); + node = matched; + } else { + const newChild: TrieNode = { + edge: stepEdge, + children: [], + terminals: [], + firstIdx: idx, + }; + node.children.push(newChild); + node = newChild; } + } + node.terminals.push({ + idx, + value: rule.value, + spacingMode: rule.spacingMode, + remap, + }); +} - // Refuse to factor when value-presence pattern is mixed across - // members. Mixing explicit-value and implicit-default alternatives - // inside a new wrapper rule changes the matcher's default-value - // semantics for the implicit cases. - const valuePresence = members.map((m) => m.value !== undefined); - const allHaveValue = valuePresence.every((v) => v); - const noneHaveValue = valuePresence.every((v) => !v); - if (!allHaveValue && !noneHaveValue) { - for (const m of members) newRules.push(m); - continue; +/** Yield each rule.parts as a sequence of trie edges (StringPart explodes). 
*/ +function* partsToEdgeSteps(parts: GrammarPart[]): Generator { + for (const p of parts) { + switch (p.type) { + case "string": + for (const tok of p.value) yield { kind: "string", token: tok }; + break; + case "wildcard": + yield { + kind: "wildcard", + typeName: p.typeName, + optional: !!p.optional, + canonicalVariable: p.variable, + }; + break; + case "number": + yield { + kind: "number", + optional: !!p.optional, + canonicalVariable: p.variable, + }; + break; + case "rules": + yield { + kind: "rules", + rules: p.rules, + optional: !!p.optional, + repeat: !!p.repeat, + name: p.name, + canonicalVariable: p.variable, + }; + break; + case "phraseSet": + yield { kind: "phraseSet", matcherName: p.matcherName }; + break; } + } +} - // Refuse to factor when (no member has explicit value) and any - // suffix would end up with a multi-part shape: the matcher's - // single-part default-value rule no longer applies, silently - // turning a valid default into `undefined`. - if (noneHaveValue) { - const anySuffixMultipart = members.some((m) => { - const suffixLen = - m.parts.length - - shape.fullParts - - (shape.stringTokens > 0 && - m.parts[shape.fullParts]?.type === "string" && - (m.parts[shape.fullParts] as any).value.length === - shape.stringTokens - ? 1 - : 0); - return suffixLen > 1; - }); - if (anySuffixMultipart) { - for (const m of members) newRules.push(m); - continue; - } +/** True if step's key fields match the existing edge (ignoring variable). 
*/ +function edgeKeyMatches(edge: TrieEdge, step: TrieEdge): boolean { + if (edge.kind !== step.kind) return false; + switch (edge.kind) { + case "string": + return edge.token === (step as typeof edge).token; + case "wildcard": { + const s = step as typeof edge; + return edge.typeName === s.typeName && edge.optional === s.optional; } - - const suffixRules: GrammarRule[] = members.map((m, mi) => { - const remap = memberRemaps[mi]; - const suffixParts = memberSuffixParts[mi].map((p) => - remapPartVariables(p, remap), + case "number": { + const s = step as typeof edge; + return edge.optional === s.optional; + } + case "rules": { + const s = step as typeof edge; + return ( + edge.rules === s.rules && + edge.optional === s.optional && + edge.repeat === s.repeat ); - const suffixValue = - m.value !== undefined - ? remapValueVariables(m.value, remap) - : undefined; - const out: GrammarRule = { parts: suffixParts }; - if (suffixValue !== undefined) out.value = suffixValue; - if (m.spacingMode !== undefined) out.spacingMode = m.spacingMode; - return out; - }); - - // If any suffix carries a value expression, the factored wrapper - // rule must capture it — otherwise the matcher's value-tracking - // policy would drop the nested value (parent has > 1 part with no - // explicit value). Generate a fresh variable name that does not - // collide with the shared prefix or any suffix. 
- const anySuffixHasValue = suffixRules.some( - (r) => r.value !== undefined, - ); - const suffixRulesPart: RulesPart = { - type: "rules", - rules: suffixRules, - }; - const factoredAlt: GrammarRule = { - parts: [...canonicalParts, suffixRulesPart], - }; - if (anySuffixHasValue) { - const reserved = new Set(canonicalNames); - for (const r of suffixRules) { - for (const v of collectVariableNames(r.parts)) reserved.add(v); - } - let gen = "__opt_factor"; - let i = 0; - while (reserved.has(gen)) { - i++; - gen = `__opt_factor_${i}`; - } - suffixRulesPart.variable = gen; - factoredAlt.value = { type: "variable", name: gen }; } - const firstSpacing = members[0].spacingMode; - if ( - members.every((m) => m.spacingMode === firstSpacing) && - firstSpacing !== undefined - ) { - factoredAlt.spacingMode = firstSpacing; + case "phraseSet": { + const s = step as typeof edge; + return edge.matcherName === s.matcherName; } - - newRules.push(factoredAlt); - didFactor = true; - counter.factored++; } +} - if (!didFactor) return part; - return { ...part, rules: newRules }; +function collectStepRemap( + canonicalEdge: TrieEdge, + stepEdge: TrieEdge, + remap: Map, +): void { + if (canonicalEdge.kind === "string" || canonicalEdge.kind === "phraseSet") { + return; + } + const canonical = canonicalEdge.canonicalVariable; + const local = (stepEdge as typeof canonicalEdge).canonicalVariable; + if (canonical !== undefined && local !== undefined && canonical !== local) { + remap.set(local, canonical); + } } -// Compare two parts for "structurally equal modulo variable name". 
-function partsEqualForFactoring(a: GrammarPart, b: GrammarPart): boolean { - if (a.type !== b.type) return false; - switch (a.type) { - case "string": { - const bs = b as typeof a; - if (a.value.length !== bs.value.length) return false; - for (let i = 0; i < a.value.length; i++) { - if (a.value[i] !== bs.value[i]) return false; - } - return true; - } - case "phraseSet": - return a.matcherName === (b as typeof a).matcherName; +// ── Trie emission ──────────────────────────────────────────────────────── + +function edgeToPart(edge: TrieEdge): GrammarPart { + switch (edge.kind) { + case "string": + return { type: "string", value: [edge.token] }; case "wildcard": { - const bw = b as typeof a; - return ( - a.typeName === bw.typeName && - (a.optional ?? false) === (bw.optional ?? false) - ); + const out: GrammarPart = { + type: "wildcard", + typeName: edge.typeName, + variable: edge.canonicalVariable, + }; + if (edge.optional) out.optional = true; + return out; } case "number": { - const bn = b as typeof a; - return (a.optional ?? false) === (bn.optional ?? false); + const out: GrammarPart = { + type: "number", + variable: edge.canonicalVariable, + }; + if (edge.optional) out.optional = true; + return out; } case "rules": { - const br = b as typeof a; - return ( - a.rules === br.rules && - (a.optional ?? false) === (br.optional ?? false) && - (a.repeat ?? false) === (br.repeat ?? 
false) - ); + const out: RulesPart = { type: "rules", rules: edge.rules }; + if (edge.canonicalVariable !== undefined) { + out.variable = edge.canonicalVariable; + } + if (edge.optional) out.optional = true; + if (edge.repeat) out.repeat = true; + if (edge.name !== undefined) out.name = edge.name; + return out; } + case "phraseSet": + return { type: "phraseSet", matcherName: edge.matcherName }; } } -function sharedPrefixLength(a: GrammarRule, b: GrammarRule): number { - const max = Math.min(a.parts.length, b.parts.length); - let i = 0; - while (i < max && partsEqualForFactoring(a.parts[i], b.parts[i])) i++; - return i; +/** Append `part` to `prefix`, folding when both ends are StringParts. */ +function appendPart(prefix: GrammarPart[], part: GrammarPart): GrammarPart[] { + if (prefix.length === 0) return [part]; + const last = prefix[prefix.length - 1]; + if (last.type === "string" && part.type === "string") { + const merged: GrammarPart = { + type: "string", + value: [...last.value, ...part.value], + }; + return [...prefix.slice(0, prefix.length - 1), merged]; + } + return [...prefix, part]; } -type PrefixShape = { - // Number of leading parts where both rules match via - // partsEqualForFactoring. - fullParts: number; - // If the next part on both sides is a StringPart with a non-empty - // common leading token sequence, this records its length. - stringTokens: number; -}; +/** Concatenate two parts arrays, folding at the seam if both ends are strings. 
*/ +function concatParts(a: GrammarPart[], b: GrammarPart[]): GrammarPart[] { + if (a.length === 0) return b.slice(); + if (b.length === 0) return a.slice(); + const last = a[a.length - 1]; + const first = b[0]; + if (last.type === "string" && first.type === "string") { + const merged: GrammarPart = { + type: "string", + value: [...last.value, ...first.value], + }; + return [...a.slice(0, a.length - 1), merged, ...b.slice(1)]; + } + return [...a, ...b]; +} + +function terminalToRule(t: Terminal): GrammarRule { + let value = t.value; + if (value !== undefined && t.remap.size > 0) { + value = remapValueVariables(value, t.remap); + } + const out: GrammarRule = { parts: [] }; + if (value !== undefined) out.value = value; + if (t.spacingMode !== undefined) out.spacingMode = t.spacingMode; + return out; +} + +/** + * Emit the subtree rooted at `node` (whose edge becomes the first part). + * - Returns one rule when the subtree is a single linear path or + * factors cleanly at the first fork. + * - Returns multiple when a fork's eligibility check failed (bailout): + * each would-be member is emitted as a full rule with the canonical + * prefix prepended. + */ +function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { + // Path-compress: walk down single-child / no-terminal chain, but + // stop *before* entering a node that would itself be a fork — that + // way the fork's edge becomes the first part of each emitted member + // (avoiding empty-parts members at the fork, which would defeat + // factoring via the wholeConsumed-with-value check below). 
+ let prefix: GrammarPart[] = [edgeToPart(node.edge!)]; + let current = node; + while ( + current.terminals.length === 0 && + current.children.length === 1 && + isLinearNode(current.children[0]) + ) { + current = current.children[0]; + prefix = appendPart(prefix, edgeToPart(current.edge!)); + } -function sharedPrefixShape(a: GrammarRule, b: GrammarRule): PrefixShape { - const full = sharedPrefixLength(a, b); - let stringTokens = 0; - if (full < a.parts.length && full < b.parts.length) { - const pa = a.parts[full]; - const pb = b.parts[full]; - if (pa.type === "string" && pb.type === "string") { - const max = Math.min(pa.value.length, pb.value.length); - while ( - stringTokens < max && - pa.value[stringTokens] === pb.value[stringTokens] - ) { - stringTokens++; + // Members at this fork = its terminals (each as an empty-parts rule) + // plus each child's emitted subtree (in original insertion order). + const items: { idx: number; rules: GrammarRule[] }[] = []; + for (const t of current.terminals) { + items.push({ idx: t.idx, rules: [terminalToRule(t)] }); + } + for (const c of current.children) { + items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); + } + items.sort((a, b) => a.idx - b.idx); + const members: GrammarRule[] = []; + for (const it of items) members.push(...it.rules); + + if (members.length === 0) { + // Defensive: every reachable node has terminals or children. + return [{ parts: prefix }]; + } + if (members.length === 1) { + const m = members[0]; + return [{ ...m, parts: concatParts(prefix, m.parts) }]; + } + + // Multi-member fork: try to wrap; bail out if any check fails. + if (checkFactoringEligible(prefix, members) !== undefined) { + return members.map((m) => ({ + ...m, + parts: concatParts(prefix, m.parts), + })); + } + state.didFactor = true; + return [buildWrapperRule(prefix, members)]; +} + +/** + * Per-fork eligibility checks (lifted from the previous implementation). 
+ * Returns `undefined` when factoring is safe, or a short reason string. + */ +function checkFactoringEligible( + prefix: GrammarPart[], + members: GrammarRule[], +): string | undefined { + // Empty-parts members never compose cleanly inside a wrapped + // RulesPart: with a value, the matcher would have to treat + // `{parts:[], value: V}` as a degenerate match (today's algorithm + // refuses this); without a value, the matcher's default-value + // resolver throws ("missing value for default") because the + // empty-parts rule has nothing to default from. + if (members.some((m) => m.parts.length === 0)) { + return "whole-consumed"; + } + const valuePresence = members.map((m) => m.value !== undefined); + const allHaveValue = valuePresence.every((v) => v); + const noneHaveValue = valuePresence.every((v) => !v); + if (!allHaveValue && !noneHaveValue) { + return "mixed-value-presence"; + } + if (noneHaveValue && members.some((m) => m.parts.length > 1)) { + return "implicit-default-multipart"; + } + const canonicalNames = collectVariableNames(prefix); + if (canonicalNames.size > 0) { + for (const m of members) { + if (m.value !== undefined) { + for (const v of collectVariableReferences(m.value)) { + if (canonicalNames.has(v)) return "cross-scope-ref"; + } + } + for (const v of collectVariableNames(m.parts)) { + if (canonicalNames.has(v)) return "binding-shadow"; } } } - return { fullParts: full, stringTokens }; + return undefined; +} + +function buildWrapperRule( + prefix: GrammarPart[], + members: GrammarRule[], +): GrammarRule { + const suffixRulesPart: RulesPart = { type: "rules", rules: members }; + const factoredAlt: GrammarRule = { + parts: [...prefix, suffixRulesPart], + }; + if (members.some((m) => m.value !== undefined)) { + const reserved = new Set(collectVariableNames(prefix)); + for (const m of members) { + for (const v of collectVariableNames(m.parts)) reserved.add(v); + } + let gen = "__opt_factor"; + let i = 0; + while (reserved.has(gen)) { + i++; + gen 
= `__opt_factor_${i}`; + } + suffixRulesPart.variable = gen; + factoredAlt.value = { type: "variable", name: gen }; + } + const firstSpacing = members[0].spacingMode; + if ( + firstSpacing !== undefined && + members.every((m) => m.spacingMode === firstSpacing) + ) { + factoredAlt.spacingMode = firstSpacing; + } + return factoredAlt; } +// ── Variable name / value-expression utilities (shared with inliner) ───── + function collectVariableNames(parts: GrammarPart[]): Set { const out = new Set(); for (const p of parts) { @@ -971,47 +1056,6 @@ function collectVariableReferences(node: CompiledValueNode): Set { return out; } -function buildPrefixRemap( - canonicalParts: GrammarPart[], - memberParts: GrammarPart[], - sharedLen: number, -): Map { - const remap = new Map(); - for (let i = 0; i < sharedLen; i++) { - const cv = bindingName(canonicalParts[i]); - const mv = bindingName(memberParts[i]); - if (cv !== undefined && mv !== undefined && cv !== mv) { - remap.set(mv, cv); - } - } - return remap; -} - -function remapPartVariables( - part: GrammarPart, - remap: Map, -): GrammarPart { - if (remap.size === 0) return part; - switch (part.type) { - case "wildcard": - case "number": - if (part.variable && remap.has(part.variable)) { - return { ...part, variable: remap.get(part.variable)! }; - } - return part; - case "rules": - // Rename this part's own variable; do NOT recurse into nested - // rules — those have their own scope. - if (part.variable && remap.has(part.variable)) { - return { ...part, variable: remap.get(part.variable)! 
}; - } - return part; - case "string": - case "phraseSet": - return part; - } -} - function remapValueVariables( node: CompiledValueNode, remap: Map, diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts index 16822c140..66841d2a3 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts @@ -118,6 +118,45 @@ describe("Grammar Optimizer - Common prefix factoring", () => { ); }); + it("factors shared sub-prefixes inside the suffix group", () => { + // Two of the three alternatives share a longer prefix (`play song`) + // beyond the global shared prefix (`play `). The optimizer should + // factor the deeper sharing as well, not just the outermost. + const text = ` = ; + = play song $(x:string) -> { kind: "song-x", x } + | play song $(y:string) -> { kind: "song-y", y } + | play album $(z:string) -> { kind: "album", z };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play song hello", + "play album world", + "play unknown", + ]) { + const baseRes = match(baseline, input); + const optRes = match(optimized, input); + expect(optRes.length).toBe(baseRes.length); + expect(optRes).toEqual(expect.arrayContaining(baseRes)); + } + // Structural: the optimized AST should have nested factoring — + // top-level RulesPart with one alternative whose suffix RulesPart + // itself contains a further factored rule for `song`. + const optChoice = findFirstRulesPart(optimized.rules); + expect(optChoice).toBeDefined(); + // reduces to a single shared-prefix wrapper. + expect(optChoice!.rules.length).toBe(1); + const factored = optChoice!.rules[0]; + // Find the inner RulesPart (the suffix group). 
+ const innerWrapper = factored.parts.find((p) => p.type === "rules"); + expect(innerWrapper).toBeDefined(); + // The inner suffix group should have collapsed `song x | song y` so + // its rule count is 2 (one combined `song …` alt + the `album …` + // alt) rather than 3. + expect((innerWrapper as { rules: unknown[] }).rules.length).toBe(2); + }); + it("factors common prefixes across top-level rules", () => { // Three top-level alternatives all share "play the ". // Top-level factoring should reduce the rule count and preserve diff --git a/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts new file mode 100644 index 000000000..3f385b180 --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts @@ -0,0 +1,235 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Targeted tests for the trie-based common-prefix factoring rewrite: + * each test exercises a specific risk category called out during the + * design (see grammarOptimizer.ts factorRulesPart docstring). + */ + +import { loadGrammarRules } from "../src/grammarLoader.js"; +import { matchGrammar } from "../src/grammarMatcher.js"; +import { GrammarPart, GrammarRule, RulesPart } from "../src/grammarTypes.js"; + +function match(grammar: ReturnType, s: string) { + return matchGrammar(grammar, s).map((m) => m.match); +} + +function findAllRulesParts(rules: GrammarRule[]): RulesPart[] { + const out: RulesPart[] = []; + const visit = (parts: GrammarPart[]) => { + for (const p of parts) { + if (p.type === "rules") { + out.push(p); + for (const r of p.rules) visit(r.parts); + } + } + }; + for (const r of rules) visit(r.parts); + return out; +} + +describe("Grammar Optimizer - Trie risks", () => { + // ── Risk: cross-scope reference forces bailout, but factoring above + // the bailed fork still applies. 
+ it("bailout at one fork still allows factoring above", () => { + // binds `trackName`; both alternatives reference + // it in their value expression — factoring the RulesPart + // would put the binding into outer scope, which the matcher + // can't see. The deep fork bails, but `play` should still get + // factored at the outer level. + const text = ` = ; + = $(trackName:string) -> trackName | the $(trackName:string) -> trackName; + = play $(trackName:) by $(artist:string) -> { kind: "by", trackName, artist } + | play $(trackName:) from album $(albumName:string) -> { kind: "from", trackName, albumName };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play hello by alice", + "play the world from album unity", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: shared RulesPart array identity must be preserved (the + // serializer dedupes by Map). + it("preserves shared RulesPart array identity", () => { + // Two top-level alternatives both reference . After + // factoring, every emitted RulesPart that points at + // should share the same `rules` array object. + const text = ` = ; + = a -> 1 | b -> 2; + = play $(x:) -> { kind: "play", x } + | stop $(x:) -> { kind: "stop", x };`; + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + const innerRulesArrays = new Set(); + for (const rp of findAllRulesParts(optimized.rules)) { + // Heuristic: any RulesPart with two child rules whose first + // parts are both phraseSet/string and whose values are 1 / 2 + // is the body. 
+ if ( + rp.rules.length === 2 && + rp.rules[0].value !== undefined && + rp.rules[1].value !== undefined && + JSON.stringify(rp.rules[0].value) === + '{"type":"literal","value":1}' && + JSON.stringify(rp.rules[1].value) === + '{"type":"literal","value":2}' + ) { + innerRulesArrays.add(rp.rules); + } + } + // Both references to produce edges that point at the + // same `rules` array (Set size === 1). + expect(innerRulesArrays.size).toBe(1); + }); + + // ── Risk: a rule whose entire path is a strict prefix of another's + // path becomes a terminal AND a forking node at the same + // trie spot. + it("handles a rule that is a strict prefix of another (no values)", () => { + const text = ` = ; + = play + | play song;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play", "play song"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: same as above, but mixed value-presence forces bailout + // (the shorter rule has explicit value, the longer doesn't). + it("handles strict-prefix overlap with mixed value-presence", () => { + const text = ` = ; + = play -> "just" + | play song;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play", "play song"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: deep multi-level factoring — three layers of shared + // prefix should all collapse. 
+ it("factors at multiple depths (a b c x | a b c y | a b d z)", () => { + const text = ` = ; + = a b c x -> 1 + | a b c y -> 2 + | a b d z -> 3;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["a b c x", "a b c y", "a b d z"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: variable name collision across alternatives — the lead + // alternative's variable name wins; later alternatives' value + // expressions must be remapped. + it("canonicalizes variable names from differently-named bindings", () => { + const text = ` = ; + = play $(track:string) once -> { kind: "once", track } + | play $(song:string) twice -> { kind: "twice", v: song };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play hello once", "play hello twice"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: factoring intersects with object-shorthand value. After + // remapping, `{ name }` from a non-lead alternative must be + // expanded to `{ name: }` so the field key + // doesn't change. 
+ it("rewrites object shorthand keys when remapping (non-lead alt)", () => { + const text = ` = ; + = greet $(name:string) -> { name } + | greet $(other:string) twice -> { other };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["greet alice", "greet bob twice"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: order preservation across multiple groups at the same + // trie level — output ordering should match the original + // rule order semantically (same matches). + it("preserves match order across interleaved groups", () => { + // Three groups: foo*, bar*, foo* again. Trie merges the two + // foo* rules (insertion-order at root), bar stays separate. + const text = ` = ; + = foo a -> 1 + | bar -> 2 + | foo b -> 3;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["foo a", "foo b", "bar"]) { + // Same multi-set of match results. + const baseRes = match(baseline, input).map((m) => + JSON.stringify(m), + ); + const optRes = match(optimized, input).map((m) => + JSON.stringify(m), + ); + expect(optRes.sort()).toStrictEqual(baseRes.sort()); + } + }); + + // ── Risk: nested factoring + outer factoring composing — the inner + // RulesPart returned by emit() is reused as a member at the + // outer level, so the wrapper's variable name must not + // collide with the inner wrapper's. 
+ it("avoids wrapper-variable collisions across nested factoring", () => { + const text = ` = ; + = play song red -> "sr" + | play song blue -> "sb" + | play album green -> "ag" + | play album yellow -> "ay";`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play song red", + "play song blue", + "play album green", + "play album yellow", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); +}); From cffdee61e6937548b79e2d139cb0723ec263e440 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Wed, 22 Apr 2026 20:44:53 -0700 Subject: [PATCH 07/16] actionGrammar: minor cleanups in factorRulesPart - Split TrieRoot from TrieNode so non-root edges are non-optional, removing four non-null assertions and the redundant edge guard. - Collapse single-line edgeKeyMatches branches; keep block scope only for the variants that read multiple fields. - Use items.flatMap(it => it.rules) at the two emission sites. 
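The first cleanup bullet (splitting `TrieRoot` from `TrieNode`) can be shown in isolation. A minimal sketch with an assumed one-variant `Edge` type standing in for the real `TrieEdge` union: because only the root lacks an incoming edge, giving the root its own type makes `edge` required on every other node, so the compiler proves its presence and no `!` assertions are needed:

```typescript
// Simplified stand-in for TrieEdge (assumption: one variant suffices
// to show the typing point).
type Edge = { kind: "string"; token: string };

// Root has no incoming edge; every non-root node must carry one.
type TrieRoot = { children: TrieNode[] };
type TrieNode = { edge: Edge; children: TrieNode[] };

function firstToken(root: TrieRoot): string | undefined {
    // No non-null assertion: every TrieNode has an `edge` by construction.
    return root.children[0]?.edge.token;
}

const demo: TrieRoot = {
    children: [{ edge: { kind: "string", token: "play" }, children: [] }],
};
console.log(firstToken(demo)); // "play"
console.log(firstToken({ children: [] })); // undefined
```

With a single node type and an optional `edge?`, the same `firstToken` would need `root.children[0]?.edge!.token` or an explicit guard at every read site.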
--- .../actionGrammar/src/grammarOptimizer.ts | 61 +++++++++++-------- 1 file changed, 34 insertions(+), 27 deletions(-) diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 1b2a806a8..8925bc08a 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -579,7 +579,7 @@ function factorRulesPart( } if (part.rules.length < 2) return part; - const root: TrieNode = { children: [], terminals: [], firstIdx: 0 }; + const root: TrieRoot = { children: [], terminals: [] }; for (let i = 0; i < part.rules.length; i++) { insertRuleIntoTrie(root, part.rules[i], i); } @@ -590,8 +590,7 @@ function factorRulesPart( items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); } items.sort((a, b) => a.idx - b.idx); - const newRules: GrammarRule[] = []; - for (const it of items) newRules.push(...it.rules); + const newRules: GrammarRule[] = items.flatMap((it) => it.rules); if (!state.didFactor) return part; counter.factored++; @@ -627,9 +626,19 @@ type Terminal = { remap: Map; }; +/** + * Root of the trie. Distinct from `TrieNode` so that `edge` can be + * required on every non-root node — eliminating non-null assertions in + * the insertion and emission code. Terminals on the root represent + * empty-parts input rules (rare but legal). + */ +type TrieRoot = { + children: TrieNode[]; + terminals: Terminal[]; +}; + type TrieNode = { - /** undefined only at the root. */ - edge?: TrieEdge; + edge: TrieEdge; children: TrieNode[]; terminals: Terminal[]; /** Lowest insertion index of any rule passing through this node. 
*/ @@ -646,35 +655,36 @@ function isLinearNode(n: TrieNode): boolean { // ── Trie insertion ─────────────────────────────────────────────────────── function insertRuleIntoTrie( - root: TrieNode, + root: TrieRoot, rule: GrammarRule, idx: number, ): void { - let node = root; + let children = root.children; + let terminals = root.terminals; const remap = new Map(); for (const stepEdge of partsToEdgeSteps(rule.parts)) { let matched: TrieNode | undefined; - for (const c of node.children) { - if (c.edge && edgeKeyMatches(c.edge, stepEdge)) { + for (const c of children) { + if (edgeKeyMatches(c.edge, stepEdge)) { matched = c; break; } } if (matched !== undefined) { - collectStepRemap(matched.edge!, stepEdge, remap); - node = matched; + collectStepRemap(matched.edge, stepEdge, remap); } else { - const newChild: TrieNode = { + matched = { edge: stepEdge, children: [], terminals: [], firstIdx: idx, }; - node.children.push(newChild); - node = newChild; + children.push(matched); } + children = matched.children; + terminals = matched.terminals; } - node.terminals.push({ + terminals.push({ idx, value: rule.value, spacingMode: rule.spacingMode, @@ -724,6 +734,8 @@ function* partsToEdgeSteps(parts: GrammarPart[]): Generator { /** True if step's key fields match the existing edge (ignoring variable). */ function edgeKeyMatches(edge: TrieEdge, step: TrieEdge): boolean { if (edge.kind !== step.kind) return false; + // After the kind check, `step` has the same variant as `edge`; the + // cast inside each branch narrows it accordingly. 
switch (edge.kind) { case "string": return edge.token === (step as typeof edge).token; @@ -731,10 +743,8 @@ function edgeKeyMatches(edge: TrieEdge, step: TrieEdge): boolean { const s = step as typeof edge; return edge.typeName === s.typeName && edge.optional === s.optional; } - case "number": { - const s = step as typeof edge; - return edge.optional === s.optional; - } + case "number": + return edge.optional === (step as typeof edge).optional; case "rules": { const s = step as typeof edge; return ( @@ -743,10 +753,8 @@ function edgeKeyMatches(edge: TrieEdge, step: TrieEdge): boolean { edge.repeat === s.repeat ); } - case "phraseSet": { - const s = step as typeof edge; - return edge.matcherName === s.matcherName; - } + case "phraseSet": + return edge.matcherName === (step as typeof edge).matcherName; } } @@ -858,7 +866,7 @@ function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { // way the fork's edge becomes the first part of each emitted member // (avoiding empty-parts members at the fork, which would defeat // factoring via the wholeConsumed-with-value check below). 
- let prefix: GrammarPart[] = [edgeToPart(node.edge!)]; + let prefix: GrammarPart[] = [edgeToPart(node.edge)]; let current = node; while ( current.terminals.length === 0 && @@ -866,7 +874,7 @@ function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { isLinearNode(current.children[0]) ) { current = current.children[0]; - prefix = appendPart(prefix, edgeToPart(current.edge!)); + prefix = appendPart(prefix, edgeToPart(current.edge)); } // Members at this fork = its terminals (each as an empty-parts rule) @@ -879,8 +887,7 @@ function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); } items.sort((a, b) => a.idx - b.idx); - const members: GrammarRule[] = []; - for (const it of items) members.push(...it.rules); + const members: GrammarRule[] = items.flatMap((it) => it.rules); if (members.length === 0) { // Defensive: every reachable node has terminals or children. From a07c9ffd3704a7a476456c4b5418f98a586ec793 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 13:20:33 -0700 Subject: [PATCH 08/16] actionGrammar: opaque canonical names in trie factoring MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Allocate fresh '__opt_v_' canonical names for every variable-bearing edge in the factoring trie instead of using the first inserter's user- supplied variable name. Eliminates two collision classes: - Outer-name shadow: under first-inserter-wins, a non-lead alternative whose value referenced an outer free variable matching the lead's local binding name would silently alias the lead's binding. Opaque canonicals cannot collide with any user name. - Bound vs. unbound RulesPart references: parity check in edgeKeyMatches keeps '' and '$(v:)' as separate trie children rather than silently merging (which would either invent or drop a binding). 
Other changes: - Split TrieStep (yielded by partsToEdgeSteps with 'local') from TrieEdge (stored in trie with 'canonical'). - Lead inserter now records its remap too (previously was a no-op). - recordStepRemap throws on conflicting overwrite of the same local to two different canonicals (defensive; unreachable in well-formed input). - Removed binding-shadow check in checkFactoringEligible (impossible with opaque canonicals); kept cross-scope-ref check, now expressed in canonical-name terms — it remains necessary because nested rule scope is fresh in the matcher (entering RulesPart resets valueIds). Tests: optimizer suite 100 → 102; new tests cover outer-name shadow and bound-vs-unbound parity. Full action-grammar suite 2322 pass. --- .../actionGrammar/src/grammarOptimizer.ts | 205 ++++++++++++++---- .../test/grammarOptimizerTrieRisks.spec.ts | 88 ++++++++ 2 files changed, 250 insertions(+), 43 deletions(-) diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 8925bc08a..9fc2f2c6d 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -579,9 +579,10 @@ function factorRulesPart( } if (part.rules.length < 2) return part; + const buildState: BuildState = { nextCanonicalId: 0 }; const root: TrieRoot = { children: [], terminals: [] }; for (let i = 0; i < part.rules.length; i++) { - insertRuleIntoTrie(root, part.rules[i], i); + insertRuleIntoTrie(root, part.rules[i], i, buildState); } const state: EmitState = { didFactor: false }; @@ -599,22 +600,67 @@ function factorRulesPart( // ── Trie data structures ───────────────────────────────────────────────── +// Variable handling notes +// ------------------------ +// Variable-bearing edges (wildcard/number, and bound rules) carry an +// **opaque canonical** name (`__opt_v_`) allocated at insertion +// time, *not* the source's variable name.
This avoids two collision +// classes that any "first inserter wins" scheme is vulnerable to: +// +// (a) A non-lead inserter's value expression references an outer- +// scope variable whose name happens to match the lead's local +// binding. Renaming the local onto the lead would silently +// shadow the outer name. +// +// (b) A `rules` edge is bound on one inserter and unbound on +// another; merging would either invent a binding the unbound +// inserter never had or drop a binding the bound inserter +// depends on. +// +// (a) is solved by canonicals being opaque: `__opt_v_` cannot +// collide with any user-named variable. (b) is solved by the parity +// check in `edgeKeyMatches` for the `rules` kind: bound and unbound +// references no longer merge into the same trie edge. +// +// `partsToEdgeSteps` yields *steps* describing the source (with +// `local` field), `insertRuleIntoTrie` matches steps against existing +// edges and either reuses an edge (recording `local → canonical` in +// the per-rule remap) or allocates a new edge with a fresh canonical. +// Every inserter — *including the lead* — records its remap; the lead +// is no longer an exception because its local also differs from the +// canonical. 
+ +type TrieStep = + | { kind: "string"; token: string } + | { kind: "wildcard"; typeName: string; optional: boolean; local: string } + | { kind: "number"; optional: boolean; local: string } + | { + kind: "rules"; + rules: GrammarRule[]; + optional: boolean; + repeat: boolean; + name: string | undefined; + local: string | undefined; + } + | { kind: "phraseSet"; matcherName: string }; + type TrieEdge = | { kind: "string"; token: string } | { kind: "wildcard"; typeName: string; optional: boolean; - canonicalVariable: string; + canonical: string; } - | { kind: "number"; optional: boolean; canonicalVariable: string } + | { kind: "number"; optional: boolean; canonical: string } | { kind: "rules"; rules: GrammarRule[]; optional: boolean; repeat: boolean; name: string | undefined; - canonicalVariable: string | undefined; + /** undefined iff every inserter at this edge was unbound. */ + canonical: string | undefined; } | { kind: "phraseSet"; matcherName: string }; @@ -626,6 +672,14 @@ type Terminal = { remap: Map; }; +type BuildState = { + nextCanonicalId: number; +}; + +function freshCanonical(state: BuildState): string { + return `__opt_v_${state.nextCanonicalId++}`; +} + /** * Root of the trie. 
Distinct from `TrieNode` so that `edge` can be * required on every non-root node — eliminating non-null assertions in @@ -658,29 +712,29 @@ function insertRuleIntoTrie( root: TrieRoot, rule: GrammarRule, idx: number, + buildState: BuildState, ): void { let children = root.children; let terminals = root.terminals; const remap = new Map(); - for (const stepEdge of partsToEdgeSteps(rule.parts)) { + for (const step of partsToEdgeSteps(rule.parts)) { let matched: TrieNode | undefined; for (const c of children) { - if (edgeKeyMatches(c.edge, stepEdge)) { + if (edgeKeyMatches(c.edge, step)) { matched = c; break; } } - if (matched !== undefined) { - collectStepRemap(matched.edge, stepEdge, remap); - } else { + if (matched === undefined) { matched = { - edge: stepEdge, + edge: stepToEdge(step, buildState), children: [], terminals: [], firstIdx: idx, }; children.push(matched); } + recordStepRemap(matched.edge, step, remap); children = matched.children; terminals = matched.terminals; } @@ -692,8 +746,8 @@ function insertRuleIntoTrie( }); } -/** Yield each rule.parts as a sequence of trie edges (StringPart explodes). */ -function* partsToEdgeSteps(parts: GrammarPart[]): Generator { +/** Yield each rule.parts as a sequence of trie steps (StringPart explodes). 
*/ +function* partsToEdgeSteps(parts: GrammarPart[]): Generator { for (const p of parts) { switch (p.type) { case "string": @@ -704,14 +758,14 @@ function* partsToEdgeSteps(parts: GrammarPart[]): Generator { kind: "wildcard", typeName: p.typeName, optional: !!p.optional, - canonicalVariable: p.variable, + local: p.variable, }; break; case "number": yield { kind: "number", optional: !!p.optional, - canonicalVariable: p.variable, + local: p.variable, }; break; case "rules": @@ -721,7 +775,7 @@ function* partsToEdgeSteps(parts: GrammarPart[]): Generator { optional: !!p.optional, repeat: !!p.repeat, name: p.name, - canonicalVariable: p.variable, + local: p.variable, }; break; case "phraseSet": @@ -731,46 +785,103 @@ function* partsToEdgeSteps(parts: GrammarPart[]): Generator { } } -/** True if step's key fields match the existing edge (ignoring variable). */ -function edgeKeyMatches(edge: TrieEdge, step: TrieEdge): boolean { +/** Allocate a new trie edge from a step, minting a fresh canonical when needed. */ +function stepToEdge(step: TrieStep, buildState: BuildState): TrieEdge { + switch (step.kind) { + case "string": + case "phraseSet": + return step; + case "wildcard": + return { + kind: "wildcard", + typeName: step.typeName, + optional: step.optional, + canonical: freshCanonical(buildState), + }; + case "number": + return { + kind: "number", + optional: step.optional, + canonical: freshCanonical(buildState), + }; + case "rules": + return { + kind: "rules", + rules: step.rules, + optional: step.optional, + repeat: step.repeat, + name: step.name, + canonical: + step.local !== undefined + ? freshCanonical(buildState) + : undefined, + }; + } +} + +/** + * True if `step`'s key fields match `edge` for trie merging. For + * variable-bearing kinds the *names* are ignored (they get remapped), + * but for `rules` edges binding *presence* must agree — see notes + * above the `TrieStep`/`TrieEdge` types. 
+ */ +function edgeKeyMatches(edge: TrieEdge, step: TrieStep): boolean { if (edge.kind !== step.kind) return false; // After the kind check, `step` has the same variant as `edge`; the // cast inside each branch narrows it accordingly. switch (edge.kind) { case "string": - return edge.token === (step as typeof edge).token; + return edge.token === (step as { token: string }).token; case "wildcard": { - const s = step as typeof edge; + const s = step as { typeName: string; optional: boolean }; return edge.typeName === s.typeName && edge.optional === s.optional; } case "number": - return edge.optional === (step as typeof edge).optional; + return edge.optional === (step as { optional: boolean }).optional; case "rules": { - const s = step as typeof edge; + const s = step as { + rules: GrammarRule[]; + optional: boolean; + repeat: boolean; + local: string | undefined; + }; return ( edge.rules === s.rules && edge.optional === s.optional && - edge.repeat === s.repeat + edge.repeat === s.repeat && + (edge.canonical === undefined) === (s.local === undefined) ); } case "phraseSet": - return edge.matcherName === (step as typeof edge).matcherName; + return ( + edge.matcherName === + (step as { matcherName: string }).matcherName + ); } } -function collectStepRemap( - canonicalEdge: TrieEdge, - stepEdge: TrieEdge, +/** + * Record the `local → canonical` rename for one inserter at one trie + * step. Throws on conflict (same local mapped to two canonicals on + * the same path), which would indicate either a malformed source rule + * with duplicate local names or a bug in the trie insertion logic. 
+ */ +function recordStepRemap( + edge: TrieEdge, + step: TrieStep, remap: Map, ): void { - if (canonicalEdge.kind === "string" || canonicalEdge.kind === "phraseSet") { - return; - } - const canonical = canonicalEdge.canonicalVariable; - const local = (stepEdge as typeof canonicalEdge).canonicalVariable; - if (canonical !== undefined && local !== undefined && canonical !== local) { - remap.set(local, canonical); + if (edge.kind === "string" || edge.kind === "phraseSet") return; + const local = (step as { local: string | undefined }).local; + const canonical = edge.canonical; + if (local === undefined || canonical === undefined) return; + const prior = remap.get(local); + if (prior !== undefined && prior !== canonical) { + throw new Error( + `Internal optimizer error: variable '${local}' bound to multiple canonicals ('${prior}' then '${canonical}')`, + ); } + remap.set(local, canonical); } // ── Trie emission ──────────────────────────────────────────────────────── @@ -783,7 +894,7 @@ function edgeToPart(edge: TrieEdge): GrammarPart { const out: GrammarPart = { type: "wildcard", typeName: edge.typeName, - variable: edge.canonicalVariable, + variable: edge.canonical, }; if (edge.optional) out.optional = true; return out; @@ -791,15 +902,15 @@ function edgeToPart(edge: TrieEdge): GrammarPart { case "number": { const out: GrammarPart = { type: "number", - variable: edge.canonicalVariable, + variable: edge.canonical, }; if (edge.optional) out.optional = true; return out; } case "rules": { const out: RulesPart = { type: "rules", rules: edge.rules }; - if (edge.canonicalVariable !== undefined) { - out.variable = edge.canonicalVariable; + if (edge.canonical !== undefined) { + out.variable = edge.canonical; } if (edge.optional) out.optional = true; if (edge.repeat) out.repeat = true; @@ -935,17 +1046,25 @@ function checkFactoringEligible( if (noneHaveValue && members.some((m) => m.parts.length > 1)) { return "implicit-default-multipart"; } - const canonicalNames = 
collectVariableNames(prefix); - if (canonicalNames.size > 0) { + // Cross-scope-ref: nested rule scope is fresh at the matcher level + // (entering a `RulesPart` resets `valueIds`). If a member's value + // references a name that the wrapper's prefix binds, that reference + // would resolve to nothing at runtime. Detect and bail out so each + // member is emitted at the wrapper's level instead, putting the + // binding back in scope. + // + // Binding-shadow (a member's own binding colliding with a prefix + // binding) is no longer reachable: canonicals are opaque + // `__opt_v_` names allocated globally per `factorRulesPart` call, + // so two distinct edges always get distinct canonicals. + const prefixCanonicals = collectVariableNames(prefix); + if (prefixCanonicals.size > 0) { for (const m of members) { if (m.value !== undefined) { for (const v of collectVariableReferences(m.value)) { - if (canonicalNames.has(v)) return "cross-scope-ref"; + if (prefixCanonicals.has(v)) return "cross-scope-ref"; } } - for (const v of collectVariableNames(m.parts)) { - if (canonicalNames.has(v)) return "binding-shadow"; - } } } return undefined; diff --git a/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts index 3f385b180..eb0a073f2 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts @@ -232,4 +232,92 @@ describe("Grammar Optimizer - Trie risks", () => { ); } }); + + // ── Risk: outer-name shadow. With first-inserter-wins canonical + // naming, a non-lead alternative whose value happens to use + // a name that matches the lead's local binding would have + // the local renamed onto the lead's canonical, silently + // changing what name the value resolves against. 
With + // opaque canonicals (`__opt_v_`) this collision class is + // impossible by construction; the emitted variable name is + // synthetic and cannot collide with any user-named ref. + it("opaque canonicals avoid outer-name shadowing", () => { + // Both alternatives bind their wildcard but the *non-lead* one + // happens to spell its local with the same name (`x`) the lead + // would have used as canonical. Under first-inserter-wins the + // second's value `{tag: "B", v: x}` would alias the lead's `x`; + // under opaque canonicals each side keeps its own remap and the + // emitted output is unambiguous. + const text = ` = ; + = play $(x:string) once -> { tag: "A", v: x } + | play $(x:string) twice -> { tag: "B", v: x };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play hello once", "play world twice"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: bound vs. unbound RulesPart references at the same edge. + // Without binding-presence parity they would merge, either + // inventing a binding the unbound side never had or + // dropping a binding the bound side depends on. + it("does not merge bound and unbound references", () => { + // Two alternatives both reference ; the second binds it + // and uses the binding in its value. Parity check should keep + // them as separate trie children. 
+ const text = ` = ; + = a -> 1 | b -> 2; + = play -> "no-bind" + | play $(v:) -> { kind: "bound", v };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play a", "play b"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: under "first-inserter-wins" canonical naming, the lead's + // local becomes the canonical for the merged prefix edge. + // A NON-LEAD alternative can have a SUFFIX binding whose + // name happens to match the lead's local — and whose value + // expression references that name. Under the broken + // scheme, the non-lead's value references the suffix + // binding, but matcher resolution hits the prefix binding + // first (the suffix binding is in the wrapper's nested + // scope and the value would *not* see it correctly). + // + // Under the opaque scheme: prefix canonical is `__opt_v_0` + // (synthetic, cannot collide with user names), and the + // non-lead's suffix binding `x` stays `x` after remap (its + // local doesn't get renamed because the suffix binding is + // on a DIFFERENT trie edge from the prefix). Value `{x}` + // resolves to the suffix binding correctly. + // + // Critically, this also exercises the "lead must record + // its own remap" property: the lead's `x` local in its + // value expression must be remapped to `__opt_v_0`. + // Without that remap, the matcher fails to resolve `x`. 
+ it("opaque canonicals + lead remap handle prefix/suffix name reuse", () => { + const text = ` = ; + = play $(x:string) -> { kind: "lead", v: x } + | play $(a:string) then $(x:string) -> { kind: "alt", v: x };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play hello", "play first then second"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); }); From 04cf20e9c60c991b234dcf40b8d701128443ee67 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 16:08:43 -0700 Subject: [PATCH 09/16] =?UTF-8?q?actionGrammar:=20=CE=B1-rename=20child=20?= =?UTF-8?q?bindings=20during=20inline;=20unify=20substitute=20walks?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Inline pass: unconditionally α-rename child rule top-level bindings to fresh __opt_inline_ names (per-parent counter), eliminating collisions in both substitute and drop branches. - Unify Substitute and Drop branches in tryInlineRulesPart; Hoist remains a separate fast path. - Batch per-parent value substitutions into a single Map and a single AST walk in inlineRule. - Unify remapValueVariables and substituteValueVariable into substituteValueVariables(node, Map); remapValueVariables is now a thin wrapper. - Add tests for branch-3 collision rename and per-parent counter uniqueness. 
--- .../actionGrammar/src/grammarOptimizer.ts | 375 +++++++----------- .../test/grammarOptimizerInline.spec.ts | 66 ++- 2 files changed, 209 insertions(+), 232 deletions(-) diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 9fc2f2c6d..dc87436bc 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -154,12 +154,17 @@ function inlineRule( memo: RulesArrayMemo, refCounts: Map, ): GrammarRule { + // Per-parent-rule counter for opaque α-rename names. Shared + // across every `tryInlineRulesPart` call for this rule so two + // inlinings into the same parent never mint the same fresh name. + const renameState: RenameState = { next: 0 }; const { parts, changed, valueSubstitutions, valueAssignment } = inlineParts( rule.parts, rule, counter, memo, refCounts, + renameState, ); if (!changed) { return rule; @@ -169,13 +174,11 @@ function inlineRule( value = valueAssignment; } if (valueSubstitutions.length > 0 && value !== undefined) { + const subs = new Map(); for (const sub of valueSubstitutions) { - value = substituteValueVariable( - value, - sub.variable, - sub.replacement, - ); + subs.set(sub.variable, sub.replacement); } + value = substituteValueVariables(value, subs); } if (value === rule.value) { return { ...rule, parts }; @@ -202,12 +205,16 @@ type TryInlineResult = { valueAssignment?: CompiledValueNode; }; +/** Per-parent-rule fresh-name counter used by α-renaming. */ +type RenameState = { next: number }; + function inlineParts( parts: GrammarPart[], parentRule: GrammarRule, counter: { inlined: number }, memo: RulesArrayMemo, refCounts: Map, + renameState: RenameState, ): { parts: GrammarPart[]; changed: boolean; @@ -245,7 +252,7 @@ function inlineParts( const shared = (refCounts.get(p.rules) ?? 1) > 1; const replacement = shared ? 
undefined - : tryInlineRulesPart(rewritten, parentRule); + : tryInlineRulesPart(rewritten, parentRule, renameState); if (replacement !== undefined) { counter.inlined++; changed = true; @@ -286,6 +293,7 @@ function inlineParts( function tryInlineRulesPart( part: RulesPart, parentRule: GrammarRule, + renameState: RenameState, ): TryInlineResult | undefined { if (part.repeat || part.optional) { return undefined; @@ -314,55 +322,38 @@ function tryInlineRulesPart( // The child rule may carry its own value expression. After // inlining, child.parts move into the parent and the explicit // child.value can no longer fire on its own. child.value is - // observable to the matcher in exactly two ways — handle each, + // observable to the matcher in two ways; we handle each, and // otherwise the value is dead and can be dropped: // - // (1) Substitute: parent captures via `part.variable` AND - // parent.value references that variable. Substitute - // child.value for the variable in parent.value. + // (Hoist) parent has no value of its own and exactly one + // part (this RulesPart). Synthesize a value + // assignment from child.value onto the parent — + // this is what the matcher's single-part + // default-value rule would have computed at + // runtime. // - // (2) Hoist: parent has no value of its own and exactly one - // part (this RulesPart). The matcher's single-part - // default-value rule would have promoted the captured - // child.value into the parent's value at runtime. - // Synthesize that assignment explicitly on the parent. + // (Substitute) parent captures via `part.variable` AND has + // its own value expression. Substitute + // child.value for the captured variable in + // parent.value. // - // (3) Drop: child.value is unobservable; inline child.parts - // and forget the value. + // (Drop) child.value is unobservable: inline child.parts + // and forget the value. 
// - // child.value's references to child's own part bindings remain - // in scope after inlining since those bindings move from - // child.parts → parent.parts. Only case (1) needs an additional - // collision check against the parent's *other* parts. + // The Substitute and Drop cases share the same parts handling — + // child's top-level bindings are α-renamed (so they can't collide + // with parent's other parts) and the renamed child.value is + // either folded into parent.value (Substitute) or discarded + // (Drop). child.value's references to child's own part bindings + // remain in scope after inlining since those bindings move from + // child.parts → parent.parts. if (child.value !== undefined) { - // (1) Substitution. - if (part.variable !== undefined && parentRule.value !== undefined) { - const parentRefs = collectVariableReferences(parentRule.value); - if (parentRefs.has(part.variable)) { - // Refuse if child's top-level bindings would collide - // with bindings already in parent's other parts. - const childBindings = collectVariableNames(child.parts); - for (const otherPart of parentRule.parts) { - if (otherPart === part) continue; - const v = bindingName(otherPart); - if (v !== undefined && childBindings.has(v)) { - return undefined; - } - } - return { - parts: child.parts, - valueSubstitution: { - variable: part.variable, - replacement: child.value, - }, - }; - } - // Parent has its own value and doesn't reference the - // captured variable — fall through to drop. - } - - // (2) Hoist onto a single-part parent without its own value. - // No collision check needed: parent has no other parts. + // (Hoist) Parent has no value of its own and exactly one + // part (this RulesPart). The matcher's single-part + // default-value rule would have promoted the captured + // child.value into the parent's value at runtime; synthesize + // that assignment explicitly. No α-rename needed: parent has + // no other parts for child.parts' bindings to collide with. 
if (parentRule.value === undefined && parentRule.parts.length === 1) { return { parts: child.parts, @@ -370,8 +361,28 @@ function tryInlineRulesPart( }; } - // (3) Drop: child.value is unobservable at runtime. - return { parts: child.parts }; + // Otherwise: α-rename child's top-level bindings to fresh + // opaque names so they can't collide with parent's other + // parts, and apply the same remap to child.value. Then: + // - if the parent captures via `part.variable` AND has its + // own value expression, fold the renamed child.value into + // it (substitution). When parent.value doesn't reference + // `part.variable` the substitution is a no-op walk and we + // get the same result as the drop case. + // - otherwise child.value is unobservable at runtime; drop + // it and inline only the renamed child.parts. + const { parts: renamedParts, value: renamedValue } = + renameAllChildBindings(child.parts, child.value, renameState); + if (part.variable !== undefined && parentRule.value !== undefined) { + return { + parts: renamedParts, + valueSubstitution: { + variable: part.variable, + replacement: renamedValue!, + }, + }; + } + return { parts: renamedParts }; } // If the parent expects to capture this RulesPart into a variable, the @@ -1119,11 +1130,47 @@ function collectVariableNames(parts: GrammarPart[]): Set { return out; } -function bindingName(p: GrammarPart): string | undefined { - if (p.type === "wildcard" || p.type === "number" || p.type === "rules") { - return p.variable; +/** + * α-rename every top-level binding in `parts` to a fresh opaque name + * (`__opt_inline_`), and apply the same remap to `value` if given. + * Returns the original arrays/nodes when there are no top-level + * bindings to rename. + * + * Only top-level bindings are touched: nested rule scopes are not + * visible from outside their nested rule and therefore can't collide + * with anything in the parent we're inlining into. 
+ */ +function renameAllChildBindings( + parts: GrammarPart[], + value: CompiledValueNode | undefined, + renameState: RenameState, +): { parts: GrammarPart[]; value: CompiledValueNode | undefined } { + let remap: Map | undefined; + let outParts: GrammarPart[] | undefined; + for (let i = 0; i < parts.length; i++) { + const p = parts[i]; + if ( + (p.type !== "wildcard" && + p.type !== "number" && + p.type !== "rules") || + p.variable === undefined + ) { + if (outParts !== undefined) outParts.push(p); + continue; + } + const fresh = `__opt_inline_${renameState.next++}`; + if (remap === undefined) remap = new Map(); + remap.set(p.variable, fresh); + if (outParts === undefined) outParts = parts.slice(0, i); + outParts.push({ ...p, variable: fresh }); } - return undefined; + if (remap === undefined) { + return { parts, value }; + } + return { + parts: outParts ?? parts, + value: value !== undefined ? remapValueVariables(value, remap) : value, + }; } function collectVariableReferences(node: CompiledValueNode): Set { @@ -1187,218 +1234,92 @@ function remapValueVariables( remap: Map, ): CompiledValueNode { if (remap.size === 0) return node; - switch (node.type) { - case "literal": - return node; - case "variable": - if (remap.has(node.name)) { - return { ...node, name: remap.get(node.name)! }; - } - return node; - case "object": { - const value: CompiledObjectElement[] = node.value.map((el) => { - if (el.type === "spread") { - return { - ...el, - argument: remapValueVariables(el.argument, remap), - }; - } - if (el.value === null) { - // Shorthand { foo } = { foo: foo }. If the key is - // being remapped, expand to a full property so the - // key (object field name) stays the same while the - // value references the new variable name. 
- if (remap.has(el.key)) { - return { - ...el, - value: { - type: "variable" as const, - name: remap.get(el.key)!, - }, - }; - } - return el; - } - return { - ...el, - value: remapValueVariables(el.value, remap), - }; - }); - return { ...node, value }; - } - case "array": - return { - ...node, - value: node.value.map((v) => remapValueVariables(v, remap)), - }; - case "binaryExpression": - return { - ...node, - left: remapValueVariables(node.left, remap), - right: remapValueVariables(node.right, remap), - }; - case "unaryExpression": - return { - ...node, - operand: remapValueVariables(node.operand, remap), - }; - case "conditionalExpression": - return { - ...node, - test: remapValueVariables(node.test, remap), - consequent: remapValueVariables(node.consequent, remap), - alternate: remapValueVariables(node.alternate, remap), - }; - case "memberExpression": - return { - ...node, - object: remapValueVariables(node.object, remap), - property: - typeof node.property === "string" - ? node.property - : remapValueVariables(node.property, remap), - }; - case "callExpression": - return { - ...node, - callee: remapValueVariables(node.callee, remap), - arguments: node.arguments.map((a) => - remapValueVariables(a, remap), - ), - }; - case "spreadElement": - return { - ...node, - argument: remapValueVariables(node.argument, remap), - }; - case "templateLiteral": - return { - ...node, - expressions: node.expressions.map((e) => - remapValueVariables(e, remap), - ), - }; + // Renaming is just substitution where each replacement is a fresh + // variable node carrying the new name. + const subs = new Map(); + for (const [from, to] of remap) { + subs.set(from, { type: "variable", name: to }); } + return substituteValueVariables(node, subs); } /** - * Replace every reference to the variable `name` in `node` with a deep - * copy of `replacement`. 
 Used by the inliner when a child rule with an - * explicit value expression is folded into its parent: the parent's - * value expression's reference to the captured variable is substituted - * with the child's own value expression. + * Replace each reference to a variable in `node` with the matching + * replacement node from `substitutions`. Variables not present in the + * map are left untouched. + * + * Used in two ways by the inliner / factorer: + * - α-rename (via `remapValueVariables`): replacements are fresh + * `{ type: "variable", name: <fresh> }` nodes. + * - Value-expression substitution: replacements are arbitrary value + * expressions copied from a child rule's `value`. + * + * Object shorthand `{ foo }` (which means `{ foo: foo }`) is expanded + * to a full property `{ foo: <replacement> }` whenever the shorthand + * key matches a substitution, so the field name on the resulting + * object stays the same. */ -function substituteValueVariable( +function substituteValueVariables( node: CompiledValueNode, - name: string, - replacement: CompiledValueNode, + substitutions: Map<string, CompiledValueNode>, ): CompiledValueNode { + if (substitutions.size === 0) return node; + const sub = (n: CompiledValueNode): CompiledValueNode => + substituteValueVariables(n, substitutions); switch (node.type) { case "literal": return node; - case "variable": - return node.name === name ? replacement : node; + case "variable": { + const r = substitutions.get(node.name); + return r !== undefined ? r : node; + } case "object": { const value: CompiledObjectElement[] = node.value.map((el) => { if (el.type === "spread") { - return { - ...el, - argument: substituteValueVariable( - el.argument, - name, - replacement, - ), - }; + return { ...el, argument: sub(el.argument) }; } if (el.value === null) { - // Shorthand { foo } = { foo: foo }. If the key is - // the variable being substituted, expand to the - // full property form { foo: <replacement> }.
- if (el.key === name) { - return { ...el, value: replacement }; + const r = substitutions.get(el.key); + if (r !== undefined) { + return { ...el, value: r }; } return el; } - return { - ...el, - value: substituteValueVariable(el.value, name, replacement), - }; + return { ...el, value: sub(el.value) }; }); return { ...node, value }; } case "array": - return { - ...node, - value: node.value.map((v) => - substituteValueVariable(v, name, replacement), - ), - }; + return { ...node, value: node.value.map(sub) }; case "binaryExpression": - return { - ...node, - left: substituteValueVariable(node.left, name, replacement), - right: substituteValueVariable(node.right, name, replacement), - }; + return { ...node, left: sub(node.left), right: sub(node.right) }; case "unaryExpression": - return { - ...node, - operand: substituteValueVariable( - node.operand, - name, - replacement, - ), - }; + return { ...node, operand: sub(node.operand) }; case "conditionalExpression": return { ...node, - test: substituteValueVariable(node.test, name, replacement), - consequent: substituteValueVariable( - node.consequent, - name, - replacement, - ), - alternate: substituteValueVariable( - node.alternate, - name, - replacement, - ), + test: sub(node.test), + consequent: sub(node.consequent), + alternate: sub(node.alternate), }; case "memberExpression": return { ...node, - object: substituteValueVariable(node.object, name, replacement), + object: sub(node.object), property: typeof node.property === "string" ? 
node.property - : substituteValueVariable( - node.property, - name, - replacement, - ), + : sub(node.property), }; case "callExpression": return { ...node, - callee: substituteValueVariable(node.callee, name, replacement), - arguments: node.arguments.map((a) => - substituteValueVariable(a, name, replacement), - ), + callee: sub(node.callee), + arguments: node.arguments.map(sub), }; case "spreadElement": - return { - ...node, - argument: substituteValueVariable( - node.argument, - name, - replacement, - ), - }; + return { ...node, argument: sub(node.argument) }; case "templateLiteral": - return { - ...node, - expressions: node.expressions.map((e) => - substituteValueVariable(e, name, replacement), - ), - }; + return { ...node, expressions: node.expressions.map(sub) }; } } diff --git a/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts index 1830956e9..fbf56e4a4 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts @@ -223,23 +223,27 @@ describe("Grammar Optimizer - Inline single-alternative RulesPart", () => { ); }); - it("skips value-substitution inline when child binding collides with parent binding", () => { + it("α-renames colliding child bindings during value-substitution inline", () => { // Parent already has `name` as a binding; child also binds - // `name`. After inlining, two `name` bindings would collide in - // the same scope, so the inliner must refuse. + // `name`. Rather than refuse, the inliner α-renames the + // child's colliding top-level binding (and its references in + // child.value) to a fresh opaque name before substituting. 
const text = ` = $(name:string) says $(t:) -> { speaker: name, said: t }; = $(name:string) loud -> name;`; const baseline = loadGrammarRules("t.grammar", text); const optimized = loadGrammarRules("t.grammar", text, { optimizations: { inlineSingleAlternatives: true }, }); - // No inline of the value-bearing child. - expect(countRulesParts(optimized.rules)).toBe( + // Inlining proceeded (one fewer RulesPart layer). + expect(countRulesParts(optimized.rules)).toBeLessThan( countRulesParts(baseline.rules), ); expect(match(optimized, "alice says bob loud")).toStrictEqual( match(baseline, "alice says bob loud"), ); + expect(match(optimized, "alice says bob loud")).toStrictEqual([ + { speaker: "alice", said: "bob" }, + ]); }); it("inlines and drops child value when parent value does not reference the captured variable", () => { @@ -262,4 +266,56 @@ describe("Grammar Optimizer - Inline single-alternative RulesPart", () => { { kind: "play" }, ]); }); + + it("α-renames child bindings when dropping child value (branch 3 collision)", () => { + // Parent has its own binding `name` and uses it in the value + // expression. The unbound falls through to branch (3) + // (drop child.value, inline child.parts). Without renaming, + // child's `name` binding would collide with parent's `name` and + // the matcher's last-write-wins value resolution would shadow + // parent's `name` with the inlined child's, giving the wrong + // result. With α-rename, the inlined binding gets a fresh + // opaque name and parent's `name` resolves correctly. + const text = ` = $(name:string) says -> { said: name }; + = $(name:string) loud -> name;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + // Inlining still proceeds. 
+ expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "alice says bob loud")).toStrictEqual( + match(baseline, "alice says bob loud"), + ); + expect(match(optimized, "alice says bob loud")).toStrictEqual([ + { said: "alice" }, + ]); + }); + + it("mints unique fresh names across multiple inlines into the same parent", () => { + // Two distinct child rules are inlined into the same parent via + // branch (1) substitution. Each child binds `name` at the top + // level. The per-parent rename counter must produce distinct + // fresh names for the two inlined bindings; otherwise the two + // bindings would collide in the parent's parts list and the + // value substitutions would resolve to the wrong source. + const text = ` = $(a:) and $(b:) -> { x: a, y: b }; + = $(name:string) here -> name; + = $(name:string) there -> name;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "alice here and bob there")).toStrictEqual( + match(baseline, "alice here and bob there"), + ); + expect(match(optimized, "alice here and bob there")).toStrictEqual([ + { x: "alice", y: "bob" }, + ]); + }); }); From d72dde8fd9c48501ef1024a2701f64b463bb1712 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 20:02:58 -0700 Subject: [PATCH 10/16] actionGrammar: broaden inline Hoist branch; add optimizer coverage tests - inlineSingleAlternatives: handle multi-part parents that capture a value-producing child via part.variable by hoisting child.value into a synthesized valueAssignment (matches the matcher's default-value rule). Drop withPropagatedVariable / findExistingVariable in favor of in-place re-targeting of the single binding-friendly child part. 
- Tests: add coverage for optional-flag preservation, spacing-mode mismatches, refCount>1 inline refusal, bound-nested-rules-part regression, plus factoring eligibility and trie-edge variants. --- .../actionGrammar/src/grammarOptimizer.ts | 161 +++++----- .../test/grammarOptimizerFactoring.spec.ts | 89 ++++++ .../test/grammarOptimizerInline.spec.ts | 302 ++++++++++++++++++ 3 files changed, 467 insertions(+), 85 deletions(-) diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index dc87436bc..99caf7bcb 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -197,10 +197,13 @@ type TryInlineResult = { /** * When set, the parent rule had no value expression of its own and * this inlining synthesizes one — copying what the matcher would - * have computed via the single-part default-value rule (i.e. the - * captured child rule's value). Only valid when the parent had a - * single part and no `value`; in that situation no other inlining - * decision in the same parent can collide. + * have computed via its default-value rule (i.e. the captured child + * rule's value). At most one assignment is possible per parent + * rule: the matcher's default-value rule requires exactly one + * variable on the parent, so two inlinings each producing a + * valueAssignment would mean the parent originally had two + * variables and `hasValue=false` — a grammar the compiler + * rejects (or warns about). */ valueAssignment?: CompiledValueNode; }; @@ -264,9 +267,10 @@ function inlineParts( } if (replacement.valueAssignment !== undefined) { // valueAssignment is only produced when the parent had - // exactly one part (this RulesPart) and no value of its - // own — so at most one assignment is possible per - // parent rule. 
+ // no value of its own and the matcher's default-value + // rule would have used child.value as the parent's + // result — see TryInlineResult.valueAssignment for why + // this can fire at most once per parent rule. valueAssignment = replacement.valueAssignment; } } else { @@ -348,31 +352,38 @@ function tryInlineRulesPart( // remain in scope after inlining since those bindings move from // child.parts → parent.parts. if (child.value !== undefined) { - // (Hoist) Parent has no value of its own and exactly one - // part (this RulesPart). The matcher's single-part - // default-value rule would have promoted the captured - // child.value into the parent's value at runtime; synthesize - // that assignment explicitly. No α-rename needed: parent has - // no other parts for child.parts' bindings to collide with. - if (parentRule.value === undefined && parentRule.parts.length === 1) { + // α-rename child's top-level bindings to fresh opaque names so + // they can't collide with any other top-level bindings the + // parent already has, and apply the same remap to child.value. + // Skipped when parent has only this RulesPart as its single + // part — there are no siblings to collide with. + const { parts: renamedParts, value: renamedValue } = + parentRule.parts.length === 1 + ? { parts: child.parts, value: child.value } + : renameAllChildBindings(child.parts, child.value, renameState); + + // (Hoist) Parent has no value of its own and the matcher + // would have computed the parent's value via its + // default-value rule using `child.value` — either because + // parent has a single part (this RulesPart) or because + // parent's only variable is `part.variable` (which captured + // child.value at runtime). Synthesize that assignment + // explicitly. 
+ if ( + parentRule.value === undefined && + (parentRule.parts.length === 1 || part.variable !== undefined) + ) { return { - parts: child.parts, - valueAssignment: child.value, + parts: renamedParts, + valueAssignment: renamedValue!, }; } - // Otherwise: α-rename child's top-level bindings to fresh - // opaque names so they can't collide with parent's other - // parts, and apply the same remap to child.value. Then: - // - if the parent captures via `part.variable` AND has its - // own value expression, fold the renamed child.value into - // it (substitution). When parent.value doesn't reference - // `part.variable` the substitution is a no-op walk and we - // get the same result as the drop case. - // - otherwise child.value is unobservable at runtime; drop - // it and inline only the renamed child.parts. - const { parts: renamedParts, value: renamedValue } = - renameAllChildBindings(child.parts, child.value, renameState); + // (Substitute) parent captures via `part.variable` AND has + // its own value expression — fold the renamed child.value + // into it. When parent.value doesn't reference + // `part.variable` the substitution is a no-op walk and we + // get the same result as the drop case. if (part.variable !== undefined && parentRule.value !== undefined) { return { parts: renamedParts, @@ -382,76 +393,56 @@ function tryInlineRulesPart( }, }; } + + // (Drop) child.value is unobservable at runtime; drop it and + // inline only the renamed child.parts. return { parts: renamedParts }; } // If the parent expects to capture this RulesPart into a variable, the - // child rule must provide a single binding-friendly part to take the - // variable name; otherwise we'd silently drop the binding. + // child rule must provide exactly one binding-friendly part to take + // the variable name; otherwise we'd silently drop the binding. 
+ // Multiple variable-bearing parts would mean child relied on an + // explicit value expression (which the no-value branch rules out) + // or violated the matcher's default-value contract; either way we + // can't safely re-target the parent's binding. Other parts in + // child (string / phraseSet literals) come along unchanged. if (part.variable !== undefined) { - if (child.parts.length !== 1) { - return undefined; - } - const only = child.parts[0]; - const bound = withPropagatedVariable(only, part.variable); - if (bound === undefined) { - return undefined; + let bindingIdx = -1; + let bindingCp: + | Extract<GrammarPart, { type: "wildcard" | "number" | "rules" }> + | undefined; + for (let i = 0; i < child.parts.length; i++) { + const cp = child.parts[i]; + if ( + cp.type === "wildcard" || + cp.type === "number" || + cp.type === "rules" + ) { + if (bindingIdx !== -1) { + return undefined; + } + bindingIdx = i; + bindingCp = cp; + } } - // Guard against duplicate variable names being introduced into the - // parent's parts list. - if (findExistingVariable(parentRule.parts, part.variable, part)) { + if (bindingCp === undefined) { return undefined; } - return { parts: [bound] }; + const newParts = child.parts.slice(); + newParts[bindingIdx] = { ...bindingCp, variable: part.variable }; + // No duplicate-name guard here: if the parent already had two + // top-level parts bound to `part.variable` (the RulesPart and + // some sibling), that collision predates inlining and the + // matcher's behavior on it is unchanged when we replace the + // RulesPart with a wildcard/number/rules part bound to the + // same name. + return { parts: newParts }; } return { parts: child.parts }; } -/** - * Return a clone of `part` with `variable` set, or undefined if the part - * cannot safely carry a variable binding via inlining. - * - * We only propagate onto direct-capture parts (wildcard/number).
Pushing - * a variable onto a nested RulesPart is unsafe in the general case: the - * inner rule may compute its value via an expression that references - * names not reachable from the new parent scope, or it may provide no - * structural value at all, causing the parent's binding to miss. - */ -function withPropagatedVariable( - part: GrammarPart, - variable: string, -): GrammarPart | undefined { - switch (part.type) { - case "wildcard": - case "number": - return { ...part, variable }; - case "rules": - case "string": - case "phraseSet": - return undefined; - } -} - -function findExistingVariable( - parts: GrammarPart[], - name: string, - skip: GrammarPart, -): boolean { - for (const p of parts) { - if (p === skip) continue; - if ( - (p.type === "wildcard" || - p.type === "number" || - p.type === "rules") && - p.variable === name - ) { - return true; - } - } - return false; -} - // ───────────────────────────────────────────────────────────────────────────── // Optimization #2: factor common prefixes across alternatives // ───────────────────────────────────────────────────────────────────────────── diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts index 66841d2a3..4c2a8aa35 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts @@ -182,4 +182,93 @@ describe("Grammar Optimizer - Common prefix factoring", () => { expect(optRes).toEqual(expect.arrayContaining(baseRes)); } }); + + // ── Eligibility-bailout coverage ──────────────────────────────────── + + it("bails out (implicit-default-multipart) when value-less members differ in part count", () => { + // All three alternatives lack an explicit `value` and the + // matcher's default-value rule applies. Two of them are + // multi-part (more than one inlined part after factoring the + // shared `play `). 
The factorer's eligibility check refuses + // wrapping because a wrapped `RulesPart` whose members default + // would feed the matcher's "missing value for default" path. + // Match results must still agree. + const text = ` = $(t:) -> { v: t }; + = play $(a:string) | play $(b:string) loud | play $(c:string) softly;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play hello", + "play hello loud", + "play hello softly", + ]) { + const baseRes = match(baseline, input); + const optRes = match(optimized, input); + expect(optRes.length).toBe(baseRes.length); + expect(optRes).toEqual(expect.arrayContaining(baseRes)); + } + }); + + it("bails out (mixed-value-presence) when some members have value and others don't", () => { + // Two alternatives share a literal prefix. One has an + // explicit value, the other doesn't. The factorer's + // eligibility check refuses to wrap a fork with mixed + // value-presence: a wrapper RulesPart binds either everything + // or nothing, not both. + const text = ` = $(t:) -> { v: t }; + = play $(a:string) -> { kind: "a", a } | play $(b:string);`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + const baseRes = match(baseline, "play hello"); + const optRes = match(optimized, "play hello"); + expect(optRes.length).toBe(baseRes.length); + expect(optRes).toEqual(expect.arrayContaining(baseRes)); + }); + + // ── Trie-edge variant coverage ────────────────────────────────────── + + it("does not merge wildcards with different optional flags into one edge", () => { + // Two alternatives share `play $(x:string)` shape but one + // marks the wildcard optional. `edgeKeyMatches` requires + // matching `optional` flags, so the factorer must keep them as + // distinct edges. 
Match results agree. + const text = ` = play $(x:string) here -> { kind: "here", x } + | play $(y:string)? there -> { kind: "there", y };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play hello here", + "play hello there", + "play there", + ]) { + const baseRes = match(baseline, input); + const optRes = match(optimized, input); + expect(optRes.length).toBe(baseRes.length); + expect(optRes).toEqual(expect.arrayContaining(baseRes)); + } + }); + + it("does not merge wildcards with different typeNames into one edge", () => { + // `string` and `wildcard` typeNames produce distinct trie + // edges (different runtime entity-type semantics). Match + // results agree. + const text = ` = play $(x:string) once -> { kind: "s", x } + | play $(y:wildcard) twice -> { kind: "w", y };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play hello once", "play hello twice"]) { + const baseRes = match(baseline, input); + const optRes = match(optimized, input); + expect(optRes.length).toBe(baseRes.length); + expect(optRes).toEqual(expect.arrayContaining(baseRes)); + } + }); }); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts index fbf56e4a4..b18078e1b 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerInline.spec.ts @@ -318,4 +318,306 @@ describe("Grammar Optimizer - Inline single-alternative RulesPart", () => { { x: "alice", y: "bob" }, ]); }); + + // ── propagate-variable branch (child has no value expression) ─────── + // + // When the child rule has no explicit `value` and the parent + // captures the RulesPart into a 
variable, the inliner re-targets + // the parent's binding onto the child's single variable-bearing + // part. Other (literal) child parts come along unchanged. + + it("propagates parent's variable onto child's single wildcard part", () => { + // Child is just `$(name:string)` with no value — the matcher's + // default-value rule binds parent's `t` to whatever `name` + // captured. After inlining, parent's `t` lands directly on the + // wildcard. + const text = ` = play $(t:) -> { what: t }; + = $(name:string);`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello")).toStrictEqual( + match(baseline, "play hello"), + ); + expect(match(optimized, "play hello")).toStrictEqual([ + { what: "hello" }, + ]); + }); + + it("propagates parent's variable onto child's single number part", () => { + const text = ` = set $(v:) -> { value: v }; + = $(n:number);`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "set 42")).toStrictEqual( + match(baseline, "set 42"), + ); + expect(match(optimized, "set 42")).toStrictEqual([{ value: 42 }]); + }); + + it("propagates parent's variable onto child's single nested rules part", () => { + // Child has a single `` reference (a RulesPart) with no + // value. Parent binds it via `t`. After inlining, parent's + // `t` should land on the inner RulesPart so it captures + // 's value. 
+ const text = ` = play $(t:) -> { what: t }; + = ; + = $(name:string) -> { who: name };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello")).toStrictEqual( + match(baseline, "play hello"), + ); + expect(match(optimized, "play hello")).toStrictEqual([ + { what: { who: "hello" } }, + ]); + }); + + it("propagates parent's variable through multi-part child with single variable + literal siblings", () => { + // Child is `the $(name:string)` — one literal sibling plus one + // wildcard. Matcher default-value binds parent's `t` to the + // wildcard's capture. After inlining, the literal "the" comes + // along unchanged and parent's `t` lands on the wildcard. + const text = ` = play $(t:) -> { what: t }; + = the $(name:string);`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play the hello")).toStrictEqual( + match(baseline, "play the hello"), + ); + expect(match(optimized, "play the hello")).toStrictEqual([ + { what: "hello" }, + ]); + }); + + // ── default-value behaviors not handled by the propagate branch ───── + // + // The matcher's `defaultValue` flag also fires for single-part + // rules whose only part is a string literal or phraseSet — the + // matcher returns the literal text as the rule's value. Neither + // string nor phraseSet parts can carry a `variable`, so the + // optimizer's propagate branch refuses to inline these. Verify + // that the wrapper stays in place AND that the value still flows. 
+ + it("leaves single string-literal child nested when parent captures via variable", () => { + // ` = hello;` produces "hello" as its value; parent's + // `t` binds to that. Inlining would require putting `t` onto + // the string part — which the type system forbids — so the + // optimizer must leave the wrapper rule alone. + const text = ` = play $(t:) -> { what: t }; + = hello;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + // Wrapper stays — same rules-part count. + expect(countRulesParts(optimized.rules)).toBe( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello")).toStrictEqual( + match(baseline, "play hello"), + ); + expect(match(optimized, "play hello")).toStrictEqual([ + { what: "hello" }, + ]); + }); + + it("leaves multi-string-literal child nested when parent captures via variable", () => { + // ` = hello world;` is a single-part rule (the two + // tokens compile into one StringPart). Same situation as the + // single-token case: matcher returns the matched text as the + // value, but it can't be propagated onto the StringPart. + const text = ` = $(x:) -> x; + = hello world;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBe( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "hello world")).toStrictEqual( + match(baseline, "hello world"), + ); + expect(match(optimized, "hello world")).toStrictEqual(["hello world"]); + }); + + it("inlines unbound rule reference when parent has no capture (drop branch)", () => { + // Parent has no `part.variable`, so the propagate branch is + // skipped and the final fall-through `return { parts: child.parts }` + // applies. 
The matcher's default-value behavior on the + // wrapper is irrelevant since nothing observes it. + const text = ` = play now -> true; + = hello;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello now")).toStrictEqual( + match(baseline, "play hello now"), + ); + expect(match(optimized, "play hello now")).toStrictEqual([true]); + }); + + it("multi-part parent with no value defaults to captured child.value", () => { + // Parent has multi parts (`play now`), no explicit + // value, and captures the RulesPart into `t`. The compiler + // marks the parent as `hasValue` because variableCount === 1, + // and the matcher's default-value rule produces `t`'s captured + // value (= child.value = `{ who: name }`) as the parent's + // result. Verify the optimizer preserves this behavior — the + // hoist branch only fires for single-part parents, so this + // case must take the substitute / drop path or refuse to + // inline. + const text = ` = play $(t:) now; + = $(name:string) loud -> { who: name };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(match(optimized, "play hello loud now")).toStrictEqual( + match(baseline, "play hello loud now"), + ); + expect(match(baseline, "play hello loud now")).toStrictEqual([ + { who: "hello" }, + ]); + }); + + it("multi-part parent with no value defaults to captured child via propagate branch", () => { + // Same parent shape, but child has no explicit value (one + // variable → matcher defaults to that variable). 
After + // inlining, the propagate branch re-targets parent's `t` onto + // the child's wildcard, and the matcher's default-value rule + // still produces the wildcard's capture as the parent's value. + const text = ` = play $(t:) now; + = $(name:string);`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBeLessThan( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play hello now")).toStrictEqual( + match(baseline, "play hello now"), + ); + expect(match(baseline, "play hello now")).toStrictEqual(["hello"]); + }); + + // ── High-priority coverage gaps ───────────────────────────────────── + + it("propagate branch preserves child's optional flag on the bound part", () => { + // Child's variable-bearing part is optional. After inlining, + // parent's `t` lands on a part that's still optional. Match + // both with and without the optional segment present. + const text = ` = play $(t:) -> { what: t }; + = $(name:string)? world;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + for (const input of ["play hello world", "play world"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("inline skips when child spacingMode is 'none' but parent is auto", () => { + // Different explicit modes on parent and child must NOT inline: + // the child's "none" boundary semantics differ from auto at e.g. + // digit↔Latin transitions. 
+ const text = ` = play -> true; + [spacing=none] = a b;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBe( + countRulesParts(baseline.rules), + ); + expect(match(optimized, "play ab")).toStrictEqual( + match(baseline, "play ab"), + ); + }); + + it("inline skips when parent and child have differing explicit spacing modes", () => { + // Required vs none: matcher boundary behavior differs entirely; + // inlining would change accepted inputs. + const text = ` [spacing=required] = play -> true; + [spacing=none] = ab;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBe( + countRulesParts(baseline.rules), + ); + }); + + it("inline refuses to duplicate a shared child rule (refCounts > 1)", () => { + // is referenced by two distinct call sites. Inlining + // it at one site would still leave the other site referencing + // the original array — net effect is a duplicate copy in the + // serialized grammar instead of one shared rule. The optimizer + // must refuse. 
+ const text = ` = play -> "play" | sing -> "sing"; + = hello;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(countRulesParts(optimized.rules)).toBe( + countRulesParts(baseline.rules), + ); + for (const input of ["play hello", "sing hello"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // Regression: when the child's single variable-bearing part is + // itself a *bound* nested rules part (and child has no explicit + // value), the propagate branch re-targets parent's variable onto + // the rules part. Earlier this diverged from baseline; now fixed + // by the broadened Hoist branch (which handles multi-part parents + // capturing a value-producing child via `part.variable`). + it("propagate branch handles bound nested rules part", () => { + const text = ` = play $(t:) -> { what: t }; + = the $(x:); + = $(name:string) -> { who: name };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { inlineSingleAlternatives: true }, + }); + expect(match(optimized, "play the hello")).toStrictEqual( + match(baseline, "play the hello"), + ); + expect(match(baseline, "play the hello")).toStrictEqual([ + { what: { who: "hello" } }, + ]); + }); }); From 76a185fe7eade597ddc50c3c06aa64164f3f4ec1 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 22:04:04 -0700 Subject: [PATCH 11/16] =?UTF-8?q?actionGrammar:=20optimizer=20cleanup=20?= =?UTF-8?q?=E2=80=94=20Map-keyed=20trie,=20factor=20refactor,=20docs?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - factorRulesPart split into thin RulesPart wrapper + flat factorRules core; top-level factoring now reuses factorRules directly instead of wrapping Grammar.rules in a synthetic RulesPart. 
- Trie children switched from TrieNode[] (linear edgeKeyMatches scan) to Map keyed by stepMergeKey for O(1) insertion. GrammarRule[] array identity preserved via a per-BuildState WeakMap interner so the rules-edge merge key stays primitive without losing array-identity sharing. Removed now-unused edgeKeyMatches. - emitFromNode prefix construction made linear: appendPart (returned a fresh array per step → O(depth²)) replaced with in-place appendPartInPlace. - factorCommonPrefixes JSDoc warns that top-level factoring destroys the 1:1 source-rule index correspondence. - Inliner second-pass comment corrected: factoring never emits single-alternative wrappers itself; the second inline pass exists to collapse sub-RulesParts inside emitted suffixes that became inline-eligible. - BuildState JSDoc clarifies its scope and contrasts with RenameState. - bench/grammarOptimizerBenchmark.ts: comment documents the dist/bench path-relative grammar-file assumption. - README adds an 'Optimizer benchmarks' section explaining the build-then-run flow for pnpm run bench:synthetic / bench:real. - docs/architecture/actionGrammar.md: rewrote the Compile-time optimizations section to match current code (Hoist/Substitute/Drop sub-strategies, opaque __opt_v_/__opt_inline_ canonicals, parity check, local bailout, no fixed-point loop, top-level index caveat, factorRules vs. factorRulesPart, Map-keyed children, src/bench/ standalone scripts). - Test consolidation: merged grammarOptimizerFactoringRepro.spec.ts and grammarOptimizerTrieRisks.spec.ts into grammarOptimizerFactoring.spec.ts (28 tests preserved exactly; 5 spec files → 3). - Full action-grammar suite passes (2339 tests); prettier clean. 
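As a standalone illustration of the first two bullets (toy types invented for this sketch, not the package's actual `TrieNode`/`stepMergeKey` code): child edges are keyed by a primitive merge key in a `Map`, and array identity is kept primitive by interning each array into a numeric id via a `WeakMap`, so two references to the *same* array merge into one edge while equal-but-distinct arrays do not.

```typescript
// Toy sketch of the Map-keyed trie with a WeakMap array interner.
// Names and types here are illustrative only.

type Step =
    | { kind: "string"; token: string }
    | { kind: "rules"; rules: string[] }; // stand-in for GrammarRule[]

class ArrayInterner {
    private ids = new WeakMap<object, number>();
    private next = 0;
    idOf(a: object): number {
        let id = this.ids.get(a);
        if (id === undefined) {
            id = this.next++;
            this.ids.set(a, id);
        }
        return id;
    }
}

function mergeKey(step: Step, interner: ArrayInterner): string {
    switch (step.kind) {
        case "string":
            return `s:${step.token}`;
        case "rules":
            // Identity, not structural equality: same array => same id.
            return `r:${interner.idOf(step.rules)}`;
    }
}

interface TNode {
    children: Map<string, TNode>;
}

function insertSteps(root: TNode, steps: Step[], interner: ArrayInterner): void {
    let cur = root;
    for (const step of steps) {
        const key = mergeKey(step, interner);
        let child = cur.children.get(key); // O(1) instead of O(siblings)
        if (child === undefined) {
            child = { children: new Map() };
            cur.children.set(key, child);
        }
        cur = child;
    }
}
```

Because JS `Map`s iterate in insertion order, emission order is unchanged by the switch away from a sibling array.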
--- ts/docs/architecture/actionGrammar.md | 222 ++++++---- ts/packages/actionGrammar/README.md | 18 + .../src/bench/grammarOptimizerBenchmark.ts | 6 + .../actionGrammar/src/grammarOptimizer.ts | 251 +++++++---- .../test/grammarOptimizerFactoring.spec.ts | 408 ++++++++++++++++++ .../grammarOptimizerFactoringRepro.spec.ts | 113 ----- .../test/grammarOptimizerTrieRisks.spec.ts | 323 -------------- 7 files changed, 731 insertions(+), 610 deletions(-) delete mode 100644 ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts delete mode 100644 ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts diff --git a/ts/docs/architecture/actionGrammar.md b/ts/docs/architecture/actionGrammar.md index dde268f75..a455e773e 100644 --- a/ts/docs/architecture/actionGrammar.md +++ b/ts/docs/architecture/actionGrammar.md @@ -394,15 +394,16 @@ through `LoadGrammarRulesOptions.optimizations`: ```typescript loadGrammarRules("agent.agr", text, { - optimizations: { - inlineSingleAlternatives: true, - factorCommonPrefixes: true, - }, + optimizations: { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, }); ``` The optimizer runs after value-expression validation, so it operates on -fully-compiled `CompiledValueNode`s. +fully-compiled `CompiledValueNode`s. It is skipped entirely when the +compile produced any errors (the AST may be partial). #### Pass 1 — Inline single-alternative `RulesPart` @@ -412,22 +413,47 @@ rule's parts. This removes one layer of `ParentMatchState` push/pop and `finalizeNestedRule` in the matcher, which is common for named rules that simply delegate to a single sub-rule. 
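The core of the pass can be sketched in isolation (toy `Part`/`Rule` types invented here, not the package's AST; the real pass also checks spacing modes, shared-array reference counts, and the value sub-strategies described below, and re-merges adjacent string parts at the seam):

```typescript
// Toy sketch: splice a single-alternative, flag-free, value-free
// "rules" wrapper's parts directly into the parent's parts array.

type Part =
    | { type: "string"; value: string[] }
    | { type: "rules"; rules: Rule[]; optional?: boolean; repeat?: boolean };

interface Rule {
    parts: Part[];
    value?: unknown;
}

function inlineParts(parts: Part[]): Part[] {
    const out: Part[] = [];
    for (const p of parts) {
        if (
            p.type === "rules" &&
            p.rules.length === 1 &&
            !p.optional &&
            !p.repeat &&
            p.rules[0].value === undefined
        ) {
            // Collapse the wrapper: inline the (recursively inlined)
            // child parts in place of the nested RulesPart.
            out.push(...inlineParts(p.rules[0].parts));
        } else {
            out.push(p);
        }
    }
    return out;
}
```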
-A `RulesPart` is inlined only when **all** of the following hold: - -- `part.rules.length === 1` -- `!part.repeat` and `!part.optional` (loop-back / optional semantics - must be preserved) -- The child rule has no explicit `value` expression (an inlined value - would no longer fire under the parent's value-tracking policy) -- The child rule's `spacingMode` is compatible with the parent - (either `undefined` to inherit, or identical to the parent's mode) -- If `part.variable` is set, the child must consist of a single - direct-capture part (`wildcard` or `number`) so the variable name - can be propagated onto it. Variable propagation is **never** pushed - onto a nested `RulesPart` — that scope is structurally distinct from - the parent's and would silently drop the binding for cases like - `$(x:)` where `` produces a nested object via its own - value expression. +A `RulesPart` is inlined only when **all** structural preconditions +hold: + +- `part.rules.length === 1` and `!part.repeat && !part.optional` +- The child rule has at least one part +- `child.spacingMode === parentRule.spacingMode` (exact equality; + `undefined` is treated as a distinct "auto" mode at the matcher + level, not as "inherit from parent") +- The body of the child's `rules` array is not shared by any other + `RulesPart` in the input AST. A pre-pass reference-counts every + `GrammarRule[]` array; inlining a shared body would duplicate the + child's parts at every call site and defeat the serializer's + identity-based dedup (see "shared-rule identity preservation" + below). + +When the child carries an explicit `value` expression, one of three +sub-strategies fires: + +- **Hoist** — parent has no value of its own and either has a single + part (this `RulesPart`) or captures via `part.variable`. The matcher + would have computed the parent's value at runtime via its + default-value rule using `child.value`; we synthesize that + assignment explicitly onto the parent's `value`. 
+- **Substitute** — parent captures via `part.variable` AND has its + own value expression. The (α-renamed) `child.value` is substituted + for the captured variable in `parent.value`. +- **Drop** — `child.value` is unobservable at runtime; inline only the + child's parts. + +For Substitute and Drop the child's top-level bindings are α-renamed +to fresh opaque names (`__opt_inline_`, per-parent counter) so +they cannot collide with sibling parts the parent already has. Hoist +into a single-part parent skips the rename — there are no siblings. + +When the child has no value expression and the parent captures via +`part.variable`, the child must contain exactly one binding-friendly +part (`wildcard`, `number`, or `rules`). The parent's variable name +is re-targeted onto that single part in place. (This is the only +case where a binding may legitimately move onto a nested +`RulesPart` — the absence of a child value rules out the silent-drop +hazard that the value-bearing branches handle via Hoist/Substitute.) #### Pass 2 — Common prefix factoring @@ -442,79 +468,113 @@ play the track -> "track" ⇒ | track -> "track" play the album -> "album" | album -> "album") ``` -**Prefix shape.** Two alternatives share a prefix of `(fullParts, -stringTokens)` shape: `fullParts` parts that are structurally equal -(via `partsEqualForFactoring`), optionally followed by `stringTokens` -matching leading tokens of a shared `StringPart`. The partial-string -case lets `play the song | play the track` factor even though -`play the song` and `play the track` are each tokenized into a single -multi-token `StringPart`. - -**Variable remapping.** `partsEqualForFactoring` treats variable parts -(`wildcard`, `number`, `rules`) as equal when their type/shape matches -even if the variable names differ. 
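The grouping idea can be sketched on plain token sequences (a deliberately reduced model: string tokens only, no wildcard/number/rules edges, no value expressions, and no eligibility bailouts — all of which the real pass handles):

```typescript
// Toy sketch of trie-style common-prefix factoring over token lists.
type Alt = string[]; // one alternative = its token sequence

type Factored =
    | { kind: "leaf"; rest: string[] }
    | { kind: "group"; prefix: string; members: Factored[] };

function factor(alts: Alt[]): Factored[] {
    const byHead = new Map<string, Alt[]>();
    const out: Factored[] = [];
    for (const alt of alts) {
        if (alt.length === 0) {
            out.push({ kind: "leaf", rest: [] }); // fully consumed alternative
            continue;
        }
        let bucket = byHead.get(alt[0]);
        if (bucket === undefined) {
            bucket = [];
            byHead.set(alt[0], bucket);
        }
        bucket.push(alt.slice(1));
    }
    for (const [head, rests] of byHead) {
        if (rests.length === 1) {
            // Unshared head: re-attach it and emit a flat alternative.
            out.push({ kind: "leaf", rest: [head, ...rests[0]] });
        } else {
            // Shared head: group, then recurse on the suffixes.
            out.push({ kind: "group", prefix: head, members: factor(rests) });
        }
    }
    return out;
}
```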
The lead alternative provides the -canonical names; for each non-lead member a `remap: Map` is built and applied to the suffix's parts and value -expression via `remapPartVariables` and `remapValueVariables`. Object -shorthand `{ x }` (compiled as `{ key: "x", value: null }`) is -expanded to `{ key: "x", value: variable("renamed") }` during remap so -the object field name stays the same. - -**Wrapper value capture.** When any suffix carries a value expression, -the new wrapper rule has more than one part and the matcher's default -single-part value-tracking policy no longer fires. The optimizer -generates a fresh variable name (`__opt_factor`, `__opt_factor_1`, …), -binds the suffix `RulesPart` to it, and produces a wrapper value +**Top-level factoring.** After nested factoring completes, the +top-level `Grammar.rules` array is factored against itself — the +matcher treats top-level alternatives the same way it treats inner +`RulesPart` alternatives. The trie build/emit core (`factorRules`) +operates on a flat `GrammarRule[]`, so it is reused directly at the +top level without wrapping the array in a synthetic `RulesPart`. +This intentionally **destroys the 1:1 correspondence between +top-level rule indices and the original source**; downstream consumers +that depend on that mapping must capture it before enabling +`factorCommonPrefixes`. + +**Implementation.** Factoring is implemented as a trie build + +post-order emission inside `factorRules` (with `factorRulesPart` as a +thin `RulesPart`-aware wrapper that handles the `repeat`/`optional` +bailout): + +- Each rule is inserted as a sequence of "atomic" steps. `StringPart` + explodes into one `(string, token)` edge per token in `value[]`, + so `["play", "song"]` and `["play", "album"]` share the `"play"` + edge but branch at the next token. `wildcard`, `number`, `rules`, + and `phraseSet` parts each yield one edge. `rules` edges key by + `rules` array identity. 
+- `edgeKeyMatches` ignores variable _names_ on variable-bearing edges + but requires binding _presence parity_ on `rules` edges (so + `` and `$(v:)` do not silently merge into the same + child). In the current implementation this comparison is encoded + into a primitive `stepMergeKey` so that the trie children map can + perform an O(1) lookup instead of an O(siblings) scan. +- Emission walks the trie post-order. Single-child / no-terminal + chains are path-compressed back into a flat parts array (with + adjacent `StringPart`s re-merged at the seam), and multi-member + nodes become wrapper `RulesPart`s. + +**Opaque canonical names.** Variable-bearing trie edges carry a fresh +opaque canonical name (`__opt_v_`) allocated per `factorRules` +invocation, _not_ the first inserter's user-supplied variable name. This eliminates two collision classes +that any "first inserter wins" scheme is vulnerable to: outer-scope +shadow (a non-lead's value referencing an outer name that happens to +match the lead's local) and bound-vs-unbound `rules` parity. Each +inserter accumulates a `local → canonical` remap that is applied to +its terminal's `value` expression at emission time. `remapValueVariables` +expands object-shorthand `{ foo }` to `{ foo: }` so the +object field name stays the same. + +**Wrapper value capture.** When any factored member carries a value +expression, the wrapper rule has more than one part and the matcher's +default single-part value rule no longer fires. The optimizer +generates a fresh `__opt_factor` / `__opt_factor_` name (avoiding +any name already bound in prefix or members), binds the suffix +`RulesPart` to it, and sets the wrapper's `value` to `{ type: "variable", name: "__opt_factor" }` — preserving the suffix value through the new nesting level. -**Iteration.** Factoring is applied to a fixed point per `RulesPart` -(capped at 8 rounds) since a freshly-factored suffix may itself share -a new prefix among its members. 
When both passes are enabled, Pass 1 -runs again after factoring so that any single-alternative wrappers -produced by factoring collapse away. +**Local bailout on eligibility failure.** Per-fork eligibility checks +run before each wrapper is built; on failure the would-be members +are emitted as separate full rules with the canonical prefix +prepended, losing factoring at _that fork only_. Factoring above and +below the failing fork still applies. Failure reasons: + +- **Whole-consumed.** A member's parts were entirely consumed by the + prefix (empty-parts suffix) — the matcher cannot default-value + resolve an empty-parts rule inside a wrapped `RulesPart`. +- **Mixed value presence.** Some members carry explicit `value`, + others rely on default-value semantics; wrapping would silently + drop the implicit values. +- **Implicit-default multipart.** All members rely on default values + but at least one suffix would end up with more than one part, + where the matcher's single-part default-value policy no longer + applies. +- **Cross-scope reference.** A suffix's value expression references + a canonical name bound by the prefix. Nested rule scope is fresh + in the matcher (entering a `RulesPart` resets `valueIds`), so the + suffix cannot see prefix bindings — bail out so each member emits + at the wrapper's level instead. + +The earlier "binding shadow" guard is no longer needed: opaque +canonicals allocated globally per `factorRules` call cannot +collide with each other. + +**No fixed-point loop.** Factoring is applied once per group of +alternatives — the trie's grouping converges in a single pass and +freshly synthesized suffix `RulesPart`s are intentionally not +re-walked. When both passes are enabled, the optimizer runs Pass 1 +once more after Pass 2 so that any sub-`RulesPart`s inside the +emitted suffixes that have become inline-eligible can collapse. **Shared-rule identity preservation.** Both passes memoize their output by `GrammarRule[]` array identity. 
The compiler points every reference to the same named rule (``) at the same underlying -`rules` array so [grammarSerializer.ts](packages/actionGrammar/src/grammarSerializer.ts) +`rules` array so [grammarSerializer.ts](../../packages/actionGrammar/src/grammarSerializer.ts) can dedupe via `rulesToIndex.get(p.rules)`. The optimizer preserves that invariant: two `RulesPart`s that originally pointed at the same array still point at the same (possibly new) array after the pass — -keeping `.ag.json` size proportional to unique rule bodies, and -allowing `partsEqualForFactoring`'s `a.rules === b.rules` check to -keep matching across multiple references. - -**Safety guards.** The optimizer refuses to factor when any of the -following would change semantics: - -- **Mixed value presence.** Some members have an explicit value, others - rely on default-value semantics. Wrapping changes the parent shape - and would silently drop the implicit values. -- **Multi-part defaulted suffix.** All members rely on default values - but at least one suffix would end up with more than one part, where - the matcher's single-part default-value policy no longer applies. -- **Cross-scope value reference.** A suffix's value expression - references (after remap) a variable bound in the canonical prefix. - The matcher scopes value variables per nested rule, so the suffix - cannot see prefix bindings. -- **Suffix–prefix variable collision.** A suffix-bound variable name - collides with a canonical-prefix name after remap — would shadow the - outer binding. -- **Wholly-consumed alternative with explicit value.** The shared - prefix consumes every part of some alternative that also has an - explicit value — leaves an empty-parts suffix that cannot carry the - value cleanly. +keeping `.ag.json` size proportional to unique rule bodies. 
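The identity-preservation contract can be shown in isolation (toy types, not the package's API): memoizing a pass by *input array identity* guarantees that two references which shared an array before the pass share one afterwards.

```typescript
// Toy sketch: an identity-keyed memo around an array-transforming pass.
type RuleArray = string[]; // stand-in for GrammarRule[]

function makeMemoizedPass(
    transform: (rules: RuleArray) => RuleArray,
): (rules: RuleArray) => RuleArray {
    // Keyed by array identity, not contents: the same input array
    // always yields the same output array object.
    const memo = new Map<RuleArray, RuleArray>();
    return (rules) => {
        let out = memo.get(rules);
        if (out === undefined) {
            out = transform(rules);
            memo.set(rules, out);
        }
        return out;
    };
}
```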
#### Equivalence and benchmarks -The new `grammarOptimizer*` test specs cover unit behavior, structural -equivalence (every flag combination produces identical `matchGrammar` -output across a set of curated and real-agent grammars), and an -informational `grammarOptimizerBenchmark.spec.ts` patterned on -`dfaBenchmark.spec.ts` that prints matcher-time numbers per -configuration. Set `TYPEAGENT_SKIP_BENCHMARKS=1` to skip the -benchmark spec. +The `grammarOptimizer*.spec.ts` test suite covers unit behavior of +both passes, regression repros for previously broken factoring +patterns, structural-equivalence checks (every flag combination +produces identical `matchGrammar` output across curated and +real-agent grammars), and shared-array preservation. Standalone +informational benchmarks live under +[packages/actionGrammar/src/bench/](../../packages/actionGrammar/src/bench/); +run them via `pnpm run bench:synthetic` and `pnpm run bench:real` +from the package directory (a `pnpm run tsc` build is required first +since the bench scripts execute the compiled `dist/bench/` output). ### Matching backend diff --git a/ts/packages/actionGrammar/README.md b/ts/packages/actionGrammar/README.md index 709eee7ff..068c50eea 100644 --- a/ts/packages/actionGrammar/README.md +++ b/ts/packages/actionGrammar/README.md @@ -141,6 +141,24 @@ Key test suites: - `nfaRealGrammars.spec.ts` — End-to-end tests with production grammars - `dfa.spec.ts` — DFA compiler correctness - `dfaBenchmark.spec.ts` — Performance benchmarks +- `grammarOptimizer*.spec.ts` — Compile-time AST optimizer (inline + factor passes) + +## Optimizer benchmarks + +Standalone benchmarks for the opt-in compile-time grammar optimizer +([`src/bench/`](src/bench/)) are not part of the jest suite. 
They +execute the compiled output, so a build is required first: + +```bash +pnpm run tsc +pnpm run bench:synthetic # synthetic pass-through / wide-prefix grammars +pnpm run bench:real # real agent grammars (player, list, calendar) +pnpm run bench # both +``` + +Each script prints a per-configuration table comparing baseline, +inline-only, factor-only, and both. Speedup is colored once it moves +more than 10% from baseline. ## Downstream consumers diff --git a/ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts b/ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts index 71b4307c7..63ea029f9 100644 --- a/ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts +++ b/ts/packages/actionGrammar/src/bench/grammarOptimizerBenchmark.ts @@ -44,6 +44,12 @@ function benchmarkFile( function main(): void { registerBuiltInEntities(); + // Grammar paths are resolved relative to this file's compiled + // location (`dist/bench/`). They point at sibling agent packages + // via `../../../agents//...` and assume the standard + // `packages/` layout in the workspace. If an agent grammar moves + // or the dist layout changes, the `[skip]` branch in `benchmarkFile` + // keeps the script running and prints a clear diagnostic. benchmarkFile( "player", path.resolve( diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 99caf7bcb..97d615453 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -25,6 +25,12 @@ export type GrammarOptimizationOptions = { /** * Factor common leading parts shared across alternatives in a RulesPart. * Avoids re-matching the shared prefix while exploring each alternative. + * + * Also factors the top-level `Grammar.rules` array against itself, + * which **destroys the 1:1 correspondence between top-level rule + * indices and the original source**. 
Downstream consumers that + * depend on that mapping (e.g. for diagnostics that quote a source + * rule by index) must capture it before this pass runs. */ factorCommonPrefixes?: boolean; }; @@ -51,8 +57,12 @@ export function optimizeGrammar( if (options.factorCommonPrefixes) { rules = factorCommonPrefixes(rules); if (options.inlineSingleAlternatives) { - // Factoring can produce new single-alternative wrapper rules; - // run the inliner once more so they collapse. + // Factoring never emits a single-alternative wrapper itself + // (factorRulesPart only wraps when members.length >= 2), but + // the suffix RulesParts it builds can contain inner + // single-alternative RulesParts that were not visible to + // Pass 1 in their pre-factored shape. Re-run the inliner so + // those collapse. rules = inlineSingleAlternativeRules(rules); } } @@ -467,13 +477,13 @@ export function factorCommonPrefixes(rules: GrammarRule[]): GrammarRule[] { const memo: RulesArrayMemo = new Map(); let result = factorRulesArray(rules, counter, memo); - // Top-level factoring: wrap the (already nested-factored) top-level - // rules in a synthetic `RulesPart` so we can reuse `factorRulesPart` - // unchanged. Newly synthesized suffix `RulesPart`s produced here are - // not themselves re-walked, matching the existing behavior for nested - // factoring. - const wrapper: RulesPart = { type: "rules", rules: result }; - result = factorRulesPart(wrapper, counter).rules; + // Top-level factoring: the matcher treats top-level alternatives the + // same way it treats inner `RulesPart` alternatives (each is queued + // as its own `MatchState` and produces its own result), so the same + // trie-based factoring applies. Newly synthesized suffix + // `RulesPart`s produced here are not themselves re-walked, matching + // the existing behavior for nested factoring. 
+ result = factorRules(result, counter); if (counter.factored > 0) { debug(`factored ${counter.factored} common prefix groups`); @@ -542,8 +552,32 @@ function factorParts( } /** - * Common-prefix factoring inside a single RulesPart, implemented as a - * trie-build + post-order emission. + * Common-prefix factoring inside a single RulesPart. Thin wrapper + * around `factorRules` that respects the `RulesPart`'s repeat/optional + * flags (which change matcher loop-back semantics and so block + * factoring) and re-wraps the factored alternatives back into the + * `RulesPart` shape on success. + */ +function factorRulesPart( + part: RulesPart, + counter: { factored: number }, +): RulesPart { + if (part.repeat || part.optional) { + // Repeat/optional change the matcher's loop-back semantics; leave + // such groups untouched to stay safe. + return part; + } + const factored = factorRules(part.rules, counter); + if (factored === part.rules) return part; + return { ...part, rules: factored }; +} + +/** + * Common-prefix factoring over a flat list of alternatives, implemented + * as a trie-build + post-order emission. Used both for the + * alternatives inside a single `RulesPart` (via `factorRulesPart`) and + * for the top-level `Grammar.rules` array (which the matcher treats + * the same way as inner alternatives). * * Each rule is inserted as a sequence of "atomic" steps: * - StringPart explodes into one ("string", token) edge per token in @@ -568,36 +602,35 @@ function factorParts( * losing factoring at that fork only (factoring above and below the * fork still applies). * - * Returns the same object if no factoring took place. + * Returns the same array if no factoring took place. */ -function factorRulesPart( - part: RulesPart, +function factorRules( + rules: GrammarRule[], counter: { factored: number }, -): RulesPart { - if (part.repeat || part.optional) { - // Repeat/optional change the matcher's loop-back semantics; leave - // such groups untouched to stay safe. 
- return part; - } - if (part.rules.length < 2) return part; +): GrammarRule[] { + if (rules.length < 2) return rules; - const buildState: BuildState = { nextCanonicalId: 0 }; - const root: TrieRoot = { children: [], terminals: [] }; - for (let i = 0; i < part.rules.length; i++) { - insertRuleIntoTrie(root, part.rules[i], i, buildState); + const buildState: BuildState = { + nextCanonicalId: 0, + rulesArrayIds: new WeakMap(), + nextRulesArrayId: 0, + }; + const root: TrieRoot = { children: new Map(), terminals: [] }; + for (let i = 0; i < rules.length; i++) { + insertRuleIntoTrie(root, rules[i], i, buildState); } const state: EmitState = { didFactor: false }; const items: { idx: number; rules: GrammarRule[] }[] = []; - for (const c of root.children) { + for (const c of root.children.values()) { items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); } items.sort((a, b) => a.idx - b.idx); const newRules: GrammarRule[] = items.flatMap((it) => it.rules); - if (!state.didFactor) return part; + if (!state.didFactor) return rules; counter.factored++; - return { ...part, rules: newRules }; + return newRules; } // ── Trie data structures ───────────────────────────────────────────────── @@ -674,28 +707,90 @@ type Terminal = { remap: Map; }; +/** + * Per-`factorRulesPart`-invocation counter used to mint opaque canonical + * variable names (`__opt_v_`) on variable-bearing trie edges, plus + * an interner for `GrammarRule[]` array identities (used to build a + * primitive-keyed children Map without losing array-identity merging + * for `` references). + * + * Scope is one `RulesPart` because canonicals never escape the wrapper + * rule we're about to emit — a fresh BuildState per invocation is + * enough to guarantee within-RulesPart uniqueness. Distinct from + * `RenameState` (which scopes per-parent-rule and produces + * `__opt_inline_` names for the inliner pass). 
+ */
 type BuildState = {
     nextCanonicalId: number;
+    rulesArrayIds: WeakMap<GrammarRule[], number>;
+    nextRulesArrayId: number;
 };
 
 function freshCanonical(state: BuildState): string {
     return `__opt_v_${state.nextCanonicalId++}`;
 }
 
+function rulesArrayId(state: BuildState, rules: GrammarRule[]): number {
+    let id = state.rulesArrayIds.get(rules);
+    if (id === undefined) {
+        id = state.nextRulesArrayId++;
+        state.rulesArrayIds.set(rules, id);
+    }
+    return id;
+}
+
+/**
+ * Compute a primitive merge key for a trie step. Two steps with the
+ * same key share a child node at insertion time — the same pairing the
+ * removed `edgeKeyMatches` performed by walking sibling edges, but
+ * O(1) via a single Map lookup. For variable-bearing kinds the
+ * variable *name* is omitted (names are remapped); for `rules` edges
+ * the binding presence (bound vs. unbound) is encoded so bound and
+ * unbound references don't merge — mirroring the parity check the
+ * removed `edgeKeyMatches` performed.
+ */
+function stepMergeKey(step: TrieStep, state: BuildState): string {
+    switch (step.kind) {
+        case "string":
+            return `s:${step.token}`;
+        case "wildcard":
+            return `w:${step.typeName}:${step.optional ? 1 : 0}`;
+        case "number":
+            return `n:${step.optional ? 1 : 0}`;
+        case "rules": {
+            const id = rulesArrayId(state, step.rules);
+            return `r:${id}:${step.optional ? 1 : 0}:${step.repeat ? 1 : 0}:${step.local !== undefined ? 1 : 0}`;
+        }
+        case "phraseSet":
+            return `p:${step.matcherName}`;
+    }
+}
+
 /**
  * Root of the trie. Distinct from `TrieNode` so that `edge` can be
  * required on every non-root node — eliminating non-null assertions in
  * the insertion and emission code. Terminals on the root represent
  * empty-parts input rules (rare but legal).
+ * + * `children` is a `Map` keyed by `stepMergeKey` so + * insertion is O(1) per step rather than O(siblings); JS Maps preserve + * insertion order, so emission still walks children in the order they + * were first inserted. */ type TrieRoot = { - children: TrieNode[]; + children: Map; terminals: Terminal[]; }; type TrieNode = { edge: TrieEdge; - children: TrieNode[]; + children: Map; terminals: Terminal[]; /** Lowest insertion index of any rule passing through this node. */ firstIdx: number; @@ -705,7 +800,14 @@ type EmitState = { didFactor: boolean }; /** A node is "linear" iff it has no terminals and exactly one child. */ function isLinearNode(n: TrieNode): boolean { - return n.terminals.length === 0 && n.children.length === 1; + return n.terminals.length === 0 && n.children.size === 1; +} + +/** Return the sole child of a linear node (caller must guarantee linearity). */ +function onlyChild(n: TrieNode): TrieNode { + // Map iteration order is insertion order; for size===1 there is + // exactly one entry to read. + return n.children.values().next().value!; } // ── Trie insertion ─────────────────────────────────────────────────────── @@ -720,21 +822,16 @@ function insertRuleIntoTrie( let terminals = root.terminals; const remap = new Map(); for (const step of partsToEdgeSteps(rule.parts)) { - let matched: TrieNode | undefined; - for (const c of children) { - if (edgeKeyMatches(c.edge, step)) { - matched = c; - break; - } - } + const key = stepMergeKey(step, buildState); + let matched = children.get(key); if (matched === undefined) { matched = { edge: stepToEdge(step, buildState), - children: [], + children: new Map(), terminals: [], firstIdx: idx, }; - children.push(matched); + children.set(key, matched); } recordStepRemap(matched.edge, step, remap); children = matched.children; @@ -821,47 +918,6 @@ function stepToEdge(step: TrieStep, buildState: BuildState): TrieEdge { } } -/** - * True if `step`'s key fields match `edge` for trie merging. 
For - * variable-bearing kinds the *names* are ignored (they get remapped), - * but for `rules` edges binding *presence* must agree — see notes - * above the `TrieStep`/`TrieEdge` types. - */ -function edgeKeyMatches(edge: TrieEdge, step: TrieStep): boolean { - if (edge.kind !== step.kind) return false; - // After the kind check, `step` has the same variant as `edge`; the - // cast inside each branch narrows it accordingly. - switch (edge.kind) { - case "string": - return edge.token === (step as { token: string }).token; - case "wildcard": { - const s = step as { typeName: string; optional: boolean }; - return edge.typeName === s.typeName && edge.optional === s.optional; - } - case "number": - return edge.optional === (step as { optional: boolean }).optional; - case "rules": { - const s = step as { - rules: GrammarRule[]; - optional: boolean; - repeat: boolean; - local: string | undefined; - }; - return ( - edge.rules === s.rules && - edge.optional === s.optional && - edge.repeat === s.repeat && - (edge.canonical === undefined) === (s.local === undefined) - ); - } - case "phraseSet": - return ( - edge.matcherName === - (step as { matcherName: string }).matcherName - ); - } -} - /** * Record the `local → canonical` rename for one inserter at one trie * step. Throws on conflict (same local mapped to two canonicals on @@ -924,18 +980,27 @@ function edgeToPart(edge: TrieEdge): GrammarPart { } } -/** Append `part` to `prefix`, folding when both ends are StringParts. */ -function appendPart(prefix: GrammarPart[], part: GrammarPart): GrammarPart[] { - if (prefix.length === 0) return [part]; +/** + * Append `part` to `prefix` in place, folding when both ends are + * StringParts (i.e. merging `last.value` and `part.value` into one + * `StringPart`). Mutating in place keeps path-compression linear in + * chain depth — returning a fresh array on every step would be + * O(depth²). 
+ */ +function appendPartInPlace(prefix: GrammarPart[], part: GrammarPart): void { + if (prefix.length === 0) { + prefix.push(part); + return; + } const last = prefix[prefix.length - 1]; if (last.type === "string" && part.type === "string") { - const merged: GrammarPart = { + prefix[prefix.length - 1] = { type: "string", value: [...last.value, ...part.value], }; - return [...prefix.slice(0, prefix.length - 1), merged]; + return; } - return [...prefix, part]; + prefix.push(part); } /** Concatenate two parts arrays, folding at the seam if both ends are strings. */ @@ -979,15 +1044,15 @@ function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { // way the fork's edge becomes the first part of each emitted member // (avoiding empty-parts members at the fork, which would defeat // factoring via the wholeConsumed-with-value check below). - let prefix: GrammarPart[] = [edgeToPart(node.edge)]; + const prefix: GrammarPart[] = [edgeToPart(node.edge)]; let current = node; while ( current.terminals.length === 0 && - current.children.length === 1 && - isLinearNode(current.children[0]) + current.children.size === 1 && + isLinearNode(onlyChild(current)) ) { - current = current.children[0]; - prefix = appendPart(prefix, edgeToPart(current.edge)); + current = onlyChild(current); + appendPartInPlace(prefix, edgeToPart(current.edge)); } // Members at this fork = its terminals (each as an empty-parts rule) @@ -996,7 +1061,7 @@ function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { for (const t of current.terminals) { items.push({ idx: t.idx, rules: [terminalToRule(t)] }); } - for (const c of current.children) { + for (const c of current.children.values()) { items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); } items.sort((a, b) => a.idx - b.idx); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts index 4c2a8aa35..e2b2e7a0e 100644 --- 
a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts @@ -272,3 +272,411 @@ describe("Grammar Optimizer - Common prefix factoring", () => { } }); }); + +// ─── Merged from grammarOptimizerFactoringRepro.spec.ts ────────────────────── +describe("Grammar Optimizer - Factoring Repro", () => { + it("handles alternatives that re-use the same variable name", () => { + const text = ` = ; + = play $(trackName:string) -> { kind: "solo", trackName } + | play $(trackName:string) by $(artist:string) -> { kind: "duet", trackName, artist };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play Hello", + "play Shake It Off by Taylor Swift", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("handles a group that is fully consumed by the shared prefix", () => { + const text = ` = ; + = play -> "just" + | play the song -> "song" + | play the track -> "track";`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play", "play the song", "play the track"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("handles mixed explicit / default value alternatives", () => { + const text = ` = ; + = play the song + | play the track -> "custom";`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play the song", "play the track"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("handles shared literal prefix with distinct wrapped 
RulesParts (player-like)", () => { + const text = ` = ; + = $(trackName:string) -> trackName + | the $(trackName:string) -> trackName; + = play $(trackName:) by $(artist:string) -> { kind: "byArtist", trackName, artist } + | play $(trackName:) from album $(albumName:string) -> { kind: "fromAlbum", trackName, albumName };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play hello by taylor", + "play the hello by taylor", + "play hello from album unity", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // Regression for the failure surfaced by the optimizer benchmark + // against the player grammar: + // + // "Internal error: No value for variable 'trackName'. + // Values: {"name":"artist","valueId":4}" + // + // Object shorthand `{ trackName }` compiles to a property element + // with `value: null` (key = "trackName", expanded at evaluation + // time to `trackName: trackName`). Variable-renaming during + // factoring must (a) detect that the key is a variable reference + // and (b) rewrite it without changing the object field name. + it("rewrites object shorthand keys when remapping variables", () => { + const text = ` = ; + = greet $(name:string) -> { name } + | greet $(other:string) twice -> { other };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["greet alice", "greet bob twice"]) { + // No "Internal error" thrown, and matches identical. 
+ expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); +}); + +// ─── Merged from grammarOptimizerTrieRisks.spec.ts ────────────────────────── +function findAllRulesParts(rules: GrammarRule[]): RulesPart[] { + const out: RulesPart[] = []; + const visit = (parts: GrammarPart[]) => { + for (const p of parts) { + if (p.type === "rules") { + out.push(p); + for (const r of p.rules) visit(r.parts); + } + } + }; + for (const r of rules) visit(r.parts); + return out; +} + +describe("Grammar Optimizer - Trie risks", () => { + // ── Risk: cross-scope reference forces bailout, but factoring above + // the bailed fork still applies. + it("bailout at one fork still allows factoring above", () => { + // binds `trackName`; both alternatives reference + // it in their value expression — factoring the RulesPart + // would put the binding into outer scope, which the matcher + // can't see. The deep fork bails, but `play` should still get + // factored at the outer level. + const text = ` = ; + = $(trackName:string) -> trackName | the $(trackName:string) -> trackName; + = play $(trackName:) by $(artist:string) -> { kind: "by", trackName, artist } + | play $(trackName:) from album $(albumName:string) -> { kind: "from", trackName, albumName };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play hello by alice", + "play the world from album unity", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: shared RulesPart array identity must be preserved (the + // serializer dedupes by Map). + it("preserves shared RulesPart array identity", () => { + // Two top-level alternatives both reference . After + // factoring, every emitted RulesPart that points at + // should share the same `rules` array object. 
+ const text = ` = ; + = a -> 1 | b -> 2; + = play $(x:) -> { kind: "play", x } + | stop $(x:) -> { kind: "stop", x };`; + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + const innerRulesArrays = new Set(); + for (const rp of findAllRulesParts(optimized.rules)) { + // Heuristic: any RulesPart with two child rules whose first + // parts are both phraseSet/string and whose values are 1 / 2 + // is the body. + if ( + rp.rules.length === 2 && + rp.rules[0].value !== undefined && + rp.rules[1].value !== undefined && + JSON.stringify(rp.rules[0].value) === + '{"type":"literal","value":1}' && + JSON.stringify(rp.rules[1].value) === + '{"type":"literal","value":2}' + ) { + innerRulesArrays.add(rp.rules); + } + } + // Both references to produce edges that point at the + // same `rules` array (Set size === 1). + expect(innerRulesArrays.size).toBe(1); + }); + + // ── Risk: a rule whose entire path is a strict prefix of another's + // path becomes a terminal AND a forking node at the same + // trie spot. + it("handles a rule that is a strict prefix of another (no values)", () => { + const text = ` = ; + = play + | play song;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play", "play song"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: same as above, but mixed value-presence forces bailout + // (the shorter rule has explicit value, the longer doesn't). 
+ it("handles strict-prefix overlap with mixed value-presence", () => { + const text = ` = ; + = play -> "just" + | play song;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play", "play song"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: deep multi-level factoring — three layers of shared + // prefix should all collapse. + it("factors at multiple depths (a b c x | a b c y | a b d z)", () => { + const text = ` = ; + = a b c x -> 1 + | a b c y -> 2 + | a b d z -> 3;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["a b c x", "a b c y", "a b d z"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: variable name collision across alternatives — the lead + // alternative's variable name wins; later alternatives' value + // expressions must be remapped. + it("canonicalizes variable names from differently-named bindings", () => { + const text = ` = ; + = play $(track:string) once -> { kind: "once", track } + | play $(song:string) twice -> { kind: "twice", v: song };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play hello once", "play hello twice"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: factoring intersects with object-shorthand value. After + // remapping, `{ name }` from a non-lead alternative must be + // expanded to `{ name: }` so the field key + // doesn't change. 
+ it("rewrites object shorthand keys when remapping (non-lead alt)", () => { + const text = ` = ; + = greet $(name:string) -> { name } + | greet $(other:string) twice -> { other };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["greet alice", "greet bob twice"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: order preservation across multiple groups at the same + // trie level — output ordering should match the original + // rule order semantically (same matches). + it("preserves match order across interleaved groups", () => { + // Three groups: foo*, bar*, foo* again. Trie merges the two + // foo* rules (insertion-order at root), bar stays separate. + const text = ` = ; + = foo a -> 1 + | bar -> 2 + | foo b -> 3;`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["foo a", "foo b", "bar"]) { + // Same multi-set of match results. + const baseRes = match(baseline, input).map((m) => + JSON.stringify(m), + ); + const optRes = match(optimized, input).map((m) => + JSON.stringify(m), + ); + expect(optRes.sort()).toStrictEqual(baseRes.sort()); + } + }); + + // ── Risk: nested factoring + outer factoring composing — the inner + // RulesPart returned by emit() is reused as a member at the + // outer level, so the wrapper's variable name must not + // collide with the inner wrapper's. 
+ it("avoids wrapper-variable collisions across nested factoring", () => { + const text = ` = ; + = play song red -> "sr" + | play song blue -> "sb" + | play album green -> "ag" + | play album yellow -> "ay";`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of [ + "play song red", + "play song blue", + "play album green", + "play album yellow", + ]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: outer-name shadow. With first-inserter-wins canonical + // naming, a non-lead alternative whose value happens to use + // a name that matches the lead's local binding would have + // the local renamed onto the lead's canonical, silently + // changing what name the value resolves against. With + // opaque canonicals (`__opt_v_`) this collision class is + // impossible by construction; the emitted variable name is + // synthetic and cannot collide with any user-named ref. + it("opaque canonicals avoid outer-name shadowing", () => { + // Both alternatives bind their wildcard but the *non-lead* one + // happens to spell its local with the same name (`x`) the lead + // would have used as canonical. Under first-inserter-wins the + // second's value `{tag: "B", v: x}` would alias the lead's `x`; + // under opaque canonicals each side keeps its own remap and the + // emitted output is unambiguous. + const text = ` = ; + = play $(x:string) once -> { tag: "A", v: x } + | play $(x:string) twice -> { tag: "B", v: x };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play hello once", "play world twice"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: bound vs. 
unbound RulesPart references at the same edge. + // Without binding-presence parity they would merge, either + // inventing a binding the unbound side never had or + // dropping a binding the bound side depends on. + it("does not merge bound and unbound references", () => { + // Two alternatives both reference ; the second binds it + // and uses the binding in its value. Parity check should keep + // them as separate trie children. + const text = ` = ; + = a -> 1 | b -> 2; + = play -> "no-bind" + | play $(v:) -> { kind: "bound", v };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play a", "play b"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + // ── Risk: under "first-inserter-wins" canonical naming, the lead's + // local becomes the canonical for the merged prefix edge. + // A NON-LEAD alternative can have a SUFFIX binding whose + // name happens to match the lead's local — and whose value + // expression references that name. Under the broken + // scheme, the non-lead's value references the suffix + // binding, but matcher resolution hits the prefix binding + // first (the suffix binding is in the wrapper's nested + // scope and the value would *not* see it correctly). + // + // Under the opaque scheme: prefix canonical is `__opt_v_0` + // (synthetic, cannot collide with user names), and the + // non-lead's suffix binding `x` stays `x` after remap (its + // local doesn't get renamed because the suffix binding is + // on a DIFFERENT trie edge from the prefix). Value `{x}` + // resolves to the suffix binding correctly. + // + // Critically, this also exercises the "lead must record + // its own remap" property: the lead's `x` local in its + // value expression must be remapped to `__opt_v_0`. + // Without that remap, the matcher fails to resolve `x`. 
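The opaque-canonical renaming that the comments above describe can be sketched in isolation. This is a toy model with hypothetical names (`freshCanonical`, `applyRemap` are not the package's API); a value expression is reduced to a flat list of variable references, and each inserter — including the lead — carries its own local → canonical remap:

```typescript
// Toy sketch of opaque canonicals: every merged prefix edge gets a
// synthetic name (`__opt_v_<n>`) that no user-written grammar variable
// can collide with, and each inserter records a local -> canonical
// remap that is later applied to its value expression.
let nextId = 0;
function freshCanonical(): string {
    return `__opt_v_${nextId++}`;
}

// A value expression is modeled here as a flat list of variable refs;
// refs without an entry in the remap are left untouched.
function applyRemap(refs: string[], remap: Map<string, string>): string[] {
    return refs.map((r) => remap.get(r) ?? r);
}

// Lead binds local `x` on the merged prefix edge; its remap must rename
// `x` onto the synthetic canonical.
const canonical = freshCanonical();
const leadRemap = new Map([["x", canonical]]);
// The non-lead's suffix binding `x` sits on a different trie edge, so
// its remap leaves `x` untouched.
const altRemap = new Map<string, string>();
```

Applying `leadRemap` rewrites the lead's `x` onto the synthetic canonical while `altRemap` leaves the non-lead's suffix `x` alone — the collision class disappears because the canonical is never a user-spellable name.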
+ it("opaque canonicals + lead remap handle prefix/suffix name reuse", () => { + const text = ` = ; + = play $(x:string) -> { kind: "lead", v: x } + | play $(a:string) then $(x:string) -> { kind: "alt", v: x };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["play hello", "play first then second"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); +}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts deleted file mode 100644 index 02e32fe04..000000000 --- a/ts/packages/actionGrammar/test/grammarOptimizerFactoringRepro.spec.ts +++ /dev/null @@ -1,113 +0,0 @@ -// Copyright (c) Microsoft Corporation. -// Licensed under the MIT License. - -/** - * Targeted reproduction tests for factoring edge cases that previously - * broke the player grammar. Keep these as regression tests. 
- */ - -import { loadGrammarRules } from "../src/grammarLoader.js"; -import { matchGrammar } from "../src/grammarMatcher.js"; - -function match(grammar: ReturnType, s: string) { - return matchGrammar(grammar, s).map((m) => m.match); -} - -describe("Grammar Optimizer - Factoring Repro", () => { - it("handles alternatives that re-use the same variable name", () => { - const text = ` = ; - = play $(trackName:string) -> { kind: "solo", trackName } - | play $(trackName:string) by $(artist:string) -> { kind: "duet", trackName, artist };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of [ - "play Hello", - "play Shake It Off by Taylor Swift", - ]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - it("handles a group that is fully consumed by the shared prefix", () => { - const text = ` = ; - = play -> "just" - | play the song -> "song" - | play the track -> "track";`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play", "play the song", "play the track"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - it("handles mixed explicit / default value alternatives", () => { - const text = ` = ; - = play the song - | play the track -> "custom";`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play the song", "play the track"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - it("handles shared literal prefix with distinct wrapped RulesParts (player-like)", () => { - const text = ` = ; - = $(trackName:string) -> 
trackName - | the $(trackName:string) -> trackName; - = play $(trackName:) by $(artist:string) -> { kind: "byArtist", trackName, artist } - | play $(trackName:) from album $(albumName:string) -> { kind: "fromAlbum", trackName, albumName };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of [ - "play hello by taylor", - "play the hello by taylor", - "play hello from album unity", - ]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // Regression for the failure surfaced by the optimizer benchmark - // against the player grammar: - // - // "Internal error: No value for variable 'trackName'. - // Values: {"name":"artist","valueId":4}" - // - // Object shorthand `{ trackName }` compiles to a property element - // with `value: null` (key = "trackName", expanded at evaluation - // time to `trackName: trackName`). Variable-renaming during - // factoring must (a) detect that the key is a variable reference - // and (b) rewrite it without changing the object field name. - it("rewrites object shorthand keys when remapping variables", () => { - const text = ` = ; - = greet $(name:string) -> { name } - | greet $(other:string) twice -> { other };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["greet alice", "greet bob twice"]) { - // No "Internal error" thrown, and matches identical. 
- expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); -}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts deleted file mode 100644 index eb0a073f2..000000000 --- a/ts/packages/actionGrammar/test/grammarOptimizerTrieRisks.spec.ts +++ /dev/null @@ -1,323 +0,0 @@ -// Copyright (c) Microsoft Corporation. -// Licensed under the MIT License. - -/** - * Targeted tests for the trie-based common-prefix factoring rewrite: - * each test exercises a specific risk category called out during the - * design (see grammarOptimizer.ts factorRulesPart docstring). - */ - -import { loadGrammarRules } from "../src/grammarLoader.js"; -import { matchGrammar } from "../src/grammarMatcher.js"; -import { GrammarPart, GrammarRule, RulesPart } from "../src/grammarTypes.js"; - -function match(grammar: ReturnType, s: string) { - return matchGrammar(grammar, s).map((m) => m.match); -} - -function findAllRulesParts(rules: GrammarRule[]): RulesPart[] { - const out: RulesPart[] = []; - const visit = (parts: GrammarPart[]) => { - for (const p of parts) { - if (p.type === "rules") { - out.push(p); - for (const r of p.rules) visit(r.parts); - } - } - }; - for (const r of rules) visit(r.parts); - return out; -} - -describe("Grammar Optimizer - Trie risks", () => { - // ── Risk: cross-scope reference forces bailout, but factoring above - // the bailed fork still applies. - it("bailout at one fork still allows factoring above", () => { - // binds `trackName`; both alternatives reference - // it in their value expression — factoring the RulesPart - // would put the binding into outer scope, which the matcher - // can't see. The deep fork bails, but `play` should still get - // factored at the outer level. 
- const text = ` = ; - = $(trackName:string) -> trackName | the $(trackName:string) -> trackName; - = play $(trackName:) by $(artist:string) -> { kind: "by", trackName, artist } - | play $(trackName:) from album $(albumName:string) -> { kind: "from", trackName, albumName };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of [ - "play hello by alice", - "play the world from album unity", - ]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: shared RulesPart array identity must be preserved (the - // serializer dedupes by Map). - it("preserves shared RulesPart array identity", () => { - // Two top-level alternatives both reference . After - // factoring, every emitted RulesPart that points at - // should share the same `rules` array object. - const text = ` = ; - = a -> 1 | b -> 2; - = play $(x:) -> { kind: "play", x } - | stop $(x:) -> { kind: "stop", x };`; - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - const innerRulesArrays = new Set(); - for (const rp of findAllRulesParts(optimized.rules)) { - // Heuristic: any RulesPart with two child rules whose first - // parts are both phraseSet/string and whose values are 1 / 2 - // is the body. - if ( - rp.rules.length === 2 && - rp.rules[0].value !== undefined && - rp.rules[1].value !== undefined && - JSON.stringify(rp.rules[0].value) === - '{"type":"literal","value":1}' && - JSON.stringify(rp.rules[1].value) === - '{"type":"literal","value":2}' - ) { - innerRulesArrays.add(rp.rules); - } - } - // Both references to produce edges that point at the - // same `rules` array (Set size === 1). 
- expect(innerRulesArrays.size).toBe(1); - }); - - // ── Risk: a rule whose entire path is a strict prefix of another's - // path becomes a terminal AND a forking node at the same - // trie spot. - it("handles a rule that is a strict prefix of another (no values)", () => { - const text = ` = ; - = play - | play song;`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play", "play song"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: same as above, but mixed value-presence forces bailout - // (the shorter rule has explicit value, the longer doesn't). - it("handles strict-prefix overlap with mixed value-presence", () => { - const text = ` = ; - = play -> "just" - | play song;`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play", "play song"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: deep multi-level factoring — three layers of shared - // prefix should all collapse. - it("factors at multiple depths (a b c x | a b c y | a b d z)", () => { - const text = ` = ; - = a b c x -> 1 - | a b c y -> 2 - | a b d z -> 3;`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["a b c x", "a b c y", "a b d z"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: variable name collision across alternatives — the lead - // alternative's variable name wins; later alternatives' value - // expressions must be remapped. 
- it("canonicalizes variable names from differently-named bindings", () => { - const text = ` = ; - = play $(track:string) once -> { kind: "once", track } - | play $(song:string) twice -> { kind: "twice", v: song };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play hello once", "play hello twice"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: factoring intersects with object-shorthand value. After - // remapping, `{ name }` from a non-lead alternative must be - // expanded to `{ name: }` so the field key - // doesn't change. - it("rewrites object shorthand keys when remapping (non-lead alt)", () => { - const text = ` = ; - = greet $(name:string) -> { name } - | greet $(other:string) twice -> { other };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["greet alice", "greet bob twice"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: order preservation across multiple groups at the same - // trie level — output ordering should match the original - // rule order semantically (same matches). - it("preserves match order across interleaved groups", () => { - // Three groups: foo*, bar*, foo* again. Trie merges the two - // foo* rules (insertion-order at root), bar stays separate. - const text = ` = ; - = foo a -> 1 - | bar -> 2 - | foo b -> 3;`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["foo a", "foo b", "bar"]) { - // Same multi-set of match results. 
- const baseRes = match(baseline, input).map((m) => - JSON.stringify(m), - ); - const optRes = match(optimized, input).map((m) => - JSON.stringify(m), - ); - expect(optRes.sort()).toStrictEqual(baseRes.sort()); - } - }); - - // ── Risk: nested factoring + outer factoring composing — the inner - // RulesPart returned by emit() is reused as a member at the - // outer level, so the wrapper's variable name must not - // collide with the inner wrapper's. - it("avoids wrapper-variable collisions across nested factoring", () => { - const text = ` = ; - = play song red -> "sr" - | play song blue -> "sb" - | play album green -> "ag" - | play album yellow -> "ay";`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of [ - "play song red", - "play song blue", - "play album green", - "play album yellow", - ]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: outer-name shadow. With first-inserter-wins canonical - // naming, a non-lead alternative whose value happens to use - // a name that matches the lead's local binding would have - // the local renamed onto the lead's canonical, silently - // changing what name the value resolves against. With - // opaque canonicals (`__opt_v_`) this collision class is - // impossible by construction; the emitted variable name is - // synthetic and cannot collide with any user-named ref. - it("opaque canonicals avoid outer-name shadowing", () => { - // Both alternatives bind their wildcard but the *non-lead* one - // happens to spell its local with the same name (`x`) the lead - // would have used as canonical. Under first-inserter-wins the - // second's value `{tag: "B", v: x}` would alias the lead's `x`; - // under opaque canonicals each side keeps its own remap and the - // emitted output is unambiguous. 
- const text = ` = ; - = play $(x:string) once -> { tag: "A", v: x } - | play $(x:string) twice -> { tag: "B", v: x };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play hello once", "play world twice"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: bound vs. unbound RulesPart references at the same edge. - // Without binding-presence parity they would merge, either - // inventing a binding the unbound side never had or - // dropping a binding the bound side depends on. - it("does not merge bound and unbound references", () => { - // Two alternatives both reference ; the second binds it - // and uses the binding in its value. Parity check should keep - // them as separate trie children. - const text = ` = ; - = a -> 1 | b -> 2; - = play -> "no-bind" - | play $(v:) -> { kind: "bound", v };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play a", "play b"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); - - // ── Risk: under "first-inserter-wins" canonical naming, the lead's - // local becomes the canonical for the merged prefix edge. - // A NON-LEAD alternative can have a SUFFIX binding whose - // name happens to match the lead's local — and whose value - // expression references that name. Under the broken - // scheme, the non-lead's value references the suffix - // binding, but matcher resolution hits the prefix binding - // first (the suffix binding is in the wrapper's nested - // scope and the value would *not* see it correctly). 
- // - // Under the opaque scheme: prefix canonical is `__opt_v_0` - // (synthetic, cannot collide with user names), and the - // non-lead's suffix binding `x` stays `x` after remap (its - // local doesn't get renamed because the suffix binding is - // on a DIFFERENT trie edge from the prefix). Value `{x}` - // resolves to the suffix binding correctly. - // - // Critically, this also exercises the "lead must record - // its own remap" property: the lead's `x` local in its - // value expression must be remapped to `__opt_v_0`. - // Without that remap, the matcher fails to resolve `x`. - it("opaque canonicals + lead remap handle prefix/suffix name reuse", () => { - const text = ` = ; - = play $(x:string) -> { kind: "lead", v: x } - | play $(a:string) then $(x:string) -> { kind: "alt", v: x };`; - const baseline = loadGrammarRules("t.grammar", text); - const optimized = loadGrammarRules("t.grammar", text, { - optimizations: { factorCommonPrefixes: true }, - }); - for (const input of ["play hello", "play first then second"]) { - expect(match(optimized, input)).toStrictEqual( - match(baseline, input), - ); - } - }); -}); From 04bd17c46066d4055f38c1a74cc7f1518d5872b8 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 22:25:19 -0700 Subject: [PATCH 12/16] actionGrammar: extend optimizer test coverage Adds tests for previously uncovered paths in grammarOptimizer.ts: - grammarOptimizerValueExpressions.spec.ts: exercises every CompiledValueNode arm in substituteValueVariables (inliner Substitute branch) and collectVariableReferences (factorer cross-scope-ref check) -- array, binary/unary/conditional/member/call expressions, spread element, template literal, object spread, object shorthand-with-sub. - grammarOptimizerFactoring.spec.ts: number-edge factoring (with and without optional flag) and wrapper-rule spacingMode propagation (positive + negative). 
- grammarOptimizerIntegration.spec.ts: optimizeGrammar early-return paths, inliner refusal when child has multiple variable-bearing parts, compiler skipping optimization on grammars with errors, and observable effect of the post-factor inline pass. grammarOptimizer.ts coverage: 86.18% -> 96.73% lines, 75.45% -> 90% branches, 98% -> 100% functions. Tests: 119 -> 142, all passing. --- .../test/grammarOptimizerFactoring.spec.ts | 80 +++++++ .../test/grammarOptimizerIntegration.spec.ts | 177 ++++++++++++++ .../grammarOptimizerValueExpressions.spec.ts | 222 ++++++++++++++++++ 3 files changed, 479 insertions(+) create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerIntegration.spec.ts create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerValueExpressions.spec.ts diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts index e2b2e7a0e..67f9657e9 100644 --- a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts +++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts @@ -680,3 +680,83 @@ describe("Grammar Optimizer - Trie risks", () => { } }); }); + +describe("Grammar Optimizer - Trie edge variants (number, phraseSet, optional)", () => { + // ── Number-edge factoring: `stepMergeKey` keys number edges by + // optional flag only; both alternatives share the same + // number-with-no-optional edge and should merge. + it("factors a shared number wildcard prefix across alternatives", () => { + const text = ` = ; + = volume $(n:number) up -> { dir: "up", n } + | volume $(n:number) down -> { dir: "down", n };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + const optChoice = findFirstRulesPart(optimized.rules); + const baseChoice = findFirstRulesPart(baseline.rules); + // Factoring collapses 2 alternatives into 1 wrapper. 
+ expect(optChoice!.rules.length).toBeLessThan(baseChoice!.rules.length); + for (const input of ["volume 5 up", "volume 7 down"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("factors with optional-number edges merging only with matching flag", () => { + // Two alternatives that share `set $(n:number)?` (optional + // number). The optional flag on the number edge is part of + // the merge key; both sides agree, so factoring fires. + const text = ` = ; + = set $(n:number)? on -> { state: "on", n } + | set $(n:number)? off -> { state: "off", n };`; + const baseline = loadGrammarRules("t.grammar", text); + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of ["set on", "set 5 on", "set off", "set 7 off"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); +}); + +describe("Grammar Optimizer - Wrapper rule spacingMode propagation", () => { + // When all factored members share a non-default `spacingMode`, the + // synthesized wrapper rule inherits it. Top-level has + // multiple definitions each annotated `[spacing=required]`; the + // top-level factorer factors across them and the resulting + // wrapper rule must carry `spacingMode: "required"`. + it("propagates shared explicit spacingMode onto the wrapper rule", () => { + const text = ` [spacing=required] = play hello -> 1; + [spacing=required] = play world -> 2;`; + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + // Top-level reduces to a single shared-prefix wrapper. + expect(optimized.rules.length).toBe(1); + expect(optimized.rules[0].spacingMode).toBe("required"); + + // And matching still respects the required-spacing semantics. 
+ const baseline = loadGrammarRules("t.grammar", text); + for (const input of ["play hello", "play world", "playhello"]) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + }); + + it("does not set wrapper spacingMode when members disagree", () => { + // Members with differing spacingMode → wrapper stays default + // (auto / undefined). + const text = ` [spacing=required] = play hello -> 1; + [spacing=optional] = play world -> 2;`; + const optimized = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + expect(optimized.rules.length).toBe(1); + expect(optimized.rules[0].spacingMode).toBeUndefined(); + }); +}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerIntegration.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerIntegration.spec.ts new file mode 100644 index 000000000..e5f759d7c --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerIntegration.spec.ts @@ -0,0 +1,177 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Integration / API-surface coverage for `optimizeGrammar`, + * `inlineSingleAlternativeRules`, and `factorCommonPrefixes` — + * exercising the public function entry points and the compiler-level + * orchestration. + * + * Specifically: + * - `optimizeGrammar(grammar, undefined)` early-return path. + * - Compiler skips optimization when the grammar has parse/compile + * errors (the AST may be partial and optimizer invariants would + * not hold). + * - The defensive guard in the inliner that refuses to retarget a + * parent's `part.variable` onto a child rule with multiple + * variable-bearing parts (unreachable from any well-formed source + * grammar — exercised by direct AST construction). + * - The two-pass inline + factor pipeline observably collapses + * wrappers exposed by factoring (re-runs the inliner after + * factoring). 
+ */ + +import { + loadGrammarRules, + loadGrammarRulesNoThrow, +} from "../src/grammarLoader.js"; +import { matchGrammar } from "../src/grammarMatcher.js"; +import { + inlineSingleAlternativeRules, + optimizeGrammar, +} from "../src/grammarOptimizer.js"; +import { Grammar, GrammarPart, GrammarRule } from "../src/grammarTypes.js"; + +function countRulesParts(rules: GrammarRule[]): number { + let n = 0; + const seen = new Set(); + const visit = (parts: GrammarPart[]) => { + for (const p of parts) { + if (p.type === "rules") { + n++; + if (seen.has(p.rules)) continue; + seen.add(p.rules); + for (const r of p.rules) visit(r.parts); + } + } + }; + for (const r of rules) visit(r.parts); + return n; +} + +describe("Grammar Optimizer - Public API entry points", () => { + it("optimizeGrammar returns the input grammar unchanged when options is undefined", () => { + const grammar: Grammar = { + rules: [{ parts: [{ type: "string", value: ["hello"] }] }], + }; + const result = optimizeGrammar(grammar, undefined); + // Same object identity — early return, no copy. + expect(result).toBe(grammar); + }); + + it("optimizeGrammar returns the input grammar unchanged when no flags are set", () => { + const grammar: Grammar = { + rules: [{ parts: [{ type: "string", value: ["hello"] }] }], + }; + // Both flags off → both passes skipped → returns same identity. + const result = optimizeGrammar(grammar, {}); + expect(result).toBe(grammar); + }); + + it("inlineSingleAlternativeRules refuses to retarget parent.variable onto child with multiple variable-bearing parts", () => { + // Direct AST construction: this shape is rejected by the + // grammar compiler (a no-value rule with two wildcards + // violates the matcher's default-value contract), so it is + // unreachable from real source. 
But the inliner still + // defends against it — when parent captures child via a + // variable and child has more than one variable-bearing part, + // the inliner can't pick which child binding should receive + // the parent's variable name and must leave the wrapper + // nested. + const childRules: GrammarRule[] = [ + { + parts: [ + { type: "wildcard", typeName: "string", variable: "a" }, + { type: "wildcard", typeName: "string", variable: "b" }, + ], + // No value expression. + }, + ]; + const parentRules: GrammarRule[] = [ + { + parts: [ + { + type: "rules", + rules: childRules, + variable: "captured", + }, + ], + }, + ]; + const optimized = inlineSingleAlternativeRules(parentRules); + // No inlining took place — the RulesPart is preserved and the + // result has the same identity (no rewrite). + expect(optimized).toBe(parentRules); + // And the inner shape is unchanged. + const part = optimized[0].parts[0]; + expect(part.type).toBe("rules"); + if (part.type === "rules") { + expect(part.rules).toBe(childRules); + } + }); +}); + +describe("Grammar Optimizer - Compiler integration", () => { + it("compiler skips optimization when the grammar has errors", () => { + // Malformed grammar: is referenced but never + // defined. parseAndCompileGrammar reports an error; the + // compiler must NOT call optimizeGrammar (which could choke on + // the partial AST) and loadGrammarRulesNoThrow returns + // undefined. + const errors: string[] = []; + const result = loadGrammarRulesNoThrow( + "t.grammar", + ` = play ;`, + errors, + undefined, + { + optimizations: { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + }, + ); + // Returned undefined and reported at least one error — no + // exception was thrown by the optimizer running on a partial + // grammar. 
+ expect(result).toBeUndefined(); + expect(errors.length).toBeGreaterThan(0); + }); +}); + +describe("Grammar Optimizer - Two-pass inline+factor pipeline", () => { + it("inline + factor produces no more RulesParts than factor alone", () => { + // Grammar where factoring exposes single-alternative wrappers + // that the post-factor inline pass can collapse. At minimum, + // the inline+factor combo must not be larger than factor-only. + const text = ` = play -> 1 | sing -> 2; + = ; + = hello;`; + const factorOnly = loadGrammarRules("t.grammar", text, { + optimizations: { factorCommonPrefixes: true }, + }); + const both = loadGrammarRules("t.grammar", text, { + optimizations: { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + }); + + // Combo result must be at least as small (in RulesPart count) + // as factor-only — the inline pass at the end of the pipeline + // is observably non-destructive. + expect(countRulesParts(both.rules)).toBeLessThanOrEqual( + countRulesParts(factorOnly.rules), + ); + + // And matches still agree with baseline. + const baseline = loadGrammarRules("t.grammar", text); + for (const input of ["hello", "play hello", "sing hello"]) { + const baseMatches = matchGrammar(baseline, input).map( + (m) => m.match, + ); + const bothMatches = matchGrammar(both, input).map((m) => m.match); + expect(bothMatches).toStrictEqual(baseMatches); + } + }); +}); diff --git a/ts/packages/actionGrammar/test/grammarOptimizerValueExpressions.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerValueExpressions.spec.ts new file mode 100644 index 000000000..743c6fecb --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerValueExpressions.spec.ts @@ -0,0 +1,222 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Coverage for the value-expression rewrite paths in + * `substituteValueVariables` (inliner Substitute branch) and + * `collectVariableReferences` (factorer cross-scope-ref check). 
+ * + * Both walks recurse over every `CompiledValueNode` kind; the existing + * optimizer specs only exercised `literal`, `variable`, and `object`. + * The remaining node types (`array`, `binaryExpression`, + * `unaryExpression`, `conditionalExpression`, `memberExpression`, + * `callExpression`, `spreadElement`, `templateLiteral`, plus the object + * `spread` element and the shorthand-with-substitution branch) are + * exercised here. + */ + +import { loadGrammarRules } from "../src/grammarLoader.js"; +import { matchGrammar } from "../src/grammarMatcher.js"; + +function load(text: string, withOpt: boolean) { + if (withOpt) { + return loadGrammarRules("t.grammar", text, { + enableValueExpressions: true, + optimizations: { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, + }, + }); + } + return loadGrammarRules("t.grammar", text, { + enableValueExpressions: true, + }); +} + +function match(grammar: ReturnType, request: string) { + return matchGrammar(grammar, request).map((m) => m.match); +} + +function expectAgrees(text: string, inputs: string[]) { + const baseline = load(text, false); + const optimized = load(text, true); + for (const input of inputs) { + expect(match(optimized, input)).toStrictEqual(match(baseline, input)); + } +} + +describe("Grammar Optimizer - Value expression rewrites (Substitute branch)", () => { + // Each test sets up an inliner Substitute scenario: + // - is a single-alternative wrapper with its own value + // expression — the inliner inlines its parts and substitutes + // 's value for the parent's capture variable. + // - The parent has its own value expression that references the + // capture variable inside the node type under test. + // After substitution, the matcher evaluates the rewritten parent + // value and the result must match the unoptimized baseline. 
+ + it("substitutes through array node", () => { + const text = ` = play $(t:) here -> [t, "tail"]; + = $(name:string) -> name;`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello here")).toStrictEqual([ + ["hello", "tail"], + ]); + }); + + it("substitutes through binaryExpression node", () => { + const text = ` = play $(t:) here -> t + "_suffix"; + = $(name:string) -> name;`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello here")).toStrictEqual([ + "hello_suffix", + ]); + }); + + it("substitutes through unaryExpression node", () => { + const text = ` = play $(t:) here -> typeof t; + = $(name:string) -> name;`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello here")).toStrictEqual([ + "string", + ]); + }); + + it("substitutes through conditionalExpression node", () => { + const text = ` = play $(t:) here -> t === "hi" ? "yes" : "no"; + = $(name:string) -> name;`; + expectAgrees(text, ["play hi here", "play bye here"]); + expect(match(load(text, true), "play hi here")).toStrictEqual(["yes"]); + expect(match(load(text, true), "play bye here")).toStrictEqual(["no"]); + }); + + it("substitutes through memberExpression node (computed property)", () => { + const text = ` = play $(t:) here -> t.length; + = $(name:string) -> name;`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello here")).toStrictEqual([5]); + }); + + it("substitutes through memberExpression node (computed index)", () => { + const text = ` = play $(t:) here -> t[0]; + = $(name:string) -> name;`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello here")).toStrictEqual(["h"]); + }); + + it("substitutes through callExpression node", () => { + const text = ` = play $(t:) here -> t.toUpperCase(); + = $(name:string) -> name;`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello 
here")).toStrictEqual([ + "HELLO", + ]); + }); + + it("substitutes through spreadElement (array spread)", () => { + const text = ` = play $(t:) here -> [...t.split(" "), "tail"]; + = $(name:string) -> name;`; + expectAgrees(text, ["play one two here"]); + expect(match(load(text, true), "play one two here")).toStrictEqual([ + ["one", "two", "tail"], + ]); + }); + + it("substitutes through templateLiteral node", () => { + const text = ` = play $(t:) here -> \`hello \${t}!\`; + = $(name:string) -> name;`; + expectAgrees(text, ["play world here"]); + expect(match(load(text, true), "play world here")).toStrictEqual([ + "hello world!", + ]); + }); + + it("substitutes through object spread element", () => { + // produces an object; parent spreads it. + const text = ` = play $(t:) here -> { ...t, extra: 1 }; + = $(name:string) -> { name };`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello here")).toStrictEqual([ + { name: "hello", extra: 1 }, + ]); + }); + + it("substitutes through object shorthand key (expands to {key: replacement})", () => { + // Parent value uses shorthand `{ t }` — when `t` is the + // captured variable, substituteValueVariables expands it to + // `{ t: }` so the field name stays `t`. + const text = ` = play $(t:) here -> { t }; + = $(name:string) -> { who: name };`; + expectAgrees(text, ["play hello here"]); + expect(match(load(text, true), "play hello here")).toStrictEqual([ + { t: { who: "hello" } }, + ]); + }); +}); + +describe("Grammar Optimizer - Value expression walks (cross-scope-ref check)", () => { + // Force the factorer's `cross-scope-ref` eligibility check to fire + // by giving members a value expression that references the + // canonical variable bound by the wrapper's prefix. The check + // walks each member's value via `collectVariableReferences`. + // + // The check returns "cross-scope-ref" when ANY member references a + // prefix-bound canonical, forcing a bailout. 
Even when no member + // actually references a prefix canonical (the common case), the + // walk still visits every node — exercising the recursion arms. + + function expectFactoringSafe(text: string, inputs: string[]) { + const baseline = loadGrammarRules("t.grammar", text, { + enableValueExpressions: true, + }); + const optimized = loadGrammarRules("t.grammar", text, { + enableValueExpressions: true, + optimizations: { factorCommonPrefixes: true }, + }); + for (const input of inputs) { + expect(match(optimized, input)).toStrictEqual( + match(baseline, input), + ); + } + } + + it("walks array, binary, ternary, member, call, spread, template node arms", () => { + // Two alternatives share `play $(x:string) ` as a prefix — the + // wrapper binds `x` (canonicalized). Each member's value + // exercises a different node kind referencing its own local + // `y`, not `x`, so the cross-scope check finds no collision + // (no bailout) but `collectVariableReferences` walks every + // node arm. + const text = ` = ; + = play $(x:string) once $(y:string) -> [y, y + "!", y === "hi" ? 1 : 2, y.length, y.toUpperCase(), ...y.split(""), \`<\${y}>\`] + | play $(x:string) twice $(y:number) -> [-y, !y ? "z" : "nz", y > 0];`; + expectFactoringSafe(text, [ + "play a once hi", + "play a twice 5", + "play a twice 0", + ]); + }); + + it("walks object spread and shorthand value arms", () => { + // Two alternatives share `play $(x:string) ` prefix; each + // member's value contains object spread + shorthand + // referencing local `y` — exercises the object/spread arm of + // collectVariableReferences. 
+ const text = ` = $(s:string) -> { s }; + = ; + = play $(x:string) once $(y:) -> { ...y, kind: "once" } + | play $(x:string) twice $(y:) -> { y, kind: "twice" };`; + expectFactoringSafe(text, ["play a once hi", "play a twice bye"]); + }); + + it("triggers cross-scope-ref bailout when member value references prefix binding", () => { + // Both members' value expressions reference `x` which is the + // shared-prefix wildcard. Since `x` becomes a wrapper-scope + // canonical that's invisible to the wrapped members at runtime, + // the factorer must bail out to keep the binding in scope. + const text = ` = ; + = play $(x:string) once -> { kind: "once", v: x } + | play $(x:string) twice -> { kind: "twice", v: x };`; + expectFactoringSafe(text, ["play hello once", "play hello twice"]); + }); +}); From 037f6390cb6afc39e1b81ad031e94f6fddb2060c Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 23:00:03 -0700 Subject: [PATCH 13/16] actionGrammar: optimizer review fixes + fuzz spec - Fix correctness bug in factorCommonPrefixes: bail out at any fork whose members all lack a value expression. The matcher's implicit text-default only fires for single-StringPart rules, so a wrapper rule [prefix..., suffixRulesPart] with no value would throw 'missing value for default' at finalize. Caught by the new fuzzer. - Unify factor-pass naming: wrapper bindings now use freshWrapperBinding threaded through BuildState; reserved-set scan removed. - Defer-allocate in factorParts to match factorRulesArray style. - Cleaner onlyChild (no non-null chain). - Drop duplicate JSDoc on BuildState. - Add recommendedOptimizations preset (and re-export from index). - Add grammarOptimizerFuzz.spec.ts: deterministic random grammars + inputs, asserts matchGrammar agrees with and without optimizations. 
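The reproducibility claim for the fuzz spec rests entirely on the PRNG being a pure function of its seed. A standalone sketch of that contract, using the same mulberry32 helper shape the spec adds (the `0xc0ffee` seed is the spec's `SEED` constant; nothing here is new API):

```typescript
// Mulberry32 — same small deterministic PRNG the fuzz spec uses.
// Identical seed => identical sequence, so any failing grammar/input
// pair can be regenerated exactly from the logged seed.
function makeRng(seed: number): () => number {
    let s = seed >>> 0;
    return () => {
        s = (s + 0x6d2b79f5) >>> 0;
        let t = s;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

const a = makeRng(0xc0ffee);
const b = makeRng(0xc0ffee);
const seqA = [a(), a(), a()];
const seqB = [b(), b(), b()];
// Two generators with the same seed produce the same values in [0, 1).
console.log(seqA.every((v, i) => v === seqB[i])); // prints true
```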
--- .../actionGrammar/src/grammarOptimizer.ts | 117 +++++++---- ts/packages/actionGrammar/src/index.ts | 1 + .../test/grammarOptimizerFuzz.spec.ts | 186 ++++++++++++++++++ 3 files changed, 268 insertions(+), 36 deletions(-) create mode 100644 ts/packages/actionGrammar/test/grammarOptimizerFuzz.spec.ts diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts index 97d615453..92808eef4 100644 --- a/ts/packages/actionGrammar/src/grammarOptimizer.ts +++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts @@ -35,6 +35,21 @@ export type GrammarOptimizationOptions = { factorCommonPrefixes?: boolean; }; +/** + * Recommended preset enabling all optimizations. Use this when callers + * want every safe pass on without naming each flag individually — future + * passes added here will be picked up automatically. + * + * Caveat: enabling `factorCommonPrefixes` destroys the 1:1 + * correspondence between top-level rule indices and the original + * source. Callers that need that mapping for diagnostics must capture + * it before optimization runs. + */ +export const recommendedOptimizations: GrammarOptimizationOptions = { + inlineSingleAlternatives: true, + factorCommonPrefixes: true, +}; + /** * Run enabled optimization passes against the compiled grammar AST. * The returned grammar is semantically equivalent to the input — only the @@ -531,11 +546,13 @@ function factorParts( counter: { factored: number }, memo: RulesArrayMemo, ): { parts: GrammarPart[]; changed: boolean } { - let changed = false; - const out: GrammarPart[] = []; - for (const p of parts) { + // Single-pass: only allocate `out` once an element actually changes + // (mirrors `factorRulesArray` / `inlineRulesArray`).
+ let out: GrammarPart[] | undefined; + for (let i = 0; i < parts.length; i++) { + const p = parts[i]; if (p.type !== "rules") { - out.push(p); + if (out !== undefined) out.push(p); continue; } // Recurse into nested rules first, preserving shared-array @@ -545,10 +562,16 @@ function factorParts( recursedRules !== p.rules ? { ...p, rules: recursedRules } : p; const working = factorRulesPart(recursed, counter); - if (working !== p) changed = true; - out.push(working); + if (out !== undefined) { + out.push(working); + } else if (working !== p) { + out = parts.slice(0, i); + out.push(working); + } } - return { parts: changed ? out : parts, changed }; + return out !== undefined + ? { parts: out, changed: true } + : { parts, changed: false }; } /** @@ -623,7 +646,10 @@ function factorRules( const state: EmitState = { didFactor: false }; const items: { idx: number; rules: GrammarRule[] }[] = []; for (const c of root.children.values()) { - items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); + items.push({ + idx: c.firstIdx, + rules: emitFromNode(c, state, buildState), + }); } items.sort((a, b) => a.idx - b.idx); const newRules: GrammarRule[] = items.flatMap((it) => it.rules); @@ -709,9 +735,10 @@ type Terminal = { /** * Per-`factorRulesPart`-invocation counter used to mint opaque canonical - * variable names (`__opt_v_`) on variable-bearing trie edges, plus - * an interner for `GrammarRule[]` array identities (used to build a - * primitive-keyed children Map without losing array-identity merging + * variable names (`__opt_v_` and `__opt_factor_`) on + * variable-bearing trie edges and on synthesized wrapper bindings, + * plus an interner for `GrammarRule[]` array identities (used to build + * a primitive-keyed children Map without losing array-identity merging * for `` references). 
* * Scope is one `RulesPart` because canonicals never escape the wrapper @@ -720,13 +747,6 @@ type Terminal = { * `RenameState` (which scopes per-parent-rule and produces * `__opt_inline_` names for the inliner pass). */ -/** - * Per-`factorRulesPart`-invocation counter used to mint opaque canonical - * variable names (`__opt_v_`) on variable-bearing trie edges, plus - * an interner for `GrammarRule[]` array identities (used to build a - * primitive-keyed children Map without losing array-identity merging - * for `` references). - */ type BuildState = { nextCanonicalId: number; rulesArrayIds: WeakMap; @@ -737,6 +757,16 @@ function freshCanonical(state: BuildState): string { return `__opt_v_${state.nextCanonicalId++}`; } +/** + * Mint a fresh wrapper-binding name for `buildWrapperRule`. Uses the + * same counter as `freshCanonical` so the names are guaranteed unique + * across the whole `factorRules` invocation; the distinct prefix makes + * synthesized wrapper bindings easy to spot in serialized grammars. + */ +function freshWrapperBinding(state: BuildState): string { + return `__opt_factor_${state.nextCanonicalId++}`; +} + function rulesArrayId(state: BuildState, rules: GrammarRule[]): number { let id = state.rulesArrayIds.get(rules); if (id === undefined) { @@ -805,9 +835,10 @@ function isLinearNode(n: TrieNode): boolean { /** Return the sole child of a linear node (caller must guarantee linearity). */ function onlyChild(n: TrieNode): TrieNode { - // Map iteration order is insertion order; for size===1 there is - // exactly one entry to read. - return n.children.values().next().value!; + // Caller guarantees `n.children.size === 1` via `isLinearNode`. + // Map iteration is insertion order; for size===1 there's one entry. 
+ const first = n.children.values().next().value; + return first as TrieNode; } // ── Trie insertion ─────────────────────────────────────────────────────── @@ -1038,7 +1069,11 @@ function terminalToRule(t: Terminal): GrammarRule { * each would-be member is emitted as a full rule with the canonical * prefix prepended. */ -function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { +function emitFromNode( + node: TrieNode, + state: EmitState, + buildState: BuildState, +): GrammarRule[] { // Path-compress: walk down single-child / no-terminal chain, but // stop *before* entering a node that would itself be a fork — that // way the fork's edge becomes the first part of each emitted member @@ -1062,7 +1097,10 @@ function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { items.push({ idx: t.idx, rules: [terminalToRule(t)] }); } for (const c of current.children.values()) { - items.push({ idx: c.firstIdx, rules: emitFromNode(c, state) }); + items.push({ + idx: c.firstIdx, + rules: emitFromNode(c, state, buildState), + }); } items.sort((a, b) => a.idx - b.idx); const members: GrammarRule[] = items.flatMap((it) => it.rules); @@ -1084,7 +1122,7 @@ function emitFromNode(node: TrieNode, state: EmitState): GrammarRule[] { })); } state.didFactor = true; - return [buildWrapperRule(prefix, members)]; + return [buildWrapperRule(prefix, members, buildState)]; } /** @@ -1110,8 +1148,19 @@ function checkFactoringEligible( if (!allHaveValue && !noneHaveValue) { return "mixed-value-presence"; } - if (noneHaveValue && members.some((m) => m.parts.length > 1)) { - return "implicit-default-multipart"; + if (noneHaveValue) { + // The matcher synthesizes an implicit text-concatenation + // default value only for single-part rules whose sole part + // is a StringPart (`matchStringPartWithoutWildcard` fast + // path). 
After factoring, the wrapper rule becomes + // `[prefix..., suffixRulesPart]` with parts.length >= 2 and + // no value expression — the implicit default no longer + // fires and `createValue` throws "missing value for default" + // at finalize time. Without a wrapper variable to + // synthesize a value into, factoring at this fork breaks + // matcher behavior whenever the parent rule relied on the + // implicit default. Bail out unconditionally. + return "no-value-implicit-default"; } // Cross-scope-ref: nested rule scope is fresh at the matcher level // (entering a `RulesPart` resets `valueIds`). If a member's value @@ -1140,22 +1189,18 @@ function checkFactoringEligible( function buildWrapperRule( prefix: GrammarPart[], members: GrammarRule[], + buildState: BuildState, ): GrammarRule { const suffixRulesPart: RulesPart = { type: "rules", rules: members }; const factoredAlt: GrammarRule = { parts: [...prefix, suffixRulesPart], }; if (members.some((m) => m.value !== undefined)) { - const reserved = new Set(collectVariableNames(prefix)); - for (const m of members) { - for (const v of collectVariableNames(m.parts)) reserved.add(v); - } - let gen = "__opt_factor"; - let i = 0; - while (reserved.has(gen)) { - i++; - gen = `__opt_factor_${i}`; - } + // Opaque counter-based name shares `BuildState.nextCanonicalId` + // with `freshCanonical`, so it can never collide with any + // canonical edge binding in this `factorRules` invocation — no + // reserved-set scan needed. 
+ const gen = freshWrapperBinding(buildState); suffixRulesPart.variable = gen; factoredAlt.value = { type: "variable", name: gen }; } diff --git a/ts/packages/actionGrammar/src/index.ts b/ts/packages/actionGrammar/src/index.ts index 056e38ab0..0418afdf3 100644 --- a/ts/packages/actionGrammar/src/index.ts +++ b/ts/packages/actionGrammar/src/index.ts @@ -11,6 +11,7 @@ export { grammarToJson } from "./grammarSerializer.js"; export { loadGrammarRules, loadGrammarRulesNoThrow } from "./grammarLoader.js"; export type { LoadGrammarRulesOptions } from "./grammarLoader.js"; export type { GrammarOptimizationOptions } from "./grammarOptimizer.js"; +export { recommendedOptimizations } from "./grammarOptimizer.js"; export type { SchemaLoader } from "./grammarCompiler.js"; // Parser (for tooling — formatter, linters, etc.) diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFuzz.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFuzz.spec.ts new file mode 100644 index 000000000..bef15818d --- /dev/null +++ b/ts/packages/actionGrammar/test/grammarOptimizerFuzz.spec.ts @@ -0,0 +1,186 @@ +// Copyright (c) Microsoft Corporation. +// Licensed under the MIT License. + +/** + * Property-based equivalence fuzzer for the grammar optimizer. + * + * Generates random structurally-valid grammars + matching/non-matching + * inputs and asserts that `matchGrammar` returns the same multi-set of + * matches whether or not the optimizer is enabled. + * + * Targets the kind of α-rename / scope / value-substitution bugs that + * fixture-based tests are most likely to miss. Seed is fixed so + * failures are reproducible; bump `SEED` to widen exploration. 
+ */
+
+import { loadGrammarRules } from "../src/grammarLoader.js";
+import { matchGrammar } from "../src/grammarMatcher.js";
+import { recommendedOptimizations } from "../src/grammarOptimizer.js";
+
+const SEED = 0xc0ffee;
+const GRAMMAR_COUNT = 40;
+const INPUTS_PER_GRAMMAR = 6;
+const WORDS = ["a", "b", "c", "d", "e"];
+const MAX_RULES = 4;
+const MAX_ALTS = 4;
+const MAX_PARTS = 4;
+
+// Mulberry32 — small, deterministic PRNG.
+function makeRng(seed: number): () => number {
+    let s = seed >>> 0;
+    return () => {
+        s = (s + 0x6d2b79f5) >>> 0;
+        let t = s;
+        t = Math.imul(t ^ (t >>> 15), t | 1);
+        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
+        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
+    };
+}
+
+function pick<T>(rng: () => number, xs: T[]): T {
+    return xs[Math.floor(rng() * xs.length)];
+}
+
+function intInRange(rng: () => number, lo: number, hi: number): number {
+    return lo + Math.floor(rng() * (hi - lo + 1));
+}
+
+/**
+ * Build a random grammar. Rules form a DAG (rule `i` may reference
+ * rules `> i`), so there are no cycles and every rule terminates in
+ * literal words. No value expressions — keeps the fuzzer focused on
+ * structural rewrites; value-handling has dedicated coverage in
+ * grammarOptimizerValueExpressions.spec.ts.
+ */
+function buildRandomGrammar(rng: () => number): {
+    text: string;
+    matchingInputs: string[];
+} {
+    const ruleCount = intInRange(rng, 1, MAX_RULES);
+    const ruleName = (i: number) => `R${i}`;
+    const lines: string[] = [];
+    // Pre-generate one matching input per rule by picking the first
+    // alternative's expansion for each. We'll join from `<R0>`.
+    const firstAltText: string[] = new Array(ruleCount);
+    const firstAltMatch: string[][] = new Array(ruleCount);
+
+    // Generate rules in reverse so that when rule i is built, rules
+    // > i already exist (allowing forward-only references and a
+    // pre-computed `firstAltMatch` for any reference site).
+    for (let i = ruleCount - 1; i >= 0; i--) {
+        const altCount = intInRange(rng, 1, MAX_ALTS);
+        const altTexts: string[] = [];
+        let firstMatch: string[] | undefined;
+        for (let a = 0; a < altCount; a++) {
+            const partCount = intInRange(rng, 1, MAX_PARTS);
+            const partTexts: string[] = [];
+            const partMatch: string[] = [];
+            for (let p = 0; p < partCount; p++) {
+                // Always allow literals; allow rule refs only when
+                // there's a forward rule available.
+                const canRef = i + 1 < ruleCount;
+                const useRef = canRef && rng() < 0.35;
+                if (useRef) {
+                    const target = intInRange(rng, i + 1, ruleCount - 1);
+                    partTexts.push(`<${ruleName(target)}>`);
+                    partMatch.push(...firstAltMatch[target]);
+                } else {
+                    const w = pick(rng, WORDS);
+                    partTexts.push(w);
+                    partMatch.push(w);
+                }
+            }
+            altTexts.push(partTexts.join(" "));
+            if (a === 0) firstMatch = partMatch;
+        }
+        firstAltText[i] = altTexts.join(" | ");
+        firstAltMatch[i] = firstMatch!;
+        lines.push(`<${ruleName(i)}> = ${altTexts.join(" | ")};`);
+    }
+    // Reverse so `<R0>` is defined first (cosmetic).
+    lines.reverse();
+    // Anchor the start symbol to `<R0>`.
+    const text = `<Start> = <R0>;\n${lines.join("\n")}`;
+
+    // Matching input: the first-alternative expansion of `<R0>`.
+    const matching = firstAltMatch[0].join(" ");
+    // A definitely-non-matching input (uses a token outside WORDS).
+    const nonMatching = "zzz";
+    // A truncated input.
+    const truncated = firstAltMatch[0].slice(0, -1).join(" ") || "x";
+    return {
+        text,
+        matchingInputs: [matching, nonMatching, truncated],
+    };
+}
+
+function matchKeys(
+    grammar: ReturnType<typeof loadGrammarRules>,
+    input: string,
+): string[] | { error: string } {
+    try {
+        return matchGrammar(grammar, input)
+            .map((m) => JSON.stringify(m.match))
+            .sort();
+    } catch (e) {
+        return { error: (e as Error).message };
+    }
+}
+
+describe("Grammar Optimizer - Random equivalence fuzz", () => {
+    const rng = makeRng(SEED);
+    for (let g = 0; g < GRAMMAR_COUNT; g++) {
+        const { text, matchingInputs } = buildRandomGrammar(rng);
+        // Generate a few extra random inputs to widen coverage.
+        const extraInputs: string[] = [];
+        for (let i = 0; i < INPUTS_PER_GRAMMAR - matchingInputs.length; i++) {
+            const len = intInRange(rng, 1, 5);
+            const tokens: string[] = [];
+            for (let t = 0; t < len; t++) tokens.push(pick(rng, WORDS));
+            extraInputs.push(tokens.join(" "));
+        }
+        const inputs = [...matchingInputs, ...extraInputs];
+
+        // Compile both flavors once; reuse across inputs. Generated
+        // grammars don't carry value expressions, so disable the
+        // start-value requirement.
+        let baseline: ReturnType<typeof loadGrammarRules>;
+        let optimized: ReturnType<typeof loadGrammarRules>;
+        const loadOpts = { startValueRequired: false } as const;
+        try {
+            baseline = loadGrammarRules("fuzz.grammar", text, loadOpts);
+            optimized = loadGrammarRules("fuzz.grammar", text, {
+                ...loadOpts,
+                optimizations: recommendedOptimizations,
+            });
+        } catch (e) {
+            it(`grammar #${g} compiles`, () => {
+                throw new Error(
+                    `Generated grammar failed to compile: ${(e as Error).message}\n${text}`,
+                );
+            });
+            continue;
+        }
+
+        for (const input of inputs) {
+            it(`grammar #${g} matches '${input}' identically`, () => {
+                const baseResult = matchKeys(baseline, input);
+                const optResult = matchKeys(optimized, input);
+                // If both throw, consider that consistent (the
+                // baseline grammar itself is the bug, not the
+                // optimizer). If only one throws, fail loudly.
+                if (
+                    typeof baseResult === "object" &&
+                    !Array.isArray(baseResult) &&
+                    "error" in baseResult
+                ) {
+                    expect(optResult).toMatchObject({
+                        error: expect.any(String),
+                    });
+                    return;
+                }
+                expect(optResult).toStrictEqual(baseResult);
+            });
+        }
+    }
+});

From e9ca9a402474dcad31289268e631afd39a708612 Mon Sep 17 00:00:00 2001
From: Curtis Man
Date: Thu, 23 Apr 2026 23:06:35 -0700
Subject: [PATCH 14/16] actionGrammarCompiler: enable optimizations by default
 + --debug flag

Default the agc compile command to recommendedOptimizations (matches
what runtime callers should use). Add a --debug flag that disables
optimizations, producing an unoptimized AST that preserves the 1:1
correspondence between top-level rules and the original source for
diagnostics.
---
 .../actionGrammarCompiler/src/commands/compile.ts | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/ts/packages/actionGrammarCompiler/src/commands/compile.ts b/ts/packages/actionGrammarCompiler/src/commands/compile.ts
index ade5882eb..1ee13df0b 100644
--- a/ts/packages/actionGrammarCompiler/src/commands/compile.ts
+++ b/ts/packages/actionGrammarCompiler/src/commands/compile.ts
@@ -7,6 +7,7 @@ import fs from "node:fs";
 import {
     grammarToJson,
     loadGrammarRulesNoThrow,
+    recommendedOptimizations,
     SchemaLoader,
 } from "action-grammar";
 import { parseSchemaSource } from "@typeagent/action-schema";
@@ -62,6 +63,11 @@ export default class Compile extends Command {
             required: true,
             char: "o",
         }),
+        debug: Flags.boolean({
+            description:
+                "Disable grammar optimizations (produces an unoptimized AST that preserves the 1:1 correspondence between top-level rules and the original source — useful for diagnostics).",
+            default: false,
+        }),
     };

     async run(): Promise<void> {
         const { flags } = await this.parse(Compile);
@@ -75,7 +81,13 @@
             undefined,
             errors,
             warnings,
-            { startValueRequired: true, schemaLoader },
+            flags.debug
+                ?
{ startValueRequired: true, schemaLoader } + : { + startValueRequired: true, + schemaLoader, + optimizations: recommendedOptimizations, + }, ); if (grammar === undefined) { From b57f6cc4f287aafb801eaf6e6b9a112a7230b965 Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 23:13:27 -0700 Subject: [PATCH 15/16] lint --- ts/packages/actionGrammar/package.json | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ts/packages/actionGrammar/package.json b/ts/packages/actionGrammar/package.json index 68b027570..c800d0abf 100644 --- a/ts/packages/actionGrammar/package.json +++ b/ts/packages/actionGrammar/package.json @@ -28,6 +28,9 @@ "!dist/bench" ], "scripts": { + "bench": "npm run bench:synthetic && npm run bench:real", + "bench:real": "node ./dist/bench/grammarOptimizerBenchmark.js", + "bench:synthetic": "node ./dist/bench/grammarOptimizerSyntheticBenchmark.js", "build": "npm run tsc", "clean": "rimraf --glob dist *.tsbuildinfo *.done.build.log", "jest-esm": "node --no-warnings --experimental-vm-modules ./node_modules/jest/bin/jest.js", @@ -37,9 +40,6 @@ "test:integration": "pnpm run jest-esm --testPathPattern=\"grammarGenerator.spec.js\"", "test:local": "pnpm run jest-esm --testPathPattern=\".*[.]spec[.]js\"", "test:local:debug": "node --inspect-brk --no-warnings --experimental-vm-modules ./node_modules/jest/bin/jest.js --testPathPattern=\".*\\.spec\\.js\" --testPathIgnorePatterns=\"grammarGenerator.spec.js\"", - "bench": "npm run bench:synthetic && npm run bench:real", - "bench:synthetic": "node ./dist/bench/grammarOptimizerSyntheticBenchmark.js", - "bench:real": "node ./dist/bench/grammarOptimizerBenchmark.js", "tsc": "tsc -b" }, "dependencies": { From 278576119db68532c69bd4d92b9580827775e88a Mon Sep 17 00:00:00 2001 From: Curtis Man Date: Thu, 23 Apr 2026 23:49:35 -0700 Subject: [PATCH 16/16] actionGrammar: fix factoring losing ancestor-prefix bindings The cross-scope-ref check in checkFactoringEligible only compared member values 
against the immediate prefix at the current trie fork. It missed the
case where a deeper bailout had already prepended an inner edge to
each member, dragging in references to ancestor-prefix canonicals
(variables bound several edges up the trie). An outer fork at a
literal edge would then factor those members into a fresh
wrapper-rule scope where the ancestor canonical was no longer
visible, producing a runtime "Internal error: No value for variable
'__opt_v_*'".

This surfaced on playerSchema.agr's

    play <trackName> by <artist>                            (alt1)
    play <trackName> from album <albumName>                 (alt2)
    play <trackName> by <artist> from album <albumName> ... (alt3)

trie shape: the inner fork bails (alt1's terminal lands with empty
parts), prepending <artist> to alt3's suffix; the outer 'by' fork's
eligibility check then sees no canonicals in its 'by' prefix and
factored anyway, lifting alt1/alt3 into a wrapper where
<trackName>'s canonical was lost.

Fix: tighten the check to require every member's value to only
reference variables bound in *that member's own parts*. This
subsumes the simpler immediate-prefix check and catches the
bailout-then-factor scenario. The prefix parameter is no longer
needed.

Add a regression test covering the exact playerSchema trie shape.
---
 .../actionGrammar/src/grammarOptimizer.ts     | 45 +++++++++--------
 .../test/grammarOptimizerFactoring.spec.ts    | 50 +++++++++++++++++++
 2 files changed, 74 insertions(+), 21 deletions(-)

diff --git a/ts/packages/actionGrammar/src/grammarOptimizer.ts b/ts/packages/actionGrammar/src/grammarOptimizer.ts
index 92808eef4..3a2526c82 100644
--- a/ts/packages/actionGrammar/src/grammarOptimizer.ts
+++ b/ts/packages/actionGrammar/src/grammarOptimizer.ts
@@ -1115,7 +1115,7 @@ function emitFromNode(
     }

     // Multi-member fork: try to wrap; bail out if any check fails.
- if (checkFactoringEligible(prefix, members) !== undefined) { + if (checkFactoringEligible(members) !== undefined) { return members.map((m) => ({ ...m, parts: concatParts(prefix, m.parts), @@ -1129,10 +1129,7 @@ function emitFromNode( * Per-fork eligibility checks (lifted from the previous implementation). * Returns `undefined` when factoring is safe, or a short reason string. */ -function checkFactoringEligible( - prefix: GrammarPart[], - members: GrammarRule[], -): string | undefined { +function checkFactoringEligible(members: GrammarRule[]): string | undefined { // Empty-parts members never compose cleanly inside a wrapped // RulesPart: with a value, the matcher would have to treat // `{parts:[], value: V}` as a degenerate match (today's algorithm @@ -1163,24 +1160,30 @@ function checkFactoringEligible( return "no-value-implicit-default"; } // Cross-scope-ref: nested rule scope is fresh at the matcher level - // (entering a `RulesPart` resets `valueIds`). If a member's value - // references a name that the wrapper's prefix binds, that reference - // would resolve to nothing at runtime. Detect and bail out so each - // member is emitted at the wrapper's level instead, putting the - // binding back in scope. + // (entering a `RulesPart` resets `valueIds`). When members are + // lifted into a wrapper rule's `suffixRulesPart`, each member + // becomes an isolated inner rule whose value can only see + // variables bound in its own `parts` — bindings in the wrapper's + // prefix, *or* in any ancestor's prefix that has already been + // incorporated upstream, are no longer visible. + // + // We therefore require every variable referenced by a member's + // value to appear in that member's own top-level part bindings. 
+    // This subsumes the simpler "member references prefix binding"
+    // check, and additionally catches the case where a deeper bailout
+    // dragged ancestor-prefix canonical references into a member that
+    // doesn't bind them (the bailout-then-factor scenario in
+    // playerSchema's `play <trackName> by [...]`).
     //
     // Binding-shadow (a member's own binding colliding with a prefix
-    // binding) is no longer reachable: canonicals are opaque
-    // `__opt_v_` names allocated globally per `factorRulesPart` call,
-    // so two distinct edges always get distinct canonicals.
-    const prefixCanonicals = collectVariableNames(prefix);
-    if (prefixCanonicals.size > 0) {
-        for (const m of members) {
-            if (m.value !== undefined) {
-                for (const v of collectVariableReferences(m.value)) {
-                    if (prefixCanonicals.has(v)) return "cross-scope-ref";
-                }
-            }
+    // binding) is not reachable: canonicals are opaque `__opt_v_`
+    // names allocated globally per `factorRulesPart` call, so two
+    // distinct edges always get distinct canonicals.
+    for (const m of members) {
+        if (m.value === undefined) continue;
+        const memberBindings = collectVariableNames(m.parts);
+        for (const v of collectVariableReferences(m.value)) {
+            if (!memberBindings.has(v)) return "cross-scope-ref";
         }
     }
     return undefined;

diff --git a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts
index 67f9657e9..d9e18f4c3 100644
--- a/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts
+++ b/ts/packages/actionGrammar/test/grammarOptimizerFactoring.spec.ts
@@ -371,6 +371,56 @@ describe("Grammar Optimizer - Factoring Repro", () => {
             );
         }
     });
+
+    // Regression for the playerSchema bug. The failing trie shape:
+    //
+    //     play <trackName> by <artist>                          (alt1)
+    //     play <trackName> from album <albumName>               (alt2)
+    //     play <trackName> by <artist> from album <albumName>   (alt3)
+    //
+    // The trie at <trackName> forks ("by" vs "from"); inside the "by"
+    // branch, alt1's terminal lands at <artist> with empty parts
+    // alongside alt3's "from album <albumName>" subtree.
+    // That deeper fork bails ("whole-consumed") and prepends the
+    // <artist> edge to each member. The outer "by" fork's eligibility
+    // check then sees members whose values reference the *outer*
+    // <trackName> canonical — but that canonical isn't bound in the
+    // members' own parts. The pre-fix check missed this (it only
+    // compared against the immediate "by" prefix, which has no
+    // canonicals), factored anyway, and the matcher threw at runtime:
+    // "Internal error: No value for variable '__opt_v_*'".
+    it("does not factor when members reference ancestor-prefix bindings", () => {
+        const text = `<Start> = <Play>;
+            <TrackName> = $(trackName:string) -> trackName
+                | the $(trackName:string) -> trackName;
+            <Play> =
+                play $(trackName:<TrackName>) by $(artist:string) ->
+                    { kind: "byArtist", trackName, artist }
+                | play $(trackName:<TrackName>) from album $(albumName:string) ->
+                    { kind: "fromAlbum", trackName, albumName }
+                | play $(trackName:<TrackName>) by $(artist:string) from album $(albumName:string) ->
+                    { kind: "byArtistFromAlbum", trackName, artist, albumName };`;
+        const baseline = loadGrammarRules("t.grammar", text);
+        const optimized = loadGrammarRules("t.grammar", text, {
+            optimizations: { factorCommonPrefixes: true },
+        });
+        for (const input of [
+            "play hello by alice",
+            "play the hello by alice",
+            "play hello from album greats",
+            "play hello by alice from album greats",
+            "play the hello by alice from album greats",
+        ]) {
+            // No "Internal error" thrown at runtime, and the same set
+            // of matches is produced (order may differ, since factoring
+            // can interleave alternatives at the wrapper level).
+            const baseRes = match(baseline, input);
+            const optRes = match(optimized, input);
+            expect(optRes).toHaveLength(baseRes.length);
+            expect(optRes).toEqual(expect.arrayContaining(baseRes));
+            expect(baseRes).toEqual(expect.arrayContaining(optRes));
+        }
+    });
 });

// ─── Merged from grammarOptimizerTrieRisks.spec.ts ──────────────────────────
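For reference, the tightened eligibility rule from the last patch can be sketched in isolation. This is a minimal sketch, not the real implementation: the `Part`/`Value`/`Rule` shapes below are hypothetical stand-ins for the actual `GrammarPart`/`GrammarRule` types in grammarOptimizer.ts, and only the cross-scope-ref portion of `checkFactoringEligible` is modeled (the empty-parts and implicit-default checks are omitted).

```typescript
// Hypothetical, simplified shapes standing in for the optimizer's types.
type Part = { variable?: string };
type Value =
    | { type: "variable"; name: string }
    | { type: "object"; refs: string[] };
type Rule = { parts: Part[]; value?: Value };

// Collect the variable names bound by a rule's own top-level parts.
function collectVariableNames(parts: Part[]): Set<string> {
    const names = new Set<string>();
    for (const p of parts) {
        if (p.variable !== undefined) names.add(p.variable);
    }
    return names;
}

// Collect the variable names a value expression references.
function collectVariableReferences(value: Value): string[] {
    return value.type === "variable" ? [value.name] : value.refs;
}

// The fixed rule: a fork is factorable only if every variable that a
// member's value references is bound by that member's *own* parts.
// A reference to anything else (e.g. an ancestor-prefix canonical
// dragged in by an earlier bailout) would be lifted into a wrapper
// scope where the binding is no longer visible, so we refuse.
function crossScopeRefCheck(members: Rule[]): string | undefined {
    for (const m of members) {
        if (m.value === undefined) continue;
        const own = collectVariableNames(m.parts);
        for (const v of collectVariableReferences(m.value)) {
            if (!own.has(v)) return "cross-scope-ref";
        }
    }
    return undefined;
}
```

Under this rule, a member such as `{ parts: [{ variable: "artist" }], value: { type: "object", refs: ["trackName", "artist"] } }` is refused, because `trackName` is bound several edges up the trie rather than in the member's own parts — exactly the playerSchema shape the regression test covers.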