Skip to content

Reduce parser-layer backtracking / longest-match (remaining gap to the official parser) #8

@johnsoncodehk

Description

@johnsoncodehk

After the lexer is emitted (the token-IR / char-code-scanner issue), the parser layer is still ~6× the official TS parser. The decomposition of the gap (PR #4) is even: lexer ≈ parser layer, each ~6× tsc.

Monogram's non-recursive rules do longest-match — try every alternative, keep the one that consumes most — and the engine backtracks (save/restore position). tsc is predictive: it commits to one production from a small lookahead with minimal backtracking. The parserharness.ts bench file shows the cost — its parser layer is ~6.8× tsc vs ~5.8× on the other three files (the backtracking signature).

Levers

  • When an alternative's FIRST set uniquely identifies it, commit to it — skip the longest-match scan of the remaining alternatives.
  • Strengthen first-token dispatch so provably-dead alternatives are never entered (extends the existing canStartFT / FIRST-set pruning).

Honest ceiling

This is partly the price of a non-deterministic, grammar-derived engine vs a hand-coded predictive parser — it likely won't fully close in JS while staying grammar-derived. Beating tsc is the native-target path (the emitted lexer + the target-agnostic emitter issues), where no-GC + value types + char-code scanning raise the ceiling above V8.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions