Branch main

Mark Lauter edited this page Jun 24, 2026 · 1 revision

title: main branch
summary: "The only branch: the published MSL.Lexi regex lexer with the VocabularyBuilder API, maximal-munch scanning, and Math and SQL-like sample parsers."
tags: [lexi, branch, main, lexer, tokenizer, regex, csharp]
created: 2026-06-24
status: draft
dotnet: [net6.0, net7.0, net8.0]
build-status: builds

The main branch

main is the only branch and contains the whole project: a regex-driven, allocation-light lexer for .NET. You declare a vocabulary of Match and Ignore regex patterns mapped to integer token ids with VocabularyBuilder, then call Lexer.NextMatch to pull tokens left to right using maximal munch: the longest match wins, and on a tie the pattern with the lowest index wins. It is built to feed simple recursive-descent parsers, as demonstrated by the bundled math and predicate sample parsers and their REPLs.
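To make the maximal-munch rule concrete, here is a minimal, self-contained sketch of the selection logic written against System.Text.RegularExpressions only. It does not use or reproduce the MSL.Lexi API: the vocabulary tuple layout, the token ids, and the Tokenize helper are all hypothetical names chosen for illustration, and the real Lexer may be structured quite differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical vocabulary for illustration only (not the MSL.Lexi API).
// Each entry: token id, a regex anchored with \G at the scan offset, and an ignore flag.
var vocabulary = new (int Id, Regex Regex, bool Ignore)[]
{
    (0, new Regex(@"\Gif"), false),                     // keyword, declared before identifier
    (1, new Regex(@"\G[A-Za-z_][A-Za-z0-9_]*"), false), // identifier
    (2, new Regex(@"\G[0-9]+"), false),                 // integer literal
    (3, new Regex(@"\G=="), false),                     // equality operator
    (4, new Regex(@"\G="), false),                      // assignment operator
    (-1, new Regex(@"\G\s+"), true),                    // whitespace, matched but not emitted
};

// Scans left to right. At each offset the longest match wins;
// on a tie the pattern with the lowest index in the vocabulary wins.
List<(int Id, string Text)> Tokenize(string source)
{
    var tokens = new List<(int Id, string Text)>();
    var offset = 0;
    while (offset < source.Length)
    {
        var bestIndex = -1;
        var bestLength = 0;
        for (var i = 0; i < vocabulary.Length; i++)
        {
            var match = vocabulary[i].Regex.Match(source, offset);
            // Strictly greater: a later pattern of equal length never displaces an earlier one.
            if (match.Success && match.Length > bestLength)
            {
                bestIndex = i;
                bestLength = match.Length;
            }
        }
        if (bestIndex < 0)
            throw new FormatException($"No pattern matches at offset {offset}.");
        if (!vocabulary[bestIndex].Ignore)
            tokens.Add((vocabulary[bestIndex].Id, source.Substring(offset, bestLength)));
        offset += bestLength;
    }
    return tokens;
}

foreach (var (id, text) in Tokenize("if iffy == 42"))
    Console.WriteLine($"{id}: {text}");
```

In this sketch the input "if iffy == 42" exercises both rules: "if" matches the keyword and identifier patterns at equal length, so the earlier keyword pattern wins, while "iffy" and "==" are decided by length alone.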

What it holds

The tokenization pipeline described in Architecture (source: Lexi):

  • Lexer and VocabularyBuilder — the scanner and its fluent configuration.
  • Pattern, Symbol, Source, MatchResult — the ref-struct token and scanning types.
  • CommonPatterns — the reusable identifier, literal, and whitespace regexes.

State

Builds clean with the installed .NET SDK: dotnet build restored and compiled all three target frameworks (net6.0, net7.0, net8.0) with no warnings or errors. Tests were not run. Last touched June 2024. The branch is finished and stable: a published NuGet package (MSL.Lexi v2.2.2) with a test suite, CI workflows, and two working sample parsers. The one loose end is a // todo in CommonPatterns.CharacterLiteral noting that the char-literal pattern does not yet handle escape sequences.

Related

  • Architecture — the tokenization pipeline this branch implements.