Mark Lauter edited this page Jun 24, 2026 · 2 revisions

title: lexi
summary: "A regex-based lexer for .NET, published as MSL.Lexi: declare a vocabulary of named regex patterns mapped to token ids, then scan source into tokens with maximal-munch."
tags: [lexi, overview, lexer, tokenizer, regex, parsing, csharp, index]
created: 2026-06-24
status: draft

lexi

lexi is a regex-based lexer (tokenizer) for .NET, published to NuGet as MSL.Lexi. You declare a vocabulary of named regex patterns, each mapped to an integer token id, and the lexer scans source text left to right, emitting at each position the token chosen by maximal munch, that is, the longest match. It is general-purpose (SQL is only one demonstration) and is built to feed simple recursive-descent parsers; a math expression parser and a SQL-like query parser ship as samples.
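To make the scanning model concrete, here is a minimal sketch of maximal-munch tokenization over a small vocabulary. It uses only System.Text.RegularExpressions and is not the MSL.Lexi API: the class name, the token ids, the patterns, and the tie-breaking rule (the earlier entry wins) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical names throughout; this is NOT the MSL.Lexi API.
public static class MaximalMunchSketch
{
    // Each entry pairs a token id with a regex. \G anchors the match at the
    // offset passed to Regex.Match, so a pattern cannot skip ahead in the input.
    private static readonly (int Id, Regex Pattern)[] Vocabulary =
    {
        (1, new Regex(@"\G[A-Za-z_][A-Za-z0-9_]*")), // identifier
        (2, new Regex(@"\G[0-9]+")),                 // integer
        (3, new Regex(@"\G[0-9]+\.[0-9]+")),         // float
        (4, new Regex(@"\G[-+*/()]")),               // operator or parenthesis
        (0, new Regex(@"\G\s+")),                    // whitespace, skipped
    };

    public static List<(int Id, string Text)> Scan(string source)
    {
        var tokens = new List<(int Id, string Text)>();
        int offset = 0;
        while (offset < source.Length)
        {
            int bestId = -1, bestLength = 0;
            foreach (var (id, pattern) in Vocabulary)
            {
                Match m = pattern.Match(source, offset);
                // Maximal munch: keep the longest match. Because the comparison
                // is strict, the earlier vocabulary entry wins a tie.
                if (m.Success && m.Length > bestLength)
                {
                    bestId = id;
                    bestLength = m.Length;
                }
            }
            if (bestLength == 0)
                throw new FormatException($"No pattern matches at offset {offset}.");
            if (bestId != 0)
                tokens.Add((bestId, source.Substring(offset, bestLength)));
            offset += bestLength;
        }
        return tokens;
    }
}
```

Scanning `"x1 + 3.14"` with this sketch yields an identifier, an operator, and a float: at `3.14` both the integer and float patterns match, and the float pattern wins because its match is longer.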

Unlike most of the projects in this collection, lexi is finished and published: it builds clean across three target frameworks, carries tests and CI, and uses ref struct value types throughout for allocation-light scanning.

Architecture

Architecture describes the tokenization pipeline — the Source over the input, the Lexer that applies patterns with maximal-munch selection, the Symbol it emits, and the VocabularyBuilder that configures it.

Branches

  • Branch-main — the only branch: the published MSL.Lexi lexer with its VocabularyBuilder API and sample parsers.

A note on state

The project is finished and stable. Branch-main reports a verified build-status from dotnet build — it multi-targets net6.0, net7.0, and net8.0, and builds clean across all three. The package is MSL.Lexi v2.2.2, AOT-compatible and trimmable. The only loose end in source is a // todo noting the character-literal pattern does not yet handle escape sequences.
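For the loose end noted above, one common way to let a character-literal pattern accept escape sequences is to alternate between a backslash escape and any ordinary character. The pattern below is an illustrative assumption, not the pattern in lexi's source, and it covers only single-character escapes such as `\n` or `\'`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the name and pattern are assumptions, not lexi's code.
public static class CharLiteralSketch
{
    // A quote, then either a backslash followed by any one character, or one
    // character that is not a quote, backslash, or line break, then a quote.
    private static readonly Regex CharLiteral =
        new Regex(@"\G'(?:\\.|[^'\\\r\n])'");

    // True when the whole text is a single character literal.
    public static bool IsCharLiteral(string text)
    {
        Match m = CharLiteral.Match(text, 0);
        return m.Success && m.Length == text.Length;
    }
}
```

Longer escapes, such as `\u0041`, would need further alternatives in the group.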