Proposal for 4.0: Trie-based string matching and more by rkh · Pull Request #154 · sinatra/mustermann

rkh · 2026-04-18T12:26:33Z

This is a proof of concept at the moment.

API

Current API:

trie = Mustermann::Trie.new
pattern = Mustermann.new("/a/:a/b/:b")

trie.add(pattern, :my_value)
trie.match("/a/1/b/2").value # => :my_value

User-facing API could, of course, use some love.
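To make the idea concrete, here is a minimal sketch of segment-based trie matching in plain Ruby. It is not the code from this branch: ToyTrie, its Node and Match structs, and the segment-splitting approach are all hypothetical, and it only understands static segments and :name captures.

```ruby
# Toy illustration only; NOT Mustermann::Trie. All names here are hypothetical.
class ToyTrie
  Node  = Struct.new(:static, :capture, :names, :value)
  Match = Struct.new(:value, :params)

  def initialize
    @root = Node.new({})
  end

  # Adds a pattern such as "/a/:a/b/:b" together with an arbitrary value.
  def add(pattern, value)
    node  = @root
    names = []
    segments(pattern).each do |segment|
      if segment.start_with?(":")
        names << segment.delete_prefix(":")
        node = (node.capture ||= Node.new({}))       # one shared capture edge per node
      else
        node = (node.static[segment] ||= Node.new({}))
      end
    end
    node.names = names                               # marks the node as terminal
    node.value = value
    self
  end

  # Returns a Match, or nil if no pattern matches the whole path.
  def match(path)
    walk(@root, segments(path), 0, [])
  end

  private

  def segments(string) = string.split("/").reject(&:empty?)

  def walk(node, parts, index, captured)
    if index == parts.length
      return node.names && Match.new(node.value, node.names.zip(captured).to_h)
    end
    # Static children win over captures; fall back to the capture edge on failure.
    if (child = node.static[parts[index]])
      found = walk(child, parts, index + 1, captured)
      return found if found
    end
    node.capture && walk(node.capture, parts, index + 1, captured + [parts[index]])
  end
end

trie = ToyTrie.new
trie.add("/a/:a/b/:b", :my_value)
trie.match("/a/1/b/2").value # => :my_value
```

The number of steps depends on the depth of the path rather than on the number of patterns stored, which is the property the benchmarks in this PR are meant to show.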

Implementation status

General implementation status

Basic trie implementation
Full compatibility with all AST-based patterns
Add support for except
Add benchmark
Tests
Documentation
Confirm whether the Hanami-Team is interested in this feature (some related discussion in hanami/hanami-router#284)

Supported AST nodes:

Implementing additional nodes should not have a performance impact on patterns that do not use these nodes (i.e., I don't expect the numbers from the benchmark to get worse).

Implemented features:

Open questions:

Are there ways to improve compilation performance?
Should Trie mimic other Pattern methods? Should it be a pattern?
Should Trie::Match have MatchData-compatibility beyond what SimpleMatch offers? (Somewhat unrelated, should SimpleMatch have better compatibility with MatchData?)
How should a pattern conflict be handled? For example, adding /prefix(/:capture)/suffix as well as /prefix/suffix currently results in a TrieError being raised. I think it might make sense to just ignore the second pattern. As implemented, it is possible to ignore just one possible path through the pattern.
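The conflict question can be illustrated with a small sketch. Everything below is hypothetical (ConflictDemo, ConflictError as a stand-in for TrieError, the on_conflict option); it expands non-nested optional groups into concrete paths and either raises or skips only the conflicting path.

```ruby
# Hypothetical sketch of two conflict policies; not the behaviour of this branch.
class ConflictDemo
  ConflictError = Class.new(StandardError) # stand-in for TrieError

  def initialize(on_conflict: :raise)
    @on_conflict = on_conflict
    @values = {}
  end

  # Expands non-nested optional groups:
  # "/prefix(/:capture)/suffix" => ["/prefix/suffix", "/prefix/:capture/suffix"]
  def self.paths_for(pattern)
    group = pattern.match(/\(([^()]*)\)/)
    return [pattern] unless group
    without = group.pre_match + group.post_match
    with    = group.pre_match + group[1] + group.post_match
    paths_for(without) + paths_for(with)
  end

  # Returns the concrete paths that were actually added.
  # Note: in :raise mode, paths added before the conflict stay registered.
  def add(pattern, value)
    self.class.paths_for(pattern).select do |path|
      key = path.gsub(/:\w+/, ":") # capture names do not matter for conflicts
      if @values.key?(key)
        raise ConflictError, "#{path} is already taken" if @on_conflict == :raise
        false                      # :ignore skips only this one path
      else
        @values[key] = value
        true
      end
    end
  end

  def value_for(path_shape)
    @values[path_shape.gsub(/:\w+/, ":")]
  end
end

lenient = ConflictDemo.new(on_conflict: :ignore)
lenient.add("/prefix/suffix", :plain)
lenient.add("/prefix(/:capture)/suffix", :optional) # => ["/prefix/:capture/suffix"]
```

With the :ignore policy the first registration wins, so the order in which patterns are added becomes significant, much like in a linear route list.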

Performance

Initial performance:

===================== Compilation: Array<Pattern> =====================
      1 routes    1 level   0.003062   0.000856   0.003918 (  0.003918)
      5 routes    1 level   0.000293   0.000002   0.000295 (  0.000296)
     10 routes    1 level   0.000779   0.000008   0.000787 (  0.000787)
      1 routes    2 levels  0.000113   0.000006   0.000119 (  0.000119)
     25 routes    2 levels  0.003041   0.000127   0.003168 (  0.003169)
    100 routes    2 levels  0.009108   0.000178   0.009286 (  0.009288)
      1 routes    3 levels  0.000153   0.000002   0.000155 (  0.000154)
    125 routes    3 levels  0.018385   0.000227   0.018612 (  0.018616)
   1000 routes    3 levels  0.149649   0.001208   0.150857 (  0.150868)
      1 routes    4 levels  0.000186   0.000001   0.000187 (  0.000188)
    625 routes    4 levels  0.117547   0.000712   0.118259 (  0.118288)
  10000 routes    4 levels  1.948137   0.022295   1.970432 (  1.970570)

========================== Compilation: Trie ==========================
      1 routes    1 level   0.000134   0.000004   0.000138 (  0.000139)
      5 routes    1 level   0.000459   0.000006   0.000465 (  0.000465)
     10 routes    1 level   0.000650   0.000011   0.000661 (  0.000662)
      1 routes    2 levels  0.000131   0.000002   0.000133 (  0.000134)
     25 routes    2 levels  0.003879   0.000060   0.003939 (  0.003939)
    100 routes    2 levels  0.024879   0.000469   0.025348 (  0.025353)
      1 routes    3 levels  0.000186   0.000005   0.000191 (  0.000191)
    125 routes    3 levels  0.028082   0.001140   0.029222 (  0.029226)
   1000 routes    3 levels  0.248822   0.004997   0.253819 (  0.253838)
      1 routes    4 levels  0.000215   0.000000   0.000215 (  0.000216)
    625 routes    4 levels  0.208893   0.003252   0.212145 (  0.212162)
  10000 routes    4 levels  3.429190   0.045075   3.474265 (  3.474669)

====================== Matching: Array<Pattern> =======================
Rehearsal --------------------------------------------------------------
      1 routes    1 level    0.000216   0.000001   0.000217 (  0.000217)
      5 routes    1 level    0.000498   0.000000   0.000498 (  0.000497)
     10 routes    1 level    0.000792   0.000000   0.000792 (  0.000793)
      1 routes    2 levels   0.000234   0.000001   0.000235 (  0.000235)
     25 routes    2 levels   0.001815   0.000009   0.001824 (  0.001824)
    100 routes    2 levels   0.006397   0.000007   0.006404 (  0.006403)
      1 routes    3 levels   0.000254   0.000010   0.000264 (  0.000263)
    125 routes    3 levels   0.008741   0.000013   0.008754 (  0.008753)
   1000 routes    3 levels   0.074886   0.000221   0.075107 (  0.075118)
      1 routes    4 levels   0.000274   0.000017   0.000291 (  0.000291)
    625 routes    4 levels   0.043660   0.000110   0.043770 (  0.043786)
  10000 routes    4 levels   0.764417   0.000983   0.765400 (  0.765444)
----------------------------------------------------- total: 0.903556sec

                                 user     system      total        real
      1 routes    1 level    0.000225   0.000001   0.000226 (  0.000223)
      5 routes    1 level    0.000485   0.000000   0.000485 (  0.000483)
     10 routes    1 level    0.000772   0.000000   0.000772 (  0.000770)
      1 routes    2 levels   0.000265   0.000000   0.000265 (  0.000264)
     25 routes    2 levels   0.001766   0.000000   0.001766 (  0.001764)
    100 routes    2 levels   0.006530   0.000000   0.006530 (  0.006529)
      1 routes    3 levels   0.000242   0.000000   0.000242 (  0.000241)
    125 routes    3 levels   0.008577   0.000003   0.008580 (  0.008576)
   1000 routes    3 levels   0.065358   0.000025   0.065383 (  0.065382)
      1 routes    4 levels   0.000300   0.000000   0.000300 (  0.000297)
    625 routes    4 levels   0.043457   0.000008   0.043465 (  0.043464)
  10000 routes    4 levels   0.761751   0.000703   0.762454 (  0.762771)

=========================== Matching: Trie ============================
Rehearsal --------------------------------------------------------------
      1 routes    1 level    0.001232   0.000006   0.001238 (  0.001238)
      5 routes    1 level    0.001190   0.000003   0.001193 (  0.001193)
     10 routes    1 level    0.001231   0.000003   0.001234 (  0.001235)
      1 routes    2 levels   0.002223   0.000005   0.002228 (  0.002229)
     25 routes    2 levels   0.002280   0.000005   0.002285 (  0.002284)
    100 routes    2 levels   0.002288   0.000009   0.002297 (  0.002303)
      1 routes    3 levels   0.002759   0.000017   0.002776 (  0.002776)
    125 routes    3 levels   0.003742   0.000024   0.003766 (  0.003766)
   1000 routes    3 levels   0.003273   0.000026   0.003299 (  0.003300)
      1 routes    4 levels   0.003413   0.000014   0.003427 (  0.003428)
    625 routes    4 levels   0.003983   0.000027   0.004010 (  0.004010)
  10000 routes    4 levels   0.004955   0.000074   0.005029 (  0.005030)
----------------------------------------------------- total: 0.032782sec

                                 user     system      total        real
      1 routes    1 level    0.000986   0.000001   0.000987 (  0.000984)
      5 routes    1 level    0.000972   0.000001   0.000973 (  0.000970)
     10 routes    1 level    0.001002   0.000000   0.001002 (  0.001001)
      1 routes    2 levels   0.001754   0.000001   0.001755 (  0.001753)
     25 routes    2 levels   0.001792   0.000001   0.001793 (  0.001789)
    100 routes    2 levels   0.001902   0.000002   0.001904 (  0.001902)
      1 routes    3 levels   0.002501   0.000002   0.002503 (  0.002498)
    125 routes    3 levels   0.002803   0.000001   0.002804 (  0.002802)
   1000 routes    3 levels   0.003100   0.000001   0.003101 (  0.003101)
      1 routes    4 levels   0.003287   0.000002   0.003289 (  0.003287)
    625 routes    4 levels   0.003847   0.000001   0.003848 (  0.003848)
  10000 routes    4 levels   0.004927   0.000001   0.004928 (  0.004928)

Note that these benchmarks use the same patterns as r10k.
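For readers who want to reproduce the shape of these tables, the following is a rough harness sketch. The Rehearsal sections above are what Ruby's Benchmark.bmbm prints; everything else here is an assumption: the route generator (branching ** levels routes, one static segment plus one capture per level) only mimics the row labels and is not taken from r10k, and the linear matcher uses plain Regexp objects rather than Mustermann patterns.

```ruby
require "benchmark"

# Hypothetical generator: produces branching ** levels routes.
def generate_routes(branching, levels)
  choices = (0...branching).to_a
  choices.product(*Array.new(levels - 1) { choices }).map do |combo|
    combo.each_with_index.map { |choice, level| "/s#{level}x#{choice}/:p#{level}" }.join
  end
end

# Turns "/s0x1/:p0" into an anchored Regexp with named captures.
def compile(route)
  Regexp.new("\\A" + route.gsub(/:(\w+)/) { "(?<#{Regexp.last_match(1)}>[^/]+)" } + "\\z")
end

# Linear matching: try every compiled route in order.
def linear_match(regexps, path)
  regexps.each do |regexp|
    match = regexp.match(path)
    return match if match
  end
  nil
end

def run_benchmark(sizes, iterations: 100)
  Benchmark.bmbm(28) do |bench|
    sizes.each do |branching, levels|
      routes  = generate_routes(branching, levels)
      regexps = routes.map { |route| compile(route) }
      path    = routes.last.gsub(/:\w+/, "42") # worst case for a linear scan
      bench.report(format("%6d routes %d levels", routes.size, levels)) do
        iterations.times { linear_match(regexps, path) }
      end
    end
  end
end

run_benchmark([[1, 1], [5, 1], [5, 2]])
```

Matching the last route is the worst case for the linear approach; averaging over all routes would give smaller differences.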

rkh · 2026-04-18T12:49:55Z

Things that could be interesting:

Add a generic RouteSet that can either be trie-based or linear. Measure the first x requests to choose between the two.
Could move the static lookup to an array instead of a hash, based on char.ord. This might be messy if URI-escaping isn't enabled, though. We might want to move to a byte-based approach in that case. Would only be worth it if there's an actual performance improvement. Update: this gives no performance advantage.
Investigate the performance gain by using a native trie implementation. I think this dependency should be optional, or Mustermann::Trie should be moved to a separate gem.
Add hot-path caching, either with EqualityMap or by some other means. It could be quite memory-intensive, though.
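As an illustration of the hot-path caching idea and its memory trade-off, here is a minimal sketch of a bounded cache in front of any object that responds to match. CachedMatcher and max_size are hypothetical names; it uses a plain Hash with least-recently-used eviction instead of EqualityMap.

```ruby
# Hypothetical wrapper; not part of this branch.
class CachedMatcher
  def initialize(matcher, max_size: 1024)
    @matcher  = matcher  # anything responding to #match(string)
    @max_size = max_size # upper bound on cached entries, to limit memory use
    @cache    = {}       # Ruby hashes keep insertion order, oldest entry first
  end

  def match(string)
    if @cache.key?(string)
      # Re-insert the entry so it counts as most recently used.
      result = @cache.delete(string)
      @cache[string] = result
      return result
    end

    result = @matcher.match(string)
    @cache[string] = result                 # misses (nil) are cached as well
    @cache.shift if @cache.size > @max_size # evict the least recently used entry
    result
  end

  def size = @cache.size
end

cached = CachedMatcher.new(/\A\/books\/(?<id>\d+)\z/, max_size: 2)
cached.match("/books/1")[:id] # => "1"
```

Caching misses keeps repeated 404s cheap, but it also means arbitrary request paths occupy cache slots, which is why the size bound matters.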

rkh · 2026-04-18T12:57:00Z

An alternative approach regarding the conflict would be to return all matching patterns and values. This might have some performance impact. But this would also be the only way to allow Sinatra to possibly use this feature in the future (as the current approach would not allow the use of pass).
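A rough sketch of why returning all matches is what makes pass possible: the caller walks the candidates in order and moves on whenever a handler passes. AllMatchesRouter, PASSED and the handler blocks are hypothetical, and the lookup here is a plain linear scan; the point is the interface, not the data structure.

```ruby
# Hypothetical sketch; neither Sinatra's nor Mustermann's implementation.
class AllMatchesRouter
  Route  = Struct.new(:regexp, :handler)
  PASSED = Object.new.freeze # sentinel thrown when a handler passes

  def self.pass = throw(:pass, PASSED)

  def initialize
    @routes = []
  end

  def add(pattern, &handler)
    source = pattern.gsub(/:(\w+)/) { "(?<#{Regexp.last_match(1)}>[^/]+)" }
    @routes << Route.new(/\A#{source}\z/, handler)
    self
  end

  # Returns every [handler, params] pair matching the path, in insertion order.
  def matches(path)
    @routes.filter_map do |route|
      data = route.regexp.match(path)
      [route.handler, data.named_captures] if data
    end
  end

  # Calls handlers in order until one of them does not pass.
  def call(path)
    matches(path).each do |handler, params|
      result = catch(:pass) { handler.call(params) }
      return result unless result.equal?(PASSED)
    end
    nil
  end
end

router = AllMatchesRouter.new
router.add("/books/:id") { |params| params["id"] == "new" ? AllMatchesRouter.pass : "show #{params["id"]}" }
router.add("/books/new") { |_params| "new form" }
router.call("/books/new") # => "new form"
```

If only the first match were returned, the second route could never be reached once the first handler passes.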

rkh · 2026-04-19T00:45:26Z

New API is now:

set = Mustermann::Set.new(type: :rails)

# Value can be anything, it is optional
set.add("/books/:id", "books.show")
set.add("/authors", "authors.index")
set.add("/books/:book_id/authors", "authors.index")

# matching
match = set.match("/books/1/authors")
match.value # => "authors.index"
match.params # => { "book_id" => "1" }

# expansion
set.expand("authors.index", {}) # => "/authors"
set.expand("authors.index", { book_id: 1 }) # => "/books/1/authors"
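The two expand calls above resolve the same value to different patterns depending on the parameters given. One way this could work, sketched with hypothetical names (ToyExpander is not Mustermann::Set, and no URI escaping is done), is to pick the pattern whose capture names equal the supplied keys:

```ruby
# Hypothetical sketch of value-based expansion; not the code from this branch.
class ToyExpander
  def initialize
    @patterns = Hash.new { |hash, key| hash[key] = [] }
  end

  def add(pattern, value)
    @patterns[value] << pattern
    self
  end

  # Picks the pattern for `value` whose capture names are exactly the given keys.
  def expand(value, params = {})
    params = params.transform_keys(&:to_s)
    keys   = params.keys.sort
    pattern = @patterns.fetch(value, []).find do |candidate|
      candidate.scan(/:(\w+)/).flatten.sort == keys
    end
    raise ArgumentError, "cannot expand #{value.inspect} with #{keys.inspect}" unless pattern

    pattern.gsub(/:(\w+)/) { params.fetch(Regexp.last_match(1)).to_s }
  end
end

expander = ToyExpander.new
expander.add("/books/:id", "books.show")
expander.add("/authors", "authors.index")
expander.add("/books/:book_id/authors", "authors.index")

expander.expand("authors.index")                 # => "/authors"
expander.expand("authors.index", { book_id: 1 }) # => "/books/1/authors"
```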

rkh · 2026-04-19T00:52:05Z

This PR technically includes breaking changes. I doubt most people will notice, but match no longer returns a MatchData instance. We should bump the version to 4.0, but that will cause issues when testing against a recent Sinatra version.

I might keep going on this branch with some loosely related changes, like moving Mustermann::Mapper to contrib (I couldn't find a single OSS repo using it).

rkh · 2026-04-19T01:56:33Z

I was trying to turn it into a chart…

Anyway, here are the updated numbers:

Routes	Linear	Trie	Cached
5	0.001394	0.002305	0.000259
10	0.001703	0.002324	0.000251
20	0.002717	0.003716	0.000258
30	0.003413	0.003703	0.000236
40	0.004067	0.00388	0.000235
50	0.004643	0.003831	0.000263
100	0.007982	0.003874	0.000261
200	0.014497	0.003878	0.000283
300	0.022102	0.005527	0.000308
400	0.028515	0.005445	0.000317
500	0.035253	0.005737	0.000323
1000	0.069573	0.005918	0.000345
2000	0.144303	0.006331	0.000371
3000	0.213737	0.00623	0.000379
4000	0.283128	0.006519	0.000432
5000	0.366844	0.008239	0.000437
10000	0.787683	0.008391	0.000472
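To read the table more easily, the ratios can be computed directly from the numbers above. This is only arithmetic on the posted figures; the ROWS constant and the output format are made up for the example.

```ruby
# Routes, Linear, Trie, Cached (values copied from the table above).
ROWS = [
  [5,     0.001394, 0.002305, 0.000259],
  [10,    0.001703, 0.002324, 0.000251],
  [20,    0.002717, 0.003716, 0.000258],
  [30,    0.003413, 0.003703, 0.000236],
  [40,    0.004067, 0.00388,  0.000235],
  [50,    0.004643, 0.003831, 0.000263],
  [100,   0.007982, 0.003874, 0.000261],
  [200,   0.014497, 0.003878, 0.000283],
  [300,   0.022102, 0.005527, 0.000308],
  [400,   0.028515, 0.005445, 0.000317],
  [500,   0.035253, 0.005737, 0.000323],
  [1000,  0.069573, 0.005918, 0.000345],
  [2000,  0.144303, 0.006331, 0.000371],
  [3000,  0.213737, 0.00623,  0.000379],
  [4000,  0.283128, 0.006519, 0.000432],
  [5000,  0.366844, 0.008239, 0.000437],
  [10000, 0.787683, 0.008391, 0.000472]
].freeze

# First row in which the trie is faster than the linear scan.
crossover = ROWS.find { |_, linear, trie, _| trie < linear }&.first
puts "trie overtakes linear at #{crossover} routes"

ROWS.each do |routes, linear, trie, cached|
  puts format("%6d routes: linear/trie %6.2fx, trie/cached %6.2fx",
              routes, linear / trie, trie / cached)
end
```

In these numbers the trie overtakes the linear scan at 40 routes and is roughly 94 times faster at 10000 routes, while the cached lookup stays well below both across the whole range.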

initial (limited) implementation for trie-based matching

a080785

rkh self-assigned this Apr 18, 2026

rkh added all patterns feature labels Apr 18, 2026

rkh added 2 commits April 19, 2026 00:20

current status

5915989

implement Mustermann::Set, a class for matching multiple patterns at …

c181088

…once. also add Mustermann::Match to replace the mix of Mustermann::SimpleMatch and MatchData

bump minimum Ruby version to 3.3 (oldest not EOL yet)

57a0554

rkh added 6 commits April 19, 2026 02:57

have Mustermann::Mapper use Mustermann::Set under the hood

dcc06e1

move Mustermann::Mapper to mustermann-contrib

0e68a9b

remove extra code and documentation for Sinatra 1.x

c61ca6d

bump version to 4.0

5fb4f04

remove broken require

9df8378

add support for patterns with :except option to Mustermann::Set

2ecce1f

rkh changed the title ~~Trie-based string matching~~ Proposal for 4.0: Trie-based string matching and more Apr 19, 2026

rkh added 5 commits April 19, 2026 03:27

remove mentions of old return values (Mustermann::SimpleMatch / Match…

2394c2b

…Data)

move Mustermann::PatternCache (used by Mustermann::StringScanner) to …

8caae2b

…mustermann-contrib

move to_pattern to contrib

dbfe705

remove mustermann/extension

8490b08

support newer rails versions

4c2d560

rkh wants to merge 15 commits into main from trie
