Proposal for 4.0: Trie-based string matching and more#154

Draft
rkh wants to merge 15 commits into main from trie

rkh commented Apr 18, 2026

This is a proof of concept at the moment.

API

Current API:

trie = Mustermann::Trie.new
pattern = Mustermann.new("/a/:a/b/:b")

trie.add(pattern, :my_value)
trie.match("/a/1/b/2").value # => :my_value

User-facing API could, of course, use some love.

Implementation status

General implementation status

  • Basic trie implementation
  • Full compatibility with all AST-based patterns
  • Add support for except
  • Add benchmark
  • Tests
  • Documentation
  • Confirm whether the Hanami-Team is interested in this feature (some related discussion in hanami/hanami-router#284)

Supported AST nodes:

  • capture
  • char
  • expression
  • group
  • union
  • optional
  • or
  • root
  • separator
  • splat
  • named splat
  • variable
  • with_lookahead

Implementing additional nodes should not have a performance impact on patterns that do not use these nodes (i.e., I don't expect the numbers from the benchmark to get worse).

Implemented features:

  • Trie#match
  • Trie#peek_match
  • Trie#add with nicer API
  • Trie::Match#string
  • Trie::Match#pattern
  • Trie::Match#string
  • Trie::Match#values
  • Trie::Match#params
  • Trie::Match#captures
  • Trie::Match#names
  • Trie::Match#[]

Open questions:

  • Are there ways to improve compilation performance?
  • Should Trie mimic other Pattern methods? Should it be a pattern?
  • Should Trie::Match have MatchData-compatibility beyond what SimpleMatch offers? (Somewhat unrelated, should SimpleMatch have better compatibility with MatchData?)
  • How should a pattern conflict be handled? For example, adding /prefix(/:capture)/suffix as well as /prefix/suffix currently raises a TrieError. It might make sense to just ignore the second pattern. As implemented, it is possible to ignore just one possible path through the pattern.
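
To make the "ignore the conflicting path" idea concrete, here is a minimal sketch. It does not use Mustermann at all: `paths_for` and `add_paths` are hypothetical helpers, the "trie" is just a hash keyed by the expanded path, and only a single non-nested optional group is handled.

```ruby
# Hypothetical helper: expands one optional group "(...)" into the concrete
# paths a pattern can take. Handles at most one non-nested group.
def paths_for(pattern)
  m = pattern.match(/\(([^()]*)\)/)
  return [pattern] unless m

  [m.pre_match + m.post_match, m.pre_match + m[1] + m.post_match]
end

# Registers every path that is still free and returns the paths that were
# skipped because an earlier pattern already owns them.
def add_paths(registered, pattern, value)
  skipped, fresh = paths_for(pattern).partition { |path| registered.key?(path) }
  fresh.each { |path| registered[path] = value }
  skipped
end

registered = {}
add_paths(registered, "/prefix(/:capture)/suffix", :first) # => []
add_paths(registered, "/prefix/suffix", :second)           # => ["/prefix/suffix"]
registered["/prefix/suffix"]                               # => :first
```

Adding the two patterns in the opposite order would skip only the `/prefix/suffix` path of the optional pattern while still registering `/prefix/:capture/suffix`, which is the "ignore just one possible path" case.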

Performance

Initial performance:

===================== Compilation: Array<Pattern> =====================
      1 routes    1 level   0.003062   0.000856   0.003918 (  0.003918)
      5 routes    1 level   0.000293   0.000002   0.000295 (  0.000296)
     10 routes    1 level   0.000779   0.000008   0.000787 (  0.000787)
      1 routes    2 levels  0.000113   0.000006   0.000119 (  0.000119)
     25 routes    2 levels  0.003041   0.000127   0.003168 (  0.003169)
    100 routes    2 levels  0.009108   0.000178   0.009286 (  0.009288)
      1 routes    3 levels  0.000153   0.000002   0.000155 (  0.000154)
    125 routes    3 levels  0.018385   0.000227   0.018612 (  0.018616)
   1000 routes    3 levels  0.149649   0.001208   0.150857 (  0.150868)
      1 routes    4 levels  0.000186   0.000001   0.000187 (  0.000188)
    625 routes    4 levels  0.117547   0.000712   0.118259 (  0.118288)
  10000 routes    4 levels  1.948137   0.022295   1.970432 (  1.970570)

========================== Compilation: Trie ==========================
      1 routes    1 level   0.000134   0.000004   0.000138 (  0.000139)
      5 routes    1 level   0.000459   0.000006   0.000465 (  0.000465)
     10 routes    1 level   0.000650   0.000011   0.000661 (  0.000662)
      1 routes    2 levels  0.000131   0.000002   0.000133 (  0.000134)
     25 routes    2 levels  0.003879   0.000060   0.003939 (  0.003939)
    100 routes    2 levels  0.024879   0.000469   0.025348 (  0.025353)
      1 routes    3 levels  0.000186   0.000005   0.000191 (  0.000191)
    125 routes    3 levels  0.028082   0.001140   0.029222 (  0.029226)
   1000 routes    3 levels  0.248822   0.004997   0.253819 (  0.253838)
      1 routes    4 levels  0.000215   0.000000   0.000215 (  0.000216)
    625 routes    4 levels  0.208893   0.003252   0.212145 (  0.212162)
  10000 routes    4 levels  3.429190   0.045075   3.474265 (  3.474669)

====================== Matching: Array<Pattern> =======================
Rehearsal --------------------------------------------------------------
      1 routes    1 level    0.000216   0.000001   0.000217 (  0.000217)
      5 routes    1 level    0.000498   0.000000   0.000498 (  0.000497)
     10 routes    1 level    0.000792   0.000000   0.000792 (  0.000793)
      1 routes    2 levels   0.000234   0.000001   0.000235 (  0.000235)
     25 routes    2 levels   0.001815   0.000009   0.001824 (  0.001824)
    100 routes    2 levels   0.006397   0.000007   0.006404 (  0.006403)
      1 routes    3 levels   0.000254   0.000010   0.000264 (  0.000263)
    125 routes    3 levels   0.008741   0.000013   0.008754 (  0.008753)
   1000 routes    3 levels   0.074886   0.000221   0.075107 (  0.075118)
      1 routes    4 levels   0.000274   0.000017   0.000291 (  0.000291)
    625 routes    4 levels   0.043660   0.000110   0.043770 (  0.043786)
  10000 routes    4 levels   0.764417   0.000983   0.765400 (  0.765444)
----------------------------------------------------- total: 0.903556sec

                                 user     system      total        real
      1 routes    1 level    0.000225   0.000001   0.000226 (  0.000223)
      5 routes    1 level    0.000485   0.000000   0.000485 (  0.000483)
     10 routes    1 level    0.000772   0.000000   0.000772 (  0.000770)
      1 routes    2 levels   0.000265   0.000000   0.000265 (  0.000264)
     25 routes    2 levels   0.001766   0.000000   0.001766 (  0.001764)
    100 routes    2 levels   0.006530   0.000000   0.006530 (  0.006529)
      1 routes    3 levels   0.000242   0.000000   0.000242 (  0.000241)
    125 routes    3 levels   0.008577   0.000003   0.008580 (  0.008576)
   1000 routes    3 levels   0.065358   0.000025   0.065383 (  0.065382)
      1 routes    4 levels   0.000300   0.000000   0.000300 (  0.000297)
    625 routes    4 levels   0.043457   0.000008   0.043465 (  0.043464)
  10000 routes    4 levels   0.761751   0.000703   0.762454 (  0.762771)

=========================== Matching: Trie ============================
Rehearsal --------------------------------------------------------------
      1 routes    1 level    0.001232   0.000006   0.001238 (  0.001238)
      5 routes    1 level    0.001190   0.000003   0.001193 (  0.001193)
     10 routes    1 level    0.001231   0.000003   0.001234 (  0.001235)
      1 routes    2 levels   0.002223   0.000005   0.002228 (  0.002229)
     25 routes    2 levels   0.002280   0.000005   0.002285 (  0.002284)
    100 routes    2 levels   0.002288   0.000009   0.002297 (  0.002303)
      1 routes    3 levels   0.002759   0.000017   0.002776 (  0.002776)
    125 routes    3 levels   0.003742   0.000024   0.003766 (  0.003766)
   1000 routes    3 levels   0.003273   0.000026   0.003299 (  0.003300)
      1 routes    4 levels   0.003413   0.000014   0.003427 (  0.003428)
    625 routes    4 levels   0.003983   0.000027   0.004010 (  0.004010)
  10000 routes    4 levels   0.004955   0.000074   0.005029 (  0.005030)
----------------------------------------------------- total: 0.032782sec

                                 user     system      total        real
      1 routes    1 level    0.000986   0.000001   0.000987 (  0.000984)
      5 routes    1 level    0.000972   0.000001   0.000973 (  0.000970)
     10 routes    1 level    0.001002   0.000000   0.001002 (  0.001001)
      1 routes    2 levels   0.001754   0.000001   0.001755 (  0.001753)
     25 routes    2 levels   0.001792   0.000001   0.001793 (  0.001789)
    100 routes    2 levels   0.001902   0.000002   0.001904 (  0.001902)
      1 routes    3 levels   0.002501   0.000002   0.002503 (  0.002498)
    125 routes    3 levels   0.002803   0.000001   0.002804 (  0.002802)
   1000 routes    3 levels   0.003100   0.000001   0.003101 (  0.003101)
      1 routes    4 levels   0.003287   0.000002   0.003289 (  0.003287)
    625 routes    4 levels   0.003847   0.000001   0.003848 (  0.003848)
  10000 routes    4 levels   0.004927   0.000001   0.004928 (  0.004928)

Note that these benchmarks use the same patterns as r10k.
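
The matching numbers suggest that trie lookups grow with the number of levels rather than the number of routes. The following toy sketch shows why that would be the case. `ToyTrie` is a hypothetical, segment-based stand-in and far simpler than Mustermann::Trie: no optionals, no splats, no backtracking, and one capture name per position.

```ruby
# Toy segment-based trie (hypothetical, not the PR's implementation).
class ToyTrie
  Node = Struct.new(:static, :capture, :capture_name, :value)

  def initialize
    @root = new_node
  end

  def add(pattern, value)
    node = @root
    pattern.split("/").reject(&:empty?).each do |segment|
      if segment.start_with?(":")
        node.capture ||= new_node
        node.capture_name = segment.delete_prefix(":")
        node = node.capture
      else
        node = (node.static[segment] ||= new_node)
      end
    end
    node.value = value
  end

  # One hash lookup per path segment, regardless of how many routes exist.
  def match(path)
    node = @root
    params = {}
    path.split("/").reject(&:empty?).each do |segment|
      if (child = node.static[segment])
        node = child
      elsif node.capture
        params[node.capture_name] = segment
        node = node.capture
      else
        return nil
      end
    end
    node.value && [node.value, params]
  end

  private

  def new_node
    Node.new({}, nil, nil, nil)
  end
end

trie = ToyTrie.new
trie.add("/a/:a/b/:b", :my_value)
trie.match("/a/1/b/2") # => [:my_value, {"a" => "1", "b" => "2"}]
```

A linear Array<Pattern> scan, by contrast, has to try patterns one by one until one matches, which is consistent with its matching time growing with the route count in the tables above.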

rkh commented Apr 18, 2026

Things that could be interesting:

  • Add a generic RouteSet that can either be trie-based or linear. Measure the first x requests to choose between the two.
  • Could move the static lookup to an array instead of a hash, based on char.ord. This might be messy if URI-escaping isn't enabled, though. We might want to move to a byte-based approach in that case. Would only be worth it if there's an actual performance improvement. (Update: this gives no performance advantage.)
  • Investigate the performance gain by using a native trie implementation. I think this dependency should be optional, or Mustermann::Trie should be moved to a separate gem.
  • Add hot-path caching, either with EqualityMap or by some other means. It could be quite memory-intensive, though.
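
As a rough illustration of the hot-path caching idea and its memory trade-off, here is a sketch of a size-bounded cache in front of any object that responds to `#match`. `CachedMatcher` and `max_size` are hypothetical names, not part of Mustermann, and this is a plain LRU hash rather than whatever the PR ends up using.

```ruby
# Hypothetical wrapper: memoizes match results and evicts the least recently
# used entry once max_size entries are stored, which bounds memory use.
class CachedMatcher
  def initialize(matcher, max_size: 1024)
    @matcher  = matcher
    @max_size = max_size
    @cache    = {} # Ruby hashes preserve insertion order
  end

  def match(string)
    if @cache.key?(string)
      result = @cache.delete(string) # re-inserted below as most recent
    else
      result = @matcher.match(string)
      @cache.shift if @cache.size >= @max_size # drop the oldest entry
    end
    @cache[string] = result
  end
end

# A Regexp stands in for the wrapped matcher here, since it responds to #match.
cached = CachedMatcher.new(%r{\A/books/(?<id>\d+)\z}, max_size: 2)
cached.match("/books/1")[:id] # => "1"
```

Misses (`nil` results) are cached as well, so repeated requests for unknown paths also skip the underlying matcher. The cap is what keeps memory in check when many distinct strings are matched.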

rkh commented Apr 18, 2026

An alternative approach regarding the conflict would be to return all matching patterns and values. This might have some performance impact. But this would also be the only way to allow Sinatra to possibly use this feature in the future (as the current approach would not allow the use of pass).
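
A sketch of what "return all matching patterns and values" could look like from the caller's side, using a plain linear scan. `Route`, `ROUTES`, and `each_match` are illustrative names, and the regexes are hand-written stand-ins for compiled patterns, not Mustermann output.

```ruby
# Hypothetical linear sketch of enumerating every match for a path.
Route = Struct.new(:regexp, :value)

ROUTES = [
  Route.new(%r{\A/prefix(?:/(?<capture>[^/]+))?/suffix\z}, :optional),
  Route.new(%r{\A/prefix/suffix\z}, :static)
].freeze

# Returns an enumerator of [value, params] pairs for every matching route,
# in insertion order. Matches are computed on demand.
def each_match(path)
  Enumerator.new do |yielder|
    ROUTES.each do |route|
      match = route.regexp.match(path)
      yielder << [route.value, match.named_captures] if match
    end
  end
end

# A Sinatra-style `pass` would amount to moving on to the next match.
each_match("/prefix/suffix").map(&:first) # => [:optional, :static]
```

Because the enumerator is evaluated on demand, a caller that takes only the first match does not pay for the rest. Whether a trie can offer the same property cheaply is the open performance question.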

rkh added 2 commits April 19, 2026 00:20
…once. also add Mustermann::Match to replace the mix of Mustermann::SimpleMatch and MatchData

rkh commented Apr 19, 2026

New API is now:

set = Mustermann::Set.new(type: :rails)

# Value can be anything, it is optional
set.add("/books/:id", "books.show")
set.add("/authors", "authors.index")
set.add("/books/:book_id/authors", "authors.index")

# matching
match = set.match("/books/1/authors")
match.value # => "authors.index"
match.params # => { "book_id" => "1" }

# expansion
set.expand("authors.index", {}) # => "/authors"
set.expand("authors.index", { book_id: 1 }) # => "/books/1/authors"

rkh commented Apr 19, 2026

This PR technically includes breaking changes. I doubt most people will notice, but match no longer returns a MatchData instance. We should bump the version to 4.0, but that will cause issues when testing against a recent Sinatra version.

I might keep working on this branch with some loosely related changes, like moving Mustermann::Mapper to contrib (I couldn't find a single OSS repo using it).

rkh changed the title from "Trie-based string matching" to "Proposal for 4.0: Trie-based string matching and more" Apr 19, 2026

rkh commented Apr 19, 2026

image

I was trying to turn it into a chart…

Anyway, here are the updated numbers:

Routes    Linear      Trie        Cached
5         0.001394    0.002305    0.000259
10        0.001703    0.002324    0.000251
20        0.002717    0.003716    0.000258
30        0.003413    0.003703    0.000236
40        0.004067    0.00388     0.000235
50        0.004643    0.003831    0.000263
100       0.007982    0.003874    0.000261
200       0.014497    0.003878    0.000283
300       0.022102    0.005527    0.000308
400       0.028515    0.005445    0.000317
500       0.035253    0.005737    0.000323
1000      0.069573    0.005918    0.000345
2000      0.144303    0.006331    0.000371
3000      0.213737    0.00623     0.000379
4000      0.283128    0.006519    0.000432
5000      0.366844    0.008239    0.000437
10000     0.787683    0.008391    0.000472
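
For a quick read of these numbers, the following sketch computes where the trie overtakes the linear scan and how large the gap is at 10000 routes. The rows are copied from the table above. The constant name `ROWS` is made up for this example, and the columns are assumed to be timings in the same unit.

```ruby
# Timings copied from the table above, as [routes, linear, trie, cached].
ROWS = [
  [5, 0.001394, 0.002305, 0.000259],
  [10, 0.001703, 0.002324, 0.000251],
  [20, 0.002717, 0.003716, 0.000258],
  [30, 0.003413, 0.003703, 0.000236],
  [40, 0.004067, 0.00388, 0.000235],
  [50, 0.004643, 0.003831, 0.000263],
  [100, 0.007982, 0.003874, 0.000261],
  [200, 0.014497, 0.003878, 0.000283],
  [300, 0.022102, 0.005527, 0.000308],
  [400, 0.028515, 0.005445, 0.000317],
  [500, 0.035253, 0.005737, 0.000323],
  [1000, 0.069573, 0.005918, 0.000345],
  [2000, 0.144303, 0.006331, 0.000371],
  [3000, 0.213737, 0.00623, 0.000379],
  [4000, 0.283128, 0.006519, 0.000432],
  [5000, 0.366844, 0.008239, 0.000437],
  [10000, 0.787683, 0.008391, 0.000472]
].freeze

# First measured route count at which the trie beats the linear scan.
crossover = ROWS.find { |_, linear, trie, _| trie < linear }
crossover.first # => 40

_routes, linear, trie, cached = ROWS.last
(linear / trie).round(1) # => 93.9
(linear / cached).round  # => 1669
```

In this run the trie starts to win somewhere between 30 and 40 routes, and at 10000 routes it is roughly 94 times faster than the linear scan, with the cached variant well beyond that.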
