# [Routing] Match 77.7x faster by compiling routes in one regexp #26059

Merged
merged 1 commit into from Feb 12, 2018

## Conversation

Projects
None yet
Member

### nicolas-grekas commented Feb 5, 2018 • edited

Q A
Branch? master
Bug fix? no
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets -
Doc PR -

Builds on top of #25961 and http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html to make routing 77.7x faster (measured when matching the last dynamic route of 400 static + 400 dynamic routes.)

More benchs welcomed.

• check static routes first in a switch
• group same-modifiers regexps
• put condition-less routes in a static map, in a switch's "default"
• match host+pathinfo at the same time
• group same-host regexps in sub-patterns
• consider the host to move more static routes in the static switch
• move static single-route "case" to "default" by adding requirements to $routes • group same-static-prefix in sub-patterns • group consecutive same-regex in a single "case" • move dynamic single-route "case" to "default" by adding requirements to$routes
• extract host variables from the main match and remove the current 2nd match
• extend same-prefix grouping to placeholders
• group same-suffix hosts together

Here is my benchmarking code:

<?php

namespace Symfony\Component\Routing;

$routes = new RouteCollection(); for ($i = 0; $i < 400; ++$i) {
$routes->add('r'.$i, new Route('/abc'.$i));$routes->add('f'.$i, new Route('/abc{foo}/'.$i));
}

$dumper = new Matcher\Dumper\PhpMatcherDumper($routes);

eval('?'.'>'.$dumper->dump());$router = new \ProjectUrlMatcher(new RequestContext());

$i = 10000;$s = microtime(1);

while (--$i) {$router->match('/abcdef/399');
}

echo 'Symfony: ', 1000 * (microtime(1) - $s), "\n"; namespace FastRoute;$dispatcher = simpleDispatcher(function(RouteCollector $r) { for ($i = 0; $i < 400; ++$i) {
$r->addRoute('GET', '/abc'.$i, 'r'.$i);$r->addRoute('GET', '/abc{foo}/'.$i, 'f'.$i);
}
});

$i = 10000;$s = microtime(1);

while (--$i) {$dispatcher->dispatch('GET', '/abcdef/399');
}

echo 'FastRoute: ', 1000 * (microtime(1) - $s), "\n"; ### nicolas-grekas added this to the 4.1 milestone Feb 5, 2018 ### carsonbot added Status: Needs Review Routing labels Feb 5, 2018 ### javiereguiluz added the Performance label Feb 5, 2018 ### chalasr reviewed Feb 5, 2018 $code .= sprintf(" case %s:\n", self::export($url)); foreach ($routes as $name => list($hasTrailingSlash, $route)) {$methods = $route->getMethods();$supportsTrailingSlash = $supportsRedirections && (!$methods || in_array('HEAD', $methods) || in_array('GET',$methods));

#### chalasr Feb 5, 2018

Member

Route methods must be uppercase strings, \in_array calls could be made strict

#### nicolas-grekas Feb 5, 2018 • edited

Member

this is borrowed from existing code, nothing to do here: uppercase is already done by the very method you linked, and tweaking in_array() would have zero impact as this is dumping code, not critical at all

### phansys suggested changes Feb 6, 2018

 // dynamic return $this->mergeDefaults(array_replace($matches, array('_route' => 'dynamic')), array()); break;

#### phansys Feb 6, 2018

Contributor

This break statement seems useless.

#### nicolas-grekas Feb 6, 2018 • edited

Member

these are dumped using independent methods, so cannot do it without heavy changes/complexity
since this is run-once code, there would be no practical benefit, so it's not worth it

 } $regexOffset = 0;$regexList = array(

#### javiereguiluz Feb 6, 2018

Member

Not sure if it's useful in this case, but the S modifier makes PHP better analyze the regexps to improve performance: http://php.net/manual/en/reference.pcre.pattern.modifiers.php

S

When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is set, then this extra analysis is performed. At present, studying a pattern is useful only for non-anchored patterns that do not have a single fixed starting character.

#### nicolas-grekas Feb 6, 2018

Member

studying a pattern is useful only for non-anchored patterns

the pattern here is anchored, so this doesn't apply

Member

### javiereguiluz commented Feb 6, 2018

 In the benchmark code, all the route URLs have the same prefix. I wonder if that influences in any way the results (maybe it unlocks some internal optimizations somehow): $routes->add('r'.$i, new Route('/abc'.$i));$routes->add('f'.$i, new Route('/abc{foo}/'.$i)); Maybe for the benchmark we could try the worst case possible (a different prefix for each route) and see if the results vary: for ($i = 0;$i < 400; ++$i) { -$routes->add('r'.$i, new Route('/abc'.$i)); - $routes->add('f'.$i, new Route('/abc{foo}/'.$i)); +$prefix = substr(str_shuffle('abcdefghijklmnopqrstuvwxyz'), -3); + $routes->add('r'.$i, new Route('/'.$prefix.$i)); + $routes->add('f'.$i, new Route('/'.$prefix.'{foo}/'.$i)); } // ... while (--$i) { -$router->match('/abcdef/399'); + $router->match('/'.$prefix.'def/399'); } // ...
Member

Member

Member

fixed thanks

Contributor

### dmaicher commented Feb 6, 2018 • edited

 @nicolas-grekas it seems this is now too optimized for a common prefix? Out of interest I benchmarked it on my biggest Symfony 3.4 monolith app with around 540 routes. boot(); $router =$kernel->getContainer()->get('router'); $routes =$router->getRouteCollection(); $router->getContext()->setHost('...');$router->getContext()->setMethod('GET'); foreach (range(0, 10000) as $i) { /** @var \Symfony\Component\Routing\Route$route */ foreach ($routes as$route) { if (!$route->getCondition() && (!$route->getMethods() || in_array('GET', $route->getMethods()))) { try {$router->match(preg_replace('/\{[^{]+\}/', 'foo', $route->getPath())); } catch (\Symfony\Component\Routing\Exception\ResourceNotFoundException$e) { } } } } This naive script matches 177 out of 541 routes. Benchmarks done with cleared & warmed up cache. With Symfony 3.4.4.: $time php route_bench.php real 0m11.288s user 0m11.256s sys 0m0.024s  Applied your changes for PhpMatcherDumper and PhpGeneratorDumper: $ time php route_bench.php real 0m37.907s user 0m37.856s sys 0m0.028s  So for me this takes roughly 3.5 times as long now. Do you see something wrong with my benchmark?
Member

### nicolas-grekas commented Feb 6, 2018

 @dmaicher can you share the dumped router matcher? (PM on Slack if you want)

Member

### nicolas-grekas commented Feb 6, 2018 • edited

 Quickly running a Blackfire bench on both your cases made me notice that the issue is the call to createRequest() - which is totally unrelated to the PR itself and would remove 95% of the perf delta you have. This made me realize we should move this call next to expressions, since that's the only place where the request is used. Being done with this, my own bench above now makes our router 3x faster than FastRoute, yay! Still, after this change, I measure 27ms vs 33ms in your case, so still slower. Now working on fixing that.

### stof reviewed Feb 6, 2018

 $expression =$this->getExpressionLanguage()->compile($route->getCondition(), array('context', 'request')); if (false !== strpos($expression, '$request')) {$conditions[] = '($request =$request ?? $this->request ?:$this->createRequest($pathinfo))'; #### stof Feb 6, 2018 Member It looks like we don't have any of the fixtures hitting this code path. Is it expected ? #### nicolas-grekas Feb 10, 2018 Member not sure what you mean: this code path is run by the test suite (just verified.) Member ### nicolas-grekas commented Feb 6, 2018 • edited  @dmaicher perf should be back on par. Which means in your case, this is not faster for now, so you have a pretty pathological case, thank you ;-) Basically, it has interweaved routes with and without host constraints, and also has a host-bound route that matches any pathinfo. The mix of both makes the hard case. Status: needs work ### carsonbot added Status: Needs Work and removed Status: Reviewed labels Feb 6, 2018 Contributor ### frankdejonge commented Feb 6, 2018  Perhaps now would be a good time to have some technical documentation around this. For instance we have a general rule we preserve input order (which prevents certain optimisations, but for good reasons). It may be the most trivial of documents, but it does prevent the maintainers to correct PR's which might want to optimise for this as well as give people a more general idea of why things are done the way they are done (which might be a broader topic that could benefit the framework). Overall, this work looks great! While I was pretty proud at the optimisations my contributions brought to the framework, I'm happy to see something even more performant will replace them 🤘 ### Tobion reviewed Feb 10, 2018  } if ('#' === \$requiredHost[0] && \$hostMatches) { \$hostMatches['_route'] = \$ret['_route']; #### Tobion Feb 10, 2018 Member Can't this be removed? It's only when the host has a _route placeholder. This does not seem to be a supported case. You can only get the route name or the content of the placeholder. But not both. So using _route name should be forbidden as placeholder. #### nicolas-grekas Feb 10, 2018 • edited Member using _route name should be forbidden as placeholder It should, but it isn't, and the current code has actual protection so that the route name always has precedence. Looking twice, I think this is required yes. #### stof Feb 13, 2018 Member we should deprecate defining such placeholder though, to avoid WTF Member ### nicolas-grekas commented Feb 10, 2018  Last optim: hosts now are also grouped+reordered together, based on their static suffix. Contributor ### jean-pasqualini commented Feb 11, 2018  @nicolas-grekas what about an application with 200 dynamic routes ? I will probably test your pr on our application to make my own opinion. @nicolas-grekas In the article he talks about the fact that having everything on a regex is effective on a small number of routes but that on a large number of routes it is better to group the regex in packs of 10. Is this an approach that you have tried ? I do not find that in the implementation that's why I ask Member ### nicolas-grekas commented Feb 11, 2018 • edited  @jean-pasqualini My canonical test as 400 dynamic routes (see main description), and I have another one with 540 static + non-static routes from a real app. Which means 200 dynamic routes will be fine :) About chunking, this uses PCRE named patterns, which are faster and don't need it. Contributor ### dmaicher commented Feb 12, 2018 • edited  @nicolas-grekas awesome job 🎉 For my huge monolith app I extended my simple test script a bit so it matches routes for all available hosts: https://gist.github.com/dmaicher/61f2e388ac0e65cc945dbb6e678e2ef7 Here the new numbers: Stock symfony 3.4.4: 0m22.874s Your optimizations: 0m14.812s The total number of matched routes is identical 👍 Edit: but I guess you used my route collection already for some benchmarks yourself 😄 ### ScullWM reviewed Feb 12, 2018 $code .= <<indent($default, 4)} ); #### ScullWM Feb 12, 2018 • edited Contributor This array can contain severals static routes. With a big amount of route, this array and the $regexList and $matchedPathinfo array could be significants. Isn't it possible to define them as static array and benefit of OPCode cache ? (But won't work on 3.4 with php55 requirement...) Like you've done with the php array adapter ;) Anyway great job @nicolas-grekas ! #### nicolas-grekas Feb 12, 2018 Member A static array can be inline, opcache will work with them. So nothing to change here to benefit from shared memory. Member ### nicolas-grekas commented Feb 12, 2018  PR ready for a last round of review, all green on my side. ### Tobion reviewed Feb 12, 2018  throw$e; } if (!in_array($this->context->getMethod(), array('HEAD', 'GET'))) { throw new MethodNotAllowedException(array('GET')); #### Tobion Feb 12, 2018 Member this should be reverted #### nicolas-grekas Feb 12, 2018 Member reverted Member ### Tobion commented Feb 12, 2018 • edited  @nicolas-grekas I think the trailing slash logic is way to complicated now and makes it really hard to maintain. I think it would be much easier to do something similar as the RedirectableUrlMatcher. So it first matches with the original uri path. And only if it does not find anything then trigger a new internal match with the changed path using the same generated match method. Then we don't need to keep track of all the supportsRedirections and hashTrailingSlash logic and changing the regex based on this. And since it happens only when it does not match the original path, it has no real-world performance inpact as well. But I think it makes the code much more readable. This would also allow to do the redirect the other way round very easily which has been asked for several times: GET /foo/ to redirect to GET /foo when /foo/ does not exist. This is currently really hard to implement due to all the string manipulation magic and implementation details.  [Routing] Match 77.7x faster by compiling routes in one regexp   f933f70  Member ### nicolas-grekas commented Feb 12, 2018  @Tobion why not. For another PR if you don't mind? ### Tobion approved these changes Feb 12, 2018 ### carsonbot added Status: Reviewed and removed Status: Needs Review labels Feb 12, 2018 Member ### Tobion commented Feb 12, 2018  Thank you Nicolas for this inspiring work ### Tobion merged commit f933f70 into symfony:master Feb 12, 2018 2 of 3 checks passed #### 2 of 3 checks passed fabbot.io Some changes should be done to comply with our standards. Details continuous-integration/appveyor/pr AppVeyor build succeeded Details continuous-integration/travis-ci/pr The Travis CI build passed Details ### Tobion added a commit that referenced this pull request Feb 12, 2018  feature #26059 [Routing] Match 77.7x faster by compiling routes in on…  …e regexp (nicolas-grekas) This PR was merged into the 4.1-dev branch. Discussion ---------- [Routing] Match 77.7x faster by compiling routes in one regexp | Q | A | ------------- | --- | Branch? | master | Bug fix? | no | New feature? | no | BC breaks? | no | Deprecations? | no | Tests pass? | yes | Fixed tickets | - | License | MIT | Doc PR | - Builds on top of #25961 and http://nikic.github.io/2014/02/18/Fast-request-routing-using-regular-expressions.html to make routing 77.7x faster (measured when matching the last dynamic route of 400 static + 400 dynamic routes.) More benchs welcomed. - [x] check static routes first in a switch - [x] group same-modifiers regexps - [x] put condition-less routes in a static map, in a switch's "default" - [x] match host+pathinfo at the same time - [x] group same-host regexps in sub-patterns - [x] consider the host to move more static routes in the static switch - [x] move static single-route "case" to "default" by adding requirements to$routes
- [x] group same-static-prefix in sub-patterns
- [x] group consecutive same-regex in a single "case"
- [x] move dynamic single-route "case" to "default" by adding requirements to $routes - [x] extract host variables from the main match and remove the current 2nd match - [x] extend same-prefix grouping to placeholders - [x] group same-suffix hosts together Here is my benchmarking code: php <?php namespace Symfony\Component\Routing; require 'vendor/autoload.php';$routes = new RouteCollection();

for ($i = 0;$i < 400; ++$i) {$routes->add('r'.$i, new Route('/abc'.$i));
$routes->add('f'.$i, new Route('/abc{foo}/'.$i)); }$dumper = new Matcher\Dumper\PhpMatcherDumper($routes); eval('?'.'>'.$dumper->dump());

$router = new \ProjectUrlMatcher(new RequestContext());$i = 10000;
$s = microtime(1); while (--$i) {
$router->match('/abcdef/399'); } echo 'Symfony: ', 1000 * (microtime(1) -$s), "\n";

namespace FastRoute;

$dispatcher = simpleDispatcher(function(RouteCollector$r) {
for ($i = 0;$i < 400; ++$i) {$r->addRoute('GET', '/abc'.$i, 'r'.$i);
$r->addRoute('GET', '/abc{foo}/'.$i, 'f'.$i); } });$i = 10000;
$s = microtime(1); while (--$i) {
$dispatcher->dispatch('GET', '/abcdef/399'); } echo 'FastRoute: ', 1000 * (microtime(1) -$s), "\n";


Commits
-------

f933f70 [Routing] Match 77.7x faster by compiling routes in one regexp
 4ef0b3e 

Contributor

### mvrhov commented Feb 13, 2018 • edited

 This is probably going to be a problem with pcre jit enabled. The stack limit there is 32k https://bugs.php.net/bug.php?id=70110 and http://marc.info/?l=php-internals&m=143764967808306&w=2
Member

### nicolas-grekas commented Feb 13, 2018 • edited

 @mvrhov not sure this can happen on real world app. Routes are pretty simple usually, and pretty short. Matching them is straightforward - ie it doesn't involve much backtracking. But of course I may be wrong, please provide a reproducer if you have a case that doesn't work. There is another limit I just hit when trying to run https://github.com/tyler-sommer/php-router-benchmark: Compilation failed: regular expression is too large at offset 111543 We may need chunking to workaround the issue. This should be easy to detect at compile time. edit: chunking done in #26169
Contributor

### mvrhov commented Feb 13, 2018 • edited

 The 2nd limit might come from pcre.backtrack_limit. I don't have reproducer I'm just saying that the some huge (in number of items) and long (in number of characters per route url) route collection benchmark should be run with pcre.jit set to true
Member

### stof commented Feb 13, 2018

 @mvrhov Symfony generates possessive quantifiers for placeholders by default. And when defining custom requirements, it is generally possible to rely on possessive quantifiers too (not being able to use them would require having some crazy URL pattern in most cases). And so, backtracking should not happen too often.

Merged

### nicolas-grekas added a commit that referenced this pull request Feb 14, 2018

 bug #26169 [Routing] Handle very large set of dynamic routes (nicolas… 
…-grekas)

This PR was merged into the 4.1-dev branch.

Discussion
----------

[Routing] Handle very large set of dynamic routes

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| Doc PR        | -

Follow up of #26059; allows handling very long lists of dynamic routes.
Allows running https://github.com/tyler-sommer/php-router-benchmark seamlessly.

BTW, here are the result of running this benchmark:

R3 extension is not loaded. Skipping initialization for "Worst-case matching" test using R3.
R3 extension is not loaded. Skipping initialization for "First route matching" test using R3.
## Worst-case matching
This benchmark matches the last route and unknown route. It generates a randomly prefixed and suffixed route in an attempt to thwart any optimization. 1,000 routes each with 9 arguments.

This benchmark consists of 10 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name | Results | Time | + Interval | Change
--------- | ------- | ---- | ---------- | ------
Symfony4 - unknown route (1000 routes) | 995 | 0.0000085699 | +0.0000000000 | baseline
Symfony4 - last route (1000 routes) | 999 | 0.0000086754 | +0.0000001055 | 1% slower
FastRoute - unknown route (1000 routes) | 980 | 0.0000305154 | +0.0000219455 | 256% slower
FastRoute - last route (1000 routes) | 999 | 0.0000529922 | +0.0000444223 | 518% slower
Pux PHP - unknown route (1000 routes) | 972 | 0.0003162730 | +0.0003077032 | 3591% slower
Pux PHP - last route (1000 routes) | 999 | 0.0004376847 | +0.0004291148 | 5007% slower
Aura v2 - unknown route (1000 routes) | 976 | 0.0138277517 | +0.0138191818 | 161253% slower
Aura v2 - last route (1000 routes) | 989 | 0.0138914190 | +0.0138828491 | 161996% slower

## First route matching
This benchmark tests how quickly each router can match the first route. 1,000 routes each with 9 arguments.

This benchmark consists of 5 tests. Each test is executed 1,000 times, the results pruned, and then averaged. Values that fall outside of 3 standard deviations of the mean are discarded.

Test Name | Results | Time | + Interval | Change
--------- | ------- | ---- | ---------- | ------
FastRoute - first route | 999 | 0.0000016928 | +0.0000000000 | baseline
Pux PHP - first route | 999 | 0.0000017381 | +0.0000000453 | 3% slower
Symfony4 - first route | 997 | 0.0000029818 | +0.0000012890 | 76% slower
Aura v2 - first route | 977 | 0.0000376436 | +0.0000359508 | 2124% slower

Commits
-------

ee8b201 [Routing] Handle very large set of dynamic routes
 d7658d2 
Member

### nicolas-grekas commented Feb 16, 2018

 I just published “Making Symfony’s Router 77.7x faster — 1/2” https://twitter.com/nicolasgrekas/status/964513325815074817

Closed

Merged

Merged

### Tobion added a commit that referenced this pull request Feb 24, 2018

 feature #26283 [Routing] Redirect from trailing slash to no-slash whe… 
…n possible (nicolas-grekas)

This PR was merged into the 4.1-dev branch.

Discussion
----------

[Routing] Redirect from trailing slash to no-slash when possible

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #26207
| Doc PR        | -

Implemented as suggest by @Tobion in #26059 (comment)

When a route for /foo exists but the request is for /foo/, we now redirect.
(this complements the flipped side redirection, which already exists.)

Commits
-------

69a4e94 [Routing] Redirect from trailing slash to no-slash when possible
 be1a3b4 

Merged