[Routing] Optimised dumped matcher #21926

Closed
wants to merge 21 commits into
from

Conversation

Projects
None yet
@frankdejonge
Contributor

frankdejonge commented Mar 8, 2017

Q A
Branch? master
Bug fix? no
New feature? no
BC breaks? no
Deprecations? no
Tests pass? yes
Fixed tickets -
License MIT
Doc PR -

TL;DR: I've optimised the PhpMatcherDumper output for a 60x 4.4x performance improvement on a collection of ~800 routes by inducing cyclomatic complexity.

[EDIT] The 60x performance boost was only visible when profiling with blackfire, which is quite possibly a result of the cost of profiling playing a part. After doing some more profiling the realistic benefit of the optimisation is more likely to be in the ranges is 1.3x to 4.4x.

After the previous optimisation I began looking at how the PrefixCollection was adding its performance boost. I spotted another way to do this, which has the same theory behind it (excluding groups based on prefixes). The current implementation only groups when one prefix resides in the other. In this new implementation I've created a way to detect common prefixes, which allows for much more efficient grouping. Every time a route is added to the group it'll either merge into an existing group, merge into a new group with a route that has a common prefix, or merge into a new group with an existing group that has a common prefix.

However, when a parameter is present grouping must only be done AFTER that route, this case is accounted for. In all other cases, where there's no collision routes can be grouped freely because if a group was matched other groups wouldn't have matched.

After all the groups are created the groups are optimised. Groups with fewer than 3 children are inlined into the parent group. This is because a group with 2 children would potentially result in 3 prefix checks while if they are inlines it's 2 checks.

Like with the previous optimisation I've profiled this using blackfire. But the match function didn't show up anymore. I've added usleep calls in the dumped matcher during profiling, which made it show up again. I've verified with @simensen that this is because the wall time of the function was too small for it to be of any interest. When it DID get detected, because of more tasks running, it would show up with around 250 nanoseconds. In comparison, the previous speed improvement brought the wall time down from 7ms to ~2.5ms on a set of ~800 routes.

Because of the altered grouping behaviour I've not modified the PrefixCollection but I've created a new StaticPrefixCollection and updated the PhpMatcherDumper to use that instead.

@assertchris

This comment has been minimized.

Show comment
Hide comment
@assertchris

assertchris Mar 8, 2017

At the risk of sounding like a stick in the mud, adding ~470 LOC seems like a big change sans benchmarks to show the performance improvement and to show that performance doesn't suffer with few routes.

At the risk of sounding like a stick in the mud, adding ~470 LOC seems like a big change sans benchmarks to show the performance improvement and to show that performance doesn't suffer with few routes.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@assertchris It's actually not adding that. Those are extra compiled cases, so there are more test fixtures.

Contributor

frankdejonge commented Mar 8, 2017

@assertchris It's actually not adding that. Those are extra compiled cases, so there are more test fixtures.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@assertchris in addition to that, the fast majority of that code is used at compile time, not at runtime. There are also optimisation which benefit small groups.

Contributor

frankdejonge commented Mar 8, 2017

@assertchris in addition to that, the fast majority of that code is used at compile time, not at runtime. There are also optimisation which benefit small groups.

@stof

This comment has been minimized.

Show comment
Hide comment
@stof

stof Mar 8, 2017

Member

Still, @assertchris is right about one thing: we want to see the numbers when you claim it improves performance.

Member

stof commented Mar 8, 2017

Still, @assertchris is right about one thing: we want to see the numbers when you claim it improves performance.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@stof it has the same behaviour in smaller cases, so I don't know what you want to see. Also, in those cases neither before or after would show up in blackfire, it's literally double digit nanosecond wall time we're talking about.

Contributor

frankdejonge commented Mar 8, 2017

@stof it has the same behaviour in smaller cases, so I don't know what you want to see. Also, in those cases neither before or after would show up in blackfire, it's literally double digit nanosecond wall time we're talking about.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@arnaud-lb since you did the initial work on the prefix optimisation I thought I'd ping you.

Contributor

frankdejonge commented Mar 8, 2017

@arnaud-lb since you did the initial work on the prefix optimisation I thought I'd ping you.

+ * @param $prefix
+ * @param $route
+ *
+ * @return bool|StaticPrefixCollection

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

please use null, not false for the not found case. This would create an API with a nullable return type rather than with a mixed type

@stof

stof Mar 8, 2017

Member

please use null, not false for the not found case. This would create an API with a nullable return type rather than with a mixed type

+ *
+ * @param StaticPrefixCollection|array $item
+ * @param $prefix
+ * @param $route

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

invalid phpdoc: missing types

@stof

stof Mar 8, 2017

Member

invalid phpdoc: missing types

+ // Lower index to pass through the same index again after optimizing.
+ // The first item of the replacements might be a group needing optimization.
+ --$index;
+ continue;

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

useless continue, as it is the end of the iteration anyway

@stof

stof Mar 8, 2017

Member

useless continue, as it is the end of the iteration anyway

@@ -223,14 +223,39 @@ private static function compilePattern(Route $route, $pattern, $isHost)
}
return array(
- 'staticPrefix' => 'text' === $tokens[0][0] ? $tokens[0][1] : '',
+ 'staticPrefix' => static::determineStaticPrefix($route, $tokens),

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

access to private APIs MUST use self, not static. Otherwise it breaks when using inheritance as it will use ChildClass::determineStaticPrefix, which is forbidden by the visibility.

@stof

stof Mar 8, 2017

Member

access to private APIs MUST use self, not static. Otherwise it breaks when using inheritance as it will use ChildClass::determineStaticPrefix, which is forbidden by the visibility.

+ if ('text' !== $tokens[0][0]) {
+ return $route->hasDefault($tokens[0][3])
+ ? ''
+ : $tokens[0][1];

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

should stay on the same line

@stof

stof Mar 8, 2017

Member

should stay on the same line

+ // bar2
+ if (0 === strpos($pathinfo, '/a/b\'b/') && preg_match('#^/a/b\'b/(?P<bar1>[^/]++)$#s', $pathinfo, $matches)) {

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

this inlining looks suspicious to me. It will force checking the prefix twice before doing the regex checks

@stof

stof Mar 8, 2017

Member

this inlining looks suspicious to me. It will force checking the prefix twice before doing the regex checks

This comment has been minimized.

@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@stof I've disabled inlining for groups which contain a route that has the same prefix at the group it resides in.

@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@stof I've disabled inlining for groups which contain a route that has the same prefix at the group it resides in.

- return $this->mergeDefaults(array_replace($matches, array('_route' => 'foo4')), array ());
- }
+ // ababa
+ if ('/ababa' === $pathinfo) {

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

for fully static routes, I agree that checking a prefix first looks useless.

@stof

stof Mar 8, 2017

Member

for fully static routes, I agree that checking a prefix first looks useless.

+ */
+ private function accepts($prefix)
+ {
+ return '' === $this->prefix || strpos($prefix, $this->prefix, 0) === 0;

This comment has been minimized.

@stof

stof Mar 8, 2017

Member

passing the offset explicitly is useless, as you use the default one

@stof

stof Mar 8, 2017

Member

passing the offset explicitly is useless, as you use the default one

@stof

This comment has been minimized.

Show comment
Hide comment
@stof

stof Mar 8, 2017

Member

Well, I never said that the numbers you are showing us should be on the smaller case. You talk about a 60x performance improvement. I want a proof.

Member

stof commented Mar 8, 2017

Well, I never said that the numbers you are showing us should be on the smaller case. You talk about a 60x performance improvement. I want a proof.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@stof ok, how do you want the proof and what is the baseline?

Contributor

frankdejonge commented Mar 8, 2017

@stof ok, how do you want the proof and what is the baseline?

@stof

This comment has been minimized.

Show comment
Hide comment
@stof

stof Mar 8, 2017

Member

@frankdejonge the comparison should be between the master branch and your optimized version. See #5734 (comment) about the way @arnaud-lb did it when applying the initial optimization.

Member

stof commented Mar 8, 2017

@frankdejonge the comparison should be between the master branch and your optimized version. See #5734 (comment) about the way @arnaud-lb did it when applying the initial optimization.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@stof I'll create the benchmark and post it with the results here.

Contributor

frankdejonge commented Mar 8, 2017

@stof I'll create the benchmark and post it with the results here.

@Stelian

This comment has been minimized.

Show comment
Hide comment
@Stelian

Stelian Mar 8, 2017

Contributor

@stof I believe you meant "Thank you for your contribution, this would be a great addition to the framework. Could we maybe have some benchmarks on less routes / more user-land samples before we make a decision?" but for some reason your auto correct went quite poorly.

Contributor

Stelian commented Mar 8, 2017

@stof I believe you meant "Thank you for your contribution, this would be a great addition to the framework. Could we maybe have some benchmarks on less routes / more user-land samples before we make a decision?" but for some reason your auto correct went quite poorly.

@arnaud-lb

This comment has been minimized.

Show comment
Hide comment
@arnaud-lb

arnaud-lb Mar 8, 2017

Contributor

Seems great @frankdejonge, impressive improvements ! If I understand correctly, you pushed the optimizations farther by allowing the routes to be re-ordered (while taking into account that some routes must still be matched before others - those with parameters at least), and striping inefficient / too small groupings ?

Contributor

arnaud-lb commented Mar 8, 2017

Seems great @frankdejonge, impressive improvements ! If I understand correctly, you pushed the optimizations farther by allowing the routes to be re-ordered (while taking into account that some routes must still be matched before others - those with parameters at least), and striping inefficient / too small groupings ?

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@arnaud-lb that's exactly correct!

Contributor

frankdejonge commented Mar 8, 2017

@arnaud-lb that's exactly correct!

@stof

This comment has been minimized.

Show comment
Hide comment
@stof

stof Mar 8, 2017

Member

Because of the altered grouping behaviour I've not modified the PrefixCollection but I've created a new StaticPrefixCollection and updated the PhpMatcherDumper to use that instead.

DumperPrefixCollection is an internal class. So if the new logic does not use it anymore, the class should be removed (or the same class should be reused differently). No need to keep dead internal code around.

Member

stof commented Mar 8, 2017

Because of the altered grouping behaviour I've not modified the PrefixCollection but I've created a new StaticPrefixCollection and updated the PhpMatcherDumper to use that instead.

DumperPrefixCollection is an internal class. So if the new logic does not use it anymore, the class should be removed (or the same class should be reused differently). No need to keep dead internal code around.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

While the benchmarks are running I'll provide some more context. Part of the reason for this optimisation is because we use a locale prefix in our bundles. More specifically it's because we also use BeSimpleRouting bundle which adds multiple routes based on locale per route definition. This causes the prefixed routes to not be grouped, missing the opportunity of benefitting from the prefix grouping currently in place. The problem is also not only specific to the BeSimpleRouting bundle, every bundle which provides this will benefit from this more intelligent grouping. Also, in most cases this dumping strategy outputs less code to do the same work. Because of the advanced grouping, we can also be smarter about excluding multiple paths. Like, if a group is followed by another group, the second group can never be matched if the first group did. Simply by chaining the groups in if/elseif statements additional exclusions are possible.

Contributor

frankdejonge commented Mar 8, 2017

While the benchmarks are running I'll provide some more context. Part of the reason for this optimisation is because we use a locale prefix in our bundles. More specifically it's because we also use BeSimpleRouting bundle which adds multiple routes based on locale per route definition. This causes the prefixed routes to not be grouped, missing the opportunity of benefitting from the prefix grouping currently in place. The problem is also not only specific to the BeSimpleRouting bundle, every bundle which provides this will benefit from this more intelligent grouping. Also, in most cases this dumping strategy outputs less code to do the same work. Because of the advanced grouping, we can also be smarter about excluding multiple paths. Like, if a group is followed by another group, the second group can never be matched if the first group did. Simply by chaining the groups in if/elseif statements additional exclusions are possible.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

Also, believe it or not, this is not even the most we can push out of the routing bundle just by sorting and grouping routes. There's still the possibility of taking into account segments after parameters to intelligently group routes. For instance, if you have a the paths (in order) /prefixed/path/here/, /prefixed/{param}/segment/ and /prefixed/path/other/ the first and last COULD still be logically grouped because path 3 could never match pattern 2.

Contributor

frankdejonge commented Mar 8, 2017

Also, believe it or not, this is not even the most we can push out of the routing bundle just by sorting and grouping routes. There's still the possibility of taking into account segments after parameters to intelligently group routes. For instance, if you have a the paths (in order) /prefixed/path/here/, /prefixed/{param}/segment/ and /prefixed/path/other/ the first and last COULD still be logically grouped because path 3 could never match pattern 2.

@nicolas-grekas nicolas-grekas added this to the 3.3 milestone Mar 8, 2017

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

After profiling it seems that big as well as smaller groups benefit from this optimisation. As an added benefit the cost of routing is more stable (less difference between lower and upper bounds), which is especially beneficial for determining scaling needs.

Contributor

frankdejonge commented Mar 8, 2017

After profiling it seems that big as well as smaller groups benefit from this optimisation. As an added benefit the cost of routing is more stable (less difference between lower and upper bounds), which is especially beneficial for determining scaling needs.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@stof the old code and tree builder has been deleted.

Contributor

frankdejonge commented Mar 8, 2017

@stof the old code and tree builder has been deleted.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

@stof I've gone through your comments and corrected them.

Contributor

frankdejonge commented Mar 8, 2017

@stof I've gone through your comments and corrected them.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 8, 2017

Contributor

The findings from the profiling:

In the given small route set

group avg max
master 0.81 1.56
pr 0.59 0.92
+/- 1.37x 1.64x

On a real life project with ~800 routes

group avg max
master 4.73 10.26
pr 1.07 2.42
+/- 4.42x 4.26x

This comparison is done against master which already includes the previous optimisation which brought routing times down from 7ms to 2,5ms.

Contributor

frankdejonge commented Mar 8, 2017

The findings from the profiling:

In the given small route set

group avg max
master 0.81 1.56
pr 0.59 0.92
+/- 1.37x 1.64x

On a real life project with ~800 routes

group avg max
master 4.73 10.26
pr 1.07 2.42
+/- 4.42x 4.26x

This comparison is done against master which already includes the previous optimisation which brought routing times down from 7ms to 2,5ms.

frankdejonge added some commits Mar 10, 2017

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 10, 2017

Contributor

@fabpot I've corrected the things you've commented on.

Contributor

frankdejonge commented Mar 10, 2017

@fabpot I've corrected the things you've commented on.

@Tobion

This comment has been minimized.

Show comment
Hide comment
@Tobion

Tobion Mar 13, 2017

Member

@frankdejonge under which conditions do you reorder routes? Only static ones without parameters?

Member

Tobion commented Mar 13, 2017

@frankdejonge under which conditions do you reorder routes? Only static ones without parameters?

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

@Tobion it will reorder routes if it doesn't logically change matching order, all based on the static prefix. For example if there are two logical groups with common prefixes, say /a/ and /b/, and both of them have 2 routes /a/1/, /b/1/, /a/2/, /b/2/. In this collection we know that if the prefix /a/ matches the prefix /b/ wil never match. So in that case we can group them /a/[/a/1/,/a/2/],/b/[/b/1/,/b/2/]. The only exception is then there's an item within the collection which has an item within the group that groups higher up in the tree. For instance if there a collection of /a/1/, /b/1/, /{param}/, /a/2/, /b/2/ the groups can't be merged because /{param}/ needs to be before the routes defined after it. This is because we want to respect the order of declaration. We know why this happens because routes are recursively grouped by longest possible static prefix. Prefix intersections can be detected when their static prefix matches the groups static prefix. In those cases we guide the grouping to only try to groups with routes after the intersection, this is done by holding onto the index of the intersections internally. During the grouping process we simple skip all the items that come before the first intersection.

In essence it's really simple, but it requires some thought to see 1) how it plays out in a larger case and 2) why it is beneficial and 3) you need to really think about which types of exclusions are possible logically.

Contributor

frankdejonge commented Mar 13, 2017

@Tobion it will reorder routes if it doesn't logically change matching order, all based on the static prefix. For example if there are two logical groups with common prefixes, say /a/ and /b/, and both of them have 2 routes /a/1/, /b/1/, /a/2/, /b/2/. In this collection we know that if the prefix /a/ matches the prefix /b/ wil never match. So in that case we can group them /a/[/a/1/,/a/2/],/b/[/b/1/,/b/2/]. The only exception is then there's an item within the collection which has an item within the group that groups higher up in the tree. For instance if there a collection of /a/1/, /b/1/, /{param}/, /a/2/, /b/2/ the groups can't be merged because /{param}/ needs to be before the routes defined after it. This is because we want to respect the order of declaration. We know why this happens because routes are recursively grouped by longest possible static prefix. Prefix intersections can be detected when their static prefix matches the groups static prefix. In those cases we guide the grouping to only try to groups with routes after the intersection, this is done by holding onto the index of the intersections internally. During the grouping process we simple skip all the items that come before the first intersection.

In essence it's really simple, but it requires some thought to see 1) how it plays out in a larger case and 2) why it is beneficial and 3) you need to really think about which types of exclusions are possible logically.

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 13, 2017

Contributor

@frankdejonge I was curious about the changes and tried it on my biggest professional project with around 460 (without debug routes its ~ 250 for production) routes.

Its still running on Symfony 2.8 so I had to update symfony/routing to 3.2.6. With that version my test suite works fine.

I then ported your changes within PhpMatcherDumper and RouteCompiler plus I added the new class StaticPrefixCollection. That should be enough, right?

After clearing the cache I ran my test suite again but now it fails with a 404 on some routes. So it seems something is not working anymore as it did before 😋

Contributor

dmaicher commented Mar 13, 2017

@frankdejonge I was curious about the changes and tried it on my biggest professional project with around 460 (without debug routes its ~ 250 for production) routes.

Its still running on Symfony 2.8 so I had to update symfony/routing to 3.2.6. With that version my test suite works fine.

I then ported your changes within PhpMatcherDumper and RouteCompiler plus I added the new class StaticPrefixCollection. That should be enough, right?

After clearing the cache I ran my test suite again but now it fails with a 404 on some routes. So it seems something is not working anymore as it did before 😋

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 13, 2017

Contributor

Ok so here a quick example that fails:

No route found for "GET /guest/newsletter"

dumped matcher line with your changes:

elseif (0 === strpos($pathinfo, '/guest/newsletter/')) {
    // ca_guest_newsletter_index
    if ('/guest/newsletter' === $trimmedPathinfo) {

dumped matcher line for 3.2.6:

if (0 === strpos($pathinfo, '/guest/newsletter')) {
    // ca_guest_newsletter_index
    if (rtrim($pathinfo, '/') === '/guest/newsletter') {

So before it handled fine without a trailing / but now its expecting it in some cases?

Contributor

dmaicher commented Mar 13, 2017

Ok so here a quick example that fails:

No route found for "GET /guest/newsletter"

dumped matcher line with your changes:

elseif (0 === strpos($pathinfo, '/guest/newsletter/')) {
    // ca_guest_newsletter_index
    if ('/guest/newsletter' === $trimmedPathinfo) {

dumped matcher line for 3.2.6:

if (0 === strpos($pathinfo, '/guest/newsletter')) {
    // ca_guest_newsletter_index
    if (rtrim($pathinfo, '/') === '/guest/newsletter') {

So before it handled fine without a trailing / but now its expecting it in some cases?

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

@dmaicher could you try again? I've updated the RouteCompiler to take trailing slash handling into account.

Contributor

frankdejonge commented Mar 13, 2017

@dmaicher could you try again? I've updated the RouteCompiler to take trailing slash handling into account.

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 13, 2017

Contributor

@frankdejonge for me the dumped matcher is still the same here and it keeps failing:

elseif (0 === strpos($pathinfo, '/guest/newsletter/')) {
    // ca_guest_newsletter_index
    if ('/guest/newsletter' === $trimmedPathinfo) {
Contributor

dmaicher commented Mar 13, 2017

@frankdejonge for me the dumped matcher is still the same here and it keeps failing:

elseif (0 === strpos($pathinfo, '/guest/newsletter/')) {
    // ca_guest_newsletter_index
    if ('/guest/newsletter' === $trimmedPathinfo) {
@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

@dmaicher could you share a route definition collection with which I can simulate this?

Contributor

frankdejonge commented Mar 13, 2017

@dmaicher could you share a route definition collection with which I can simulate this?

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

I think I know which case this is...

Contributor

frankdejonge commented Mar 13, 2017

I think I know which case this is...

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

@dmaicher should be good now.

Contributor

frankdejonge commented Mar 13, 2017

@dmaicher should be good now.

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 13, 2017

Contributor

Still the same problem 😢 I can try to put together a minimal route definition set tomorrow morning to reproduce the problem.

Contributor

dmaicher commented Mar 13, 2017

Still the same problem 😢 I can try to put together a minimal route definition set tomorrow morning to reproduce the problem.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

@dmaicher I've recreated the issue, working on the solution.

Contributor

frankdejonge commented Mar 13, 2017

@dmaicher I've recreated the issue, working on the solution.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

@dmaicher could you try again?

Contributor

frankdejonge commented Mar 13, 2017

@dmaicher could you try again?

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 13, 2017

Contributor

@frankdejonge that case is fine now but I still have some more test fails 😢

Contributor

dmaicher commented Mar 13, 2017

@frankdejonge that case is fine now but I still have some more test fails 😢

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 13, 2017

Contributor

@dmaicher what kind of failures?

Contributor

frankdejonge commented Mar 13, 2017

@dmaicher what kind of failures?

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 14, 2017

Contributor

Here is one more failing example:

No route found for "GET /statistics/semantic"

Dumped matcher for your version:

        elseif (0 === strpos($pathinfo, '/statistics/semantic')) {
            // ca_statistics_semantic
            if ('/statistics/semantic/' === $pathinfo) {
                if (!in_array($canonicalMethod, array('GET', 'POST'))) {
                    $allow = array_merge($allow, array('GET', 'POST'));
                    goto not_ca_statistics_semantic;
                }

                return array(...)

Dumped matcher for 3.2.6:

        if (0 === strpos($pathinfo, '/statistics/semantic')) {
            // ca_statistics_semantic
            if (rtrim($pathinfo, '/') === '/statistics/semantic') {
                if (!in_array($this->context->getMethod(), array('GET', 'POST', 'HEAD'))) {
                    $allow = array_merge($allow, array('GET', 'POST', 'HEAD'));
                    goto not_ca_statistics_semantic;
                }

                if (substr($pathinfo, -1) !== '/') {
                    return $this->redirect($pathinfo.'/', 'ca_statistics_semantic');
                }

                return array (...)
Contributor

dmaicher commented Mar 14, 2017

Here is one more failing example:

No route found for "GET /statistics/semantic"

Dumped matcher for your version:

        elseif (0 === strpos($pathinfo, '/statistics/semantic')) {
            // ca_statistics_semantic
            if ('/statistics/semantic/' === $pathinfo) {
                if (!in_array($canonicalMethod, array('GET', 'POST'))) {
                    $allow = array_merge($allow, array('GET', 'POST'));
                    goto not_ca_statistics_semantic;
                }

                return array(...)

Dumped matcher for 3.2.6:

        if (0 === strpos($pathinfo, '/statistics/semantic')) {
            // ca_statistics_semantic
            if (rtrim($pathinfo, '/') === '/statistics/semantic') {
                if (!in_array($this->context->getMethod(), array('GET', 'POST', 'HEAD'))) {
                    $allow = array_merge($allow, array('GET', 'POST', 'HEAD'));
                    goto not_ca_statistics_semantic;
                }

                if (substr($pathinfo, -1) !== '/') {
                    return $this->redirect($pathinfo.'/', 'ca_statistics_semantic');
                }

                return array (...)
@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 14, 2017

Contributor

This could be the effect of other changes, looks like your dumper doesn't allow a trailing slack to be absent, which is caused by having a non redirecting matcher. I'll look into this some more, try and reproduce it and come back. I'm AFK now, but might be good to look into the strict matching setting on your end too.

Contributor

frankdejonge commented Mar 14, 2017

This could be the effect of other changes, looks like your dumper doesn't allow a trailing slack to be absent, which is caused by having a non redirecting matcher. I'll look into this some more, try and reproduce it and come back. I'm AFK now, but might be good to look into the strict matching setting on your end too.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
Contributor

frankdejonge commented Mar 14, 2017

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 15, 2017

Contributor

@dmaicher I would have expected a piece of code like this:

            // a_fifth
            if ('/a/55' === $trimmedPathinfo) {
                if (substr($pathinfo, -1) !== '/') {
                    return $this->redirect($pathinfo.'/', 'a_fifth');
                }

                return array('_route' => 'a_fifth');
            }

Which is generated for a route if it has redirect support. But I see in the code that those redirects only happen if the matcher supports redirects, but the method should have no methods (thus matching all methods) or contains the HEAD method. This code wasn't changed in this PR. Perhaps @fabpot could shed some light on this? I saw he was the last one to touch that bit of logic.

Contributor

frankdejonge commented Mar 15, 2017

@dmaicher I would have expected a piece of code like this:

            // a_fifth
            if ('/a/55' === $trimmedPathinfo) {
                if (substr($pathinfo, -1) !== '/') {
                    return $this->redirect($pathinfo.'/', 'a_fifth');
                }

                return array('_route' => 'a_fifth');
            }

Which is generated for a route if it has redirect support. But I see in the code that those redirects only happen if the matcher supports redirects, but the method should have no methods (thus matching all methods) or contains the HEAD method. This code wasn't changed in this PR. Perhaps @fabpot could shed some light on this? I saw he was the last one to touch that bit of logic.

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 15, 2017

Contributor

Maybe my test fails are related to other changes on master for the routing that are not in 3.2 yet? I cannot test master unfortunately as its not compatible with Symfony 2.8 (missing Psr\Container\ContainerInterface etc) 😋

Contributor

dmaicher commented Mar 15, 2017

Maybe my test fails are related to other changes on master for the routing that are not in 3.2 yet? I cannot test master unfortunately as its not compatible with Symfony 2.8 (missing Psr\Container\ContainerInterface etc) 😋

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 15, 2017

Contributor

Ok @frankdejonge I think its related to something else indeed. I just took PhpMatcherDumper and RouteCompiler from master and have the same fails.

Contributor

dmaicher commented Mar 15, 2017

Ok @frankdejonge I think its related to something else indeed. I just took PhpMatcherDumper and RouteCompiler from master and have the same fails.

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 15, 2017

Contributor

@dmaicher cool, thanks for trying out this PR in a real project btw! Very valuable!

Contributor

frankdejonge commented Mar 15, 2017

@dmaicher cool, thanks for trying out this PR in a real project btw! Very valuable!

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 15, 2017

Contributor

I also did a quick benchmark for your PR on my project using this script:

https://gist.github.com/dmaicher/5e85c23145e84a4400354224da85bd08

I did 5 runs for each version.

3.2.6 routing component:

matched: 218
not found: 19
method not allowed: 2

1m6.975s
1m1.886s
1m3.858s
1m2.247s
1m2.309s

your PR changes:

matched: 218
not found: 19
method not allowed: 2

1m2.529s
1m1.204s
0m59.481s
1m0.846s
1m1.294s

I don't see a clear performance advantage here but it might be because the optimization is not effective for my route collection.

@frankdejonge do you see something wrong with the benchmark script? Maybe some other people could try it on real projects? 😊

Contributor

dmaicher commented Mar 15, 2017

I also did a quick benchmark for your PR on my project using this script:

https://gist.github.com/dmaicher/5e85c23145e84a4400354224da85bd08

I did 5 runs for each version.

3.2.6 routing component:

matched: 218
not found: 19
method not allowed: 2

1m6.975s
1m1.886s
1m3.858s
1m2.247s
1m2.309s

your PR changes:

matched: 218
not found: 19
method not allowed: 2

1m2.529s
1m1.204s
0m59.481s
1m0.846s
1m1.294s

I don't see a clear performance advantage here but it might be because the optimization is not effective for my route collection.

@frankdejonge do you see something wrong with the benchmark script? Maybe some other people could try it on real projects? 😊

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 15, 2017

Contributor

@dmaicher it really depends on the project. Also, the times of the script is not the cleanest indicator of wether there are speed improvements. Did you look at the benchmark script I posted above?

Contributor

frankdejonge commented Mar 15, 2017

@dmaicher it really depends on the project. Also, the times of the script is not the cleanest indicator of wether there are speed improvements. Did you look at the benchmark script I posted above?

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 15, 2017

Contributor

@dmaicher Also, it could be that your routes grouped nicely with the current optimisation methods, but not all of them do. In the case of the application I was working in the algorithm didn't match at all.

Contributor

frankdejonge commented Mar 15, 2017

@dmaicher Also, it could be that your routes grouped nicely with the current optimisation methods, but not all of them do. In the case of the application I was working in the algorithm didn't match at all.

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 15, 2017

Contributor

@frankdejonge yes I also believe your new optimizations simply don't apply for my application 😉 This particular app has 4 different hosts and a lot of routes are filtered/matched by host first.

I checked your benchmark script but my script was simply easier to run on an existing app without extracting route definitions first.

The good news is that your solution is for sure not slower than 3.2.6 in my case 😄

Contributor

dmaicher commented Mar 15, 2017

@frankdejonge yes I also believe your new optimizations simply don't apply for my application 😉 This particular app has 4 different hosts and a lot of routes are filtered/matched by host first.

I checked your benchmark script but my script was simply easier to run on an existing app without extracting route definitions first.

The good news is that your solution is for sure not slower than 3.2.6 in my case 😄

@frankdejonge

This comment has been minimized.

Show comment
Hide comment
@frankdejonge

frankdejonge Mar 15, 2017

Contributor

@dmaicher nope, on average they are faster in your results too, so that's good.

Contributor

frankdejonge commented Mar 15, 2017

@dmaicher nope, on average they are faster in your results too, so that's good.

@nicolas-grekas

This comment has been minimized.

Show comment
Hide comment
@nicolas-grekas

nicolas-grekas Mar 22, 2017

Member

@dmaicher all good on your side, no remaining failing edge cases?

Member

nicolas-grekas commented Mar 22, 2017

@dmaicher all good on your side, no remaining failing edge cases?

@dmaicher

This comment has been minimized.

Show comment
Hide comment
@dmaicher

dmaicher Mar 22, 2017

Contributor

@nicolas-grekas yes the remaining test fails were not related to the changes in this PR (Running a Symfony 2.8 app and updated symfony/routing to 3.2.6 to try the changes of this PR which resulted in some other weird test fails). So from my side all good 👍

Contributor

dmaicher commented Mar 22, 2017

@nicolas-grekas yes the remaining test fails were not related to the changes in this PR (Running a Symfony 2.8 app and updated symfony/routing to 3.2.6 to try the changes of this PR which resulted in some other weird test fails). So from my side all good 👍

@fabpot

This comment has been minimized.

Show comment
Hide comment
@fabpot

fabpot Mar 22, 2017

Member

Thank you @frankdejonge.

Member

fabpot commented Mar 22, 2017

Thank you @frankdejonge.

@fabpot fabpot closed this Mar 22, 2017

fabpot added a commit that referenced this pull request Mar 22, 2017

minor #21926 [Routing] Optimised dumped matcher (frankdejonge)
This PR was squashed before being merged into the 3.3-dev branch (closes #21926).

Discussion
----------

[Routing] Optimised dumped matcher

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

TL;DR: I've optimised the PhpMatcherDumper output for a <del>60x</del> 4.4x performance improvement on a collection of ~800 routes by inducing cyclomatic complexity.

[EDIT] The 60x performance boost was only visible when profiling with blackfire, which is quite possibly a result of the cost of profiling playing a part. After doing some more profiling the realistic benefit of the optimisation is more likely to be in the ranges is 1.3x to 4.4x.

After the previous optimisation I began looking at how the PrefixCollection was adding its performance boost. I spotted another way to do this, which has the same theory behind it (excluding groups based on prefixes). The current implementation only groups when one prefix resides in the other. In this new implementation I've created a way to detect common prefixes, which allows for much more efficient grouping. Every time a route is added to the group it'll either merge into an existing group, merge into a new group with a route that has a common prefix, or merge into a new group with an existing group that has a common prefix.

However, when a parameter is present grouping must only be done AFTER that route, this case is accounted for. In all other cases, where there's no collision routes can be grouped freely because if a group was matched other groups wouldn't have matched.

After all the groups are created the groups are optimised. Groups with fewer than 3 children are inlined into the parent group. This is because a group with 2 children would potentially result in 3 prefix checks while if they are inlines it's 2 checks.

Like with the previous optimisation I've profiled this using blackfire. But the match function didn't show up anymore. I've added `usleep` calls in the dumped matcher during profiling, which made it show up again. I've verified with @simensen that this is because the wall time of the function was too small for it to be of any interest. When it DID get detected, because of more tasks running, it would show up with around 250 nanoseconds. In comparison, the previous speed improvement brought the wall time down from 7ms to ~2.5ms on a set of ~800 routes.

Because of the altered grouping behaviour I've not modified the PrefixCollection but I've created a new StaticPrefixCollection and updated the PhpMatcherDumper to use that instead.

Commits
-------

449b691 [Routing] Optimised dumped matcher

@frankdejonge frankdejonge deleted the frankdejonge:feature/optimised-dumped-matcher branch Mar 22, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment