[Routing] Simplified php matcher dumper (and optimized generated matcher) #5734

Merged
merged 1 commit into from Oct 19, 2012

Conversation

Projects
None yet
5 participants
Contributor

arnaud-lb commented Oct 12, 2012

Bug fix: no
Feature addition: no
Related: #3378
Backwards compatibility break: no
Symfony2 tests pass: yes

This simplifies the php matcher dumper by allowing the dumper to re-organize routes in the dumper's own structure.

As a result, dumping is made a little simpler. This is also helps much for my hostname-based routes PR #3378.

Reorganizing routes also allows to find more optimization opportunities:

Currently the dumper wraps some collections of routes in a if (0 === strpos($pathinfo, '/someprefix') if the collection has user-defined prefix, and if it contains more than one direct child Route. This can miss many optimization opportunities.

The PR changes this by building a prefix tree of routes based on the static prefix extracted from routes' patterns. Then every leave having a prefix and more than one child (route or collection) will be wrapped in a if statement.

Example:

// No explicit prefix is specified
@Route("/cafe")
@Route("/cacao")
@Route("/coca")

is compiled like this:

if (url starts with /c) {
    if (url starts with /ca) {
        // test route "/cafe"
        // test route "/cacao"
    }
    // test route "/coca"
}

Some tests have many white space changes, much more easier to review here

src/Symfony/Component/Routing/Matcher/Dumper/DumperPrefixCollection.php
+ return $this;
+
+ // Route's prefix contains current leave's prefix
+ } else if ('' === $this->getPrefix() || 0 === strpos($prefix, $this->getPrefix())) {
@stof

stof Oct 12, 2012

Member

simply use if as the previous if returns

src/Symfony/Component/Routing/Matcher/Dumper/DumperPrefixCollection.php
+ return $collection;
+
+ // No match, fallback to parent (recursively)
+ } else {
@stof

stof Oct 12, 2012

Member

no need to use else here

...ny/Component/Routing/Tests/Matcher/Dumper/DumperPrefixCollectionTest.php
+ $this->assertSame($expect, $result->getRoot()->toString(function($route) {
+ if ($route instanceof DumperPrefixCollection) {
+ return sprintf("coll %s", $route->getPrefix());
+ } else {
@stof

stof Oct 12, 2012

Member

you can remove the else here as the if returns

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ /**
+ * Returns a debug string representation of this collection.
+ *
+ * @param Callable $toString Callback used to get the string representation of each individual child
@stof

stof Oct 12, 2012

Member

callable should have a lowercase c

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ }
+
+ /**
+ * Returns a debug string representation of this collection.
@stof

stof Oct 12, 2012

Member

Does it really make sense to define this method in the class ? It looks to me that its only use case is the unit test, so it should be a method defined in the testcase (and accepting the router collection as argument)

@Tobion

Tobion Oct 12, 2012

Member

I agree

src/Symfony/Component/Routing/Matcher/Dumper/DumperPrefixCollection.php
+ $stack = array_reverse($this->getRoutes());
+
+ while (null !== $route = array_pop($stack)) {
+ if ($route instanceof static) {
@stof

stof Oct 12, 2012

Member

Shouldn't this be self ? Otherwise, Embedding a DumperPrefixCollection inside an instance of a child class would not call the method.
And using static is only useful when you consider inheritance.

+ if (0 === strpos($pathinfo, '/test')) {
+ if (0 === strpos($pathinfo, '/test/baz')) {
+ // baz
+ if ($pathinfo === '/test/baz') {
@Tobion

Tobion Oct 12, 2012

Member

To match route baz there is now a performance decrease because the simple equality check is wrapped by 2 prefix tests.
So the question is whether there is an optimization for the average case at all. It depends on the number of sub-routes and the type of route (static or with variables).
So the perfect optimization would also consider this fact and adjust the wraps depending on the number of sub-routes.
And the concrete number would need to be inferred from performance tests...

@arnaud-lb

arnaud-lb Oct 13, 2012

Contributor

Performance wasn't the primary motivation of this PR, but I agree I should have added some performance numbers.

To match route baz there is now a performance decrease because the simple equality check is wrapped by 2 prefix tests

On average the performance increases: For each non matching prefix test, at least two route matchings are avoided. The loss incurred by this extra prefix test is mitigated by prefix tests on previous routes which already avoided some route matching at the time a particular route is reached.

I did a simple test that matches all possible routes in this test case 50K times each:

master: 7.4s
routing-php-dumper-simplification: 6.4s

The improvement should be bigger as more routes are added and less explicit route collection prefix are used. However you are right, there is a small performance loss on the few first routes of the matcher. I'll see if I can improve that.

@arnaud-lb

arnaud-lb Oct 13, 2012

Contributor

Here is a diagram of the time it takes to match each route 50K times:

diagram

Member

Tobion commented Oct 13, 2012

I'm not sure if adding these specific classes just for dumping is the best implementation because they duplicate some logic and this optimization should also work out-of-the-box with the standard RouteCollection etc.
What I have in mind is a new method in RouteCollection like RouteCollection::optimizeTree that returns a new RouteCollection with the Routes that includes these optimization you do here. So there would probably be no need for the new classes.

It think it requires some changes in RouteCollection like the handling of prefix that must start with a slash currently, which is too restrictive. But it should be possible.

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ public function __construct(array $routes = array(), array $attributes = array())
+ {
+ $this->routes = $routes;
+ $this->attributes = $attributes;
@Tobion

Tobion Oct 13, 2012

Member

I don't see this used anywhere else.

Contributor

arnaud-lb commented Oct 13, 2012

@Tobion

I'm not sure if adding these specific classes just for dumping is the best implementation because they duplicate some logic and this optimization should also work out-of-the-box with the standard RouteCollection etc.

I think RouteCollection and DumperCollection do not share the same concerns; and RouteCollection does things that don't allow to reorganize routes freely. For instance when adding a collection to a RouteCollection this changes all the child routes' prefix, requirements, options, etc. When setting a collection's prefix, this prepends the prefix to every child route's pattern, etc.

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ */
+ public function getRoot()
+ {
+ return (null !== $parent = $this->parent) ? $parent->getRoot() : $this;
@Tobion

Tobion Oct 13, 2012

Member

The $parent variable is not needed

Contributor

arnaud-lb commented Oct 15, 2012

squashed the CS commits

Contributor

arnaud-lb commented Oct 15, 2012

@fabpot @Tobion this PR is ready to be merged if everyone agrees

src/Symfony/Component/Routing/Matcher/Dumper/DumperPrefixCollection.php
+
+ // Route's prefix contains current leave's prefix
+ if ('' === $this->getPrefix() || 0 === strpos($prefix, $this->getPrefix())) {
+
@Tobion

Tobion Oct 15, 2012

Member

empty line

@arnaud-lb

arnaud-lb Oct 15, 2012

Contributor

@Tobion Thanks for your reviews, I've fixed the issues you pointed

This one is fixed too, I don't know why github still shows it

src/Symfony/Component/Routing/Matcher/Dumper/PhpMatcherDumper.php
+ * Flattens a tree of routes to a single collection
+ *
+ * @param RouteCollection $routes Collection of routes
+ * @return DumperCollection
@Tobion

Tobion Oct 15, 2012

Member

missing empty line (same below)

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ /**
+ * Returns the root of the collection.
+ *
+ * @return DumperCollection|null The root collection
@Tobion

Tobion Oct 15, 2012

Member

It cannot return null

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ *
+ * @return DumperCollection|DumperRoute The route at given index
+ */
+ public function getRoute($index)
@Tobion

Tobion Oct 15, 2012

Member

it doesnt seem to be used. and it doesnt handle invalid indices which result in a php notice

src/Symfony/Component/Routing/Matcher/Dumper/DumperRoute.php
+ *
+ * @return RouteCollection the parent collection
+ */
+ public function getParentCollection()
@Tobion

Tobion Oct 15, 2012

Member

This doesnt seem to be called, so the property parentCollection seems useless.

src/Symfony/Component/Routing/Matcher/Dumper/DumperRoute.php
+ {
+ $this->name = $name;
+ $this->route = $route;
+ $this->parentCollection = $parentCollection;
@stof

stof Oct 15, 2012

Member

the parent collection is not used

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ *
+ * @param DumperCollection|DumperRoute The route or route collection
+ */
+ public function addRoute($route)
@Tobion

Tobion Oct 15, 2012

Member

The difference between addRoute and addAllRoutes is not really clear. They seem to overlap.

@arnaud-lb

arnaud-lb Oct 15, 2012

Contributor

addRoute() adds a route or sub-collection to the collection

and addAllRoutes() adds all routes from the given collection to the collection

@Tobion

Tobion Oct 15, 2012

Member

But they can be both be used to achieve the same. So from an API point of view, I don't see what is supposed to do what.
addAllRoutes($collection) does not add more than addRoute($collection). So the name is very misleading. And addAllRoutes can also add sub-collections, not just Routes.

@arnaud-lb

arnaud-lb Oct 15, 2012

Contributor

But they can be both be used to achieve the same

See ConstraintViolationList add() and addAll() methods:

https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Validator/ConstraintViolationList.php#L63

You can probably achieve the same with add() than with addAll(), but addAll() is much more convenient for adding more than one item.

@stof

stof Oct 15, 2012

Member

@arnaud-lb Thsi is not equivalent. If you want an equivalent, addRoute should not support receiving a collection but only a route

@arnaud-lb

arnaud-lb Oct 15, 2012

Contributor

I don't see a problem with addRoute() receiving both routes and collections. It does exactly the same thing whether the argument is a route or a collection.

@stof

stof Oct 15, 2012

Member

Exactly the same ? not really as there is a if :)

But my point was that it cannot be compared with the ConstraintViolationList

src/Symfony/Component/Routing/Matcher/Dumper/PhpMatcherDumper.php
+ *
+ * @return DumperCollection
+ */
+ private function flattenRouteCollection(RouteCollection $routes)
@Tobion

Tobion Oct 15, 2012

Member

Does it really flatten a tree to a single collection? From what I see, the resulting DumperCollection can have sub-collections?!

@arnaud-lb

arnaud-lb Oct 15, 2012

Contributor

Does it really flatten a tree to a single collection?

Yes, it returns a DumperCollection with only routes in it (no sub-collections)

@Tobion

Tobion Oct 15, 2012

Member

Ah, I see how it works now. I find the implementation pretty strange. Why not simply create only one DumperCollection and then add each route to it. No nead to create itermediate dumpercollection that are then not used at all. This way, addAllRoutes is not even needed.

@arnaud-lb

arnaud-lb Oct 15, 2012

Contributor

Ah, I see how it works now. I find the implementation pretty strange. Why not simply create only one DumperCollection and then add each route to it. No nead to create itermediate dumpercollection that are then not used at all.

I think that's a common recursive pattern. I've tried searching "flatten a tree" on google and I found similar implementations:

http://stackoverflow.com/questions/3907408/dotnet-flatten-a-tree-list-of-lists-with-one-statement

(replace Flatten with flattenRouteCollection and AddRange with addAllRoutes and it's the same)

...ny/Component/Routing/Tests/Matcher/Dumper/DumperPrefixCollectionTest.php
+ $coll = new DumperPrefixCollection;
+ $coll->setPrefix('');
+
+ $route = new DumperRoute('bar', new Route('/foo/bar'), new RouteCollection);
@Tobion

Tobion Oct 15, 2012

Member

new RouteCollection not used

src/Symfony/Component/Routing/Matcher/Dumper/DumperCollection.php
+ /**
+ * Returns the parent collection.
+ *
+ * @return DumperCollection The parent collection
@Tobion

Tobion Oct 16, 2012

Member

it can also be null

src/Symfony/Component/Routing/Matcher/Dumper/DumperPrefixCollection.php
+ }
+
+ // No match, fallback to parent (recursively)
+ return $this->getParent()->addPrefixRoute($route);
@Tobion

Tobion Oct 16, 2012

Member

getParent could be null at this point

@arnaud-lb

arnaud-lb Oct 16, 2012

Contributor

This cannot happen (unless the root of the collection has a non empty prefix, which is invalid)

What should I do ?

@Tobion

Tobion Oct 16, 2012

Member

Of course this can happen. setParent and setPrefix are public, so sure this case can be brought about.
And where is it made sure that the root cannot have a prefix?
How about if (null === $this->getParent() || '' === $this->getPrefix() || 0 === strpos($prefix, $this->getPrefix())) above?

@arnaud-lb

arnaud-lb Oct 16, 2012

Contributor

Of course this can happen. setParent and setPrefix are public, so sure this case can be brought about.

I'm sure I could break any code by messing with its state, too ;) Attempts to avoid that would make the code less flexible and more complicated.

@arnaud-lb

arnaud-lb Oct 16, 2012

Contributor

Best I can do is throw an exception like "Collection root's prefix must be an empty string" when parent is null when this line is reached. What do you think ?

Do you have any last remarks, so that I could do one final fix ?

Thanks

@Tobion

Tobion Oct 16, 2012

Member

What about my idea null === $this->getParent() above? Also I'd make setParent and getParent protected, as it should not be public.

@arnaud-lb

arnaud-lb Oct 16, 2012

Contributor

What about my idea null === $this->getParent() above?

Entering in this branch if the rest of the condition isn't true won't construct a valid tree. Having a null parent here is an error / shouldn't happen, so it seems reasonable to handle this by throwing an exception ?

@Tobion

Tobion Oct 16, 2012

Member

An exception is also fine.

src/Symfony/Component/Routing/Matcher/Dumper/DumperPrefixCollection.php
+ $collection = new DumperPrefixCollection();
+ $collection->setPrefix(substr($prefix, 0, strlen($this->getPrefix())+1));
+ $this->add($collection);
+ return $collection->addPrefixRoute($route);
@Tobion

Tobion Oct 16, 2012

Member

missing empty line above return (same in line 57)

src/Symfony/Component/Routing/Matcher/Dumper/DumperPrefixCollection.php
+ }
+
+ // Prefix starts with route's prefix
+ if ('' === $this->getPrefix() || 0 === strpos($prefix, $this->getPrefix())) {
@Tobion

Tobion Oct 16, 2012

Member

I'd use the property instead of the getter for performance: $this->prefix (same line 55)

Member

Tobion commented Oct 16, 2012

When the above is fixed, I think it's good to be merged.

Contributor

arnaud-lb commented Oct 17, 2012

Fixed; thanks @Tobion @stof for your reviews

Member

Tobion commented Oct 19, 2012

@arnaud-lb could you please test whether your PR fixes #5780 as a side-effect?
I can image that it's fixed because you use $route->compile()->getStaticPrefix(); for prefix optimization.
If it's fixed please add a test case. If not, that's fine, and we need to fix it in another PR.

fabpot added a commit that referenced this pull request Oct 19, 2012

merged branch arnaud-lb/routing-php-dumper-simplification (PR #5734)
This PR was merged into the master branch.

Commits
-------

e54d749 [Routing] Simplified php matcher dumper (and optimized generated matcher)

Discussion
----------

[Routing] Simplified php matcher dumper (and optimized generated matcher)

Bug fix: no
Feature addition: no
Related: #3378
Backwards compatibility break: no
Symfony2 tests pass: yes

This simplifies the php matcher dumper by allowing the dumper to re-organize routes in the dumper's own structure.

As a result, dumping is made a little simpler. This is also helps much for my hostname-based routes PR #3378.

Reorganizing routes also allows to find more optimization opportunities:

Currently the dumper wraps some collections of routes in a `if (0 === strpos($pathinfo, '/someprefix')` if the collection has user-defined prefix, and if it contains more than one direct child Route. This can miss many optimization opportunities.

The PR changes this by building a prefix tree of routes based on the static prefix extracted from routes' patterns. Then every leave having a prefix and more than one child (route or collection) will be wrapped in a `if` statement.

Example:

```
// No explicit prefix is specified
@Route("/cafe")
@Route("/cacao")
@Route("/coca")
```

is compiled like this:

```
if (url starts with /c) {
    if (url starts with /ca) {
        // test route "/cafe"
        // test route "/cacao"
    }
    // test route "/coca"
}
```

Some tests have many white space changes, much more easier to review [here](https://github.com/symfony/symfony/pull/5734/files?w=1)

---------------------------------------------------------------------------

by Tobion at 2012-10-13T02:27:54Z

I'm not sure if adding these specific classes just for dumping is the best implementation because they duplicate some logic and this optimization should also work out-of-the-box with the standard RouteCollection etc.
What I have in mind is a new method in RouteCollection like `RouteCollection::optimizeTree` that returns a new RouteCollection with the Routes that includes these optimization you do here. So there would probably be no need for the new classes.

It think it requires some changes in RouteCollection like the handling of prefix that must start with a slash currently, which is too restrictive. But it should be possible.

---------------------------------------------------------------------------

by arnaud-lb at 2012-10-13T12:52:32Z

@Tobion

> I'm not sure if adding these specific classes just for dumping is the best implementation because they duplicate some logic and this optimization should also work out-of-the-box with the standard RouteCollection etc.

I think RouteCollection and DumperCollection do not share the same concerns; and RouteCollection does things that don't allow to reorganize routes freely. For instance when adding a collection to a RouteCollection this changes all the child routes' prefix, requirements, options, etc. When setting a collection's prefix, this prepends the prefix to every child route's pattern, etc.

---------------------------------------------------------------------------

by arnaud-lb at 2012-10-15T08:50:23Z

squashed the CS commits

---------------------------------------------------------------------------

by arnaud-lb at 2012-10-15T13:50:16Z

@fabpot @Tobion this PR is ready to be merged if everyone agrees

---------------------------------------------------------------------------

by Tobion at 2012-10-16T18:10:36Z

When the above is fixed, I think it's good to be merged.

---------------------------------------------------------------------------

by arnaud-lb at 2012-10-17T08:40:20Z

Fixed; thanks @Tobion @stof for your reviews

---------------------------------------------------------------------------

by Tobion at 2012-10-19T03:30:10Z

@arnaud-lb could you please test whether your PR fixes #5780 as a side-effect?
I can image that it's fixed because you use `$route->compile()->getStaticPrefix();` for prefix optimization.
If it's fixed please add a test case. If not, that's fine, and we need to fix it in another PR.

@fabpot fabpot merged commit e54d749 into symfony:master Oct 19, 2012

1 check failed

default The Travis build failed
Details
Contributor

acasademont commented Oct 22, 2012

Just got the time test the new optimizations and they work very nice! Thanks for the patch!

Looking at the new matcher code I would like to know if things could be optimized a little bit further. My first lines of the matcher now look like this:

        if (0 === strpos($pathinfo, '/_')) {
            // _wdt
            if (0 === strpos($pathinfo, '/_wdt') && preg_match('#^/_wdt/(?<token>[^/]++)$#s', $pathinfo, $matches)) {
                return array_merge($this->mergeDefaults($matches, array (  '_controller' => 'web_profiler.controller.profiler:toolbarAction',)), array('_route' => '_wdt'));
            }

            if (0 === strpos($pathinfo, '/_profiler')) {
                // _profiler_search
                if ($pathinfo === '/_profiler/search') {
                    return array (  '_controller' => 'web_profiler.controller.profiler:searchAction',  '_route' => '_profiler_search',);
                }

When getting into the first if it's obvious that all subsequent routes have the '/_' prefixe so why not strip those 2 chars from the $pathinfo variable and the child routes? The code would now looke like this:

        if (0 === strpos($pathinfo, '/_')) {
            $pathinfo = substr($pathinfo, 2);
            // _wdt
            if (0 === strpos($pathinfo, 'wdt') && preg_match('#^wdt/(?<token>[^/]++)$#s', $pathinfo, $matches)) {
                return array_merge($this->mergeDefaults($matches, array (  '_controller' => 'web_profiler.controller.profiler:toolbarAction',)), array('_route' => '_wdt'));
            }

            if (0 === strpos($pathinfo, 'profiler')) {
                $pathinfo = substr($pathinfo, 8);
                // _profiler_search
                if ($pathinfo === '/search') {
                    return array (  '_controller' => 'web_profiler.controller.profiler:searchAction',  '_route' => '_profiler_search',);
                }

What do you think @arnaud-lb?

Contributor

arnaud-lb commented Oct 22, 2012

@acasademont This has to be measured :) But intuitively I think that the cost of $pathinfo = substr($pathinfo, 2); (function call, variable allocation/free, copying) could be greater than the cost of comparing the prefix again a few times.

Contributor

acasademont commented Oct 22, 2012

Would be nice to have the test code you used in this PR for the previous discussion about performance, is it available somewhere?

Contributor

arnaud-lb commented Oct 22, 2012

Sure: https://gist.github.com/63bb06c4519e7395b381

$dumper is the same dumper than the one created by PhpMatcherDumperTest::testDump

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment