Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert router stack to an object of arrays #2601

Closed
ilanbiala opened this issue Mar 23, 2015 · 13 comments
Closed

Convert router stack to an object of arrays #2601

ilanbiala opened this issue Mar 23, 2015 · 13 comments
Labels

Comments

@ilanbiala
Copy link

Object and array access is almost equal in performance, so why not change the stack from an array of layers to an object with paths as the keys and an array with all the handlers as the values? That would mean that If a middleware is at the "end" of the stack, it could still be accessed in a comparable amount of time. For example, two stacks, with simplified layers:


var stack = [{
  regexp: /^\/?(?=/|$)/i,
  handle: Function
}, {
  regexp: /^\/?(?=/|$)/i,
  handle: Function
}, {
  regexp: /^\/products\/?$/i,
  handle: Function,
  route: { path: '/products', stack: [Object], methods: [Object] }
}, { 
  regexp: /^\/product\/(?:([^/]+?))\/?$/i,
  handle: Function,
  route: { path: '/product/:id', stack: [Object], methods: [Object] }
}, {
  regexp: /^\/product\/?$/i,
  handle: Function,
  route: { path: '/product', stack: [Object], methods: [Object] }
}, {
  regexp: /^\/product\/(?:([^/]+?))\/?$/i,
  handle: Function,
  route: { path: '/product/:id', stack: [Object], methods: [Object] }
}, {
  regexp: /^\/product\/(?:([^/]+?))\/?$/i,
  handle: Function,
  route: { path: '/product/:id', stack: [Object], methods: [Object] }
}, {
  regexp: /^\/products\/?$/i,
  handle: Function,
  route: { path: '/products', stack: [Object], methods: [Object] }
}];

// Accessing a route here requires iterating through all the layers in the array.
// Now say the stack were an object
var stack = {
  '^\/products\/?$': {
    handle: Function,
    route: { path: '/products', stack: [Object], methods: [Object] }
  }
};

Accessing a specific middleware in the object stack should be faster by skipping over all the inapplicable middlewares. This approach could go with either a path -> Array of handlers or path -> method type -> Array of handlers. Depending on whether the user has lots of handlers on one path, it may make sense to switch to filtered by method, but that requires some more thought. This is just a general inquiry into optimizing routing in Express. Any thoughts? I looked for some issues with ideas like this but I didn't find anything specific, so I decided to open one here to get some discussion going.

@dougwilson
Copy link
Contributor

You would no longer be able to have the same path exist in multiple positions in the stack, would you? If not, then we would not be able to do this type of conversion. If so, please make a PR against the 5.x branch and post benchmarks (if the purpose is that it's faster).

@dougwilson
Copy link
Contributor

I'm fine with whatever thoughts people have in general :) And I know code+benchmarks speaks very well in general, so PRs (even incomplete ones) are welcome here and can be discussed :)

@ilanbiala
Copy link
Author

@dougwilson do you mean having two handlers for a route like GET /abc? I think having an array-based stack for each path key would still work, just limit the routes that have to be iterated over before getting to a matching one, no? Say you have two handlers matching GET /abc, then that would just be:

'abc': [{
  handler: Function
}, {
  handler: Function
}]

You would just create the stack by doing:

app.get('/abc', handler);
// this would execute something like
if (stack[path]) {
  stack[path].push(new Layer());
} else {
  stack[path] = [new Layer()];
}
// stack = {'abc': [{ handler: Function }] }
app.get('/abc', handler); // stack = {'abc': [{ handler: Function }, { handler: Function }] }

@ilanbiala
Copy link
Author

Maybe I'm missing the nuances, but I tried to look around the source code a little bit to understand Express routing a little bit more and see what can be done to make it faster, if possible.

@dougwilson
Copy link
Contributor

Gotcha. What you are suggesting would break the following, if I'm understanding you:

app.get('/stuff', function (req, res, next) {
  // do work, realize this is not the handler for req
  next()
})
app.use(some_middleware)
app.get('/stuff', function (req, res, next) {
  // fallback GET /stuff route
})

Even then, in general, what you are describing sounds to be like you just want to use app.route, which will group all the methods and it's inner middlewards together into a "mini stack" that can easily be skipped over.

We had tried in the past to make is so if you did app.get(path, fn1) and then app.get(path, fn2), they would group together where fn1 was declared, but that broke so, so many users, so we undid that grouping behavior.

@dougwilson
Copy link
Contributor

Maybe I'm missing the nuances

As for this, this is why I encourage you to start hacking on Express :) We have a very large test suite, so as you are tweaking, you'll know if you're tripping on a nuance or not, based on causing tests to fail. This is the easiest way to get into Express core hacking, i.m.o. It's where even I started :)

@ilanbiala
Copy link
Author

@dougwilson you are right, it would break that case. The issue I see is that if you have a handler for /abc declared at the end of the stack then the router has to iterate all the way to the end before actually letting the handler kick in. What about something like:

var stack = [{
  'abc': {},
  'xyz': {}
  // here would be the handler for a bunch of routes declared before the first app.use
}, {
  handler: Function
  // here would be an app.use handler
}, {
  'abc': {},
  'def': {} 
  // here would be some more routes declared to happen after an app.use
}];

So the stack would be iterated over like this:

// on request
stack.forEach(function(handler) {
  if (handler.path) { // catches one-off app handlers in the order they are added
    // handle URL using this handler
  } else {
    // assume it's an object of route paths
    see if a handler for URL exists, if so, call it
  }
});

@dougwilson
Copy link
Contributor

I really cannot comment on speculation :) Please, please, feel free to make a PR so you can play with Express and determine all the edge cases you need to handle, etc. And if the argument is being made for performance, we would really like to see a benchmark showing this.

Your new stack is pretty much describing the already existing app.route feature, which is highly recommend. You should never be using app.get/app.post unless you need it--always do a single app.route and group everything there.

@dougwilson
Copy link
Contributor

We also need to be sure that routes always run in the correct order. i.e. we need multiple different paths that match the same req.url (because they are regular expressions) to always match in the same order. JavaScript does not guarantee the ordering of keys in an object--remember, people are actually using this router in web browsers.

@dougwilson
Copy link
Contributor

Here is an example, if it helps :)

app.get('/books/:id', function (req, res, next) {
  // do work, realize this is not the handler for req
  next()
})
app.all('/books/*', something)
app.get('/books/:id', function (req, res, next) {
  // fallback getting a book
})

Note that internally, all app.all does is calls app.get, app.post, etc. for all methods. Theoretically a user could have also written it as app.get('/books/*', something). Also, I'm not saying the things are the best way to construct these types of routes, but it's currently allowed, because we basically have the philosophy that anything you want to do is possible. We even let you use your own router implementation, which is what LoopBack is doing and anyone can :) Just create your own function that takes req, res, next, call it a router, attach everything on there and then app.use() it.

@Mickael-van-der-Beek
Copy link

If such a refactoring is a possibility, wouldn't a trie be a better idea in terms of performance ?

RegExp paths could be expressed as state-machines or modelled as an array like now.

@ilanbiala
Copy link
Author

Trie? How could it be a state machine for the routes?

@dougwilson
Copy link
Contributor

There hasn't been much discussion in here for a couple months. Any thoughts on issuing a PR to discuss over? Otherwise, I may need to close this for becoming stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants