Optimize URL building #1281
|
Wow, thanks for looking into this. Building URLs definitely needs to be faster, and this is an interesting approach. However, I need to be able to maintain this, which means I need to understand what's going on. Would you give at least a basic explanation of what this does? |
|
Heh, exactly what I thought - this looks like black magic, which is something I usually like, but from a quick look it wasn't obvious to me what you are doing there. Something with assembling Python bytecode yourself? |
I'll give it a go! @ThiefMaster is right: I'm emitting Python bytecode for a function (well, actually two) per The bytecode for which is an awful lot like the code you'd get by compiling this:

    def builder(x, y, **kwargs):
        return ('',
                ''.join(('/a/',
                         self._converters['x'].to_url(x),
                         '/',
                         self._converters['y'].to_url(y),
                         '?' if kwargs else '',
                         url_encode(kwargs, [lots of stuff]) if kwargs else '')))

but with one less jump, and with all the information that's known ahead-of-time loaded into the function as literals. I chose this route rather than, say, generating a Python AST (let's not even talk about generating literal source) and compiling that because it makes certain things easier: for example, Python 3.6 has formatted string literals, and the bytecode compiler can use their

As for altogether saner options like generating a list of things to do and executing them with regular Python, I considered it, then realised that list is basically what Python bytecode already does. With that said, I'm sure a middle ground is possible, and I recognize that sometimes fun has to give way to maintainability :) |
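For comparison, the "list of things to do" option mentioned above might look roughly like the sketch below. This is not the PR's implementation: `make_builder`, the `(is_dynamic, data)` trace format, and the `to_url` helper are all hypothetical names chosen for illustration, and the standard library's `quote` and `urlencode` stand in for Werkzeug's converters and `url_encode`.

```python
from urllib.parse import quote, urlencode


def to_url(value):
    # Hypothetical stand-in for a converter's to_url() method.
    return quote(str(value), safe="")


def make_builder(trace, converters):
    """Return a builder that walks a list of (is_dynamic, data) steps.

    The trace format is an assumption made for this sketch.
    """
    arg_names = {data for is_dynamic, data in trace if is_dynamic}

    def builder(**values):
        parts = []
        for is_dynamic, data in trace:
            if is_dynamic:
                # Dynamic step: convert the named argument.
                parts.append(converters[data](values[data]))
            else:
                # Static step: a literal piece of the URL.
                parts.append(data)
        url = "".join(parts)
        # Anything that is not a rule argument becomes the query string.
        extra = {k: v for k, v in values.items() if k not in arg_names}
        if extra:
            url += "?" + urlencode(sorted(extra.items()))
        return "", url

    return builder


build = make_builder(
    [(False, "/a/"), (True, "x"), (False, "/"), (True, "y")],
    {"x": to_url, "y": to_url},
)
print(build(x=1, y=2))
print(build(x=1, y="b c", page=3))
```

Every call pays for the loop and the tuple unpacking, which is the interpretive overhead that emitting a dedicated function per rule is meant to avoid.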
|
This is really fantastic. But I don't agree with "I recognize that sometimes fun has to give way to maintainability". Werkzeug is widely used by many people, so I think maintainability is very important. |
|
That's what I meant: maintainability is more important than fun. |
|
Just two cents from Random Internet Passerby: Werkzeug is widely used by many people, who use it specifically so that Werkzeug can solve the heck out of common WSGI problems. It can afford some complexity to solve those problems well. I don't think a bytecode generator is going too far, if there is a practical benefit. |
44ce9b1 to 261c7c1
|
Hi, I'm the person who originally nerdsniped edk into writing this, and I have some interest in it being merged upstream. I'm a contributor to nyaa, a Flask application codebase which has a fairly large deployment that currently ranks in the low 700s of Alexa's top sites. As the code is written to be deployable even on a shoestring budget, any optimisations in hot paths are very much welcome.
|
|
First off: I was assuming that trying to destructure a

    >>> def x(**kw): print(kw)
    ...
    >>> x(**MultiDict({'a': 3}))
    {'a': [3]}

which isn't nearly as fatal for my approach. Anyway, the upshot is that

    >>> a_rule.build({'x': 1, 'y': 2})
    ('', '/a/1/2')
    >>> a_rule.build(MultiDict({'x': 1, 'y': 2}))
    >>>

or yield silly URLs like

So, contrary to what I said in the OP, this wouldn't rule out #724, but it would take a little extra work to deal with it. |
|
Just to make sure I understand:

    >>> a = MultiDict({'a': 1, 'b': [2, 3]})
    >>> a
    MultiDict([('a', 1), ('b', 2), ('b', 3)])
    # build should do this
    >>> to_dict(a)
    {'a': 1, 'b': [2, 3]}

That's how the PR works now. |
Yes. |
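The flattening described in that exchange can be sketched in plain Python. The name `to_dict` is taken from the comment above, but this body is an assumption rather than the PR's code, and it accepts a list of `(key, value)` pairs instead of a `MultiDict` so that it runs without Werkzeug installed.

```python
def to_dict(pairs):
    """Collapse (key, value) pairs into a plain dict.

    Keys that occur once keep a bare value; keys that repeat get a list.
    Hypothetical helper, written only to illustrate the behaviour above.
    """
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return {
        key: values[0] if len(values) == 1 else values
        for key, values in grouped.items()
    }


print(to_dict([("a", 1), ("b", 2), ("b", 3)]))  # {'a': 1, 'b': [2, 3]}
```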
|
I'm going to clean up that other PR, merge both locally and run the tests, then merge both here if everything works. |
    self._trace.append((False, '/'))

    self._build = self.compile_builder(False)
    self._build_u = self.compile_builder(True)
        return quote


    fast_url_quote = _make_url_encoder()
        fn = types.FunctionType(co, {}, None, self.argdefs)
        return fn

    def compile_builder(self, append_unknown=True):
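The `types.FunctionType(co, {}, None, self.argdefs)` call in the snippet above wraps a code object in a callable. A minimal sketch of that mechanism follows, with one loud assumption: the code object here comes from `compile()` on a hypothetical source string, whereas the PR assembles the bytecode itself.

```python
import types

# Hypothetical source, used only to obtain a code object for the demo.
source = "def builder(x, y):\n    return '', '/a/' + str(x) + '/' + str(y)\n"

module_code = compile(source, "<generated>", "exec")
# The function body is stored as a nested code object in the module's constants.
func_code = next(
    const for const in module_code.co_consts if isinstance(const, types.CodeType)
)

# Arguments: code, globals, name (None keeps the code object's own name),
# and argdefs, a tuple of defaults for the trailing positional parameters.
builder = types.FunctionType(func_code, {"str": str}, None, (2,))

print(builder.__name__)
print(builder(1))     # y falls back to the default from argdefs
print(builder(1, 5))
```

Passing `str` explicitly in the globals dict keeps the sketch independent of how a given interpreter version resolves builtins for a function with otherwise empty globals.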
        return domain_part, url
        return self._build(**values)
    except TypeError:
c935441 to 7d10b58
|
OK, I think this is in a good place at this point. Would you rebase this to squash all the intermediate work into one commit against current master? (You can try it out on a branch first if you want to make sure it will rebase cleanly.) |
f035a7f to f4c8d12
|
@davidism alright, rebased :) |
|
Just curious, since someone reminded me I forgot to ask this during review: is there a reason not to just eval a formatted string to produce the functions? Are we gaining something by taking on the complexity of bytecode generation? |
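For readers wondering what the alternative in this question would involve, here is a rough sketch of generating source text and executing it. Everything in it is an assumption made for illustration: `compile_builder_from_source`, the `(is_dynamic, data)` trace format, and the `to_url` helper are hypothetical, and the sketch handles only path arguments, not query strings.

```python
import keyword
from urllib.parse import quote


def to_url(value):
    # Hypothetical stand-in for a converter's to_url() method.
    return quote(str(value), safe="")


def compile_builder_from_source(trace):
    """Build a URL function by formatting Python source and exec-ing it."""
    args, pieces = [], []
    for is_dynamic, data in trace:
        if is_dynamic:
            # Argument names are spliced into source, so they must be checked.
            if not data.isidentifier() or keyword.iskeyword(data):
                raise ValueError(f"not a usable argument name: {data!r}")
            args.append(data)
            pieces.append(f"to_url({data})")
        else:
            # repr() turns the static text into a safely quoted literal.
            pieces.append(repr(data))
    source = (
        f"def builder({', '.join(args)}):\n"
        f"    return '', ''.join(({', '.join(pieces)},))\n"
    )
    namespace = {"to_url": to_url}
    exec(source, namespace)
    return namespace["builder"], source


builder, source = compile_builder_from_source(
    [(False, "/a/"), (True, "x"), (False, "/"), (True, "y")]
)
print(source)
print(builder(x=1, y=2))
```

Even this small version has to validate names and quote literals before splicing them into source text.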
|
I guess the reasons boil down to two general ones: generating source code that does what we want is deceptively hard, and generated bytecode is more optimized than what Python makes. It's difficult to get Python to not be dynamic :) Specifically:
|
|
Just released Werkzeug 0.15 with this. |
<CounterPillow> if Python was as good as D, url_for would just be a compile time template.
I appreciate how silly this is going to look, but it does make URL building much faster.
(test.py)
I have some ideas for related tests, but don't want to put any more time into this unless it might actually go somewhere. It passes the existing tests on py2 and 3, at least on a clear Thursday night with a waxing crescent moon.
Worth noting up front: this approach is mutually exclusive with anything like #724