Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize URL building #1281

Merged
merged 1 commit into from May 23, 2018
Merged

Optimize URL building #1281

merged 1 commit into from May 23, 2018

Conversation

edk0
Copy link
Contributor

@edk0 edk0 commented Apr 20, 2018

<CounterPillow> if Python was as good as D, url_for would just be a compile time template.

I appreciate how silly this is going to look, but it does make URL building much faster.

$ git checkout -q upstream/master && python3 test.py
0.5681149999982154
$ git checkout -q optimize-building && python3 test.py
0.0826580000029935

(test.py)

I have some ideas for related tests, but don't want to put any more time into this unless it might actually go somewhere. It passes the existing tests on py2 and 3, at least on a clear Thursday night with a waxing crescent moon.

Worth noting up front: this approach is mutually exclusive with anything like #724

@davidism
Copy link
Member

Wow, thanks for looking into this. Building URLs definitely needs to be faster, and this is an interesting approach.

However, I need to be able to maintain this, which means I need to understand what's going on. Would you give at least a basic explanation of what this does?

@ThiefMaster
Copy link
Member

Heh, exactly what I thought - this looks like black magic, which is something I usually like, but from a quick look it wasn't obvious to me what you are doing there. Something with assembling Python bytecode yourself?

@edk0
Copy link
Contributor Author

edk0 commented Apr 20, 2018

Would you give at least a basic explanation of what this does?

I'll give it a go!

@ThiefMaster is right—I'm emitting python bytecode for a function (well, actually two) per Rule that builds its return value in an optimal way, or at least the best way I could see. Apart from the obvious advantages of pre-compiling something, this has the neat (and, from what I've seen, probably quite significant) side effect of destructuring the values dict, which is a pain otherwise.

The bytecode for a_rule from above, /a/<int:x>/<y>, looks like this (though there's also a shorter version used when append_unknown is false). I've omitted some extraneous detail, like memory addresses, from the pastes a bit for brevity:

  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method NumberConverter.to_url>)
              6 LOAD_FAST                0 (x)
              8 CALL_FUNCTION            1
             10 LOAD_CONST               3 ('/')
             12 LOAD_CONST               4 (<bound method BaseConverter.to_url>)
             14 LOAD_FAST                1 (y)
             16 CALL_FUNCTION            1
             18 LOAD_FAST                2 (.keyword_arguments)
             20 JUMP_IF_TRUE_OR_POP     28
             22 LOAD_CONST               0 ('')
             24 DUP_TOP
             26 JUMP_FORWARD            10 (to 38)
        >>   28 LOAD_CONST               5 (functools.partial(url_encode, [lots of stuff]))
             30 ROT_TWO
             32 CALL_FUNCTION            1
             34 LOAD_CONST               6 ('?')
             36 ROT_TWO
        >>   38 BUILD_STRING             6
             40 BUILD_TUPLE              2
             42 RETURN_VALUE

which is an awful lot like the code you'd get by compiling this:

def builder(x, y, **kwargs):
    return ('',
            ''.join(('/a/',
                     self._converters['x'].to_url(x),
                     '/',
                     self._converters['y'].to_url(y),
                     '?' if kwargs else '',
                     url_encode(kwargs, [lots of stuff]) if kwargs else '')))

but with one less jump, and with all the information that's known ahead-of-time loaded into the function as literals. I chose this route rather than, say, generating a Python AST (let's not even talk about generating literal source) and compiling that because it makes certain things easier: for example, Python 3.6 has formatted string literals, and the bytecode compiler can use their BUILD_STRING opcode just as easily as the ''.join technique.

As for altogether saner options like generating a list of things to do and executing them with regular Python, I considered it, then realised that list is basically what Python bytecode already does. With that said, I'm sure a middle ground is possible, and I recognize that sometimes fun has to give way to maintainability :)

@lepture
Copy link
Contributor

lepture commented Apr 21, 2018

This is really fantastic. But I don't agree with " I recognize that sometimes fun has to give way to maintainability". Werkzeug is widely used by many people, I think maintainability is very important.

@edk0
Copy link
Contributor Author

edk0 commented Apr 21, 2018

That's what I meant—maintainability is more important than fun

@agrif
Copy link

agrif commented Apr 21, 2018

Just two cents from Random Internet Passerby: Werkzeug is widely used by many people, who use it specifically so that Werkzeug can solve the heck out of common WSGI problems. It can afford some complexity to solve those problems well. I don't think a bytecode generator is going too far, if there is a practical benefit.

@edk0 edk0 force-pushed the optimize-building branch 3 times, most recently from 44ce9b1 to 261c7c1 Compare April 21, 2018 13:39
@CounterPillow
Copy link

Hi,

I'm the person who originally nerdsniped edk into writing this, and I have some interest in it being merged into upstream.

I'm a contributor to nyaa, a flask application codebase which has a fairly large deployment that currently ranks in the low 700s of Alexa's top sites. As the code is written to be deployable even if on a shoestring budget, any optimisations in hot paths are very much welcome.

url_for is such a hot path. Predictably, the majority of the server time is taken up by database queries, but when profiling a page, url_for is usually near the top as well. Pregenerating url_for calls is of course always a possibility, but making url_for faster is still the easier solution considering what it is.

@davidism
Copy link
Member

@edk0 Why can't #724 be used with this?

@edk0
Copy link
Contributor Author

edk0 commented May 22, 2018

First off: I was assuming that trying to destructure a MultiDict with ** would simply fail, but testing it now I see that a roundtrip through MultiDict and ** wraps everything in []:

>>> def x(**kw): print(kw)
... 
>>> x(**MultiDict({'a': 3}))
{'a': [3]}

which isn't nearly as fatal for my approach.

Anyway, the upshot is that MultiDict is not a problem as long as Rule.build is internal/undocumented, and MapAdapter.build (lossily) converts MultiDict down to dict. If that stops happening, my compiled builder functions will see lists of things where plain things were expected, and either fail to convert:

>>> a_rule.build({'x': 1, 'y': 2})
('', '/a/1/2')
>>> a_rule.build(MultiDict({'x': 1, 'y': 2}))
>>>

or yield silly URLs like /a/%5B1%5D, depending on the failure mode for the particular converter.

So, contrary to what I said in the OP, this wouldn't rule out #724, but it would take a little extra work to deal with it.

@davidism
Copy link
Member

Just to make sure I understand: MultiDict, when passed a dict, will convert single values (as opposed to lists) to lists of length 1. As long as the reverse happens during build, everything will work?

>>> a = MultiDict({'x': 1, 'y': [2, 3]})
>>> a
MultiDict([('a', 1), ('b', 2), ('b', 3)])

# build should do this
>>> to_dict(a)
{'a': 1, 'b': [2, 3]}

That's how the PR works now.

@edk0
Copy link
Contributor Author

edk0 commented May 22, 2018

As long as the reverse happens during build, everything will work?

Yes.

@davidism
Copy link
Member

I'm going to clean up that other PR, merge both locally and run the tests, then merge both here if everything works.

@davidism davidism added this to the 0.15 milestone May 22, 2018
@@ -741,6 +745,9 @@ def _build_regex(rule):
if not self.is_leaf:
self._trace.append((False, '/'))

self._build = self.compile_builder(False)
self._build_u = self.compile_builder(True)

This comment was marked as off-topic.

return quote


fast_url_quote = _make_url_encoder()

This comment was marked as off-topic.

This comment was marked as off-topic.

fn = types.FunctionType(co, {}, None, self.argdefs)
return fn

def compile_builder(self, append_unknown=True):

This comment was marked as off-topic.

This comment was marked as off-topic.


return domain_part, url
return self._build(**values)
except TypeError:

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@edk0 edk0 force-pushed the optimize-building branch 3 times, most recently from c935441 to 7d10b58 Compare May 23, 2018 16:40
@davidism
Copy link
Member

davidism commented May 23, 2018

OK, I think this is in a good place at this point. Would you rebase this to squash all the intermediate work into one commit against current master?

git fetch origin  # or whatever you called this remote
git rebase -i origin/master
git push -f

(You can try it out on a branch first if you want to make sure it will rebase cleanly.)

@edk0 edk0 force-pushed the optimize-building branch 3 times, most recently from f035a7f to f4c8d12 Compare May 23, 2018 17:11
@edk0
Copy link
Contributor Author

edk0 commented May 23, 2018

@davidism alright, rebased :)

@davidism davidism merged commit 6cb8227 into pallets:master May 23, 2018
davidism added a commit that referenced this pull request May 23, 2018
@davidism
Copy link
Member

Just curious, since someone reminded me I forgot to ask this during review: is there a reason not to just eval a formatted string to produce the functions? Are we gaining something by taking on the complexity of bytecode generation?

@edk0
Copy link
Contributor Author

edk0 commented May 24, 2018

I guess the reasons boil down to two general ones: generating source code that does what we want is deceptively hard, and generated bytecode is more optimized than what Python makes. It's difficult to get Python to not be dynamic :)

Specifically:

  • There is no literal syntax for self._converters['x'].to_url (or ''.join, or url_encode). You can get close with an argument with a default, but its name will be unavailable as a real argument name.
  • The bytecode generator can (ab)use the BUILD_STRING opcode to avoid a function call. I'm not sure if it's possible to get this any other way (you could generate an f-string, but then you'd get all the extra stuff f-strings have).
  • We know things Python doesn't: for example, when Python compiles my example code above it loads kwargs three times and tests it twice, though once each is sufficient.

@davidism
Copy link
Member

Just released Werkzeug 0.15 with this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants