Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak in Rule function builder #1521

Closed
wants to merge 1 commit into from

Conversation

@asottile
Copy link
Contributor

asottile commented Apr 24, 2019

Resolves #1520

This approach is slightly slower than the existing approach but fixes the memory leak.

The core problem of the memory leak is as follows, I'll be using the disassembler to explain this, hopefully this explanation is somewhat accessible ;)

Here's an example disassembly from the original functions before I changed them:

The rule I'm using throughout is Rule('/a/<string:b>')

dis.dis(rule._build)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f48f0359320>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 BUILD_STRING             2
             12 BUILD_TUPLE              2
             14 RETURN_VALUE
(Pdb) dis.dis(rule._build_unknown)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f48f0359320>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 LOAD_FAST                1 (.keyword_arguments)
             12 JUMP_IF_TRUE_OR_POP     20
             14 LOAD_CONST               0 ('')
             16 DUP_TOP
             18 JUMP_FORWARD            10 (to 30)
        >>   20 LOAD_CONST               3 (functools.partial(<function url_encode at 0x7f48f0618730>, charset='utf-8', sort=False, key=None))
             22 ROT_TWO
             24 CALL_FUNCTION            1
             26 LOAD_CONST               4 ('?')
             28 ROT_TWO
        >>   30 BUILD_STRING             4
             32 BUILD_TUPLE              2
             34 RETURN_VALUE

This ~roughly equates to the following functions (I'm using python3.6, the output is slightly different before that due to f-strings but it's not important for this discussion)

def f(b):  # _build
    return ('', f'{"/a/"}{to_url(b)}')   # to_url is loaded from `co_consts`

def f(b, **kwargs):  # _build_unknown
    if kwargs:
        q, params = '?', url_encode(kwargs)  # url_encode is loaded from `co_consts`
    else:
        q, params = '', ''

    return ('', f'{"/a/"}{to_url(b)}{q}{params}')  # to_url is loaded from `co_consts`

(in fact, here's the disassembly of those function(s) -- you can mostly ignore the FORMAT_VALUE opcodes, those are used to prepare f-strings and are noops for strings (which is what we're dealing with!)):

>>> dis.dis(f)  # _build
  9           0 LOAD_CONST               1 ('')
              2 LOAD_CONST               2 ('/a/')
              4 FORMAT_VALUE             0
              6 LOAD_GLOBAL              0 (to_url)
              8 LOAD_FAST                0 (b)
             10 CALL_FUNCTION            1
             12 FORMAT_VALUE             0
             14 BUILD_STRING             2
             16 BUILD_TUPLE              2
             18 RETURN_VALUE
>>> dis.dis(f)  # _build_unknown
 26           0 LOAD_FAST                1 (kwargs)
              2 POP_JUMP_IF_FALSE       20

 27           4 LOAD_CONST               1 ('?')
              6 LOAD_GLOBAL              0 (url_encode)
              8 LOAD_FAST                1 (kwargs)
             10 CALL_FUNCTION            1
             12 ROT_TWO
             14 STORE_FAST               2 (q)
             16 STORE_FAST               3 (params)
             18 JUMP_FORWARD             8 (to 28)

 29     >>   20 LOAD_CONST               4 (('', ''))
             22 UNPACK_SEQUENCE          2
             24 STORE_FAST               2 (q)
             26 STORE_FAST               3 (params)

 31     >>   28 LOAD_CONST               2 ('')
             30 LOAD_CONST               3 ('/a/')
             32 FORMAT_VALUE             0
             34 LOAD_GLOBAL              1 (to_url)
             36 LOAD_FAST                0 (b)
             38 CALL_FUNCTION            1
             40 FORMAT_VALUE             0
             42 LOAD_FAST                2 (q)
             44 FORMAT_VALUE             0
             46 LOAD_FAST                3 (params)
             48 FORMAT_VALUE             0
             50 BUILD_STRING             4
             52 BUILD_TUPLE              2
             54 RETURN_VALUE

Anyway, the issue with the original compiled function are these opcodes:

              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f48f0359320>>)
        >>   20 LOAD_CONST               3 (functools.partial(<function url_encode at 0x7f48f0618730>, charset='utf-8', sort=False, key=None))

In particular, this opcode causes co_consts to contain these values:

(Pdb) rule._build.__code__.co_consts
('', '/a/', <bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f48f0359320>>)

Usually, co_consts should only contain constants! In particular, when the garbage collection machinery of cpython considers function objects, it doesn't consider the constants as potential trash because normally they're just constants (I don't have a citation for this, it's mostly based on what I observed while poking around with gc.get_referrers as can be seen here: #1520 (comment) -- notably you can see the tuple of the co_consts but not that a code object refers to it)

Because of this, it cannot detect the cycle that's introduced:

Map -> Rule -> code object -> co_consts tuple -> to_url method -> UnicodeConverter -> Map

And because it doesn't know about that, it can't collect the cycle during cyclical gc

The fix to this is to take the non-constant constants out of co_consts and instead refer to the objects directly.

My new versions of these functions adds a self argument to the generated methods, and attaches them as bound methods to the Rule class on creation. The converters are then looked up on the class as they're called. Here's the replacement disassembly (and my interpretation of what that compiles to):

>>> dis.dis(rule._build)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_FAST                0 (self)
              6 LOAD_ATTR                0 (_converters)
              8 LOAD_CONST               2 ('b')
             10 BINARY_SUBSCR
             12 LOAD_ATTR                1 (to_url)
             14 LOAD_FAST                1 (b)
             16 CALL_FUNCTION            1
             18 BUILD_STRING             2
             20 BUILD_TUPLE              2
             22 RETURN_VALUE
>>> dis.dis(rule._build_unknown)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_FAST                0 (self)
              6 LOAD_ATTR                0 (_converters)
              8 LOAD_CONST               2 ('b')
             10 BINARY_SUBSCR
             12 LOAD_ATTR                1 (to_url)
             14 LOAD_FAST                1 (b)
             16 CALL_FUNCTION            1
             18 LOAD_FAST                2 (.keyword_arguments)
             20 JUMP_IF_TRUE_OR_POP     28
             22 LOAD_CONST               0 ('')
             24 DUP_TOP
             26 JUMP_FORWARD            12 (to 40)
        >>   28 LOAD_FAST                0 (self)
             30 LOAD_ATTR                2 (_encode_query_vars)
             32 ROT_TWO
             34 CALL_FUNCTION            1
             36 LOAD_CONST               3 ('?')
             38 ROT_TWO
        >>   40 BUILD_STRING             4
             42 BUILD_TUPLE              2
             44 RETURN_VALUE
def f(self, b):  # _build
    return ('', f'{"/a/"}{self._converters["b"](b)}')

def f(b, **kwargs):  # _build_unknown
    if kwargs:
        q, params = '?', self._encode_query_vars(kwargs)
    else:
        q, params = '', ''

    return ('', f'{"/a/"}{self._converters["b"](b)}{q}{params}')

So we basically replaced the illegal co_consts lookup with lookups on self


Running the perf benchmark from the original PR, this seems to make these ever-so-slightly slower. (I used best-of-5 here)

$ git checkout fix_memory_leak
...
$ python test.py
0.11118158600038441
$ git checkout master
...
$ python test.py
0.10680924100051925

~4% slower (might as well be error noise, right?)

@asottile

This comment has been minimized.

Copy link
Contributor Author

asottile commented Apr 24, 2019

and if you're curious about the process and how I got there, I recorded the stream I did this on: https://www.youtube.com/watch?v=g2fxGOzzdAI

@edk0

This comment has been minimized.

Copy link
Member

edk0 commented Apr 24, 2019

Nice catch.

Slight ramble warning: I think I'd slightly prefer to break the cycle in a different way—the original reason I did this was that URL building did lots of slightly slow things, any of which on its own you probably wouldn't have called slow, so reintroducing one of them feels weird. Though it'd take 49 more issues and 4% slowdowns for us to be back where we started; maybe that's okay.

Concretely, this breaks if a URL parameter is named self.

@asottile

This comment has been minimized.

Copy link
Contributor Author

asottile commented Apr 24, 2019

to be clear, this doesn't break the cycle, it just makes it garbage collectable

@asottile

This comment has been minimized.

Copy link
Contributor Author

asottile commented Apr 24, 2019

Fixed self as well, good catch

@edk0

This comment has been minimized.

Copy link
Member

edk0 commented Apr 24, 2019

By the way, we use co_consts for defaults too, which don't need to be constant either. Personally I don't imagine there are going to be many more bugs like this, so I don't feel any need to change that, but if you're in the mood... :)


# ensure that the garbage collection has had a chance to collect cyclic
# objects
for _ in range(5):

This comment has been minimized.

Copy link
@davidszotten

davidszotten Apr 24, 2019

Contributor

gc.collect() returns the number of objects collected so i think this can be replaced by

while gc.collect():
    pass

This comment has been minimized.

Copy link
@davidism

davidism Apr 24, 2019

Member

That would cause the test to never stop instead of failing though. 5 seems like a safe bet.

This comment has been minimized.

Copy link
@asottile

asottile Apr 24, 2019

Author Contributor

gc.collect() returns the number of objects collected so i think this can be replaced by

while gc.collect():
    pass

Not necessarily, the first generation could collect zero objects (fast gc) and then later generations could collect cyclic gc

@asottile

This comment has been minimized.

Copy link
Contributor Author

asottile commented Apr 24, 2019

By the way, we use co_consts for defaults too, which don't need to be constant either. Personally I don't imagine there are going to be many more bugs like this, so I don't feel any need to change that, but if you're in the mood... :)

aren't those all strings?

@edk0

This comment has been minimized.

Copy link
Member

edk0 commented Apr 24, 2019

I guess we pre-bake them, so yes. Oops :)

@asottile

This comment has been minimized.

Copy link
Contributor Author

asottile commented Apr 24, 2019

I guess we pre-bake them, so yes. Oops :)

cool cool, I had a hard time figuring out the defaults bits

@asottile

This comment has been minimized.

Copy link
Contributor Author

asottile commented Apr 25, 2019

because I wanted to see if I could replace the opcode building with ast I went ahead and hacked that together: #1524

it produces ~nearly identical code and from my performance testing it matches the same perf as this PR

my hope was that it's a little easier to follow / maintain

@davidism

This comment has been minimized.

Copy link
Member

davidism commented Apr 26, 2019

closing in favor of #1524

@davidism davidism closed this Apr 26, 2019
@asottile asottile deleted the asottile:fix_memory_leak branch Apr 26, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.