Fix memory leak in Rule function builder #1521
and if you're curious about the process and how I got there, I recorded the stream I did this on: https://www.youtube.com/watch?v=g2fxGOzzdAI
Nice catch. Slight ramble warning: I think I'd slightly prefer to break the cycle in a different way. The original reason I did this was that URL building did lots of slightly slow things, any of which on its own you probably wouldn't have called slow, so reintroducing one of them feels weird. Though it'd take 49 more issues and 4% slowdowns for us to be back where we started; maybe that's okay. Concretely, this breaks if a URL parameter is named
to be clear, this doesn't break the cycle, it just makes it garbage collectable
Fixed
By the way, we use
# ensure that the garbage collection has had a chance to collect cyclic
# objects
for _ in range(5):
gc.collect() returns the number of objects collected, so I think this can be replaced by
while gc.collect():
    pass
That would cause the test to never stop instead of failing though. 5 seems like a safe bet.
Quoting: "gc.collect() returns the number of objects collected so i think this can be replaced by while gc.collect(): pass"
Not necessarily: the first generation could collect zero objects (fast gc), and then later generations could collect the cyclic garbage.
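For reference, here is a minimal sketch of the bounded-collection pattern this thread is debating. It is not the test from this PR: `Node`, `make_cycle` and `collect_bounded` are hypothetical names used only for illustration. The fixed pass count means a leak shows up as a failed check instead of a test that never stops.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object that takes part in a reference cycle."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects that refer to each other; the weak reference lets us
    # observe whether the collector eventually frees them.
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


def collect_bounded(passes=5):
    # A fixed number of passes: the loop always terminates, so a leak
    # surfaces as a failed assertion rather than a hanging test.
    for _ in range(passes):
        gc.collect()


ref = make_cycle()
collect_bounded()
cycle_was_freed = ref() is None
print(cycle_was_freed)
```

A test would then assert `cycle_was_freed`; the number of passes (5 here, as in the diff) is a judgment call rather than a guarantee.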
aren't those all strings?
I guess we pre-bake them, so yes. Oops :)
cool cool, I had a hard time figuring out the defaults bits
Because I wanted to see if I could replace the opcode building with ast, I went ahead and hacked that together: #1524. It produces ~nearly identical code, and from my performance testing it matches the perf of this PR. My hope was that it's a little easier to follow / maintain.
closing in favor of #1524
Resolves #1520
This approach is slightly slower than the existing approach but fixes the memory leak.
The core problem behind the memory leak is as follows. I'll be using the disassembler to explain it; hopefully the explanation is somewhat accessible ;)
Here's an example disassembly from the original functions before I changed them:
The rule I'm using throughout is Rule('/a/<string:b>')
This ~roughly equates to the following functions (I'm using python3.6; the output is slightly different before that due to f-strings, but that's not important for this discussion)
(in fact, here's the disassembly of those function(s). You can mostly ignore the FORMAT_VALUE opcodes: those are used to prepare f-strings and are noops for strings, which is what we're dealing with!):

    >>> dis.dis(f)  # _build
      9           0 LOAD_CONST               1 ('')
                  2 LOAD_CONST               2 ('/a/')
                  4 FORMAT_VALUE             0
                  6 LOAD_GLOBAL              0 (to_url)
                  8 LOAD_FAST                0 (b)
                 10 CALL_FUNCTION            1
                 12 FORMAT_VALUE             0
                 14 BUILD_STRING             2
                 16 BUILD_TUPLE              2
                 18 RETURN_VALUE

    >>> dis.dis(f)  # _build_unknown
     26           0 LOAD_FAST                1 (kwargs)
                  2 POP_JUMP_IF_FALSE       20

     27           4 LOAD_CONST               1 ('?')
                  6 LOAD_GLOBAL              0 (url_encode)
                  8 LOAD_FAST                1 (kwargs)
                 10 CALL_FUNCTION            1
                 12 ROT_TWO
                 14 STORE_FAST               2 (q)
                 16 STORE_FAST               3 (params)
                 18 JUMP_FORWARD             8 (to 28)

     29     >>   20 LOAD_CONST               4 (('', ''))
                 22 UNPACK_SEQUENCE          2
                 24 STORE_FAST               2 (q)
                 26 STORE_FAST               3 (params)

     31     >>   28 LOAD_CONST               2 ('')
                 30 LOAD_CONST               3 ('/a/')
                 32 FORMAT_VALUE             0
                 34 LOAD_GLOBAL              1 (to_url)
                 36 LOAD_FAST                0 (b)
                 38 CALL_FUNCTION            1
                 40 FORMAT_VALUE             0
                 42 LOAD_FAST                2 (q)
                 44 FORMAT_VALUE             0
                 46 LOAD_FAST                3 (params)
                 48 FORMAT_VALUE             0
                 50 BUILD_STRING             4
                 52 BUILD_TUPLE              2
                 54 RETURN_VALUE
Anyway, the issue with the original compiled function is these opcodes:
In particular, this opcode causes co_consts to contain these values:
Usually, co_consts should only contain constants! In particular, when the garbage collection machinery of CPython considers function objects, it doesn't consider the constants as potential trash, because normally they're just constants. (I don't have a citation for this; it's mostly based on what I observed while poking around with gc.get_referrers, as can be seen here: #1520 (comment). Notably, you can see the co_consts tuple, but not that a code object refers to it.)
Because of this, it cannot detect the cycle that's introduced:
Map -> Rule -> code object -> co_consts tuple -> to_url method -> UnicodeConverter -> Map
And because it doesn't know about that, it can't collect the cycle during cyclic gc.
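To make the "co_consts should only contain constants" point concrete, here is a small sketch that inspects a hand-written function of roughly the same shape as the generated builder. `build` is a hypothetical stand-in, not the generated code. The last line prints whether the collector tracks the code object; that is a CPython implementation detail that may vary by version and build, so it is shown as an observation, not a claim.

```python
import gc


def build(b):
    # Roughly the shape of a builder for Rule('/a/<string:b>'), written by
    # hand so that the compiler (not custom opcode assembly) fills co_consts.
    return "", "/a/" + str(b)


code = build.__code__

# For compiler-generated code, co_consts holds only immutable literals
# (strings, numbers, None, tuples of those, nested code objects).
const_types = sorted({type(c).__name__ for c in code.co_consts})
print(code.co_consts, const_types)

# Whether the cyclic collector tracks code objects is an implementation
# detail, so the result is printed rather than assumed.
print(gc.is_tracked(code))
```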
The fix for this is to take the non-constant constants out of co_consts and instead refer to the objects directly.
My new versions of these functions add a self argument to the generated methods and attach them as bound methods to the Rule class on creation. The converters are then looked up on the class as they're called. Here's the replacement disassembly (and my interpretation of what that compiles to):
So we basically replaced the illegal co_consts lookup with lookups on self.
Running the perf benchmark from the original PR, this seems to make these ever-so-slightly slower. (I used best-of-5 here)
~4% slower (might as well be error noise, right?)
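For anyone wanting to reproduce a best-of-5 comparison, a minimal sketch with `timeit` follows. It is not the benchmark from the original PR: `build_direct`, `build_via_attribute`, `Holder` and `best_of` are hypothetical stand-ins that only contrast a global lookup with an extra attribute lookup, and the numbers will vary by machine.

```python
import timeit


def to_url(value):
    return str(value)


class Holder:
    # Hypothetical object that owns the converter function.
    to_url = staticmethod(to_url)


holder = Holder()


def build_direct(b):
    # Calls the converter through a global name.
    return "", "/a/" + to_url(b)


def build_via_attribute(b):
    # Calls the converter through an extra attribute lookup.
    return "", "/a/" + holder.to_url(b)


def best_of(func, repeat=5, number=20_000):
    # Taking min() over the repeats reduces the effect of scheduling noise.
    return min(timeit.repeat(lambda: func("x"), repeat=repeat, number=number))


direct = best_of(build_direct)
indirect = best_of(build_via_attribute)
print(f"direct={direct:.4f}s attribute={indirect:.4f}s ratio={indirect / direct:.2f}")
```

Differences of a few percent sit close to run-to-run noise, so it is worth repeating the whole measurement a few times before reading much into the ratio.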