weakref to break cyclic references #394
CPython Memory Management
CPython's memory management is a two-pronged strategy. The first prong is reference counting - every time a reference to a Python object is created, a count is incremented on that object, and every time a reference goes away the count is decremented.

    a = SomeClass()  # count = 1
    b = a            # count = 2
    a = None         # count = 1
    b = None         # count = 0
Once the count is zero the object is unreachable, and it is immediately collected. This is just bookkeeping - there is no real computation involved in this memory management scheme.
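To make the counting concrete, here is a minimal sketch using `sys.getrefcount`. `SomeClass` is just a placeholder, and because `getrefcount` itself temporarily holds a reference to its argument, only the differences between readings are meaningful, not the absolute numbers:

```python
import sys


class SomeClass(object):  # placeholder class, purely for illustration
    pass


a = SomeClass()
base = sys.getrefcount(a)          # absolute value includes a temporary reference
b = a                              # one more name bound to the same object
after_alias = sys.getrefcount(a)
b = None                           # drop that name again
after_unbind = sys.getrefcount(a)

print(after_alias - base)    # 1
print(after_unbind - base)   # 0
```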
However, reference counting has an Achilles heel - cyclic references.
    class A(object):
        def __init__(self):
            self.b = B(self)


    class B(object):
        def __init__(self, a):
            self.a = a


    a = A()   # the A instance has *two* references: the one I
              # just made, and the one that the B instance holds.
    a = None  # B instance still holds a reference to A instance and vice versa!

This would be a memory leak if we didn't have some means to get around it.
This is where the second technique comes into play: garbage collection. Here we walk the reference graph of all the objects in the interpreter, looking for isolated parts of the graph. These must be cyclic references, and they can be removed. Garbage collection is much more computationally demanding than reference counting though, so it is run much less frequently.
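A rough sketch of that behaviour, reusing the `A`/`B` classes from above. The automatic collector is switched off here only so the timing is deterministic, and `probe` is a name I've made up for a weak reference that lets us check whether the `A` instance is still alive without keeping it alive:

```python
import gc
import weakref


class A(object):
    def __init__(self):
        self.b = B(self)


class B(object):
    def __init__(self, a):
        self.a = a


gc.disable()                 # stop automatic collections so we control the timing
a = A()
probe = weakref.ref(a)       # observes the A instance without adding to its count
a = None                     # the cycle A <-> B keeps both counts above zero

alive_after_unbind = probe() is not None   # reference counting alone can't free it
gc.collect()                               # walk the graph and find the isolated cycle
alive_after_collect = probe() is not None
gc.enable()

print(alive_after_unbind)    # True
print(alive_after_collect)   # False
```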
Menpo's memory usage
I noticed that some code in Menpo that should have had low memory usage was actually taking up a lot of RAM. Pseudocode:

    for m in import_meshes('./'):
        corr_m = correspond(m)
        save_mesh(corr_m, './foo')

What I was seeing was the Python garbage collector doing its job.
While we have gotten this far without worrying about memory, it would be nice if we didn't have to put up with these memory patterns. The question of course is - why is the garbage collector even having to fire up in the first place? The answer as we know must be cyclic references (you can even turn the garbage collector off if you are confident you don't have any cyclic references in your code!).
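One way to check whether a piece of code is producing cycles at all is to look at what a manual collection finds. This is only a sketch: `Node` and `make_cycles` are hypothetical stand-ins, not Menpo code, and the trick relies on `gc.collect()` returning the number of unreachable objects it found:

```python
import gc


class Node(object):
    """Hypothetical object that refers to itself, i.e. the smallest possible cycle."""

    def __init__(self):
        self.me = self


def make_cycles(n):
    # Each Node becomes unreachable straight away, but its count never hits zero.
    for _ in range(n):
        Node()


gc.collect()             # start from a clean slate
gc.disable()             # no automatic runs while we measure
make_cycles(100)
found = gc.collect()     # number of unreachable objects the collector had to clean up
gc.enable()

print(found >= 100)      # True: reference counting freed none of the Nodes
```

If the same measurement on cycle-free code reports nothing to collect, reference counting is doing all the work and the collector has nothing to do.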
There are two places in the code with cyclic dependencies: landmarks hold a reference to the object they landmark, and the features proxy object holds a reference to the image it is attached to.
This prevents the efficient reference counting from working whenever we delete anything with landmarks (which is pretty much everything we care about in Menpo). Whilst there are good reasons why we might want to change this behaviour anyway (we've already discussed how features should just be functions, and landmarks not holding their targets would simplify things + enable more flexible viewing), there is a simple immediate solution.
The solution: weakref
Weakref is in the standard library, and its sole purpose in life is to combat this problem. Weak references don't count towards an object's reference count. It's basically a way of saying: I hold a reference to something, but don't let me hold you back from cleaning it up if I'm all that's left.
This PR makes landmarks have weak references to what they landmark, and likewise makes the features proxy object do the same for the images they are attached to. This should massively improve memory usage in Menpo.
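The pattern looks roughly like this. To be clear, `Shape` and `Landmarks` below are simplified, hypothetical stand-ins and not the actual Menpo classes; the only real API used is `weakref.ref` from the standard library:

```python
import gc
import weakref


class Landmarks(object):
    """Hypothetical back-referencing object: holds its target only weakly."""

    def __init__(self, target):
        self._target_ref = weakref.ref(target)   # does not bump target's count

    @property
    def target(self):
        target = self._target_ref()               # None once the target is gone
        if target is None:
            raise ReferenceError("the landmarked object no longer exists")
        return target


class Shape(object):
    """Hypothetical landmarked object: holds its landmarks strongly."""

    def __init__(self):
        self.landmarks = Landmarks(self)


gc.disable()                       # prove we don't need the cycle collector
shape = Shape()
probe = weakref.ref(shape)
same_object = shape.landmarks.target is shape
shape = None                       # count drops to zero, no cycle in the way
freed_immediately = probe() is None
gc.enable()

print(same_object)         # True
print(freed_immediately)   # True
```

The trade-off is that a `Landmarks` instance kept around after its `Shape` has gone can no longer reach its target, which this sketch reports by raising `ReferenceError`.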
The ugly: subtle bugs galore
This PR is a significant improvement to Menpo, but it does expose places where we have potentially had long-standing bugs that we just haven't noticed until now. As an example,
I'll turn this PR into a more fleshed out blog post when I get a mo!
Yeah I think we should be pretty aggressive in getting this in and checking for any bugs it surfaces. I've moved from
Everyone happy to push forward with this then? @nontas?