garbage collection problem (revisited) #141

Closed
fperez opened this Issue Aug 12, 2010 · 23 comments

Comments

Projects
None yet
2 participants
Owner

fperez commented Aug 12, 2010

This bug had originally been reported in Launchpad by Kilian as:

https://bugs.launchpad.net/ipython/+bug/269966

and we closed it with a test. Unfortunately, when I wrote the test case and closed the bug, my test case was slightly different than Kilian's actual report. And while now the test case we have does run correctly, it turns out that Kilian's original problem appears never to have been fixed.

So we should still try to find how some of these references are being kept by %run.

What follows is Kilian's original report for completeness here.
If a variable is created within a script that was executed using the %run command, there is a reference to it created somewhere
that prevents it from being garbage collected after deleting of the variable.

I am not sure if this is really a bug or just a misunderstanding on my side. However, it would be good if it were possible to delete
a variable created inside a script without deleting the whole name space using %reset.

This example is from an email thread in ipython-user: http://lists.ipython.scipy.org/pipermail/ipython-user/2008-July/005599.html

example:
create the following script named test_destructor.py and execute it using the ipython %run command:

kilian@chebang:~$ cat test_destructor.py
class C(object):
    def __del__(self):
        print 'deleting object...'

c = C()

kilian@chebang:~$ python test_destructor.py
deleting object...

now, let's try in ipython:

In [1]: run test_destructor.py

In [2]: del c

In [3]: import gc

In [4]: gc.collect()
Out[4]: 47

(object still not deleted)

In [5]: %reset
Once deleted, variables cannot be recovered. Proceed (y/[n])? y
deleting object...

Finally!

Owner

fperez commented Aug 12, 2010

I should add that the last behavior with %reset, which did work when Kilian reported the problem (2008-09-13), now is also broken! I just tested by doing a checkout of IPython on that date, and indeed it works as Kilian said:

git checkout `git rev-list -n 1 --before="2008-09-13" master`
python setup.py install --prefix=~/tmp/junk/

amirbar[scratch]> ip
/home/fperez/tmp/junk/lib/python2.6/site-packages/IPython/Magic.py:38: DeprecationWarning: the sets module is deprecated
  from sets import Set
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
Type "copyright", "credits" or "license" for more information.

IPython 0.9.rc1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: run scratch.py

In [2]: del zz

In [3]: reset
Once deleted, variables cannot be recovered. Proceed (y/[n])?  y
deleting object...

But with current IPython, even %reset does not get the del method called!

amirbar[scratch]> ip
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
Type "copyright", "credits" or "license" for more information.

IPython 0.11.alpha1.git -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: run scratch.py

In [2]: del zz

In [3]: reset
Once deleted, variables cannot be recovered. Proceed (y/[n])?  y

In [4]: 

Nothing!

So not only did we not include a fix for Kilian's actual problem with an identical test (instead using another related test that might be a good idea also, but that tests something else), but we've managed to lose the ability to clear these references even with %reset.

Owner

takluyver commented Mar 22, 2011

Replicated in trunk (0.11 dev).

Owner

takluyver commented Mar 22, 2011

OK, it looks like, among other possible places, there's a reference to it in ip._user_main_module which isn't getting deleted.

Owner

fperez commented Apr 8, 2011

@takluyver, is this fixed by your recent %reset work? If so, let's close this guy.

Owner

takluyver commented Apr 8, 2011

Unfortunately not. The reference in ip._user_main_module is left behind even after a %reset. I'll have a look, although I remember getting confused about the way namespaces are handled for %run before.

takluyver was assigned Apr 13, 2011

Owner

takluyver commented Apr 13, 2011

OK, I can get %reset to delete the reference, but that causes a test failure in core.tests.test_run.test_tclass. That likewise has a destructor method, but that method refers to sys. By the time the garbage collector triggers the destructor, sys in that namespace is gone, so it fails with a NameError.

So, what's more important? I seem to remember you've said previously that destructor methods shouldn't rely on anything other than self and builtins, so should I change tclass.py and go ahead?

Owner

fperez commented Apr 13, 2011

Hey,

On Wed, Apr 13, 2011 at 3:59 PM, takluyver
reply@reply.github.com
wrote:

OK, I can get %reset to delete the reference, but that causes a test failure in core.tests.test_run.test_tclass. That likewise has a destructor method, but that method refers to sys. By the time the garbage collector triggers the destructor, sys in that namespace is gone, so it fails with a NameError.

So, what's more important? I seem to remember you've said previously that destructor methods shouldn't rely on anything other than self and builtins, so should I change tclass.py and go ahead?

If you can get the test to pass correctly without the sys reference,
that's fine. I seem to remember I needed that so I could reliably
capture the print statement from another process, but there may be a
better way.

f

Owner

takluyver commented Apr 13, 2011

I could attach a reference to the object: self.flush_stdout = sys.stdout.flush. Ugly, but it works*.

*The test still fails because it now sees the second c being deleted as IPython exits, but I assume I should change the test to accommodate that.

Owner

fperez commented Apr 13, 2011

Hey,

On Wed, Apr 13, 2011 at 4:30 PM, takluyver
reply@reply.github.com
wrote:

I could attach a reference to the object: self.flush_stdout = sys.stdout.flush. Ugly, but it works*.

*The test still fails because it now sees the second c being deleted as IPython exits, but I assume I should change the test to accommodate that.

Just make sure we still test what the original intent was. This may
be best run interactively a few times just to make sure, that's what
the test was trying to capture in an automated fashion. The point is
to ensure that prior instances of objects do get properly released but
the current ones are kept. Failing in either direction means either
memory leaks or not being able to reach things you just defined.

f

Owner

takluyver commented Apr 14, 2011

I believe it's still passing both of those things. The instance created in the first run isn't deleted until after the second run (because there's a copy in ip._main_ns_cache. The only difference is that, as IPython exits, its .reset() method deletes the instance from the second run, so you see the output from that as well.

Owner

fperez commented Apr 14, 2011

But is the first run's instance automatically deleted when the second run finishes, or do we have to wait for a reset to happen (either manually via %reset or on exit)? The original scenario was that Kilian was running a script repeatedly, and large objects in memory were effectively killing his session after a while...

Owner

takluyver commented Apr 14, 2011

Testing manually, it behaves as expected (C-first is deleted immediately after the second run), but I take the point that the automatic check doesn't check this properly. I'm adding a third run to make the test more rigorous.

Owner

fperez commented Apr 14, 2011

On Thu, Apr 14, 2011 at 3:24 PM, takluyver
reply@reply.github.com
wrote:

Testing manually, it behaves as expected (C-first is deleted immediately after the second run), but I take the point that the automatic check doesn't check this properly. I'm adding a third run to make the test more rigorous.

Great, thanks (I also did test it manually just to be sure).

Owner

takluyver commented Apr 14, 2011

OK, so with those changes now merged, we're back where we were when this bug was filed. After %running a script, its namespace is stored in ip._user_main_module and ip._main_ns_cache. The references will be gone if either: you %run the same script again, you %reset (without the -s option), or you exit IPython.

Owner

fperez commented Apr 14, 2011

Idea: how about defining a %del magic that would do extra hunting of references to a name? Most users don't need this, but for those who really need to nuke one variable without resetting their whole namepace, this would do the job.

Owner

takluyver commented Apr 14, 2011

Not a bad idea. Is there a better short name, though, because for automagics, del gets tricky - it's not in any namespace, because it's a statement. So it could be translated to a magic call, when the user just expected a normal del.

Also, should this be combined with %reset_selective?

Owner

fperez commented Apr 14, 2011

On Thu, Apr 14, 2011 at 4:50 PM, takluyver
reply@reply.github.com
wrote:

Not a bad idea. Is there a better short name, though, because for automagics, del gets tricky - it's not in any namespace, because it's a statement. So it could be translated to a magic call, when the user just expected a normal del.

But actually that's even better: it will do del in the normal
namespace and go digging deeper. So normal users with automagic
will get something that 'just works', and without automagic there will
be no suprises. No?

Also, should this be combined with %reset_selective?

Sounds like a good idea.

Owner

takluyver commented Apr 15, 2011

Hmmm. I'm not convinced that we should override plain Python. Although in this case, it's somewhat ambiguous, because our default behaviour is different to the Python interpreter, in that we keep hidden references to user-defined objects. Maybe we should call it something like xdel (extra delete).

It's also not simple to do it reliably: if I enter c as an expression, the object it refers to gets cached in output history, but not under the name c. So if I then remove c from every namespace, it still exists in history.

Owner

fperez commented Apr 15, 2011

On Fri, Apr 15, 2011 at 2:48 AM, takluyver
reply@reply.github.com
wrote:

Hmmm. I'm not convinced that we should override plain Python. Although in this case, it's somewhat ambiguous, because our default behaviour is different to the Python interpreter, in that we keep hidden references to user-defined objects. Maybe we should call it something like xdel (extra delete).

Because of how we add referecnes, is why it seems to me that it's OK
to override in this case: our %del would actually reproduce the intent
of the user more faithfully, I think.

It's also not simple to do it reliably: if I enter c as an expression, the object it refers to gets cached in output history, but not under the name c. So if I then remove c from every namespace, it still exists in history.

But that's true of del too: it only removes the named reference, it
doesn't hunt for others. Granted, plain python has no output cache,
but there could be other manually named references:

x = bigthing()
y = x
del x # y keeps a reference.

With this in mind, I'd propose:

  • %del x: does 'del x' in user_ns plus searches our known internal
    references where x is named in other namespaces. Cheap and easy to
    implement.
  • '%xdel x': could be a more aggressive function that in addition to
    '%del x', would also search all output history checking for other
    things that point to the same object and deleting those as well. This
    version could even be more aggressive and search the rest of user_ns
    for aliases to x and delete those as well, and at the end of the run
    would print a warning if gc still indicated references to x exist, so
    at least the user would know the object wasn't fully deleted (there
    could be a reference to it in some nested data structure like a list
    or a dict that would be impossible to detect).

How does that sound?

Owner

takluyver commented Apr 15, 2011

We should definitely give the user some way to delete the various references
we hold behind the scenes, but I'm still not convinced by overriding del.
Apart from anything else, this will mean that del x has a different effect
depending on whether it's a single line cell, or part of a multi-line cell.
I think this goes against the principle of least surprise.

Owner

fperez commented Apr 15, 2011

On Fri, Apr 15, 2011 at 2:41 PM, takluyver
reply@reply.github.com
wrote:

We should definitely give the user some way to delete the various references
we hold behind the scenes, but I'm still not convinced by overriding del.
Apart from anything else, this will mean that del x has a different effect
depending on whether it's a single line cell, or part of a multi-line cell.
I think this goes against the principle of least surprise.

That's a good point, hadn't thought of the multiline case...

OK, then separate functions :)

Owner

takluyver commented May 11, 2011

I've added the magic xdel in PR #419.

Owner

takluyver commented May 29, 2011

And with xdel merged in (commit 9979966), this is as resolved as it's likely to get, at least for now, so I'm closing it.

takluyver closed this May 29, 2011

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment