
itertools.groupby() leaks memory with circular reference #46499

Closed
ashemedai mannequin opened this issue Mar 6, 2008 · 9 comments
Labels
performance Performance or resource usage

Comments


ashemedai mannequin commented Mar 6, 2008

BPO 2246
Nosy @loewis, @rhettinger, @abalkin, @ashemedai, @mitsuhiko
Files
  • testcase.py: Testcase code
  • groupby-leak.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = 'https://github.com/rhettinger'
    closed_at = <Date 2008-10-20.21:43:09.097>
    created_at = <Date 2008-03-06.19:39:28.834>
    labels = ['performance']
    title = 'itertools.groupby() leaks memory with circular reference'
    updated_at = <Date 2008-10-20.21:43:09.085>
    user = 'https://github.com/ashemedai'

    bugs.python.org fields:

    activity = <Date 2008-10-20.21:43:09.085>
    actor = 'loewis'
    assignee = 'rhettinger'
    closed = True
    closed_date = <Date 2008-10-20.21:43:09.097>
    closer = 'loewis'
    components = []
    creation = <Date 2008-03-06.19:39:28.834>
    creator = 'asmodai'
    dependencies = []
    files = ['9624', '9625']
    hgrepos = []
    issue_num = 2246
    keywords = ['patch']
    message_count = 9.0
    messages = ['63332', '63335', '63336', '63337', '63338', '63339', '63340', '75009', '75011']
    nosy_count = 6.0
    nosy_names = ['loewis', 'rhettinger', 'belopolsky', '_doublep', 'asmodai', 'aronacher']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = 'resource usage'
    url = 'https://bugs.python.org/issue2246'
    versions = ['Python 2.5.3']


    ashemedai mannequin commented Mar 6, 2008

    Quoting from my email to Raymond:

    In the Trac/Genshi community we've been tracking down a somewhat obscure
    memory leak that has caused us a lot of problems.

    Please see http://trac.edgewall.org/ticket/6614 and then
    http://genshi.edgewall.org/ticket/190 for background.

    We reduced the case to the following Python-only code and believe it is
    a bug in itertools.groupby. As Armin Ronacher explains in Genshi
    ticket 190:

    "Looks like genshi is not to blame. itertools.groupby has a grouper
    with a reference to the groupby type but no traverse func. As soon as a
    circular reference ends up in the groupby (which happens thanks to the
    func_globals in our lambda) genshi leaks."

    This can be demonstrated with the following code (testcase attachment
    present with this issue):

    import gc
    from itertools import groupby
    
    def run():
        keyfunc = lambda x: x
        for i, j in groupby(range(100), key=keyfunc):
            keyfunc.x = j
    
    for x in xrange(20):
        gc.collect()
        run()
        print len(gc.get_objects())
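The testcase above is Python 2 (`xrange`, the `print` statement). A minimal Python 3 port might look like the following sketch; it is not the attached testcase.py. Rather than watching the total from `gc.get_objects()`, it keeps a weak reference to the key function (functions support weak references) and checks whether `gc.collect()` reclaims the cycle. The helper names `run` and `cycle_collected` are illustrative only. On an interpreter that includes the grouper GC fix, the weak reference should be dead after collection; if the cycle leaked, it would stay alive.

```python
import gc
import weakref
from itertools import groupby


def run():
    """Recreate the reported cycle and return a weak reference to keyfunc."""
    keyfunc = lambda x: x
    for _key, group in groupby(range(100), key=keyfunc):
        # Storing the grouper on the key function closes the cycle:
        # keyfunc -> keyfunc.__dict__ -> grouper -> groupby -> keyfunc
        keyfunc.x = group
    return weakref.ref(keyfunc)


def cycle_collected():
    """Return True if gc.collect() reclaims the keyfunc/groupby cycle."""
    ref = run()
    gc.collect()
    return ref() is None


if __name__ == "__main__":
    for _ in range(3):
        print(cycle_collected())
```

Checking a weak reference answers "was this particular cycle freed?" directly, which is less sensitive to unrelated allocations than comparing total object counts.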

    Executing this prints the number of objects tracked by the garbage
    collector, and every iteration reports 4 more objects than the previous
    one, which Armin identifies as:

    "a frame, a grouper, a keyfunc and a groupby object"

    We have been unable to come up with a decent patch and thus I am
    logging this issue now.
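To see which kinds of objects accumulate, rather than only the total, one option is to diff a per-type histogram of `gc.get_objects()` before and after several runs. This is a minimal Python 3 sketch with hypothetical helper names (`type_counts`, `growth_after_runs`). On an interpreter with the fix, neither `groupby` nor `_grouper` (the type name CPython uses for the grouper) should appear in the growth; other, unrelated entries may.

```python
import gc
from collections import Counter
from itertools import groupby


def run():
    keyfunc = lambda x: x
    for _key, group in groupby(range(100), key=keyfunc):
        keyfunc.x = group  # closes the keyfunc -> grouper -> groupby cycle


def type_counts():
    """Count objects tracked by the garbage collector, keyed by type name."""
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth_after_runs(n=20):
    """Return the per-type growth in tracked objects after n runs plus a collection."""
    gc.collect()
    before = type_counts()
    for _ in range(n):
        run()
    gc.collect()
    after = type_counts()
    # Counter subtraction keeps only types whose count went up.
    return after - before
```

Calling `growth_after_runs(20)` mirrors the 20 iterations of the original script.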

    ashemedai mannequin added the performance (Performance or resource usage) label Mar 6, 2008

    abalkin commented Mar 6, 2008

    With the following patch:

    ===================================================================

    --- Lib/test/test_itertools.py  (revision 61284)
    +++ Lib/test/test_itertools.py  (working copy)
    @@ -707,6 +707,12 @@
             a = []
             self.makecycle(takewhile(bool, [1, 0, a, a]), a)
     
    +    def test_issue2246(self):
    +        n = 10
    +        keyfunc = lambda x: x
    +        for i, j in groupby(xrange(n), key=keyfunc):
    +            keyfunc.__dict__.setdefault('x',[]).append(j)
    +                    
     def R(seqn):
         'Regular generator'
         for i in seqn:

    $ ./python Lib/test/regrtest.py -R :: test_itertools

    reports n*3 + 13 reference leaks. This should give a clue ...


    abalkin commented Mar 6, 2008

    It looks like the problem is that the internal grouper object becomes a
    part of a cycle: keyfunc -> grouper(x) -> keyfunc(tgtkey), but its type
    does not support GC. I will try to come up with a patch.
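On a current CPython, both halves of this diagnosis can be checked with the `gc` module. The sketch below assumes CPython 3 (`gc.is_tracked()` does not exist in the 2.5 series discussed here). It asks whether a grouper is tracked by the collector and whether the traverse functions expose the grouper -> groupby -> keyfunc edges. Before the fix, the grouper type had no GC support, so the collector could not see the first of these edges and the cycle looked reachable from outside.

```python
import gc
from itertools import groupby

keyfunc = lambda x: x
outer = groupby(range(3), key=keyfunc)
key, grouper = next(outer)

# A type with cyclic-GC support is tracked by the collector.
print(gc.is_tracked(grouper))

# gc.get_referents() reports what an object's traverse function visits.
# grouper -> parent groupby object:
print(any(ref is outer for ref in gc.get_referents(grouper)))
# groupby object -> key function:
print(any(ref is keyfunc for ref in gc.get_referents(outer)))
```

On an interpreter that includes the fix, this should print `True` three times.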

    rhettinger commented

    No need. I'm already working on adding GC to the grouper.

    rhettinger self-assigned this Mar 6, 2008

    abalkin commented Mar 6, 2008

    Oops. Here is my patch anyways.


    doublep mannequin commented Mar 6, 2008

    Damn, I wrote a patch too ;)

    rhettinger commented

    r61286. Applied a patch substantially similar to Alexander's. Thanks
    for the test case and the report.


    loewis mannequin commented Oct 20, 2008

    Backport candidate

    loewis mannequin reopened this Oct 20, 2008

    loewis mannequin commented Oct 20, 2008

    Already backported in r61287.

    loewis mannequin closed this as completed Oct 20, 2008
    ezio-melotti transferred this issue from another repository Apr 10, 2022