Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference cycles (zombis objects) in SortedDict #147

Closed
pivosan97 opened this issue May 12, 2020 · 6 comments
Closed

Reference cycles (zombis objects) in SortedDict #147

pivosan97 opened this issue May 12, 2020 · 6 comments

Comments

@pivosan97
Copy link

SortedDict creates reference cycles (zombie objects) which are not destroyed by reference counting algorithm but only by garbage collector iteration, what can cause performance and latency issues. It can be tracked using gc module.

@grantjenks
Copy link
Owner

It’s possible to create reference cycles with any container object. I can’t think of anything specific about the design of sortedcontainers that would create reference cycles. Please include code to demonstrate.

@pivosan97
Copy link
Author

I think that a reference cycle appears somewhere inside a SortedDict object.
Test

from sortedcontainers import SortedDict
import gc

def check(f):
    a = f()
    t = type(a)

    print(1)
    for obj in gc.get_objects():
        if type(obj) == t:
            print(obj)

    del a

    print(2)
    for obj in gc.get_objects():
        if type(obj) == t:
            print(obj)

    gc.collect()

    print(3)
    for obj in gc.get_objects():
        if type(obj) == t:
            print(obj)

check(lambda: SortedDict({'a': 1, 'b': 2}))


class MyDict(dict):
    pass

check(lambda: MyDict({'a': 1, 'b': 2}))

Output

1
SortedDict({'a': 1, 'b': 2})
2
SortedDict({'a': 1, 'b': 2})
3
1
{'a': 1, 'b': 2}
2
3

As you can see a regular dict is deallocated right after del operator (it is done by reference counting algorithm) while SortedDict object is collected only by GC what means there is a reference cycle.

@grantjenks
Copy link
Owner

Thank you for the code sample. I did a bit of debugging and traced the issue to this code in __init__:

        # Calls to super() are expensive so cache references to dict methods on
        # sorted dict instances.

        _dict = super(SortedDict, self)
        self._dict_clear = _dict.clear
        self._dict_delitem = _dict.__delitem__
        self._dict_iter = _dict.__iter__
        self._dict_pop = _dict.pop
        self._dict_setitem = _dict.__setitem__
        self._dict_update = _dict.update

SortedDict references dict through its base class reference and here the super call forces the dict methods to reference self which I think creates the cycle.

One fix would be to stop using super() and access the dictionary methods as dict.__delitem__ directly. The code where the functions are used would need to change so that self was passed as the first argument.

I'm not sure if this is worth fixing. Typically I only use a few SortedDict objects in a program. What's your use case like?

grantjenks added a commit that referenced this issue Jun 4, 2020
Caching dict methods using super creates a reference cycle which increases
memory pressure. User reported increased latencies due to GC pauses.
@grantjenks
Copy link
Owner

Fix pushed at 2b03703. Will deploy new version once Travis/AppVeyor are green.

@pivosan97 Would you please test on master?

Also, I would appreciate if you did not leave comments like this https://pythontips.com/2016/04/24/python-sorted-collections/comment-page-1/#comment-75266 for an open and on-going issue. My last comment (above) ended with "What's your use case like?" for which I didn't hear back.

@pivosan97
Copy link
Author

pivosan97 commented Jun 4, 2020 via email

@grantjenks
Copy link
Owner

Fix deployed in version 2.2.0 on PyPI: https://pypi.org/project/sortedcontainers/2.2.0/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants