Slots class holding a reference to the original version #407

Closed
gmacon opened this issue Jul 11, 2018 · 10 comments · Fixed by #410

Comments

@gmacon
Contributor

gmacon commented Jul 11, 2018

I have a class hierarchy implemented with slots enabled, like this:

import attr

@attr.s(slots=True)
class BaseClass(object):
    foo = attr.ib(default='foo')

@attr.s(slots=True)
class SubClass(BaseClass):
    bar = attr.ib(default='bar')

This worked fine until I wanted to introspect the class hierarchy:

BaseClass.__subclasses__()  # returns [<class '__main__.SubClass'>, <class '__main__.SubClass'>]

One of these is the original class, and the other is the new one created and returned by attr.s. This behavior manifests on both Python 3 and Python 2 (3.7.0 and 2.7.14, to be specific).

My understanding was that classes kept weak references to their subclasses to allow the __subclasses__ method to work, and I know that attrs creates and returns a new class when slots=True, but it's not clear to me why the old class stays around. I don't see any obvious place where a strong reference to it is held.

I guessed that there might be a reference cycle somewhere, so I tried adding a call to gc.collect() and checking gc.garbage, but that turned out to be incorrect.

Because I didn't see that the reference-counting mentioned in this comment was ever addressed, I also guessed that this might be a leak due to the __class__ closure cell fixup. I decided that this isn't the cause, either, because Python 2 does not have the __class__ closure cell.
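The lingering entry can be reproduced with the standard library alone. The following is a minimal sketch, not attrs code: `Base`, `Child`, and `define_and_drop` are hypothetical names, and the behaviour shown assumes CPython. A class that nothing refers to any more still appears in its parent's `__subclasses__()` until the cyclic collector runs, because every class takes part in a reference cycle (for example, it is listed in its own `__mro__`).

```python
import gc


class Base:
    pass


def define_and_drop():
    # The class object becomes unreachable as soon as this function returns.
    class Child(Base):
        pass


gc.disable()  # keep the automatic collector from running in between
try:
    define_and_drop()
    count_before = len(Base.__subclasses__())  # 1 on CPython: Child still lingers
    gc.collect()
    count_after = len(Base.__subclasses__())   # 0 on CPython: the cycle was collected
finally:
    gc.enable()

print(count_before, count_after)
```

In this sketch the leftover class disappears after one collection. The report above is about an original class that survives even an explicit `gc.collect()`.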

@hynek
Member

hynek commented Jul 14, 2018

When replacing the class with slots=True, attrs has to make sure that the class hierarchy remains intact; therefore the returned class is a subclass of the original one. Does that answer your question?

@gmacon
Contributor Author

gmacon commented Jul 14, 2018

That would answer the question, but I'm not convinced that it's true...

Python 3.6.6 (default, Jun 27 2018, 13:11:40)
[GCC 8.1.1 20180531] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import attr
>>> @attr.s(slots=True)
... class Foo: pass
...
>>> class Bar(Foo): pass
...
>>> AttrsBar = attr.s(slots=True)(Bar)
>>> issubclass(AttrsBar, Bar)
False

@hynek
Member

hynek commented Jul 15, 2018

Hm that might be due to how we create the class (by calling type() IIRC). Because OTOH:

>>> import attr

>>> class C: pass

>>> C2 = attr.s(slots=True)(C)

>>> C2.__mro__
(<class '__main__.C'>, <class 'object'>)

So the class is definitely there, it’s just that Python subclass machinery doesn’t know about it.

@gmacon
Contributor Author

gmacon commented Jul 15, 2018

I think you're being misled by the fact that the class generated by attrs still thinks its name is C despite being bound to C2:

>>> import attr
>>> class C: pass
...
>>> C2 = attr.s(slots=True)(C)
>>> C2
<class '__main__.C'>
>>> C2.__mro__[0] is C
False
>>> C2.__mro__[0] is C2
True

When the class is created, it's using the original class's bases as the new bases, and I didn't see anywhere that the bases get modified...
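The same identity confusion can be shown without attrs. This is only a sketch that stands in for what `attr.s(slots=True)` does: the `namespace` filtering is an assumption made for illustration. A class rebuilt with the three-argument form of `type` keeps the original name but is a separate object with the same bases, not a subclass.

```python
class C:
    pass


# Assumption: a simplified stand-in for the class copy made by attrs.
namespace = {
    k: v for k, v in C.__dict__.items() if k not in ("__dict__", "__weakref__")
}
namespace["__slots__"] = ()
C2 = type(C)(C.__name__, C.__bases__, namespace)

print(C2.__name__)          # C (the repr therefore looks like the original)
print(C2 is C)              # False
print(issubclass(C2, C))    # False
print(C2.__mro__[0] is C2)  # True
print(C2.__bases__)         # (<class 'object'>,)
```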

@hynek
Member

hynek commented Jul 17, 2018

Ah you’re right, turns out we’re more elaborate than I remembered:

cls = type(self._cls)(self._cls.__name__, self._cls.__bases__, cd)

So I guess if we have a leak, _ClassBuilder._create_slots_class() would be the place to look? Have you tried something like https://mg.pov.lt/objgraph/?

@gmacon
Contributor Author

gmacon commented Jul 17, 2018

Ah, no I hadn't thought of that...

[Image: attrs_slots_subclasses]

I think that means the correct fix is to add __weakref__ to the exception list here:

attrs/src/attr/_make.py

Lines 493 to 497 in 35f7745

cd = {
    k: v
    for k, v in iteritems(self._cls_dict)
    if k not in tuple(self._attr_names) + ("__dict__",)
}

With that:

>>> import attr
>>> @attr.s(slots=True)
... class C: pass
...
>>> @attr.s(slots=True)
... class C2(C): pass
...
>>> import gc
>>> gc.collect()
11
>>> C.__subclasses__()
[<class '__main__.C2'>]

The original class is left as a cyclic isolate when attr.s returns. I think that's as far as the attrs project needs to go to fix this; I'll call gc.collect() from my application. I'll write a test and open a PR today or tomorrow.
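The effect of the exclusion list can be sketched with the standard library only. Assumptions: `rebuild` and `subclasses_after_gc` are hypothetical helpers that mimic the dictionary copy quoted above, they are not attrs code, and the counts assume CPython. The `__weakref__` descriptor in the original class's `__dict__` refers back to that class, so copying it into the new class keeps the original alive across a full collection.

```python
import gc


def rebuild(cls, skip):
    """Hypothetical helper: copy cls into a new slotted class, skipping some keys."""
    cd = {k: v for k, v in cls.__dict__.items() if k not in skip}
    cd["__slots__"] = ()
    return type(cls)(cls.__name__, cls.__bases__, cd)


def subclasses_after_gc(skip):
    class Base:
        __slots__ = ()  # slotted base, so Child gets its own __weakref__ descriptor

    class Child(Base):
        pass

    new_child = rebuild(Child, skip)
    del Child     # drop the only direct name for the original class
    gc.collect()  # full collection, as in the session above
    return len(Base.__subclasses__())


leaky = subclasses_after_gc(skip=("__dict__",))               # original survives
fixed = subclasses_after_gc(skip=("__dict__", "__weakref__"))  # original is freed
print(leaky, fixed)
```

With only `"__dict__"` skipped, the parent reports two subclasses after collection. Skipping `"__weakref__"` as well leaves just the rebuilt class.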

gmacon added a commit to gmacon/attrs that referenced this issue Jul 18, 2018
self._cls_dict["__weakref__"] holds a reference to self._cls, preventing
self._cls from being released after the new, slots-enabled class is
returned.

Fixes python-attrs#407
hynek pushed a commit that referenced this issue Jul 28, 2018
self._cls_dict["__weakref__"] holds a reference to self._cls, preventing
self._cls from being released after the new, slots-enabled class is
returned.

Fixes #407
badicsalex added a commit to badicsalex/hun_law_py that referenced this issue Sep 12, 2020
Proper type annotations are needed by dict2object, which is done here.

There's also a weird bug with attrs slots and __subclasses__, so garbage
collection is also called.

See python-attrs/attrs#407
@Arcitec

Arcitec commented Jan 25, 2024

Thanks both of you for investigating and solving that issue.

I checked the Python garbage collector docs and can hopefully clarify this topic a bit more.

First some basics about the Python garbage collector:

  • Python uses reference counting for all objects. Whenever a reference count reaches 0, the item is immediately freed (no need for garbage collection).
  • Cyclic references, where "islands of objects" point to each other (so they have 1+ references), but nothing outside points to them, will only be freed during a gc.collect() run. That's where Python does much deeper inspection to see whether there are any remaining "important" (non-weak) references to the object, or if it's okay to free the object from memory.
  • Python periodically runs gc.collect() automatically on its own (the interval can be changed via the gc library). It's on by default and most people don't need to worry about it.
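For reference, the collector settings mentioned in the list above can be inspected and changed through the `gc` module. This is a small sketch; the threshold value 1000 is arbitrary and chosen only for illustration.

```python
import gc

print(gc.isenabled())  # whether automatic collection is currently enabled

original = gc.get_threshold()  # three per-generation thresholds
print(original)
print(gc.get_count())          # current collection counts per generation

# Make automatic collections of the youngest generation less frequent,
# then restore the previous settings.
gc.set_threshold(1000, original[1], original[2])
raised = gc.get_threshold()
gc.set_threshold(*original)
```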

Now the specific thing that's happening in attrs:

  • When attrs makes dict-based classes (slots=False), it simply dynamically adds methods to the existing class in Python's memory.
  • But when attrs makes a slots-based class instead, it has to copy the class and rewrite it as a __slots__ class, since slots cannot be modified/added to a class after it has already been created (by Python when it interprets your source code). To do this, attrs "deletes" your original class (the one you wrote in your source code) and dynamically creates the new slots-based variant with the same methods, same parents, a __slots__ property, the attrs-injected methods, etc etc.
  • Now the issue... All Python classes have a .__subclasses__ property which is a list of all immediate subclasses that have inherited from that class.
  • When Python interpreted the original source code, it added the basic dict-based (non-slots) class to the __subclasses__ of its parent class. Then attrs came in, deleted that class, created a new slots-based variant of the class, and Python therefore added the new __slots__ based class variant to the parent's __subclasses__.
  • As a result, the parent class .__subclasses__ property contains weak references to both classes, the original (deleted class) and the new attrs-slots class.
  • Because the original __dict__ based class (that attrs deleted) still has "1+ references" thanks to the parent's __subclasses__ property still linking to it, there are two side effects: The original class technically still exists in memory, and the original class is visible whenever observing class.__subclasses__ of the parent class.
  • But because __subclasses__ is a WEAK reference, it's allowed to be garbage collected.
  • So the next time Python runs gc.collect() automatically (or if the user triggers it manually), Python will analyze the old ("deleted") class that attrs deleted, then it will see that the only remaining reference to it is a weak reference from the parent's __subclasses__, and will free it from memory.

I think this only happens if your attrs-classes are inheriting from any base classes. If so, the parent classes will have the "removed" original class in its __subclasses__. But that still won't matter to most people. It will get automatically cleaned up after a while when Python runs its own GC periodically. And most people don't look at __subclasses__, so its inclusion of the weak ref to the original deleted class doesn't matter to most people. I also can't think of any non-debug/introspection related Python libraries that would ever care about looking at __subclasses__. So again, its existence won't matter to most people.

I also think none of this applies if your @attrs class doesn't have ANY base classes, by the way? Because if the class is not inheriting from anything, there wouldn't be any parents to carry __subclasses__, right? Maybe attrs sets the original (deleted) class as its own parent when it clones? But I doubt it. And even if it did, the original deleted class would not be accessible via code anymore in that case, and would get garbage collected too.

The reason why I don't know that last question for sure, is because I don't really understand the line that describes how classes are copied by attrs:

#407 (comment)

It seems like attrs is creating a class that's a child of the original class's parent class. Rather than making it a child under the "deleted" class? But I don't know. That line is something I've never seen before. type(original_class)(original_class.__name__, original_class.parents) is basically how I understand that line, but I have no idea whether that's creating a child of the original parent or if it's a child of the deleted class.

I've already looked at TheClass.__bases__ to try to see if attrs inherits from the original, but it only outputs (<class 'object'>,) (<class 'object'>,) (even after garbage collection), which doesn't tell me if it's a child of the original class or not. Would love to know!

I hope this clarifies some things. Please correct if I've misunderstood anything, and please clarify whether the new class is a child of the deleted class, and if so, would that matter in any way? :)

@Arcitec

Arcitec commented Jan 25, 2024

Well, I made a small test which seems to confirm that attrs does NOT make the "copied" class a subclass of the deleted class. It's only a subclass of the original parent.

Here's the test that confirmed this, as far as I can tell:

import attrs
import gc

# We must disable automatic garbage collection,
# otherwise Python will literally collect trash between
# the various class definitions below.
#
# Comment out this line to see the difference. ;)
gc.disable()

def readable_id(obj) -> str:
    addr = id(obj)
    return f"0x{addr:02x}"

class A:
    pass

@attrs.define(slots=True)
class B(A):
    pass

@attrs.define(slots=True)
class C(B):
    pass

print("Object ID of A (parent class of B, plain class):", readable_id(A))
print("Object ID of B (parent class of C, created by attrs):", readable_id(B))
print("Object ID of C (created by attrs):", readable_id(C))

def check_class_relationship(parent, child) -> None:
    print(f"- Comparing {parent.__name__} and {child.__name__}:")
    print(f"  * Subclasses of {parent.__name__}:")
    for x in parent.__subclasses__():
        print("    Class:", x, readable_id(x))
    print(f"  * Superclasses of {child.__name__}:")
    for x in child.__bases__:
        print("    Class:", x, readable_id(x))
    print(f"  * Subclasses of {child.__name__}:")
    for x in child.__subclasses__():
        print("    Class:", x, readable_id(x))

def inspect_classes(msg: str) -> None:
    print(f"\n{msg}:")

    check_class_relationship(A, B)
    check_class_relationship(B, C)

inspect_classes("Before garbage collection")

# gc is already imported at the top of the script.
gc.collect()

inspect_classes("After garbage collection")

Result:

Object ID of A (parent class of B, plain class): 0x1d74e20
Object ID of B (parent class of C, created by attrs): 0x1d762c0
Object ID of C (created by attrs): 0x1d81e40

Before garbage collection:
- Comparing A and B:
  * Subclasses of A:
    Class: <class '__main__.B'> 0x1d75500
    Class: <class '__main__.B'> 0x1d762c0
  * Superclasses of B:
    Class: <class '__main__.A'> 0x1d74e20
  * Subclasses of B:
    Class: <class '__main__.C'> 0x1d769a0
    Class: <class '__main__.C'> 0x1d81e40
- Comparing B and C:
  * Subclasses of B:
    Class: <class '__main__.C'> 0x1d769a0
    Class: <class '__main__.C'> 0x1d81e40
  * Superclasses of C:
    Class: <class '__main__.B'> 0x1d762c0
  * Subclasses of C:

After garbage collection:
- Comparing A and B:
  * Subclasses of A:
    Class: <class '__main__.B'> 0x1d762c0
  * Superclasses of B:
    Class: <class '__main__.A'> 0x1d74e20
  * Subclasses of B:
    Class: <class '__main__.C'> 0x1d81e40
- Comparing B and C:
  * Subclasses of B:
    Class: <class '__main__.C'> 0x1d81e40
  * Superclasses of C:
    Class: <class '__main__.B'> 0x1d762c0
  * Subclasses of C:

I created the B and C sub-classes to get a bigger hierarchy to analyze, but since the result of the longer chain of classes was all the same, let's only focus on the relationship between A and B.

So... as far as I can see, B (the class-copy created by attrs) only has one superclass, the A class. It's not a child of the deleted version of B.

And we can also see that the only thing changed by forcing garbage collection manually, is that the A class list of subclasses gets cleaned up.

It's looking good. I'm basically doing this test to try to make sense of the "warning" in the attrs documentation which talked about downsides of copied slots classes, and to help others in the future make sense of this.

It doesn't seem like there are any actual downsides to attrs "cloning classes", apart from a temporary weak reference lingering in the original superclass, which only happens if your class inherits from anything. But I've never even heard of any code or libraries that care what's in the __subclasses__ property of the parent class, so in practically all codebases, the weak reference can safely sit there in __subclasses__ and will get automatically garbage collected later.

That's how I interpret things:

  • Don't worry about it whatsoever. Don't manually bother calling gc.collect(). It's only an issue in the extremely rare case that you somehow have a library, introspection etc which cares about looking at the __subclasses__ of the parent, but I've never seen such code being used for anything except introspection libraries. Furthermore, Python runs automatic GC very quickly (even between two class definitions as you can see from my gc.disable() code comment in the test), so you really don't need to worry about doing that manually.

@gmacon
Contributor Author

gmacon commented Jan 25, 2024

I think this is almost right. The flaw I see in your understanding is that the original class is not kept alive by the weak reference from the parent's __subclasses__; the point of a weak reference is that it doesn't keep the referenced object alive. The class stays alive immediately because of the (three) reference cycles the class participates in. The bug was that the special __weakref__ object, which references the original class, was incorrectly copied to the new class, keeping it alive even after a full garbage collection was run.
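The distinction can be checked with plain objects. This is a minimal sketch under CPython's reference-counting behaviour, and `Node` is a hypothetical class: a weak reference alone does not keep its target alive, whereas a reference cycle does until the cyclic collector runs.

```python
import gc
import weakref


class Node:
    pass


gc.disable()  # rule out an automatic collection in the middle of the demo
try:
    # Only a weak reference remains: the object is freed immediately.
    plain = Node()
    plain_ref = weakref.ref(plain)
    del plain
    plain_dead = plain_ref() is None            # True

    # A self-referencing object stays alive until the cycle is collected.
    cyclic = Node()
    cyclic.me = cyclic
    cyclic_ref = weakref.ref(cyclic)
    del cyclic
    cycle_alive = cyclic_ref() is not None      # True
    gc.collect()
    cycle_dead_after_gc = cyclic_ref() is None  # True
finally:
    gc.enable()

print(plain_dead, cycle_alive, cycle_dead_after_gc)
```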

All classes have a parent class; if you don't specify one, object is implied.

Here is a more detailed explanation of the line you didn't understand, repeated for reference:

cls = type(self._cls)(self._cls.__name__, self._cls.__bases__, cd)

Calling type with one argument returns the type of the thing. Calling it on a class usually returns type, but it can return another class when metaclasses are in play. So type(self._cls) gets the metaclass of the original class if there is one, otherwise it's just type. Then, that is called with the three-argument form that creates a new class, which you would normally never have to do by hand. cd is the __dict__ of the new class (not to be confused with the __dict__ of an instance of the class); it's created by copying the __dict__ of the original class, except for a few special attributes that are created anew by type, and with __slots__ defined based on the declared attributes.
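Both forms of `type` can be tried in isolation. This is a small sketch; `Meta`, `Plain`, `Fancy`, and `Point` are hypothetical names used only for illustration.

```python
class Meta(type):
    """Hypothetical metaclass, only here to show what type(cls) returns."""


class Plain:
    pass


class Fancy(metaclass=Meta):
    pass


# One-argument form: returns the type (the metaclass) of the argument.
print(type(Plain) is type)  # True
print(type(Fancy) is Meta)  # True

# Three-argument form: builds a new class from a name, bases, and a namespace.
Point = type("Point", (object,), {"__slots__": ("x", "y")})
p = Point()
p.x = 1
print(Point.__bases__)         # (<class 'object'>,)
print(hasattr(p, "__dict__"))  # False, because the class is slotted
```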

You're correct that the class generated by attrs isn't a subclass of the original class.

Thinking about the warning: I originally ran into this because of a pattern in my code where I walked the inheritance tree in order to find all possible classes that might be needed when deserializing data from a database. It might be a good idea to expand the warning to explicitly mention another consequence: any custom metaclass will be invoked twice for the class. Depending on what the metaclass does, that might be fine or it might create a problem.
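The double invocation can be illustrated without attrs. This is a sketch under stated assumptions: `Registry` is a hypothetical metaclass that records every class it creates, and the manual rebuild stands in for the copy that a slotted attrs class goes through.

```python
created = []


class Registry(type):
    """Hypothetical metaclass that records every class it creates."""

    def __new__(mcls, name, bases, namespace, **kwargs):
        cls = super().__new__(mcls, name, bases, namespace, **kwargs)
        created.append(cls)
        return cls


class Model(metaclass=Registry):
    pass


# Assumption: a simplified version of the copy step, not the attrs implementation.
namespace = {
    k: v for k, v in Model.__dict__.items() if k not in ("__dict__", "__weakref__")
}
namespace["__slots__"] = ()
SlottedModel = type(Model)(Model.__name__, Model.__bases__, namespace)

print(len(created))                # 2: once for the original, once for the copy
print([c.__name__ for c in created])  # ['Model', 'Model']
```

A metaclass that registers classes by name, for example, would see the same name twice and would need to tolerate or replace the first entry.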

@Arcitec

Arcitec commented Jan 25, 2024

> The flaw I see in your understanding is that the original class is not kept alive by the weak reference from the parent's __subclasses__; the point of a weak reference is that it doesn't keep the referenced object alive.

Nono, it's kept alive. The weak reference means that the original class reference counter is still 1+. It's therefore not immediately freed when the class is "deleted" by attrs. It still has a reference (the weak reference), so the counter hasn't reached 0 yet.

Then, whenever Python's garbage collector runs, it detects that the ref count >= 1 is actually just weak reference(s) and finally frees the memory.

It's very much alive until then, but not reachable by any meaningful code, except code that looks at parent_class.__subclasses__().

> Calling type with one argument returns the type of the thing.

Oh yeah, thanks for explaining that. My brain is tired today, and I totally forgot that type(a_class) returns type, not a_class. :) I thought it returned a_class, and that's why I was wondering if the new class actually inherits from the "deleted" original. Happy to hear that it's not inheriting from the deleted class.

Really grateful that you helped clear that up. Thank you. It was also interesting to see that there's a low-level, programmatic syntax for creating classes. Fascinating. :)

I also appreciate the warning about metaclasses being invoked twice (once for the original and once for the copy). That's a good warning. So anyone writing metaclasses has to be aware of that fact. I think it would be good to add that warning to the attrs documentation @hynek, at https://www.attrs.org/en/stable/glossary.html#term-slotted-classes

Luckily almost nobody uses metaclasses, and they are only active when specifically invoked with class SomeClass(metaclass=Foo) (and in subclasses of such modified classes), which is a syntax most Python users have never even heard of and practically nobody uses. Therefore, almost nobody has to worry about that. ;)

Thank you both. I'm ready to move all my classes over to attrs slots now and save a bunch of memory (compared to dict). Feels good! :)
