
Subscripted generic classes should not have independent class variables #392

Closed
ilevkivskyi opened this issue Feb 22, 2017 · 13 comments

@ilevkivskyi
Member

Currently we have

from typing import Generic, TypeVar

T = TypeVar('T')

class A(Generic[T]):
    x = None

A[int].x = 1
A[str].x = 'a'

print(A[int].x)    # prints 1
print(A[int]().x)  # prints None, since the type is erased on instantiation

@JukkaL argues, and I agree with him, that this looks wrong. Conceptually, there should be only one runtime class. I think we can implement this without losing the current runtime generic information by tweaking the __dict__ descriptor of GenericMeta, so that all assignments and look-ups on subscripted class objects like A[int] are transferred to the original class object A.

@gvanrossum What do you think?

@ilevkivskyi
Member Author

It turns out that there is an even simpler way: just use __setattr__.
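A minimal sketch of the idea, using a stand-in metaclass (GenericMeta itself was removed from typing in Python 3.7, so ForwardingMeta and the _origin_ slot below are hypothetical names for illustration, not the real typing.py implementation):

```python
# Hypothetical sketch: relay attribute assignment on a "subscripted" class
# to the original class, so all subscriptions share one namespace.
# ForwardingMeta and _origin_ are illustrative names, not typing.py API.

class ForwardingMeta(type):
    def __setattr__(cls, name, value):
        origin = cls.__dict__.get('_origin_')
        if origin is not None and name != '_origin_':
            setattr(origin, name, value)  # relay to the original class
        else:
            type.__setattr__(cls, name, value)

class A(metaclass=ForwardingMeta):
    x = None

# Simulate A[int]: a distinct class object that remembers its origin.
A_int = ForwardingMeta('A_int', (A,), {'_origin_': A})
A_int.x = 1
print(A.x)      # 1 -- the assignment was relayed to A
print(A_int.x)  # 1 -- inherited back from A
```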

@gvanrossum
Member

gvanrossum commented Feb 22, 2017 via email

@orenbenkiki

I just posted the following on the Python mailing list, and got a response pointing to this issue, which seems relevant.

TL;DR: trying to implement the type erasure by making all the "specialized" classes share the same dictionary is sometimes the wrong thing to do; also, it is only partially implemented in 3.6.1, and this causes code like cls.foo = True; assert cls.foo to break (inside metaclasses) - something which is even worse IMVHO than the example given by @ilevkivskyi above.

My mailing list post (edited now that I know somewhat more) was:

TL;DR: We need improved documentation of the way meta-classes behave for generic classes, and possibly reconsider the way __setattr__ and __getattribute__ behave for such classes.

I am using meta-programming pretty heavily in one of my projects.
It took me a while to figure out the dance between meta-classes and generic classes in Python 3.6.0.

I couldn't find good documentation for any of this (if anyone has a good link, please share...), but with a liberal use of print I managed to reverse engineer how this works. The behavior isn't intuitive but I can understand the motivation (basically, "type annotations shall not change the behavior of the program").

For the uninitiated:

  • It turns out that there are two kinds of instances of generic classes: the unspecialized class (basically ignoring type parameters), and "specialized" classes (created when you write Foo[Bar], which know the type parameters, Bar in this case).

  • This means the meta-class __new__ method is called sometimes to create the unspecialized class, and sometimes to create a specialized one - in the latter case, it is called with different arguments...

  • No object is actually an instance of the specialized class; that is, the __class__ of an instance of Foo[Bar] is actually the unspecialized Foo (which means you can't get the type parameters by looking at an instance of a generic class).

So far, so good, sort of. I implemented my meta-classes to detect whether they are creating a "specialized" or "unspecialized" class and behave accordingly.
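On current Python (3.7+) the same erasure is still observable, though Foo[Bar] is now a typing alias rather than a separate class; a quick check:

```python
from typing import Generic, TypeVar

T = TypeVar('T')

class Foo(Generic[T]):
    pass

obj = Foo[int]()

# The instance's class is the plain Foo -- the parameters are erased...
print(obj.__class__ is Foo)  # True
# ...although modern typing does stash the alias on the instance:
print(obj.__orig_class__)    # e.g. __main__.Foo[int]
```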

However, these meta-classes stopped working when switching to Python 3.6.1. The reason is that in Python 3.6.1, a __setattr__ implementation was added to GenericMeta, which redirects the setting of an attribute of a specialized class instance to set the attribute of the unspecialized class instance instead.

This causes code such as the following (inside the meta-class) to behave in a mighty confusing way:

    if is-not-specialized:
        cls._my_attribute = False
    else:  # Is specialized:
        cls._my_attribute = True
        assert cls._my_attribute  # Fails!

As you can imagine, this caused us some wailing and gnashing of teeth, until we figured out (1) that this was the problem and (2) why it was happening.

Looking into the source code in typing.py, I see that I am not the only one who had this problem. Specifically, the implementers of the abc module had the exact same problem. Their solution was simple: the GenericMeta.__setattr__ code explicitly tests whether the attribute name starts with _abc_, in which case it maintains the old behavior.

Obviously, I should not patch the standard library typing.py to preserve _my_attribute. My current workaround is to derive from GenericMeta, define my own __setattr__, which preserves the old behavior for _my_attribute, and use that instead of the standard GenericMeta everywhere.
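For illustration, the shape of that workaround looks roughly like this (GenericMeta is gone since Python 3.7, so RelayingMeta below is a hypothetical stand-in for it, and _my_ is the preserved prefix):

```python
# Hypothetical sketch of the workaround: a metaclass subclass whose
# __setattr__ keeps attributes with a chosen prefix on the class itself,
# mirroring the _abc_ special case in typing.py. RelayingMeta stands in
# for typing.GenericMeta (removed in Python 3.7).

class RelayingMeta(type):
    def __setattr__(cls, name, value):
        origin = cls.__dict__.get('_origin_')
        if origin is not None and name != '_origin_':
            setattr(origin, name, value)  # relayed, as GenericMeta did
        else:
            type.__setattr__(cls, name, value)

class MyMeta(RelayingMeta):
    def __setattr__(cls, name, value):
        if name.startswith('_my_'):
            type.__setattr__(cls, name, value)  # keep on this class, no relay
        else:
            super().__setattr__(name, value)

class Base(metaclass=MyMeta):
    pass

Spec = MyMeta('Spec', (Base,), {'_origin_': Base})  # "specialized" copy
Spec._my_attribute = True   # stays on Spec
Spec.shared = 1             # relayed to Base
print('_my_attribute' in Spec.__dict__)  # True
print('shared' in Base.__dict__)         # True
```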

My code now works in both 3.6.0 and 3.6.1. However, I think the following points are worth fixing and/or discussion:

Edit: Ok, I understand the API is officially unstable. It would have been nice to see a line saying "changes in the unstable typing API".

  • In general it would be good to have some documentation on the way that meta-classes and generic classes interact with each other, as part of the standard library documentation (apologies if it is there and I missed it... link?)

Edit: Ok, I understand this is under active discussion. This post is my $0.02.

  • I'm not convinced the new behavior is a better default. I don't recall seeing a discussion about making this change, possibly I missed it (link?)

Edit: I understand the concept of "type erasure" when dealing with an instance - I'm not arguing against this type erasure. However, I think that as long as we do have both "specialized" and "unspecialized" instances of the same generic class, trying to pretend we don't has unforeseen undesirable consequences, especially when considering code inside meta-classes for generic classes. Perhaps we shouldn't try to pretend there is only one runtime class after all... Specifically:

  • There is a legitimate need for the old behavior (normal per-instance attributes). For example, it is needed by the "abc" module (as well as my project). So, some mechanism should be recommended (in the documentation) for people who need the old behavior.

Edit: This is independent of the previous point. That is, even if we make the default behavior to be "as if" there is a single runtime class, there are legitimate reasons to want to have un-erased attributes, so some mechanism should be provided (and, eventually, documented).

  • Distinguishing "really per-instance" attributes from "forwarded to the unspecialized instance" attributes based on their prefix seems to violate "explicit is better than implicit". For example, it would have been explicit to say cls.__unspecialized__.attribute (other explicit mechanisms are possible).

Edit: Alternatively, cls.__specialized__.attribute, if the default goes the other way.

  • Perhaps the whole notion of specialized vs. unspecialized class instances needs to be made more explicit in the GenericMeta API...

Edit: This again goes to the API being unstable. I assume it would be made more explicit when the API stabilizes.

  • Finally, and IMVHO most importantly, it is very confusing to override __setattr__ without overriding __getattribute__ to match. This gives rise to code like cls._foo = True; assert cls._foo failing, which feels wrong. And presumably fixing the implementation so that __getattribute__ forwards the same set of attributes to the "unspecialized" instance wouldn't break any code... other than code that is already broken due to the new functionality, that is.

Edit: This is the core issue, IMVHO - __setattr__ and __getattribute__ should be consistent, whatever the behavior is. Right now (in 3.6.1) they aren't.

@ilevkivskyi
Member Author

@orenbenkiki
I have read the whole post twice and still don't understand it. Could you please provide a self-contained snippet of code that, in your opinion, behaves incorrectly? Also please provide a more detailed description of how it is wrong (a bare # Fails! is not enough).

Specifically, the implementers of the abc module had the exact same problem. Their solution was simple: the GenericMeta.__setattr__ code explicitly tests whether the attribute name starts with _abc_, in which case it maintains the old behavior.

FWIW I put this code there, and I am not an "implementer of the abc module". This code is there just to not interfere with ABC infrastructure and other internal mechanisms (all generics are also ABCs).

@orenbenkiki

"Not interfere with the ABC infrastructure and other internal mechanisms" is sort of my problem, except in my case it is "interfere with my project's infrastructure and other external mechanisms" :-) Special-casing ABC is understandable, as it is part of the standard library. The thing is, meta-classes in external libraries (such as mine) have the same requirements, and they don't have the advantage of being able to inject this workaround into GenericMeta.

Here is a minimal piece of code:

from typing import GenericMeta, Generic, TypeVar

T = TypeVar('T')

class M(GenericMeta):
    def __new__(*args, **kwargs):
        cls = GenericMeta.__new__(*args, **kwargs)
        print(id(cls), list(kwargs.keys()))
        cls.foo = len(kwargs)
        print(id(cls), cls.foo)
        assert(cls.foo == len(kwargs))
        return cls

class G(Generic[T], metaclass=M):
    pass

print(G.foo)
print(G[int].foo)

I only have Python 3.6.0 on my laptop here, so this works fine and prints:

60320576 []
60320576 0
0
60321048 ['tvars', 'args', 'origin', 'extra', 'orig_bases']
60321048 5
5

If you run the same program with Python 3.6.1, the assert will raise AssertionError(...) and abort the program.

That is, if you write cls.foo = 7 and the next line is print(cls.foo) it will not print 7.
I find this to be... let's go with "surprising" :-)

@ilevkivskyi
Member Author

This is not a problem with the concept of "relaying" attributes to the erased class, but a bug in the implementation: we don't have a custom __getattribute__ and just rely on the fact that C is in C[int].__mro__. But we also copy __dict__ on subscription, which is a logical flaw. There are two possible ways to fix this:

  • Don't copy the whole dictionary, but only a required minimum (a la _no_slots_copy that we have now). I don't like this TBH, since this might be fragile.
  • Just add __getattribute__ that mirrors __setattr__. I did this and it works fine with your example (produces same output as in Python 3.6.0).

Both options will lead to some performance penalty, but I expect it to be pretty minor.
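The second option could be sketched like this (again with a stand-in metaclass and a hypothetical _origin_ slot, since the real fix lived in typing.py's GenericMeta):

```python
# Sketch of option 2: a __getattribute__ that mirrors __setattr__, so that
# cls.foo = x followed by cls.foo round-trips. Stand-in names, not typing.py.

class RelayingMeta(type):
    def __setattr__(cls, name, value):
        origin = cls.__dict__.get('_origin_')
        if origin is not None and name != '_origin_':
            setattr(origin, name, value)  # relay writes to the origin
        else:
            type.__setattr__(cls, name, value)

    def __getattribute__(cls, name):
        if name != '_origin_' and not name.startswith('__'):
            origin = type.__getattribute__(cls, '__dict__').get('_origin_')
            if origin is not None:
                return getattr(origin, name)  # mirror the relay on reads
        return type.__getattribute__(cls, name)

class C(metaclass=RelayingMeta):
    pass

C_int = RelayingMeta('C_int', (C,), {'_origin_': C})
C_int.foo = 7
print(C_int.foo)  # 7 -- set and read now agree
print(C.foo)      # 7 -- both refer to the origin's namespace
```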

There is also an issue about documenting GenericMeta (assigned to me), but I think there might be some changes to improve speed as discussed in another issue here.

@ilevkivskyi ilevkivskyi reopened this Jul 16, 2017
@ilevkivskyi ilevkivskyi self-assigned this Jul 16, 2017
@orenbenkiki

orenbenkiki commented Jul 16, 2017

Thanks, fixing __getattribute__ was the core issue, as I said.

I take it that the recommended way to have attributes which are not shared is to override __setattr__ and __getattribute__ (as was done for the ABC attributes), as opposed to, say, providing a standard __unshared__ dictionary attribute (or something like that)?

@orenbenkiki

BTW, not arguing with the "relaying attributes" decision, but it is worth noting it isn't compatible with 3.6.0. Specifically, if the code sample said:

from typing import GenericMeta, Generic, TypeVar

T = TypeVar('T')

class M(GenericMeta):
    def __new__(*args, **kwargs):
        cls = GenericMeta.__new__(*args, **kwargs)
        print(id(cls), list(kwargs.keys()))
        cls.foo = len(kwargs)
        print(id(cls), cls.foo)
        assert(cls.foo == len(kwargs))
        return cls

class G(Generic[T], metaclass=M):
    pass

print(G.foo)
print(G[int].foo)
print(G.foo, '<- Added, different')

Then in 3.6.0 it would print:

60320576 []
60320576 0
0
60321048 ['tvars', 'args', 'origin', 'extra', 'orig_bases']
60321048 5
5
0 <- Added, different

While in the fixed 3.6.1 it will presumably print:

60320576 []
60320576 0
0
60321048 ['tvars', 'args', 'origin', 'extra', 'orig_bases']
60321048 5
5
5 <- Added, different

Again, I can see why you'd want to do that, just noting that it is incompatible with 3.6.0.

@gvanrossum
Member

Please stop bringing up the incompatibilities between versions. You are now aware that this package is in the stdlib provisionally and you just need to accept that, or wait until it's stabilized (in 3.7 the status will probably not be provisional). Other than that, your bug report is valuable.

@ilevkivskyi
Member Author

ilevkivskyi commented Jan 30, 2018

Note that this was essentially fixed by PEP 560. The only remaining part is to be more "selective" with dunder attributes. Namely, we can still relay some dunders if they don't have any particular meaning in Python, so that C[int].__random_name__ will actually return C.__random_name__.
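On Python 3.7+ the PEP 560 behavior can be observed directly: C[int] is a typing alias whose ordinary attribute access is relayed to C, so the original example from this issue now behaves as one namespace. A quick check:

```python
from typing import Generic, TypeVar

T = TypeVar('T')

class C(Generic[T]):
    x = None

C[int].x = 1     # relayed to C by the generic alias
print(C.x)       # 1
print(C[str].x)  # 1 -- all subscriptions share C's namespace
```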

@drhagen

drhagen commented Feb 16, 2018

I encountered what I considered to be unexpected behavior in Python and tracked it down to this issue. I thought I would report on what I experienced. I didn't need a metaclass or monkey-patching to trigger this. Now that I've read the source code I understand why this happened, but I was not expecting attribute assignment on one type to proxy to the other type.

Here's a simplified example of a mixin that enhances a generic serializer class by finding the appropriate serializer for each type argument:

from typing import Generic, TypeVar, List

T = TypeVar('T')


# A mixin that gets all the default serializers for the type arguments
class GenericSerializerMixin:
    element_serializers = None

    @classmethod
    def _init_element_serializer(cls):
        if cls.element_serializers is None:
            cls.element_serializers = []
            for type_arg in cls.__args__:
                # Pretend this is an expensive operation we only want to incur once per subclass
                if type_arg == int:
                    cls.element_serializers.append(lambda x: 'I am an integer: ' + str(x))
                elif type_arg == str:
                    cls.element_serializers.append(lambda x: 'I am a string: ' + str(x))


# A generic serializer for a generic class
class ListSerializer(GenericSerializerMixin, Generic[T]):
    @classmethod
    def to_data(cls, value: List[T]):
        cls._init_element_serializer()
        return [cls.element_serializers[0](x) for x in value]


foo1 = ListSerializer[int]
foo2 = ListSerializer[str]

print(foo1.to_data([1, 2, 3]))
print(foo2.to_data(['a', 'b', 'c']))  # <-- WHAT?? Calls lambda x: 'I am an integer: ' + str(x)

I found it very surprising that modifying cls caused the other object to be modified, especially because the ids were different and __mro__ listed each type as a proper, distinct subtype of the base class. Once I inspected the source code, I found __getattr__ and the workaround (rename element_serializers to __element_serializers__).

@ilevkivskyi
Member Author

I believe your problem is somewhat the opposite of this issue. PEP 484 specifies that type variables apply only to instances, not to classes; i.e. List[int], List[str], List, etc. share the same namespace. The bug (fixed, I believe, in Python 3.7 beta1) is that this sometimes doesn't happen.

I would recommend reading PEPs 484 and 483 about why types (a static concept) shouldn't be confused with classes (a runtime concept). The fact that List[int] was an actual class object is an (unfortunate and deprecated) implementation detail.
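The modern API (typing.get_origin and typing.get_args, available since Python 3.8) makes this types-vs-classes split explicit: List[int] is an alias to inspect, not a class to instantiate from. A small illustration:

```python
from typing import List, get_args, get_origin

alias = List[int]

print(get_origin(alias))   # <class 'list'>
print(get_args(alias))     # (<class 'int'>,)

# Distinct as static types, but backed by a single runtime class:
print(alias == List[str])                          # False
print(get_origin(alias) is get_origin(List[str]))  # True
```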

@srittau
Collaborator

srittau commented Nov 4, 2021

If there is still a bug in the implementation, it should be brought up on bugs.python.org.

@srittau srittau closed this as completed Nov 4, 2021