Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define behavior of descriptor-typed fields on dataclasses #91330

Closed
debonte mannequin opened this issue Mar 30, 2022 · 12 comments
Closed

Define behavior of descriptor-typed fields on dataclasses #91330

debonte mannequin opened this issue Mar 30, 2022 · 12 comments
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@debonte
Copy link
Mannequin

debonte mannequin commented Mar 30, 2022

BPO 47174
Nosy @ericvsmith, @JelleZijlstra, @debonte

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2022-03-30.19:00:12.315>
labels = ['type-feature', 'library', '3.11']
title = 'Define behavior of descriptor-typed fields on dataclasses'
updated_at = <Date 2022-04-01.20:03:22.690>
user = 'https://github.com/debonte'

bugs.python.org fields:

activity = <Date 2022-04-01.20:03:22.690>
actor = 'zzzeek'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2022-03-30.19:00:12.315>
creator = 'debonte'
dependencies = []
files = []
hgrepos = []
issue_num = 47174
keywords = []
message_count = 1.0
messages = ['416392']
nosy_count = 4.0
nosy_names = ['eric.smith', 'zzzeek', 'JelleZijlstra', 'debonte']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue47174'
versions = ['Python 3.11']

@debonte
Copy link
Mannequin Author

debonte mannequin commented Mar 30, 2022

Recent discussions about PEP-681 (dataclass_transform) have focused on support for descriptor-typed fields. See the email thread here: https://mail.python.org/archives/list/typing-sig@python.org/thread/BW6CB6URC4BCN54QSG2STINU2M7V4TQQ/

Initially we were thinking that dataclass_transform needed a new parameter to switch between two modes. In one mode, it would use the default behavior of dataclass. In the other mode, it would be smarter about how descriptor-typed fields are handled. For example, __init__ would pass the value for a descriptor-typed field to the descriptor's __set__ method. However, Carl Meyer found that dataclass already has the desired behavior at runtime! We missed this because mypy and Pyright do not correctly mirror this runtime behavior.

Although this is the current behavior of dataclass, I haven't found it documented anywhere and the behavior is not covered by unit tests. Since dataclass_transform wants to rely on this behavior and the behavior seems desirable for dataclass as well, I'm proposing that we add additional dataclass unit tests to ensure that this behavior does not change in the future.

Specifically, we would like to document (and add unit tests for) the following behavior given a field whose default value is a descriptor:

  1. The value passed to __init__ for that field is passed to the descriptor’s __set__ method, rather than overwriting the descriptor object.

  2. Getting/setting the value of that field goes through __get__/set, rather than getting/overwriting the descriptor object.

Here's an example:

class Descriptor(Generic[T]):
    def __get__(self, __obj: object | None, __owner: Any) -> T:
        return getattr(__obj, "_x")

    def __set__(self, __obj: object | None, __value: T) -> None:
        setattr(__obj, "_x", __value)

@dataclass
class InventoryItem:
    quantity_on_hand: Descriptor[int] = Descriptor[int]()

i = InventoryItem(13)     # calls __set__ with 13
print(i.quantity_on_hand) # 13 -- obtained via call to __get__
i.quantity_on_hand = 29   # calls __set__ with 29
print(i.quantity_on_hand) # 29 -- obtained via call to __get__

I took a first stab at unit tests here: debonte@c583e7c

We are aware of two other descriptor-related behaviors that may also be worth documenting:

First, if a field is annotated with a descriptor type but is *not* assigned a descriptor object as its default value, it acts like a non-descriptor field. Here's an example:

@dataclass
class InventoryItem:
    quantity_on_hand: Descriptor[int] # No default value

i = InventoryItem(13)      # Sets quantity_on_hand to 13 -- No call to Descriptor.__set__
print(i.quantity_on_hand)  # 13 -- No call to Descriptor.__get__

And second, when a field with a descriptor object as its default value is initialized (when the code for the dataclass is initially executed), __get__ is called with a None instance and the return value is used as the field's default value. See the example below. Note that if __get__ doesn't handle this None instance case (for example, in the initial definition of Descriptor above), a call to InventoryItem() fails with "TypeError: InventoryItem.__init__() missing 1 required positional argument: 'quantity_on_hand'".

I'm less sure about documenting this second behavior, since I'm not sure what causes it to work, and therefore I'm not sure how intentional it is.

class Descriptor(Generic[T]):
    def __init__(self, *, default: T):
        self._default = default

    def __get__(self, __obj: object | None, __owner: Any) -> T:
        if __obj is None:
            return self._default
    return getattr(__obj, "_x")
    def __set__(self, __obj: object | None, __value: T) -> None:
        if __obj is not None:
            setattr(__obj, "_x", __value)

# When this code is executed, __get__ is called with __obj=None and the
# returned value is used as the default value of quantity_on_hand.
@dataclass
class InventoryItem:
    quantity_on_hand: Descriptor[int] = Descriptor[int](default=100)

i = InventoryItem()       # calls __set__ with 100
print(i.quantity_on_hand) # 100 -- obtained via call to __get__

@debonte debonte mannequin added 3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Mar 30, 2022
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
@ericvsmith
Copy link
Member

I think it's fine to document and test for the first two items.

  1. The value passed to init for that field is passed to the descriptor’s set method, rather than overwriting the descriptor object.
  2. Getting/setting the value of that field goes through get/set, rather than getting/overwriting the descriptor object.

For:
3. If a field is annotated with a descriptor type but is not assigned a descriptor object as its default value, it acts like a non-descriptor field.

I think that's the correct behavior. To do otherwise would require dataclasses to look at the annotation to figure out what type of descriptor to create, which I think is outside its scope.

For:
4. When a field with a descriptor object as its default value is initialized (when the code for the dataclass is initially executed), get is called with a None instance and the return value is used as the field's default value. See the example below.

I don't really understand what's going on. I'll look at it further.

@carljm
Copy link
Member

carljm commented May 21, 2022

Behavior 4 happens because in _get_field dataclasses does default = getattr(cls, a_name, MISSING) to get the default value (if any) off the class (where it naturally is found since default values are syntactically defined as class attribute values.) If cls.a_name has been set to a descriptor, the normal behavior of a descriptor when it is accessed directly off the class is to call its __get__ method with the obj/instance argument set to None.

I actually think this behavior is quite nice in practice for dataclasses, and if we are going to document these behaviors, IMO it makes sense to document it too. It is nice because it gives the __get__ method a consistent return type which is the type of both the default value and any other value returned from accessing the field on an instance, and gives dataclass descriptor fields a simple way (that already naturally falls out of the existing implementation) to specify their default value.

I think the possible downside is that it is somewhat common today for descriptors to return self if the obj argument is None -- i.e. if you access the descriptor on the class you get the actual descriptor object back. So there may be existing cases in the wild where people are using descriptors with this return self behavior on dataclass-like classes, and would have to change it in order to work with the current behavior of dataclasses. (SQLAlchemy may fall into that category?) On the other hand, as far as I know there often isn't a strong use case for the return self behavior -- it's just that there's no other obvious thing to return when accessed on a class and so return self is kind of a simple fallback that supports more transparent introspection of the class. So it may not be a major cost to move away from return self?

@debonte
Copy link
Contributor

debonte commented Jun 28, 2022

I actually think this behavior is quite nice in practice for dataclasses, and if we are going to document these behaviors, IMO it makes sense to document it too.

@carljm, thanks for explaining why this works. I agree that it's worth documenting.

I think the possible downside is that it is somewhat common today for descriptors to return self if the obj argument is None

Are you saying that this is a downside to documenting it? Given that dataclass already behaves this way and no one has expressed a desire to change that behavior, is there a downside?

I think your point is that if a user relies on the default value of a descriptor-typed field and the descriptor's __get__ returns self for the None obj case, it's likely that the user will run into problems at runtime because they'll be passing a descriptor instance (the self instance) to the descriptor's __set__ method. Is that correct?

So perhaps type checkers and language servers will choose not to support this because they don't want to lead users into that sort of problem. But documenting the behavior at least makes it common knowledge and can steer descriptor authors in the right direction in the future. And maybe tools can help there, for example maybe by warning if a __get__ overload for None obj returns a descriptor type.

Do you disagree?

@zzzeek
Copy link

zzzeek commented Jun 28, 2022

SQLAlchemy's descriptor (the one in question) does not return "self" when "obj" is None, it returns a SQL expression construct. this doesn't actually affect SQLAlchemy with dataclasses in any case as we don't make any dataclasses with said descriptor interpreted in a dataclasses context at runtime.

as far as descriptors generally reutnring "self" when obj is None, I'm not sure what else you think would be returned by the average "normal" descriptor. returning "self" is usually kind of important for tools like Sphinx autodoc and such to work, and also the primary document on descriptors pretty much uses "return self" in all the examples.

@debonte
Copy link
Contributor

debonte commented Jun 28, 2022

@zzzeek, are you arguing for not documenting this behavior at all or just pointing out that this is only relevant to descriptors that are used as dataclass field types rather than all descriptors?

@zzzeek
Copy link

zzzeek commented Jun 28, 2022

I am sort of pointing out the latter, but also wondering what the suggestion is for garden variety descriptors to do when their __get__() method is called with obj=None.

@debonte
Copy link
Contributor

debonte commented Jun 28, 2022

I think that may be intentionally left open-ended. PEP 252 says:

__get__(): ...This is also referred to as a “binding” operation...Exactly what is returned by the binding operation depends on the semantics of the descriptor

@carljm
Copy link
Member

carljm commented Jun 29, 2022

Are you saying that this is a downside to documenting it? Given that dataclass already behaves this way and no one has expressed a desire to change that behavior, is there a downside?

No, I don't really think there is a downside to documenting it. I guess I meant "downside" broadly as in "this behavior kind of conflicts with a typical way that descriptors are written," but I don't think that's sufficient reason either to change dataclasses' behavior or avoid documenting it. I'm not sure how common use of descriptors as dataclass fields will ever be, but I think if they are used, this way of setting the default value makes as much sense as anything, and has some strong advantages (not least that it is the status quo).

It may also be worth documenting the corollary that if obj is None: raise AttributeError() is how a descriptor __get__() method would signal to dataclasses that there is no default value for the field.

@zzzeek I wasn't trying to question return self as one reasonable choice of obj=None behavior for descriptors that aren't meant for use as dataclass fields.

ambv added a commit that referenced this issue Jul 5, 2022
)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 5, 2022
…ythonGH-94424)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 5f31930)

Co-authored-by: Erik De Bonte <erikd@microsoft.com>
ambv pushed a commit to ambv/cpython that referenced this issue Jul 5, 2022
…fields (pythonGH-94424)

Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 5f31930)

Co-authored-by: Erik De Bonte <erikd@microsoft.com>
@debonte
Copy link
Contributor

debonte commented Jul 5, 2022

@ericvsmith, my PR got merged without your review. Are you OK with the test changes in there?

ambv pushed a commit that referenced this issue Jul 5, 2022
) (GH-94576)

Co-authored-by: Erik De Bonte <erikd@microsoft.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 5f31930)
ambv added a commit that referenced this issue Jul 5, 2022
…GH-94424) (GH-94577)

Co-authored-by: Erik De Bonte <erikd@microsoft.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
(cherry picked from commit 5f31930)
@ambv
Copy link
Contributor

ambv commented Jul 5, 2022

Yes, this is now landed in 3.10 - 3.12. Thanks, Erik!

I'll leave the issue open for a while in case Eric's got any test improvement suggestions. We can easily make those improvements forward.

@ericvsmith
Copy link
Member

Thanks, @debonte. I thought I'd reviewed this but I guess I never committed it. It looks good to me, so I'm closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants