Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

can't construct dataclass as ABC (or runtime check as data protocol) #83315

Open
ahirner mannequin opened this issue Dec 26, 2019 · 15 comments
Open

can't construct dataclass as ABC (or runtime check as data protocol) #83315

ahirner mannequin opened this issue Dec 26, 2019 · 15 comments
Labels
3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@ahirner
Copy link
Mannequin

ahirner mannequin commented Dec 26, 2019

BPO 39134
Nosy @gvanrossum, @ericvsmith, @ahirner
Files
  • dc_repro.py: reproducing cases A to D
  • dc2_repro.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-12-26.00:05:40.422>
    labels = ['type-bug', '3.8']
    title = "can't construct dataclass as ABC (or runtime check as data protocol)"
    updated_at = <Date 2020-01-19.16:30:34.545>
    user = 'https://github.com/ahirner'

    bugs.python.org fields:

    activity = <Date 2020-01-19.16:30:34.545>
    actor = 'gvanrossum'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = []
    creation = <Date 2019-12-26.00:05:40.422>
    creator = 'cybertreiber'
    dependencies = []
    files = ['48802', '48817']
    hgrepos = []
    issue_num = 39134
    keywords = []
    message_count = 14.0
    messages = ['358876', '358881', '358884', '358893', '358904', '358906', '358946', '358981', '359163', '359165', '359186', '359204', '360255', '360261']
    nosy_count = 3.0
    nosy_names = ['gvanrossum', 'eric.smith', 'cybertreiber']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue39134'
    versions = ['Python 3.6', 'Python 3.8']

    @ahirner
    Copy link
    Mannequin Author

    ahirner mannequin commented Dec 26, 2019

    At runtime, we want to check whether objects adhere to a data protocol. This is not possible due to problematic interactions between ABC and @DataClass.

    The attached file tests all relevant yet impossible cases. Those are:

    1. A(object): Can't check due to "Protocols with non-method members don't support issubclass()" (as outlined in PEP-554)
    2. B(ABC): "Can't instantiate abstract class B with abstract methods x, y"
    3. C(Protocol): same as A or same as B if @Property is @AbstractMethod

    The problem can be solved in two parts. First allowing to implement @abstractproperty in a dataclass (B). This doesn't involve typing and enables the expected use case of dataclass+ABC. I analysed this problem as follows:
    Abstract properties evaluate to a default of property, not to dataclasses.MISSING. Hence, dataclasses._init_fn throws TypeError because of deriving from class vars without defaults.

    Second, eliding the exception of @runtime_checkable Protocols with non-method members if and only if the the the protocol is in its MRO. I didn't think that through fully, but instantiation could e.g. fail for missing implementations as expected from ABC behaviour (see case D in attached file). I'm not sure about the runtime overhead of this suggestion.

    @ahirner ahirner mannequin added 3.8 only security fixes type-bug An unexpected behavior, bug, or error labels Dec 26, 2019
    @ericvsmith
    Copy link
    Member

    Is dataclasses doing something here that a regular, hand-written class wouldn't do?

    @ahirner
    Copy link
    Mannequin Author

    ahirner mannequin commented Dec 26, 2019

    Here, nothing but less boiler plate.

    Wouldn't it pay off to not rewrite dataclass features like frozen, replace, runtime refelection and the ability to use dataclass aware serialization libraries (e.g. pydantic)? I think @DataClass+@abstractproperty behaviour is yet to be defined.

    The second part about subclass check is not specific to dataclasses.

    @gvanrossum
    Copy link
    Member

    1. PEP-554 is about multiple interpreters. Which PEP did you mean?

    2. The double negative in “Wouldn't it pay off to not rewrite dataclass features” is confusing. What did you mean?

    @ahirner
    Copy link
    Mannequin Author

    ahirner mannequin commented Dec 27, 2019

    Pardon my sloppiness.

    1. That should have been PEP-544. The last point referred to the notion of data protocols [0].

    2. I think solving this issue for dataclasses would ensure better composition with modern libraries and other idioms like Protocol and ABC.

    [0] https://www.python.org/dev/peps/pep-0544/#runtime-checkable-decorator-and-narrowing-types-by-isinstance

    @gvanrossum
    Copy link
    Member

    Thanks. Can you be specific about “modern libraries”? Please clarify your use cases.

    @ahirner
    Copy link
    Mannequin Author

    ahirner mannequin commented Dec 28, 2019

    We construct a computational graph and need to validate (or search for possible new) edges at runtime. The data is specific to computer vision. One example is clipping geometry within 2D boundaries. A minimal implementation of this data would be:

    @dataclass
    class FramedGeometry:
        width: PositiveInt
        height: PositiveInt
        geometry: Geometry

    However, different properties are manifest in different encodings. Height, width can be meta data from an image database or inherent to a decoded image (as np.ndarray.shape). The transform will then dataclasses.replace(geometry=...) its attribute(s).. If width and height are not implemented, another transition is needed to produce them whilst only looking into the image header, not fully decoding potentially large and many images.

    The read-only interface ensures that transitions are generic wrt some forms of inputs. The replace interface preserves runtime types.

    Inputs and outputs are annotated with @DataClass or tuples of them. Those dataclasses are a mixin of base dataclasses that declare concrete properties like a URI of an image and ABCs that declare accessors like get_width(self) -> PositiveInt. We use @pydantic.dataclass to parse, validate and deserialize concrete classes to great extent [0].

    In order to not implement accessors on top of dataclasses, we'd want that abstract properties are compatible with dataclasses and issubclass works with data protocols (given the necessary constraints).

    PS:
    Polymorphism for computer-vision data is an interesting problem. Other approaches exist, each with their own struggle to model "traits" the right way [1]. E.g., scaling would be valid for FramedGeometry since no image data is included but invalid when images are referenced but cannot be resized, like:

    class EncodedSizedAnnotatedFrame:
        width: PositiveInt
        height: PositiveInt
        image_bin: bytes
        geometry: Geometry

    Thanks!

    [0] https://pydantic-docs.helpmanual.io/usage/dataclasses/
    [1] pytorch/vision#1406

    @gvanrossum
    Copy link
    Member

    Have you tried dropping ABCMeta? Mypy checks @AbstractMethod regardless.

    @ahirner
    Copy link
    Mannequin Author

    ahirner mannequin commented Jan 1, 2020

    Dropping ABCMeta stops at instantiation. This should be in the dataclass code that's been generated.

    File "<string>", line 2, in __init__
    AttributeError: can't set attribute

    Repro:

    class QuasiABC:
        @property
        @abstractmethod
        def x(self) -> int: ...
    
    @dataclass(frozen=True)
    class E(QuasiABC):
        x: int
    
    E(10)
    

    Interestingly, frozen=False is giving the same error.

    @gvanrossum
    Copy link
    Member

    Try adding a setter to the base class method.

    @ahirner
    Copy link
    Mannequin Author

    ahirner mannequin commented Jan 2, 2020

    This results in E(x=None).

    I'd need to look into that to understand why.

    @gvanrossum
    Copy link
    Member

    No doubt because your getter returns None.

    @ahirner
    Copy link
    Mannequin Author

    ahirner mannequin commented Jan 19, 2020

    In that case, what should the getter return? It doesn't know about the implementation of x.
    Maybe I'm not getting the idea behind adding getters/setters.

    @gvanrossum
    Copy link
    Member

    In the example, it should be int, right?

    Anyway, the bug tracker is not a good place to get questions answered. Since this is mostly about type checking, I recommend that you try this Gitter instance: https://gitter.im/python/typing

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @taranu
    Copy link

    taranu commented Jan 19, 2023

    I ran into the same "can't define dataclass as ABC" problem and found that using py3.10's slots=True kwarg for dataclass fixes it. However, it does have the admittedly obscure side effect of preventing calls to superclass __post_init__ when inheriting from another dataclass. Dataclasses can't call super(A, self) and dataclasses with slots=True can't call super() either, unless I'm misunderstanding something.

    from abc import ABCMeta, abstractmethod
    from dataclasses import dataclass
    
    class ABC(metaclass=ABCMeta):
        @property
        @abstractmethod
        def x(self) -> int: ...
    
    @dataclass(slots=True, kw_only=True)
    class A(ABC):
        x: int
    
        def __post_init__(self):
            pass
    
    @dataclass(slots=True, kw_only=True)
    class E(A):
        y: int
    
        def __post_init__(self):
            # super() raises this error whether slots=True or not:
            # TypeError: super(type, obj): obj must be an instance or subtype of type
            super(A, self).__post_init__()
    
    A(x=1) # works fine
    E(x=1, y=2) # 'super' object has no attribute '__post_init__'
    

    On further testing, super(A, self).__thisclass__.__post_init__(self) does work, although it's, uh, not pretty.

    @iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 26, 2023
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants