Support __slots__? #28

Closed
ericvsmith opened this Issue Sep 7, 2017 · 8 comments

6 participants
ericvsmith (Owner) commented Sep 7, 2017

Currently, the draft PEP specifies, and the code supports, an optional ability to add __slots__. This is the one place where @dataclass cannot just modify the given class and return it: because __slots__ must be specified at class creation time, it's already too late by the time the dataclass decorator gets control. The current approach is to dynamically generate a new class with __slots__ set, copying over the other class attributes. The decorator then returns the new class.
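Here is a small sketch in plain Python, with no dataclasses involved, showing why it is too late. Assigning __slots__ to a class after its class statement has run only creates an ordinary class attribute, so instances still get a per-instance __dict__. Only a __slots__ that is present in the class body at creation time changes the instance layout. The class names Late and Early are illustrative.

```python
class Late:
    x: int
    y: int

# Setting __slots__ after the class already exists only adds a class attribute;
# the instance layout (including the per-instance __dict__) is already fixed.
Late.__slots__ = ('x', 'y')

late = Late()
late.z = 3  # still allowed: instances keep their __dict__
print(hasattr(late, '__dict__'))


class Early:
    # __slots__ in the class body is seen at class creation time.
    __slots__ = ('x', 'y')


early = Early()
try:
    early.z = 3
except AttributeError as exc:
    print('rejected:', exc)
```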

The question is: do we even want to support setting __slots__? Is having __slots__ important enough to have this deviation from the "we just add a few dunder methods to your class" behavior?

I see three options:

  1. Leave it as-is, with @dataclass(slots=True) returning a new class.
  2. Completely remove support for setting __slots__.
  3. Add a different decorator, say @add_slots, which takes a data class and creates a new class with __slots__ set.

I think we should go with either 2 or 3. I don't mind not supporting __slots__, but if we do want to support it, I think a separate decorator is easier to explain.

@add_slots
@dataclass
class C:
    x: int
    y: int

It would be an error to use @add_slots on a non-dataclass class.
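Option 3 can be sketched as follows. This is a minimal, hypothetical @add_slots that follows the approach described above: it builds a new class with __slots__ set, copies the other class attributes, and rejects non-dataclass input. It is an illustration under those assumptions, not the draft PEP's code.

```python
import dataclasses


def add_slots(cls):
    """Hypothetical decorator: return a copy of dataclass `cls` with __slots__ set."""
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls.__name__} is not a dataclass")
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    namespace = dict(cls.__dict__)
    namespace['__slots__'] = field_names
    # Field defaults are stored on the class as plain attributes, and a class
    # attribute with the same name as a slot is rejected at class creation.
    # The generated __init__ has already captured the defaults, so drop them.
    for name in field_names:
        namespace.pop(name, None)
    # The old class's __dict__/__weakref__ descriptors must not leak into the new class.
    namespace.pop('__dict__', None)
    namespace.pop('__weakref__', None)
    new_cls = type(cls)(cls.__name__, cls.__bases__, namespace)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls
```

Usage matches the example above: stack @add_slots on top of @dataclass. Because the result is a different class object, anything that captured the original class before @add_slots ran still refers to the old one.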

ilevkivskyi (Contributor) commented Sep 7, 2017

I think we should allow __slots__. Although they are not mainstream, they are still used. I am, however, not sure about the API we should use. I think @add_slots still sounds like it patches an existing class. Maybe call it @with_slots? Finally, maybe we can still use a single decorator, but call the keyword with_slots to distinguish it from the other keywords? My point is that people who use with_slots are probably familiar with how slots work, so they will not be surprised that this option returns a new class.

gvanrossum commented Sep 7, 2017

ericvsmith added a commit that referenced this issue Sep 7, 2017

Merge pull request #30 from ericvsmith/remove-slots
Issue #28: Remove support for setting slots. Support might be added back at a later date.

ericvsmith (Owner) commented Sep 7, 2017

Agreed. I removed slots in PR #30. The git tag last-version-with-slots points to the last version of the code in which slots support worked.

ericvsmith closed this Sep 7, 2017

cjrh commented Oct 13, 2017

@ericvsmith Adding __slots__ manually works as long as there are no defaults:

>>> @dataclass
... class C:
...     __slots__ = {'x', 'y'}
...     x: int
...     y: int
...     
>>> o = C(1,2)
>>> o
C(x=1, y=2)
>>> @dataclass
... class C:
...     __slots__ = {'x', 'y'}
...     x: int
...     y: int = 1
...     
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: 'y' in __slots__ conflicts with class variable

You're likely already aware of this, but I'm letting you know on the small chance it got missed.

(My interest in this is making dataclasses work with my "autoslot" toy class, which injects slots into the class definition via a metaclass-enabled superclass: https://github.com/cjrh/autoslot. To make it compatible with @dataclass, I can look for __annotations__ in the class namespace inside my metaclass, and that works fine, but I can't get around the class variable conflict shown in the traceback above.)
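The ValueError in the traceback comes from Python's own class creation, not from dataclasses. Any class body that lists a name in __slots__ and also assigns a class attribute with that same name is rejected, because the slot's descriptor and the default value would both need to live under that name on the class. The sketch below reproduces the error without @dataclass. The helper name make_conflicting_class is illustrative.

```python
def make_conflicting_class():
    class C:
        __slots__ = ('x', 'y')
        x: int       # annotation only: no class attribute, no conflict
        y: int = 1   # class attribute with the same name as a slot
    return C


try:
    make_conflicting_class()
except ValueError as exc:
    message = str(exc)
    print(message)
```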

cjrh commented Oct 13, 2017

Thinking it over, I think my use-case is different to what dataclasses are for, and so compatibility probably doesn't make sense anyway.

aaronchall commented Dec 5, 2017

I totally think slots should be the default behavior.

(Disclaimer: I gave the PyCon 2017 slots talk, https://www.youtube.com/watch?v=N7MfisN44nY, and I made the most recent contribution to the data model docs on __slots__.)

To break it down: slots add a data descriptor to the class that points to a slot in a struct-like data structure. They are fast to access, and they take much less space than even the new, smaller dict (roughly the space of a tuple). It should be easy to programmatically determine whether they should be added in the child or not. This should be a strictly dominant addition. But adding it later could break backwards compatibility if users start making the unfortunate decision to rely on access to __dict__, either directly or via vars().
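The sketch below shows the descriptor mechanism described here, using two illustrative plain classes, Plain and Slotted. Each name in __slots__ becomes a member descriptor (a data descriptor) on the class, and slotted instances carry no __dict__. The final line only gives a rough idea of size, because sys.getsizeof results differ across Python versions and platforms.

```python
import inspect
import sys


class Plain:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class Slotted:
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Each slot name is stored on the class as a member_descriptor, a data descriptor.
print(type(Slotted.__dict__['x']).__name__)
print(inspect.isdatadescriptor(Slotted.x))

p, s = Plain(1, 2), Slotted(1, 2)
# Slotted instances have no per-instance __dict__ at all.
print(hasattr(p, '__dict__'), hasattr(s, '__dict__'))

# Rough per-instance footprint (object plus its dict, versus the slotted object).
print(sys.getsizeof(p) + sys.getsizeof(vars(p)), sys.getsizeof(s))
```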

Here are some finer points relevant to dataclasses, as I see them:

  • It can be OK for parents to have __dict__: that just allows a __dict__ to be created for a child instance if it is accessed. (Same for __weakref__.) Basically, even if the parent allows __dict__, as long as the child implementation only uses its slotted attributes, no __dict__ is created. But I can see the value in warning or raising an error for the case where users typo an attribute name. We could have an argument like no__dict__=True that would ensure there's no slot for a __dict__, or maybe allow__dict__ for the opposite.
  • We should check for redundant slots. If a parent and a child declare the same slot, the docs say the meaning of the program is undefined. In my mental model, the parent has a slot descriptor that points to one place, and the child has a slot descriptor that points to another place: a contradiction. So redundant slots should raise an error. (My talk even misses this point, facepalm.)

dan-blanchard commented Jun 22, 2018

Without slots, the usability of data classes is really limited. Whenever I'd want to use something like this, it's almost always in a situation where I'll have many instances of the same simple data points. Without __slots__, that becomes untenable memory-wise. It's interesting that you can combine the two approaches when you don't set defaults, but defaults are part of what makes this useful in the first place.

ericvsmith (Owner) commented Jun 22, 2018

You can use code like @add_slots from https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py:

>>> from dataclasses import *
>>> from dataclass_tools import *
>>> @add_slots
... @dataclass
... class C:
...    i: int = 10
...
>>> c=C()
>>> c
C(i=10)
>>> c.__slots__
('i',)
>>> c.j=0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'C' object has no attribute 'j'
>>>

This isn't in dataclasses itself because every other feature just adds methods to your class. __slots__ requires creating a new class, because @dataclass doesn't get control until after the class has been created, at which point it is too late to set __slots__.

There are a few possible approaches here:

  • As outlined above, we could redesign how slots work to make it possible to add __slots__ after class creation. However, this would be difficult or maybe impossible.
  • Add a slots=True parameter to @dataclass, which would create and return a new class. attrs takes this approach. We'd want to make sure users understand the implications of this.
  • Add @add_slots to dataclasses.py.
  • Something else I haven't thought of.
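Returning a new class has at least one concrete implication, shown here with a deliberately minimal, hypothetical add_slots (not the one in dataclass_tools.py). Methods that use zero-argument super() capture the original class in a hidden __class__ cell. After the class is rebuilt, that cell no longer matches the instance's type, so super() raises TypeError.

```python
import dataclasses


def add_slots(cls):
    """Hypothetical minimal decorator: rebuild dataclass cls with __slots__."""
    names = tuple(f.name for f in dataclasses.fields(cls))
    namespace = dict(cls.__dict__)
    namespace['__slots__'] = names
    for name in names + ('__dict__', '__weakref__'):
        namespace.pop(name, None)  # avoid slot/class-variable conflicts
    return type(cls)(cls.__name__, cls.__bases__, namespace)


@add_slots
@dataclasses.dataclass
class Point:
    x: int
    y: int = 0

    def describe(self):
        # Zero-argument super() reads the __class__ cell, which still refers
        # to the class the class statement created, not the rebuilt one.
        return super().__repr__()


p = Point(1)
print(p)  # the generated __repr__ still works
try:
    p.describe()
except TypeError as exc:
    print('super() failed:', exc)
```

The generated dunder methods are unaffected because they do not use super(). User-written methods that do use it are affected, and this is the kind of surprise that a slots=True option or a separate decorator would need to document.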

I suggest taking this to python-ideas if you'd like to champion one of these ideas.
