
Support __slots__? #28

Closed
ericvsmith opened this issue Sep 7, 2017 · 15 comments

Comments

@ericvsmith
Owner

@ericvsmith ericvsmith commented Sep 7, 2017

Currently the draft PEP specifies and the code supports the optional ability to add __slots__. This is the one place where @dataclass cannot just modify the given class and return it: because __slots__ must be specified at class creation time, it's too late by the time the dataclass decorator gets control. The current approach is to dynamically generate a new class while setting __slots__ in the new class and copying over other class attributes. The decorator then returns the new class.

The question is: do we even want to support setting __slots__? Is having __slots__ important enough to have this deviation from the "we just add a few dunder methods to your class" behavior?

I see three options:

  1. Leave it as-is, with @dataclass(slots=True) returning a new class.
  2. Completely remove support for setting __slots__.
  3. Add a different decorator, say @add_slots, which takes a data class and creates a new class with __slots__ set.

I think we should either go with 2 or 3. I don't mind not supporting __slots__, but if we do want to support it, I think it's easier to explain with a separate decorator.

@add_slots
@dataclass
class C:
    x: int
    y: int

It would be an error to use @add_slots on a non-dataclass class.
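A minimal sketch of how such an @add_slots decorator could work, assuming Python 3.7+ and only the public dataclasses.fields() and dataclasses.is_dataclass() helpers. The name add_slots is the hypothetical one proposed above; this is an illustration, not the code from the repository. It copies the finished class's namespace, sets __slots__, and builds a new class, which is exactly the "returns a new class" behavior under discussion.

```python
import dataclasses


def add_slots(cls):
    """Hypothetical @add_slots: return a NEW class with __slots__ set.

    Sketch only; assumes cls has already been processed by @dataclass.
    """
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls.__name__} is not a dataclass")

    # Copy the namespace of the already-built class.
    cls_dict = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    cls_dict["__slots__"] = field_names

    # Default values are stored as class attributes; they would clash with
    # the slot descriptors, and the generated __init__ already holds them.
    for name in field_names:
        cls_dict.pop(name, None)

    # These descriptors belong to the old class; the new class only gets
    # them if its bases provide them.
    cls_dict.pop("__dict__", None)
    cls_dict.pop("__weakref__", None)

    # Same metaclass, name and bases; keep __qualname__ for a nice repr.
    new_cls = type(cls)(cls.__name__, cls.__bases__, cls_dict)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls


@add_slots
@dataclasses.dataclass
class C:
    x: int
    y: int = 0  # defaults survive because __init__ captured them
```

Note that the decorated name C is bound to the new class; any reference to the original class taken before add_slots ran still points at the old, dict-based one.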

@ilevkivskyi

Contributor

@ilevkivskyi ilevkivskyi commented Sep 7, 2017

I think we should allow __slots__. Although they are not mainstream, they are still used. I am, however, not sure which API we should use. I think @add_slots still sounds like it patches an existing class. Maybe call it @with_slots? Alternatively, maybe we can still use a single decorator, but call the keyword with_slots to distinguish it from the other keywords? My point is that people who use with_slots are probably familiar with how slots work, so they will not be surprised that this option returns a new class.

@gvanrossum

@gvanrossum gvanrossum commented Sep 7, 2017

ericvsmith added a commit that referenced this issue Sep 7, 2017
Issue #28: Remove support for setting slots. Support might be added back at a later date.
@ericvsmith

Owner Author

@ericvsmith ericvsmith commented Sep 7, 2017

Agreed. I removed slots in PR #30. The git tag last-version-with-slots points to the code where slots was working.

@ericvsmith ericvsmith closed this Sep 7, 2017
@cjrh

@cjrh cjrh commented Oct 13, 2017

@ericvsmith Adding __slots__ manually works as long as there are no defaults:

>>> @dataclass
... class C:
...     __slots__ = {'x', 'y'}
...     x: int
...     y: int
...     
>>> o = C(1,2)
>>> o
C(x=1, y=2)
>>> @dataclass
... class C:
...     __slots__ = {'x', 'y'}
...     x: int
...     y: int = 1
...     
Traceback (most recent call last):
  File "<input>", line 1, in <module>
ValueError: 'y' in __slots__ conflicts with class variable

You're likely already aware of this, but I'm letting you know on the small chance it got missed.

(My interest in this is making dataclasses work with my "autoslot" toy class, which injects slots into the class definition via a metaclass-enabled superclass: https://github.com/cjrh/autoslot. To make it compatible with @dataclass, I can look for __annotations__ in the cls namespace inside my metaclass, and that works fine, but I can't get around the class variable conflict shown in the traceback above.)
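The ValueError in that traceback comes from class creation itself, not from @dataclass: the slot named y becomes a descriptor in the class namespace, and `y: int = 1` tries to store a plain class attribute under the same name, so type() rejects the class before any decorator runs. A minimal reproduction in plain Python 3 with no dataclass involved (the helper function name is hypothetical):

```python
def define_conflicting_class():
    """Define a slotted class whose field also has a default (no dataclass)."""

    class C:
        __slots__ = ("x", "y")
        x: int
        y: int = 1  # plain class attribute with the same name as a slot

    return C


try:
    define_conflicting_class()
    error_message = None
except ValueError as exc:
    # Raised by type() while building the class, before any decorator runs.
    error_message = str(exc)

print(error_message)
```

This is also why a metaclass that injects __slots__ runs into the same wall: the default value is already sitting in the namespace under the slot's name.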

@cjrh

@cjrh cjrh commented Oct 13, 2017

Thinking it over, I think my use case is different from what dataclasses are for, so compatibility probably doesn't make sense anyway.

@aaronchall

@aaronchall aaronchall commented Dec 5, 2017

I totally think slots should be the default behavior.

(Disclaimer - I gave the PyCon 2017 slots talk: https://www.youtube.com/watch?v=N7MfisN44nY and I had the latest contribution to the datamodel docs on __slots__)

To break it down: slots add a data descriptor to the class that points to a slot in a struct-like data structure. They are fast to access, and they take much less space than even the new, smaller dict (about as much space as a tuple). It should be easy to determine programmatically whether they should be added in the child or not. This should be a strictly dominant addition. But adding it later could break backward compatibility if users make the unfortunate decision to rely on access to __dict__, either directly or via vars().

Here are some finer points relevant to dataclasses, as I see it:

  • it can be ok for parents to have __dict__ - that just allows __dict__ to be created if accessed for a child. (Same for __weakref__.) Basically, even if the parent allows __dict__, as long as the child implementation only uses the correct slotted attributes, __dict__ isn't created. But I can see the value in warning/erroring for the case where users typo the attribute. We could have an argument, like no__dict__=True that would ensure there's no slot for a __dict__, or maybe allow__dict__ for the opposite.
  • we should check for redundant slots (if a parent and a child declare the same slot, the docs say the meaning of the program is undefined - basically, in my mental model, the parent has a slot descriptor that points to one place, and the child has a slot descriptor that points to another place - a contradiction), so redundant slots should raise an error. (My talk even misses this point facepalm.)
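
The points above can be checked with plain classes. A small sketch, assuming CPython 3: each slot shows up as a member descriptor on the class, a fully slotted class rejects typos and makes vars() fail, a parent without __slots__ brings __dict__ back, and redundant slots are accepted silently, so raising on them would need an explicit check like the hypothetical find_redundant_slots() helper below (all class and function names are illustrative).

```python
import types


class Slotted:
    __slots__ = ("x",)


class Parent:  # no __slots__, so instances of subclasses get a __dict__
    pass


class Child(Parent):
    __slots__ = ("x",)


class Redundant(Slotted):
    __slots__ = ("x",)  # same slot as the base: CPython accepts this silently


def find_redundant_slots(cls):
    """Hypothetical helper: slot names cls declares that a base also declares."""

    def declared(klass):
        slots = klass.__dict__.get("__slots__", ())
        # __slots__ may be a single string naming one slot.
        return {slots} if isinstance(slots, str) else set(slots)

    inherited = set()
    for base in cls.__mro__[1:]:
        inherited |= declared(base)
    return declared(cls) & inherited


# Each slot is a data descriptor stored on the class.
print(isinstance(Slotted.__dict__["x"], types.MemberDescriptorType))

s = Slotted()
s.x = 1
try:
    s.typo = 2  # rejected: there is no __dict__ to fall back on
except AttributeError as exc:
    print("rejected:", exc)
try:
    vars(s)
except TypeError as exc:
    print("vars() fails:", exc)

c = Child()
c.typo = 2  # accepted: Parent provides a __dict__
print(vars(c))

print(find_redundant_slots(Redundant), find_redundant_slots(Child))
```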
@dan-blanchard

@dan-blanchard dan-blanchard commented Jun 22, 2018

Without slots, the usability of data classes is really limited. When I would want to use something like this, it is almost always in a situation where I will have many instances of the same simple data points. Without __slots__, that becomes untenable memory-wise. It's interesting that you can combine the two approaches when you don't set defaults, but the defaults are part of what makes this useful in the first place.

@ericvsmith

Owner Author

@ericvsmith ericvsmith commented Jun 22, 2018

You can use code like @add_slots from https://github.com/ericvsmith/dataclasses/blob/master/dataclass_tools.py

>>> from dataclasses import *
>>> from dataclass_tools import *
>>> @add_slots
... @dataclass
... class C:
...    i: int = 10
...
>>> c=C()
>>> c
C(i=10)
>>> c.__slots__
('i',)
>>> c.j=0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'C' object has no attribute 'j'
>>>

The reason this isn't in dataclasses itself is because all other features just involve adding methods to your class. __slots__ requires creating a new class, because @dataclass doesn't get control until after the class has been created, at which point it is too late to set __slots__.
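
A quick way to see why it is too late: assigning __slots__ after the class statement merely adds an ordinary class attribute, because the instance layout (including the per-instance __dict__) was fixed when type() created the class. A plain-Python sketch (the Point class is hypothetical):

```python
class Point:
    x: int
    y: int


# Too late: this only stores an ordinary class attribute. The instance
# layout, including the per-instance __dict__, was fixed by type() when
# the class statement finished.
Point.__slots__ = ("x", "y")

p = Point()
p.z = 3  # still accepted, because instances still have a __dict__
print(p.__dict__)
print("x" in vars(Point))  # no slot descriptor was created for x
```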

There are a few possible approaches here:

  • As outlined above, we could redesign how slots work to make it possible to add __slots__ after class creation. However, this would be difficult or maybe impossible.
  • Add a slots=True to @dataclass, which would create and return a new class. attrs takes this approach. We'd want to make sure the implications of this are understood by the users.
  • Add @add_slots to dataclasses.py.
  • Something else I haven't thought of.

I suggest taking this to python-ideas if you'd like to champion one of these ideas.

@YoSTEALTH

@YoSTEALTH YoSTEALTH commented May 9, 2019

I like the idea of adding @dataclass(slots=True).

@VideoPlayerCode

@VideoPlayerCode VideoPlayerCode commented Sep 27, 2019

It's very wasteful to have a struct-like data holder class, which relies on a bloated dynamic dictionary for storage. The slots behavior should be the only behavior and dict should be banished from dataclasses. Seriously.

But okay, if we manually add __slots__ to our classes (and do not use default values), will the resulting dataclass still work properly? Or will there be internal dataclass bugs caused by lacking a dict?
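
Within the limits described above (no defaults, names listed by hand), the core features appear to work: dataclass treats a class attribute that is a slot descriptor as "no default" rather than as a default value, and the generated methods and helpers such as dataclasses.asdict() and dataclasses.replace() go through attribute access rather than __dict__. A quick stdlib-only check (the Pixel class is hypothetical):

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Pixel:
    __slots__ = ("x", "y")  # listed by hand, so no default values
    x: int
    y: int


p = Pixel(1, 2)
print(p, p == Pixel(1, 2))          # generated __repr__ and __eq__
print(dataclasses.asdict(p))        # reads each field via getattr
print(dataclasses.replace(p, y=5))  # builds a new instance via __init__
print(hasattr(p, "__dict__"))       # no per-instance dict
```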

@VideoPlayerCode

@VideoPlayerCode VideoPlayerCode commented Sep 27, 2019

I just saw https://www.youtube.com/watch?v=T-TwcmT6Rcw on YouTube and it ends with saying that yes you can manually add slots to dataclasses.

But I have decided to use attrs instead. This comment from YouTube sums it up well:

For a company that does not allow external packages (due to code safety reasons), use dataclasses. For everyone else, always use the attrs package. It is much better. Dataclasses is a subset of attrs. So with attrs you can do everything and more. Attrs allows auto-generating "slots" to optimize memory usage, and allows adding validators if you want, etc.

To illustrate the need, here's an example for a class with 3 attributes, on 64-bit Python 3.7.4:

  • An @dataclass or @attr.s class creates a regular dictionary to hold all instance values. The class instances generated are exactly the same size regardless of whether the attrs or dataclass library is used: 424 bytes. And every new field you add bloats each class instance by +88 bytes.

  • @attr.s(slots=True) creates a "slots" class to hold all instance values. The class instances only use 160 bytes in memory. And every new field you add increases the instance size by +40 bytes.

So forget about dataclasses. Use the attrs library with slots. It offers more features, less memory, and more speed (since slots are faster than dictionaries). What's not to love?! ;)

I agree. Sure, you can add slots manually to dataclasses, but then you lose default values, and you have to manually write each variable name in the slots list. Ew. And the dataclass instance with manually written slots was only 8 bytes smaller than the equivalent attrs instance, which can be explained by attrs metadata variables or something like that, and isn't much extra RAM to pay for all the huge benefits of attrs.

@VideoPlayerCode

@VideoPlayerCode VideoPlayerCode commented Sep 27, 2019

import attr
from dataclasses import dataclass
from pympler import asizeof
import time

# every additional field adds 88 bytes
@attr.s
class A:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 40 bytes
@attr.s(slots=True)
class B:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 88 bytes
@dataclass
class C:
    a: int = 0
    b: int = 4
    c: int = 2
    d: int = 8

# every additional field adds 40 bytes
@dataclass
class D:
    __slots__ = {"a", "b", "c", "d"}
    a: int
    b: int
    c: int
    d: int

Ainst = A()
Binst = B()
Cinst = C()
Dinst = D(0,4,2,8)

print("attrs size", asizeof.asizeof(Ainst)) # 512 bytes

print("attrs-with-slots size", asizeof.asizeof(Binst)) # 200 bytes

print("dataclass size", asizeof.asizeof(Cinst)) # 512 bytes

print("dataclass-with-slots size", asizeof.asizeof(Dinst)) # 192 bytes

s = time.perf_counter()
for i in range(0,250000000):
    x = Ainst.a
elapsed = time.perf_counter() - s
print("elapsed attrs:", (elapsed*1000), "milliseconds")

s = time.perf_counter()
for i in range(0,250000000):
    x = Binst.a
elapsed = time.perf_counter() - s
print("elapsed attrs-with-slots:", (elapsed*1000), "milliseconds")

s = time.perf_counter()
for i in range(0,250000000):
    x = Cinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass:", (elapsed*1000), "milliseconds")

s = time.perf_counter()
for i in range(0,250000000):
    x = Dinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass-with-slots:", (elapsed*1000), "milliseconds")

Results: Slots win heavily on memory usage, regardless of whether you use dataclass or attrs. The dataclass with manually written slots uses 8 bytes less in total than attrs-with-slots (a constant that does not change with the number of fields). But dataclass loses on features, on the lack of default values when slots are used, and on the tedious way slots must be written by hand (see class "D").

attrs size 512
attrs-with-slots size 200
dataclass size 512
dataclass-with-slots size 192

As for data access benchmarks: the results varied too much between runs to draw any conclusion, except that slots were slightly faster than dictionary-based storage, and that there's no real difference in access speed between the dataclass and attrs libraries.
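
One way to reduce that run-to-run variance is to let the standard library's timeit module repeat the measurement and keep the best run, as the timeit documentation suggests. A stdlib-only sketch (no attrs or pympler; the class and function names are hypothetical); absolute timings will differ by machine and Python version:

```python
import timeit
from dataclasses import dataclass


@dataclass
class WithDict:
    a: int = 0
    b: int = 4


@dataclass
class WithSlots:
    __slots__ = ("a", "b")  # hand-written slots, so no defaults
    a: int
    b: int


def best_access_time(obj, number=1_000_000, repeat=5):
    """Best-of-`repeat` time in seconds for a single `obj.a` lookup."""
    timer = timeit.Timer("obj.a", globals={"obj": obj})
    # Keep the fastest run: slower runs mostly reflect interference from
    # other processes rather than the cost of the lookup itself.
    return min(timer.repeat(repeat=repeat, number=number)) / number


for label, obj in [("dict", WithDict()), ("slots", WithSlots(0, 4))]:
    print(f"{label}: {best_access_time(obj) * 1e9:.1f} ns per access")
```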

@ericvsmith

Owner Author

@ericvsmith ericvsmith commented Sep 27, 2019

I suggest you raise this issue on the python-ideas mailing list. This tracker is just for the backport of dataclasses features to Python 3.6 (which admittedly I'm behind on, but I'll get to it).

When this issue is on python-ideas, I'll post my thoughts there.

@VideoPlayerCode

@VideoPlayerCode VideoPlayerCode commented Sep 27, 2019

@ericvsmith Ah I didn't realize that. I'll post on the mailing list.

@ciupicri

@ciupicri ciupicri commented Sep 29, 2019

For what it's worth, there is also typing.NamedTuple which uses slots and you can also give a field a default value. Though fields with a default value must come after any fields without a default. Example:

from typing import NamedTuple

class Employee(NamedTuple):
    name: str
    id: int = 3
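
A short check of those points, stdlib only (the Bad class is a hypothetical counter-example): the default is applied, the class sets an empty __slots__ so instances carry no __dict__, and a non-default field after a default one is rejected when the class is defined. The trade-off is that instances are tuples, so fields cannot be reassigned.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    id: int = 3


e = Employee("Guido")
print(e)                                           # default filled in
print(Employee.__slots__, hasattr(e, "__dict__"))  # empty slots, no dict

try:
    e.id = 4  # tuple fields are read-only
except AttributeError:
    print("fields are immutable")

try:
    class Bad(NamedTuple):
        id: int = 3
        name: str  # non-default field after a default one
    rejected = False
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```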
9 participants