Support __slots__? #28
|
I think we should allow
|
I propose to punt this down the road. If people want slots they can
manually add `__slots__ = ('x', 'y', 'z')` to their class.
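A minimal sketch of that manual workaround, assuming a current CPython (the class name `Point3D` is illustrative). `@dataclass` treats the member descriptors created by `__slots__` as "no default", so the fields simply become required. A class-level default is what breaks, because the default value and the slot would share one class attribute name.

```python
from dataclasses import dataclass


@dataclass
class Point3D:
    # Slots are listed by hand and must match the annotated fields below.
    __slots__ = ("x", "y", "z")
    x: float
    y: float
    z: float


p = Point3D(1.0, 2.0, 3.0)
print(p)                        # Point3D(x=1.0, y=2.0, z=3.0)
print(hasattr(p, "__dict__"))   # False: no per-instance dict

# A class-level default collides with the slot of the same name, so the
# class statement itself fails with ValueError (before @dataclass runs).
try:
    @dataclass
    class Broken:
        __slots__ = ("x",)
        x: int = 0
except ValueError as exc:
    print("ValueError:", exc)
```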
Regarding whether people would be surprised by the need to generate a new
class, *I* was surprised, and I built slots. :-)
In the future we can choose any of the other options. I would be fine with
eventually getting back `slots=True` and only generating a new class if
that's given. (FWIW it should probably complain if any base class has a
`__dict__` -- that's a common error case.)
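The base-class check suggested above is easy to sketch. `bases_with_dict` is a hypothetical helper, not part of `dataclasses`: it lists base classes that already give instances a `__dict__`. If any exist, adding `__slots__` to the subclass saves little, because instances still carry a dict.

```python
def bases_with_dict(cls):
    """Hypothetical helper: base classes whose instances get a __dict__."""
    # A class that introduces a per-instance dict stores a '__dict__'
    # descriptor in its own namespace; subclasses inherit it silently.
    return [
        base
        for base in cls.__mro__[1:]
        if base is not object and "__dict__" in vars(base)
    ]


class Plain:                 # ordinary class: instances have a __dict__
    pass


class Slotted:               # slotted class: no __dict__
    __slots__ = ("a",)


class Sub(Plain):
    __slots__ = ("x",)       # these slots do not remove the inherited dict


class SubOfSlotted(Slotted):
    __slots__ = ("b",)


print([b.__name__ for b in bases_with_dict(Sub)])   # ['Plain']
print(bases_with_dict(SubOfSlotted))                # []

s = Sub()
s.anything = 1               # still allowed: Plain supplies a __dict__
```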
In the meantime people can also use NamedTuple if they just want slots.
|
Issue #28: Remove support for setting slots. Support might be added back at a later date.
|
Agreed. I removed slots in PR #30. The git tag `last-version-with-slots` points to the code where slots was working.
|
@ericvsmith Adding You're likely already aware of this, but I'm letting you know on the small chance it got missed. (My interest in this is making dataclasses work with my "autoslot" toy class which injects slots into the class definition via a metaclass-enabled superclass: https://github.com/cjrh/autoslot. To make it compatible with
|
Thinking it over, I think my use-case is different to what dataclasses are for, and so compatibility probably doesn't make sense anyway.
|
I totally think slots should be the default behavior. (Disclaimer: I gave the PyCon 2017 slots talk, https://www.youtube.com/watch?v=N7MfisN44nY, and I had the latest contribution to the datamodel docs on To break it down: slots add a data descriptor to the class that points to a slot in a struct-like data structure. They are fast to access, and they take much less space than even the new, smaller dict (roughly the space of a tuple). It should be easy to determine programmatically whether they should be added in the child or not. This should be a strictly dominant addition. But adding it later could break backwards compatibility if users start making the unfortunate decision to assume access to Here are some finer points relevant to dataclasses, as I see it:
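The mechanism described above can be observed directly. This is a small sketch with illustrative class names. Exact byte counts vary between Python versions, so none are claimed here. Each slot shows up on the class as a member descriptor, and a slotted instance has no per-instance dict to pay for.

```python
import sys
import types
from dataclasses import dataclass


@dataclass
class WithDict:
    x: int
    y: int


@dataclass
class WithSlots:
    __slots__ = ("x", "y")
    x: int
    y: int


# Each slot is a data descriptor (a member descriptor) stored on the class.
slot_x = WithSlots.__dict__["x"]
print(type(slot_x).__name__)                            # member_descriptor
print(isinstance(slot_x, types.MemberDescriptorType))   # True

a, b = WithDict(1, 2), WithSlots(1, 2)
print(hasattr(a, "__dict__"), hasattr(b, "__dict__"))   # True False

# sys.getsizeof does not count the instance dict, so add it in by hand.
print("dict-based:", sys.getsizeof(a) + sys.getsizeof(a.__dict__))
print("slotted:   ", sys.getsizeof(b))
```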
|
|
Without slots, the usability of data classes is really limited. When I want to use something like this, it is almost always in a situation where I have many instances of the same simple data points. Without
|
You use code like The reason this isn't in There are a few possible approaches here:
I suggest taking this to python-ideas if you'd like to champion one of these ideas.
|
I like the idea of adding
|
It's very wasteful to have a struct-like data holder class that relies on a bloated dynamic dictionary for storage. The slots behavior should be the only behavior, and dict should be banished from dataclasses. Seriously. But okay, if we manually add
|
I just saw https://www.youtube.com/watch?v=T-TwcmT6Rcw on YouTube, and it ends by saying that, yes, you can manually add slots to dataclasses. But I have decided to use attrs instead. This comment from YouTube sums it up well:

> I agree. Sure, you can add slots manually to dataclasses, but then you lose default values, and you have to manually write each variable name in the slots list. Ew. And the dataclass instance with manually written slots was only 8 bytes smaller than the equivalent attrs instance, which can be explained by attrs metadata variables or something like that, and isn't much extra RAM to pay for all the huge benefits of attrs.
```python
import attr
from dataclasses import dataclass
from pympler import asizeof
import time

# every additional field adds 88 bytes
@attr.s
class A:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 40 bytes
@attr.s(slots=True)
class B:
    a = attr.ib(type=int, default=0)
    b = attr.ib(type=int, default=4)
    c = attr.ib(type=int, default=2)
    d = attr.ib(type=int, default=8)

# every additional field adds 88 bytes
@dataclass
class C:
    a: int = 0
    b: int = 4
    c: int = 2
    d: int = 8

# every additional field adds 40 bytes
@dataclass
class D:
    __slots__ = {"a", "b", "c", "d"}
    a: int
    b: int
    c: int
    d: int

Ainst = A()
Binst = B()
Cinst = C()
Dinst = D(0, 4, 2, 8)

print("attrs size", asizeof.asizeof(Ainst))  # 512 bytes
print("attrs-with-slots size", asizeof.asizeof(Binst))  # 200 bytes
print("dataclass size", asizeof.asizeof(Cinst))  # 512 bytes
print("dataclass-with-slots size", asizeof.asizeof(Dinst))  # 192 bytes

s = time.perf_counter()
for i in range(0, 250000000):
    x = Ainst.a
elapsed = time.perf_counter() - s
print("elapsed attrs:", (elapsed * 1000), "milliseconds")

s = time.perf_counter()
for i in range(0, 250000000):
    x = Binst.a
elapsed = time.perf_counter() - s
print("elapsed attrs-with-slots:", (elapsed * 1000), "milliseconds")

s = time.perf_counter()
for i in range(0, 250000000):
    x = Cinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass:", (elapsed * 1000), "milliseconds")

s = time.perf_counter()
for i in range(0, 250000000):
    x = Dinst.a
elapsed = time.perf_counter() - s
print("elapsed dataclass-with-slots:", (elapsed * 1000), "milliseconds")
```

Results: Slots win heavily on memory usage, regardless of whether you use dataclass or attrs. A dataclass with manually written slots reduces total usage by 8 bytes compared to attrs-with-slots (a fixed amount that does not change with the number of fields). But dataclass loses out with its lack of features, its lack of default values when slots are used, and its tedious way of writing slots manually (see class "D"). As for the data access benchmarks: the results varied too much between runs to draw any conclusions, except that slots were slightly faster than dictionary-based storage and that there is no real difference in access speed between the dataclass and attrs libraries.
|
I suggest you raise this issue on the python-ideas mailing list. This tracker is just for the backport of dataclasses features to Python 3.6 (which admittedly I'm behind on, but I'll get to it). When this issue is on python-ideas, I'll post my thoughts there.
|
@ericvsmith Ah, I didn't realize that. I'll post on the mailing list.
|
For what it's worth, there is also `typing.NamedTuple`:

```python
from typing import NamedTuple

class Employee(NamedTuple):
    name: str
    id: int = 3
```
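`NamedTuple` gets its compactness differently: instances are tuples, and the generated class sets `__slots__ = ()`, so there is no per-instance dict and defaults still work. The trade-off is immutability. A quick check, with the `Employee` example repeated so the snippet runs on its own (the value `"Alice"` is illustrative):

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    id: int = 3


e = Employee("Alice")
print(e)                        # Employee(name='Alice', id=3)
print(Employee.__slots__)       # ()
print(hasattr(e, "__dict__"))   # False

# Instances are immutable tuples: assigning to a field raises AttributeError.
try:
    e.id = 4
except AttributeError:
    print("fields are read-only")
```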
Currently the draft PEP specifies, and the code supports, the optional ability to add `__slots__`. This is the one place where `@dataclass` cannot just modify the given class and return it: because `__slots__` must be specified at class creation time, it's too late by the time the `dataclass` decorator gets control. The current approach is to dynamically generate a new class while setting `__slots__` in the new class and copying over other class attributes. The decorator then returns the new class.

The question is: do we even want to support setting `__slots__`? Is having `__slots__` important enough to have this deviation from the "we just add a few dunder methods to your class" behavior?

I see three options:

1. Keep `@dataclass(slots=True)` returning a new class.
2. Don't support `__slots__`.
3. A separate decorator, `@add_slots`, which takes a data class and creates a new class with `__slots__` set.

I think we should either go with 2 or 3. I don't mind not supporting `__slots__`, but if we do want to support it, I think it's easier to explain with a separate decorator.

It would be an error to use `@add_slots` on a non-dataclass class.
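A minimal sketch of what option 3 could look like. `add_slots` here is hypothetical, written for illustration and following the approach described above: copy the class namespace, set `__slots__`, and build a new class. It ignores edge cases such as inherited slots, and methods that use zero-argument `super()` would still refer to the original class.

```python
import dataclasses
from dataclasses import dataclass


def add_slots(cls):
    """Hypothetical decorator: return a copy of dataclass `cls` with __slots__."""
    # Per option 3, using this on a non-dataclass class is an error.
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls.__name__} is not a dataclass")
    if "__slots__" in cls.__dict__:
        raise TypeError(f"{cls.__name__} already specifies __slots__")

    # __slots__ only takes effect at class creation time, so copy the
    # namespace and build a brand-new class from it.
    cls_dict = dict(cls.__dict__)
    field_names = tuple(f.name for f in dataclasses.fields(cls))
    cls_dict["__slots__"] = field_names
    for name in field_names:
        # Class-level defaults would clash with the slot descriptors; the
        # generated __init__ already holds the default values.
        cls_dict.pop(name, None)
    # Drop descriptors tied to the old class's instance dict and weakrefs.
    cls_dict.pop("__dict__", None)
    cls_dict.pop("__weakref__", None)

    new_cls = type(cls)(cls.__name__, cls.__bases__, cls_dict)
    new_cls.__qualname__ = cls.__qualname__
    return new_cls


@add_slots
@dataclass
class Point:
    x: int
    y: int = 0


p = Point(1)
print(p)                        # Point(x=1, y=0)
print(hasattr(p, "__dict__"))   # False
```

Keeping this as a separate decorator means `@dataclass` itself never has to return a different class object, which is the deviation the issue is worried about.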