[Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102578

XuehaiPan · 2023-03-10T13:16:47Z

Feature or enhancement

The dataclasses library provides an easy way to create classes. The library will automatically generate relevant methods for the users.

Creating dataclasses with argument frozen=True will automatically generate methods __setattr__ and __delattr__ in _frozen_get_del_attr.

This issue proposes to change the tuple-based lookup to set-based lookup. Reduce the time complexity from $O(n)$ to $O(1)$.

In [1]: # tuple-based

In [2]: %timeit 'a' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
9.91 ns ± 0.0982 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In [3]: %timeit 'd' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
33.2 ns ± 0.701 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [4]: %timeit 'g' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
56.4 ns ± 0.818 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: # set-based

In [6]: %timeit 'a' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.3 ns ± 0.0723 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In [7]: %timeit 'd' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11 ns ± 0.106 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In [8]: %timeit 'g' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.1 ns ± 0.126 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

A tiny benchmark script:

from contextlib import suppress
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Foo2:
    a: int
    b: int

foo2 = Foo2(1, 2)

def bench2(inst):
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0

@dataclass(frozen=True)
class Foo7:
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int

foo7 = Foo7(1, 2, 3, 4, 5, 6, 7)

def bench7(inst):
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0
    with suppress(FrozenInstanceError):
        inst.c = 0
    with suppress(FrozenInstanceError):
        inst.d = 0
    with suppress(FrozenInstanceError):
        inst.e = 0
    with suppress(FrozenInstanceError):
        inst.f = 0
    with suppress(FrozenInstanceError):
        inst.g = 0

class Bar(Foo7):
    def __init__(self, a, b, c, d, e, f, g):
        super().__init__(a, b, c, d, e, f, g)
        self.baz = 0

def bench(inst):
    inst.baz = 1

Result:

set-based lookup:

In [2]: %timeit bench2(foo2)
1.08 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.81 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit bench(bar)
249 ns ± 6.31 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

tuple-based lookup (original):

In [2]: %timeit bench2(foo2)
1.15 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.97 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit bench(bar)
269 ns ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


Result:

`set`-based lookup:

```python
In [2]: %timeit bench2(foo2)
1.08 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.81 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

tuple-based lookup (original):

In [2]: %timeit bench2(foo2)
1.15 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.97 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The set-based is constantly faster than the old approach. And the theoretical time complexity is also smaller ($O(1)$ vs. $O(n)$).

Ref: #102573

Pitch

(Explain why this feature or enhancement should be implemented and how it would be used.
Add examples, if applicable.)

In the autogenerate __setattr__ and __delattr__, they have a sanity check at the beginning of the method. For example:

def __setattr__(self, name, value):
    if type(self) is {{UserType}} or name in ({{a tuple of field names}}):
        raise FrozenInstanceError(f"cannot assign to field {name!r}")
    super(cls, self).__setattr__(name, value)

If someone inherits the frozen dataclass, the sanity check will take $O(n)$ time on the tuple__contains__(...) and finally calls super().__setattr__(...). For example:

@dataclass(frozen=True)
class FrozenBase:
    x: int
    y: int
    ... # N_FIELDS

class Foo(FrozenBase):
    def __init__(self, x, y, somevalue, someothervalue):
        super().__init__(x, y)
        self.somevalue = somevalue            # takes O(N_FIELDS)
        self.someothervalue = someothervalue  # takes O(N_FIELDS) time again

foo = Foo(1, 2, 3, 4)
foo.extravalue = extravalue  # takes O(N_FIELDS) time again

Previous discussion

N/A.

Linked PRs

gh-102578: Optimise setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102573

The text was updated successfully, but these errors were encountered:

AlexWaygood · 2023-03-10T14:42:09Z

Here is a benchmark that seems more reflective of a real-world use case, that shows a clear speedup from this proposed change:

import dataclasses
import string
import time

Foo = dataclasses.make_dataclass(
    "Foo",
    [(letter, int) for letter in string.ascii_lowercase],
    frozen=True
)

class Bar(Foo): ...

instance = Bar(*range(26))
t0 = time.perf_counter()
for _ in range(10_000_000):
    instance.foo = 1
    del instance.foo
print(f"{time.perf_counter() - t0:.2f}")

ericvsmith · 2023-03-10T15:37:31Z

The idea seems reasonable. I'll look at the PR.

…taclass subclasses of frozen dataclasses (gh-102573)

ericvsmith · 2023-03-11T00:21:59Z

Thanks, @XuehaiPan !

…non-dataclass subclasses of frozen dataclasses (pythongh-102573)

XuehaiPan added the type-feature A feature request or enhancement label Mar 10, 2023

bedevere-bot mentioned this issue Mar 10, 2023

gh-102578: Optimise setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102573

Merged

AlexWaygood added stdlib Python modules in the Lib dir performance Performance or resource usage labels Mar 10, 2023

AlexWaygood changed the title ~~[Enhancement] Speed up sanity check for __setattr__ and __delattr__ for frozen dataclasses~~ [Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses Mar 10, 2023

AlexWaygood added the 3.12 bugs and security fixes label Mar 10, 2023

ericvsmith self-assigned this Mar 10, 2023

ericvsmith closed this as completed in #102573 Mar 11, 2023

ericvsmith pushed a commit that referenced this issue Mar 11, 2023

gh-102578: Optimise setting and deleting mutable attributes on non-da…

ee6f841

…taclass subclasses of frozen dataclasses (gh-102573)

iritkatriel pushed a commit to iritkatriel/cpython that referenced this issue Mar 12, 2023

pythongh-102578: Optimise setting and deleting mutable attributes on …

b85d003

…non-dataclass subclasses of frozen dataclasses (pythongh-102573)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102578

[Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102578

XuehaiPan commented Mar 10, 2023 •

edited

AlexWaygood commented Mar 10, 2023

ericvsmith commented Mar 10, 2023

ericvsmith commented Mar 11, 2023

[Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102578

[Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102578

Comments

XuehaiPan commented Mar 10, 2023 • edited

Feature or enhancement

Pitch

Previous discussion

Linked PRs

AlexWaygood commented Mar 10, 2023

ericvsmith commented Mar 10, 2023

ericvsmith commented Mar 11, 2023

XuehaiPan commented Mar 10, 2023 •

edited