Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses #102578

Closed
XuehaiPan opened this issue Mar 10, 2023 · 3 comments · Fixed by #102573
Assignees
Labels
3.12 bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement

Comments

@XuehaiPan
Copy link
Contributor

XuehaiPan commented Mar 10, 2023

Feature or enhancement

The dataclasses library provides an easy way to create classes. The library will automatically generate relevant methods for the users.

Creating dataclasses with argument frozen=True will automatically generate methods __setattr__ and __delattr__ in _frozen_get_del_attr.

This issue proposes to change the tuple-based lookup to set-based lookup. Reduce the time complexity from $O(n)$ to $O(1)$.

In [1]: # tuple-based

In [2]: %timeit 'a' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
9.91 ns ± 0.0982 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In [3]: %timeit 'd' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
33.2 ns ± 0.701 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [4]: %timeit 'g' in ('a', 'b', 'c', 'd', 'e', 'f', 'g')
56.4 ns ± 0.818 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [5]: # set-based

In [6]: %timeit 'a' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.3 ns ± 0.0723 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In [7]: %timeit 'd' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11 ns ± 0.106 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

In [8]: %timeit 'g' in {'a', 'b', 'c', 'd', 'e', 'f', 'g'}
11.1 ns ± 0.126 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)

A tiny benchmark script:

from contextlib import suppress
from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class Foo2:
    a: int
    b: int

foo2 = Foo2(1, 2)

def bench2(inst):
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0

@dataclass(frozen=True)
class Foo7:
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int

foo7 = Foo7(1, 2, 3, 4, 5, 6, 7)

def bench7(inst):
    with suppress(FrozenInstanceError):
        inst.a = 0
    with suppress(FrozenInstanceError):
        inst.b = 0
    with suppress(FrozenInstanceError):
        inst.c = 0
    with suppress(FrozenInstanceError):
        inst.d = 0
    with suppress(FrozenInstanceError):
        inst.e = 0
    with suppress(FrozenInstanceError):
        inst.f = 0
    with suppress(FrozenInstanceError):
        inst.g = 0

class Bar(Foo7):
    def __init__(self, a, b, c, d, e, f, g):
        super().__init__(a, b, c, d, e, f, g)
        self.baz = 0

def bench(inst):
    inst.baz = 1

Result:

set-based lookup:

In [2]: %timeit bench2(foo2)
1.08 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.81 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit bench(bar)
249 ns ± 6.31 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

tuple-based lookup (original):

In [2]: %timeit bench2(foo2)
1.15 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.97 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [4]: %timeit bench(bar)
269 ns ± 4.09 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Result:

`set`-based lookup:

```python
In [2]: %timeit bench2(foo2)
1.08 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.81 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

tuple-based lookup (original):

In [2]: %timeit bench2(foo2)
1.15 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [3]: %timeit bench7(foo7)
3.97 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The set-based is constantly faster than the old approach. And the theoretical time complexity is also smaller ($O(1)$ vs. $O(n)$).

Ref: #102573

Pitch

(Explain why this feature or enhancement should be implemented and how it would be used.
Add examples, if applicable.)

In the autogenerate __setattr__ and __delattr__, they have a sanity check at the beginning of the method. For example:

def __setattr__(self, name, value):
    if type(self) is {{UserType}} or name in ({{a tuple of field names}}):
        raise FrozenInstanceError(f"cannot assign to field {name!r}")
    super(cls, self).__setattr__(name, value)

If someone inherits the frozen dataclass, the sanity check will take $O(n)$ time on the tuple__contains__(...) and finally calls super().__setattr__(...). For example:

@dataclass(frozen=True)
class FrozenBase:
    x: int
    y: int
    ... # N_FIELDS

class Foo(FrozenBase):
    def __init__(self, x, y, somevalue, someothervalue):
        super().__init__(x, y)
        self.somevalue = somevalue            # takes O(N_FIELDS)
        self.someothervalue = someothervalue  # takes O(N_FIELDS) time again

foo = Foo(1, 2, 3, 4)
foo.extravalue = extravalue  # takes O(N_FIELDS) time again

Previous discussion

N/A.

Linked PRs

@XuehaiPan XuehaiPan added the type-feature A feature request or enhancement label Mar 10, 2023
@AlexWaygood AlexWaygood added stdlib Python modules in the Lib dir performance Performance or resource usage labels Mar 10, 2023
@AlexWaygood AlexWaygood changed the title [Enhancement] Speed up sanity check for __setattr__ and __delattr__ for frozen dataclasses [Enhancement] Speed up setting and deleting mutable attributes on non-dataclass subclasses of frozen dataclasses Mar 10, 2023
@AlexWaygood
Copy link
Member

Here is a benchmark that seems more reflective of a real-world use case, that shows a clear speedup from this proposed change:

import dataclasses
import string
import time

Foo = dataclasses.make_dataclass(
    "Foo",
    [(letter, int) for letter in string.ascii_lowercase],
    frozen=True
)

class Bar(Foo): ...

instance = Bar(*range(26))
t0 = time.perf_counter()
for _ in range(10_000_000):
    instance.foo = 1
    del instance.foo
print(f"{time.perf_counter() - t0:.2f}")

@AlexWaygood AlexWaygood added the 3.12 bugs and security fixes label Mar 10, 2023
@ericvsmith
Copy link
Member

The idea seems reasonable. I'll look at the PR.

@ericvsmith ericvsmith self-assigned this Mar 10, 2023
ericvsmith pushed a commit that referenced this issue Mar 11, 2023
@ericvsmith
Copy link
Member

Thanks, @XuehaiPan !

iritkatriel pushed a commit to iritkatriel/cpython that referenced this issue Mar 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes performance Performance or resource usage stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
3 participants