Reuse code objects for similar dataclass definitions #100930

Closed
brandtbucher opened this issue Jan 11, 2023 · 5 comments
Labels
3.12 (bugs and security fixes) · performance (Performance or resource usage) · stdlib (Python modules in the Lib dir)

Comments

@brandtbucher (Member) commented Jan 11, 2023

A little over a year ago, @dabeaz came up with a cool way of speeding up dataclass creation by avoiding unnecessary exec calls. Essentially, his proof-of-concept dataklasses module caches the code objects for the methods of "similarly shaped" dataclasses and patches them with the correct names:

https://github.com/dabeaz/dataklasses

I have a working prototype of a similar idea for the stdlib dataclasses module in #92650. It roughly doubles the speed of defining a dataclass.

CC @ericvsmith

@brandtbucher added the performance, stdlib, and 3.12 labels Jan 11, 2023
@brandtbucher self-assigned this Jan 11, 2023
@ssweber commented Jul 17, 2023

I’ve been looking at dataclasses and this interests me. What would need to happen to take this over the finish line?

@ericvsmith (Member)

For starters, an analysis of memory usage and cache hit ratio for common dataclass usage patterns.
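One way to start on the cache hit ratio is to group existing dataclasses by a candidate cache key and count how many definitions could reuse an earlier one. The sketch below assumes a hypothetical key made of each field's init and has-default flags (ignoring names, which get patched in); the key used in #92650 may differ and would likely also include decorator options such as frozen or eq. Memory usage is not measured here.

```python
from dataclasses import MISSING, dataclass, fields


def shape_key(cls):
    """Hypothetical cache key: (init, has_default) per field, ignoring names."""
    return tuple(
        (f.init, f.default is not MISSING or f.default_factory is not MISSING)
        for f in fields(cls)
    )


def cache_hit_ratio(classes):
    """Fraction of definitions that could reuse an already compiled shape."""
    keys = [shape_key(cls) for cls in classes]
    total = len(keys)
    misses = len(set(keys))  # the first class of each shape still needs an exec
    return (total - misses) / total if total else 0.0


@dataclass
class Point:
    x: int
    y: int


@dataclass
class Size:
    width: int
    height: int


@dataclass
class Config:
    name: str
    debug: bool = False


print(cache_hit_ratio([Point, Size, Config]))  # 1 of 3 definitions is a hit
```

Running the same count over the dataclasses in a real code base would give the kind of usage-pattern data asked for above.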

@ssweber commented Jul 17, 2023

Thanks! I’m assuming this would take some legwork, as pyperformance lacks a dataclass benchmark (if that issue is still relevant)?

I admit to not being super knowledgeable on the subject, but I like digging into topics. I’ll report back when I have more to contribute.
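Until a proper pyperformance benchmark exists, a timeit micro-benchmark gives a rough number for the per-definition cost. This is a hypothetical sketch, not the methodology used to measure #92650; the class shape and iteration count are arbitrary choices.

```python
import timeit
from dataclasses import dataclass


def define_dataclass():
    # Each call runs a class statement plus the @dataclass decorator,
    # so the timing includes generating and compiling the methods.
    @dataclass
    class Point:
        x: int
        y: int
        z: float = 0.0

    return Point


N = 500
seconds = timeit.timeit(define_dataclass, number=N)
print(f"{seconds / N * 1e6:.1f} microseconds per dataclass definition")
```

Comparing this number before and after a change, and across several class shapes, would say more than a single run.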

@brandtbucher (Member, Author)

Closing in favor of #109870, which reflects the direction dataclasses currently seems to be heading.

@ssweber commented Nov 25, 2023

From my understanding, this could be complementary to the combined exec rework.

The combined exec speeds up initial creation; this could further speed up the creation of subsequent, similarly shaped dataclasses.

Maybe once that has landed, we could rework this if there is interest.
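For readers unfamiliar with the combined exec approach, the toy sketch below shows the general idea: generating the source for several methods and compiling them with a single exec call instead of one call per method. `build_methods` is a hypothetical helper; this is not the code from #109870. Caching along the lines of this issue could sit on top of it by reusing the compiled result for classes of the same shape.

```python
def build_methods(field_names):
    """Generate __init__ and __repr__ from one exec call (toy illustration)."""
    params = ", ".join(["self", *field_names])
    init_body = "".join(f"    self.{n} = {n}\n" for n in field_names) or "    pass\n"
    # e.g. "x={self.x!r}, y={self.y!r}" for field_names == ["x", "y"]
    repr_fields = ", ".join(f"{n}={{self.{n}!r}}" for n in field_names)
    source = (
        f"def __init__({params}):\n{init_body}"
        "def __repr__(self):\n"
        f"    return f'{{type(self).__name__}}({repr_fields})'\n"
    )
    namespace = {}
    exec(source, {}, namespace)  # a single exec defines both methods
    return namespace


class Point:
    pass


for name, func in build_methods(["x", "y"]).items():
    setattr(Point, name, func)

print(Point(1, 2))  # Point(x=1, y=2)
```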
