pyc file with a large tuple is ~5x bigger in 3.11+ than in 3.9 #109036
Comments
@carljm for generating the list of pyc sizes for versions I didn't have installed. |
Looks like one factor here is a compiler change from 3.9 to 3.10 that limits stack size used and thus switches away from a giant |
It does look like |
I was able to mitigate this issue at Meta by making the generated tuple sparse in the .py and at import time expanding it with the |
Summary: In python 3.10 large structs are very expensive in bytecode compilation resulting in overly large pyc files and lots of memory regression. see: python/cpython#109036

They are large because they are expanded to contain skipped thrift id's

To solve this we move to a sparse form of thrift_spec in the generated py source and expand at import time. This has shown to provide the following savings in skynet.

```
80% reduction in RSS for 3.10 and some win for 3.8
Generated Code is much smaller also

Before fix
py3.8 RSS = 161672, Generated Code + pyc size 162M
py3.10 RSS = 681720, Generated Code + pyc size 461M

After Fix
py3.8 RSS = 146700, Generated Code + pyc size 90M
py3.10 RSS = 135260, Generated Code + pyc size 80M
```

Reviewed By: junzh0u
Differential Revision: D49041314
fbshipit-source-id: 2e3d84650a4b463a9431eea3cada12e6e0cb7037
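For illustration, here is a minimal sketch of the sparse-then-expand idea described in the commit message above. The helper name `expand_sparse_spec`, the `(field_id, spec)` pair layout, and the sample field specs are all hypothetical; this is not the actual generated Thrift code.

```python
def expand_sparse_spec(sparse_spec):
    """Expand ((field_id, spec), ...) pairs into a dense tuple indexed by field id.

    Hypothetical helper: positions with no field get None, matching the
    dense layout described in the bug report.
    """
    if not sparse_spec:
        return ()
    size = max(field_id for field_id, _ in sparse_spec) + 1
    dense = [None] * size
    for field_id, spec in sparse_spec:
        dense[field_id] = spec
    return tuple(dense)


# Only the fields that exist are written into the generated source
# (made-up example specs), so the source and pyc stay small even when
# field ids jump from 5 to 9000.
SPARSE_SPEC = (
    (1, (1, "I32", "id", None)),
    (5, (5, "STRING", "name", None)),
    (9000, (9000, "BOOL", "active", None)),
)

# Expanded once at import time into the dense form the accelerator expects.
thrift_spec = expand_sparse_spec(SPARSE_SPEC)
```

The dense tuple still exists in memory after import, but the thousands of `None` placeholders no longer have to be encoded in the bytecode.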
Since the massive regression in 3.10 is already fixed, and the remaining regression here in 3.11+ looks to be explained by precise error locations (which were accepted knowing they would increase pyc file size), and there's no specific proposal here to further reduce pyc file sizes while maintaining functionality, I'm closing this issue.
Bug description:
The code to reproduce the regression is large; see https://gist.github.com/fried/75fc3e3477634927444693329444648c for the best example I could come up with that I could share.
Basically, we have an old code generator for Thrift that generates dataclass-like classes for use in serialization and RPC. There is a C-level accelerator that helps serialize and deserialize the structures, so the generator also emits a runtime-available form of the RPC/field specs that we can pass into the accelerator. That form is an array in which the array position is the Thrift field id. If a field id is skipped in the Thrift specification, a None goes in the empty position. This works well, but there are edge cases where structs skip a large number of field ids, say going from field id 5 to field id 9000. Do this enough times in a Thrift file and you end up with many large tuples. To make a version I could share, I just have one tuple with 20k entries.
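To make the dense layout concrete, here is a rough sketch that generates source for a tuple like the one described above: one entry per field id, with `None` in every skipped position. The generator function, field names, and spec shape are invented for illustration and are much simpler than what the real Thrift generator (or the linked gist) produces.

```python
def make_spec_source(used_ids, size):
    """Return Python source defining a dense `thrift_spec` tuple literal.

    Hypothetical generator: each used field id gets a small made-up spec,
    every other position is a None placeholder.
    """
    used = set(used_ids)
    lines = ["thrift_spec = ("]
    for field_id in range(size):
        if field_id in used:
            lines.append(f"    ({field_id}, 'I32', 'field_{field_id}', None),")
        else:
            lines.append("    None,")
    lines.append(")")
    return "\n".join(lines) + "\n"


# Three real fields, but 20k tuple entries because of the skipped ids.
source = make_spec_source(used_ids=[1, 5, 9000], size=20_000)

namespace = {}
exec(compile(source, "<generated>", "exec"), namespace)
thrift_spec = namespace["thrift_spec"]

# The accelerator can then look up a field spec directly by field id.
print(len(thrift_spec), thrift_spec[9000])
```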
Looking at the pyc sizes you can see there is a serious issue in 3.10
This translates directly into memory usage for the imported pyc. I believe this is because in 3.10 the bytecode builds each tuple as a list and then converts it to a tuple at import time with LIST_TO_TUPLE, so memory for both the list and the tuple has to exist at the same time. We see around a 2.8x memory regression in real-world examples.
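One way to check this on a given interpreter is sketched below, using a simplified 20k-entry tuple of `None` rather than the real generated code. It compiles the source, measures the marshalled code object (the part of a .pyc that follows the header), and reports whether `LIST_TO_TUPLE` shows up in the module-level bytecode. The function name is made up, and the numbers will differ between versions, so no expected output is shown.

```python
import dis
import marshal
import sys


def compiled_size_and_opcodes(source):
    """Compile `source`; return (marshalled size in bytes, set of opcode names)."""
    code = compile(source, "<generated>", "exec")
    payload = marshal.dumps(code)  # body of a .pyc, without the header
    opnames = {ins.opname for ins in dis.get_instructions(code)}
    return len(payload), opnames


# Simplified stand-in for the generated spec: 20k None entries.
source = "thrift_spec = (" + "None, " * 20_000 + ")\n"
size, opnames = compiled_size_and_opcodes(source)

print(sys.version_info[:2], "marshalled bytes:", size,
      "uses LIST_TO_TUPLE:", "LIST_TO_TUPLE" in opnames)
```

Running the same script under each installed version gives a quick side-by-side comparison of size and of how the tuple is built.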
CPython versions tested on:
3.8, 3.9, 3.10, 3.11, 3.12, CPython main branch
Operating systems tested on:
Linux, macOS