Code objects, function objects and generator objects contain quite a lot of redundant information #100719

Closed
markshannon opened this issue Jan 3, 2023 · 3 comments
Labels
performance Performance or resource usage

Comments

@markshannon
Member

markshannon commented Jan 3, 2023

There is quite a lot of redundancy in code, function and generator objects.

Intra-object redundancy

  1. Code objects have four fields: co_nlocalsplus, co_nplaincellvars, co_nlocals and co_nfreevars. Any one of these fields can be computed from the other three.
  2. Code objects have a qualified name and a name. The name is always a suffix of the qualified name. Changing this to a qualifying prefix and a name would save space and allow sharing.
  3. The defaults and keyword defaults for a function are stored separately, and the keyword defaults are a dict. They could be combined into a single array.
  4. Generator objects have a gi_code field, which is redundant, as the frame contains a reference to the code object.
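As a rough illustration of the four intra-object items, here is a minimal Python sketch. It only reads attributes that are visible from Python (co_nlocals, co_varnames, co_cellvars, co_freevars, __defaults__, __kwdefaults__, gi_frame). co_nlocalsplus and co_nplaincellvars are C-level fields, so they are reconstructed here, assuming that "plain" cell variables are the cell variables that are not also parameters or locals. The helpers local_counts, split_qualname, combined_defaults and gi_code_duplicates_frame, the MISSING sentinel and the combined-defaults layout are all hypothetical and not part of CPython.

```python
# Hypothetical sentinel marking a parameter that has no default.
MISSING = object()


def local_counts(code):
    """Item 1: derive the four counts from attributes visible to Python.

    co_nlocalsplus and co_nplaincellvars are not exposed to Python,
    so they are reconstructed here (assumption: plain cells are cell
    variables that are not also parameters/locals).
    """
    nlocals = code.co_nlocals
    nplaincellvars = sum(1 for n in code.co_cellvars if n not in code.co_varnames)
    nfreevars = len(code.co_freevars)
    # Since nlocalsplus is the sum of the other three, any one of the four
    # values can be recovered from the rest by addition or subtraction.
    nlocalsplus = nlocals + nplaincellvars + nfreevars
    return nlocalsplus, nplaincellvars, nlocals, nfreevars


def split_qualname(qualname):
    """Item 2: hypothetical (prefix, name) form; the name is the last component."""
    prefix, dot, name = qualname.rpartition(".")
    return prefix + dot, name


def combined_defaults(func):
    """Item 3: hypothetical single list of defaults in parameter order."""
    code = func.__code__
    nargs, nkwonly = code.co_argcount, code.co_kwonlyargcount
    positional = func.__defaults__ or ()
    kwdefaults = func.__kwdefaults__ or {}
    # Positional defaults belong to the last len(positional) positional parameters.
    combined = [MISSING] * (nargs - len(positional)) + list(positional)
    # Keyword-only parameter names follow the positional ones in co_varnames.
    for name in code.co_varnames[nargs:nargs + nkwonly]:
        combined.append(kwdefaults.get(name, MISSING))
    return combined


def gi_code_duplicates_frame(gen):
    """Item 4: while the generator has a frame, gi_code is the frame's code."""
    frame = gen.gi_frame
    return frame is not None and gen.gi_code is frame.f_code


# Example inputs.
def outer(a):
    b = 1
    def inner():
        return a + b
    return inner


def f(a, b=2, *, c, d=4):
    return a, b, c, d


class C:
    def m(self):
        pass


def gen():
    yield 1
```

For example, combined_defaults(f) gives one list with MISSING in the slots for a and c and the defaults 2 and 4 in the slots for b and d, instead of the separate (2,) tuple and {'d': 4} dict that CPython stores today.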

Inter-object redundancy

  1. Functions and generators have qualified name and name fields, which are almost always the same as those of the underlying code object. These should be lazily initialized.
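A minimal Python sketch of the lazy-initialization idea, assuming a hypothetical LazyNames holder that stores nothing until a name is explicitly assigned and otherwise reads co_name (and co_qualname, which exists on CPython 3.11+) from the code object. It only illustrates the idea; it is not how CPython's function or generator objects are implemented.

```python
class LazyNames:
    """Hypothetical holder whose names default to those of its code object."""

    __slots__ = ("code", "_name", "_qualname")

    def __init__(self, code):
        self.code = code
        self._name = None      # None means "use code.co_name"
        self._qualname = None  # None means "use the code object's qualname"

    @property
    def name(self):
        # Fall back to the code object until a name is explicitly assigned.
        return self.code.co_name if self._name is None else self._name

    @name.setter
    def name(self, value):
        self._name = value

    @property
    def qualname(self):
        if self._qualname is not None:
            return self._qualname
        # co_qualname exists on CPython 3.11+; older versions only have co_name.
        return getattr(self.code, "co_qualname", self.code.co_name)

    @qualname.setter
    def qualname(self, value):
        self._qualname = value


def greet():
    return "hi"


obj = LazyNames(greet.__code__)
print(obj.name)      # read from the code object; nothing stored yet
obj.name = "renamed"
print(obj.name)      # the override now takes precedence
```

In current CPython, a plain function's __name__ starts out equal to its __code__.co_name, and assigning to __name__ does not change the code object; that divergence is the rare case the override slot is meant to cover.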

Linked PRs

@stevendaprano
Member

I have no opinion on whether these changes are good or bad, but they are surely breaking changes. Code objects aren't well documented, but they're not private implementation details. How do you plan to manage these changes?

@markshannon
Member Author

The PyCodeObject struct is defined in the Include/cpython folder, so while it is not entirely private, we are allowed to change it.

@terryjreedy
Member

@stevendaprano The Python-level doc for 'internal types', including code objects (https://docs.python.org/3/reference/datamodel.html#codeobject), says:

Their definitions may change with future versions of the interpreter, but they are mentioned here for completeness.

'co_nplaincellvars' is not even mentioned in the 3.11 docs.

Generator objects are not listed there, and their internal gi_xxx attributes, as opposed to the __next__, send, and throw methods, are not indexed.
