Optimize function annotation #86368
Look at this example code:
Four […] Annotation information can be stored in some compact form, and creating the annotation dict can be postponed until […]. Ideas for the compact form:
|
Are annotations now always known at compile time? As for representation, it can also be a sequence of pairs (('x', 'int'), ('z', 'float'), ('return', 'Hoge')) or a pair of sequences (('x', 'z', 'return'), ('int', 'float', 'Hoge')). It would be better to save a dict directly in pyc files, but that requires changing the marshal protocol. Also, it makes sense to make annotations an attribute of the code object, to avoid the overhead at function creation time. I have a dream to split the pyc file into several files or sections and save docstrings and annotations (and maybe line numbers) separately from the main code. They would be loaded on demand, when you read __doc__ or __annotations__. Most code does not use them at run time, so we can save memory and loading time. It can also help with internationalization. |
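For illustration, the two alternative representations mentioned above can both be expanded into the usual dict with very little code. This is only a Python-level sketch; the helper names are hypothetical, and the real change would live in the compiler and in C.

```python
# Hypothetical compact forms, copied from the comment above.
pairs = (('x', 'int'), ('z', 'float'), ('return', 'Hoge'))
parallel = (('x', 'z', 'return'), ('int', 'float', 'Hoge'))


def dict_from_pairs(pairs):
    """Expand a sequence of (name, annotation) pairs into a dict."""
    return dict(pairs)


def dict_from_parallel(parallel):
    """Expand a (names, annotations) pair of sequences into a dict."""
    names, values = parallel
    return dict(zip(names, values))


from_pairs = dict_from_pairs(pairs)
from_parallel = dict_from_parallel(parallel)
```

Both forms yield the same mapping, so the choice between them comes down to size in the pyc file and to how simple the expanding code is.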
I like the 1st option which uses a tuple |
@serhiy race condition sorry ;) |
Yes, because
Yes, but it is a bit larger than my single tuple idea in most cases.
I am not sure this is the best option, because there are many code objects without annotations.
I have the same dream. |
Yes, but the code for creating a dict can be simpler. In any case, we will see which format is better once we try to write the code.
In this case it can be None or NULL. I like your idea. It is easy to implement it now. Later we can make annotations an attribute of the code object. |
Note that many annotations are never accessed. RAM usage of annotation information is more important than how easy it is to create the dict. I don't like […]. Please use ('x', 'int', 'z', 'float', 'return', 'Hoge') instead. |
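As a rough sketch of the flat-tuple idea, the dict can be built only on first access. The class and function names below are hypothetical and merely model the behaviour in pure Python; they are not the actual CPython implementation.

```python
def annotations_from_flat(flat):
    """Expand ('name', 'ann', 'name', 'ann', ...) into a dict."""
    if len(flat) % 2:
        raise ValueError("flat annotation tuple must have even length")
    # Even positions hold names, odd positions hold annotations.
    return dict(zip(flat[::2], flat[1::2]))


class LazyAnnotations:
    """Hypothetical holder that keeps the tuple until a dict is needed."""

    def __init__(self, flat):
        self._raw = flat

    @property
    def annotations(self):
        # Convert once, on first access, then reuse the same dict.
        if isinstance(self._raw, tuple):
            self._raw = annotations_from_flat(self._raw)
        return self._raw


holder = LazyAnnotations(('x', 'int', 'z', 'float', 'return', 'Hoge'))
still_tuple = isinstance(holder._raw, tuple)
result = holder.annotations
```

A function whose annotations are never read keeps only the single tuple, which is the memory saving argued for above.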
For top level functions (functions created once) this isn't going to make any real difference. There might be a small speedup for function creation, but it isn't going to be measurable. For nested functions with annotations, where many functions are created from a single code object, this could be worthwhile. However, before we add yet another attribute to code objects, I'd like to see some evidence of a speedup. |
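To make the nested-function case concrete, here is a small sketch (the function names are illustrative): each call to the outer function creates a new function object from the same code object, and each of those function objects gets its own annotations dict.

```python
def outer():
    # A new function object is created on every call to outer().
    def inner(x: int, y: float) -> str:
        return f"{x}{y}"
    return inner


first = outer()
second = outer()

# One code object is shared by all the functions created from it ...
shares_code = first.__code__ is second.__code__
# ... but every function object carries its own annotations dict.
separate_dicts = first.__annotations__ is not second.__annotations__
```

This per-function work is what a compact representation would reduce, which is why nested functions are the case worth benchmarking.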
I have just implemented […]. The benchmark simply runs […]. Results:
|
Yurii, I don't believe that benchmark measures what you need to measure: once imported, a module stays imported until it is unloaded, so successive imports are no-ops. See how the side effects of importing bbb only happen once: % cat bbb.py % time python -m timeit "import bbb" % cat bbb.log |
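The caching effect can be reproduced in a self-contained way. The sketch below writes a throwaway module (named bbb_demo here, a stand-in for bbb) whose only side effect is to bump a counter stored on sys; the counter attribute is an assumption made purely for this demonstration.

```python
import importlib
import pathlib
import sys
import tempfile

# Module body: count how many times the module is actually executed.
SOURCE = (
    "import sys\n"
    "sys._bbb_demo_runs = getattr(sys, '_bbb_demo_runs', 0) + 1\n"
)

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "bbb_demo.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # Repeated imports hit the sys.modules cache after the first one.
        for _ in range(3):
            importlib.import_module("bbb_demo")
        runs_with_cache = sys._bbb_demo_runs

        # Dropping the cache entry forces the module body to run again.
        for _ in range(3):
            sys.modules.pop("bbb_demo", None)
            importlib.import_module("bbb_demo")
        runs_total = sys._bbb_demo_runs
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("bbb_demo", None)
        del sys._bbb_demo_runs
```

Three plain imports execute the module once; only after the entry is removed from sys.modules does each import execute it again.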
If you want to measure import time, use
python -m timeit -s "from sys import modules; modules_copy = modules.copy()" "import black; modules.clear(); modules.update(modules_copy)"
But I would be surprised to see a significant difference in this case. What Mark means is to measure the creation time of a nested function:
python -m timeit "def f(a: int, b: str) -> None: pass"
And maybe test with different numbers of arguments, to see whether there is a difference. |
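The same function-creation measurement can be scripted with the timeit module, which makes it easy to compare an annotated definition against a plain one. This is a minimal sketch; the iteration count is arbitrary and the absolute numbers depend on the machine and the Python build.

```python
import timeit


def make_annotated():
    # Executes a "def" with annotations each time it is called.
    def f(a: int, b: str) -> None:
        pass
    return f


def make_plain():
    # Same shape, but without annotations.
    def f(a, b):
        pass
    return f


N = 10_000  # arbitrary; increase for more stable numbers
annotated_time = timeit.timeit(make_annotated, number=N)
plain_time = timeit.timeit(make_plain, number=N)

print(f"annotated: {annotated_time / N * 1e9:.0f} ns per creation")
print(f"plain:     {plain_time / N * 1e9:.0f} ns per creation")
```

Running it on interpreters built with and without the change gives the comparison asked for above; the call overhead of the wrapper functions is included in both timings, so only the difference is meaningful.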
I have run tests with different types of function declarations. A function declaration with annotations is more than 2 times faster with the co_annotations feature. If the function doesn't have annotations, the time is almost the same as without the co_annotations feature. Results:
|
I don't like co_annotations.
func.__annotations__ = ('x', 'int', 'z', 'float', 'return', 'Hoge') is much better because:
|
Inada, I totally agree with you. Sorry, I didn't realize all the pitfalls of adding an extra field to the code object. The new implementation, which represents annotations as a single tuple, doesn't require many changes to the existing codebase, and I have already done it. I reran all the benchmarks: there is no performance degradation when the function doesn't have annotations, and it is more than 2 times faster when the function has annotations. Benchmark results:
|
I believe this change accidentally affected the API of PyFunction_GetAnnotations: previously it would only return dict or NULL, now it can also return a tuple. See bpo-46236 |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.