Optimize namedtuple creation #72824
I was surprised at how much functools makes import time slower. When I replaced the line `_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])` with its expanded source, import time improved (see the attached "before" and "replaced" timings).
I feel this patch is safe enough to land in 3.6.
I doubt this deserves a change. The slow import only happens the first time functools is imported; later imports just use the cache (sys.modules). And if this is going to change, maybe we don't have to copy the entire namedtuple structure?
Yes. But first import time is also important for CLI applications. Since much of the stdlib uses functools, many applications may suffer from this.
https://docs.python.org/3.5/library/functools.html#functools.lru_cache The doc says it's a namedtuple, so it should remain namedtuple-compatible.
A lazy import system could benefit many libraries, so the result could be impressive. But here only functools is enhanced, and half a millisecond is saved. Performance is of course important, but replicating code does not sound good: it means you have to maintain two copies.
On the other hand, implementing lazy import makes applications complex. This patch only enhances functools, but it is a very important module. Half a millisecond is small, but it isn't negligible in some situations. Additionally, it makes noise when profiling startup time.
Yes. Balance is important. |
What is the main culprit, importing the collections module or compiling a named tuple? |
Using namedtuple is not new in 3.6, so this is not a regression that can be fixed at the beta stage. Inlining the source of a named tuple class looks like an ugly solution. It would be better to write the source in a separate file and import it; the Makefile can have a rule for recreating this source file if collections.py is changed. A more general solution would be to make namedtuple() use a cached precompiled class and update the cache if it doesn't match the namedtuple arguments. Yet another solution is to make namedtuple() avoid compiling altogether and instead return a patched local class. But Raymond is against this.
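The "cached precompiled class" idea could be sketched as a memoizing wrapper around namedtuple() — a hypothetical helper for illustration, not stdlib code:

```python
from collections import namedtuple

# Cache of previously built classes, keyed by (typename, fields).
_nt_cache = {}

def cached_namedtuple(typename, field_names):
    """Return a cached namedtuple class if one with the same signature
    was already created; otherwise build it once and cache it.

    A sketch of the caching idea from the discussion above; the real
    proposal would cache the *compiled* class across runs as well.
    """
    # Normalize the field spec so 'x y', 'x, y' and ['x', 'y'] share a key.
    if isinstance(field_names, str):
        field_names = field_names.replace(',', ' ').split()
    key = (typename, tuple(field_names))
    try:
        return _nt_cache[key]
    except KeyError:
        cls = _nt_cache[key] = namedtuple(typename, field_names)
        return cls
```

One caveat the thread hints at: a shared cache changes identity semantics, since two call sites asking for the same signature now receive the same class object.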
Not this time, then; later.
Make sense.
What does "precompiled class" mean? A pyc file? Or a source string to be compiled?
I'll search for the discussion.
Here is a sample patch that makes namedtuple() avoid dynamic compilation. It has roughly the same performance effect as inlining the named tuple source, but benefits all named tuples.
(namedtuple-no-compile.patch) Nice!
One of the problems with this patch is that it makes instantiating a namedtuple much slower (due to parsing arguments in Python code). This can be solved by using eval() to create only the __new__ method (see the commented-out line "result.__new__ = eval(...)"). This increases the time to create a named tuple class, but it is still faster than the current code.
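A minimal sketch of that approach: build the class out of ordinary Python objects and use eval() only for the one-line __new__, so argument parsing stays at C speed. This is a simplified illustration assuming pre-validated field names, not the actual patch:

```python
from operator import itemgetter

def make_namedtuple(typename, field_names):
    """Build a namedtuple-like class without exec()-ing a whole class
    template. Only __new__ is created dynamically, so instantiation
    does not go through slow Python-level argument parsing.

    Sketch only: assumes field_names are already valid identifiers.
    """
    fields = tuple(field_names)
    arglist = ', '.join(fields)
    # The single dynamically compiled piece: a one-line __new__.
    namespace = {'_tuple_new': tuple.__new__}
    new = eval(f"lambda cls, {arglist}: _tuple_new(cls, ({arglist},))",
               namespace)
    class_namespace = {
        '__slots__': (),
        '_fields': fields,
        '__new__': new,   # type() converts this to a staticmethod
    }
    # Field accessors need no compilation at all.
    for index, name in enumerate(fields):
        class_namespace[name] = property(
            itemgetter(index), doc=f'Alias for field number {index}')
    return type(typename, (tuple,), class_namespace)
```

The trade-off matches the comment above: class creation pays for one small eval(), while attribute access and instantiation stay fast.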
This file is derived from my namedlist project on PyPI. I've stripped out the namedlist stuff and just left namedtuple. I've also removed the Python 2 support, and I've removed support for default parameters. After that surgery, I have not tested it very well. Those are my excuses for why the code is more complex than it would be if I were writing it from scratch. Anyway, instead of using eval() on a string to create the __new__ function, I use AST manipulations to generate a function that does all of the correct type checking. It calls eval() too, of course, but with a code object. I originally wrote this as an exercise in learning how to generate ASTs. I can't say it's the best way to solve this problem, and I haven't ever benchmarked it. So just consider it a proof of concept, or ignore it if you're not interested.
I would like to caution against any significant changes to save microscopic amounts of time. Twisting the code into knots for minor time savings is rarely worth it and is not what Python is all about.
I would say that it is almost always negligible and reflects a need for a better sense of proportion and perspective. Also, in the past we've found that efforts to speed up start-up time were measurable only in trivial cases. Tools like Mercurial end up importing and using a substantial chunk of the standard library anyway, so those tools got zero benefit from the contortions we did to move _collections_abc.py from underneath the collections module. In the case of functools, if there was a real need (and I don't believe there is), I would be willing to accept INADA's original patch replacing the namedtuple call with its source. That said, I don't think half a millisecond is worth the increase in code volume and the double maintenance problem of keeping it in sync with any future changes to namedtuple. In my opinion, accumulating technical debt in this fashion is a poor software design practice.
I'll echo Raymond's concerns here, as we simply don't have the collective maintenance capacity to sustain a plethora of special-case micro-optimisations aimed at avoiding importing common standard library modules. I will note, however, that there has been relatively little work done on optimising CPython's code generator, as the use of pyc files and the fact that namedtuples are typically only defined at start-up already keeps it out of the critical path in most applications. While work invested there would technically still be a micro-optimisation at the language level, it would benefit more cases than just avoiding the use of namedtuple in functools would. Alternatively, rather than manually duplicating the namedtuple code and having to keep it in sync by hand, you could investigate the work Larry Hastings has already done for Argument Clinic in Python's C files: https://docs.python.org/3/howto/clinic.html Argument Clinic already includes the machinery necessary to assist with automated maintenance of generated code (at least in C), and hence could potentially be adapted to the task of "named tuple inlining". If Victor's AST transformation pipeline and function guard proposals in PEPs 511 and 510 are accepted at some point in the future, then such inlining could potentially even be performed implicitly some day. Caring about start-up performance is certainly a good thing, but when considering potential ways to improve the situation, structural enhancements to the underlying systems are preferable to ad hoc special cases that complicate future development efforts.
Thanks Nick. I'm going to mark this as closed, as the proposal is too microscopic to warrant incurring technical debt. If someone comes forward with a more fully formed idea for code generation or an overall structural enhancement, that can be put in another tracker item.
I noticed Argument Clinic supports Python. So there is one way to do code generation already.
Updated patch: fixed a small issue in Argument Clinic, and added
Ah, I had forgotten that Larry had already included Python support in Argument Clinic. With the inline code auto-generated from the pure Python implementation, that addresses the main maintenance concerns I had. I did briefly wonder about the difficulties of bootstrapping Argument Clinic (since it uses functools), but that's already accounted for in the comment-based design of Argument Clinic itself (i.e. since the generated code is checked in, the previous iteration can be used to generate the updated one when the namedtuple template changes). Raymond, how does this variant look to you? |
(reopen the issue to discuss about using Argument Clinic) |
Argument Clinic is not needed, since we can use Makefile. |
The concern with using the "generate a private module that can be cached" approach is that it doesn't generalise well - any time you want to micro-optimise a new module that way, you have to add a custom Makefile rule. By contrast, Argument Clinic is a general purpose tool - adopting it for micro-optimisation in another file would just be a matter of adding that file to the list of files that trigger a clinic run. functools.py would be somewhat notable as the first Python file we do that for, but it isn't a novel concept overall. That leads into my main comment on the AC patch: the files that are explicitly listed as triggering a new clinic run should be factored out into a named variable and that list commented accordingly. |
done. |
Just because I disagree with you doesn't mean I'm pestering anyone. Can you stop being so obnoxious? |
Check the issue history - the issue has been rejected by Raymond, and then reopened for further debate by other core developers multiple times. That's not a reasonable approach to requesting reconsideration of a module/API maintainer's design decision. I acknowledge that those earlier reopenings weren't by you, but the issue should still remain closed until *Raymond* agrees to reconsider it (and given the alternative option of instead making the lower-overhead PyStructSequence visible at the Python level, I'd be surprised if he does).
Sorry, I don't have much data at this point, but it's not the first time that I've noticed that namedtuple is super slow. We have much more efficient code like structseq in C. Why not reuse it, at least in our stdlib modules? About the _source attribute: honestly, I'm not aware of anyone using it. I don't think that the fact that a *private* attribute is documented should prevent us from making Python faster. I already noticed the _source attribute when I studied Python memory usage. See my old issue bpo-19640: "Drop _source attribute of namedtuple (waste memory)"; I later changed the title to "Dynamically generate the _source attribute of namedtuple to save memory". About "Python startup time doesn't matter": this is just plain wrong. Multiple core developers have spent a lot of time optimizing exactly that. Tell me if you really need a long rationale to work on that. While I'm not sure about Naoki's exact optimization, I agree with the issue title, "Optimize namedtuple creation", and I like the idea of keeping the issue open to find a solution.
Yes, I'm saying you need a really long justification to explain why you want to break backwards compatibility solely for a speed increase. For namedtuple instances, the leading underscore does *NOT* indicate a private attribute - it's just there to avoid colliding with field names. Speed isn't everything, and it certainly isn't adequate justification for breaking public APIs that have been around for years. Now, you can either escalate that argument to python-dev, and try to convince Guido to overrule Raymond on this point, *or* you can look at working out a Python level API to dynamically define PyStructSequence subclasses. That won't be entirely straightforward (as my recollection is that structseq is designed to build on static C structs), but if you're successful, it will give you something that should be faster than namedtuple in every way, not just at definition time. |
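Existing struct sequence types already show the desired behaviour at the Python level: for example, time.struct_time supports both named and indexed access, just like a namedtuple, but its type is defined in C. What is missing — and what the suggestion above would require — is a way to define such types from pure Python:

```python
import time

# time.struct_time is a PyStructSequence defined in C. Like a
# namedtuple, it is a tuple subclass whose fields can be reached
# either by name or by position.
t = time.localtime(0)  # the epoch, in local time

# Named and positional access refer to the same slot.
assert t.tm_year == t[0]
assert isinstance(t, tuple)

# But struct sequence *types* cannot currently be created from pure
# Python code -- that is the missing Python-level API discussed here.
```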
Benchmark comparing collections.namedtuple to structseq, to get an attribute:
So structseq is 1.9x faster than namedtuple to get an attribute by name.

```
haypo@speed-python$ ./bin/python3 -m perf timeit -s "from collections import namedtuple; Point=namedtuple('Point', 'x y'); p=Point(1,2)" "p.x" --duplicate=1024 -o name_namedtuple.json
haypo@speed-python$ ./bin/python3 -m perf timeit -s "from sys import flags" "flags.debug" --duplicate=1024 -o name_structseq.json
```

Getting an attribute by its integer index is as fast as a plain tuple:

```
haypo@speed-python$ ./bin/python3 -m perf timeit --inherit=PYTHONPATH -s "p=(1,2)" "p[0]" --duplicate=1024 -o int_tuple.json
```
Oops, I wrote it backward: So namedtuple is 1.9x slower than structseq to get an attribute by name. (1.9x slower doesn't mean 1.9x faster, sorry.) |
What about the memory usage?
```
/usr/lib/python3.4/collections/__init__.py: 676.2 kB
```
I respect Raymond's rejection, but I want to write down why I like Jelle's approach. Currently, functools is the only affected module which is very popular, but this is also a problem for applications heavily depending on namedtuple. It's possible to expose StructSeq somewhere, and that can make attribute access faster. That's why I'm willing to keep collections.namedtuple's overhead reasonable and predictable.
Hum, when I mentioned structseq: my idea was more to reimplement |
On python-dev Raymond agreed to reopen the issue and consider Jelle's implementation (#2736). |
Re-opening per discussion on python-dev. Goals:
|
Should we consider a C-based implementation like https://github.com/llllllllll/cnamedtuple? It could improve speed even more, but would be harder to maintain and test and harder to keep compatible. My sense is that it's not worth it unless benchmarks show a really dramatic difference. As for Raymond's list of goals, my PR now preserves _source and verbose=True, and the test suite passes. I think the only docs change needed is in the description of _source (https://docs.python.org/3/library/collections.html#collections.somenamedtuple._source), which is no longer "used to create the named tuple class". I'll add that to my PR. I haven't done anything towards the last two goals yet. Should the change be applied to 3.6? It is fully backwards compatible, but perhaps the change is too disruptive to be included in the 3.6 series at this point.
Thanks Raymond and Jelle. The bar for a reimplementation in C is much higher (though we'll have to agree that Jelle's version is fast enough before we reject it). The bar for backporting this to 3.6 is much higher as well and I think it's not worth disturbing the peace (people depend on the craziest things staying the same between bugfix releases, but for feature releases they have reasons to do thorough testing). |
Why not just do the following:

```
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> Point._source
"from collections import namedtuple\nPoint = namedtuple('Point', ['x', 'y'])\n"
>>>
```

The docs make it seem as if the primary use case of the _source attribute is
I've just filed a ticket for this: llllllllll/cnamedtuple#7 |
I added a benchmark suite (using Victor's perf utility) to cnamedtuple. The results are here: https://github.com/llllllllll/cnamedtuple#benchmarks To summarize: type creation is much faster; instance creation and named attribute access are a bit faster. |
I benchmarked some common namedtuple operations with the following script:

```
#!/bin/bash
echo 'namedtuple instantiation'
echo 'namedtuple attribute access'
echo 'namedtuple _make'
```

```
$ ./ntbenchmark.sh
namedtuple creation
2000 loops, best of 5: 101 usec per loop
namedtuple instantiation
500000 loops, best of 5: 477 nsec per loop
namedtuple attribute access
5000000 loops, best of 5: 59.9 nsec per loop
namedtuple _make
500000 loops, best of 5: 430 nsec per loop
```

With unpatched CPython master I get:

```
$ ./ntbenchmark.sh
namedtuple creation
500 loops, best of 5: 409 usec per loop
namedtuple instantiation
500000 loops, best of 5: 476 nsec per loop
namedtuple attribute access
5000000 loops, best of 5: 60 nsec per loop
namedtuple _make
1000000 loops, best of 5: 389 nsec per loop
```

So creating a class is about 4x faster (similar to the benchmarks various other people have run) and calling _make() is 10% slower. That's probably because of the line `if len(result) != cls._num_fields:` in my implementation, which would have been something like `if len(result) != 3` in the exec-based implementation.

I also cProfiled class creation with my patch. These are results for creating 10000 3-element namedtuple classes:

```
Ordered by: cumulative time
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
```

So about 35% of the time is still spent in the exec() call to create __new__. Another 10% is in .format() calls, so using f-strings instead of .format() might also be worth it.
Thanks Joe! I adapted your benchmark suite to also run my implementation. See JelleZijlstra/cnamedtuple@61b6fbf for the code and results. The results are consistent with what we've seen before. Joe's cnamedtuple is about 40x faster for class creation than the current implementation, and my PR only speeds class creation up by 4x. That difference is big enough that I think we should seriously consider using the C implementation. |
I want to focus on the pure Python implementation in this issue. While "40x faster" is 10x more than "4x faster", a C implementation is harder to maintain. And sometimes that extra 10x is not so important. For example, if namedtuple class creation accounts for 0.4 sec of a 1 sec startup:

4x faster: 1 sec -> 0.7 sec (-30%)
40x faster: 1 sec -> 0.61 sec (-39%)

In this case, "4x faster" reduces 0.3 sec, and the extra 10x reduces only a further 0.09 sec. Of course, 1.9x faster attribute access (http://bugs.python.org/issue28638#msg298499) is attractive.
As a counter-argument against "let's not do it because it'll be harder to maintain", I'd like to point out that the namedtuple API is already kind of over-engineered (see: "verbose", "rename", "module" and "_source") and as such it seems likely to remain pretty much the same in the future. So why not treat namedtuple like any other basic data structure, boost its internal implementation, and simply use the existing unit tests to make sure there are no regressions? The same barrier does not seem to apply to tuples, lists and sets.
It is indeed, and it makes a huge difference in situations like busy loops. E.g. in the case of asyncio, 1.9x faster literally means being able to serve twice the number of reqs/sec (see cpython/Lib/asyncio/selector_events.py, line 523 at 3e2ad8e).
|
I didn't say "let's not do it". Even if a C implementation is added later, the pure Python optimization
General note about this issue: while the issue title is "Optimize namedtuple creation", it would be *nice* to optimize not only creation but also attribute access by name. Maybe we can have a very fast C implementation using structseq, and a fast Python implementation (faster than the current one) as a fallback for non-CPython.
Yeah, it looks like the standard |
FWIW, once the property/itemgetter pair are instantiated in the NT class, the actual lookup runs through them at C speed (no pure python steps). There is not much fluff here. |
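The property/itemgetter pattern Raymond describes can be illustrated with a minimal hand-written class. This mirrors how namedtuple bound its fields at the time of this discussion (newer CPython versions use a dedicated C descriptor, `_tuplegetter`, instead):

```python
from operator import itemgetter

# Once the class exists, an attribute lookup like p.x goes through
# property.__get__ and itemgetter.__call__, both implemented in C,
# with no pure-Python frames in between.
class Point(tuple):
    __slots__ = ()

    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

    x = property(itemgetter(0), doc='Alias for field number 0')
    y = property(itemgetter(1), doc='Alias for field number 1')
```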
Side note: some of the objections to a C-level namedtuple implementation appear to be based on the maintenance hurdle, and others have noted that a structseq-based namedtuple might be an option. I have previously attempted to write a C replacement for namedtuple that dynamically created a StructSequence. I ran into a roadblock due to PyStructSequence_NewType (the API that exists to allow creation of runtime-defined structseq types) being completely broken (bpo-28709). If the struct sequence API were fixed, it should be a *lot* easier to implement a C-level namedtuple with minimal work, removing (some) of the maintenance objections by simply reducing the amount of custom code involved. The testnewtype.c code attached to bpo-28709 (which demonstrates the bug) is 66 lines of code and implements a basic C-level namedtuple creator function (full support omitted for brevity, but aside from _source, most of it would be easy). I'd expect a finished version to be in the low three digits of lines of custom code, a third or less of what the cnamedtuple project needed to write the whole thing from scratch.
Microbenchmark for caching docstrings:

```
$ ./python -m perf timeit -s "from collections import namedtuple; names = ['field%d' % i for i in range(1000)]" -- "namedtuple('A', names)"
```

With sys.intern(): Mean +- std dev: 3.57 ms +- 0.05 ms
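The interning trick works because, without sys.intern(), every generated class allocates its own copy of each "Alias for field number N" docstring, while interning makes all classes share one string object per index. A small sketch with a hypothetical helper name:

```python
import sys

# Hypothetical helper: build the per-field docstrings for a generated
# namedtuple class. sys.intern() guarantees that equal strings map to
# a single shared object, so classes with the same field count reuse
# the same docstring objects instead of allocating fresh copies.
def make_field_docs(num_fields):
    return [sys.intern('Alias for field number %d' % i)
            for i in range(num_fields)]
```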