-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
Local import is too slow #66747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Locale import is too slow in comparison with caching module in a global or even in sys.modules. >>> import timeit
>>> def f():
... import locale
...
>>> min(timeit.repeat(f, number=100000, repeat=10))
0.4501200000013341
>>> _locale = None
>>> def g():
... global _locale
... if _locale is None:
... import _locale
...
>>> min(timeit.repeat(g, number=100000, repeat=10))
0.07821200000034878
>>> import sys
>>> def h():
... try:
... locale = sys.modules['locale']
... except KeyError:
... import locale
...
>>> min(timeit.repeat(h, number=100000, repeat=10))
0.12357599999813829 I think there is an overhead of look up __import__, packing arguments in a tuple and calling a function. This can be omitted by looking first in sys.module and calling __import__ only when nothing was found. |
This would sound reasonable to me, but I wonder if it may change behaviour with weird custom __import__ overrides. |
__import__ is intended as an absolute override (including of the sys.modules cache lookup), so we can't bypass it without breaking backwards compatibility. It's possible there is room for other optimisations that don't break the import override semantics (such as a fast path for when __import__ is the standard import function). |
I'm not experienced in import machinery. Here is preliminary patch which Performance effect is almost so good as manual caching in a global. >>> import timeit
>>> def f():
... import locale
...
>>> min(timeit.repeat(f, number=100000, repeat=10))
0.09563599999819417 Of course it breaks tests.
Good idea. |
Second version of the patch uses fast patch only when builtin __import__ is >>> import timeit
>>> def f():
... import locale
...
>>> min(timeit.repeat(f, number=100000, repeat=10))
0.10502300000371179 The code is simpler, but still some cumbersome. It would be good to optimize |
I would suggest factoring out IMPORT_NAME into a separate import_name() function, like is already one for import_from(). |
Some more general comments about this:
|
The issue is local imports, not imports of the locale module :-) |
Yes, my CPU is slow. Here is a patch which factors out IMPORT_NAME into a separate import_name() function and adds optimization for more general case when __import__ is not overloaded. |
Fixed bugs making test_importlib failing. Microbenchmark results on faster machine: $ ./python -m timeit 'import locale'
Unpatched: 1000000 loops, best of 3: 0.839 usec per loop
Patched: 10000000 loops, best of 3: 0.176 usec per loop
$ ./python -m timeit 'import os.path'
Unpatched: 100000 loops, best of 3: 2.02 usec per loop
Patched: 1000000 loops, best of 3: 1.77 usec per loop
$ ./python -m timeit 'from locale import getlocale'
Unpatched: 100000 loops, best of 3: 3.69 usec per loop
Patched: 100000 loops, best of 3: 3.39 usec per loop And it looks to me that there is a bug in existing code (opened separate bpo-27352). 0.839 usec is not very slow by CPython's standards, but is equal to about 50 assignments to local variable, 15 attribute revolvings or 5 simple function calls. If some module is optionally needed in fast function, the overhead of local import can be significant. We can lazily initialize global variable (the second example in msg228561), but this code looks more cumbersome. |
Do you happen to know why you didn't get a review link for your patch, Serhiy? |
faster_import_4.patch is based on the revision d736c9490333 which is not part of the CPython repository. |
I rebased faster_import_4.patch on default. |
Thanks Victor. |
faster_import_pkg.patch optimizes also an import of names with dots. $ ./python -m timeit 'import os.path'
Unpatched: 100000 loops, best of 3: 2.08 usec per loop
faster_import_5.patch: 1000000 loops, best of 3: 1.79 usec per loop
faster_import_pkg.patch: 1000000 loops, best of 3: 0.474 usec per loop |
Seems not all such easy. Looking in sys.module is not enough, we should check __spec__._initializing. Following patch moves optimizations to PyImport_ImportModuleLevelObject (C implementation of standard __import__). Main optimizations:
Microbenchmarking results: $ ./python -m timeit 'import os'
Unpatched: 1000000 loops, best of 3: 0.845 usec per loop
Patched: 1000000 loops, best of 3: 0.338 usec per loop
$ ./python -m timeit 'import os.path'
Unpatched: 100000 loops, best of 3: 2.07 usec per loop
Patched: 1000000 loops, best of 3: 0.884 usec per loop
$ ./python -m timeit 'from os import path'
Unpatched: 100000 loops, best of 3: 3.7 usec per loop
Patched: 100000 loops, best of 3: 2.77 usec per loop |
Thank you for your review Brett. |
New changeset 64f195790a3a by Serhiy Storchaka in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: