Bytecode compile times are O(nlocals**2)
#97912
If it proves too complex to track multiple local variable states at the same time, another solution might be to only scan for the first hundred local variables or something like that. It's not clear to me how easy tracking everything would be. Do we need potentially unbounded stack space, since blocks might need to be processed more than once? Would the approach still be linear in the face of many different But if you manage to do it all linearly, that sounds great.
Here's an example of a "hard" case:

    def f():
        a1 = a2 = a3 = a4 = a5 = 1
        for i in range(2):
            if cond1(): use(a1)
            else: del a1
            if cond2(): use(a2)
            else: del a2
            if cond3(): use(a3)
            else: del a3
            if cond4(): use(a4)
            else: del a4
            if cond5(): use(a5)
            else: del a5
        # (generalize to more than 5 locals...)

For example, the fact that That's not a proof that linear time is impossible, just that there may be some cleverness required (moving info across multiple basicblock linkages at once?) if we want linear time in all cases. However, some sort of heuristic may be possible that takes care of most typical cases without many/any

By the way, the reason to propagate "possible undefinedness" rather than "certain definedness" is that any path with undefinedness leading into a LOAD_FAST is enough to have to use LOAD_FAST_CHECK.
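To make the "possible undefinedness" idea concrete, here is a minimal sketch of a worklist analysis over a toy control-flow graph. It is not CPython's implementation: the block/op representation, the function name `loads_needing_check`, and the choice to treat a successful load as proof that the variable is bound are all assumptions made for illustration. Each block carries one integer bitmask with a set bit per local that may be unbound on entry, so all locals are tracked in a single traversal rather than one traversal per local.

```python
from collections import deque


def loads_needing_check(blocks, successors, nlocals, entry=0):
    """Return {(block, position)} for loads that may see an unbound local.

    blocks:     {block_id: [(op, var_index), ...]}, op in {"load", "store", "del"}
    successors: {block_id: [block_id, ...]}
    Hypothetical toy representation, not CPython's basicblock structures.
    """
    # Bit i set means "local i may be unbound here". Every local starts unbound.
    in_mask = {entry: (1 << nlocals) - 1}
    flagged = set()
    worklist = deque([entry])
    while worklist:
        block = worklist.popleft()
        mask = in_mask[block]
        for pos, (op, var) in enumerate(blocks[block]):
            bit = 1 << var
            if op == "load":
                if mask & bit:
                    flagged.add((block, pos))
                # Assumption: if the checked load succeeds, the local is bound.
                mask &= ~bit
            elif op == "store":
                mask &= ~bit
            elif op == "del":
                mask |= bit
        for succ in successors.get(block, ()):
            old = in_mask.get(succ)
            merged = mask if old is None else old | mask
            if merged != old:
                # New information reached succ, so it must be (re)processed.
                in_mask[succ] = merged
                worklist.append(succ)
    return flagged


# Two-local version of the loop above: each iteration either uses or deletes
# a local, so a later iteration may load a local that was deleted earlier.
blocks = {
    0: [("store", 0), ("store", 1)],
    1: [],                 # loop header
    2: [("load", 0)],
    3: [("del", 0)],
    4: [("load", 1)],
    5: [("del", 1)],
    6: [],                 # loop exit
}
successors = {0: [1], 1: [2, 3, 6], 2: [4, 5], 3: [4, 5], 4: [1], 5: [1]}
result = loads_needing_check(blocks, successors, nlocals=2)
```

Blocks are re-queued whenever their entry mask grows, so this sketch can visit a block several times; it illustrates the merge rule (bitwise OR of "maybe unbound" sets), not a linear-time bound.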
Here's a script that gathered

The script

    from pathlib import Path
    from types import CodeType

    def all_code_recursive(code):
        # based on https://github.com/faster-cpython/tools/blob/main/scripts/count_opcodes.py
        yield code
        for x in code.co_consts:
            if isinstance(x, CodeType):
                yield from all_code_recursive(x)

    def all_code(root):
        for path in root.glob("**/*.py"):
            text = path.read_text("utf-8")
            code = compile(text, path.name, "exec")
            yield from all_code_recursive(code)

    from collections import Counter
    import sympy

    var_counts = Counter()
    init = Path(sympy.__file__)
    for code in all_code(init.parent):
        n = code.co_nlocals
        var_counts[code.co_nlocals] += 1
        if code.co_nlocals > 50:
            print(code.co_qualname)
    print(dict(sorted(var_counts.items())))
    total = var_counts.total()
    for bound in [2, 5, 10, 20, 50, 100, 200, 500, 1000]:
        bigger = sum(count for n, count in var_counts.items() if n > bound)
        print(f"{bigger/total:.2%} of code objects have >{bound} locals")

The output
Summarized output:
Data from other places:
Thanks! The biggest one is https://github.com/sympy/sympy/blob/0e4bab73562d6f90fcdf5fa079731f26a455347c/sympy/integrals/rubi/rules/sine.py#L148, which looks terrifying. It is noticeably slower to compile on 3.12 than 3.11:
One speedup might come from doing some precomputation for the "first pass": start by storing The

    #define MAYBE_PUSH(B) do { \
        if ((B)->b_visited > target) { \
            *(*stack_top)++ = (B); \
            (B)->b_visited = target + 1; \
        } \
    } while (0)

Though again, I'm not sure whether it would be better to just limit the number of locals to analyze and set the rest to use If someone else hasn't already started a patch, I can work on this.
It won't reduce the big-O complexity, but handling 64 locals at once (using an
This should be fixed now. Thanks for finding this! (feel free to re-open if I missed something)
@JelleZijlstra discovered something interesting yesterday: bytecode compile times on main are currently quadratic in the number of local variables (3.11 is still linear). For example, a function with 100,000 assignments to unique local variables takes well over a minute to compile (but is less than 2 seconds on 3.11).

I believe the culprit is the new LOAD_FAST/LOAD_FAST_CHECK stuff. The current implementation appears to loop over every local variable, and for each one, loop over all bytecode instructions. This can probably be refactored to run in one pass for all locals.

CC: @markshannon @sweeneyde
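A rough way to observe the scaling described above is to generate a function with `n` assignments to unique locals and time `compile()` for increasing `n`. This is only a sketch: the helper names `make_source` and `time_compile` are made up for illustration, the sizes are kept small so it finishes quickly, and timings will vary by machine and build.

```python
import time


def make_source(n):
    """Build source for a function that assigns to n unique local variables."""
    lines = ["def f():"]
    lines += [f"    x{i} = {i}" for i in range(n)]
    return "\n".join(lines) + "\n"


def time_compile(n):
    """Compile the generated source and return (module code object, seconds)."""
    src = make_source(n)
    start = time.perf_counter()
    code = compile(src, "<generated>", "exec")
    elapsed = time.perf_counter() - start
    return code, elapsed


if __name__ == "__main__":
    for n in (1_000, 2_000, 4_000, 8_000):
        _, elapsed = time_compile(n)
        print(f"{n:>6} locals: {elapsed:.3f}s")
```

If compilation is linear, doubling `n` should roughly double the time; if it is quadratic, each doubling should roughly quadruple it.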