Add RETURN_NONE bytecode instruction #72986
Attached patch adds a new bytecode instruction: RETURN_NONE. It is similar to "LOAD_CONST; RETURN_VALUE", but it avoids the need to add None to the constants of the code object. For a function with an empty body, it reduces the stack size from 1 to 0.

Example on reference Python 3.7:

>>> def func(): return
...
>>> import dis; dis.dis(func)
  1           0 LOAD_CONST               0 (None)
              2 RETURN_VALUE
>>> func.__code__.co_stacksize
1

Example on patched Python 3.7:

>>> def func(): return
...
>>> import dis; dis.dis(func)
  1           0 RETURN_CONST
>>> func.__code__.co_stacksize
0

If the function has a docstring, RETURN_CONST avoids adding None to code constants:

>>> def func():
...     "docstring"
...     return
...
>>> func.__code__.co_consts
('docstring',)

I will now run benchmarks on the patch. |
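The checks in the sessions above can be bundled into a small helper for comparing a reference build against a patched one. This is only a sketch: `describe` is a hypothetical name, not part of the patch, and the opcode names it reports depend on the interpreter version it runs on.

```python
import dis


def describe(func):
    """Collect the opcode names, constants and stack size of a function."""
    code = func.__code__
    return {
        # dis.get_instructions() yields one Instruction per bytecode instruction
        "opnames": [ins.opname for ins in dis.get_instructions(func)],
        "consts": code.co_consts,
        "stacksize": code.co_stacksize,
    }


def func():
    return


info = describe(func)
print(info)
```

Running the same script on both builds and diffing the printed dictionaries shows whether None disappeared from `co_consts` and whether `co_stacksize` dropped.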
I'm proposing this patch because I noticed that reducing the number of opcodes makes Python faster in my old registervm project, a fork of CPython which uses a register-based bytecode. I also plan to propose a CALL_PROCEDURE instruction to replace CALL_FUNCTION+POP_TOP, only for the simple CALL_FUNCTION, not for the more complex CALL_FUNCTION_KW or CALL_FUNCTION_EX. |
Do you want to add RETURN_NONE or RETURN_CONST? Or both? Adding new special opcodes can decrease the size of the code and increase performance in some cases. But it adds maintenance burden, increases the complexity of the compiler and peephole optimizer, and increases the size of the ceval loop. The latter can have a negative effect on performance. I think we should add new specialized opcodes only if they add a measurable gain to global performance or a large speedup in important particular cases. It would help if you gathered statistics on RETURN_* opcodes. How many RETURN_VALUE, RETURN_CONST and RETURN_NONE instructions are compiled and executed while running the Python tests? Compare that with the total number of compiled and executed instructions. |
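The compiled half of the statistics requested above could be gathered along these lines. It is a minimal sketch under stated assumptions: `iter_code`, `return_stats` and `SAMPLE` are hypothetical names, only compiled (not executed) instructions are counted, and a "return None" is recognised either as a `RETURN_CONST` whose constant is None or as a `RETURN_VALUE` directly preceded by a load of the None constant.

```python
import dis
import types


def iter_code(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code(const)


def return_stats(source):
    """Count compiled instructions, return instructions, and returns of None."""
    stats = {"total": 0, "returns": 0, "returns_none": 0}
    for code in iter_code(compile(source, "<sample>", "exec")):
        prev = None
        for ins in dis.get_instructions(code):
            stats["total"] += 1
            if ins.opname == "RETURN_CONST":
                # Interpreters that have a dedicated constant-return opcode
                stats["returns"] += 1
                if ins.argval is None:
                    stats["returns_none"] += 1
            elif ins.opname == "RETURN_VALUE":
                stats["returns"] += 1
                # LOAD_CONST None immediately followed by RETURN_VALUE
                if (prev is not None
                        and prev.opname.startswith("LOAD_CONST")
                        and prev.argval is None):
                    stats["returns_none"] += 1
            prev = ins
    return stats


SAMPLE = """
def f():
    return

def g(x):
    return x + x
"""

stats = return_stats(SAMPLE)
print(stats)
```

Counting executed instructions would need instrumentation inside the interpreter loop, which this sketch does not attempt.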
Results of performance 0.5.0 on speed-python:

haypo@speed-python$ python3 -m perf compare_to -G --min-speed=3 2016-11-23_19-34-default-3d660ed2a60e.json patch.json
Faster (3):
Benchmark hidden because not significant (58): 2to3, call_method_slots, call_method_unknown, (...)

Hum, boring result. This change alone doesn't bring any significant speedup, and there is even some slowdown. Maybe it's just a bad idea. Maybe it should be combined with other new bytecode instructions. Maybe only a full new instruction set using registers would show a significant speedup. I don't know :-( |
Only RETURN_NONE, because in some corner cases it allows omitting the None constant from code.co_consts entirely and reducing the stack size. I'm not sure that I want to go down the same road as WPython, which added a *lot* of specialized instructions (combining two or more existing instructions). As you said, it has a maintenance cost, and might have a negative impact if _PyEval_EvalFrameDefault() becomes too big. |
The pair LOAD_CONST/RETURN_VALUE is in 19th place in the ranking of the most frequent opcode pairs (see msg269391 in bpo-27255). Not all of these constants are None. And since the time of LOAD_CONST is much smaller than the time of RETURN_VALUE (the latter includes destroying a frame and should be considered together with CALL_FUNCTION), I think the performance effect of RETURN_NONE is much less than 1%. |
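A ranking of opcode pairs like the one cited above can be approximated statically. The sketch below is an assumption-laden illustration, not the method behind msg269391: it counts adjacent pairs in compiled code rather than pairs actually executed, and `static_pair_counts` and `SAMPLE` are hypothetical names.

```python
import dis
import types
from collections import Counter


def iter_code(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code(const)


def static_pair_counts(source):
    """Count adjacent opcode-name pairs within each compiled code object."""
    pairs = Counter()
    for code in iter_code(compile(source, "<sample>", "exec")):
        names = [ins.opname for ins in dis.get_instructions(code)]
        # zip(names, names[1:]) pairs each instruction with its successor
        pairs.update(zip(names, names[1:]))
    return pairs


SAMPLE = """
def a():
    return

def b():
    return None

def c(x):
    return x
"""

pair_counts = static_pair_counts(SAMPLE)
for pair, count in pair_counts.most_common(5):
    print(count, pair)
```

Static counts ignore how often each code path runs, so they can only hint at the dynamic ranking discussed in the comment.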
When we added LIST_APPEND (the first of the custom opcodes intended for optimization), it was done only because it was frequently used in inner loops where it provided real benefits to users. In contrast, this opcode seems like a waste. Historically, for opcodes, we've valued orthogonality and parsimony. The opcode set was intentionally kept simple and minimal. The recent opcode additions have disregarded these values. |
My vote is to close this issue since the performance isn't panning out. |
I also don't see a good reason to keep this open now - adds complication for no quantifiable payoff. |
-1; we have too many opcodes already. Let's not complicate the code if there's no performance improvement. |
Sorry, I didn't want to bother you. I should run the benchmark *before* opening an issue next time :-) I agree that the speedup is negligible, so it's not worth it. The main purpose of the patch was optimization. I'm closing the issue. Thanks ;-) |