Python built with clang -O0 allocates 10x more stack memory than clang -O3 on a Python function call #90758
Measure using this script on the main branch (commit 108e66b):

    import _testcapi
    def f(): yield _testcapi.stack_pointer()
    print(_testcapi.stack_pointer() - next(f()))

Stack usage depending on the compiler and compiler optimization level:
-O0 allocates around 10x more memory. Moreover, "./configure --with-pydebug CC=clang" uses -O0 in CFLAGS, because the "clang --help" output doesn't contain "-Og". I'm working on a configure change to use -Og with clang versions that support it. |
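The flag selection described above can be sketched in a few lines. This is only an illustration of the idea (search the compiler's help text for "-Og" and fall back to -O0), not the actual configure code; the function name `choose_debug_opt` and the sample help texts are made up for this example.

```python
def choose_debug_opt(help_output: str) -> str:
    """Pick the optimization flag for a debug build.

    Hypothetical helper: it mirrors the idea of looking for "-Og" in the
    compiler's --help text, and is not taken from CPython's configure.
    """
    return "-Og" if "-Og" in help_output else "-O0"


# Made-up help texts, only to exercise both branches.
help_without_og = "  -O<level>  Set optimization level\n  -g  Generate debug info\n"
help_with_og = help_without_og + "  -Og  Optimize for debugging\n"

print(choose_debug_opt(help_without_og))  # falls back to -O0
print(choose_debug_opt(help_with_og))     # selects -Og
```

A text search like this depends on what the help output happens to list, which is why a compiler that accepts -Og can still end up with -O0.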
#75235 enables -Og when using clang and ./configure --with-pydebug and so the example uses 736 bytes instead of 9,104 bytes. |
This issue is a follow-up of bpo-46542 "test_json and test_lib2to3 crash on s390x Fedora Clang 3.x buildbot". |
Previous issues about stack memory usage, work done in 2017:
I summarized the results in the "Stack consumption" section of my article: https://vstinner.github.io/contrib-cpython-2017q1.html |
See also bpo-30866: "Add _testcapi.stack_pointer() to measure the C stack consumption". |
stack_overflow-4.py: Updated script from bpo-30866 to measure stack memory usage before Python crashes or raises a RecursionError. I had to modify the script since calling a Python function from a Python function no longer allocates (additional) memory on the stack! See bpo-45256 "Remove the usage of the C stack in Python to Python calls". |
stack_overflow-4.py output depending on the compiler and compiler flags.

gcc -O3 (./configure):
    => total: 275124 calls, 1184.3 bytes per call
It's better than stack memory usage in 2017: https://bugs.python.org/issue30866#msg297826

clang -O3 (./configure CC=clang):
    => total: 270185 calls, 1408.4 bytes per call
clang allocates a little bit more memory on the stack than gcc. I didn't try PGO or LTO yet. |
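To put a number on "a little bit more", the two result lines can be parsed and compared. This is a minimal sketch: the regular expression only assumes the "=> total: N calls, X bytes per call" format quoted above, and `parse_result` is a hypothetical helper, not part of stack_overflow-4.py.

```python
import re

# Matches the summary line format quoted in this comment.
RESULT_RE = re.compile(r"total: (\d+) calls, ([\d.]+) bytes per call")


def parse_result(line):
    """Return (calls, bytes_per_call) from one summary line (hypothetical helper)."""
    match = RESULT_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    return int(match.group(1)), float(match.group(2))


gcc_calls, gcc_bytes = parse_result("=> total: 275124 calls, 1184.3 bytes per call")
clang_calls, clang_bytes = parse_result("=> total: 270185 calls, 1408.4 bytes per call")

extra = clang_bytes - gcc_bytes   # additional bytes per call with clang
ratio = clang_bytes / gcc_bytes   # clang usage relative to gcc
print(f"clang uses {extra:.1f} more bytes per call ({ratio:.2f}x gcc)")
```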
PR 31052 seems to have broken a bunch of buildbots. If no fix is provided in 24 hours, we will need to revert :( |
test_gdb fails if Python is built with clang -Og. I don't think that it's a regression. It's just that previously, buildbots using clang only built Python with -O0 or -O3. I'm investigating the test_gdb issue: it's easy to reproduce on Linux (clang 13.0.0). I may skip test_gdb if Python is built with clang -Og. |
FWIW, it seems -O0 doesn't merge local variables that live in different code paths or have different lifetimes. For example, see _Py_abspath
wchar_t is 4 bytes and MAXPATHLEN is 4096 on Linux, so each cwd is 16388 bytes. I don't know which specific optimization in -Og merges local variables, but I think -Og is very important for _PyEval_EvalFrameDefault(), since it has many local variables in huge switch-case statements. By the way, clang 13 has |
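The 16388 figure is consistent with a buffer of MAXPATHLEN + 1 wide characters. The sketch below redoes that arithmetic and models what merging saves. The constants come from the comment above; the buffer count of two and the `frame_bytes` model are assumptions for illustration, not measurements of a real build.

```python
MAXPATHLEN = 4096     # value on Linux, per the comment above
SIZEOF_WCHAR_T = 4    # bytes on Linux, per the comment above


def cwd_buffer_bytes():
    """Size of one buffer holding MAXPATHLEN + 1 wide characters."""
    return (MAXPATHLEN + 1) * SIZEOF_WCHAR_T


def frame_bytes(n_buffers, merged):
    """Hypothetical model: merged buffers share a single stack slot."""
    slots = 1 if merged else n_buffers
    return slots * cwd_buffer_bytes()


N_BUFFERS = 2  # assumption: two cwd buffers in different code paths

print(cwd_buffer_bytes())                    # 16388
print(frame_bytes(N_BUFFERS, merged=False))  # 32776, each buffer has its own slot
print(frame_bytes(N_BUFFERS, merged=True))   # 16388, buffers share one slot
```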
For functions which are commonly called in Python at runtime, it may be worth manually merging large local variables to save a few bytes on the stack when Python is built with -O0. _Py_abspath() is only called at startup, if I recall correctly, so it should not be a big issue in practice. |
I didn't mean that _Py_abspath is a problem. I just used it to illustrate why -O0 and -Og are so different. We can reduce its stack usage easily, but it is less of a problem than _PyEval_EvalFrameDefault. |
Is there anything left to do here? |
There is always room for enhancement :-) But for now, IMO merged changes are enough to make the issue less complicated. The main change is to use -Og when Python is built in debug mode by clang. |