Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose API for writing perf map files #103295

Closed
carljm opened this issue Apr 5, 2023 · 18 comments · Fixed by #104823
Closed

expose API for writing perf map files #103295

carljm opened this issue Apr 5, 2023 · 18 comments · Fixed by #104823
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement

Comments

@carljm
Copy link
Member

carljm commented Apr 5, 2023

#96123 added support for CPython to write /tmp/perf-<pid>.map files, associating instruction address ranges with a human-readable frame name for the Linux perf profiler.

Two external Python JIT compilers, Cinder and Pyston, both also independently write to perf map files.

Since perf map files are one-per-process, multiple separate libraries trying to write perf map entries independently can lead to file corruption from simultaneous writes.

It's unlikely for both Cinder and Pyston JITs to be used in the same process, but it's quite reasonable to use one of these JITs along with CPython's native perf trampoline support.

In order for this to be safe, CPython should expose a thread-safe API for writing perf map entries that all these clients can use.

(We've backported the 3.12 perf trampolines feature to Cinder 3.10 and experimented with using it, and we've seen this write corruption problem occur in practice; it's not just a theoretical risk we identified.)

cc @pablogsal , @kmod

Linked PRs

@carljm carljm added the type-feature A feature request or enhancement label Apr 5, 2023
@arhadthedev arhadthedev added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Apr 19, 2023
pablogsal pushed a commit that referenced this issue May 21, 2023
Co-authored-by: Aniket Panse <aniketpanse@fb.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Carl Meyer <carl@oddbird.net>
@hroncok
Copy link
Contributor

hroncok commented May 23, 2023

Hey folks, we experience this failure on Fedora 37, 38 and 39 on i686 (32bit):

test_write_perf_map_entry (test.test_perfmaps.TestPerfMapWriting.test_write_perf_map_entry) ... *** stack smashing detected ***: terminated
Fatal Python error: Aborted
Current thread 0xf7e72700 (most recent call first):
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/test_perfmaps.py", line 13 in test_write_perf_map_entry
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/case.py", line 589 in _callTestMethod
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/case.py", line 634 in run
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/case.py", line 690 in __call__
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/suite.py", line 122 in run
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/suite.py", line 84 in __call__
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/unittest/runner.py", line 240 in run
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/support/__init__.py", line 1115 in _run_suite
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/support/__init__.py", line 1241 in run_unittest
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/runtest.py", line 294 in _test_module
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/runtest.py", line 330 in _runtest_inner2
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/runtest.py", line 373 in _runtest_inner
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/runtest.py", line 248 in _runtest
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/runtest.py", line 278 in runtest
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/main.py", line 365 in rerun_failed_tests
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/main.py", line 783 in _main
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/main.py", line 738 in main
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/libregrtest/main.py", line 802 in main
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/regrtest.py", line 43 in _main
  File "/builddir/build/BUILD/Python-3.12.0b1/Lib/test/regrtest.py", line 47 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
Extension modules: _testcapi, _testinternalcapi (total: 2)
/var/tmp/rpm-tmp.nNpjuE: line 79: 3052900 Aborted                 (core dumped) LD_LIBRARY_PATH=$ConfDir $ConfDir/python -m test.regrtest -wW --slowest -j6 --timeout=2700 -i test_freeze_simple_script

We are yet to investigate, but is this thing supposed to work on i686 as well?

@pablogsal
Copy link
Member

test_write_perf_map_entry (test.test_perfmaps.TestPerfMapWriting.test_write_perf_map_entry) ... *** stack smashing detected ***: terminated

This seems to have crashed due to some hardening stuff. Can you share your hardening flags?

@pablogsal
Copy link
Member

We are yet to investigate, but is this thing supposed to work on i686 as well?

The test is just writing to a file, not running perf so I would say that it should work in i686 for sure.

@pablogsal
Copy link
Member

CC: @gsallam

@hroncok
Copy link
Contributor

hroncok commented May 23, 2023

Can you share your hardening flags?

CFLAGS='-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m32 -march=i686 -mtune=generic -msse2 -mfpmath=sse -mstackrealign -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'

https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/rawhide/f/redhat-hardened-cc1
https://src.fedoraproject.org/rpms/redhat-rpm-config/blob/rawhide/f/redhat-annobin-cc1

@pablogsal
Copy link
Member

How easy is for you to get a native backtrace?

@hroncok
Copy link
Contributor

hroncok commented May 23, 2023

I'm building it locally on my x86_64 machine and will try to salvage a coredump but I have never done this before with i686, hopefully, I'll get it right. It takes a while.

Observation: The debug build passed the tests, only the optimized one did not.

@pablogsal
Copy link
Member

pablogsal commented May 23, 2023

only the optimized one did not.

That's certainly a bad omen 😓

@hroncok
Copy link
Contributor

hroncok commented May 23, 2023

I was unable to find a coredump, but this is what I get when I run it trough gdb:

Thread 1 "python" received signal SIGABRT, Aborted.
__kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72
72              popl    %ebp                                                                                     
(gdb) bt
#0  __kernel_vsyscall () at arch/x86/entry/vdso/vdso32/system_call.S:72
#1  0xf768f507 in __pthread_kill_implementation (threadid=threadid@entry=4160464640, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at pthread_kill.c:43
#2  0xf768f58f in __pthread_kill_internal (signo=6, threadid=4160464640) at pthread_kill.c:78
#3  0xf763af65 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#4  0xf7622370 in __GI_abort () at abort.c:79
#5  0xf7623422 in __libc_message (fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:150
#6  0xf7728083 in __GI___fortify_fail (msg=0xf77919c1 "stack smashing detected") at fortify_fail.c:24
#7  0xf7728e2f in __stack_chk_fail () at stack_chk_fail.c:24
#8  0xf6adc6a8 in __stack_chk_fail_local ()
   from /builddir/build/BUILD/Python-3.12.0b1/build/optimized/build/lib.linux-i686-3.12/_testinternalcapi.cpython-312-i386-linux-gnu.so
Fatal glibc error: malloc.c:2589 (sysmalloc): assertion failed: (old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)


Fatal signal: Aborted
----- Backtrace -----
0x56728411 ???
0x56879c8d ???
0xf7faf56f ???
0xf7faf559 ???
0xf6e8f506 __pthread_kill_implementation
	/usr/src/debug/glibc-2.37.9000-10.fc39.i386/nptl/pthread_kill.c:43
0xf6e3af64 __GI_raise
	../sysdeps/posix/raise.c:26
0xf6e2236f __GI_abort
	/usr/src/debug/glibc-2.37.9000-10.fc39.i386/stdlib/abort.c:79
0xf6e23421 __libc_message
	../sysdeps/posix/libc_fatal.c:150
0xf6e32e46 __libc_assert_fail
	/usr/src/debug/glibc-2.37.9000-10.fc39.i386/assert/__libc_assert_fail.c:31
0xf6e9d989 sysmalloc
	/usr/src/debug/glibc-2.37.9000-10.fc39.i386/malloc/malloc.c:2589
0xf6e9e7d5 _int_malloc
	/usr/src/debug/glibc-2.37.9000-10.fc39.i386/malloc/malloc.c:4462
0xf6e9f2f1 __GI___libc_malloc
	/usr/src/debug/glibc-2.37.9000-10.fc39.i386/malloc/malloc.c:3324
0x5668d059 ???
0xf6ea123d __GI__obstack_newchunk
	/usr/src/debug/glibc-2.37.9000-10.fc39.i386/malloc/obstack.c:261
0x5684aa2f ???
0x5684fae4 ???
0x568553ac ???
0x5684f7ee ???
0x56852d8c ???
0x5684f7a1 ???
0x56857a99 ???
0x568589b0 ???
0x56b38e45 ???
0x56b559cf ???
0x56b55b3e ???
0x566f3d0c ???
0x56816141 ???
0x567f3964 ???
0x567fa53d ???
0x567fc15f ???
0x568a93f7 ???
0x568a9f1a ???
0x568aa978 ???
0x568ab361 ???
0x568ac316 ???
0x56b2ccc9 ???
0x56765b7b ???
0x56b9e0a9 ???
0x5687a9d8 ???
0x5687c1f0 ???
0x56bc42c9 ???
0x5687a522 ???
0xf7d49dd1 ???
0x5687a1cd ???
0x5687a3bb ???
0x56879b20 ???
0x56e32403 ???
0x56e327eb ???
0x56e3346e ???
0x5697527b ???
0x56977694 ???
0x56628f19 ???
0xf6e23d88 __libc_start_call_main
	../sysdeps/nptl/libc_start_call_main.h:58
0xf6e23e4b __libc_start_main_impl
	../csu/libc-start.c:360
0x5663487a ???
0xffffffff ???
---------------------
A fatal error internal to GDB has been detected, further
debugging is not possible.  GDB will now terminate.

This is a bug, please report it.  For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.

Aborted (core dumped)

Apparently, I am really bad at this :D

@gsallam
Copy link
Contributor

gsallam commented May 23, 2023

I am currently trying to figure out how to reproduce this issue locally and will investigate it further.

@carljm
Copy link
Member Author

carljm commented May 23, 2023

@hroncok Can you give any more details about how you built Python on an x86_64 machine to repro this issue?

@hroncok
Copy link
Contributor

hroncok commented May 23, 2023

I am building the package without running the tests in https://copr.fedorainfracloud.org/coprs/churchyard/python3.12b1/builds/ -- that should hopefully make it easier to install it in a Fedora docker container and reproduce the issue there.


I've built the package using mock on Fedora:

git clone https://src.fedoraproject.org/forks/thrnciar/rpms/python3.12.git
cd python3.12
git switch update-to-3.12.0b1
fedpkg --release rawhide srpm
mock -r fedora-rawhide-i386 init
mock -r fedora-rawhide-i386 -nN python3.12-3.12.0~b1-1.fc39.src.rpm
...fails...
mock -r fedora-rawhide-i386 shell
...gets me in..., the build is in build/BUILD/Python-3.12.0b1/build/optimized/

Alternatively, the SRPM can be obtained from https://src.fedoraproject.org/rpms/python3.12/pull-request/40 Fedora CI scratch build result, currently https://kojipkgs.fedoraproject.org//work/tasks/3880/101493880/python3.12-3.12.0~b1-1.fc39.src.rpm and just use the mock commands.

@hroncok
Copy link
Contributor

hroncok commented May 23, 2023

$ podman run --rm -ti fedora:rawhide /usr/bin/bash
[root@0860d4a9ce57 /]# curl https://copr.fedorainfracloud.org/coprs/churchyard/python3.12b1/repo/fedora-rawhide/churchyard-python3.12b1-fedora-rawhide.repo | sed 's/$basearch/i386/g' > /etc/yum.repos.d/churchyard-python3.12b1-fedora-rawhide.repo
...
[root@0860d4a9ce57 /]# dnf -y --nodocs --setopt=install_weak_deps=False install python3.12{,-{test,debuginfo,debugsource,libs}}.i686
...
[root@0860d4a9ce57 /]# python3.12 -m test.regrtest test_perfmaps
0:00:00 load avg: 0.81 Run tests sequentially
0:00:00 load avg: 0.81 [1/1] test_perfmaps
*** stack smashing detected ***: terminated
Fatal Python error: Aborted

Current thread 0xf7f66700 (most recent call first):
  File "/usr/lib/python3.12/test/test_perfmaps.py", line 13 in test_write_perf_map_entry
  File "/usr/lib/python3.12/unittest/case.py", line 589 in _callTestMethod
  File "/usr/lib/python3.12/unittest/case.py", line 634 in run
  File "/usr/lib/python3.12/unittest/case.py", line 690 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/unittest/suite.py", line 122 in run
  File "/usr/lib/python3.12/unittest/suite.py", line 84 in __call__
  File "/usr/lib/python3.12/test/support/testresult.py", line 143 in run
  File "/usr/lib/python3.12/test/support/__init__.py", line 1115 in _run_suite
  File "/usr/lib/python3.12/test/support/__init__.py", line 1241 in run_unittest
  File "/usr/lib/python3.12/test/libregrtest/runtest.py", line 294 in _test_module
  File "/usr/lib/python3.12/test/libregrtest/runtest.py", line 330 in _runtest_inner2
  File "/usr/lib/python3.12/test/libregrtest/runtest.py", line 373 in _runtest_inner
  File "/usr/lib/python3.12/test/libregrtest/runtest.py", line 248 in _runtest
  File "/usr/lib/python3.12/test/libregrtest/runtest.py", line 278 in runtest
  File "/usr/lib/python3.12/test/libregrtest/main.py", line 483 in run_tests_sequential
  File "/usr/lib/python3.12/test/libregrtest/main.py", line 601 in run_tests
  File "/usr/lib/python3.12/test/libregrtest/main.py", line 779 in _main
  File "/usr/lib/python3.12/test/libregrtest/main.py", line 738 in main
  File "/usr/lib/python3.12/test/libregrtest/main.py", line 802 in main
  File "/usr/lib/python3.12/test/regrtest.py", line 43 in _main
  File "/usr/lib/python3.12/test/regrtest.py", line 47 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

Extension modules: _testcapi, _testinternalcapi (total: 2)
Aborted (core dumped)

@hroncok
Copy link
Contributor

hroncok commented May 23, 2023

Also, in the same container:

[root@0860d4a9ce57 /]# dnf -y --nodocs --setopt=install_weak_deps=False --enablerepo=rawhide-debuginfo install gdb bzip2-debugsource.i686 bzip2-libs-debuginfo.i686 gcc-debugsource.i686 glibc-debuginfo.i686 glibc-debugsource.i686 libb2-debuginfo.i686 libb2-debugsource.i686 libffi-debuginfo.i686 libffi-debugsource.i686 libgcc-debuginfo.i686 libgomp-debuginfo.i686 openssl-debugsource.i686 openssl-libs-debuginfo.i686 xz-debugsource.i686 xz-libs-debuginfo.i686 zlib-debuginfo.i686 zlib-debugsource.i686
...
[root@0860d4a9ce57 /]# gdb python3.12
...
(gdb) run -m test.regrtest test_perfmaps
Starting program: /usr/bin/python3.12 -m test.regrtest test_perfmaps
warning: Error disabling address space randomization: Function not implemented
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0:00:00 load avg: 1.03 Run tests sequentially
0:00:00 load avg: 1.03 [1/1] test_perfmaps
*** stack smashing detected ***: terminated

Program received signal SIGABRT, Aborted.
0xf7f22559 in __kernel_vsyscall ()
(gdb) bt
#0  0xf7f22559 in __kernel_vsyscall ()
#1  0xf748f507 in __pthread_kill_implementation (threadid=threadid@entry=4158932736, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:43
#2  0xf748f58f in __pthread_kill_internal (signo=6, threadid=4158932736) at pthread_kill.c:78
#3  0xf743af65 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
#4  0xf7422370 in __GI_abort () at abort.c:79
#5  0xf7423422 in __libc_message (fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:150
#6  0xf7528083 in __GI___fortify_fail (msg=0xf75919c1 "stack smashing detected") at fortify_fail.c:24
#7  0xf7528e2f in __stack_chk_fail () at stack_chk_fail.c:24
#8  0xf6ca66a8 in __stack_chk_fail_local () from /usr/lib/python3.12/lib-dynload/_testinternalcapi.cpython-312-i386-linux-gnu.so
#9  0xf6ca4d68 in write_perf_map_entry (self=<module at remote 0xf6d3f2d0>, args=(4660, 5678, 'entry1')) at /usr/src/debug/python3.12-3.12.0~b1-1.fc39.i386/Modules/_testinternalcapi.c:778
#10 0xf7975add in cfunction_call (func=<built-in method write_perf_map_entry of module object at remote 0xf6d3f2d0>, args=(4660, 5678, 'entry1'), kwargs=0x0)
    at /usr/src/debug/python3.12-3.12.0~b1-1.fc39.i386/Objects/methodobject.c:548
#11 0xf794fe9e in _PyObject_MakeTpCall (tstate=0xf7d4ee70 <_PyRuntime+236880>, callable=<built-in method write_perf_map_entry of module object at remote 0xf6d3f2d0>, args=0xf7d688f8, nargs=3, keywords=0x0)
    at /usr/src/debug/python3.12-3.12.0~b1-1.fc39.i386/Objects/call.c:240
#12 0xf796e9ae in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2147483651, args=0xf7d688f8, callable=<built-in method write_perf_map_entry of module object at remote 0xf6d3f2d0>, tstate=0xf7d4ee70 <_PyRuntime+236880>)
    at /usr/src/debug/python3.12-3.12.0~b1-1.fc39.i386/Include/internal/pycore_call.h:90
... rest is attached ...

@pablogsal
Copy link
Member

pablogsal commented May 23, 2023

I may know what's going on. We are parsing the code address using the "K" format specified into a void* The "K" format specifier parses the python integer into a unsigned long long that's always 8 bytes but the size of void * is 4 bytes on 32 bits so in 32 bit machines we are writing pass the size of the variable, hence triggering the stack protection.

@carljm
Copy link
Member Author

carljm commented May 23, 2023

Heh, yes, I just reached the same conclusion as @pablogsal and came here to report it :) I will work on a PR (unless @gsallam gets there first.)

@pablogsal
Copy link
Member

eh, yes, I just reached the same conclusion as @pablogsal and came here to report it :)

Ah but I am still the fastest gun in the west 🔫🤠

@carljm
Copy link
Member Author

carljm commented May 23, 2023

@hroncok I don't have a repro environment set up, but I expect #104811 to fix it. Can you verify?

miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 23, 2023
…ss (pythonGH-104811)

(cherry picked from commit e0b3078)

Co-authored-by: Carl Meyer <carl@oddbird.net>
carljm added a commit that referenced this issue May 23, 2023
…ess (GH-104811) (#104823)

gh-103295: fix stack overwrite on 32-bit in perf map test harness (GH-104811)
(cherry picked from commit e0b3078)

Co-authored-by: Carl Meyer <carl@oddbird.net>
@carljm carljm closed this as completed May 23, 2023
hrnciar added a commit to fedora-python/cpython that referenced this issue May 24, 2023
…GH-104811) (python#104823)

pythongh-103295: fix stack overwrite on 32-bit in perf map test harness (pythonGH-104811)
(cherry picked from commit e0b3078)

Co-authored-by: Carl Meyer <carl@oddbird.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-feature A feature request or enhancement
Projects
None yet
5 participants