-
-
Notifications
You must be signed in to change notification settings - Fork 29.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyOS_InterruptOccurred() now requires to hold the GIL: PyOS_Readline() crash #85003
Comments
$ gdb ./python
...
(gdb) r
...
Python 3.9.0b1 (tags/v3.9.0b1:97fe9cfd9f8, May 30 2020, 05:26:48)
...
>>> import io
>>> io.FileIO(0).name
0
>>>
Program received signal SIGSEGV, Segmentation fault.
_PyInterpreterState_GET () at ./Include/internal/pycore_pystate.h:100
100 return tstate->interp;
(gdb) bt
#0 _PyInterpreterState_GET () at ./Include/internal/pycore_pystate.h:100
#1 PyOS_InterruptOccurred () at ./Modules/signalmodule.c:1785
#2 0x0000000000673905 in my_fgets (buf=buf@entry=0xa40780 "8LJ\367\377\177", len=len@entry=100, fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>) at Parser/myreadline.c:87
#3 0x000000000067397b in PyOS_StdioReadline (sys_stdin=sys_stdin@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, sys_stdout=sys_stdout@entry=0x7ffff74a5620 <_IO_2_1_stdout_>,
prompt=prompt@entry=0x7ffff6f8d8e0 ">>> ") at Parser/myreadline.c:269
#4 0x0000000000673b4f in PyOS_Readline (sys_stdin=0x7ffff74a48e0 <_IO_2_1_stdin_>, sys_stdout=0x7ffff74a5620 <_IO_2_1_stdout_>, prompt=0x7ffff6f8d8e0 ">>> ") at Parser/myreadline.c:355
#5 0x00000000005d90d9 in tok_nextc (tok=0xa3fd30) at Parser/tokenizer.c:856
#6 0x00000000005dad02 in tok_get (tok=tok@entry=0xa3fd30, p_start=p_start@entry=0x7fffffffd508, p_end=p_end@entry=0x7fffffffd510) at Parser/tokenizer.c:1194
#7 0x00000000005dcb69 in PyTokenizer_Get (tok=0xa3fd30, p_start=p_start@entry=0x7fffffffd508, p_end=p_end@entry=0x7fffffffd510) at Parser/tokenizer.c:1842
#8 0x0000000000653c73 in _PyPegen_fill_token (p=0x7ffff6f8f540) at Parser/pegen/pegen.c:551
#9 0x0000000000670355 in statement_newline_rule (p=0x7ffff6f8f540) at Parser/pegen/parse.c:1143
#10 interactive_rule (p=0x7ffff6f8f540) at Parser/pegen/parse.c:725
#11 _PyPegen_parse (p=p@entry=0x7ffff6f8f540) at Parser/pegen/parse.c:19388
#12 0x0000000000654b32 in _PyPegen_run_parser (p=0x7ffff6f8f540) at Parser/pegen/pegen.c:1037
#13 0x0000000000654e4c in _PyPegen_run_parser_from_file_pointer (fp=fp@entry=0x70f74a48e0, start_rule=start_rule@entry=80, filename_ob=filename_ob@entry=0x7ffff6f84eb0,
enc=enc@entry=0x7ffff704ec60 "utf-8", ps1=ps1@entry=0x7ffff6f8d8e0 ">>> ", ps2=ps2@entry=0x90000000d0 <error: Cannot access memory at address 0x90000000d0>, flags=0x7fffffffd7b8,
errcode=0x7fffffffd6e4, arena=0x7ffff6ff6b30) at Parser/pegen/pegen.c:1097
#14 0x00000000005d6bea in PyPegen_ASTFromFileObject (fp=0x70f74a48e0, fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename_ob=filename_ob@entry=0x7ffff6f84eb0, mode=80, mode@entry=256,
enc=enc@entry=0x7ffff704ec60 "utf-8", ps1=ps1@entry=0x7ffff6f8d8e0 ">>> ", ps2=0x90000000d0 <error: Cannot access memory at address 0x90000000d0>, ps2@entry=0x7ffff6f8dbe0 "... ",
flags=<optimized out>, errcode=<optimized out>, arena=<optimized out>) at Parser/pegen/peg_api.c:52
#15 0x00000000005460d9 in PyRun_InteractiveOneObjectEx (fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff6f84eb0, flags=flags@entry=0x7fffffffd7b8) at Python/pythonrun.c:243
#16 0x000000000054631e in PyRun_InteractiveLoopFlags (fp=fp@entry=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename_str=filename_str@entry=0x673e32 "<stdin>", flags=flags@entry=0x7fffffffd7b8)
at Python/pythonrun.c:122
#17 0x0000000000546d4c in PyRun_AnyFileExFlags (fp=0x7ffff74a48e0 <_IO_2_1_stdin_>, filename=0x673e32 "<stdin>", closeit=0, flags=0x7fffffffd7b8) at Python/pythonrun.c:81
#18 0x0000000000429fb7 in pymain_run_stdin (cf=0x7fffffffd7b8, config=0x9bd800) at Modules/main.c:467
#19 pymain_run_python (exitcode=0x7fffffffd7b0) at Modules/main.c:556
#20 Py_RunMain () at Modules/main.c:632
#21 0x000000000042a2d6 in pymain_main (args=0x7fffffffd8a0) at Modules/main.c:662
#22 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:686
#23 0x00007ffff7100830 in __libc_start_main (main=0x41ef30 <main>, argc=1, argv=0x7fffffffd9f8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffd9e8)
at ../csu/libc-start.c:291
#24 0x0000000000429089 in _start ()
(gdb) Same happens with Python 3.9.0b1+ (heads/3.9:588efc29c5d, May 30 2020, 14:16:10) (current HEAD of the 3.9 branch). In previous versions of Python this would exit the interpreter but not segfault. |
Thanks for report, I can reproduce it |
Simpler reproducer: import os
os.close(0) |
FYI this change fix this issue. --- a/Modules/signalmodule.c
+++ b/Modules/signalmodule.c
@@ -1782,7 +1782,11 @@ PyOS_FiniInterrupts(void)
int
PyOS_InterruptOccurred(void)
{
- PyInterpreterState *interp = _PyInterpreterState_GET();
+ PyThreadState *tstate = _PyThreadState_GET();
+ if (!tstate) {
+ return 0;
+ }
+ PyInterpreterState *interp = tstate->interp;
if (!_Py_ThreadCanHandleSignals(interp)) {
return 0;
} |
The following change modified PyOS_InterruptOccurred() to require the hold the GIL: commit d831688
-- PyOS_InterruptOccurred() is part of the limited C API, but it's not even documented :-( So it's not easy to say if it's a backward incompatible change. At least, as the author of the change, I can say that the change was deliberate. bpo-40010 rationale is that only the main thread of the main interpreter must call Python signal handlers. So when another thread or another interpreter (running the main thread) asks "should the Python signal handlers be executed", the answer is always "no", even if yes, Python got a signal (which requires to execute at least one Python signal handler). The change is that PyOS_InterruptOccurred() now requires to hold the GIL to get current Python thread state: _PyThreadState_GET() returns NULL when the GIL is released. Modifying PyOS_InterruptOccurred() to always return 0 if the GIL is released (if _PyThreadState_GET() returns NULL) is wrong: if the function is called by the main thread of the main interpreter, it must return 1. -- By the way, I rewrote the C signal handler in Python 3.9 to not require to get the current Python thread state. In Python 3.8, it was modified and started to require the current Python thread if writing into the wakeup file descriptor failed to schedule a "pending call". commit b54a99d
|
This change impacted gdb:
gdbpy_check_quit_flag() called PyOS_InterruptOccurred() without holding the GIL. It worked in Python 3.8, but started to crash in Python 3.9 beta1. gdb had another issue, it started by always releasing the GIL and always called the Python C API with the GIL released (if I understood correctly). gdb was fixed by calling PyGILState_Ensure()/PyGILState_Release() when calling the Python C API, especially PyOS_InterruptOccurred(). Maybe the minimum would be to *document* that the GIL must be held to call PyOS_InterruptOccurred(), and that it's not a new requirements. |
I can reproduce PyOS_InterruptOccurred() crash in Python 3.8 if I remove readline.cpython-38d-x86_64-linux-gnu.so and I disable EINTR error checking in my_fgets(): diff --git a/Parser/myreadline.c b/Parser/myreadline.c
index 43e5583b8b..2712dedacd 100644
--- a/Parser/myreadline.c
+++ b/Parser/myreadline.c
@@ -73,7 +73,7 @@ my_fgets(char *buf, int len, FILE *fp)
clearerr(fp);
return -1; /* EOF */
}
-#ifdef EINTR
+#if 0
if (err == EINTR) {
int s;
PyEval_RestoreThread(_PyOS_ReadlineTState); vstinner@apu$ ./python
Python 3.8.3+ (heads/3.8-dirty:00a240bf7f, Jun 1 2020, 17:00:22)
[GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
^C
Erreur de segmentation (core dumped) I cannot reproduce the issue in Python 3.7. --- Python 3.7: int
PyOS_InterruptOccurred(void)
{
if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
if (PyThread_get_thread_ident() != main_thread)
return 0;
_Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
return 1;
}
return 0;
} Python 3.8: static int
is_main(_PyRuntimeState *runtime)
{
unsigned long thread = PyThread_get_thread_ident();
PyInterpreterState *interp = _PyRuntimeState_GetThreadState(runtime)->interp;
return (thread == runtime->main_thread
&& interp == runtime->interpreters.main);
}
int
PyOS_InterruptOccurred(void)
{
if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
_PyRuntimeState *runtime = &_PyRuntime;
if (!is_main(runtime)) {
return 0;
}
_Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
return 1;
}
return 0;
} is_main() function was added in Python 3.8 by: commit 64d6cc8
|
It seems like this issue was introduced by this change: commit 7281898
Include/internal/pycore_pyerrors.h | 2 + Before: static inline int
_Py_IsMainInterpreter(PyThreadState* tstate)
{
/* Use directly _PyRuntime rather than tstate->interp->runtime, since
this function is used in performance critical code path (ceval) */
return (tstate->interp == _PyRuntime.interpreters.main);
}
static inline int
_Py_ThreadCanHandleSignals(PyThreadState *tstate)
{
return (_Py_IsMainThread() && _Py_IsMainInterpreter(tstate));
}
static int
thread_can_handle_signals(void)
{
PyThreadState *tstate = _PyThreadState_GET();
return _Py_ThreadCanHandleSignals(tstate);
}
int
PyOS_InterruptOccurred(void)
{
if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
if (!thread_can_handle_signals()) {
return 0;
}
_Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
return 1;
}
return 0;
}
---
After:
---
static inline int
_Py_IsMainInterpreter(PyThreadState* tstate)
{
/* Use directly _PyRuntime rather than tstate->interp->runtime, since
this function is used in performance critical code path (ceval) */
return (tstate->interp == _PyRuntime.interpreters.main);
}
static inline int
_Py_ThreadCanHandleSignals(PyThreadState *tstate)
{
return (_Py_IsMainThread() && _Py_IsMainInterpreter(tstate));
}
int
PyOS_InterruptOccurred(void)
{
PyThreadState *tstate = _PyThreadState_GET();
if (!_Py_ThreadCanHandleSignals(tstate)) {
return 0;
}
if (!_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
return 0;
}
_Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
return 1;
} The difference is that tstate->interp is now checked *before* checking if SIGINT signal is tripped. |
So PyOS_InterruptOccurred() must be called with the GIL held since 3.8, it's just that the nobody noticed the bug before. If SIGGINT is tripped and the GIL is released, PyOS_InterruptOccurred() does also crash in Python 3.8. -- One way to see the bug in Python 3.8 without having to trip SIGINT. Apply this patch: diff --git a/Modules/signalmodule.c b/Modules/signalmodule.c
index 0c9a2671fe..b850af3163 100644
--- a/Modules/signalmodule.c
+++ b/Modules/signalmodule.c
@@ -1729,11 +1729,12 @@ PyOS_FiniInterrupts(void)
int
PyOS_InterruptOccurred(void)
{
+ _PyRuntimeState *runtime = &_PyRuntime;
+ if (!is_main(runtime)) {
+ return 0;
+ }
+
if (_Py_atomic_load_relaxed(&Handlers[SIGINT].tripped)) {
- _PyRuntimeState *runtime = &_PyRuntime;
- if (!is_main(runtime)) {
- return 0;
- }
_Py_atomic_store_relaxed(&Handlers[SIGINT].tripped, 0);
return 1;
} $ make
$ find -name "*readline*so" -delete
$ ./python
Python 3.8.3+ (heads/3.8-dirty:00a240bf7f, Jun 1 2020, 17:24:09)
[GCC 10.1.1 20200507 (Red Hat 10.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os; os.close(0)
>>>
Erreur de segmentation (core dumped) Reproduce the bug in Python 3.8 without modifying the code, using gdb to trigger events SIGINT at the right place: $ gdb ./python
(gdb) b my_fgets
Breakpoint 1 at 0x68b941: file Parser/myreadline.c, line 39. (gdb) run
Breakpoint 1, my_fgets (...) (gdb) delete 1 (gdb) signal SIGINT Program received signal SIGSEGV, Segmentation fault. |
I add "Python 3.8" version: it's also affected. |
See also bpo-40839: Disallow calling PyDict_GetItem() with the GIL released. |
Ok, I fixed bugs in the REPL on Linux and Windows in 3.8, 3.9 and master branches. I also modified PyOS_InterruptOccurred() to fail with a fatal error and a nice error message, rather than crashing with a core dump, if it's called with the GIL released. I changed in 3.9 and master branches. Thanks Jelle Zijlstra for the bug report! I'm now closing the issue. |
I haven't had a chance to look too deeply, but the final set of commits (starting with fa7ab6a) appear to be the common thread with all branches now failing with timeout exceptions in test_repl on the Windows 10 buildbot. The first instance was in the 3.x branch (https://buildbot.python.org/all/#/builders/129/builds/1191) but they all seem similar. -- David |
It appears the recent commit is causing a CRT exception dialog in test_close_stdin (test_repl.TestInteractiveInterpreter). The dialog can't get answered, which is what leads to the eventual timeout. The assertion is "_osfile(fh) & FOPEN" from osfinfo.cpp on line 258. It actually appears twice (if the first is ignored). If both are ignored the test passes. I was somewhat surprised to see the dialog, as I believe CRT exception dialogs are disabled during test runs (others are present in test logs). Perhaps the child interpreter used by test_repl doesn't inherit that behavior. If so, this was probably always a risk, there just weren't any assertions occurring before this in the child interpreter. |
The OS error mode is inherited, unless the child is created with CREATE_DEFAULT_ERROR_MODE flag. But the CRT error mode isn't inherited. (For the spawn* family, in principle the CRT could pass its error modes in reserved STARTUPINFO fields, like it does for file descriptors, but it doesn't do that, and subprocess wouldn't be privy to that protocol anyway.) For example: >>> import sys, subprocess
>>> from msvcrt import *
>>> SEM_FAILCRITICALERRORS, CRTDBG_MODE_FILE, CRTDBG_MODE_WNDW
(1, 1, 4)
>>> SetErrorMode(SEM_FAILCRITICALERRORS)
0
>>> CrtSetReportMode(CRT_ERROR, CRTDBG_MODE_FILE)
4 The return values are the previous values, which are respectively 0 (default) for the OS and CRTDBG_MODE_WNDW (message box) for the CRT. Now check the initial values in a child process:
>>> from msvcrt import *
>>> SetErrorMode(SEM_FAILCRITICALERRORS)
1
>>> CrtSetReportMode(CRT_ERROR, CRTDBG_MODE_FILE)
4 The OS error mode of the parent was inherited, but the parent's CRT error mode was not. libregrtest has a convenient function to suppress error reporting in the child. For example:
>>> import os
>>> from test.libregrtest import setup
>>> setup.suppress_msvcrt_asserts(0)
>>> os.close(0)
>>>
0 |
I am confused. Code which uses the closed file descriptor 0 uses "_Py_BEGIN_SUPPRESS_IPH":
and
Macros defined as: extern _invalid_parameter_handler _Py_silent_invalid_parameter_handler;
#define _Py_BEGIN_SUPPRESS_IPH { _invalid_parameter_handler _Py_old_handler = \
_set_thread_local_invalid_parameter_handler(_Py_silent_invalid_parameter_handler);
#define _Py_END_SUPPRESS_IPH _set_thread_local_invalid_parameter_handler(_Py_old_handler); } with PC/invalid_parameter_handler.c: static void __cdecl _silent_invalid_parameter_handler( _invalid_parameter_handler _Py_silent_invalid_parameter_handler = _silent_invalid_parameter_handler; The purpose of _Py_BEGIN_SUPPRESS_IPH is to suppress such popup, no? |
That macro suppresses the CRT's invalid parameter handler, which by default terminates abnormally with a fastfail (status code 0xC0000409), even in a release build. In a debug build, we still have message boxes from failed asserts that call _CrtDbgReportW. For example, the following stack trace shows _CrtDbgReportW called from _get_osfhandle:
_get_osfhandle validates that the fd is open via
which uses the following macro: #define _VALIDATE_CLEAR_OSSERR_RETURN(expr, errorcode, retexpr) \
{ \
int _Expr_val=!!(expr); \
_ASSERT_EXPR((_Expr_val), _CRT_WIDE(#expr)); \
if (!(_Expr_val)) \
{ \
_doserrno = 0L; \
errno = errorcode; \
_INVALID_PARAMETER(_CRT_WIDE(#expr)); \
return retexpr; \
} \
} In a debug build, _ASSERT_EXPR expands as: #define _ASSERT_EXPR(expr, msg) \
(void)( \
(!!(expr)) || \
(1 != _CrtDbgReportW(_CRT_ASSERT, _CRT_WIDE(__FILE__), __LINE__, NULL, L"%ls", msg)) || \
(_CrtDbgBreak(), 0) \
) which is where _CrtDbgReportW gets called. By default it reports the assertion failure with a message box. The suppression of this in libregrtest either ignores this report or sends it to stderr: for m in [msvcrt.CRT_WARN, msvcrt.CRT_ERROR, msvcrt.CRT_ASSERT]:
if verbose:
msvcrt.CrtSetReportMode(m, msvcrt.CRTDBG_MODE_FILE)
msvcrt.CrtSetReportFile(m, msvcrt.CRTDBG_FILE_STDERR)
else:
msvcrt.CrtSetReportMode(m, 0) msvcrt.CrtSetReportMode and msvcrt.CrtSetReportFile only exist in a debug build. |
So a script I use on the buildbot intended to prevent hanging on assertions was failing to work in this particular case, which I've corrected. That changes the failure mode for new Win10 buildbot runs. The test in question (test_close_stdin) still fails, but does so immediately (return code not 0, as the child process aborts) rather than timing out. If the assertion is expected, and something to be ignored, the test itself still needs to handle that. |
Thanks for the bug report David Bolen and thanks for the analysis Eryk Sun. The bug should be be fixed in the master branch, and backports to 3.8 and 3.9 are coming. |
Why doesn't SuppressCrashReport suppress the hard error dialog (i.e. SEM_FAILCRITICALERRORS)? This dialog gets created by the NtRaiseHardError system call. For example:
With the default OS error mode, the following raises a hard error dialog for STATUS_UNSUCCESSFUL (0xC0000001), with abort/retry/ignore options: >>> NtRaiseHardError(0xC000_0001, 0, 0, None, 0, response)
0
>>> response[0]
2 The response value is limited to abort (2), retry (7), ignore (4) -- or simply returned (0) when the process hard error mode is disabled: >>> msvcrt.SetErrorMode(msvcrt.SEM_FAILCRITICALERRORS)
0
>>> NtRaiseHardError(0xC000_0001, 0, 0, None, 0, response)
0
>>> response[0]
0 NtRaiseHardError also checks for an error-mode override flag (0x1000_0000) in the status code, which is used in cases such as WinAPI FatalAppExitW. For example, the following will raise the dialog regardless of the hard error mode: >>> NtRaiseHardError(0xC000_0001 | 0x1000_0000, 0, 0, None, 0, response)
0
>>> response[0]
2 |
Would you mind to open a separated issue? |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: