-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add os.waitstatus_to_exitcode() function #84275
Comments
os.wait() and os.waitpid() returns a "status" number which is not easy to return. It's made of two information: (how the process completed, value). The usual way to handle it is to use a code which looks like: if os.WIFSIGNALED(status):
self.returncode = -os.WTERMSIG(status)
elif os.WIFEXITED(status):
self.returncode = os.WEXITSTATUS(status)
elif os.WIFSTOPPED(status):
self.returncode = -os.WSTOPSIG(status)
else:
raise Exception("... put your favorite error message here ...") It's not convenient to have to duplicate this code each time we have to handle a wait status. Moreover, WIFSTOPPED() is commonly treated as "the process was killed by a signal", whereas the process is still alive but was only stopped. WIFSTOPPED() should only happen when the process is traced (by ptrace), or if waitpid() was called with WUNTRACED option. The common case is not to trace a process or to use WUNTRACED. Moreover, if WIFSTOPPED() is true, the process is still alive and can continue its execution. It's bad to consider it as completed. The subprocess module has such bug: Popen._handle_exitstatus() returns -os.WSTOPSIG(sts) if os.WIFSTOPPED(sts) is true. On the other side, the pure Python implementation os._spawnvef() calls again waitpid() if WIFSTOPPED() is true. That sounds like a better behavior. while 1:
wpid, sts = waitpid(pid, 0)
if WIFSTOPPED(sts):
continue
But I'm not sure how WIFSTOPPED() can be true, since this function creates a child process using os.fork() and it doesn't use os.WUNTRACED flag. I propose to add a private os._wait_status_to_returncode(status) helper function: Convert a wait() or waitpid() status to a returncode. If WIFEXITED(status) is true, return WEXITSTATUS(status). If the process is being traced or if waitpid() was called with WUNTRACED I'm not sure if it's a good idea to add the helper as a private function. Someone may discover it and starts to use it. If we decide to make it public tomorrow, removing os._wait_status_to_returncode() would break code. Maybe it's better to directly a public function? But I'm not sure if it's useful, nor if the function name is good, nor if good to helper an function function directly in the os module. Maybe such helper should be added to shutil instead which is more the "high-level" flavor of the os module? I chose to add it to the os module for different reasons:
What do you think?
|
Hum, I changed my mind and I think that it's worth it to make the function public. Moreover, I prefer "exitcode", since it is closer to "WEXITSTATUS" name than "returncode". So I renamed the function os.status_to_exitcode(). Advantages of the new new function compared to exiting code:
|
I modified my PR to add Windows support. On Windows, os.waitpid() status also requires an operation (shif right by 8 bits) to get an exitcode from the waitpid status. So IMO it's worth it to add it to Windows as well, which makes the function even more useful ;-) |
Interesting information about process "exit status code": |
The function can be used to convert the result of the following functions:
Note: waitid() has a different API, si_status can be used directly, its meaning depends on si_code. -- First, I proposed os._wait_status_to_returncode() name. I renamed it to os.status_to_exitcode(). Maybe the "status" term used alone is too general. I'm not sure neither if it's correct to write "exitcode" a single word. For example, Windows uses GetExitCodeProcess() name: "Exit Code" means that they are two separated words. The Python os documentation uses "exit code" and "exit status", but never "exitcode" or "exitstatus". The system() function manual page says: """ Python pty.spawn() documentation says: "(...) returns the status value from os.waitpid() on the child process". -- In my current PR, the function documentation is: "Convert an exit status to an exit code" Other name ideas:
The Wikipedia article is called "Exit status" and then tells about "exit status code" :-) So it can be surprising to have a function supposed to convert an "exit status" into an "exit code". It seems like "Convert a waitpid() wait status to an exit code" is the best documentation and so that wait_status_to_exit_code() is the most correct and least confusing name, even if it's longer than other name candidates. |
Well, anothe option is:
While the documentation uses "exit code", the code commonly uses "exitcode" or "returncode". Moreover, in the os module, underscore is not used to separated words in function names. Examples: "getenv" not "get_env", "setpriority" not "set_priority", etc. Using "_to_" in the function name reduces the risk of conflict with a future addition to the libc. It's rare that libc function names use "_". One of the few exception: get_current_dir_name() which is a glibc extension. |
Ok, I chose os.waitstatus_to_exitcode() name. I updated my PR. |
FWIW, I wouldn't recommend relying on os.waitpid to get the correct process exit status in Windows. Status codes are 32 bits and generally all bits are required. os.waitpid left shifts the exit status by 8 bits in a dubious attempt to return a result that's "more like the POSIX waitpid". In particular, a program may exit abnormally with an NTSTATUS 1 or HRESULT 2 code such as STATUS_DLL_NOT_FOUND (0xC000_0135) or STATUS_CONTROL_C_EXIT (0xC000_013A). |
Eryk:
os.waitpid() calls _cwait() on Windows and uses "status << 8". The result is a Python object. IMHO it's ok if the shifted result ("status") is larger than 32 bits. But I'm not sure that the current os.waitpid() implementation handles integer overflow correctly... When I look at GetExitCodeProcess() documentation, I don't see any distinction between "normal exit" and a program terminated by TerminateProcess(). The only different is the actual exit code: Do you suggest that os.waitstatus_to_exitcode() result should be negative if a process was terminated by TerminateProcess()? -- My PR 19201 is based on the current Python implementation and assumptions used in the current code. But I don't think that what you wrote can be an API issue. It's more the opposite, if tomorrow we want to encode the status of a terminated process differently, it will be easier if os.waitstatus_to_exitcode() is available, no? |
This new status-to-exitcode function applies to Windows waitpid() only due to a design choice in CPython -- not the operating system. The current waitpid() implementation assumes it's okay to discard the upper 8 bits of the exit status, which can lose important information. Maybe it's best to address Windows support in a new issue that also addresses the design of waitpid() in 3.9, in particular if this would change the design of the new function -- at the very least with regard to data type (e.g. Off topic: Despite the function name, waitpid in Windows takes a process handle, such as is returned by os.spawn*, and not a process ID, such as is required by os.kill. The current documentation is sometimes clear on this detail but sometimes confusingly mixes up "handle" and "id" in Windows-only sections.
The overflow problem could be addressed by using a 64-bit value for the status in os_waitpid_impl and elsewhere.
Returning a signed result is an interesting suggestion. The native process exit status is actually an NTSTATUS value, and NTSTATUS and HRESULT codes are signed, with failure codes (i.e. errors and warnings) reported as negative values. That said, the exit status gets handled as an unsigned value in the Windows API, e.g. ExitProcess, TerminateProcess, and GetExitCodeProcess.
In almost all cases a process terminates via TerminateProcess -- or rather the internal NTAPI function NtTerminateProcess. For a clean exit via ExitProcess (i.e. native RtlExitUserProcess), NtTerminateProcess gets called twice. The first time it gets called specially (with the process handle passed as NULL) in order to forcefully terminate all other threads in the process. Once the thread that calls ExitProcess is the last remaining thread, the loader shuts down the user-mode aspects of the process (e.g. it calls DLL entry points for process detach). Finally, the last thread makes a regular NtTerminateProcess call (with the current-process handle instead of NULL), which actually terminates the process. An abnormal termination just does the latter step, but it doesn't necessarily use an exit status value that clearly indicates an abnormal termination. Thus not all abnormal terminations can be identified as such. Also, nothing stops a normal termination via ExitProcess from using an NTSTATUS code. For example, the default control handler for a console process exits via ExitProcess with the status code STATUS_CONTROL_C_EXIT (0xC000_013A). This is similar to an abnormal exit, since the process is killed by the closest thing to a Unix 'signal' that Windows console applications support. Moreover, the same status code is used for a genuinely abnormal exit due to a Ctrl+Close event (i.e. the console window was closed) if the session server is forced to terminate a console process that doesn't exit gracefully in the allotted time (default 5 seconds). |
Eryk:
That's a bug which is independent of this issue.
Ok, so the current os.waitstatus_to_exitcode() design is fine. On Windows, we can just consider all exit code as a "normal" process exit code. And there is no need to modify os.waitpid() to return a negative value for values larger than (INT_MAX >> 8). We should "just" fix os.waitstatus_to_exitcode() to accept any Python integer and simply compute "x >> 8", whereas currently the argument is casted to a C int. I propose to fix os.waitpid() and os.waitstatus_to_exitcode() for "large" exit code on Windows in a second time. |
sys.exit() accepts negative number and values larger than 255. I checked with strace: Python calls Linux exit_group() syscall with the value passed to sys.exit(). But then os.waitid() (waitid, not waitpid!) returns the lower 8-bits of the exit code. In fact, the exit_group() syscall truncates the exit status: So on Linux, an exit code is always in the range [0; 255]. For example, exit_group(-1) syscall gives an exit code of 255. |
TODO:
Eryk Sun:
I created bpo-40138 "Windows implementation of os.waitpid() truncates the exit status (status << 8)". |
See also: "Appendix E. Exit Codes With Special Meanings" section of the Bash documentation |
This code path was added by bpo-29335. |
See also bpo-40155: "AIX: test_builtin.test_input_no_stdout_fileno() hangs". |
I created bpo-40364 for that. |
"""
Well, let's keep the status quo: leave os and subprocess modules unchanged. It can be revisited later if needed. My intent when I created this issue wasn't to change the behavior of other modules, just to add a new function to remove duplicated code ;-) |
The initial issue has been implemented: I added os.waitstatus_to_exitcode() function to Python 3.9. It's now well documented, I close the issue. See sub-issues like bpo-40364 (asyncio) for further cleanups. |
The internal exitcode handling of a process changed since python 3.8 (python/cpython#84275) which casued that when executing a pipeline through the UI the exit code from the process is 1 even when using sys.exit(0). When we use the same multiprocessing context this seems to be handled properly in python and we get the correct exit code.
The internal exitcode handling of a process changed since python 3.8 (python/cpython#84275) which caused that when executing a pipeline through the UI the exit code from the process is 1 even when using sys.exit(0). When we use the same multiprocessing context this seems to be handled properly in python and we get the correct exit code.
* fix exitcode of process issue since python 3.8 The internal exitcode handling of a process changed since python 3.8 (python/cpython#84275) which caused that when executing a pipeline through the UI the exit code from the process is 1 even when using sys.exit(0). When we use the same multiprocessing context this seems to be handled properly in python and we get the correct exit code. * typo * fix web UI execution, exitcode still wrong The exitcode is sometimes still zero, even I use the multiprocessing context. To get around that issue, we do not interpret the exitcode anymore but rely 100% on the status_queue: The status queue must return True, otherwise it is counted as unsuccessful. So, in the 'run' method the succeeded status should only be passed to the queue when we are sure that the task execution succeeded. * fix passing multiprocessing_context * add pipeline failed test
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: