Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subprocess.Popen: Performance regression on Linux since 124af17b6e [CVE-2023-6507] #112334

Closed
vain opened this issue Nov 23, 2023 · 3 comments · Fixed by #112617
Closed

subprocess.Popen: Performance regression on Linux since 124af17b6e [CVE-2023-6507] #112334

vain opened this issue Nov 23, 2023 · 3 comments · Fixed by #112617
Assignees
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes extension-modules C modules in the Modules dir performance Performance or resource usage release-blocker type-bug An unexpected behavior, bug, or error type-security A security issue

Comments

@vain
Copy link

vain commented Nov 23, 2023

Bug report

Bug description:

Apologies if this is a duplicate. I couldn’t find a similar report, though.

The issue and how to reproduce

We’re seeing a performance regression since 124af17. The best way to reproduce it is to spawn lots of processes from a ThreadPoolExecutor. For example:

#!/usr/bin/env python3


from concurrent.futures import ThreadPoolExecutor, wait
from subprocess import Popen


def target(i):
    p = Popen(['touch', f'/tmp/reproc-{i}'])
    p.communicate()


executor = ThreadPoolExecutor(max_workers=64)
futures = set()

for i in range(10_000):
    futures.add(executor.submit(target, i))

wait(futures)

Before, on 49cae39, it’s roughly this:

real    0m2.419s
user    0m4.524s
sys     0m0.976s

Since 124af17, it’s roughly this:

real    0m11.772s
user    0m10.287s
sys     0m14.409s

An attempt at an analysis and possible fix

strace shows that the new code doesn’t use vfork() anymore but clone(). I believe that the reason for this is an incorrect check of num_groups (or extra_group_size, as it is now called on main).

124af17 checks if extra_group_size is less than zero to determine if we can use vfork(). Is that correct? Maybe this should be equal to zero? I’m talking about these two locations (diff relative to main/9e56eedd018e1a46):

diff --git a/Modules/_posixsubprocess.c b/Modules/_posixsubprocess.c
index 2898eedc3e..fb6c235901 100644
--- a/Modules/_posixsubprocess.c
+++ b/Modules/_posixsubprocess.c
@@ -889,7 +889,7 @@ do_fork_exec(char *const exec_array[],
         /* These are checked by our caller; verify them in debug builds. */
         assert(uid == (uid_t)-1);
         assert(gid == (gid_t)-1);
-        assert(extra_group_size < 0);
+        assert(extra_group_size == 0);
         assert(preexec_fn == Py_None);

         /* Drop the GIL so that other threads can continue execution while this
@@ -1208,7 +1208,7 @@ subprocess_fork_exec_impl(PyObject *module, PyObject *process_args,
     /* Use vfork() only if it's safe. See the comment above child_exec(). */
     sigset_t old_sigs;
     if (preexec_fn == Py_None && allow_vfork &&
-        uid == (uid_t)-1 && gid == (gid_t)-1 && extra_group_size < 0) {
+        uid == (uid_t)-1 && gid == (gid_t)-1 && extra_group_size == 0) {
         /* Block all signals to ensure that no signal handlers are run in the
          * child process while it shares memory with us. Note that signals
          * used internally by C libraries won't be blocked by

extra_group_size is the result of the call to PySequence_Size(extra_groups_packed). If I understand the docs correctly, then this function only returns negative values to indicate errors. This error condition is already checked, right after the call itself:

extra_group_size = PySequence_Size(extra_groups_packed);

if (extra_group_size < 0)
    goto cleanup;

Later in the code, extra_group_size can never be less than zero. It can, however, be equal to zero if extra_groups is an empty list. I believe this is what was meant to be checked here.

I’ll happily open a PR for this if you agree that this is the way to go.

CPython versions tested on:

3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Linux

Linked PRs

@vain vain added the type-bug An unexpected behavior, bug, or error label Nov 23, 2023
@vain
Copy link
Author

vain commented Nov 23, 2023

Gentle ping @arhadthedev because you authored the referenced commit 124af17. :)

@AlexWaygood AlexWaygood added performance Performance or resource usage 3.12 bugs and security fixes 3.13 new features, bugs and security fixes extension-modules C modules in the Modules dir labels Nov 23, 2023
@gpshead
Copy link
Member

gpshead commented Nov 23, 2023

Thanks for the bug and deep dive analysis! We'll take a look. (yes this code is a lot more complicated than any of us would like) 😅

@kfrydel
Copy link

kfrydel commented Dec 1, 2023

The change from 124af17 also impacts the performance if there are a lot of imports in the process spawning new processes. For example:

import timeit
import subprocess

# a lot of local imports here

def test():
    popen = subprocess.Popen(["python3.12", "-c", "1+1"])
    popen.communicate()


result = timeit.timeit(test, number=1000)
print(f"Timeit: {result}")

On my PC, it prints ~26s for 3.12 and ~24s for 3.11. When I remove all imports from # a lot of local imports here the time decreases to 22s on Python 3.12. Removing/adding imports does not impact execution time on Python 3.11. strace shows that P3.11 uses vfork, and 3.12 uses clone. And the fix proposed by @vain seems to fix this as well.

gpshead added a commit to gpshead/cpython that referenced this issue Dec 2, 2023
…roups=[]`.

Fixed a performance regression in 3.12's :mod:`subprocess` on Linux where it
would no longer use the fast-path ``vfork()`` system call when it could have
due to a logic bug, instead falling back to the safe but slower ``fork()``.

Also fixed a second 3.12.0 potential security bug.  If a value of
``extra_groups=[]`` was passed to :mod:`subprocess.Popen` or related APIs,
the underlying ``setgroups(0, NULL)`` system call to clear the groups list
would not be made in the child process prior to ``exec()``.

The security issue was identified via code inspection in the process of
fixing the first bug.  Thanks to @vain for the detailed report and
analysis in the initial bug on Github.

 * [ ] A regression test is desirable. I'm pondering a test that runs when
   `strace` is available and permitted which and confirms use of `vfork()`
   vs `clone()`...
gpshead added a commit that referenced this issue Dec 4, 2023
…[]` behavior (#112617)

Restore `subprocess`'s intended use of `vfork()` by default for performance on Linux;
also fixes the behavior of `extra_groups=[]` which was unintentionally broken in 3.12.0:

Fixed a performance regression in 3.12's :mod:`subprocess` on Linux where it
would no longer use the fast-path ``vfork()`` system call when it could have
due to a logic bug, instead falling back to the safe but slower ``fork()``.

Also fixed a security bug introduced in 3.12.0.  If a value of ``extra_groups=[]``
was passed to :mod:`subprocess.Popen` or related APIs, the underlying
``setgroups(0, NULL)`` system call to clear the groups list would not be made
in the child process prior to ``exec()``.

The security issue was identified via code inspection in the process of
fixing the first bug.  Thanks to @vain for the detailed report and
analysis in the initial bug on Github.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Dec 4, 2023
…roups=[]` behavior (pythonGH-112617)

Restore `subprocess`'s intended use of `vfork()` by default for performance on Linux;
also fixes the behavior of `extra_groups=[]` which was unintentionally broken in 3.12.0:

Fixed a performance regression in 3.12's :mod:`subprocess` on Linux where it
would no longer use the fast-path ``vfork()`` system call when it could have
due to a logic bug, instead falling back to the safe but slower ``fork()``.

Also fixed a security bug introduced in 3.12.0.  If a value of ``extra_groups=[]``
was passed to :mod:`subprocess.Popen` or related APIs, the underlying
``setgroups(0, NULL)`` system call to clear the groups list would not be made
in the child process prior to ``exec()``.

The security issue was identified via code inspection in the process of
fixing the first bug.  Thanks to @vain for the detailed report and
analysis in the initial bug on Github.

(cherry picked from commit 9fe7655)

Co-authored-by: Gregory P. Smith <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gpshead added a commit that referenced this issue Dec 4, 2023
…groups=[]` behavior (GH-112617) (#112731)

Restore `subprocess`'s intended use of `vfork()` by default for performance on Linux;
also fixes the behavior of `extra_groups=[]` which was unintentionally broken in 3.12.0:

Fixed a performance regression in 3.12's :mod:`subprocess` on Linux where it
would no longer use the fast-path ``vfork()`` system call when it could have
due to a logic bug, instead falling back to the safe but slower ``fork()``.

Also fixed a security bug introduced in 3.12.0.  If a value of ``extra_groups=[]``
was passed to :mod:`subprocess.Popen` or related APIs, the underlying
``setgroups(0, NULL)`` system call to clear the groups list would not be made
in the child process prior to ``exec()``.

The security issue was identified via code inspection in the process of
fixing the first bug.  Thanks to @vain for the detailed report and
analysis in the initial bug on Github.

(cherry picked from commit 9fe7655)

+ Reword NEWS for the bugfix/security release. (mentions the assigned CVE number)

Co-authored-by: Gregory P. Smith [Google LLC] <greg@krypto.org>
Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
gpshead added a commit to gpshead/cpython that referenced this issue Dec 4, 2023
@gpshead gpshead changed the title subprocess.Popen: Performance regression on Linux since 124af17b6e subprocess.Popen: Performance regression on Linux since 124af17b6e [CVE-2023-6507] Dec 8, 2023
@gpshead gpshead added the type-security A security issue label Dec 8, 2023
gpshead added a commit that referenced this issue Dec 9, 2023
Regression test that vfork is used when expected by subprocess.

This is written integration test style, it uses strace if it is present and appears to work to find out what system call actually gets used in different scenarios.

Test coverage is added for the default behavior and that of each of the specific arguments that must disable the use of vfork.  obviously not an entire test matrix, but it covers the most important aspects.

If there are ever issues with this test being flaky or failing on new platforms, rather than try and adapt it for all possible platforms, feel free to narrow the range it gets tested on when appropriate. That is not likely to reduce coverage.
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…roups=[]` behavior (python#112617)

Restore `subprocess`'s intended use of `vfork()` by default for performance on Linux;
also fixes the behavior of `extra_groups=[]` which was unintentionally broken in 3.12.0:

Fixed a performance regression in 3.12's :mod:`subprocess` on Linux where it
would no longer use the fast-path ``vfork()`` system call when it could have
due to a logic bug, instead falling back to the safe but slower ``fork()``.

Also fixed a security bug introduced in 3.12.0.  If a value of ``extra_groups=[]``
was passed to :mod:`subprocess.Popen` or related APIs, the underlying
``setgroups(0, NULL)`` system call to clear the groups list would not be made
in the child process prior to ``exec()``.

The security issue was identified via code inspection in the process of
fixing the first bug.  Thanks to @vain for the detailed report and
analysis in the initial bug on Github.

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
…ython#112734)

Regression test that vfork is used when expected by subprocess.

This is written integration test style, it uses strace if it is present and appears to work to find out what system call actually gets used in different scenarios.

Test coverage is added for the default behavior and that of each of the specific arguments that must disable the use of vfork.  obviously not an entire test matrix, but it covers the most important aspects.

If there are ever issues with this test being flaky or failing on new platforms, rather than try and adapt it for all possible platforms, feel free to narrow the range it gets tested on when appropriate. That is not likely to reduce coverage.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes 3.13 new features, bugs and security fixes extension-modules C modules in the Modules dir performance Performance or resource usage release-blocker type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
Development

Successfully merging a pull request may close this issue.

5 participants