Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-35537: use os.posix_spawn in subprocess #11242

Closed
wants to merge 4 commits into from
Closed

bpo-35537: use os.posix_spawn in subprocess #11242

wants to merge 4 commits into from

Conversation

nanjekyejoannah
Copy link
Member

@nanjekyejoannah nanjekyejoannah commented Dec 19, 2018

This patch uses os.posix_spawn in subprocess.

posix_spawn() is faster but it's also safer: it allows us to do a lot of "actions" before exec(), before executing the new program. For example, you can close files and control signals.

@vstinner this is the patch without any options.

https://bugs.python.org/issue35537

@vstinner
Copy link
Member

@nanjekyejoannah: Thanks! Can you please open an issue on bugs.python.org to get a bug number? Then add "bpo-xxx: " to the title of your PR.

Can you also please add a NEWS entry? You can use the "blurb" tool: https://devguide.python.org/committing/#what-s-new-and-news-entries

@nanjekyejoannah
Copy link
Member Author

@vstinner ack. Am doing this shortly.

@nanjekyejoannah nanjekyejoannah changed the title use os.posix_spawn in subprocess bpo-35537: use os.posix_spawn in subprocess Dec 19, 2018
and not pass_fds
and cwd is None
and env is None
and restore_signals is False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"and not restore_signals" would be better. I prefer to avoid "is" to compare with True/False.

@@ -1390,6 +1390,11 @@ def _get_handles(self, stdin, stdout, stderr):
errread, errwrite)


def _posix_spawn(self, args, executable):
"""Execute program using os.posix_spawn()."""
env = dict(os.environ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really needeed to copy os.environ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to pass directly os.environ to os.posix_spawn().

@@ -0,0 +1 @@
subprocess.Popen can now use posix_spawn() in some cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better rendering (create links), I suggest:

Suggested change
subprocess.Popen can now use posix_spawn() in some cases.
:class:`subprocess.Popen` can now use :func:`os.posix_spawn` in some cases.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I will wait until minor remarks are addressed.

Lib/subprocess.py Show resolved Hide resolved
and env is None
and restore_signals is False
and not start_new_session):
return self._posix_spawn(args, executable)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to pass (executable, args) to follow os.posix_spawn() API. The function has not return value, so I suggest to write instead:

Suggested change
return self._posix_spawn(args, executable)
self._posix_spawn(executable, args)
return

@@ -1390,6 +1390,11 @@ def _get_handles(self, stdin, stdout, stderr):
errread, errwrite)


def _posix_spawn(self, args, executable):
"""Execute program using os.posix_spawn()."""
env = dict(os.environ)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to pass directly os.environ to os.posix_spawn().

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a sentence to the Optimization section of Doc/whatsnew/3.8.rst? Something like:

"subprocess.Popen is now up to 60x faster on Linux and 106x faster on FreeBSD when os.posix_spawn() can be used."

@serhiy-storchaka
Copy link
Member

60x and 106x are from particular examples. If you use different examples (allocate 1 GiB or 4 GiB of memory in the parent) you can get different numbers. Concrete numbers depend on a program and do not have meaning without referring to concrete program.

@vstinner
Copy link
Member

60x and 106x are from particular examples. If you use different examples (allocate 1 GiB or 4 GiB of memory in the parent) you can get different numbers. Concrete numbers depend on a program and do not have meaning without referring to concrete program.

It's hard to measure the average speedup, but IMHO it's nice to say that it can be way faster in some cases. So do you want to explain it differently? For example:

"subprocess.Popen is now up to 60x faster on Linux and 106x faster on FreeBSD when os.posix_spawn() can be used, the parent has large memory footprint (ex: 2 GiB), and the child process execution time is short."

@vstinner
Copy link
Member

@nanjekyejoannah: Until we agree on the sentence to be added to What's New in Python 3.8, don't add it. We can add it in the following PR.

@serhiy-storchaka
Copy link
Member

I suggest to not use any precise numbers: "subprocess.Popen is now significantly faster" or "subprocess.Popen can be now tens times faster".

@vstinner
Copy link
Member

@nanjekyejoannah It seems like you closed the PR. Was it on pupose? I'm unable to reopen the issue.

@nanjekyejoannah
Copy link
Member Author

@vstinner I will reopen did it by mistake.

@vstinner
Copy link
Member

@nanjekyejoannah: this PR is now empty, it doesn't contain any commit :-( It seems like your removed your commit. You can use "git reflog" or open .git/logs/HEAD to recover your removed commit.

@@ -1390,6 +1390,11 @@ def _get_handles(self, stdin, stdout, stderr):
errread, errwrite)


def _posix_spawn(self, args, executable):
"""Execute program using os.posix_spawn()."""
env = os.environ
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: this variable is useless, just pass directly os.environ to posix_spawn().

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@vstinner
Copy link
Member

I will merge this PR tomorrow except if someone sees a good reason to not merge it: https://bugs.python.org/issue35537#msg332239

and not start_new_session
and os.path.dirname(executable)
and self.stdin is self.stdout is self.stderr is None):
self._posix_spawn(args, executable)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking self.std* is still not enough. Those fields are for the parent's ends of pipes, but if a caller redirects to a descriptor, there is no such end. For example, subprocess.call(['/bin/sleep', '1000'], stdout=fd, close_fds=False, restore_signals=False) doesn't work properly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@izbyshev: Oh right! I fixed the test: test (p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite) instead of (self.stdin, self.stdout, self.stderr). Is it better now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, LGTM now.

There is another potential problem with posix_spawn: asynchronous error notification. _posixsubprocess uses a pipe to notify the parent about all errors up to and including execve errors, so its style is synchronous. Glibc and musl use the pipe-based notification too, but they are not required to do that, and I don't know what other implementations do.

Test  (p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite)
instead of (self.stdin, self.stdout, self.stderr).
@vstinner
Copy link
Member

@nanjekyejoannah: I pushed directly a fix into your branch to fix the latest reported bug about standard streams ;-)

@vstinner
Copy link
Member

I will merge this PR tomorrow except if someone sees a good reason to not merge it: https://bugs.python.org/issue35537#msg332239

Hum, @izbyshev found another issue (already now fixed) with this PR. Since I will be offline next 2 weeks, I prefer to not merge yet since I would be unable to handle any issue caused by the change. I now prefer to wait until I'm back from holiday to merge the PR.

In the meanwhile, if another core dev feels brave enough to merge it, please go ahead ;-)

@nanjekyejoannah
Copy link
Member Author

okay noted.

@vstinner
Copy link
Member

vstinner commented Dec 21, 2018 via email

@izbyshev
Copy link
Contributor

I expect that posix_spawn() sets errno as does execv()

As I explained, this behavior is not required by POSIX, and the opposite behavior is allowed. Recent glibc- and musl-based Linux behave as you expect, but for glibc, this was added in 2017. In my test on Ubuntu 14.04, ./python -c "import subprocess; subprocess.call(['/xxx'], close_fds=False, restore_signals=False)" silently returns with zero exit code.

FreeBSD also seems not to use any synchronous notification.

Other systems where posix_spawn is a syscall (macOS?) probably behave as you expect, but I've not checked.

@vstinner
Copy link
Member

I merged PR #11452 instead.

@vstinner vstinner closed this Jan 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants