Support POSIX_SPAWN_USEVFORK flag in posix_spawn #78844
On some systems, posix_spawn has an optional flag (POSIX_SPAWN_USEVFORK) that is GNU-specific and allows the user to force posix_spawn to spawn the child using vfork instead of fork. This is very beneficial, as it gives large speedups compared with a normal fork+execv. |
I suggest to name the parameter "use_vfork", or maybe even "vfork". |
I discussed this with Gregory P. Smith, and we agreed on the "use_vfork=True" name. He is a native English speaker, so I rely on him :-) Moreover, "use_vfork" is closer to the POSIX_SPAWN_USEVFORK constant than "vfork". |
I discussed with Pablo (at the CPython sprint) about the case of FreeBSD. On FreeBSD, posix_spawn() always uses vfork, so it can be surprising to get an error when using use_vfork=True. But, the error message is now very explicit: it mentions that a specific constant is not available on FreeBSD. The documentation is also clear on that point. |
Given the FreeBSD default and seeming desirability of vfork for this use case, is there a good reason using vfork could not be the default behavior on any OS that supports it? |
If vfork() is used on FreeBSD, wouldn't it be better to make use_vfork=True the default on FreeBSD and raise an error on use_vfork=False? |
Gregory: on Linux, it does change the behavior. The parent blocks until the child is spawned. Python should let the developer decide to opt in. Serhiy: IMHO it's better to make posix_spawn() as dumb as possible, a simple wrapper around the C call. use_vfork=True is currently designed for the GNU constant. posix_spawn(use_vfork=True) raises on FreeBSD: only pass use_vfork=True on Linux. What do you think? The problem with changing the default is that we don't know the full list of platforms that use vfork() by default. If we have a short list, how do we know whether it's complete? For example, macOS doesn't say anything about vfork. Does it use it? Maybe not. Maybe yes. What if the default changes on a platform? |
@vstinner: another option is to ignore "use_vfork" on platforms that don't have POSIX_SPAWN_USEVFORK. Using vfork or not is primarily an optimisation, so ignoring the flag should not result in different behaviour (other than speed). |
If this is an optimization, what is the downside of always using vfork()? |
Good question. A comment on <https://stackoverflow.com/questions/2731531/faster-forking-of-large-processes-on-linux> says that glibc already uses vfork where appropriate, so explicitly using the flag may not be necessary. Note that I haven't verified whether the comment is correct. |
The documentation says: The child process is created using vfork(2) instead of fork(2) when
So using the flag is necessary if you want to force the use of vfork when passing any of the flags specified in the second point. |
Serhiy Storchaka: "If this is an optimization, what is the downside of always using vfork()?" I don't know the vfork() function, but you can find articles like: But it's unclear to me if vfork() drawbacks also affect posix_spawn(). posix_spawn() is well defined: call vfork() and then immediately exec(). Another article: "First is that vfork pauses the parent thread while the child executes and eventually calls an exec family function, this is a huge latency problem for applications." |
vfork() is more dangerous than fork() because the parent and child processes share memory (not copy-on-write, but really the same memory). Whether or not this affects posix_spawn depends on its implementation (to give a very vague statement). Glibc already uses vfork() in a number of cases, I'd expect that those are the cases where it is safe to use vfork() in the implementation of posix_spawn in the context of glibc. I'd therefore carefully test the use of vfork() in other cases to make sure those don't affect the parent process. |
If glibc already uses vfork() if it is safe, I suppose that using vfork() in other cases is unsafe, and we shouldn't support this nonstandard option. |
You still need to allow the flag, because whether it is safe or unsafe depends on the user code: glibc cannot know the details of what is going to be executed. That is the reason they have the flag, so the user can decide whether it is safe or not. If we don't expose the flag, the user may know that what they are going to do is safe or acceptable, but they cannot activate the use of vfork. |
Some interesting reading: Go is using vfork/posix_spawn when possible: And it seems to have interesting results: https://about.gitlab.com/2018/01/23/how-a-fix-in-go-19-sped-up-our-gitaly-service-by-30x/ |
I get the impression that Go uses vfork on Linux, not posix_spawn, because Go doesn't use libc (based on reading the referenced issue, not on deep knowledge of Go and its implementation). I do wonder why glibc's implementation of posix_spawn doesn't use vfork more often, and in which cases it is safe to explicitly use vfork even when glibc won't do so by default. Enabling the use of vfork without determining when it is safe to do so is asking for problems, and hard to debug/reproduce ones at that (due to vfork semantics). Adding a "use_vfork" keyword argument to posix_spawn is IMHO not the right way to go; it would be better to determine when using vfork is safe and then unconditionally enable it. Otherwise users will have to do the research. |
I did some more research:
The start of the issue also contains some information on why glibc is (was?) so conservative about using vfork, and a possible work around (disable cancelation points around the call to posix_spawn).
It looks like the advice to use POSIX_SPAWN_USEVFORK is outdated, although I'm not 100% sure of my conclusion. A glibc expert should be able to confirm or refute this. @pablogsal: do you have more information on why you want to enable this flag? Do you have measurements that show that adding this flag helps? |
Oh wow, I didn't expect that exposing a constant would be a source of such deep debate! (I'm not saying that the debate is useless or negative, it's useful and constructive, just I'm surprised how system programming can be hard sometimes!) |
Given all of this, the lesson I'd take away is perhaps that we should just provide the constant in the os module when available at build time (a configure check) and let people who find a need to use it on their system check for its presence and use it in their code. The general theme of the posix / os module is intentionally low level: expose the APIs and let people make decisions about when to use what at a higher level. It sounds like a use_vfork=XXX parameter is undesirable at this point unless you wanted to make it clear by its name that it is a glibc-only feature, glibc_use_vfork=XXX perhaps (if exposing the POSIX_SPAWN_USEVFORK flag name itself to be or'ed in is undesirable from a Pythonic API point of view). The problem with a parameter so named is that you then have to decide on error semantics and need a way for people to know when they can or can't use it prior to calling posix_spawn() itself. With a constant that you or into flags, you can use hasattr to check for the presence of the constant to determine whether you can try using it. (The libc call could still return an error, which we'd turn into an OSError exception, if the API call doesn't support that flag or otherwise doesn't like the combination of flags you passed, but that situation is always possible in any API.) |
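The presence check described in this comment could look roughly like the sketch below. It is only an illustration: os.POSIX_SPAWN_USEVFORK is the constant proposed in this issue and is not assumed to exist in any given Python build, and usevfork_flag and build_spawn_flags are hypothetical helper names. Nothing here is passed to os.posix_spawn(), since a flags argument is itself part of the proposal.

```python
import os


def usevfork_flag() -> int:
    """Return os.POSIX_SPAWN_USEVFORK if this build exposes it, else 0."""
    # Hypothetical constant from this issue: look it up dynamically so the
    # code still runs on builds and platforms that do not define it.
    return getattr(os, "POSIX_SPAWN_USEVFORK", 0)


def build_spawn_flags(base_flags: int = 0) -> int:
    """OR the optional vfork flag into an existing flags value."""
    return base_flags | usevfork_flag()


if __name__ == "__main__":
    available = hasattr(os, "POSIX_SPAWN_USEVFORK")
    print(f"constant available: {available}, flags: {build_spawn_flags():#x}")
```

Falling back to 0 means the caller never has to branch on the platform; the cost is that the caller cannot tell from the flags value alone whether the request was honoured.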
Note that the POSIX_SPAWN_USEVFORK may not do anything at this point (see the link in my previous message, if I read the code correctly POSIX_SPAWN_USEVFORK is no longer used in the implementation of posix_spawn(3)). Even if it did do something the user that uses the flag needs to evaluate whether or not it is safe to do so, and this requires inspecting the os.posix_spawn implementation and not just the Python code that calls it. IMHO we shouldn't expose or use this flag. If it were useful to do anything with the flag the os.posix_spawn implementation should do so automatically when it is safe to do so (which may require additional steps around calling posix_spawn(3)). |
I'm discussing with Pablo to see how to use posix_spawn() in the Python subprocess module. IMHO we should consider the specific case of subprocess. What is the expected API?

    kw = {}
    if hasattr(os, 'POSIX_SPAWN_USEVFORK'):
        kw['flags'] = os.POSIX_SPAWN_USEVFORK
    posix_spawn(*args, **kw)

or

    posix_spawn(*args, use_vfork=True)

or

    kw = {}
    if sys.platform == 'linux':
        kw['use_vfork'] = True
    posix_spawn(*args, **kw)

? For example, if we consider that it's safe to use POSIX_SPAWN_USEVFORK in all cases for posix_spawn(), maybe we should not add an option at the Python level, and hardcode the POSIX_SPAWN_USEVFORK flag in the C code? --
Ok, now I'm confused: what's the point of this issue if the flag became useless? :-) |
The removal of POSIX_SPAWN_USEVFORK in glibc was somewhat recent. There are also multiple implementations of posix_spawn in the glibc source tree. vfork was introduced in The new implementation of posix_spawn for unix (/sysdeps/unix/sysv/linux/spawni.c) was introduced in The new implementation of posix_spawn for posix (sysdeps/posix/spawni.c) was introduced in In these last two commits, posix_spawn removed the flag and made it a no-op. But notice that these commits are "recent" (2016 and 2017). Python running on any older version of glibc will not benefit from these changes, and if it wants to use vfork, it needs the flag. At this point, and after this interesting discussion, I am not sure whether we should expose the flag, but I still want to remark that even if glibc removed the flag for both implementations (unix and posix), we may want to expose the flag for older versions of glibc. I discovered this problem when I was benchmarking posix_spawn on different platforms and systems and I noticed a performance decrease on some older version of Ubuntu from 2015. On that platform (older version of glibc) you need the flag to activate vfork and get the performance benefit. The glibc version for these systems is straightforward:
I am happy to close the issue if everyone agrees that we should look to the future and not care about these older versions of glibc < 2017/2016 (depending on posix/unix version), or that this is not worth the hassle. I am still happy with this great and interesting discussion, so thank you everyone :) Complete story of posix_spawn in glibc: Whole story of posix_spawn: commit c181840c93d3a8fcb50991b2326c40f34eb5e82b |
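For readers who want to know which glibc a given interpreter is running against before deciding whether any of this matters to them, a minimal sketch follows. It assumes os.confstr("CS_GNU_LIBC_VERSION") is available, which is only the case on glibc-based systems; parse_glibc_version and running_glibc_version are hypothetical helper names. The thread gives commit dates rather than release numbers, so no version threshold is hard-coded.

```python
import os


def parse_glibc_version(text):
    """Turn a string such as 'glibc 2.23' into (2, 23); return None otherwise."""
    parts = text.split()
    if len(parts) != 2 or parts[0] != "glibc":
        return None
    try:
        return tuple(int(piece) for piece in parts[1].split("."))
    except ValueError:
        return None


def running_glibc_version():
    """Return the glibc version as a tuple, or None when it cannot be determined."""
    try:
        text = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # No os.confstr, unknown name (non-glibc libc), or a failing lookup.
        return None
    return parse_glibc_version(text) if text else None


if __name__ == "__main__":
    print("glibc version:", running_glibc_version())
```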
I wouldn't bother with POSIX_SPAWN_USEVFORK on GNU/Linux. Current versions of glibc always use a vfork-style clone call, so there would be a difference on older versions only. But there, the vfork code has subtle bugs, so using POSIX_SPAWN_USEVFORK there is not a good idea, either. |
Ok, I am going to close the issue and the PR unless someone comments on this issue in 24h saying that we still need to expose the flag and providing an explanation. Thank you everyone for this interesting discussion :) |
I rely on Florian Weimer who maintains the glibc for Red Hat, and so I agree to close the issue. It seems like Pablo wants to keep the issue open 24h, so I let him close it later ;-) The good news is that calling posix_spawn() with no flag is safe in all glibc versions and it's fast by default on recent glibc versions ;-) |
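The conclusion reached here, calling posix_spawn() with no extra flags, can be sketched with the existing os.posix_spawn() API as below. This assumes a POSIX platform and Python 3.9 or later (for os.waitstatus_to_exitcode); spawn_and_wait is a hypothetical helper name, and whether vfork is used underneath is left entirely to the C library.

```python
import os
import sys


def spawn_and_wait(argv):
    """Spawn argv[0] via os.posix_spawn with default flags; return its exit code."""
    # No flags are passed: the libc picks its own process creation strategy.
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "pass"])
    print("child exit code:", code)
```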
Agreed on not exposing it. It seems obsolete in recent glibc and the older glibc implementations that had it may have made questionable decisions. :) Thanks for chiming in Florian, and thanks Pablo for your detailed investigation. :) |
I concur, very interesting talk ;-) |
Same here, even if I learned more about the implementation of posix_spawn than I should have to know ;-) |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.