syscall: use posix_spawn (or vfork) for ForkExec when possible #5838
Some related discussion: https://groups.google.com/d/topic/golang-dev/66rHnYuMaeM/discussion
jingweno commented Jul 7, 2013
Thanks for the info. I am pasting the implementation of the posix-spawn gem in case it's a helpful reference. It looks like it forces the use of vfork on Linux: https://github.com/rtomayko/posix-spawn/blob/master/ext/posix-spawn.c#L399-L404 https://github.com/rtomayko/posix-spawn/blob/master/ext/posix-spawn.c#L418
The problem with using MADV_DONTFORK on the heap is that it's difficult for us to make sure the ForkExec code doesn't use the heap at all. I think posix_spawn is the way to go (vfork was removed from the modern POSIX standard, so it should be avoided when possible). Labels changed: added priority-later, performance; removed priority-triage. Status changed to Accepted.
jingweno added the accepted and Performance labels Dec 4, 2013
tarndt commented Jan 23, 2015
I see this issue on a regular basis on machines that still have free physical memory (certainly enough for the process I would like to invoke) when the caller is a Go process with a very large virtual memory footprint. Go daemons with large virtual memory footprints that need to invoke small command-line utilities are a common use case. It would be nice to get this assigned to the 1.5 release.
I took a short look. On GNU/Linux, posix_spawn is a C library function, not a system call. vfork is a special case of clone: you pass the CLONE_VFORK flag. This means that a program that cares can already use vfork on GNU/Linux by setting the Cloneflags field in os/exec.Cmd.SysProcAttr or os.ProcAttr.Sys. So while this would be nice to fix, I don't see a pressing need. To fix it, we need to edit syscall/exec_linux.go to pass CLONE_VFORK when that is safe. It is pretty clearly safe if the only things the child needs to do after the fork are fiddle with file descriptors and call exec. If that is the case, as can be determined by looking at the sys fields, then we could add CLONE_VFORK to the clone flags. If somebody wants to try that out, that would be nice.
tarndt commented Feb 4, 2015
@ianlancetaylor I see this issue on a regular basis; I will give your suggestion a try. Thanks!
napsy commented Feb 13, 2015
I'm facing a similar problem. I'm running a Go app that allocates about 14GB of virtual memory and can't spawn a simple 'ps' command despite having at least 300 MB of system RAM still available. It would be really great if this issue were fixed in 1.5.
Hmm, I gave this a quick try a few days ago, but gave up for now. I failed to determine why the child hangs after the clone syscall, and if the child hangs, the parent won't continue either in the CLONE_VFORK case. I only activated CLONE_VFORK if everything in syscall.SysProcAttr was set to its zero value, but it seems even such simple cases are not so simple. If someone wants to work on this with me, just ping me here.
Did you pass CLONE_VM as well as CLONE_VFORK? I think that without CLONE_VM the parent won't be able to see when the child has exec'ed, though I don't know why the child would hang.
@ianlancetaylor Yes, I passed both. But I guess the system needs to be in single-threaded mode for this to work, which Go doesn't seem to do at the moment. http://ewontfix.com/7/ has more info on this, if someone wants to continue here (e.g. my future self).
tarndt commented Mar 6, 2015
@ianlancetaylor Edit: I even tried adding runtime.GOMAXPROCS(runtime.NumCPU()) and it still works.
@tarndt CLONE_VM is not passed in your example. CLONE_VFORK without CLONE_VM will be If you add this, the Go program calling execve hangs, which is exactly what I have seen in my tests. My current plan is to use madvise(..., MADV_DONTFORK) on the heap, but I haven't yet figured out how to do the file descriptor juggling safely, without affecting the parent process and using only the stack.
@tarndt If you use CLONE_VFORK without CLONE_VM, is that really any faster? If it is faster, and it works, then I suppose we could use it.
There is one reason not to use vfork. It's when the child needs to dup a See This is an edge case, but still worth considering when switching to vfork.
rsc added this to the Unplanned milestone Apr 10, 2015
rsc removed the priority-later label Apr 10, 2015
I've run into this same problem: a Go program with a large virtual memory footprint is failing to fork/exec despite plenty of free memory. My experiments with CLONE_VFORK|CLONE_VM end up with the parent being mysteriously zombified. With just CLONE_VM I get: fatal error: runtime: stack split at bad time
redbaron commented Jan 13, 2016
For all those affected, a temporary workaround on Linux, until this is fixed properly, is one of the following:
tarndt commented Jan 20, 2016
I can confirm the above works (if you buy your devops people enough
nadermx commented Jan 27, 2016
Thank you so much, @redbaron, your solution worked.
napsy commented Mar 3, 2016
Hello, what's the follow-up on this issue? I have a process at 20 GB of virtual memory and ~4 GB RSS, and I think spawning will become a problem for me very soon.
@napsy There is no follow-up. Somebody needs to work on this and try out the various cases. I don't think it's going to matter very much for your program; your program should work OK either way. What we are talking about here is just a possible performance improvement to os/exec. Of course, if that is not true, please do let us know.
napsy commented Mar 3, 2016
@ianlancetaylor As far as I understand, spawning a new process from a parent with such a large memory footprint could cause problems. If I misunderstood the problem, please correct me.
@napsy, you're assuming you will hit the problem. Instead of making it a hypothetical, try making your process use more VMM and more RSS and see if you do.
Wat. This is already a confirmed problem. It wants to be fixed regardless of whether it impacts this one user.
@anacrolix Nobody is saying it should not be fixed. We're just saying that there is no reason for someone to simply assume they will encounter this. That said, it would be nice to have a simple test case showing the problem. We don't have that now.
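A minimal reproduction along the lines requested here might look like the sketch below. It is hypothetical and is not one of the gists posted later in this thread: it grows the heap to a chosen size, touches every page so the memory is actually resident, then times repeated `exec.Command("true").Run()` calls. On an affected kernel and Go version, spawn latency should grow with heap size, and under strict overcommit settings `Run` may fail with "cannot allocate memory". The helper name `spawnLatencies` and the chosen sizes are assumptions.

```go
package main

import (
	"fmt"
	"os/exec"
	"runtime"
	"time"
)

// spawnLatencies allocates heapMB MiB, touches it every 4 KiB so each
// page is faulted in, then runs the `true` command n times and returns
// how long each Run took. Hypothetical helper for illustration only.
func spawnLatencies(heapMB, n int) ([]time.Duration, error) {
	heap := make([]byte, heapMB<<20)
	for i := 0; i < len(heap); i += 4096 {
		heap[i] = 1 // fault the page in so it counts toward RSS
	}
	lat := make([]time.Duration, 0, n)
	for i := 0; i < n; i++ {
		start := time.Now()
		if err := exec.Command("true").Run(); err != nil {
			return lat, err
		}
		lat = append(lat, time.Since(start))
	}
	runtime.KeepAlive(heap) // keep the allocation live while spawning
	return lat, nil
}

func main() {
	// Increase these sizes to approach the footprints reported above.
	for _, mb := range []int{16, 64, 256} {
		lat, err := spawnLatencies(mb, 20)
		if err != nil {
			fmt.Println(mb, "MiB:", err)
			continue
		}
		var total time.Duration
		for _, d := range lat {
			total += d
		}
		fmt.Printf("%5d MiB heap: mean spawn %v\n", mb, total/time.Duration(len(lat)))
	}
}
```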
jd3nn1s commented Apr 21, 2016
I've run into this problem recently. Here's a test case that I made for it: https://gist.github.com/jd3nn1s/24896f55f20497a972914412f23ab23a Since the allowed overcommit is decided by a heuristic, here are some details of my setup:
Here's a test case that works relatively well for me: https://gist.github.com/neelance/460f8a31f2391d2f3aafd7052348f66a
I've seen even worse latencies in production (up to several hundred ms), but it is hard to simulate that in a test. Here's a patch that works for the test case and the core package tests; I still have to try it in production: neelance/go@b7edfba Test case output after patch:
@ianlancetaylor I can bring it upstream if desired. This stuff was partially new to me, so I'd like to get feedback. E.g. I picked the register
Thanks for looking at this. Why do you need to preserve the return address?
jd3nn1s commented Apr 29, 2016
http://ewontfix.com/7/, referred to above, describes a possible security issue if setuid is called from another thread while vforking. This is probably not a problem unless #1435 is resolved and setuid is implemented.
@ianlancetaylor The return address needs to be preserved because
sliverc referenced this issue in smira/aptly Nov 1, 2016
Closed: Aptly publish: fork/exec /usr/bin/bzip2: cannot allocate memory #415
arya commented Dec 13, 2016
@neelance I tried your test (https://gist.github.com/neelance/460f8a31f2391d2f3aafd7052348f66a) with and without CLONE_VFORK (not CLONE_VM) and observed a significant speedup. Without CLONE_VFORK:
With CLONE_VFORK (still no CLONE_VM):
I'm doing this on Go 1.7 with GOARCH=amd64. Why is this the case? If the slowness is because calling clone without CLONE_VM copies the memory, doesn't that mean it should be slow even if you include CLONE_VFORK? One thing I noticed in the documentation for
And Go does set If not, is there any other reason why CLONE_VFORK alone should make it faster? Any downside to using it?
Hmm, interesting observation. Yeah, maybe it is the parent process causing more COW work before the child process does its What changes did you test? Have you used neelance/go@b7edfba or have you simply specified It would also be interesting to have numbers for the
@neelance, do you want to get this into Go 1.9 in Feb?
@bradfitz Yes, I think it would be good to get this upstream. We should figure out if
bradfitz modified the milestones: Go1.9Early, Unplanned Dec 13, 2016
bradfitz assigned neelance Dec 13, 2016
arya commented Dec 13, 2016
@neelance my test was without your changes. It was just the gist I included with and without CLONE_VFORK.
Wow nice, so the only change it does is to make the parent wait until the child does its Could you do a test run with my full patch to see if there is any additional gain in
arya commented Dec 13, 2016
@neelance I feel super stupid. My first test was incorrect, but Unmodified Go 1.7
Unmodified Go 1.7, application code adds CLONE_VFORK
Go 1.7 with this patch applied: neelance/go@b7edfba
I'm surprised, though, that memory is still copied despite child_stack being set to zero.
As far as I understand the manpage of
arya commented Dec 15, 2016
@neelance That makes sense to me. AFAICT the third option (your patch) is the most feasible and performant. The first option seems to be what's in master and suffers from a large amount of copying. CLONE_VFORK avoids some of the copying, but apparently not much. The second option seems to me (as a novice to the internals) much more difficult to get right, given the nature of Go and its memory management. Is that accurate?
Yes, I also think that the second option is harder to implement.
fasaxc commented Feb 24, 2017
I work on an app that makes heavy use of subprocesses to manipulate iptables and ipsets (since that's their only supported API). After observing poor performance when my process is using a lot of RAM, I found this issue. I tried adding FWIW, the previous version of our app was written in Python and we observed a dramatic improvement when we switched from Python's default fork/exec strategy to using
fasaxc commented Feb 24, 2017
Measuring in an app that's under load with work going on in other threads, I see
@fasaxc Yes, only using Would you mind applying the whole patch neelance/go@f207709 to your GOROOT, then do
By "improve" I specifically mean the latency under high RAM usage. You are right that in a low-RAM situation it may lower the throughput. Please check whether it is still 50x when using the full patch.
fasaxc commented Feb 24, 2017
That patch makes a dramatic improvement. I'm measuring a 99th-percentile latency of 1ms vs 60ms before, and a drop from 100% to 20% CPU usage.
Yay, I'm happy to hear that. Do you see any downsides? What about the low-memory situation?
fasaxc commented Feb 24, 2017
@neelance It seems to improve latency at small VSS sizes too (~40MB): 800us vs 2600us at the 99th percentile.
added a commit to fasaxc/go-build that referenced this issue Feb 24, 2017
fasaxc referenced this issue in projectcalico/go-build Feb 24, 2017
Merged: Add Go standard library patch to use VFORK for exec.Command(). #4
Cool. So there are no reasons not to bring this upstream. I'll create a CL today or tomorrow.
gopherbot commented Feb 25, 2017
CL https://golang.org/cl/37439 mentions this issue.
added a commit to fasaxc/go-build that referenced this issue Feb 27, 2017
gopherbot closed this in 9e6b79a Mar 22, 2017
jingweno commented Mar 23, 2017
@neelance nice work!
jingweno commented Jul 5, 2013