os: StartProcess ETXTBSY race on Unix systems #22315
Change https://golang.org/cl/71570 mentions this issue.
Change https://golang.org/cl/71571 mentions this issue.
Userspace workarounds seem flawed or less than ideal. This is a kernel problem.
I'm going to repeat a hack I described elsewhere that I believe would work for pure Go programs: have the os package track the highest file descriptor it has ever opened, serialize fork against open/close, and have the forked child close every descriptor up to that maximum (other than the ones being passed down) before calling exec. The effect of this should be that when a fork+exec happens, no unexpected descriptors remain open in the child. The disadvantages are that all forks are serialized, and that all forks waste time closing descriptors that will shortly be closed anyhow. Also, of course, forks temporarily block closes, but that is unlikely to be significant.
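A minimal sketch of the bookkeeping this hack would need, assuming a pure-Go implementation; `fdtrack`, `NoteFD`, and `ChildSweepLimit` are illustrative names, not real runtime API:

```go
// Package fdtrack sketches the "highest fd ever opened" bookkeeping.
package fdtrack

import "sync/atomic"

var highestFD atomic.Int64

// NoteFD records a descriptor returned by open/dup/socket/pipe/etc.
func NoteFD(fd int) {
	for {
		old := highestFD.Load()
		if int64(fd) <= old || highestFD.CompareAndSwap(old, int64(fd)) {
			return
		}
	}
}

// ChildSweepLimit is the upper bound the forked child would sweep to,
// closing every unexpected descriptor before exec. It is padded because
// another thread may open a descriptor between open() returning and
// NoteFD() running; since fds are allocated lowest-available, a small
// constant covers that window.
func ChildSweepLimit() int { return int(highestFD.Load()) + 10 }
```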
I'm happy to have a go, but I'm a nobody on that list. I was assuming folks here might have the ear of a Google kernel developer or two in that area who would vet the idea and suggest it to the list if worthy. :-)
Is there a place to file a feature request against the Linux kernel? I know nothing about the kernel development process. I hear it uses git. Agree about … As far as I can see it doesn't matter if …
@ianlancetaylor thanks, yes, the explicit closes would solve the problem with slow execs, which would be nice. That might make this actually palatable. You also don't even need the extra pipe if you use vfork in this approach.

I agree with @RalphCorderoy that there's a race between the "maintain the max" and "fork", in that Open might create a new fd, then fork runs in a different thread before Open can update the max. But since fds are created lowest-available, it should suffice for the child to assume that max is, say, 10 larger than it is.

Also note that this need not be an RWMutex (and for that matter the current syscall.ForkLock need not be an RWMutex either). It just needs to be an "either-or" mutex. An RWMutex allows N readers or 1 writer. The mutex we need would allow N of type A or N of type B, just never a mix. If we built that (not difficult, I don't think), then programs that never fork would not serialize any of their closes, and programs that fork a lot but don't close things would not serialize any of their forks.

O_CLOFORK would require having fcntl F_SETFD/F_GETFD support for that bit too, and it would complicate fork a little more than it already is. An alternative that would be equally fine for us would be a "close all fds above" or "tell me the maximum fd of my process" syscall. I don't know if a new bit or a new syscall is more likely.
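For concreteness, here is one way such an "either-or" mutex could look in Go — a hedged sketch built on a condition variable, not a proposal for the actual runtime primitive (a production version would also want to avoid starving one side):

```go
package forklock

import "sync"

// EitherOr admits any number of holders of side A (close) or any number
// of holders of side B (fork), but never a mix of the two.
type EitherOr struct {
	mu   sync.Mutex
	cond *sync.Cond
	n    int // >0: n A-holders; <0: -n B-holders; 0: free
}

func New() *EitherOr {
	e := &EitherOr{}
	e.cond = sync.NewCond(&e.mu)
	return e
}

func (e *EitherOr) LockA() {
	e.mu.Lock()
	for e.n < 0 {
		e.cond.Wait()
	}
	e.n++
	e.mu.Unlock()
}

func (e *EitherOr) UnlockA() {
	e.mu.Lock()
	e.n--
	if e.n == 0 {
		e.cond.Broadcast() // wake any waiting B-side lockers
	}
	e.mu.Unlock()
}

func (e *EitherOr) LockB() {
	e.mu.Lock()
	for e.n > 0 {
		e.cond.Wait()
	}
	e.n--
	e.mu.Unlock()
}

func (e *EitherOr) UnlockB() {
	e.mu.Lock()
	e.n++
	if e.n == 0 {
		e.cond.Broadcast() // wake any waiting A-side lockers
	}
	e.mu.Unlock()
}
```

With close taking side A and fork taking side B, a program that never forks never serializes its closes, and a program that never closes never serializes its forks — the property described above.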
I should maybe also note that macOS fixes this problem by putting `#if 0` around the ETXTBSY check in the kernel implementation of exec. That would be a third option for Linux, although probably less likely than the other two.
I've emailed linux-kernel@vger.kernel.org. Will reference an archive once it appears.
linux-kernel mailing-list archive of the post: https://marc.info/?l=linux-kernel&m=150834137201488
On modern Unix systems it is basically impossible for a multithreaded program to open a binary for write, close it, and then fork+exec that same binary. So don't write the binary if we're going to fork+exec it. This fixes the ETXTBSY flakes.

Fixes #22220.
See also #22315.

Change-Id: I6be4802fa174726ef2a93d5b2f09f708da897cdb
Reviewed-on: https://go-review.googlesource.com/71570
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This hack existed because cmd/go used to install (write) and then run cmd/cgo in the same invocation, and writing and then running a program is a no-no in modern multithreaded Unix programs (see #22315). As of CL 68338, cmd/go no longer installs any programs that it then tries to use. It never did this for any program other than cgo, and CL 68338 removed that special case for cgo. Now this special case, added for #3001 long ago, can be removed too.

Change-Id: I338f1f8665e9aca823e33ef7dda9d19f665e4281
Reviewed-on: https://go-review.googlesource.com/71571
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
What's the plan here for Go 1.10? @RalphCorderoy, looks like you never got a reply, eh?
Looks like Solaris and macOS and OpenBSD have …
The failures reported in #1848 seem to be caused by the issue described in golang/go#22315 and the fact that we run tests in parallel.
A colleague pointed me to this bug in context of a wider discussion about … The high-level algorithm for writing a file for execution is as follows:

1. Open the file for writing with O_CLOEXEC and write out the contents.
2. Take an exclusive flock lock on the fd, then close it.
3. Reopen the file read-only, again with O_CLOEXEC.
4. Take a shared flock lock on the new fd, then close it; the file can now be fork+exec'ed safely.
If an fd opened in step 1 leaked to another process as a result of a concurrent thread issuing a fork, the exclusive lock from step 2 remains held by that leaked descriptor (flock locks belong to the open file description, which the child shares) until the child execs and O_CLOEXEC closes it. Taking the shared lock in step 4 therefore blocks until no write fd survives anywhere, after which exec cannot fail with ETXTBSY. The diff to the program shown in the opening comment would be just:

```diff
@@ -41,6 +44,20 @@ runner(void *v)
 		exit(2);
 	}
 	write(fd, script, strlen(script));
+	if (flock(fd, LOCK_EX) < 0) {
+		perror("flock");
+		exit(2);
+	}
+	close(fd);
+	fd = open(buf, O_RDONLY|O_CLOEXEC, 0777);
+	if(fd < 0) {
+		perror("open (readonly)");
+		exit(2);
+	}
+	if (flock(fd, LOCK_SH) < 0) {
+		perror("flock (readonly)");
+		exit(2);
+	}
 	close(fd);
 	pid = fork();
 	if(pid < 0) {
```
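In Go, the same flock handshake might look roughly like the sketch below. This is hedged: `writeExecutable` is an illustrative helper, not stdlib API, and it leans on the fact (noted later in this thread) that the Go standard library opens files close-on-exec, so a racing child's leaked copy disappears when that child execs:

```go
package execsafe

import (
	"os"
	"syscall"
)

// writeExecutable writes an executable file and returns only after no
// write descriptor for it can remain open anywhere in the system.
func writeExecutable(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o755)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	// flock attaches to the open file description, so a copy of this fd
	// inherited by another thread's forked child keeps the lock held
	// even after we close our copy below.
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
		f.Close()
		return err
	}
	f.Close()

	r, err := os.Open(path)
	if err != nil {
		return err
	}
	defer r.Close()
	// Blocks until every holder of the write description is gone, i.e.
	// until any racing child has exec'ed (dropping its O_CLOEXEC copy)
	// or exited. After this, exec of path cannot hit ETXTBSY.
	return syscall.Flock(int(r.Fd()), syscall.LOCK_SH)
}
```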
@amonakov Thanks for the comment. That is an interesting suggestion. I guess that to make this work automatically in Go we would have to detect when an executable file is opened with write access. Unfortunately this would seem to require an extra … But then there seems to be a race condition. The …
I don't think that would work: permission bits could be changed independently after the file is opened.
No, the forked child shares the open file description with the parent, so a later flock taken through the parent's descriptor applies to the leaked copy in the child as well.
@amonakov Thanks. For what it's worth, all files opened using the Go standard library have close-on-exec set.

That said, personally I would not want to add new API to close an executable file. That seems awkward and hard to understand. I'd much rather persuade kernels to support O_CLOFORK.
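A quick way to check the close-on-exec point on Linux (a hedged snippet, not library documentation; the path is arbitrary):

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func main() {
	f, err := os.Open("/etc/hostname")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	// Read the descriptor flags; FD_CLOEXEC should already be set.
	flags, _, errno := syscall.Syscall(syscall.SYS_FCNTL, f.Fd(), syscall.F_GETFD, 0)
	if errno != 0 {
		panic(errno)
	}
	fmt.Println(flags&syscall.FD_CLOEXEC != 0) // expected: true
}
```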
Instead of always using std.testing.allocator, the test harness now follows the same logic as self-hosted for choosing an allocator - that is - it uses C allocator when linking libc, std.testing.allocator otherwise, and respects `-Dforce-gpa` to override the decision. I did this because I found GeneralPurposeAllocator to be prohibitively slow when doing multi-threading, even in the context of a debug build.

There is now a second thread pool which is used to spawn each test case. The stage2 tests are passed the first thread pool. If it were only multi-threading the stage1 tests then we could use the same thread pool for everything. However, the problem with this strategy with stage2 is that stage2 wants to spawn tasks and then call wait() on the main thread. If we use the same thread pool for everything, we get a deadlock because all the threads end up hanging at wait() and nothing is getting done. So we use our second thread pool to simulate a "process pool" of sorts.

I spent most of the time working on this commit scratching my head trying to figure out why I was getting ETXTBSY when spawning the test cases. Turns out it's a fundamental Unix design flaw, already a known, unsolved issue by Go and Java maintainers:

golang/go#22315
https://bugs.openjdk.org/browse/JDK-8068370

With this change, the following command, executed on my laptop, went from 6m24s to 1m44s:

```
stage1/bin/zig build test-cases -fqemu -fwasmtime -Denable-llvm
```

closes #11818
Relatively recent (sad) thread re: Linux O_CLOFORK: https://lore.kernel.org/lkml/20200525081626.GA16796@amd/T/
Thanks to reading the linux-kernel thread @spoerri mentions above, I see POSIX has added FD_CLOFORK and O_CLOFORK: https://www.austingroupbugs.net/view.php?id=1318
I see that some of the Linux kernel developers are pretty skeptical about the need for this; is anybody reading this issue able to point them to the problem described here? It's not a common problem, but it's not at all specific to Go. Thanks.
Hi Ian. Yes, I had a go yesterday by telling the linux-kernel list about this Go issue and the Java one, to show it isn't system(3)-specific and has a wide, long-standing impact. Matthew Wilcox, who implies he's a Googler, has replied so far: …

Perhaps the first thing is to get agreement that it's the kernel's issue to fix, and then move on to an implementation with a cost they find acceptable. At the moment, the kernel seems to be faulty but fast.

Edited to add link: https://lore.kernel.org/lkml/20200525081626.GA16796@amd/T/#m5b8b20ea6e4ac1eb3bc5353c150ff97b8053b727
There is a race that causes callout scripts to occasionally fail to execute during the test suite. This is caused by the fact that the test suite is run in multiple threads and the fact that callout scripts are copied into a temporary test environment directory. See golang/go#22315 for a thorough description of the root cause of a similar problem encountered by the 'go' compiler.

Fixes: mdevctl#64.
Signed-off-by: Jonathon Jongsma <jjongsma@redhat.com>
I observed what I believe is an equivalent race for pipes when investigating #36107. The race can cause a read from the pipe to block well past the intended child's exit. The mechanism is the same: the write end of the pipe, although opened O_CLOEXEC, leaks into the fork window of another thread's child and holds the pipe open until that child execs, so the reader does not see EOF.
This race can be observed by running …
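A hedged sketch of that pipe variant (illustrative, not the actual test code): the parent closes its write end, but a copy leaked into the fork window of a concurrent exec keeps the pipe open, so the read misses EOF until that unrelated child execs:

```go
package main

import (
	"io"
	"os"
	"os/exec"
)

func main() {
	// A concurrent forker: each of its children can inherit w in the
	// window between fork and exec.
	go func() {
		for {
			exec.Command("/bin/true").Run()
		}
	}()

	r, w, err := os.Pipe()
	if err != nil {
		panic(err)
	}
	cmd := exec.Command("/bin/true")
	cmd.Stdout = w
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	w.Close() // drop the parent's write end...
	// ...but this read can still block past cmd's exit while a leaked
	// copy of w sits in a not-yet-exec'ed child of the forker above.
	io.ReadAll(r)
	cmd.Wait()
}
```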
Change https://go.dev/cl/458015 mentions this issue.
Change https://go.dev/cl/458016 mentions this issue.
I made this test parallel in CL 439196, which exposed it to the fork/exec race condition described in #22315. The ETXTBSY errors from that race should resolve on their own, so we can simply retry the call to get past them.

Fixes #56811.
Updates #22315.

Change-Id: I2c6aa405bf3a1769d69cf08bf661a9e7f86440b4
Reviewed-on: https://go-review.googlesource.com/c/go/+/458016
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
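The retry pattern from that change looks roughly like this (a hedged sketch, not the literal CL code): because the racing child will shortly exec and drop the leaked descriptor, ETXTBSY from this particular race is transient and safe to retry:

```go
package main

import (
	"errors"
	"os/exec"
	"syscall"
	"time"
)

// runWithETXTBSYRetry retries a command while it fails with the
// transient ETXTBSY produced by the fork/exec race.
func runWithETXTBSYRetry(name string, args ...string) error {
	for delay := time.Millisecond; ; delay *= 2 {
		err := exec.Command(name, args...).Run()
		if err == nil || !errors.Is(err, syscall.ETXTBSY) || delay > time.Second {
			return err
		}
		time.Sleep(delay) // give the racing child time to reach exec
	}
}

func main() {
	if err := runWithETXTBSYRetry("/bin/true"); err != nil {
		panic(err)
	}
}
```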
- Use testenv.Command instead of exec.Command to try to get more useful timeout behavior.
- Parallelize tests that appear not to require global state. (And add explanatory comments for a few that are not parallelizable for subtle reasons.)
- Consolidate some “Helper” tests with their parent tests.
- Use t.TempDir instead of os.MkdirTemp when appropriate.
- Factor out subtests for repeated test helpers.

For #36107.
Updates #22315.

Change-Id: Ic24b6957094dcd40908a59f48e44c8993729222b
Reviewed-on: https://go-review.googlesource.com/c/go/+/458015
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
Modern Unix systems appear to have a fundamental design flaw in the interaction between multithreaded programs, fork+exec, and the prohibition on executing a program if that program is open for writing.
Below is a simple multithreaded C program. It creates 20 threads all doing the same thing: write an "exit 0" shell script to /var/tmp/fork-exec-N (for different N), and then fork and exec that script. Repeat ad infinitum. Note that the shell script fds are opened O_CLOEXEC, so that an fd being written by one thread does not leak into the fork+exec'ed shell script of a different thread.
On my Linux workstation, this program produces a never-ending stream of ETXTBSY errors. The problem is that O_CLOEXEC is not enough. The fd being written by one thread can leak into the forked child of a second thread, and it stays there until that child calls exec. If the first thread closes the fd and calls exec before the second thread's child does exec, then the first thread's exec will get ETXTBSY, because somewhere in the system (specifically, in the child of the second thread), there is an fd still open for writing the first thread's shell script, and according to modern Unix rules, one must not exec a program if there exists any fd anywhere open for writing that program.
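The same interleaving can surface from Go, where the standard library already opens everything close-on-exec — a hedged sketch of a reproducer along the lines of the C program (path and thread count are illustrative):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	for n := 0; n < 20; n++ {
		go func(n int) {
			path := fmt.Sprintf("/var/tmp/fork-exec-%d", n)
			for {
				// The fd is opened O_CLOEXEC by the os package, yet it can
				// still leak into the fork window of a sibling goroutine's
				// exec and hold the script "open for write" there.
				if err := os.WriteFile(path, []byte("#!/bin/sh\nexit 0\n"), 0o755); err != nil {
					panic(err)
				}
				if err := exec.Command(path).Run(); err != nil {
					fmt.Println(err) // occasionally: "text file busy"
				}
			}
		}(n)
	}
	select {} // run the racing goroutines forever
}
```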
Five years ago this bit us because cmd/go installed cmd/cgo (that is, copied the binary from a temporary location to somewhere/bin/cgo) and then executed it. To fix this we put a sleep+retry loop around the fork+exec of cgo when it gets ETXTBSY. Now (as of last week or so) we don't ever install cmd/cgo and execute it in the same cmd/go process, so that specific race is gone, although as I write this cmd/go still has the sleep+retry loop, which I intend to remove.
Last week this bit us again because cmd/go updated a build stamp in the binary, closed it, and executed it. The resulting flaky ETXTBSY failures were reported as #22220. A pending CL fixes this by not updating the build stamp in temporary binaries, which are the main ones we execute. There's still one case where we write+execute a program, which is `go test -cpuprofile x.prof pkg`. The cpuprofile flag (and a few others) cause cmd/go to leave the pkg.test in the current directory for debugging purposes but also run the test. Luckily running the test is currently the final thing cmd/go does, and it waits for any other fork+exec'ed programs to finish before fork+exec'ing the test. So the race cannot happen in this case.

In general this race is going to happen every time anyone writes a program that both writes and executes a program. It's easy to imagine other build systems running into this, but also programs that do things like unzip a zip file and then run a program inside it - think a program supervisor or mini container runtime. As soon as there are multiple threads doing fork+exec at the same time, and one of them is doing fork+exec of a program that was previously open for write in the same process, you have a mysterious flaky problem.
It seems like maybe Go should take care of this, if possible. We've now hit it twice in cmd/go, five years apart, and at least this past time it took the better part of a day to figure out. (I don't remember how long it took five years ago, in part because I don't remember anything about discovering it five years ago. I also don't want to rediscover all this five years from now.)
There are a few hacks we could use:

- Sleep and retry (for up to a second, say) when fork+exec returns ETXTBSY, on the theory that the racing child will exec soon and release the leaked fd.
- Track the highest fd the process has opened and have every forked child close all unexpected descriptors before exec, serializing fork against open/close with a lock.
- Persuade kernels to add O_CLOFORK (or a close-all-fds-above syscall) so write fds never leak into other threads' children in the first place, which only helps in the long run.
None of these seem great. The ETXTBSY sleep, up to 1 second, might be the best option. It would certainly reduce the flake rate and in many cases would probably make it undetectable. It would not help exec of very slow-to-load programs, but that's not the common case.
I wondered how Java deals with this, and the answer seems to be that Java doesn't deal with this. https://bugs.openjdk.java.net/browse/JDK-8068370 was filed in 2014 and is still open.