Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: tests break testing package if testlog file is not file descriptor 3 #24285

Open
siebenmann opened this issue Mar 7, 2018 · 17 comments

Comments

Projects
None yet
8 participants
@siebenmann
Copy link

commented Mar 7, 2018

What version of Go are you using (go version)?

go version devel +2c0c68d621 Tue Mar 6 23:33:28 2018 +0000 linux/amd64

(This is the current git tip as I write this.)

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOOS="linux"

What did you do?

On one of my slower machines I'm getting frequent but not completely consistent test failures in the test phase of all.bash during testing of os.exec. This reproduces when testing just os/exec on its own:

$ go test os/exec
PASS
testing: can't write /tmp/go-build685356926/b001/testlog.txt: write /tmp/go-build685356926/b001/testlog.txt: broken pipe
FAIL    os/exec 0.400s

This failure happens on an i5-2500 (running Fedora 27). A Fedora 27 Ryzen 1800X machine consistently passes. This only started happening on recent versions of master (initially I wrote it off as flakiness on my machine).

It's possible that this is related to #24050.

@andybons

This comment has been minimized.

Copy link
Member

commented Mar 7, 2018

@andybons andybons added this to the Go1.11 milestone Mar 7, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 7, 2018

This is a strange error. The file is an ordinary file, not a pipe. Why would a write fail with EPIPE? I don't see any pipe here.

Can you run running go test -c os/exec and then ./exec.test -test.short -test.testlogfile=/tmp/x? If you can recreate the problem that way, please attach the output of strace -f ./exec.test -test.short -test.testlogfile=/tmp/x. Thanks.

@siebenmann

This comment has been minimized.

Copy link
Author

commented Mar 7, 2018

I don't know if I'm reproducing it, but I'm having something unusual happen:

$ ./exec.test -test.short -test.testlogfile=/tmp/x
--- FAIL: TestCatGoodAndBadFile (0.00s)
        exec_test.go:141: expected test code; got "Error: open exec_test.go: no such file or directory\n" (len 52)
FAIL
testing: can't write /tmp/x: write /tmp/x: bad file descriptor

Here is the strace output: strace-out.txt.

@siebenmann

This comment has been minimized.

Copy link
Author

commented Mar 7, 2018

Okay, if I generate and run exec.test from src/os/exec, it reports that it passes and then has the testing: error. Here's the strace output for that all-passing case, in case it makes a difference:
strace-out.txt

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 7, 2018

Thanks. Judging by the strace output, file descriptor 4 is already open in your environment when the process starts. The test does not expect that. Are you aware of anything that would cause the process to be started with file descriptor already open?

@siebenmann

This comment has been minimized.

Copy link
Author

commented Mar 7, 2018

I'm not aware of anything in my environment that would cause file descriptor 4 to already be open, but the next time I have this test failure I'll see what lsof says about my shell's file descriptors and so on.

(However, looking at the strace output appears to show some use of file descriptor 4 when opening things, which seems odd if it's in use when the test run starts. There are quite a lot of uses of openat that return fd 4, along with eg socket and fcntl being used to duplicate fds.)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 7, 2018

Those cases where openat or socket return descriptor 4 are after a long string of close calls that close all open file descriptors starting at 4. That string of close calls is from closeUnexpectedFds in os/exec/exec_test.go. Before that string of close calls, calls to openat, etc., are returning 5, not 4.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 7, 2018

(I've been focusing on file descriptor 4 for no good reason, but it seems that file descriptor 3 is also unexpectedly open.)

@siebenmann

This comment has been minimized.

Copy link
Author

commented Mar 7, 2018

I've now discovered what I do that causes the unexpectedly opened file descriptors, including the pipe. My sequence is:

  1. git pull to update to the latest master
  2. git log <a>...<b> to read the changes, which starts less as my pager.
  3. inside less, use !xterm & to start a new xterm where I will run a build while I'm reading the changes.
  4. this creates a shell inside the new xterm that has inherited already opened file descriptors from less; it appears that fd 3 is /dev/tty and fd 4 is the pipe from git.
  5. start building with all.bash.

It doesn't appear to matter if the pipe has been closed by git by the time the Go build reaches testing os/exec; either way things fail.

This doesn't happen consistently because I don't always start a new xterm from inside the less. Sometimes I read everything before starting the build, and sometimes I start a new xterm from my window manager (doing that is slightly less convenient because I then have to cd to the Go source directory, which is why I often start it from inside less).

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Mar 7, 2018

I don't see a clean way to avoid this problem. Fundamentally we want to check that the file descriptor of the testlog file is 3, but I don't see a good way to find that out.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.11, Unplanned Mar 7, 2018

@gopherbot gopherbot removed the NeedsFix label Mar 7, 2018

@ianlancetaylor ianlancetaylor changed the title testing: overall test failure in os/exec with a broken pipe error os/exec: tests break testing package if testlog file is not file descriptor 3 Mar 7, 2018

NexediGitlab pushed a commit to SlapOS/slapos.core that referenced this issue Apr 4, 2018

slapgrid: Do not leak file descriptors to subprocesses when running '…
…node software'


Some context: I'm trying to build Go 1.10 under slapos and get failures
like:

	ok   os      0.782s
	PASS
	testing: can't write /tmp/go-build511454695/b682/testlog.txt: write /tmp/go-build511454695/b682/testlog.txt: broken pipe
	FAIL os/exec 0.947s
	...
	ok   cmd/vet/internal/cfg    0.002s
	2018/04/03 18:13:29 Failed: exit status 1
	golang1.10: Command failed with exit code 1:   cd src && ls -l /proc/self/fd && ./all.bash && cp -alf .. /srv/slapgrid/slappart5/srv/runner/software/099a7368550e95d120639cb03b6d47c8/parts/golang1.10
	golang1.10: Compilation error. The package is left as is at /srv/slapgrid/slappart5/srv/runner/software/099a7368550e95d120639cb03b6d47c8/parts/golang1.10__compile__/go where you can inspect what went wrong
	While:
	  Installing golang1.10.
	Error: System error

This was traced to golang/go#24285, which in turn was
traced by me to the fact that golang os/exec tests close all file descriptors >
4 assuming that only stdin, stdout, stderr, epoll and testlog file descriptors
are there:

	https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L402
	https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L415

However when go build is run under slapos it starts with the following
descriptors inherited in the environment:

	lr-x------ 1 slapuser5 slapuser5 64 Apr  3 17:57 0 -> pipe:[2969903608]
	l-wx------ 1 slapuser5 slapuser5 64 Apr  3 17:57 1 -> pipe:[2969903609]
	l-wx------ 1 slapuser5 slapuser5 64 Apr  3 17:57 2 -> pipe:[2969903609]
	l-wx------ 1 slapuser5 slapuser5 64 Apr  3 17:57 3 -> /srv/slapgrid/slappart5/srv/runner/software.log

i.e. the file descriptor for software.log (under `slapos node software`) is
passed opened to spawned buildout commands -> and oops - go build failure.

I'm not saying go behaviour is correct. However for slapos it is also not good
to leak e.g. log file descriptors to spawned processes (e.g. what if spawned
child writes there by mistake?).

Since fixing on Go side is not very easy, because it will probably require
non-small testing infrastructure refactoring, I decided to fix on SlapOS side,
because it is a) easier, and b) correct & good to do in any way.

----

To fix we just pass close_fds=True to subprocess.Popen run from under
SlapPopen, which means that popen will close all file descriptors besides
stdin, stdout & stderr for child. close_fds default is already True for python3

	https://www.python.org/dev/peps/pep-0433/#subprocess-close

for the reason that it is frequent mistake for programmers to not leak file
descriptors to spawned processes.

However on python2, which we are still using, close_fds default is False.

So let's fix the default on our side.

/reviewed-by @jerome, @rafael
/reviewed-on https://lab.nexedi.com/nexedi/slapos.core/merge_requests/42
@navytux

This comment has been minimized.

Copy link
Contributor

commented Apr 4, 2018

For the reference I'm also having this problem when building Go1.10 under SlapOS: the builder was (wrongly) leaking opened descriptor to log file and this is causing the same

testing: can't write /tmp/go-build685356926/b001/testlog.txt: write /tmp/go-build685356926/b001/testlog.txt: broken pipe

failure.

Things are now fixed on SlapOS side, but it will take time for the fix to propagate everywhere.

On the other hand for the Go side maybe it makes more sense to not assume fixed fd number for the testlog.txt file and do things similarly to how os/exec test is already checking for file descriptor used by runtime for polling:

func closeUnexpectedFds(t *testing.T, m string) {
        for fd := basefds(); fd <= 101; fd++ {
                if fd == poll.PollDescriptor() {          // <- NOTE here
                        continue
                }
                err := os.NewFile(fd, "").Close()
                if err == nil {
                        t.Logf("%s: Something already leaked - closed fd %d", m, fd)
                }
        }
}

(https://github.com/golang/go/blob/26e0e8a840/src/os/exec/exec_test.go#L415)

because after all it is a regression compared to Go1.9 which did not had this problem.

Thanks beforehand,
Kirill

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Apr 12, 2018

Unfortunately I don't think that approach will work, as the test will still fail. It will fail because the child process will see the testlog argument but will not have interpreted it so the actual testlog file will not be known. And it won't work because it will still see an unexpectedly open file descriptor, although at least in that case the error message will be more clear.

@ar4mirez

This comment has been minimized.

Copy link

commented May 20, 2018

This same issue happen to me and I found that its related with the terminal, for example I was using hyper terminal and once I tried in iterm2 and the regular mac terminal worked.

I hope this helps, cheers.

@jefferai

This comment has been minimized.

Copy link

commented Aug 31, 2018

Seeing this issue on an Ubuntu 18.04 machine (using Konsole, FWIW) when running the 1.11 all.bash file.

@dhobsd

This comment has been minimized.

Copy link
Contributor

commented Feb 8, 2019

This is breaking our internal builders for our host toolchain for Fuchsia as well. Our CI/CD hooks up an extra pipe into the subprocess it calls, and there's not a lovely workaround for this.

@ianlancetaylor What if this test has an init function that runs on systems that provide /proc/self/fd and use the largest fd number returned from there? And then basefds returns that value instead of its current dance if that value is nonzero?

@dhobsd

This comment has been minimized.

Copy link
Contributor

commented Feb 8, 2019

(Or alternatively just closed fd 3 in init.)

@dhobsd

This comment has been minimized.

Copy link
Contributor

commented Feb 8, 2019

Oh, right, none of that works, ignore me. I'll see what I can do about closing the descriptor before running run.bash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.