Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: temporary directories left behind on windows-amd64-2012 builder #30789

Closed
bcmills opened this issue Mar 12, 2019 · 14 comments

Comments

Projects
None yet
4 participants
@bcmills
Copy link
Member

commented Mar 12, 2019

The logs for the windows-amd64-2012 builder (and only that builder, as far as I can tell — CC @alexbrainman) show many flakes at the end of the cmd/go tests, of the form:

2019/03/12 20:59:55 unexpected files left in tmpdir: [go-build421361794 go-build697015714]

(https://build.golang.org/log/177e49c04af866fec137d316334a80a1a5137376)

The test otherwise seems to pass.

That log line comes from here (CC @ianlancetaylor):

log.Fatalf("unexpected files left in tmpdir: %v", names)

The fact that the temporary directories start with the prefix go-build suggests that they come from here:

tmp, err := ioutil.TempDir(os.Getenv("GOTMPDIR"), "go-build")

I'm not sure what to make of the fact that those flakes are specific to the one builder. Perhaps there is a kernel quirk on that specific version of Windows that causes it not to clean up completely on exit?

(CC @jayconrod)

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Mar 13, 2019

The logs for the windows-amd64-2012 builder (and only that builder

Strange. I cannot explain why windows-amd64-2012 builder is different from others.

The fact that the temporary directories start with the prefix go-build suggests that they come from here:

I did not debug this failure. But, generally, you cannot delete directory, if some process have open files in that directory. Same if some process current directory is set to directory being deleted or any directory under.

I suppose someone needs to reproduce this, and then somehow debug this. Once we know how it happens, we might understand better why only windows-amd64-2012 have the problem.

Alex

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 5, 2019

I spent some time debugging this with gomote and logging in cmd/go TestMan & temporary hooks/logs in io/ioutil.TempDir.

It seems that whatever is making those directories that don't get deleted do NOT come from the same process that runs the test (running the TestMain). I see tons of temp dirs get created from the cmd/go (TestMain) process, but not the ones that cause failures, so I suspect it's a child process (where my logging hook isn't installed) that is creating the directory.

Time for more logging, I guess.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Apr 5, 2019

I spent some time debugging this with gomote

I also would not be surprised that this bug is not Go related.

Maybe antivirus is running (or some other Windows program, for example, something that indexes files on disk) and locking our executables at random times. If windows-amd64-2012 computer is configured in a slightly different way than others, that would explain why we see this problem on windows-amd64-2012 only.

Alex

@bcmills bcmills self-assigned this Apr 11, 2019

@gopherbot

This comment has been minimized.

Copy link

commented Apr 11, 2019

Change https://golang.org/cl/171764 mentions this issue: cmd/go: log failures to remove workdir

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Apr 11, 2019

I'm getting tired of builder flakes. Time to find this and either fix it or work around it.

gopherbot pushed a commit that referenced this issue Apr 11, 2019

cmd/go: log failures to remove workdir
Failures here don't otherwise affect the build, but they do cause a
slow file leak in the user's temp directory. The user deserves at
least a cursory warning that something may be amiss.

Updates #30789

Change-Id: Id0e72b1967e7f7c88cdc94d532554496653f264b
Reviewed-on: https://go-review.googlesource.com/c/go/+/171764
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Jay Conrod <jayconrod@google.com>
@alexbrainman

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

I'm getting tired of builder flakes. Time to find this and either fix it or work around it.

I did some exercise #25965 (comment) maybe related to this issue. And I can see vastly different results between my Windows 7 and Windows 10 computers. Maybe similar effect can explain what we see with windows-amd64-2012 builder.

Alex

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

With Bryan's latest debugging CL, they log Access is denied now:

https://build.golang.org/log/47129350051d5b9c0600f9421c2680d7610bd4cd

ok  	unicode/utf8	0.035s
ok  	cmd/addr2line	1.594s
ok  	cmd/api	0.063s
ok  	cmd/asm/internal/asm	0.383s
ok  	cmd/asm/internal/lex	0.023s
ok  	cmd/compile	0.026s
go: failed to remove work dir: remove C:\Users\gopher\AppData\Local\Temp\1\go-build328544146\b001\compile.test.exe: Access is denied.
ok  	cmd/compile/internal/gc	13.031s
ok  	cmd/compile/internal/ssa	0.380s
ok  	cmd/compile/internal/syntax	0.039s
ok  	cmd/compile/internal/test	0.060s [no tests to run]
ok  	cmd/compile/internal/types	0.043s
ok  	cmd/cover	2.564s
ok  	cmd/doc	0.109s
ok  	cmd/fix	7.416s
go test proxy starting
go proxy_test: invalid module path encoding "example.com/invalidpath/v1": malformed module path "example.com/invalidpath/v1": invalid version
go test proxy running at GOPROXY=http://127.0.0.1:50896/mod
go proxy: no archive golang.org/x/text/language 14c0d48
go proxy: no archive golang.org/x/text/language 14c0d48
go proxy: no archive golang.org/x/text/language 14c0d48
go proxy: no archive golang.org/x/text/foo 14c0d48
go proxy: no archive golang.org/x 14c0d48
go proxy: no archive golang.org 14c0d48
go proxy: no archive rsc.io v1.5.0
go proxy: no archive example.com/split/subpkg v1.0.0
go proxy: no archive this.domain.is.invalid/somemodule v1.0.0
go proxy: no archive this.domain.is.invalid/somemodule v1.0.0
PASS
2019/04/11 22:08:38 unexpected files left in tmpdir: [go-build125570794 go-build372752938 go-build463765546 go-build666810130]
FAIL	cmd/go	181.572s
2019/04/11 22:08:39 Failed: exit status 1
@bcmills

This comment has been minimized.

Copy link
Member Author

commented Apr 12, 2019

So perhaps we should check for the Access is denied error code specifically and retry (a few times or for a short while in a loop) before we give up and return with the directory leaked.

We probably don't want to stall forever if the user has a really slow antivirus installed or something, but a wait of ~500ms or so seems reasonable to me. (Any opinions from the other folks following?)

@bradfitz

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

Retrying for a bit on Access is denied SGTM. Another option might be to only do it if we're in a test, but that might be surprising.

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

Retrying for a bit on Access is denied SGTM.

I second that.

wait of ~500ms or so seems reasonable to me

I agree. My experiment #25965 (comment) has -sleep parameter that puts sleep between process exiting and executable deleting, and having -sleep=500ms makes my test pass.

Unrelated, but should we also remove

defer time.Sleep(5 * time.Millisecond)

from windows version of (Process).wait now that it serves no purpose (see #25965 for details)? I find that code is mostly not needed on Windows 10. That code is only needed if you are trying to delete executable immediately after its execution on older Windows version.

Alex

@alexbrainman

This comment has been minimized.

Copy link
Member

commented Apr 12, 2019

but should we also remove

defer time.Sleep(5 * time.Millisecond)

Forget that suggestion - I am getting exited easily now days. We still have plenty of tests that build executable in a temp directory, run the executable and then delete the directory. These all will fail without time.Sleep. Probably many other developers have tests like that too.

Alex

@gopherbot

This comment has been minimized.

Copy link

commented Apr 16, 2019

Change https://golang.org/cl/172337 mentions this issue: cmd/go: retry RemoveAll(workdir) for up to 500ms

gopherbot pushed a commit that referenced this issue Apr 16, 2019

cmd/go: retry RemoveAll(workdir) for up to 500ms
On some configurations of Windows, directories containing executable
files may be locked for a while after the executable exits (perhaps
due to antivirus scans?). It's probably worth a little extra latency
on exit to avoid filling up the user's temporary directory with leaked
files.

Updates #30789

Change-Id: Iae7fcdd07fb9ecfb05967cfe0c8833db646d2f85
Reviewed-on: https://go-review.googlesource.com/c/go/+/172337
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bcmills

This comment has been minimized.

Copy link
Member Author

commented Apr 16, 2019

So far so good. Will keep an eye on the builders today to verify that this is resolved.

@bcmills bcmills closed this Apr 16, 2019

@bcmills

This comment has been minimized.

Copy link
Member Author

commented Apr 16, 2019

Five oks in a row. I'm calling it fixed. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.