New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: cache not invalidated if testdata files are changed while the test is running #26790

Open
bcmills opened this Issue Aug 3, 2018 · 5 comments

Comments

Projects
None yet
2 participants
@bcmills
Member

bcmills commented Aug 3, 2018

I was editing cmd/go/testdata/mod_test.txt for #26722, and saw an unexpected (cached) test result.

I run tests pretty aggressively while editing, so I suspect that I saved the file while a previous test run was in progress. We appear to have cached the result based on the contents of the testdata directory after the run completed, not the contents at the start of the run.

~/go/src$ go test cmd/go -run=TestScript
ok      cmd/go  (cached)

~/go/src$ GODEBUG=gocacheverify=1 go test cmd/go -run=TestScript
go test proxy starting
go test proxy running at GOPROXY=http://127.0.0.1:38657/mod
go proxy: no archive w.1 v1.2.0
go proxy: no archive x.1 v1.0.0
go proxy: no archive z.1 v1.2.0
go proxy: no archive golang.org/x/text/language 14c0d48
go proxy: no archive golang.org/x/text/language 14c0d48
go proxy: no archive golang.org/x/text/language 14c0d48
go proxy: no archive golang.org/x/text/foo 14c0d48
go proxy: no archive sub.1 v1.0.0
go proxy: no archive badsub.1 v1.0.0
go proxy: no archive versioned.1 v1.0.0
go proxy: no archive versioned.1 v1.1.0
--- FAIL: TestScript (0.00s)
    --- FAIL: TestScript/mod_test (1.28s)
        script_test.go:150:
            # A test in the module's root package should work. (1.107s)
            # A test with the "_test" suffix in the module root should also work. (0.171s)
            > cd ../b/
            $WORK/gopath/src/b
            > go test
            [stderr]
            build github.com/user/b_test: cannot find module for path github.com/user/b_test
            [exit status 1]
            FAIL: testdata/script/mod_test.txt:10: unexpected command failure

FAIL
FAIL    cmd/go  9.231s

~/go/src$ go test cmd/go -run=TestScript
ok      cmd/go  (cached)

~/go/src$ go version
go version devel +fc1a18c452 Fri Aug 3 11:36:32 2018 -0400 linux/amd64

@bcmills bcmills added this to the Go1.12 milestone Aug 3, 2018

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Aug 3, 2018

That is indeed how it works. We log the files that the test opens as the test runs. Then when the test is complete we go back and read those files while building the cache. See runCache.saveOutput in cmd/go/internal/test/test.go.

This approach works well because all we have to do is log what the test actually does, which we do by, e.g., having os.OpenFile call testlog.Open. In order to avoid the problem you describe, I think we would have to not just call testlog.Open, but have the test actually read and hash the contents of the file. I am concerned that adding these extra operations would perturb the behavior of some tests.

@bcmills

This comment has been minimized.

Member

bcmills commented Aug 3, 2018

@heschik suggested that we could hash the files both before and after the test, and refuse to cache the results if the hash changed during the run.

Ideally, though, we only want to pre-hash the files that will actually be accessed. We could speculate on that set based on previous cached results, but if we missed an addition we'd need to re-run the test to be sure.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Aug 3, 2018

I don't understand how we can hash the files before the test when we don't yet know which files will be opened. For example, the tests in image/gif read files in image/testdata, not image/gif/testdata.

I suppose we could base it on previous test runs, but then we don't catch this problem if this is the first time we're running the test. If we're going for a complex solution it would be nice to have a complete one.

@bcmills

This comment has been minimized.

Member

bcmills commented Aug 9, 2018

I don't understand how we can hash the files before the test when we don't yet know which files will be opened.

I've been thinking about this a bit more, and it seems to me that the solution is pretty straightforward. We can add another entry to the cache for “the set of input file and directory names”. We would read that entry and hash the files and directories in the list, then run the test, and finally hash the files after the run and compare both the list and the hashes.

If the actual list of files accessed is not (a subset of) the cached list, we can cache only the updated set of filenames and not the actual test results. We don't know whether the test results themselves are up-to-date, and don't particularly need to care, because we can always just run the test again the next time it is requested — and, assuming that it is reasonably deterministic, we'll be able to cache that result.

Under that scheme, we'll miss the cache for the first two runs of any new test that accesses external files, but that doesn't seem like that big a deal: it only happens ~once per user per test, and many tests don't read external files anyway.

If it's important to cache the very first run, I suppose we could add some mechanism for users to add the set of test inputs to source control (something along the lines of .d files), but I'd rather not add that unless we really need it.

@bcmills

This comment has been minimized.

Member

bcmills commented Oct 25, 2018

@rsc suggests that we could record the timestamp at the start of the test, then check mtimes when we hash file contents: if the file was modified at or after the start of the test, then we probably shouldn't cache it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment