New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/go: test cache hash inputs include ModTime, often causing cache misses on CI #58571
Comments
isn't this sort of necessary if go doesn't track the contents of external files? |
@mvdan Would it be possible for you to first run some script that sets the mtime of the relevant files to the last commit time of that file? |
see also #22593 |
@seankhliao You're right that any opened files don't currently get their content tracked in the test cache inputs:
However, I think that the logic is somewhat flawed and could be improved. For example, we're already getting the file information, so if a file is small (say, less than 256KiB), we could read and hash its contents rather than using its modified time for the cache input. This would automatically solve almost all
I could, yes, but having to do that for every project is quite a lot of work. Plus, I didn't even notice that CI on multiple of my projects was slow due to this bug for years. I think it's unfortunate for Go's test cache to not work well on CI systems by default. There's also the problem with scripting CI in general. The majority of these projects test on Linux, Mac, and Windows. Even writing a POSIX Shell or Bash script to run on all of these is rather tricky. |
It looks like there is a third-party |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes, verified with Go 1.20.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
Many of the projects I contribute to use
go test ./...
on GitHub Actions. A majority of those also use GitHub's https://github.com/actions/cache to persist Go'sGOCACHE
andGOMODCACHE
, so that consecutive CI runs should avoid much of the work if little code has changed.This works for the most part, but I've noticed that commits which do not change any code at all (e.g. only modify the README) only show some packages as
(cached)
, whereas other packages are always run from scratch. For example, see cue-lang/cue#2253.After investigating quite a bit with
GODEBUG=gocachehash=1
and itstestInputs
hashes, and reading throughsrc/cmd/go/internal/test/test.go
, I think the cause is modified times. Whenever a test opens a file, Go's automatic test caching tracks that file's path, file info, and contents to know whether it should re-run the test in the future. The file info includes the modified time:Including the modified time may be needed for the test cache's own purposes, and in general it's not a problem when continuously developing on a machine. However, CI is a bit special: it's quite common to run in a temporary environment and start by fetching the code. In particular, the way GitHub Actions is used as CI tends to start with https://github.com/actions/checkout, which uses
git clone
to download the repository to a path like/home/runner/work/foobar/foobar
. Note that the clone directory location does not change between runs; I first suspected that was the problem, but it is not.The problem is that
git
does not store modified times in any way, andgit clone
creates new files when it's checking out a repository, so the cloned files end up having the current time as their modified time.As a reproducer and demo, I've written https://github.com/mvdan/nowt/tree/master/test-cache-open-modtime. It contains a single Go test which opens a testdata file, and a script which reproduces the situation:
What did you expect to see?
I expected modified times to not affect Go's test caching hit vs miss decisions. Particularly since Go largely allows its developers to not be aware of modified times for incremental builds to work well, unlike other build systems like
make
.What did you see instead?
The way most people use CI on popular platforms like GitHub Actions causes unnecessary Go test cache misses, and it's caused by modified times.
I could fix this for my projects by following each
actions/checkout
step with a script to walk all cloned files and reset their modified times to a static timestamp. However, that feels very unfortunate, and I imagine many Go users might not even notice that their CI is slower than it should be to begin with.We could also try to convince third parties like
git clone
oractions/checkout
to leave reproducible modified times on files, but I don't think that is likely to happen.The text was updated successfully, but these errors were encountered: