Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/go: 'go test' fails to cache test result due to modified .git directory #52573

Open
matthewmueller opened this issue Apr 26, 2022 · 7 comments
Open
Labels
WaitingForInfo

Comments

@matthewmueller
Copy link

@matthewmueller matthewmueller commented Apr 26, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/m/Library/Caches/go-build"
GOENV="/Users/m/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/m/dev/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/m/dev"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.18"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/m/dev/src/github.com/matthewmueller/hack/test-cache/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/4f/tcxcr6_55v9bp38d8g4hjlf80000gn/T/go-build1711552986=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

This took me awhile to whittle down. My tests were running unbearably slow and never seeming to cache. I'm still investigating the slowness (this issue describes the symptoms), but I'm able to reproduce the caching issue:

1. Create the following directory tree:

.
└── one_test.go

With the following one_test.go:

package test

import (
	"os"
	"testing"
)

func TestUncacheable(t *testing.T) {
	_, err := os.ReadDir(".")
	if err != nil {
		t.Fatal(err)
	}
}
  1. Run go mod init
  2. Run git init
  3. Run go test ./one_test.go multiple times
$ go test ./one_test.go 
ok      command-line-arguments  0.094s
$ go test ./one_test.go 
ok      command-line-arguments  0.093s

What did you expect to see?

I expected to see it cache after the second run:

$ go test ./one_test.go 
ok      command-line-arguments  (cached)

What did you see instead?

It never caches. If you remove .git it will start caching again. Oddly adding other types of directories like a/b or .a/b still cache. It seems there's something special with .git.

Why do you need this?

My test suite needs to lookup the go.mod of the current project directory. With this logic in place, my tests never cache.

@matthewmueller
Copy link
Author

@matthewmueller matthewmueller commented Apr 26, 2022

Narrowing it down a bit further, I ran GODEBUG=gocachetest=1 go test ./one_test.go -run "TestUncacheable", then removed the irrelevant logs.

Here's a diff between runs. The modtime changes everytime.

  HASH[open]: "go1.18"
  HASH[open]: "stat 160 800001ed 2022-04-26 12:07:04.587272446 -0500 CDT true\n"
  HASH[open]: "file .git "
- HASH[open]: "stat 288 800001ed 2022-04-26 12:07:22.744082806 -0500 CDT true\n"
+ HASH[open]: "stat 288 800001ed 2022-04-26 12:07:42.335762299 -0500 CDT true\n"
  HASH[open]: "file go.mod "
  HASH[open]: "stat 58 1a4 2022-04-26 11:34:12.067721337 -0500 CDT false\n"
  HASH[open]: "file one_test.go "
  HASH[open]: "stat 144 1a4 2022-04-26 11:53:56.630154997 -0500 CDT false\n"
- HASH[open]: 8e895090d73fda1474da8cd9819d3196677d2ffd2cc3e6f7a401b2c6cc09abf4
- HASH[testInputs]: "open /Users/m/dev/src/github.com/matthewmueller/hack/test-cache 8e895090d73fda1474da8cd9819d3196677d2ffd2cc3e6f7a401b2c6cc09abf4\n"
- HASH[testInputs]: 397c0fa5c27c402f4b5be6f2ce32780bbb718ae956556f493dc111fb38089581
+ HASH[open]: 53356a05f8ddd1917cffb837b75f5de58859b18675069e571929e60b2ff7aa5e
+ HASH[testInputs]: "open /Users/m/dev/src/github.com/matthewmueller/hack/test-cache 53356a05f8ddd1917cffb837b75f5de58859b18675069e571929e60b2ff7aa5e\n"
+ HASH[testInputs]: c43d67ffb3fff2baa47ae3a2caef13367214dcf1f3217124453bc035daa4db87

It looks like the logic is in here:

fmt.Fprintf(h, "file %s ", f.Name())

@seankhliao
Copy link
Contributor

@seankhliao seankhliao commented Apr 26, 2022

can you try -buildvcs=false and/or 1.18.1?

@matthewmueller
Copy link
Author

@matthewmueller matthewmueller commented Apr 26, 2022

Update: could be related to my ~/.zshrc config since I'm not able to reproduce this in /bin/sh or in docker

Edit: thanks for the message @seankhliao, I'll try that next if my current lead doesn't pan out.

@matthewmueller
Copy link
Author

@matthewmueller matthewmueller commented Apr 26, 2022

Welp, that was a tough one, it was related to my git prompt.

# Add git to the terminal prompt
git_prompt() {
  # Don't go any further if we're not in a git repo
#  git rev-parse --is-inside-work-tree &> /dev/null || return

  # Stylize
  echo -n "$color_red"
#  echo -n "$(git_branch)"
  echo -n "$color_reset"
  echo -n "$color_purple"
#  echo -n "$(git_progress)"
  echo -n "$color_reset"
  echo -n "$color_yellow"
  echo -n "$(git_dirty)"
  echo -n "$color_reset"
}


# fastest possible way to check if repo is dirty
git_dirty() {
  # If the git status has *any* changes (e.g. dirty), echo our character
  if [[ -n "$(git status --porcelain 2> /dev/null)" ]]; then
    echo " !"
  fi
}

It appears that git status --porcelain changes the modtime on the .git directory.

Thanks for suggestion @seankhliao!

@matthewmueller
Copy link
Author

@matthewmueller matthewmueller commented Apr 27, 2022

I'd actually like to re-open this one, but perhaps more like a feature request. Why is go test looking at .git in the first place?

I might be missing something but I'd expect go test to cache to work something like this:

"Walk the Go files starting from the test, then its imports, imports of imports, hashing all the files it encounters. Now that we have embedded files, we can discover those as well".

A similar solution is how go mod tidy expects to find dependencies that it can reach. If you've got tools, you need to manually add link them into the dependency tree.

I can share more code but something here's the recursive function to hopefully show what I mean:

func findDeps(fset *fileSet, module *gomod.Module, dir string) (err error) {
	imported, err := build.Default.Import(".", dir, build.ImportMode(0))
	if err != nil {
		return err
	}
	// Add all the Go files
	for _, path := range imported.GoFiles {
		fset.Add(filepath.Join(dir, path))
	}
	// Add all the embeds
        // TODO: resolve patterns
	for _, path := range imported.EmbedPatterns {
		fset.Add(filepath.Join(dir, path))
	}
	// Traverse imports and compute a hash
	eg := new(errgroup.Group)
	for _, importPath := range imported.Imports {
		importPath := importPath
		eg.Go(func() error {
			if !shouldWalk(module, importPath) {
				return nil
			}
			dir, err := module.ResolveDirectory(importPath)
			if err != nil {
				return err
			}
			relPath, err := filepath.Rel(module.Directory(), dir)
			if err != nil {
				return err
			}
			if err := findDeps(fset, module, relPath); err != nil {
				return err
			}
			return nil
		})
	}
	return eg.Wait()
}

Feel free to close again or I can open a different issue. Just after spending some time debugging the test cache, it feels like it could be a lot faster.

@bcmills
Copy link
Member

@bcmills bcmills commented Apr 28, 2022

The stat of .git is AFAICT a direct consequence of the test calling os.ReadDir:
https://cs.opensource.google/go/go/+/master:src/os/dir_unix.go;l=128;drc=master

The git prompt may be #51767, which should be fixed in Go 1.18.1.

@matthewmueller, does that explain the behavior you were seeing? Is there anything left to be done here?

@bcmills bcmills added the WaitingForInfo label Apr 28, 2022
@bcmills bcmills changed the title go/test: Uncacheable Test cmd/go: 'go test' fails to cache test result due to modified .git directory Apr 28, 2022
@matthewmueller
Copy link
Author

@matthewmueller matthewmueller commented Apr 29, 2022

Thanks for the response @bcmills!

Yep, I eventually learned that os.ReadDir(".") is stating .git. I also learned that .git gets modified every time you run git status.

The reason I re-opened this is because I still wonder:

Why does go test care about the modtime of .git (or any non-dependency) in its caching decision? I feel like the Go team took a very pragmatic approach with go mod tidy, where it discovers all dependencies by traversing imports. Any unknown tool needs to be explicitly depended upon in a tools.go. Would it make sense to do the same with go test? I understand that go test caching came before go modules, but is it worth proposing this as a future improvement?

While the above seems wasteful, it isn't my top-level problem. My top-level problem is that by considering all these files in the caching decision, the setup and tear down of these tests take longer than the test itself. I commented on that in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WaitingForInfo
Projects
None yet
Development

No branches or pull requests

3 participants