Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/go: Need an easy/obvious way to copy testdata from module cache to a test-running directory #50312

Open
dr2chase opened this issue Dec 22, 2021 · 12 comments

Comments

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Dec 22, 2021

To test, benchmark, or fuzz-test downloaded software modules, the current recipe (at least, the one that I learned) is

mkdir foo
cd foo
go mod init foo
go get -d -t -v somepackage@someversion
go test -c -o foo somepackage
./foo -test.bench=BenchmarkSomething

The problem here is that test and benchmarking often depend on files in testdata. It is often possible to run the test in the module cache, but this depends on the test/benchmark not writing to the testdata directory, and also depends on knowing the directory in the module cache (see workaround below). Fuzzing that finds a failure does write to the testdata directory, so this just doesn't work for new fuzzing.

One workaround is

cp -rp `go list -f {{.Dir}} somepackage`/testdata .

but this is not something that people find or figure out easily.

I propose adding an option to go get or a subcommand to go mod to copy testdata, if it exists in the downloaded package, into the current directory. For example, go get -d -t -T -v somepackage@someversion or go mod testdata.

@gopherbot gopherbot added this to the Proposal milestone Dec 22, 2021
@prattmic
Copy link
Member

@prattmic prattmic commented Dec 22, 2021

Rather than building the test binary (go test -c) and running it separately, I presume if you used go test -bench=BenchmarkSomething somepackage then cmd/go would just Do The Right Thing.

Could you elaborate on why you don't do that?

My guess would be to exclude time taken to build the test from running the test. If so, then perhaps (just throwing out ideas) an alternate approach would be for go test -c to save the test binary in the cache directory so that subsequent go test runs don't need to build.

@dr2chase
Copy link
Contributor Author

@dr2chase dr2chase commented Dec 23, 2021

I don't do that (in bent) because

  1. Bent does repeated runs (25) of A, B, A, B, benchmarking of baseline and tip
  2. Bent does repeated timed builds (sometimes 25), also A, B, A, B (optionally completely randomized, I need to do the same for benchmark running) to get a benchmark of the time to build the binary.
  3. To guard against user-error when compiler people (e. g. me) are running benchmarks of compiler modifications and with experimental flags, bent repeatedly clears the cache.

On the other hand, bent already deals with the testdata problem. I definitely need at least the interleaved run order unless benchmarks are run on a very controlled machine, "stuff" just happens sometimes. (I.e., the purpose of bent is to get rid of all my benchmarking and testing speedbumps and footguns -- and immutable testdata in the module cache is one of those speedbumps).

That also doesn't help fuzzing, and some benchmarks/tests assume (incorrectly, and I sometimes submit pull requests to fix them) that they can write to testdata.

An alternate approach for fuzzing (because bent has solved the problem in its case) might be to have fuzzing copy testdata out of the module cache into the current directory?

@katiehockman
Copy link
Member

@katiehockman katiehockman commented Dec 23, 2021

@josharian
Copy link
Contributor

@josharian josharian commented Dec 24, 2021

//go:embed seems like a useful tool here. Maybe cmd/go and/or package testing could make that easier. As a straw man, cmd/go could embed testdata by default and package testing could expose a fs.FS with their contents.

@FiloSottile
Copy link
Contributor

@FiloSottile FiloSottile commented Jan 4, 2022

Strong +1 on providing an fs.FS with the contents of testdata. go test -c would use embed, while go test would just use the filesystem. I think I would use that even if not moving the binary, as it's more obviously correct than relying on the cwd.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals Jan 5, 2022
@rsc rsc moved this from Incoming to Active in Proposals Jan 12, 2022
@rsc
Copy link
Contributor

@rsc rsc commented Jan 12, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

@rsc rsc commented Jan 19, 2022

When you run go test foo it runs in the module directory.
Why not do that?
Why copy the data out of the module directory?
In general it's not clear exactly what to copy.
For example compress/gzip etc use the compress/testdata directory.
So in general it's not 1:1.

You can do what go test does using cd $(go list -f '{{.Dir}}' pkg).

@josharian
Copy link
Contributor

@josharian josharian commented Jan 19, 2022

When you run go test foo it runs in the module directory. Why not do that?

A common workflow for me (to catch flaky tests) is to go test -c and then run, often with a wrapper. And it's an annoyance that I then have to move around that binary to the right place.

In general it's not clear exactly what to copy.

Sure, but that doesn't mean we can't make the common case (testdata subdir) easy.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 19, 2022

cp -r $(go list -f '{{.Dir}}' pkg)/testdata .

@josharian
Copy link
Contributor

@josharian josharian commented Jan 19, 2022

Yes, you can also copy the testdata instead of moving the binary. This is a question of "just works"/"batteries included", vs not.

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 19, 2022

Sorry, I thought this issue was about copying the testdata.

Oh, now I see that @dr2chase suggested the cp approach earlier. Are we just talking about a more convenient way to do that?

@dr2chase
Copy link
Contributor Author

@dr2chase dr2chase commented Jan 20, 2022

The current way of doing it requires knowing the where's-my-module-cache incantation, which I think is a too-large speedbump.

With the benefit of additional hindsight, -ALSO_PUT_A_COPY_IN_THE_CURRENT_DIRECTORY, which also places a copy of all the files and subdirectories for the referenced package into the current directory, might be a good choice.

mkdir foo
cd foo
go mod init foo
go get -d -t -v -ALSO_PUT_A_COPY_IN_THE_CURRENT_DIRECTORY somepackage@someversion
go test -c -o foo somepackage
./foo -test.bench=BenchmarkSomething

The reason for this is that tests already exist that variously

  1. write files into testdata
  2. read files from other subdirectories
  3. read files from the source directory
  4. fuzzing writes into testdata

As far as I know, none of our tools will complain at people for doing any of these things, therefore, they have already done it, and will continue to do it. I am open to suggestions about a better name for the option. The benchmark runner (bent) contains (or will contain, once CLs are submitted) workarounds for all these things.

For the commands above, where the current directory is treated as containing "build instructions", the go files in the current directory are ignored. Building "." instead of somepackage@someversion generates a request to "go mod tidy" and then builds a same-behaving binary that does not compare equal. (I am generating these answers by running experiments, figuring that there would be questions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
8 participants