
proposal: testing: add testing.T.Volatile to exempt test functions from result caching #67568

Closed
Merovius opened this issue May 22, 2024 · 12 comments

Merovius commented May 22, 2024

Proposal Details

While a test function should generally be self-contained and isolated to prevent flakes, there are cases where a test needs to interact with state not reflected in the build artifact, for example an integration test that exercises a CLI tool or relies on a separate database server. For such tests, the current behavior of the Go test cache is counterproductive: it might conclude that, because the code itself has not changed, a cached test result can be re-used, even though the external resource has changed and the test needs to be re-run.

It is possible to pass -count=1 to force a re-run, but that is flawed as well, because it applies globally. Running go test -count=1 ./... in a large project re-runs all tests, even though most of them might be perfectly cacheable. There is a reason, after all, that the test cache was introduced.

So it might be useful to have a way to indicate that one specific test function depends on external state and is therefore non-cacheable.

Thus, I propose to add a method to *testing.T to exempt a function from result caching:

// Volatile marks the current test function as exempt from result caching.
// A test function marked as volatile is re-run, even if none of the Go source it uses has been modified.
func (t *T) Volatile()

The call would simply be recorded in the *T, and after the test function finishes, its result would not be stored in the cache if Volatile was called. Thus no other tests are affected by the use of Volatile.
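
For illustration, a test that exercises an external CLI tool might opt out of caching like this (a hypothetical sketch of the proposed API, which does not exist today; the "mytool" binary name is made up):

package mypkg_test

import (
	"os/exec"
	"testing"
)

func TestCLIIntegration(t *testing.T) {
	// Hypothetical: mark this test as non-cacheable because it depends on
	// the externally installed "mytool" binary, not only on Go source.
	t.Volatile()

	out, err := exec.Command("mytool", "--version").Output()
	if err != nil {
		t.Fatalf("running mytool: %v", err)
	}
	if len(out) == 0 {
		t.Error("mytool printed no version information")
	}
}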

One drawback of this mechanism is that it again runs all volatile tests on every invocation of go test, even if the external resources did not change. Especially if Volatile is overused, that would again increase test running times. Theoretically, it would be possible to prevent that by allowing the test function to include a token that encapsulates the external state (e.g. by hashing the version of the database server or the contents of the called binary), though this might ultimately be too finicky:

// Volatile controls results-caching for the current test function.
// The given tag is persisted with the test result in the test cache. If a future run calls
// Volatile with the same tag, it behaves like `Skip`, except that the cached test result is output.
func (t *T) Volatile(tag uint64)
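
As a sketch of how the tagged variant might be used (again hypothetical, since neither form of Volatile exists; the tool path and the choice of FNV hashing are illustrative assumptions), a test could derive the tag from the contents of the external binary it exercises:

package mypkg_test

import (
	"hash/fnv"
	"os"
	"testing"
)

func TestCLIIntegration(t *testing.T) {
	// Hash the external binary so the tag changes whenever the tool changes.
	data, err := os.ReadFile("/usr/local/bin/mytool") // illustrative path
	if err != nil {
		t.Fatalf("reading tool binary: %v", err)
	}
	h := fnv.New64a()
	h.Write(data)

	// Proposed behavior: if a previous cached run stored the same tag,
	// this acts like Skip and the cached result is reported instead.
	t.Volatile(h.Sum64())

	// ... exercise the tool as usual ...
}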

I'm not terribly sure this is a good idea. It came up in a discussion on Mastodon and it doesn't seem like a completely unreasonable compromise. So I wanted to propose it formally, to gather some feedback.

gopherbot added this to the Proposal milestone May 22, 2024
seankhliao commented:

related #23799

ianlancetaylor changed the title from "proposal: testing: add testing.T.Volatile to excempt test functions from result caching" to "proposal: testing: add testing.T.Volatile to exempt test functions from result caching" on May 22, 2024

rsc commented May 23, 2024

I still agree with my old comment #23799 (comment): some marker is fine but the default behavior of 'go test' should still be to cache things. We could have 'go test -volatile' to explicitly rerun volatile tests even when the result is cached from last time, but I think that's indistinguishable from -count=1.


rsc commented May 23, 2024

Also reading the Mastodon thread it seems like the first thing to understand is https://hachyderm.io/@elan@publicsquare.global/112481172747959065. That has nothing to do with volatile tests.


rsc commented May 23, 2024

I posted https://hachyderm.io/@rsc/112491348314936122 over on Mastodon about the potential build/test cache misbehavior.


rsc commented May 23, 2024

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

apparentlymart commented:

Making this require a command line option feels strange to me, because a test being volatile is a property of the test itself, rather than a property of the test run or a personal preference of the person running the test suite. Relying on a cached result of a volatile test is a mistake.

I could imagine a compromise where volatile tests don't run at all by default; e.g. perhaps t.Volatile() behaves like t.Skip("volatile tests not enabled") unless the -volatile option is present. That would avoid implying that cached results are valid for that test, but also seems surprising.

I have some prior art for something similar in software I maintain at my day job: we have some integration tests that access external network services to verify correct behavior, and we currently have those tests call t.Skip if an opt-in environment variable isn't set. That allows someone running the entire test suite to not be bothered by them and their potential flakiness, but still makes it relatively easy to include the integration tests in the full run when that's appropriate.
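
A minimal version of that opt-in pattern, using only the existing t.Skip mechanism (the environment variable name is an assumption for illustration):

package mypkg_test

import (
	"os"
	"testing"
)

func TestExternalServiceIntegration(t *testing.T) {
	// Skip unless the caller explicitly opted in, e.g.
	// RUN_INTEGRATION_TESTS=1 go test ./...
	if os.Getenv("RUN_INTEGRATION_TESTS") == "" {
		t.Skip("set RUN_INTEGRATION_TESTS to run integration tests")
	}

	// ... talk to the external network service here ...
}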

However, the primary motivation for that is not that the tests are "volatile" but rather that they can only work when the person running the test has a suitable network connection to access the relevant services. They are volatile as defined by this proposal, but that's not the main motivation for skipping them by default, so this example might not be relevant.


rsc commented May 23, 2024

Ultimately, beliefs about what is correct here depend on why you want to run the test: is it to test that the code works, or is it to test that the external, volatile dependencies still work?

Most of the time, you run tests to check the source code in the repo where you are working. In that context, if you've run go test ./... and they all pass, and then you edit one source file, and then you run go test ./... again, you only want it to run tests that could have been affected by that source code change. Tests that are literally the exact same binary before and after your source code change don't need to run again. Only re-running tests affected by recent source code changes is the whole purpose of the test cache. Defaulting to rerunning some tests always completely defeats the purpose, slows down development, and scales especially badly in large repositories. The volatile tests are also often the slowest ones. Those are the last ones you want to run when you haven't been editing anything related to the test.

Sometimes you are running tests to check a full end-to-end behavior, including that external, volatile dependencies are still working as expected. In that case, using -volatile might be helpful, but it is very close to -count=1.


apparentlymart commented May 23, 2024

I think the main point of contention here may be whether "volatile tests" are common enough and useful enough to deserve different treatment from unit tests.

For what it's worth, I agree that the caching mechanism as currently implemented is a fine answer for unit tests, and that unit tests are likely the most common situation and therefore it makes sense for the default behavior to prioritize those.

However, I read this proposal as extending Go's testing facilities to better support integration-type tests (aka "end-to-end tests") so that those who wish to write them and use them don't need to fight the test harness to use them effectively. Thinking that you've got a passing integration test suite when really all you've done is recalled an older positive result from the cache is a real hazard that I've encountered myself and have seen others on my team encounter.

I do find your argument convincing @rsc -- particularly the part about only wanting to run these relatively expensive tests once you've got fully-passing unit tests. To me, that seems to favor the compromise of making uncacheable tests not run by default, but having -volatile (unlike -count=1) still use the test cache to avoid unnecessarily re-executing the cacheable tests. The fact that skipped tests are not very visible in the output by default would probably still lead to someone thinking the tests are all passing when they aren't, though, so I'm not sure that really solves the problem either.

(I could also perhaps accept the argument that go test is primarily designed for unit tests and that integration tests are different enough to deserve their own entry point, separate from go test. They also tend to require special configuration that's awkward to provide through the go test wrapper; our projects tend to stuff it into environment variables, but it could instead be a configuration file or plain CLI arguments passed to a normal program -- a package main -- that runs the integration tests. 🤔 )


EDIT: I came back to this some days later and found myself more receptive to the idea that go test is only for the kinds of tests that benefit from caching -- typically self-contained tests that don't depend on anything other than the code that the Go toolchain can "see" -- and that a different strategy would be more appropriate for testing systems outside of the codebase, or testing using systems outside of the codebase.

However, the go test behavior of automatically collecting up all of the test functions and generating an entry point that calls them is pretty convenient, vs. hand-writing a func main() that explicitly calls a bunch of integration tests.
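
For comparison, a hand-written entry point is already possible today with testing.Main, which is exported though documented as low-level; a minimal sketch, with made-up test function names:

package main

import "testing"

func integrationTestDatabase(t *testing.T) { /* ... */ }
func integrationTestCLI(t *testing.T)      { /* ... */ }

func main() {
	// Enumerate the integration tests explicitly instead of relying on
	// go test's automatic discovery and entry-point generation.
	tests := []testing.InternalTest{
		{Name: "IntegrationTestDatabase", F: integrationTestDatabase},
		{Name: "IntegrationTestCLI", F: integrationTestCLI},
	}
	matchAll := func(pat, str string) (bool, error) { return true, nil }
	testing.Main(matchAll, tests, nil, nil)
}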

Perhaps a different way to look at this, then, is to make that test-program-building behavior callable from outside of the go test command, so that those who want to write an integration test suite can write their own tool to call it, separate from go test.

I would imagine the test-program-building helper allowing its caller to customize the filename suffix to search for and the function name prefixes to search for, so that e.g. an integration test suite could use functions named with IntegrationTest as the prefix instead of Test, and thus the two suites could coexist in the same codebase but run separately.

I realize this is a totally separate proposal, and so am sharing this only to say that my thinking evolved since I originally wrote this comment. I would suggest against discussing this alternative proposal here (whether in favor or against) to avoid derailing the current discussion, but if anyone is feeling a certain way about what I wrote here then perhaps they'd like to use the 👍 or 👎 reactions to signal that, and if it seems like it's an interesting direction I will write up a separate proposal for it.


rsc commented May 29, 2024

I think we understand the relevant issues. We could implement -volatile but it doesn't seem like the right answer here. For now it seems like -count=1 is as good as anything else.


rsc commented May 29, 2024

(There is the separate issue of the Mastodon report that test caching was horribly broken and not noticing source file edits. That's a separate discussion if they can reproduce it.)


rsc commented May 30, 2024

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group


rsc commented Jun 5, 2024

No change in consensus, so declined.
— rsc for the proposal review group
