Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: Feature Request: Better flaky test support #27181

Closed
hklai opened this issue Aug 23, 2018 · 8 comments

Comments

Projects
None yet
4 participants
@hklai
Copy link

commented Aug 23, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10 darwin/amd64.

Does this issue reproduce with the latest release?

N/A

What operating system and processor architecture are you using (go env)?

go version go1.10 darwin/amd64
hklai-macbookpro2:e2e hklai$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/hklai/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/hklai/go"
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3q/dk8r61l944q42b8g7d2bq57r005d_5/T/go-build136728580=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

N/A

What did you expect to see?

Better flaky test support, as outlined in the following areas:

  1. Provide a way to annotate flaky tests. For example, there can be a flaky() method in the testing package, and go developers can add a line like "testing.flaky()" in test methods to indicate the tests are flaky. Obviously, this can be done using other mechanisms as well but this needs to be express-able in the test method level (i.e. build tag will not work).

  2. "go test" retries flaky tests natively, if the tests are marked as flaky.

  3. Test options to control the above retry behavior (i.e. number of retries until test passes, delay between retry, etc).

  4. Test retry should work with the -json option, such that downstream processing can determine if a test had been retried, and/or how many times a test is retired.

I understand bazel provides some form of flaky test support, but it does not work well for us because:

  1. the bazel flaky support is per test rule which is not granular enough.
  2. We consciously decided against the use of bazel in our project.

In absence of the above, our only option is to implement a wrapper program that calls "go test", and then run "go test" again on the failed tests that are known to be flaky.

What did you see instead?

N/A

@hklai

This comment has been minimized.

Copy link
Author

commented Aug 23, 2018

And to be clear, flaky tests are bad and they should be fixed ideally. However, flaky tests do happen from time to time and feature like this is important for productivity and velocity reasons.

And here is the use case I have in mind:

  1. When a test starts to flake, developers can quickly annotate it and "go test" can take care of retrying it, to reduce of chance of test job failing, which can potentially block other developers from merging code. This reduces disruption to dev cycle as developers do not need to manually re-run these flaky test jobs, and their presubmit jobs are more likely to pass.

  2. The developers who are responsible for the flaky tests will fix them and eventually remove the flaky annotation.

Another approach is to completely disable flaky tests in CI but it clearly reduces test coverage, and these tests can be further broken before they can be restored.

CI can be more resilient to flaky tests and developers won't get blocked just because a test starts to flake.

This is also why test retry should be noted so that people can take action to address them as well.

@ianlancetaylor ianlancetaylor changed the title Feature Request: Better flaky test support testing: Feature Request: Better flaky test support Aug 23, 2018

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

I disagree pretty strongly. The point of a test is to detect an error. A flaky test is worthless, because you don't learn anything when it fails. It just wastes time. If a test is potentially flaky, then it should be rewritten. A typical simple rewrite is to loop enough times to make it extremely unlikely that all iterations will fail for flaky reasons.

With that attitude, I think the only support the testing package needs for flaky tests is t.Skip, and we already have that.

@hklai

This comment has been minimized.

Copy link
Author

commented Aug 23, 2018

As mentioned above, t.Skip() (i.e. disable test) is an alternative, but it reduces coverage. The related feature can be broken by other changes and it will be more difficult to reenable the test.

A flaky test is bad, but it is not always worthless. As the project and number of developers grow, a 10% flaky test affects more and more developers, but the fact that it passes 90% of the time is still a valuable signal. I totally agree that all flaky tests need to be fixed/addressed, but until then, we also want minimize the impact to the 10% developers.

Correct me if I am wrong, but I think the simple rewrite suggestion won't work in case TestMain() is involved. A test may require setup/cleanup steps in TestMain() that are not available in the test method itself.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2018

Yes, if TestMain is involved using a loop is harder, but it's still possible.

If your goal for "things are OK" is that your test passes 90% of the time, then you should write your test that way. Don't write it so that go test fails 10% of the time. At least, that's how I see it.

@hklai

This comment has been minimized.

Copy link
Author

commented Aug 23, 2018

No I am not saying tests passing 90% is OK.

Having flaky tests is not an end state. Over the course of development, some tests are going to become flaky for various reasons. The end goal is to fix the cause of flakiness and make the tests not flaky, but until then, we are hoping to minimize impact to developers (i.e. presubmit failing) without having to remove the test.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2018

Just as the language does not permit unused imports or unused variables, I don't see a compelling reason that the testing package should provide additional mechanisms for permitting flaky tests.

@mark-rushakoff

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2018

I don't see why this couldn't be solved in a third party package.

func TestThingThatMayFlake(t *testing.T) {
  otherpkg.RerunOnFlake(t, func(t testing.TB) {
    if err := ThingThatMayFlake(); err != nil {
      t.Fatal(err)
    }
  })
}
@andybons

This comment has been minimized.

Copy link
Member

commented Sep 4, 2018

I am in agreement with @ianlancetaylor. I don't believe we should be adding additional functionality to the go tool that placates (and inherently encourages) flaky tests.

@andybons andybons closed this Sep 4, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.