Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disable flaky unit tests [NET-5042] #18365

Closed
wants to merge 3 commits into from
Closed

disable flaky unit tests [NET-5042] #18365

wants to merge 3 commits into from

Conversation

nfi-hashicorp
Copy link
Collaborator

@nfi-hashicorp nfi-hashicorp commented Aug 2, 2023

Description

Postulate: flaky tests are net-negative value. Can we even really trust that their positive results are valid?

We should ideally fix flaky tests, obviously. But IME tests are flaky due to hard-to-reproduce timing bugs that only happen in CI.

What should the cutoff be to consider a test "flaky"? I arbitrarily decided 10%.

image

Dashboard

Abandoned Approaches

In #18424, I actually tried to fix these flaky tests by throwing some retries at them without really understanding the problem. Many are still flaky.

In https://github.com/hashicorp/consul/compare/retry-flaky (branched from a former commit of this branch), I tried to just wrap the test function body in a retry. That doesn't quite work for a few reasons, and would actually require more extensive edits.

PR Checklist

  • updated test coverage
  • external facing docs updated
  • appropriate backport labels added
  • not a security concern

@nfi-hashicorp nfi-hashicorp added pr/no-changelog PR does not need a corresponding .changelog entry backport/1.14 backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. backport/1.16 This release series is no longer active on CE. Use backport/ent/1.16. labels Aug 2, 2023
@github-actions github-actions bot added the theme/cli Flags and documentation for the CLI interface label Aug 2, 2023
@nfi-hashicorp nfi-hashicorp changed the title skip tests that are >= 10% flaky on main retry tests that are >= 10% flaky on main Aug 3, 2023
@nfi-hashicorp
Copy link
Collaborator Author

Hm, I'm not sure this retry approach will work. Whereas gotestsum reruns the test by rerunning the binary, we're doing it with a function. The problem is t.Fail; we can't unmark a failed test. Maybe we do each rerun in a subtest?

@nfi-hashicorp nfi-hashicorp marked this pull request as draft August 8, 2023 19:16
@nfi-hashicorp
Copy link
Collaborator Author

In 846d383 (#18365), I attempted to contain the FailNow with a t.Run. That doesn't work because a FailNow causes the parent to fail, and we want the parent to pass if any of its children pass.

In 5772d44 (#18365), I attempted to avoid the retry lib to get tighter control over the defer/recover handling. Unfortunately I don't think it will work. testing.T#FailNow doesn't panic; it calls runtime.Goexit, which we cannot catch. So we'd need to pass the real test func some testing.T-compatible thing with a fake FailNow. retry.R does this. But we'd need to modify the tests (and called helpers) to work with that, or some interface. A non-trivial change vs just wrapping with gotestsum. :(

Given these issues, I'm going back to draft

@nfi-hashicorp
Copy link
Collaborator Author

I made an attempt to convert flaky tests to using retry.R as much as possible in 2f26b83, but it's not trivial. There are some instances where you have to think about when a retry.R is what you want vs the real testing.T. Some of these tests are kind of wacky, not sure what the intention is in some spots. By this point I probably could have just improved the tests to be less flaky. Probably not tenable to apply in a general way.

@nfi-hashicorp nfi-hashicorp mentioned this pull request Aug 9, 2023
4 tasks
@nfi-hashicorp nfi-hashicorp changed the title retry tests that are >= 10% flaky on main disable flaky unit tests Aug 10, 2023
@nfi-hashicorp
Copy link
Collaborator Author

Back to disable only approach.

@nfi-hashicorp nfi-hashicorp changed the title disable flaky unit tests disable flaky unit tests [NET-5042] Aug 10, 2023
@github-actions
Copy link

This pull request has been automatically flagged for inactivity because it has not been acted upon in the last 60 days. It will be closed if no new activity occurs in the next 30 days. Please feel free to re-open to resurrect the change if you feel this has happened by mistake. Thank you for your contributions.

@github-actions github-actions bot added the meta/stale Automatically flagged for inactivity by stalebot label Oct 10, 2023
Copy link

Closing due to inactivity. If you feel this was a mistake or you wish to re-open at any time in the future, please leave a comment and it will be re-surfaced for the maintainers to review.

@github-actions github-actions bot closed this Nov 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.15 This release series is no longer active on CE. Use backport/ent/1.15. backport/1.16 This release series is no longer active on CE. Use backport/ent/1.16. meta/stale Automatically flagged for inactivity by stalebot pr/no-changelog PR does not need a corresponding .changelog entry theme/cli Flags and documentation for the CLI interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant