Measure flakiness of new tests #3541

Open
GeoffreyBooth opened this issue Oct 25, 2023 · 4 comments

@GeoffreyBooth
Member

As discussed in nodejs/TSC#1457, could we somehow have a way for CI to measure the flakiness of new tests before they land? Something like:

  • For every PR, identify tests that are added by the PR (probably tests that run in the PR’s branch that didn’t run for main).
  • Run measure-flakiness on them.
  • Fail CI unless the new tests pass the flakiness cutoff, on all platforms.

This obviously won’t help with existing flaky tests, but I would expect it to prevent most new flaky tests from landing on main; and it would strongly motivate contributors to improve their tests, because their PRs would be blocked from landing until they did.

It also wouldn’t help if a test becomes flaky after it has landed, because of later changes to the API it tests. But still, I think this is better than the status quo.
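
To make it concrete, here’s a rough sketch of what such a check could look like, run from the root of a node checkout. The repeat count, the failure cutoff, and the way a single test gets invoked are all placeholder assumptions, not an existing script or job:

```ts
// flakiness-check.ts (hypothetical): run each test added by the PR many times
// and fail if any of them fails more often than an arbitrary cutoff.
import { execSync, spawnSync } from 'node:child_process';

const RUNS = Number(process.env.FLAKINESS_RUNS ?? 100); // assumed repeat count
const MAX_FAILURES = 0;                                  // assumed cutoff

// Test files that exist on this branch but not on main.
const newTests = execSync(
  'git diff --name-only --diff-filter=A origin/main...HEAD',
  { encoding: 'utf8' },
)
  .split('\n')
  .filter((file) => /^test\/.+\/test-.+\.m?js$/.test(file));

let flaky = false;
for (const test of newTests) {
  let failures = 0;
  for (let i = 0; i < RUNS; i++) {
    // How a single test is invoked is illustrative; adjust to match CI.
    const run = spawnSync('python3', ['tools/test.py', test], { stdio: 'ignore' });
    if (run.status !== 0) failures++;
  }
  console.log(`${test}: ${failures}/${RUNS} failures`);
  if (failures > MAX_FAILURES) flaky = true;
}

process.exit(flaky ? 1 : 0);
```

CI would fail the build when the script exits nonzero, and the same loop could run on each platform in the matrix.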

Related: #3056 cc @nodejs/tsc

@RafaelGSS
Member

I think there are two kinds of flakiness.

  1. When the test relies on the environment, for instance writing to disk. Here the flakiness appears when the machine doesn't satisfy the test's requirements, for example when the disk is full.
  2. When the test relies on timers or on operations that can suffer time-of-check/time-of-use (TOCTOU) races.

While I believe we can measure flakiness for the second kind, the first one would be very hard to reproduce. AFAIK most of our flaky tests are related to the first type of flakiness.
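
For example, something like this (a made-up test, just to illustrate the second kind) will usually pass on an idle machine but can fail under load, and repeating it many times in CI would expose that:

```ts
// Hypothetical timing-dependent test: it assumes the callback resumes
// "soon enough" after the timer fires, which isn't guaranteed under load.
import assert from 'node:assert';
import { setTimeout as sleep } from 'node:timers/promises';

async function main() {
  const start = Date.now();
  await sleep(100);
  const elapsed = Date.now() - start;
  // On a busy CI machine the event loop can easily be delayed past 150 ms,
  // so this assertion fails intermittently.
  assert.ok(elapsed < 150, `timer resumed after ${elapsed} ms`);
}

main();
```

Running it a few hundred times makes the failure rate visible; the first kind of flakiness (a full disk, a dying machine) wouldn't show up that way.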

@GeoffreyBooth
Member Author

AFAIK most of our flaky tests are related to the first type of flakiness.

Really? So then there are lots of tests marked as flaky where there’s nothing wrong with the test itself; it just happened to get marked flaky during a rough patch in the life of the machine running our test suite?

If so, wouldn’t the solution there be to improve the environment itself? Rather than long-running machines that need rebooting and so on, we could run our tests within Docker containers or EC2 instances that are created fresh for each run and then discarded. Or at least have some kind of automatic maintenance on the machines, like automatically restarting them every few hours or clearing their disk space after each run, something along those lines.

@RafaelGSS
Member

Well, I haven't looked at the tests marked as flaky (some of them are quite old); I'm speaking from my experience handling some PRs, so that may not be an authoritative statement.

If so, wouldn’t the solution there be to improve the environment itself?

Possibly. As someone who isn't on the build team, I may lack some context, but I assume it would require upgrading machines and adding more nodes; both come with a cost and need someone to champion them.

I will wait for someone from the build team to jump in and correct me if I'm wrong.

@GeoffreyBooth
Member Author

I will wait for someone from the build team to jump in and correct me if I’m wrong.

I would love it if you’re right: it’s much easier to increase the machines’ capacity than it is to refactor tests.
