
Automatically detect and track flaky tests #11578

Merged
4 commits merged into Homebrew:master on Jun 23, 2021

Conversation

jasonrudolph

tl;dr This pull request proposes configuring tooling to automatically detect and track flaky tests. As I understand it, this has historically required manual effort from Homebrew maintainers. By automating the process, we can hopefully reduce the time that maintainers have to spend on the thankless and laborious task of manually identifying flaky tests.

/cc @MikeMcQuaid for review

Pull request template

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
    No. I don't think this is applicable to this pull request, but if it is, please let me know. 🙏
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

How it works

At a high level, setting things up works as follows:

  1. Enhance the test suite to emit JUnit XML test reports.

    This is accomplished in e163eb8. Please see the commit message for details.

  2. Enhance the CI workflow to send those test reports to buildpulse.io for analysis.

    This is accomplished in 006299d; a sketch of the resulting workflow step appears after this list.

  3. Maintainers can access an always-up-to-date list of flaky tests, along with stats on the disruptiveness of each test, how recently it has occurred, etc. Maintainers can also choose to receive daily digests in Slack and weekly digests via email, so that they don't have to remember to check yet another tool. 😅
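
For concreteness, here's a rough sketch of the upload step from step 2 above. The action reference, account/repository IDs, and secret names are illustrative placeholders, not necessarily the exact values used in 006299d:

  - name: Upload test results to BuildPulse for flaky test detection
    if: '!cancelled()' # Run even when the tests fail; skip if the workflow is cancelled.
    uses: Workshop64/buildpulse-action@main # assumed action reference
    with:
      account: 1234567     # placeholder: GitHub ID of the repository owner
      repository: 7654321  # placeholder: GitHub ID of the repository
      path: tmp/rspec*.xml # the JUnit XML reports produced in step 1
      key: ${{ secrets.BUILDPULSE_ACCESS_KEY_ID }}        # assumed secret name
      secret: ${{ secrets.BUILDPULSE_SECRET_ACCESS_KEY }} # assumed secret name

The `if: '!cancelled()'` condition matters: by default a step is skipped when an earlier step fails, but the failing runs are exactly the ones BuildPulse needs to see.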

A few things to keep in mind

  • To date, BuildPulse has been used primarily by teams working in private repositories. Homebrew/brew would be the first significant open source project using BuildPulse, so we'll probably discover some speedbumps due to the differences in workflow between open source and private projects. I hope you'll bring those speedbumps to my attention (jason@buildpulse.io), because I'd love to find a good way to address them.

  • 🍴 For GitHub Actions, workflows currently need to use GitHub Actions secrets in order to authenticate when sending test results to BuildPulse. Since forks can't access secrets, forks can't yet send test results to BuildPulse for analysis. Other tools (like Codecov) have a solution for this kind of thing, and I hope to add support for forks in BuildPulse in the future as well.

  • 🍎 This pull request enables flaky test detection exclusively for the macOS CI job.

    CI also runs tests in three other contexts: "tests (no-compatibility mode)", "tests (generic OS)", and "tests (Linux)". Before we enable BuildPulse for those contexts, BuildPulse will need to evolve to accept additional metadata that distinguishes which context reported a given test result.

    Right now, the XML reports produced by rspec_junit_formatter don't distinguish one context's results from another's. If we're sending BuildPulse results from multiple contexts, reliably detecting flaky tests requires knowing which context each result came from.

    For example, let's assume that a given test (x) runs in both the "tests (Linux)" context and the "test everything (macOS)" context, and let's say we're running the build multiple times at commit 0abcdef. If test x always passes in the macOS job and always fails in the Linux job, the test isn't flaky; it's just broken on Linux. 😵

    But if we don't know the context associated with each test result, all we'd see is that test x sometimes passes and sometimes fails at commit 0abcdef, and we'd wrongly conclude that the test is flaky (see the sketch after this list). 🙈

    So, for now, we need to pick one context to use for sending test results to BuildPulse. @MikeMcQuaid recommended using the macOS job, since it runs the largest number of tests (e.g. the brew cask ones).
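
To make the ambiguity concrete, here's a minimal sketch of the JUnit XML involved (simplified; rspec_junit_formatter's real reports carry more attributes). Both reports describe the same test at the same commit, and nothing in the XML identifies the CI context that produced it:

  <!-- From "test everything (macOS)": test x passes -->
  <testsuite name="rspec" tests="1" failures="0">
    <testcase classname="spec.x_spec" name="test x" time="0.42"/>
  </testsuite>

  <!-- From "tests (Linux)": test x fails, but the report never says "Linux" -->
  <testsuite name="rspec" tests="1" failures="1">
    <testcase classname="spec.x_spec" name="test x" time="0.40">
      <failure message="expected true, got false"/>
    </testcase>
  </testsuite>

Seen side by side without job metadata, these look like one flaky test rather than one test that's consistently broken on Linux.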

Verifying these changes

Since this pull request makes changes to the macOS CI job, and since that job doesn't run in forks, we can't immediately see the effect of these changes. To work around that limitation (so that reviewers can see the effect of these changes), 1888739 temporarily updates the macOS job to run in my fork. We can see that the job succeeds (https://github.com/jasonrudolph/brew/runs/2877954175) and that test results were successfully sent to BuildPulse. [screenshot]

In preparation for detecting flaky tests with BuildPulse, this commit
sets up the rspec_junit_formatter gem to output JUnit XML reports of the
test suite, which is the format used by BuildPulse and various other
tooling that interprets test results.

Because the test suite uses the parallel_tests gem, this commit
incorporates some related changes to make the parallel_tests gem and
the rspec_junit_formatter gem cooperate with each other.

rspec_junit_formatter writes everything to a single XML file. That
works fine when only one process writes to the file. But with
parallel_tests running multiple processes against the same file,
whichever process finishes last clobbers the output of all the other
processes that wrote to it. 🙈

To prevent this issue, the parallel_tests wiki recommends adding a
`.rspec_parallel` file to specify the suite's RSpec options
(https://github.com/grosser/parallel_tests/wiki#with-rspec_junit_formatter----by-jgarber).
The project can then give each process its own output file to write
to, like so:

  --format RspecJunitFormatter
  --out tmp/rspec<%= ENV['TEST_ENV_NUMBER'] %>.xml

However, prior to this commit, the Homebrew/brew test suite specified
its RSpec options via the command line, and there's no way (AFAICT) to
set the equivalent of these options via the command line:

  --format RspecJunitFormatter
  --out tmp/rspec<%= ENV['TEST_ENV_NUMBER'] %>.xml

So, we need to use a `.rspec_parallel` file to specify these options ☝️.

However, it appears that RSpec allows you to specify formatters _either_
in an options file (like `.rspec_parallel`) _or_ via command-line args.
But if you specify any formatters via command-line args, then all
formatters in the options file are ignored.  (I suspect that's somehow
related to this bit of code in rspec-core:
https://github.com/rspec/rspec-core/blob/v3.10.0/lib/rspec/core/configuration_options.rb#L64.)

With that in mind, in order to have RspecJunitFormatter configured
in `.rspec_parallel`, we need to move the other formatters into
`.rspec_parallel` as well, instead of passing them as command-line
args. Therefore, this commit moves all the formatters into a
`.rspec_parallel` file.
@MikeMcQuaid (Member) left a comment:


These changes all seem good/reasonable to me and I'd love to see us use BuildPulse for this. I think we want a way to guard this to only PRs from Not Forks and master branch builds.

@Homebrew/brew any thoughts or objections on using BuildPulse for this?
@Homebrew/core I think this might be a good tool for us to eventually keep track of flaky brew audit and brew test runs for formulae, too.

@@ -320,3 +320,13 @@ jobs:
- run: brew test-bot --only-formulae --test-default-formula

- uses: codecov/codecov-action@29386c70ef20e286228c72b668a06fd0e8399192

- name: Upload test results to BuildPulse for flaky test detection
if: '!cancelled()' # Run this step even when the tests fail. Skip if the workflow is cancelled.
MikeMcQuaid (Member):
Suggested change
if: '!cancelled()' # Run this step even when the tests fail. Skip if the workflow is cancelled.
# Only run this step for PRs where we have access to secrets.
# Run this step even when the tests fail. Skip if the workflow is cancelled.
if: github.event.pull_request.head.repo.full_name == github.repository && !cancelled()

jasonrudolph (Author):

It looks like this conditional prevented this step also from running on the master branch (https://github.com/Homebrew/brew/runs/2893711700?check_suite_focus=true), presumably because there was no pull request present for that build. I think we want this step to always run for builds on the master branch, so I'll open a pull request to adjust this conditional.
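
For reference, one way to express the adjusted guard (a sketch, not necessarily the exact conditional that landed in #11586). On master builds there is no `github.event.pull_request` context, so the repo-equality check alone evaluates to false; an explicit branch clause covers that case:

  if: >
    (github.ref == 'refs/heads/master' ||
     github.event.pull_request.head.repo.full_name == github.repository) &&
    !cancelled()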

MikeMcQuaid (Member):

That'd be great, thanks @jasonrudolph!

jasonrudolph (Author):

Cool. I pushed up #11586 with the proposed change.

@carlocab (Member):

> These changes all seem good/reasonable to me and I'd love to see us use BuildPulse for this. I think we want a way to guard this to only PRs from Not Forks and master branch builds.

Agreed. I don't think we have the bandwidth to make use of fork data anyway, even if there were information there that we can't get from CI runs here.

> @Homebrew/brew any thoughts or objections on using BuildPulse for this?

❤️

> @Homebrew/core I think this might be a good tool for us to eventually keep track of flaky brew audit and brew test runs for formulae, too.

❤️ for this too. We'll probably have to teach test-bot to produce test output in a format BuildPulse can consume.

Co-authored-by: Mike McQuaid <mike@mikemcquaid.com>
carlocab added a commit to carlocab/homebrew-core that referenced this pull request Jun 22, 2021
This formula fails the audit, but we may have need of it in
Homebrew/brew (cf. Homebrew/brew#11578), so I think this will still be
useful to add to homebrew-core.
BrewTestBot pushed a commit to Homebrew/homebrew-core that referenced this pull request Jun 23, 2021
This formula fails the audit, but we may have need of it in
Homebrew/brew (cf. Homebrew/brew#11578), so I think this will still be
useful to add to homebrew-core.
@MikeMcQuaid MikeMcQuaid merged commit 41803eb into Homebrew:master Jun 23, 2021
@MikeMcQuaid MikeMcQuaid deleted the flaky-test-detection branch June 23, 2021 10:06
@MikeMcQuaid (Member):

Thanks so much for your first contribution! Without people like you submitting PRs we couldn't run this project. You rock, @jasonrudolph!

@bayandin (Member) commented Jun 23, 2021:

> > @Homebrew/core I think this might be a good tool for us to eventually keep track of flaky brew audit and brew test runs for formulae, too.
>
> ❤️ for this too. We'll probably have to teach test-bot to produce test output in a format BuildPulse can consume.

That would be great!
I guess we'll need to bring back JUnit output for homebrew-test-bot (it was removed in Homebrew/homebrew-test-bot#355)

@MikeMcQuaid (Member):

> I guess we'll need to bring back JUnit output for homebrew-test-bot (it was removed in Homebrew/homebrew-test-bot#355)

@bayandin Yes! It's relatively self-contained so shouldn't be too tricky. Is this something you might be able to pick up?

@bayandin (Member):

> > I guess we'll need to bring back JUnit output for homebrew-test-bot (it was removed in Homebrew/homebrew-test-bot#355)
>
> @bayandin Yes! It's relatively self-contained so shouldn't be too tricky. Is this something you might be able to pick up?

Will do!

@MikeMcQuaid (Member):

@bayandin 😍 I'd consider limiting the output just to e.g. audit and test for now (or anything else that's flaky, maybe install?).

@bayandin (Member):

> @bayandin 😍 I'd consider limiting the output just to e.g. audit and test for now (or anything else that's flaky, maybe install?).

Yep, I think I'll start with audit and test; it should be pretty easy to adjust from there (at least I hope so).

@github-actions github-actions bot added the outdated PR was locked due to age label Jul 24, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2021