
Automatically detect and track flaky tests #11578

Merged
4 commits merged into Homebrew:master on Jun 23, 2021

Conversation

jasonrudolph

tl;dr This pull request proposes configuring tooling to automatically detect and track flaky tests. As I understand it, this has historically required manual effort from Homebrew maintainers. By automating the process, we can hopefully reduce the time that maintainers have to spend on the thankless and laborious task of manually identifying flaky tests.

/cc @MikeMcQuaid for review

Pull request template

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
    No. I don't think this is applicable to this pull request, but if it is, please let me know. 🙏
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

How it works

At a high level, setting things up works as follows:

  1. Enhance the test suite to emit JUnit XML test reports.

    This is accomplished in e163eb8. Please see the commit message for details.

  2. Enhance the CI workflow to send those test reports to buildpulse.io for analysis.

    This is accomplished in 006299d; a sketch of the resulting workflow step appears after this list.

  3. Maintainers can access an always-up-to-date list of flaky tests, along with stats on the disruptiveness of each test, how recently it has occurred, etc. Maintainers can also choose to receive daily digests in Slack and weekly digests via email, so that they don't have to remember to check yet another tool. 😅
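
For concreteness, here's a rough sketch of the upload step from step 2 above. The action reference, account/repository IDs, and secret names are illustrative placeholders, not necessarily the exact values used in 006299d:

  - name: Upload test results to BuildPulse for flaky test detection
    if: '!cancelled()' # Run even when the tests fail; skip if the workflow is cancelled.
    uses: Workshop64/buildpulse-action@main # assumed action reference
    with:
      account: 1234567     # placeholder: GitHub ID of the repository owner
      repository: 7654321  # placeholder: GitHub ID of the repository
      path: tmp/rspec*.xml # the JUnit XML reports produced in step 1
      key: ${{ secrets.BUILDPULSE_ACCESS_KEY_ID }}        # assumed secret name
      secret: ${{ secrets.BUILDPULSE_SECRET_ACCESS_KEY }} # assumed secret name

The `if: '!cancelled()'` condition matters: by default a step is skipped when an earlier step fails, but the failing runs are exactly the ones BuildPulse needs to see.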

A few things to keep in mind

  • To date, BuildPulse has been used primarily by teams working in private repositories. Homebrew/brew would be the first significant open source project using BuildPulse, so we'll probably discover some speedbumps due to the differences in workflow between open source and private projects. I hope you'll bring those speedbumps to my attention (jason@buildpulse.io), because I'd love to find a good way to address them.

  • 🍴 For GitHub Actions, workflows currently need to use GitHub Actions secrets in order to authenticate when sending test results to BuildPulse. Since forks can't access secrets, forks can't yet send test results to BuildPulse for analysis. Other tools (like Codecov) have a solution for this kind of thing, and I hope to add support for forks in BuildPulse in the future as well.

  • 🍎 This pull request enables flaky test detection exclusively for the macOS CI job.

    CI also runs tests in three other contexts: "tests (no-compatibility mode)", "tests (generic OS)", and "tests (Linux)". Before we enable BuildPulse for those contexts, BuildPulse will need to evolve to accept additional metadata that distinguishes which context reported a given test result.

    Right now, the XML reports produced by rspec_junit_formatter don't distinguish one context's results from another's. If we're sending BuildPulse results from multiple contexts, reliably detecting flaky tests requires knowing which context each result came from.

    For example, let's assume that a given test (x) runs in both the "tests (Linux)" context and the "test everything (macOS)" context, and let's say we're running the build multiple times at commit 0abcdef. If test x always passes in the macOS job and always fails in the Linux job, the test isn't flaky; it's just broken on Linux. 😵

    But if we don't know the context associated with each test result, all we'd see is that test x sometimes passes and sometimes fails at commit 0abcdef, and we'd wrongly conclude that the test is flaky (see the sketch after this list). 🙈

    So, for now, we need to pick one context to use for sending test results to BuildPulse. @MikeMcQuaid recommended using the macOS job, since it runs the largest number of tests (e.g. the brew cask ones).
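
To make the ambiguity concrete, here's a minimal sketch of the JUnit XML involved (simplified; rspec_junit_formatter's real reports carry more attributes). Both reports describe the same test at the same commit, and nothing in the XML identifies the CI context that produced it:

  <!-- From "test everything (macOS)": test x passes -->
  <testsuite name="rspec" tests="1" failures="0">
    <testcase classname="spec.x_spec" name="test x" time="0.42"/>
  </testsuite>

  <!-- From "tests (Linux)": test x fails, but the report never says "Linux" -->
  <testsuite name="rspec" tests="1" failures="1">
    <testcase classname="spec.x_spec" name="test x" time="0.40">
      <failure message="expected true, got false"/>
    </testcase>
  </testsuite>

Seen side by side without job metadata, these look like one flaky test rather than one test that's consistently broken on Linux.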

Verifying these changes

Since this pull request makes changes to the macOS CI job, and since that job doesn't run in forks, we can't immediately see the effect of these changes. To work around that limitation (so that reviewers can see the effect of these changes), 1888739 temporarily updates the macOS job to run in my fork. We can see that the job succeeds (https://github.com/jasonrudolph/brew/runs/2877954175) and that test results were successfully sent to BuildPulse. [screenshot]

In preparation for detecting flaky tests with BuildPulse, this commit
sets up the rspec_junit_formatter gem to output JUnit XML reports of the
test suite, which is the format used by BuildPulse and various other
tooling that interprets test results.

Because the test suite uses the parallel_tests gem, this commit
incorporates some related changes to make the parallel_tests gem and
the rspec_junit_formatter gem cooperate with each other.

rspec_junit_formatter writes everything to a single XML file. That
works fine when only one process writes to the file. But with
parallel_tests running multiple processes against the same file,
whichever process finishes last clobbers the output of all the other
processes that wrote to it. 🙈

To prevent this issue, the parallel_tests wiki recommends adding a
`.rspec_parallel` file to specify the suite's RSpec options
(https://github.com/grosser/parallel_tests/wiki#with-rspec_junit_formatter----by-jgarber).
The project can then give each process its own output file to write
to, like so:

  --format RspecJunitFormatter
  --out tmp/rspec<%= ENV['TEST_ENV_NUMBER'] %>.xml

However, prior to this commit, the Homebrew/brew test suite specified
its RSpec options via the command line, and there's no way (AFAICT) to
set the equivalent of these options via the command line:

  --format RspecJunitFormatter
  --out tmp/rspec<%= ENV['TEST_ENV_NUMBER'] %>.xml

So, we need to use a `.rspec_parallel` file to specify these options ☝️.

However, it appears that RSpec allows you to specify formatters _either_
in an options file (like `.rspec_parallel`) _or_ via command-line args.
But if you specify any formatters via command-line args, then all
formatters in the options file are ignored.  (I suspect that's somehow
related to this bit of code in rspec-core:
https://github.com/rspec/rspec-core/blob/v3.10.0/lib/rspec/core/configuration_options.rb#L64.)

With that in mind, in order to have RspecJunitFormatter configured
in `.rspec_parallel`, we need to move the other formatters into
`.rspec_parallel` as well, instead of passing them as command-line
args. Therefore, this commit moves all the formatters into a
`.rspec_parallel` file.
@MikeMcQuaid (Member) left a comment:


These changes all seem good/reasonable to me and I'd love to see us use BuildPulse for this. I think we want a way to guard this to only PRs from Not Forks and master branch builds.

@Homebrew/brew any thoughts or objections on using BuildPulse for this?
@Homebrew/core I think this might be a good tool for us to eventually keep track of flaky brew audit and brew test runs for formulae, too.

@@ -320,3 +320,13 @@ jobs:
- run: brew test-bot --only-formulae --test-default-formula

- uses: codecov/codecov-action@29386c70ef20e286228c72b668a06fd0e8399192

- name: Upload test results to BuildPulse for flaky test detection
if: '!cancelled()' # Run this step even when the tests fail. Skip if the workflow is cancelled.
MikeMcQuaid (Member):
Suggested change
if: '!cancelled()' # Run this step even when the tests fail. Skip if the workflow is cancelled.
# Only run this step for PRs where we have access to secrets.
# Run this step even when the tests fail. Skip if the workflow is cancelled.
if: github.event.pull_request.head.repo.full_name == github.repository && !cancelled()

jasonrudolph (Author):

It looks like this conditional prevented this step also from running on the master branch (https://github.com/Homebrew/brew/runs/2893711700?check_suite_focus=true), presumably because there was no pull request present for that build. I think we want this step to always run for builds on the master branch, so I'll open a pull request to adjust this conditional.
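
For reference, one way to express the adjusted guard (a sketch, not necessarily the exact conditional that landed in #11586). On master builds there is no `github.event.pull_request` context, so the repo-equality check alone evaluates to false; an explicit branch clause covers that case:

  if: >
    (github.ref == 'refs/heads/master' ||
     github.event.pull_request.head.repo.full_name == github.repository) &&
    !cancelled()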

MikeMcQuaid (Member):

That'd be great, thanks @jasonrudolph!

jasonrudolph (Author):

Cool. I pushed up #11586 with the proposed change.

@carlocab (Member):

> These changes all seem good/reasonable to me and I'd love to see us use BuildPulse for this. I think we want a way to guard this to only PRs from Not Forks and master branch builds.

Agreed. I don't think we have the bandwidth to make use of fork data anyway, even if there were information there that we can't get from CI runs here.

> @Homebrew/brew any thoughts or objections on using BuildPulse for this?

❤️

> @Homebrew/core I think this might be a good tool for us to eventually keep track of flaky brew audit and brew test runs for formulae, too.

❤️ for this too. We'll probably have to teach test-bot to produce test output in a format BuildPulse can consume.

Co-authored-by: Mike McQuaid <mike@mikemcquaid.com>
carlocab added a commit to carlocab/homebrew-core that referenced this pull request Jun 22, 2021
This formula fails the audit, but we may have need of it in
Homebrew/brew (cf. Homebrew/brew#11578), so I think this will still be
useful to add to homebrew-core.
BrewTestBot pushed a commit to Homebrew/homebrew-core that referenced this pull request Jun 23, 2021
This formula fails the audit, but we may have need of it in
Homebrew/brew (cf. Homebrew/brew#11578), so I think this will still be
useful to add to homebrew-core.
@MikeMcQuaid MikeMcQuaid merged commit 41803eb into Homebrew:master Jun 23, 2021
@MikeMcQuaid MikeMcQuaid deleted the flaky-test-detection branch June 23, 2021 10:06
@MikeMcQuaid (Member):

Thanks so much for your first contribution! Without people like you submitting PRs we couldn't run this project. You rock, @jasonrudolph!

@bayandin (Member) commented Jun 23, 2021:

> > @Homebrew/core I think this might be a good tool for us to eventually keep track of flaky brew audit and brew test runs for formulae, too.
>
> ❤️ for this too. We'll probably have to teach test-bot to produce test output in a format BuildPulse can consume.

That would be great!
I guess we'll need to bring back JUnit output for homebrew-test-bot (it was removed in Homebrew/homebrew-test-bot#355)

@MikeMcQuaid (Member):

> I guess we'll need to bring back JUnit output for homebrew-test-bot (it was removed in Homebrew/homebrew-test-bot#355)

@bayandin Yes! It's relatively self-contained so shouldn't be too tricky. Is this something you might be able to pick up?

@bayandin (Member):

> > I guess we'll need to bring back JUnit output for homebrew-test-bot (it was removed in Homebrew/homebrew-test-bot#355)
>
> @bayandin Yes! It's relatively self-contained so shouldn't be too tricky. Is this something you might be able to pick up?

Will do!

@MikeMcQuaid (Member):

@bayandin 😍 I'd consider limiting the output just to e.g. audit and test for now (or anything else that's flaky, maybe install?).

@bayandin (Member):

> @bayandin 😍 I'd consider limiting the output just to e.g. audit and test for now (or anything else that's flaky, maybe install?).

Yep, I think I'll start with audit and test; it should be pretty easy to adjust from there (at least I hope so).

@github-actions github-actions bot added the outdated PR was locked due to age label Jul 24, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 24, 2021