New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatically detect and track flaky tests #11578
Automatically detect and track flaky tests #11578
Conversation
In preparation for detecting flaky tests with BuildPulse, this commit sets up the rspec_junit_formatter gem to output JUnit XML reports of the test suite, which is the format used by BuildPulse and various other tooling that interprets test results. Because the test suite uses the parallel_tests gem, this commit incorporates some related changes to make all the parallel_tests gem and the rspec_junit_formatter gem to cooperate with each other. rspec_junit_formatter writes everything to a single XML file. That works fine when there's only one process writing to the file. By default, whatever process finishes last will write to the file and clobber the output of all the other processes that wrote to the file. 馃檲 To prevent this issue, the parallel_tests wiki recommends adding a `.rspec_parallel` file to specify its RSpec options (https://github.com/grosser/parallel_tests/wiki#with-rspec_junit_formatter----by-jgarber), then the project can specify different files for each process to write to like so: --format RspecJunitFormatter --out tmp/rspec<%= ENV['TEST_ENV_NUMBER'] %>.xml However, prior to this commit, the Homebrew/brew test suite specified its RSpec options via the command line. Unfortunately though, there's no way (AFAICT) to set the equivalent of these options via the command line: --format RspecJunitFormatter --out tmp/rspec<%= ENV['TEST_ENV_NUMBER'] %>.xml So, we need to use a `.rspec_parallel` file to specify these options 鈽濓笍. However, it appears that RSpec allows you to specify formatters _either_ in an options file (like `.rspec_parallel`) _or_ via command-line args. But if you specify any formatters via command-line args, then all formatters in the options file are ignored. (I suspect that's somehow related to this bit of code in rspec-core: https://github.com/rspec/rspec-core/blob/v3.10.0/lib/rspec/core/configuration_options.rb#L64.) With that in mind, in order to have the RspecJunitFormatter configured in `.rspec_parallel`, we need to move the other formatters into `.rpsec_parallel` as well, instead of passing them as command-line args. Therefore, this commit moves all the formatters into a `.rspec_parallel` file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes all seem good/reasonable to me and I'd love to see us use BuildPulse for this. I think we want a way to guard this to only PRs from Not Forks and master
branch builds.
@Homebrew/brew any thoughts or objections on using BuildPulse for this?
@Homebrew/core I think this might be a good tool for us to eventually keep track of flaky brew audit
and brew test
runs for formulae, too.
.github/workflows/tests.yml
Outdated
@@ -320,3 +320,13 @@ jobs: | |||
- run: brew test-bot --only-formulae --test-default-formula | |||
|
|||
- uses: codecov/codecov-action@29386c70ef20e286228c72b668a06fd0e8399192 | |||
|
|||
- name: Upload test results to BuildPulse for flaky test detection | |||
if: '!cancelled()' # Run this step even when the tests fail. Skip if the workflow is cancelled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if: '!cancelled()' # Run this step even when the tests fail. Skip if the workflow is cancelled. | |
# Only run this step for PRs where where we have access to secrets. | |
# Run this step even when the tests fail. Skip if the workflow is cancelled. | |
if: github.event.pull_request.head.repo.full_name == github.repository && !cancelled() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this conditional prevented this step also from running on the master
branch (https://github.com/Homebrew/brew/runs/2893711700?check_suite_focus=true), presumably because there was no pull request present for that build. I think we want this step to always run for builds on the master branch, so I'll open a pull request to adjust this conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That'd be great, thanks @jasonrudolph!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I pushed up #11586 with the proposed change.
Agreed. I don't think we have the bandwidth to make use of fork data anyway, even if there were was information there that we can't get from CI runs here.
鉂わ笍
鉂わ笍 for this too. We'll probably have to teach |
Co-authored-by: Mike McQuaid <mike@mikemcquaid.com>
This formula fails the audit, but we may have need of it in Homebrew/brew (cf. Homebrew/brew#11578), so I think this will still be useful to add to homebrew-core.
This formula fails the audit, but we may have need of it in Homebrew/brew (cf. Homebrew/brew#11578), so I think this will still be useful to add to homebrew-core.
Thanks so much for your first contribution! Without people like you submitting PRs we couldn't run this project. You rock, @jasonrudolph! |
That would be great! |
@bayandin Yes! It's relatively self-contained so shouldn't be too tricky. Is this something you might be able to pick up? |
Will do! |
@bayandin 馃槏 I'd consider limiting the output just to e.g. |
Yep, I think I'll start with |
tl;dr This pull request proposes configuring tooling to automatically detect and track flaky tests. As I understand it, this has historically required manual effort for Homebrew maintainers. By automating the process, we hopefully reduce the time that maintainers have to spend on the thankless and laborious task of manually identifying flaky tests. 馃
/cc @MikeMcQuaid for review
Pull request template
No. I don't think this is applicable to this pull request, but if it is, please let me know. 馃檱
brew style
with your changes locally?brew typecheck
with your changes locally?brew tests
with your changes locally?How it works
At a high-level, setting things up works as follows:
Enhance the test suite to emit JUnit XML test reports.
This is accomplished in e163eb8. Please see the commit message for details.
Enhance the CI workflow to send those test reports to buildpulse.io for analysis.
This is accomplished in 006299d.
Maintainers can access an always-up-to-date list of flaky tests, along with stats on the disruptiveness of each test, how recently it has occurred, etc. Maintainers can also choose to receive daily digests in Slack and weekly digests via email, so that they don't have to remember to check yet another tool. 馃槄
A few things to have in mind
馃悾 To date, BuildPulse has been used primarily by teams working in private repositories. Homebrew/brew would be the first significant open source project using BuildPulse, so we'll probably discover some speedbumps due to the differences in workflows between open source projects versus private projects. I hope you'll bring those speedbumps to my attention (jason@buildpulse.io), because I'd love to find a good way to address them.
馃嵈 For GitHub Actions, workflows currently need to use GitHub Actions Secrets in order to authenticate when sending test results to BuildPulse. Since forks can't access secrets, this means that forks can't yet send test results to BuildPulse for analysis. Other tools (like CodeCov) have a solution for this kind of thing, and I hope to add support for forks in BuildPulse in the future as well.
馃崕 This pull request enables flaky test detection exclusively for the macOS CI job.
CI also runs tests in three other contexts: "tests (no-compatibility mode)", "tests (generic OS)", "tests (Linux)". Before enabling BuildPulse for the other contexts, BuildPulse will need to evolve to accept additional metadata to let you distinguish which context reported a given test result.
Right now, the XML reports produced by
rspec_junit_reporter
don't distinguish one context's results from another context's results. For reliably detecting flaky tests, if we're sending BuildPulse results for multiple contexts, it will be important to distinguish which context the test is running in.For example, let's assume that a given test (x) runs in both the "tests (Linux)" context and the "test everything (macOS)" context. And let's say we're running the build multiple times at commit
0abcdef
. If test x always passes in the macOS job, and always fails in the Linux job, the test isn't flaky; it's just broken on Linux. 馃槵But if we don't know the context associated with each test result, all we'd see is that test x sometimes passes and sometimes fails at commit
0abcdef
, and we'd wrongly conclude that the test is flaky. 馃檲So, for now, we need to pick one context to use for sending test results to BuildPulse. @MikeMcQuaid recommended using the macOS job, since it runs the largest number of tests (e.g. the
brew cask
ones).Verifying these changes
Since this pull request makes changes to the macOS CI job, and since that job doesn't run in forks, we can't immediately see the effect of these changes. To work around that limitation (so that reviewers can see the effect of these changes), 1888739 temporarily updates the macOS job to run in my fork. We can see that the job succeeds (https://github.com/jasonrudolph/brew/runs/2877954175) and that test results were successfully sent to BuildPulse. [screenshot]