Watchflakes is a program that triages apparent test flakes on the build.golang.org dashboards.
An apparent test flake is a failure that:
- is not on a completely failing builder.
- is not on an excluded builder.
- is not running a commit that failed on 4 or more builders.
- is not part of a run of 4 or more failing commits on its builder.
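The four criteria can be read as a single predicate. The Go sketch below is illustrative only, not watchflakes' implementation. The `Failure` type, its fields, and the `excluded` set are hypothetical stand-ins for information watchflakes derives from the dashboard. Only the threshold of 4 comes from the list above.

```go
package main

import "fmt"

// Failure is a hypothetical summary of one failed build, with the
// counts already computed from the dashboard.
type Failure struct {
	Builder           string
	BuilderAllFailing bool // every recent run on this builder failed
	BuildersFailing   int  // number of builders on which this commit failed
	FailingRun        int  // length of the run of consecutive failing commits on this builder containing this one
}

// excluded is a hypothetical set of excluded builders.
var excluded = map[string]bool{"example-excluded-builder": true}

// isApparentFlake reports whether f meets all four criteria.
func isApparentFlake(f Failure) bool {
	switch {
	case f.BuilderAllFailing: // completely failing builder
		return false
	case excluded[f.Builder]: // excluded builder
		return false
	case f.BuildersFailing >= 4: // commit failed on 4 or more builders
		return false
	case f.FailingRun >= 4: // part of a run of 4 or more failing commits
		return false
	}
	return true
}

func main() {
	f := Failure{Builder: "linux-amd64-alpine", BuildersFailing: 1, FailingRun: 1}
	fmt.Println(isApparentFlake(f)) // true
	f.BuildersFailing = 4
	fmt.Println(isApparentFlake(f)) // false
}
```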
Watchflakes posts every apparent test flake to an issue in the Test Flakes project.
Every issue description in the Test Flakes project starts with a pattern for the failures relevant to that issue: For example, the markdown for #55260's description starts with:
```
#!watchflakes
post <- pkg == "cmd/go" && test == "" && `unexpected files left in tmpdir`
```
Watchflakes matches every apparent test flake against the patterns in the issues:
- If a flake matches a pattern in an issue, it is posted to that issue.
- If a flake matches a pattern in multiple issues, it is posted to the lowest-numbered issue.
- If a flake does not match a pattern in any issue, watchflakes creates a new issue with a pattern matching the package and test case that failed.
The newly created issue's pattern is often too broad and should be edited to make it more specific to the actual failure. Sending a failure to the lowest-numbered matching issue ensures that creating a broad default pattern for a new failure does not “steal” failures from earlier issues, nor does it spam the new issue with unrelated failures in the same test that are already separately tracked.
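A minimal Go sketch of the lowest-numbered rule, assuming each issue's pattern has already been compiled into a predicate over the failure record. The `Issue` type, `route`, and issue #99999 (standing in for a broad pattern created later for cmd/go) are hypothetical. The #55260 predicate mirrors the pattern quoted earlier.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// Issue pairs a hypothetical issue number with its pattern,
// represented here as a predicate over the failure record.
type Issue struct {
	Number int
	Match  func(rec map[string]string) bool
}

// route returns the lowest-numbered issue whose pattern matches rec,
// or 0 if none does (watchflakes would then create a new issue).
func route(issues []Issue, rec map[string]string) int {
	sorted := append([]Issue(nil), issues...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Number < sorted[j].Number })
	for _, iss := range sorted {
		if iss.Match(rec) {
			return iss.Number
		}
	}
	return 0
}

var tmpdir = regexp.MustCompile(`unexpected files left in tmpdir`)

func exampleIssues() []Issue {
	return []Issue{
		// Hypothetical broad pattern created later: pkg == "cmd/go".
		{Number: 99999, Match: func(r map[string]string) bool { return r["pkg"] == "cmd/go" }},
		// The #55260 pattern quoted earlier.
		{Number: 55260, Match: func(r map[string]string) bool {
			return r["pkg"] == "cmd/go" && r["test"] == "" && tmpdir.MatchString(r["output"])
		}},
	}
}

func main() {
	rec := map[string]string{"pkg": "cmd/go", "test": "", "output": "unexpected files left in tmpdir: [x]"}
	fmt.Println(route(exampleIssues(), rec)) // 55260
	rec["output"] = "some other failure"
	fmt.Println(route(exampleIssues(), rec)) // 99999
}
```

Because #55260 has the lower number, the tmpdir failure stays there even though the broader pattern also matches it. Any other cmd/go failure falls through to the newer issue.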
Watchflakes places newly created issues in the Test Flakes project and adds the NeedsInvestigation label. These issues start out with no status (not Active, not Done). Issues with no status need to be inspected by a person, who should usually refine the pattern to capture the salient information about the failure. Issues that have been checked can then be moved to Active. GitHub automatically moves issues from Active to Done when they are closed.
Watchflakes considers issues of any status when matching a new failure. If it finds a new failure for a closed issue, it will post the failure and reopen the issue. So it is okay to close an issue when a fix lands, instead of having to wait a few weeks to see if the failure is really gone: if a new failure arrives, the issue will be reopened automatically.
Watchflakes maintains no state of its own: all the state is in the GitHub issues. Every time it runs, it considers the past 60 days of build dashboard failures and makes sure that every apparent flake is accounted for in the Test Flakes project. If a failure matching an issue has already been posted to that issue, watchflakes doesn't post it again, of course. And if an issue is edited to update its pattern to exclude certain failures, watchflakes doesn't remove its old posts, but it does look for a different matching issue for those failures, including possibly creating a new one.
The watchflakes stanza in each issue must appear at the top of the issue description.
It must be a code block (either fenced with triple backquotes or indented), and the first line must be `#!watchflakes`, to keep watchflakes from misinterpreting unrelated code blocks.
The rest of the block is a small watchflakes script. Comments to the end of the line are introduced with `#`.
The script is a sequence of rules, each of which has the form `action <- pattern` (send matches for pattern to the action).
The actions are:
- `post` posts the failure to the issue in which the script appears.
- `skip` ignores the failure, throwing it on the floor. This action should be used only rarely (for example, to set policy like in #55166).
- `default` is a lower-priority version of `post`. If an issue has a `post` or `skip` matching the failure, watchflakes does that instead. But if there are no other matches, watchflakes uses the `default` matches. (And then if there aren't any `default` matches, watchflakes creates a new issue.)
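The priority between actions can be sketched in Go as follows. This sketch assumes, as the matching rules earlier on this page suggest, that `post` and `skip` matches are ranked only by issue number, and that among several `default` matches the lowest-numbered issue also wins. `Match`, `decide`, and issue #99999 are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Match is a hypothetical record of one rule, in one issue, that
// matched a failure.
type Match struct {
	Issue  int
	Action string // "post", "skip", or "default"
}

// decide returns what to do with a failure given every matching rule:
// the lowest-numbered post or skip wins; otherwise the lowest-numbered
// default is treated as a post; otherwise a new issue is needed.
func decide(matches []Match) (action string, issue int) {
	ms := append([]Match(nil), matches...)
	sort.Slice(ms, func(i, j int) bool { return ms[i].Issue < ms[j].Issue })
	for _, m := range ms {
		if m.Action == "post" || m.Action == "skip" {
			return m.Action, m.Issue
		}
	}
	for _, m := range ms {
		if m.Action == "default" {
			return "post", m.Issue
		}
	}
	return "new issue", 0
}

func main() {
	// #55257 has a default rule; #99999 is a hypothetical issue that
	// posts one specific compiler crash.
	fmt.Println(decide([]Match{{55257, "default"}, {99999, "post"}})) // post 99999
	fmt.Println(decide([]Match{{55257, "default"}}))                  // post 55257
	fmt.Println(decide(nil))                                          // new issue 0
}
```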
The input to the pattern is a record with named fields, each of which has a string value:
- `pkg` is the full import path of the package that failed to build or that failed its test.
- `test` is the name of the test function in the package that failed.
- Another field holds `build` or `test`, depending on whether this is a build failure or a test failure.
- `output` is the output from the failing test. This output stops just before the final `FAIL` line printed when the test binary exits. It does not include output from other test cases that also failed in the same run, nor any context that was printed by all.bash or the buildlet before the test started.
- `log` is the entire failed build log.
- `snippet` is the shortened form of `output` that will be posted to the issue itself. Matches should almost always use `output` instead.
- `builder` is the name of the builder that ran the test (like `darwin-arm64-12`).
- `repo` is the name of the repo being tested.
- `goos` is the GOOS value (like `openbsd` or `windows`).
- `goarch` is the GOARCH value (like `amd64` or `arm64`).
- `date` is the date of the commit being tested, in the form `2006-01-02T15:04:05`. There is no date comparison logic; use string comparisons instead. Patterns should compare dates only rarely.
- `section` is the section of the build log in which the failure occurred. In all.bash output, each section is introduced by a `#####` line, and each of the `Building` lines during bootstrap is considered its own section as well. In subrepos, the `:: Running` lines each introduce a section named for the go command being run (for example `go test golang.org/x/tools/...`). Most patterns don't need to use `section`; it is most helpful for tests in the main repo that rerun tests with an alternate execution environment.
The pattern is a boolean expression in a Go-like syntax allowing `||`, `&&`, `!`, `(`, and `)` for building complex expressions; `==`, `!=`, `<`, `<=`, `>`, and `>=` for comparing fields against string literals; and `~` and `!~` for matching against regular expressions.
All string comparisons must have a field name on the left and a double-quoted string literal on the right, as in `builder == "linux-amd64-alpine"`.
All regular expression matches must have a field name on the left and a back-quoted string literal on the right, as in ``builder ~ `corellium` ``.
A back-quoted string literal by itself is taken to be a regular expression match against the `output` field, which is appropriate for the vast majority of regular expressions in patterns.
Putting this all together, here are some example scripts.
```
#!watchflakes
post <- pkg == "net/http" && test == "TestHandlerAbortRacesBodyRead"
```
This script in #55277 was created automatically by watchflakes in response to a build run that failed in http.TestHandlerAbortRacesBodyRead. The specific failure that prompted the issue creation was a timeout. If more failures with different root causes were found in that test, it might become appropriate to add `` && `panic: test timed out` `` or otherwise refine the pattern.
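The effect of that refinement can be checked with two Go predicates that mirror the original and refined patterns. The function names are illustrative, and the two records are made-up sample failures, not real dashboard output.

```go
package main

import (
	"fmt"
	"regexp"
)

var timedOut = regexp.MustCompile(`panic: test timed out`)

// original mirrors the pattern watchflakes generated for #55277.
func original(r map[string]string) bool {
	return r["pkg"] == "net/http" && r["test"] == "TestHandlerAbortRacesBodyRead"
}

// refined adds the suggested && `panic: test timed out` clause.
func refined(r map[string]string) bool {
	return original(r) && timedOut.MatchString(r["output"])
}

func main() {
	timeout := map[string]string{"pkg": "net/http", "test": "TestHandlerAbortRacesBodyRead",
		"output": "panic: test timed out after 3m0s"}
	other := map[string]string{"pkg": "net/http", "test": "TestHandlerAbortRacesBodyRead",
		"output": "made-up unrelated failure"}
	fmt.Println(original(timeout), original(other)) // true true
	fmt.Println(refined(timeout), refined(other))   // true false
}
```

With the refined pattern, the second failure no longer matches #55277, so watchflakes would route it to another matching issue or create a new one.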
```
#!watchflakes
post <- goos == "openbsd" && `unlinkat .*: operation not permitted`
```
This script in #49751 collects failures on openbsd caused by unexpected EPERM errors from os.Remove calling unlinkat. These failures cause problems in a variety of tests, so there is no condition on `pkg` or `test`.
```
#!watchflakes
post <- pkg ~ `^cmd/go` && `appspot.com.*: 503`
```
This script in #54608 tracks network problems with 503 responses from appspot.com in any tests in the cmd/go/... package hierarchy, not just cmd/go itself.
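One detail worth knowing: `^cmd/go` is a prefix match, so it also matches packages such as cmd/gofmt. Whether that matters for #54608 is not stated. If a pattern should cover only the cmd/go tree, an anchor such as `^cmd/go(/|$)` (a hypothetical refinement) excludes it. The check below uses Go's regexp package, assuming watchflakes patterns follow the same RE2 syntax.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	prefix    = regexp.MustCompile(`^cmd/go`)      // the #54608 clause
	cmdGoTree = regexp.MustCompile(`^cmd/go(/|$)`) // hypothetical stricter form
)

func main() {
	for _, pkg := range []string{"cmd/go", "cmd/go/internal/modload", "cmd/gofmt"} {
		fmt.Println(pkg, prefix.MatchString(pkg), cmdGoTree.MatchString(pkg))
	}
	// cmd/go true true
	// cmd/go/internal/modload true true
	// cmd/gofmt true false
}
```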
```
#!watchflakes
post <- goos == "windows" && (`dnsquery: DNS server failure` || `getaddrinfow: This is usually a temporary error`)
```
This script in #55165 matches specific DNS failures in any test on builders running Windows.
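The parentheses in this pattern matter. Assuming watchflakes follows Go in giving `&&` higher precedence than `||` (the syntax is described only as Go-like), dropping them would tie the Windows condition to the first message alone. A Go sketch with hypothetical function names and a made-up output line:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	dnsFail = regexp.MustCompile(`dnsquery: DNS server failure`)
	getaddr = regexp.MustCompile(`getaddrinfow: This is usually a temporary error`)
)

// withParens mirrors the #55165 pattern.
func withParens(goos, output string) bool {
	return goos == "windows" && (dnsFail.MatchString(output) || getaddr.MatchString(output))
}

// withoutParens drops the parentheses: && binds tighter than ||, so the
// getaddrinfow clause now matches on any GOOS.
func withoutParens(goos, output string) bool {
	return goos == "windows" && dnsFail.MatchString(output) || getaddr.MatchString(output)
}

func main() {
	out := "getaddrinfow: This is usually a temporary error during hostname resolution" // made-up sample
	fmt.Println(withParens("linux", out), withoutParens("linux", out))     // false true
	fmt.Println(withParens("windows", out), withoutParens("windows", out)) // true true
}
```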
```
#!watchflakes
post <- builder == "darwin-arm64-12" && pkg == "" && test == ""
```
This script in #55312 was created automatically by watchflakes to track failures on the darwin-arm64-12 builder that happen before a specific package test can run.
```
#!watchflakes
# note: sometimes the URL is printed with one /
default <- `(Get|read) "https://?(goproxy.io|proxy.golang.com.cn|goproxy.cn)`
```
This script in #55163 matches errors using certain non-standard Go proxies. It uses `default` to allow other issues to take ownership of more specific failures caused by these proxies. Failures not matching other issues go to #55163 instead of creating new issues.
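The `https://?` in this pattern is what the comment refers to: `/?` makes the second slash optional. A quick check with Go's regexp package, using made-up log lines and assuming watchflakes uses the same RE2 syntax. Note also that the unescaped dots in the host names match any character, which is harmless here but worth remembering when writing similar patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// The #55163 pattern, copied verbatim into a Go raw string.
var proxyErr = regexp.MustCompile(`(Get|read) "https://?(goproxy.io|proxy.golang.com.cn|goproxy.cn)`)

func main() {
	for _, out := range []string{
		`Get "https://goproxy.cn/golang.org/x/mod/@v/list": EOF`, // two slashes
		`read "https:/goproxy.io/x": connection reset`,           // one slash
		`Get "https://proxy.golang.org/x": EOF`,                  // not in the list
	} {
		fmt.Println(proxyErr.MatchString(out))
	}
	// true
	// true
	// false
}
```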
```
#!watchflakes
default <- `: internal compiler error:`
```
This script in #55257 matches compiler failures in any build, no matter what package or repo is being tested. It uses `default` for the same reasons as the previous example: so that issues matching specific compiler errors can still be filed, but failures not matching other issues are grouped into #55257 instead of creating new issues assigned to the specific test that happened to trigger the problem.