Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build/cmd/release{,bot}: include long tests in pre-release testing #29252

Open
bcmills opened this issue Dec 14, 2018 · 24 comments
Open

x/build/cmd/release{,bot}: include long tests in pre-release testing #29252

bcmills opened this issue Dec 14, 2018 · 24 comments

Comments

@bcmills
Copy link
Member

@bcmills bcmills commented Dec 14, 2018

I tested CL 154101 and the subsequent security patches using a combination of go test -run TestScript/[…] and all.bash. Unfortunately, significant parts of go get (including path-to-repository-resolution) are only exercised in non-short tests, and all.bash by default only runs the short tests, despite the name. (I remember that latter point occasionally — but apparently not frequently enough.)

Even more unfortunately, releasebot suggests all.bash for security releases as well, and release runs the same all.bash commands as the regular builders.

As a result, a significant regression (#29241) made it all the way through development, code review, and release building without running the existing tests that should have caught it.

We should ensure that the commands release executes and the instructions releasebot prints for both kinds of releases include the non-short tests on at least one platform.

(CC @bradfitz @dmitshur @FiloSottile)

@gopherbot gopherbot added this to the Unreleased milestone Dec 14, 2018
@gopherbot gopherbot added the Builders label Dec 14, 2018
@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Dec 14, 2018

Additionally: normally we also look at build.golang.org for a happy row of "ok" before doing a release, but because this was a security release without any of our normal infrastructure, we didn't have build.golang.org and didn't have the existing linux-amd64-longtest builder.

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Dec 14, 2018

The longtest builder is highlighted here and shows the breakage, which we would've been much more likely to see using the normal infrastructure:

screen shot 2018-12-14 at 6 03 12 am

@bcmills
Copy link
Member Author

@bcmills bcmills commented Dec 14, 2018

Yep, there were a lot of factors at play. This one seems like a relatively easy win, though: we don't cut all that many releases, so “run all of the tests one last time to be sure” seems like a reasonable step in the release process.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Aug 21, 2019

@bcmills bcmills modified the milestones: Unreleased, Go1.14 Aug 21, 2019
@bcmills
Copy link
Member Author

@bcmills bcmills commented Aug 21, 2019

Marking release-blocker for 1.14. We really ought to be running all of the tests before we cut a release.

@toothrot
Copy link
Contributor

@toothrot toothrot commented Dec 11, 2019

@bcmills Would it be sufficient for release to query the dashboard/coordinator for longtest status? On one hand, I would love to run longtest at build time, but on the other hand I am hesitant to increase the duration of our release process significantly.

@bradfitz
Copy link
Contributor

@bradfitz bradfitz commented Dec 11, 2019

I also vote we just query the dashboard to gate releases. cmd/release is extra slow in all.bash mode as-is (without adding long tests) because it doesn't shard test execution over N machines. Adding long tests just makes a slow situation even worse.

Now that the build system has a scheduler, we can even tweak the scheduler to make sure that of release-branch HEADs are highest priority. (It might already be doing close to that as-is, actually)

We could even go a step further and have cmd/release not even run make.bash and instead just pick up the artifacts from the previous build (which are already known to be good artifacts if all the tests passed). But that's for another day. (Even further: run cmd/release on every release and then releasebot just downloads them)

@bcmills
Copy link
Member Author

@bcmills bcmills commented Dec 11, 2019

Querying the dashboard seems fine for regular releases, as long as we're checking the result for the actual commit that we're about to release.

I think that still leaves a testing gap for security releases, though.

@bcmills
Copy link
Member Author

@bcmills bcmills commented Dec 11, 2019

FWIW, it looks like we released 1.12.14 and 1.13.5 with failing longtest builds again. 😞

https://golang.org/cl/205438 and https://golang.org/cl/205439 need to be reviewed and merged before the next point releases.

@toothrot
Copy link
Contributor

@toothrot toothrot commented Jan 9, 2020

@bcmills As of today, this is still a manual step in our process. I've noticed another possible brittle test appear in the longtest failures for 1.12, and we'll look into addressing that.

We'll still do the effort to have our release automation query the branch status before tagging a release.

The build dashboards for 1.13 and 1.12 have some ports that are consistently failing. It seems like we should reconsider the validity of some ports based on their builder status.

@gopherbot
Copy link

@gopherbot gopherbot commented Jan 13, 2020

Change https://golang.org/cl/214433 mentions this issue: dashboard: upsize freebsd-amd64-race builder to 7.2 GB RAM

gopherbot pushed a commit to golang/build that referenced this issue Jan 14, 2020
Start using n1-highcpu-8 machine type instead of n1-highcpu-4
for the freebsd-amd64-race builder.

The freebsd-amd64-race builder has produced good test results
for the x/tools repo for a long time, but by now it has started
to consistently fail for reasons that seem connected to it having
only 3.6 GB memory. The Windows race builders needed to be bumped
from 7.2 GB to 14.4 GB to run successfully, so this change makes
a small incremental step to bring freebsd-amd64-race closer in
line with other builders. If memory-related problems continue to
occur with 7.2 GB, the next step will be to go up to 14.4 GB.

The freebsd-amd64-race builder is using an older version of FreeBSD.
We may want to start using a newer one for running tests with -race,
but that should be a separate change so we can see the results of
this change without another confounding variable.

Also update all FreeBSD builders to use https in buildletURLTmpl,
because it's expected to work fine and will be more consistent.

Updates golang/go#36444
Updates golang/go#34621
Updates golang/go#29252
Updates golang/go#33986

Change-Id: Idfcefd1c91bddc9f70ab23e02fcdca54fda9d1ac
Reviewed-on: https://go-review.googlesource.com/c/build/+/214433
Run-TryBot: Carlos Amedee <carlos@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
@cagedmantis cagedmantis self-assigned this Feb 7, 2020
@cagedmantis
Copy link
Contributor

@cagedmantis cagedmantis commented Feb 7, 2020

Assigned the issue to myself since I will be tracking progress.

@cagedmantis
Copy link
Contributor

@cagedmantis cagedmantis commented Feb 25, 2020

Moving this to the next major milestone. The long tests have been run manually.

@cagedmantis cagedmantis modified the milestones: Go1.14, Go1.15 Feb 25, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented Apr 10, 2020

Change https://golang.org/cl/227859 mentions this issue: dashboard: upsize freebsd-amd64-race builder to 16 vCPUs, 14.4 GB mem

@gopherbot
Copy link

@gopherbot gopherbot commented Apr 10, 2020

Change https://golang.org/cl/227859 mentions this issue: dashboard: upsize freebsd-amd64-race builder to 16 vCPUs, 14.4 GB RAM

gopherbot pushed a commit to golang/build that referenced this issue Apr 10, 2020
This is a followup to CL 214433. Start using n1-highcpu-16 machine type
instead of n1-highcpu-8 for the freebsd-amd64-race builder.

Increasing the RAM from 3.6 GB to 7.2 GB has helped golang/go#36444
significantly: the builder stopped failing consistently on x/tools
and resulted in many data races being uncovered in golang/go#36605.

However, by now, it has started to fail consistently again. This
time it seems to be due to low performance, causing the tests in
golang.org/x/tools/internal/lsp/regtest package to fail due with
"context deadline exceeded" errors.

FreeBSD is one of the ports that stays visible when "show only first-
class ports" is checked on build.golang.org. The other -race builders
have all been upgraded to the n1-highcpu-16 machine type by now out
of necessity.

It seems fair to provide the FreeBSD port with an equal amount of
resources, even if the increased memory isn't strictly required yet.
Once this change is applied, if the failures persist, we can be more
confident that the problem is due to the code or the port, rather
than due to this -race builder having 2𝗑 less CPU and RAM resources
compared to other -race builders.

An alternative is to increase timeout for this builder type, but I'm
opting to defer exploring that after equalizing the machine type.

For golang/go#36444.
For golang/go#34621.
For golang/go#29252.
For golang/go#33986.

Change-Id: I41f149365128c7bc6f576c778ac07618acc04612
Reviewed-on: https://go-review.googlesource.com/c/build/+/227859
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@dmitshur dmitshur self-assigned this May 12, 2020
@gopherbot
Copy link

@gopherbot gopherbot commented May 13, 2020

Change https://golang.org/cl/233898 mentions this issue: cmd/dist: use GO_TEST_SHORT value more consistently

@gopherbot
Copy link

@gopherbot gopherbot commented May 13, 2020

Change https://golang.org/cl/233901 mentions this issue: dashboard: use more consistent definition of longtest builder

@dmitshur
Copy link
Member

@dmitshur dmitshur commented May 18, 2020

I've been looking into this issue the last few days. We've made progress in various areas that have made it more viable to resolve it now. For instance, work was done to apply fixes to release-branch.go1.13, and the windows-amd64-longtest builder is passing on it now. The addition of SlowBots made it possible to request longtest builders as part of pre-submit trybot testing.

I explored options such as: querying test results from the dashboard, using SlowBots, modifying the tests to make them long instead of short.

The first two don't work for security releases, and the third proved tricky due to #39054.

I ended up iterating to a fourth option, which is an adjustment to the third. The idea is to re-use longtest builders for the purpose running tests only. This has favorable properties including working for security releases, and being closer to what build.golang.org reports. It's a purely additive change: it keeps all the existing (short) tests we have unmodified. Finally, it doesn't increase time to prepare a release significantly, as the longtest builders use more powerful VMs, and will run in parallel with the other builds. This approach makes it easier to also add race builders in future changes.

During this investigation, I noticed it's important to try to reuse as much existing infrastructure and configuration needed to run long tests as possible, so that seeing passing results in post-submit builders at build.golang.org translates to the same outcome during a release. This is especially important now, since we don't have nightly releases, so any deviation from post-submit builders means it'll be caught on release day rather than in advance. This is something we're planning to improve on in future release automation work (/cc @toothrot).

gopherbot pushed a commit to golang/build that referenced this issue May 18, 2020
Previously, it was possible to define a builder whose IsLongTest method
would report positively, such that it'd test golang.org/x repos in long
test mode, but the main Go repository in short test mode. This would be
the case for any builder with "-longtest" suffix if it did not manually
include GO_TEST_SHORT=0 in its environment configuration.

It's unlikely we would want to do that intentionally, so this refactor
makes such misconfiguration less likely by inserting the GO_TEST_SHORT
environment variable assignment into the output from Env automatically.

Now, a longtest builder is defined in one consistent way: it must have
a "-longtest" suffix, so that the IsLongTest method reports positively.

For golang/go#39054.
For golang/go#29252.
For golang/go#12508.

Change-Id: Ic24df7b3b7de7db728bba6dc6ad4dd38a2e61e82
Reviewed-on: https://go-review.googlesource.com/c/build/+/233901
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented May 19, 2020

Change https://golang.org/cl/234531 mentions this issue: cmd/releasebot, cmd/release: include long tests in release process

@dmitshur
Copy link
Member

@dmitshur dmitshur commented May 21, 2020

CL 234531 adds automatic long tests to the release process. Once submitted, releasebot -mode=prepare will run tests on the linux-amd64-longtest and windows-amd64-longtest builders, and it will not silently proceed if there is a regression caught by a long test.

While it increases safety, we want to be careful not to introduce a situation where a release cannot be made because long tests are not passing on the day of the release. Also, long tests are more likely to be flaky, so there can be issues with false positives.

As part of submitting that CL, I plan to make take these steps:

  1. Make progress on #37827 to reduce the chance of a CL being submitted to a release branch that causes a long test to fail.
  2. Add a flag to releasebot to skip long tests. It is meant to be used in case a long test proves to be flaky, and enough confidence can be gained through testing elsewhere that the failure is not a regression caused by a change merged to the release branch.
  3. Improve releasebot -mode=prepare -dry-run to make it possible to use it to run tests in advance of a release (works for beta releases now, other release types in the future, see #40214).

There is also room for improvement for us to consolidate the mechanism used to execute long tests between post-submit builders and the release command. This is something we will consider in long term improvements to release automation.

This issue is currently marked as a release blocker for 1.15. I don't believe it needs to block the 1.15 release if it's not fully resolved in time, and I believe we've made good progress on it this cycle, so I plan to unmark it release blocker if isn't ready in time. This will let us spend more time on other issues that do block the release and cannot be worked around with manual work in the release process.

@gopherbot
Copy link

@gopherbot gopherbot commented May 26, 2020

Change https://golang.org/cl/235338 mentions this issue: dashboard: include longtest builders for trybots on release branches

@cagedmantis cagedmantis removed their assignment May 27, 2020
gopherbot pushed a commit to golang/build that referenced this issue Jun 2, 2020
The goal of this test policy change is to decrease the chance of a
Go repository release branch from getting into a state where some
longtest post-submit builders are failing, as this will interfere
with the ability to issue releases after golang/go#29252 is fixed.

There are not that many CLs sent to release branches, so the extra
latency caused by slower longtest builders being a part of trybots
should not be very disruptive.

For golang/go#37827.
For golang/go#29252.

Change-Id: I819b120bb7b763df4acf4583375cdb840793d323
Reviewed-on: https://go-review.googlesource.com/c/build/+/235338
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Jun 2, 2020
The goal of this change is to reduce the chance of issuing a release
with an unintended regression that would be caught by running long
tests. This change adds long tests that are run during the -prepare
step, in addition to all the existing short tests that are run.

Executing the long tests is implemented by adding two new test-only
release targets. For a release to be considered complete, all release
targets must complete.

These test-only targets are built only for the purpose of verifying
their tests succeed, or to block the release otherwise.
They do not produce release artifacts.

The new test-only targets are named after the builder which is used
to perform their tests, and they are:

• linux-amd64-longtest
• windows-amd64-longtest

More builders may be added in the future, but care must be taken
to ensure the test execution environment is as close as possible
to that of build.golang.org post-submit builders, in order to
avoid false positives and false negatives.

As part of a gradual rollout, this change also adds a flag to skip
longtest builders. It's meant to be used in case a long test proves
to be flaky, and enough confidence can be gained through testing
elsewhere that the failure is not a regression caused by a change
merged to the release branch.

For now, its default value includes both longtest builders, so they
are currently opt-in and this CL is a no-op. After testing proves
that it is viable to rely on this (and any issues preventing that
from being possible are resolved), the default value of the flag
will be changed to the empty string.

For golang/go#29252.
For golang/go#39054.
For golang/go#37827.
Fixes golang/go#24614.

Change-Id: I33ea6601aade2873f857c63f00a0c11821f35a95
Reviewed-on: https://go-review.googlesource.com/c/build/+/234531
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented Jun 11, 2020

Change https://golang.org/cl/237602 mentions this issue: [release-branch.go1.13] net: add more timing slop for TestDialParallel on Windows

gopherbot pushed a commit that referenced this issue Jun 12, 2020
…l on Windows

For #35616.
Fixes #39538.
For #29252.

Change-Id: I51b2490100cfe0e902da09eee8d027e0ec86ed53
Reviewed-on: https://go-review.googlesource.com/c/go/+/207466
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
(cherry picked from commit c20b71e)
Reviewed-on: https://go-review.googlesource.com/c/go/+/237602
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@gopherbot
Copy link

@gopherbot gopherbot commented Jun 17, 2020

Change https://golang.org/cl/238539 mentions this issue: cmd/releasebot: opt into windows-amd64-longtest target by default

gopherbot pushed a commit to golang/build that referenced this issue Jun 18, 2020
The windows-amd64-longtest target is passing on master (for Go 1.15),
release-branch.go1.14, and release-branch.go1.13. It was successfully
opted-into during the Go 1.15 Beta 1 release (see golang/go#39502).

Start including it in the release process by default, so that we have
more information available during the upcoming releases, and in turn
can use it to improve the releases and the release process further.

Also improve the formatting of the release table to improve
its readability, based on experience from golang/go#39502.

For golang/go#29252.

Change-Id: I002c3bf811facf50ca5fb363adebe2baacd039ea
Reviewed-on: https://go-review.googlesource.com/c/build/+/238539
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@dmitshur
Copy link
Member

@dmitshur dmitshur commented Jul 6, 2020

We've made good progress on incorporating automatic long-test testing into the release process by now, and have a better sense of the remaining work to be able to automate this further. We have a manual process in place for filling in the gap until this issue is resolved fully.

This is sufficient progress for the 1.15 timeframe. Moving to 1.16 as this doesn't need to block the Go 1.15 release.

@dmitshur dmitshur modified the milestones: Go1.15, Go1.16 Jul 6, 2020
gopherbot pushed a commit that referenced this issue Aug 18, 2020
There were two places where the -short flag was added in order to
speed up tests when run in short mode, in CL 178399 and CL 177417.

It appears viable to re-use the GO_TEST_SHORT value so that -short
flag is not used when the tests are executed on a longtest builder,
where it is not a goal to skip slow tests for improved performance.

Do so, in order to make the testing configurations simpler and more
predictable.

Factor out the flag name out of the string returned by short, so that
it can be used in context of 'go test' which can accept a -short flag,
and a test binary which requires the use of a longer -test.short flag.

For #39054.
For #29252.

Change-Id: I52dfbef73cc8307735c52e2ebaa609305fb05933
Reviewed-on: https://go-review.googlesource.com/c/go/+/233898
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants
You can’t perform that action at this time.