
Implement a staging workflow to reduce costs #2031

Closed
wants to merge 11 commits

Conversation

@basil (Member) commented Apr 29, 2023

This PR proposes turning master into a staging area as a cost reduction strategy.

The idea is as follows: for PRs to master (including all Dependabot PRs), we only run prep.sh (in the future, we could add a very small amount of Launchable testing to increase confidence) and mark the build as passing, unless a skip-subset label is present. For master builds, we only run prep.sh as well. This effectively turns master into a staging area where untested/broken code may be present.

To ensure quality releases, we would stop doing releases from the master branch and start doing them from a new release branch. On some cadence (e.g., Wednesday, Friday, and Sunday), a GitHub action runs to merge the master branch into the release branch. There should never be any conflicts here, since the only commits to the release branch will be coming from this job, which is doing a one-way sync from master to release. Unlike PRs to master, PRs to the release branch do run the full test suite, so we can catch any problems and fix them throughout the week. Non-PR builds of release fail just like current non-PR builds of master to avoid duplicated work.
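The scheduled one-way sync described above could look roughly like the following GitHub Actions workflow. This is a sketch only: the file name, cron schedule, and `gh` invocation are assumptions for illustration, not code from this PR.

```yaml
# Hypothetical .github/workflows/sync-release.yaml (name and details assumed)
name: Sync master into release
on:
  schedule:
    # Sunday, Wednesday, Friday at 06:00 UTC (day-of-week 0,3,5)
    - cron: '0 6 * * 0,3,5'
  workflow_dispatch:
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Open (or refresh) a PR merging master into release
        # If a sync PR is already open, its head branch is master, so it
        # tracks new master commits automatically and creation just no-ops.
        run: |
          gh pr create --base release --head master \
            --title "Sync master into release" \
            --body "Automated one-way sync; the full test suite runs on this PR." \
          || echo "Sync PR already open; it will pick up the latest master commits"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

Because the only writer to the release branch is this job, the merge is always fast-forward-like and conflict-free, as the description argues.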

So how would releases happen in this system? I am not too sure. The maven-cd workflow currently in use does not support doing releases from non-default branches and relies on https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#check_run which "will only trigger a workflow run if the workflow file is on the default branch." In our case we want to only do releases from the release branch, since that is the only branch that has been tested. For now this PR triggers a build of the release branch on Monday in preparation for a Tuesday CD release of some sort but does not actually implement the necessary changes in .github/workflows/cd.yaml. I would be looking for someone else to work on this.

The idea is that once this is adopted, the vast majority of builds would be "subset" builds, with "subset" being currently defined as just prep.sh but eventually including some tests from Launchable to improve our confidence. The only "full" builds would be on Wednesday, Friday, and Sunday when PRs are created against the release branch, as well as any non-PR builds of the release branch right before an actual release.

This would drastically reduce costs while still retaining stability. Maintainers would have to merge the automatically created PRs on Wednesday, Friday, and Sunday, and bisect/revert any offending changes if necessary. (Those changes would be fixed & resubmitted with the skip-subset label to properly test them before re-integration.) This seems like a small price to pay for a more sustainable operational deployment.


This is just a starting point, so please propose alternative ideas and start implementing them. It is much easier to talk about a proposal when looking at code (even if the code is still incomplete). For example, an alternative proposal would be to run the error step for any Dependabot PR build (making them unmergable) and write a more complicated GitHub Action workflow to periodically (say, Wednesday, Friday, and Sunday) gather these into a batch and offer a combined PR for full testing. That would be harder to implement, and while it would solve the CD problems described above, it would introduce new problems such as merge conflicts. It also would not decrease costs for non-Dependabot, non-combined PRs. So for these reasons, I didn't try this out, but I would not be upset if someone else made a competing PR with an alternative strategy for discussion.

@jglick (Member) left a comment

Some clarifications on GHA:

The maven-cd workflow currently in use does not support doing releases from non-default branches

Not in response to the check_run trigger, i.e., automatically when sufficient labels are present. You can, however, trigger releases explicitly (workflow_dispatch) from another branch, provided that the head of the branch in fact has a passing check.
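For illustration, the distinction between the two triggers might look like this in a workflow file (a sketch; the actual cd.yaml in use may differ):

```yaml
# Assumed fragment, not copied from the real cd.yaml
on:
  workflow_dispatch:   # manual trigger; the branch is chosen in the "Run workflow" UI
  check_run:           # automatic trigger; only honored when the workflow file
    types: [completed] # is on the default branch
```

When dispatched manually on the release branch, GitHub runs the workflow file from that branch, which is what makes explicit releases from a non-default branch possible at all.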

For frequently released components this would be a hassle, since you have to merge the stuff you want, then wait for Jenkins to finish, then come back and approve a release. For bom, which gets a release every week or two, it could be tolerated. We already sometimes choose to trigger a manual release just to ship accumulated dependency PRs that would not individually trigger a release according to interesting-category-action.

I am not sure that the release-drafter action as currently configured would work to populate release notes according to PRs merged to master, and making this reliable could be hard given that some of the most recent such PRs might not be included in the current release.

In this scheme we could consider automerging DB PRs, especially given that there would be an extra point of human intervention (the master-to-release merge) before plugin bumps appear in releases.

.github/workflows/cd.yaml — two review threads (outdated, resolved)
basil and others added 4 commits May 1, 2023 10:56
@basil basil marked this pull request as ready for review May 1, 2023 18:02
@basil (Member, Author) commented May 1, 2023

Looking for an Apache-style vote on the idea from the following people who regularly contribute non-operational changes to this repository (some of whom are also members of @jenkinsci/bom-developers):

Note that current costs are not sustainable and we have a mandate to decrease them, so if you are about to cast a negative vote then please be prepared to discuss (and possibly implement, if the scope is large enough) alternatives. If you intend to abstain, then please let me know explicitly if possible so I am not wondering whether or not you got the ping.

@timja (Member) commented May 2, 2023

Is the release branch going to be a PR into master?

I.e., where and how will it be visible that the release branch is failing, and how will people be notified that a merge has caused a broken master build? I'm not that happy with a (likely relatively high, without investigating too much) chance of a failing release build forcing the releaser to go and investigate at the point of release rather than at the point of pull request.

Example: I need to release bom to smooth dependency updates elsewhere, but the build is failing, preventing me from releasing it.


Would another option be to use merge queues?

Something like: don't run expensive CI by default, but do run it in the merge queue.

Dependabot PRs are automatically added to the merge queue, and since the PRs are scheduled they should all come in around the same time; we could set a minimum wait time of maybe 30 minutes to 1 hour.


@jglick jglick requested review from dduportal and a team May 2, 2023 12:16
@jglick jglick added the chore Reduces future maintenance label May 2, 2023
@jglick (Member) commented May 2, 2023

where and how will it be visible that the release branch is failing. and how will people be notified that a merge has caused a broken master build

My understanding from the PR description is that we would notice when the Wed/Fri/Sun sync builds fail. Someone would need to be paying attention to those, and manually bisect any problems. This would cover cases where a plugin or core update actually broke tests in some other plugin. The more common case is just that a plugin depends on something newer than the oldest LTS line, or there is some other POM issue, which should be caught in the initial PR during the prep stage.

dependabot PRs are automatically added to merge queue, and the PRs are scheduled so should all come in around the same time

That might work. It does seem to be true that DB tends to file a batch of several PRs all at once. If it proves practical, it would be more usable than the current proposal, though it would not control costs as aggressively (would like @dduportal to offer guidance). I think we would still keep #1993 in that case: once the merge queue is good, there is no reason to rebuild master unless you are actually proposing a release.

I have been trying to follow https://github.com/orgs/community/discussions/46757, where a lot of people attempt to distinguish PR head builds from merge queue builds this way, but docs seem unclear. My attempts at using the merge queue for similar situations were not successful: while it does queue up and merge PRs, it retests each one individually, preventing any cost reduction. That may have been due to ignorance of some option.
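The event-based distinction those discussions revolve around looks roughly like this (a sketch, assuming the repository has merge queues enabled; full-tests.sh is a hypothetical script name standing in for the expensive suite):

```yaml
# Sketch: cheap checks on PR heads, expensive suite only in the merge queue
on:
  pull_request:
  merge_group:
jobs:
  cheap:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: ./prep.sh
  full:
    if: github.event_name == 'merge_group'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: ./full-tests.sh   # hypothetical stand-in for the full test run
```

Whether the queue actually batches several PRs into one merge-group build, rather than retesting each PR individually as observed above, is the open question that determines the cost savings.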

@basil (Member, Author) commented May 2, 2023

where and how will it be visible that the release branch is failing.

Automation will file a pull request three times a week to merge the master branch into the release branch, which (other than actually performing a release) is the only time the full test suite will run. If the previously filed pull request against release was not merged yet, the automation will simply update it with the latest master changes.

how will people be notified that a merge has caused a broken master build.

Note that we will always be running prep.sh, so the only possible cause of a broken master build under this proposal is a failing test. But this would only be noticed when syncing the master branch into the release branch. In this proposal, automation will do this and file a PR against the release branch three times a week, running the full test suite. I don't think this will be much more work for maintainers, since they already have to dig into and investigate test failures. This will make that process a bit more annoying by adding some increased bisection work, but such is the compromise necessary to achieve a reasonable budget.

I'm not that happy

Well we have to reduce costs somehow. By definition that is going to create some developer unhappiness. But I would argue that running the full test suite on every pull request was never an operationally sustainable design to begin with.

Would another option be to use merge queues?

I do not believe so. Merge queues test each PR individually as mentioned above.

@timja (Member) commented May 2, 2023

Ok thanks for that info +1

@jglick (Member) commented May 2, 2023

+0 assuming release-drafter can be made to work somehow. The advantage compared to #2034 is that trunk failures would wind up producing a GitHub notification, though in the context of a PR that cannot be edited; if automated notification is important, it would suffice to use GHA to file an issue resulting from @nightly failures, or send a notification from Jenkinsfile for example to a designated email list. Still TBD whether there will be many such failures anyway.

@MarkEWaite (Contributor) commented

+1 from me

@basil (Member, Author) commented May 2, 2023

if automated notification is important

I think regular scheduled test runs and automated notifications are critical to provide a positive incentive to regularly perform maintainership duties. Without them, maintainership duties will fall to whoever ends up performing a BOM release or doing some unrelated core/PCT work that happens to require a full test run, which will negatively incentivize that valuable work.

If #2034 implemented regular scheduled test runs and automated notification, it could outcompete this PR because of its simpler single-branch model. I would welcome this, as I believe working solutions should be evaluated based on technical merit.

As #2034 currently stands, it is a less complete version of what is being proposed here that creates negative incentives for performing BOM releases or core/PCT work that happens to require a full test run, so it would not meet my criteria for consideration as a working alternative solution.

jglick added a commit to jglick/bom that referenced this pull request May 3, 2023
MarkEWaite pushed a commit that referenced this pull request May 3, 2023
* Skip PCT by default on PRs

* Run full tests weekly

* Empty commit should suffice

* `git branch -D` might be necessary

Co-authored-by: Joseph Petersen <me@jetersen.dev>

* Copying DB automerge workflow from #2031

* Revert #2032 for `prep` phase, retaining node pool for PCT #2034 (comment)

---------

Co-authored-by: Joseph Petersen <me@jetersen.dev>
@basil basil closed this May 3, 2023