Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hack/stabilization-changes: Gate 4.(y-1).z into fast-4.y, etc. on update paths to 4.y #2059

Conversation

wking
Copy link
Member

@wking wking commented Jun 15, 2022

We sometimes tombstone or otherwise do not release a 4.y.z in a given week, when we do release 4.(y-1).z'. Hypothetical example:

  1. Some bug shows up which makes 4.98.0-to-4.99 conditionally risky, and we've raised minor_min to 4.98.25 for new 4.99 builds.
  2. Week 1: 4.98.90 and 4.99.50 are built. All is well.
  3. Week 2: 4.98.91 and 4.99.51 are built, but 4.99.51 gets tombstoned.
  4. Week 3: 4.98.91 enters fast channels, including fast-4.99.
  5. Alice is on 4.98.0, and decides to move to 4.99 via fast-4.99. There are no recommended updates to 4.99.50 or 4.9.51 because of the long-standing minor_min floor.
  6. Alice sees recommendations to 4.98.90 and 4.98.91, and heads off to 4.98.91 because it fixes more bugs.
  7. Oh no, there are no recommendations to get from 4.98.91 to 4.99. Alice is stuck.
  8. Week 4: 4.99.52 enters fast channels, with an update path from 4.98.91, and Alice can finally update.

What we want is to withhold 4.98.91 from fast-4.99 until it has an update path into 4.99, so when Alice tries to get to 4.99, we send her through 4.98.90, saving her the week+ of waiting for a new 4.99.z release that can update from 4.98.91.

Most of the difficulty here is figuring out whether there will be an update path. We could teach the script to compute those based on local graph-data information and registry requests like hack/show-edges.py does. But walking Quay tags and pulling and inspecting release images is slow and tedious, and we probably don't need this sort of fanciness for candidate-... channels. So this commit sticks with our current behavior for candidate-..., uses the script's existing update service request logic to get update recommendations for candidate-4.y, and expects those same recommendations to hold true for fast-4.y and later if the source and target releases were both in the fast-4.y or later channel (which is how Cincinnati works today, and likely to be how it continues to work in the future).

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 15, 2022
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2022
if release_major_minor == channel_major_minor:
return # we are concerned about getting from 4.(y-1) and earlier into 4.y, not about movement within 4.y.
cincinnati_uri, cincinnati_data = get_cincinnati_channel(cache=cache, channel='candidate-{}'.format(channel_major_minor))
paths = FIXME # find version in cincinnati_data and look for updates (possibly multi-hop) to 4.y
Copy link
Member

@sdodson sdodson Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as version has a path to release_major_minor + 1 we're good. In multi-hop we would've stopped 4.98.91 from going into eus-4.100 because 4.99.50 never made it in because it lacked an upgrade path to 4.100.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased and taught about path walking with 65ca645 -> 936de75. I started in on your single-hop proposal, but realized that even if 4.99.50-to-4.100 was unconditionally recommended when we decided to promote 4.99.50, it may no longer be recommended when we are considering 4.98.91. And holding 4.98.91 out of a 4.10 channel in that case will make it more likely that folks on 4.98.(z < 91) get a happy path through, instead of heading to 4.98.91, on to 4.99.50, and then being stuck.

Testing with a partial revert of #2245:

$ git revert fd6471d
$ hack/stabilization-changes.py
...
* Recommend waiting to promote 4.7.54 to eus-4.8; it was promoted the feeder stable by 0a4c588e81 (Merge pull request #2237 from openshift-ota-bot/promote-4.7.54-to-stable, 2022-07-20, 4 days, 0:43:33.877052) No unconditional paths from 4.7.54 to later major or minor releases in https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.8&arch=amd64
...

Which looks good to me.

@sdodson
Copy link
Member

sdodson commented Jul 8, 2022

Could this also alert us whenever we remove recommendations which cause dead-ends after the fact? I think we should be granted freedom to preserve continuity in upgrades to channel-X.Y even if it means removing some X.Y-n.z temporarily but perhaps only by raising alarms for humans to look into. I view this as different from removing x.y.z from channel-X.Y but maybe I'm overthinking this.

I'm in favor of accepting this, provided you're done and you've tested, then we can revisit closing smaller gaps in the future.

@wking wking changed the title WIP: hack/stabilization-changes: Gate 4.(y-1).z into fast-4.y, etc. on update paths to 4.y hack/stabilization-changes: Gate 4.(y-1).z into fast-4.y, etc. on update paths to 4.y Jul 25, 2022
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 25, 2022
@wking wking force-pushed the gate-older-minors-on-updates-to-cap-minor-in-candidate-4.y branch 2 times, most recently from 4a064dc to 936de75 Compare July 25, 2022 07:21
…ate paths to 4.y

We sometimes tombstone or otherwise do not release a 4.y.z in a given
week, when we do release 4.(y-1).z'. Hypothetical example:

1. Some bug shows up which makes 4.98.0-to-4.99 conditionally risky,
   and we've raised minor_min to 4.98.25 for new 4.99 builds.
2. Week 1: 4.98.90 and 4.99.50 are built. All is well.
3. Week 2: 4.98.91 and 4.99.51 are built, but 4.99.51 gets tombstoned.
4. Week 3: 4.98.91 enters fast channels, including fast-4.99.
5. Alice is on 4.98.0, and decides to move to 4.99 via
   fast-4.99. There are no recommended updates to 4.99.50 or 4.9.51
   because of the long-standing minor_min floor.
6. Alice sees recommendations to 4.98.90 and 4.98.91, and heads off to
   4.98.91 because it fixes more bugs.
7. Oh no, there are no recommendations to get from 4.98.91 to
   4.99. Alice is stuck.
8. Week 4: 4.99.52 enters fast channels, with an update path from
   4.98.91, and Alice can finally update.

What we want is to withhold 4.98.91 from fast-4.99 until it has an
update path into 4.99, so when Alice tries to get to 4.99, we send her
through 4.98.90, saving her the week+ of waiting for a new 4.99.z
release that can update from 4.98.91.

Most of the difficulty here is figuring out whether there will be an
update path.  We could teach the script to compute those based on
local graph-data information and registry requests like
hack/show-edges.py does.  But walking Quay tags and pulling and
inspecting release images is slow and tedious, and we probably don't
need this sort of fanciness for candidate-... channels. So this commit
sticks with our current behavior for candidate-..., uses the script's
existing update service request logic to get update recommendations
for candidate-4.y, and expects those same recommendations to hold true
for fast-4.y and later if the source and target releases were both in
the fast-4.y or later channel (which is how Cincinnati works today,
and likely to be how it continues to work in the future).

I'm looking at multiple hops.  For example, with 4.(y-2).z,
4.(y-1).z', and 4.y.z" releases, 4.(y-1).z' will only enter fast-4.10
if it has a path to 4.y.z".  But by the time we go to consider
4.(y-2).z, the previously-confirmed 4.(y-1).z'-to-4.y.z" update may no
longer be recommended, and in that case we'd want to hold 4.(y-2).z
out of the channel until 4.(y-1).z'-to-4.y.z" was restored or an
alternate path to 4.y became unconditionally recommended.
@wking
Copy link
Member Author

wking commented Jul 25, 2022

Testing:

$ git --no-pager log --oneline -1
674e829 (HEAD -> gate-older-minors-on-updates-to-cap-minor-in-candidate-4.y) hack/stabilization-changes: Gate 4.(y-1).z into fast-4.y, etc. on update paths to 4.y

Clear out the final z of each 4.y in eus-4.10 to set up some promotions. Also remove 4.9.41, because 4.8.46 is old enough to update through 4.9.41 to 4.10.21:

$ sed -i '/- 4[.]\(8[.]46\|9[.]4[12]\|10[.]22\)/d' channels/eus-4.10.yaml 

First time through, only 4.10.21 makes the cut (same major.minor as the channel):

$ hack/stabilization-changes.py 2>/dev/null | grep eus-4.10
* channels/eus-4.10: Promote 4.10.22. It was promoted to the feeder stable by 383e64bf0d (Merge pull request #2212 from openshift-ota-bot/promote-4.10.22-to-stable, 2022-07-18) 6 days, 13:51:59.627324 ago. data://no-token-so-no-pull
* Recommend waiting to promote 4.8.46 to eus-4.10; it was promoted the feeder stable by bb27a83d4e (Merge pull request #2190 from openshift-ota-bot/promote-4.8.46-to-stable, 2022-07-13, 11 days, 15:24:29.634141) No unconditional paths from 4.8.46 to 4.10 in https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.10&arch=amd64 https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.9&arch=amd64
* channels/eus-4.10: Promote 4.9.41. It was promoted to the feeder stable by eec9b2f74d (Merge pull request #2184 from openshift-ota-bot/promote-4.9.41-to-stable, 2022-07-12) 12 days, 8:37:39.728757 ago. data://no-token-so-no-pull
* Recommend waiting to promote 4.9.42 to eus-4.10; it was promoted the feeder stable by b8295d6cb8 (Merge pull request #2216 from openshift-ota-bot/promote-4.9.42-to-stable, 2022-07-19, 5 days, 16:13:55.755404) No unconditional paths from 4.9.42 to 4.10 in https://api.openshift.com/api/upgrades_info/v1/graph?channel=candidate-4.10&arch=amd64

Second time through, now 4.9.42 makes the cut (because 4.10.22 is in) and 4.8.46 makes the cut (because 4.9.41 is in):

$ hack/stabilization-changes.py 2>/dev/null | grep eus-4.10
* channels/eus-4.10: Promote 4.8.46. It was promoted to the feeder stable by bb27a83d4e (Merge pull request #2190 from openshift-ota-bot/promote-4.8.46-to-stable, 2022-07-13) 11 days, 15:24:54.859633 ago. data://no-token-so-no-pull
* channels/eus-4.10: Promote 4.9.42. It was promoted to the feeder stable by b8295d6cb8 (Merge pull request #2216 from openshift-ota-bot/promote-4.9.42-to-stable, 2022-07-19) 5 days, 16:14:20.858847 ago. data://no-token-so-no-pull

@wking wking force-pushed the gate-older-minors-on-updates-to-cap-minor-in-candidate-4.y branch from 936de75 to 674e829 Compare July 25, 2022 07:57
@wking
Copy link
Member Author

wking commented Jul 25, 2022

I think we should be granted freedom to preserve continuity in upgrades to channel-X.Y even if it means removing some X.Y-n.z temporarily but perhaps only by raising alarms for humans to look into.

I don't like removing releases from channels, because then users who updated to that release in that channel will start seeing VersionNotFound. Instead, for update recommendations leaving 4.10 and later, we could use conditional risks tied to the configured channel. Or just accept the dead-end until we had a replacement path available. Or we could shift this sort of dancing from the graph-data promotion script into Cincinnati itself (where phased rollouts openshift/enhancements#427 are intended to help reduce VersionNotFound issues, and could be extended to cover this sort of dead-end avoidance). But as you say, we can punt this sort of thing out to future work, without having to sort it in this pull request.

@LalatenduMohanty
Copy link
Member

But as you say, we can punt this sort of thing out to future work, without having to sort it in this pull request.

+1

Copy link
Member

@LalatenduMohanty LalatenduMohanty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 3, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 3, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [LalatenduMohanty,wking]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Aug 3, 2022

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-ci openshift-ci bot merged commit 6258c11 into openshift:master Aug 3, 2022
@wking wking deleted the gate-older-minors-on-updates-to-cap-minor-in-candidate-4.y branch August 3, 2022 22:17
wking added a commit to wking/cincinnati-graph-data that referenced this pull request Aug 3, 2022
This reverts commit fd6471d.

With 674e829 (hack/stabilization-changes: Gate 4.(y-1).z into
fast-4.y, etc. on update paths to 4.y, 2022-06-15, openshift#2059) in place, we
no longer need the manual workaround.
wking added a commit to wking/cincinnati-graph-data that referenced this pull request Aug 3, 2022
This reverts commit af55cca.

With 674e829 (hack/stabilization-changes: Gate 4.(y-1).z into
fast-4.y, etc. on update paths to 4.y, 2022-06-15, openshift#2059) in place, we
no longer need the manual workaround.
wking added a commit to wking/cincinnati-graph-data that referenced this pull request Aug 22, 2022
For output like:

  $ hack/show-edges.py --revision 3511fa8 --list-unable-to-reach-target-minor-version stable-4.8 | tail -n3
  4.8.9 -> 4.8.45
  4.8.9 -> 4.8.46
  No unconditional paths from 4.7.54 to 4.8

To make it easier to understand historical exposure to the issue that
674e829 (hack/stabilization-changes: Gate 4.(y-1).z into fast-4.y,
etc. on update paths to 4.y, 2022-06-15, openshift#2059) was trying to fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
3 participants