Bug 1797624: loosen upgradeable condition to allow z-level upgrades #291

Merged
merged 1 commit into from Feb 6, 2020

Conversation

deads2k
Contributor

@deads2k deads2k commented Jan 2, 2020

If current and desired have the same 4.y, then the upgrade is allowed even
if upgradeable==false. If the 4.y is different, then upgrades are not allowed.
This permits CVE updates and fixes the current service-catalog upgrade problem.

Alternative to #285
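
The rule described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `majorMinor` and `sameMinorStream` are hypothetical names, and the choice to compare the `x.y` prefix (rather than only `y`) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// majorMinor is a hypothetical helper that extracts the "x.y" prefix from an
// "x.y.z" version string. It reports false when the string does not have at
// least three dot-separated parts or when x or y is empty.
func majorMinor(version string) (string, bool) {
	parts := strings.Split(version, ".")
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" {
		return "", false
	}
	return parts[0] + "." + parts[1], true
}

// sameMinorStream reports whether current and desired share the same x.y,
// which is the case where an update would be allowed even if
// Upgradeable=False. Unparseable versions are treated as "not the same".
func sameMinorStream(current, desired string) bool {
	cur, ok := majorMinor(current)
	if !ok {
		return false
	}
	des, ok := majorMinor(desired)
	if !ok {
		return false
	}
	return cur == des
}

func main() {
	fmt.Println(sameMinorStream("4.2.13", "4.2.14")) // z-level update
	fmt.Println(sameMinorStream("4.2.13", "4.3.0"))  // minor-level update
}
```

In this sketch an empty or malformed version never counts as a z-level update, so such cases fall back to the stricter behavior.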

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 2, 2020
Contributor

@smarterclayton smarterclayton left a comment

This seems ok

@@ -31,7 +31,7 @@ func (e *Error) Cause() error {
// Precondition defines the precondition check for a payload.
type Precondition interface {
// Run executes the precondition checks ands returns an error when the precondition fails.
-	Run(context.Context) error
+	Run(ctx context.Context, desiredVersion string) error
Contributor

Probably better to pass a concrete struct here like ReleaseContext struct { DesiredVersion types.Version }
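
A rough sketch of what that suggestion might look like. The `ReleaseContext` name comes from the comment above; everything else is an assumption for illustration: the field is a plain `string` rather than `types.Version`, and `requireVersion`, `runAll`, and `check` are hypothetical helpers.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ReleaseContext carries information about the release being evaluated.
// A plain string is used here to keep the sketch self-contained; the
// review comment suggests a types.Version field instead.
type ReleaseContext struct {
	DesiredVersion string
}

// Precondition is a simplified version of the interface under review,
// taking a struct instead of a bare desiredVersion string.
type Precondition interface {
	Run(ctx context.Context, release ReleaseContext) error
}

// requireVersion is a hypothetical precondition that fails when no
// desired version is set.
type requireVersion struct{}

func (requireVersion) Run(ctx context.Context, release ReleaseContext) error {
	if release.DesiredVersion == "" {
		return errors.New("desired version is not set")
	}
	return nil
}

// runAll runs each precondition in order and returns the first error.
func runAll(ctx context.Context, release ReleaseContext, checks ...Precondition) error {
	for _, c := range checks {
		if err := c.Run(ctx, release); err != nil {
			return err
		}
	}
	return nil
}

// check is a small convenience wrapper used for the demonstration below.
func check(desired string) error {
	return runAll(context.Background(), ReleaseContext{DesiredVersion: desired}, requireVersion{})
}

func main() {
	fmt.Println(check("4.2.14"))
	fmt.Println(check(""))
}
```

Passing a struct means further fields can be added later without changing the `Run` signature of every precondition again.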

}

// if we are upgradeable==true we can always upgrade
up := resourcemerge.FindOperatorStatusCondition(cv.Status.Conditions, configv1.OperatorUpgradeable)
Contributor

Update the test cases

Contributor Author

Cute. This was merged without tests initially :)

@shawn-hurley

This might be slightly off-topic for this PR, but are there alerts that are fired for this condition and do we need to change the text for those alerts?

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 16, 2020
@deads2k
Contributor Author

deads2k commented Jan 16, 2020

updated for comments, tests added.

@deads2k
Contributor Author

deads2k commented Jan 16, 2020

terraform

/retest

@wking
Member

wking commented Jan 16, 2020

One issue with this is that we currently block things like 4.2.13 -> 4.2.14 when the registry is set to Unmanaged:

{
  "conditions": [
    {
      "type": "Available",
      "status": "True",
      "lastTransitionTime": "2020-01-07T08:19:28Z",
      "reason": "Unmanaged",
      "message": "The registry configuration is set to unmanaged mode"
    },
    {
      "type": "Progressing",
      "status": "False",
      "lastTransitionTime": "2020-01-07T08:19:28Z",
      "reason": "Unmanaged",
      "message": "The registry configuration is set to unmanaged mode"
    },
    {
      "type": "Degraded",
      "status": "False",
      "lastTransitionTime": "2019-08-13T08:54:27Z",
      "reason": "Unmanaged",
      "message": "The registry configuration is set to unmanaged mode"
    }
  ],
  "versions": [
    {
      "name": "operator",
      "version": "4.2.13"
    }
  ],
  ...
}

Setting Upgradeable=False would allow the registry operator to say "I'm Unmanaged so I'm not going to block any upgrade attempts" and head off updates before they get started. Currently that 4.2.13->4.2.14 update is hung mid-update with "Cluster operator image-registry is still updating". Do we have a recommended approach for this issue if we change the Upgradeable semantics? Should the registry operator be going along with updates and bumping version for z-stream bumps in those cases?

@deads2k
Contributor Author

deads2k commented Jan 16, 2020

Setting Upgradeable=False would allow the registry operator to say "I'm Unmanaged so I'm not going to block any upgrade attempts" and head off updates before they get started. Currently that 4.2.13->4.2.14 update is hung mid-update with "Cluster operator image-registry is still updating". Do we have a recommended approach for this issue if we change the Upgradeable semantics? Should the registry operator be going along with updates and bumping version for z-stream bumps in those cases?

Having an unmanaged component that blocks upgrades of any kind would really surprise me since it seems like they would always want to encourage updates that would allow the component to go back managed. z-level upgrades should basically always work or you will be blocking CVE updates.

@wking
Member

wking commented Jan 16, 2020

... are there alerts that are fired for this condition ...

Not yet, but we have an internal ticket to add some.

@wking
Member

wking commented Jan 16, 2020

Having an unmanaged component that blocks upgrades...

Sounds like rhbz#1791934 will be resolved by having the operator bump its version to not block updates, so it is no longer a concern vs. this PR.

@deads2k
Contributor Author

deads2k commented Jan 17, 2020

terraform again

/retest

@sdodson
Member

sdodson commented Jan 20, 2020

/approve
This is not a code review, but the general concept was discussed and agreed upon during the arch call. Marking as such.

return nil
}

currentMinor := getEffectiveMinor(cv.Status.History[0].Version)
Member

Instead of pulling this out of the cluster's ClusterVersion history, I'd rather pull it from the local state. But then you'd have to pipe it through to this precondition code. Including it in ReleaseContext would be fine, although we'd want to rename to something more generic. Maybe just Context (so precondition.Context in external packages)? To get from the Operator into the SyncWorker... I dunno, seems like we could feed the context into NewSyncWorkerWithPreconditions here? If that sounds plausible, I can work up a fixup commit that you could squash in here.

Contributor Author

It's not the way I would do it since we're already getting the information here (pre-existing gets) and you're already getting cached data, so it seems unnecessary and odd. Could you do it as a follow-up instead if you really want it after?


// if there is no difference in the minor version (4.y.z where 4.y is the same for current and desired), then we can still upgrade
if currentMinor == desiredMinor {
return nil
Member

We may want to build the *precondition.Error above and log the fact that we're ignoring it due to the minor update here. Or maybe push out an Event, since once we actually start updating, we're going to replace the CVO Pod and lose the old CVO's logs? Leaving some breadcrumbs around this to make it less surprising would be nice, but is not something we need to block merging if it would be too difficult to implement.
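
One way the "breadcrumb" idea might look, as a sketch only. `preconditionError` is a simplified stand-in for `*precondition.Error`, the `NotUpgradeable` reason string and the `logf` hook are assumptions, and emitting an Event instead of a log line is not shown.

```go
package main

import (
	"fmt"
	"log"
)

// preconditionError is a simplified stand-in for the *precondition.Error
// type mentioned above; its fields are assumptions for this sketch.
type preconditionError struct {
	Reason  string
	Message string
}

func (e *preconditionError) Error() string { return e.Reason + ": " + e.Message }

// checkUpgradeable builds the error up front and then either returns it
// (minor-level update) or logs that it is being ignored (z-level update).
// logf is a hypothetical logging hook so callers can decide where the
// breadcrumb goes.
func checkUpgradeable(upgradeable bool, currentMinor, desiredMinor, message string,
	logf func(format string, args ...interface{})) error {
	if upgradeable {
		return nil
	}
	err := &preconditionError{Reason: "NotUpgradeable", Message: message}
	if currentMinor != "" && currentMinor == desiredMinor {
		logf("ignoring precondition failure for a z-level update within %s: %v", currentMinor, err)
		return nil
	}
	return err
}

func main() {
	err := checkUpgradeable(false, "4.2", "4.2", "operator set Upgradeable=False", log.Printf)
	fmt.Println(err)
}
```

As noted in the comment, a log line written by the old CVO Pod may be lost once the update replaces it, so this only partially addresses the concern.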

Contributor Author

that seems a little weird to do, since people are unlikely to wonder why their desired update was allowed and succeeded.

Member

that seems a little weird to do, since people are unlikely to wonder why their desired update was allowed and succeeded.

I can see component authors wondering why an update was initiated despite them having set Upgradeable=False. But whatever, obviously not critical.

Contributor

Was discussing bug https://bugzilla.redhat.com/show_bug.cgi?id=1822513 with @wking and he suggested I update here as a follow-up to this thread.
The bug's underlying cause is that currentMinor is always being pulled from cv.Status.History[0].Version (currentMinor := getEffectiveMinor(cv.Status.History[0].Version)), which contains the version being upgraded to and not the current version. In the bug's specific case, when --to-image is used, cv.Status.History[0].Version = "" which then fails the check for a z-level upgrade. Instead we should iterate the version history to find and use the first version with State == configv1.CompletedUpdate, which will yield the current version, and pull currentMinor from it.
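
The proposed lookup could be sketched roughly like this. The types below are simplified local stand-ins for the `configv1` history types rather than the real API, `currentVersion` is a hypothetical helper name, and the history is assumed to be ordered newest first.

```go
package main

import "fmt"

// UpdateState and UpdateHistory are simplified stand-ins for the configv1
// types referenced above; only the fields needed for the sketch are kept.
type UpdateState string

const (
	CompletedUpdate UpdateState = "Completed"
	PartialUpdate   UpdateState = "Partial"
)

type UpdateHistory struct {
	State   UpdateState
	Version string
}

// currentVersion returns the version of the first history entry whose
// state is CompletedUpdate. With newest-first ordering, that is the
// version the cluster is actually running, even while a newer update
// (possibly with an empty Version) sits at index 0.
func currentVersion(history []UpdateHistory) (string, bool) {
	for _, entry := range history {
		if entry.State == CompletedUpdate {
			return entry.Version, true
		}
	}
	return "", false
}

func main() {
	history := []UpdateHistory{
		{State: PartialUpdate, Version: ""}, // in-progress update started with --to-image
		{State: CompletedUpdate, Version: "4.2.13"},
		{State: CompletedUpdate, Version: "4.2.12"},
	}
	version, ok := currentVersion(history)
	fmt.Println(version, ok)
}
```

The boolean return lets the caller decide what to do when no completed entry exists, for example on a cluster that is still installing.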

Contributor

@deads2k thoughts on the above?

@deads2k
Contributor Author

deads2k commented Jan 29, 2020

/retest

1 similar comment
@deads2k
Contributor Author

deads2k commented Jan 30, 2020

/retest

@deads2k deads2k changed the title loosen upgradeable condition to allow z-level upgrades bug 1797624: loosen upgradeable condition to allow z-level upgrades Feb 3, 2020
@openshift-ci-robot
Contributor

@deads2k: This pull request references Bugzilla bug 1797624, which is invalid:

  • expected the bug to target the "4.4.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

bug 1797624: loosen upgradeable condition to allow z-level upgrades

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Feb 3, 2020
@deads2k
Contributor Author

deads2k commented Feb 3, 2020

/cherrypick release-4.3

@openshift-cherrypick-robot

@deads2k: once the present PR merges, I will cherry-pick it on top of release-4.3 in a new PR and assign it to you.

In response to this:

/cherrypick release-4.3


@openshift-ci-robot openshift-ci-robot removed the bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. label Feb 3, 2020
@openshift-ci-robot
Contributor

@deads2k: This pull request references Bugzilla bug 1797624, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

/bugzilla refresh


@wking wking changed the title bug 1797624: loosen upgradeable condition to allow z-level upgrades Bug 1797624: loosen upgradeable condition to allow z-level upgrades Feb 6, 2020
@wking
Member

wking commented Feb 6, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 6, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, sdodson, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2020
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit 69df6ba into openshift:master Feb 6, 2020
@openshift-ci-robot
Copy link
Contributor

@deads2k: All pull requests linked via external trackers have merged. Bugzilla bug 1797624 has been moved to the MODIFIED state.

In response to this:

Bug 1797624: loosen upgradeable condition to allow z-level upgrades


@openshift-cherrypick-robot

@deads2k: new pull request created: #315

In response to this:

/cherrypick release-4.3


wking added a commit to wking/cluster-version-operator that referenced this pull request Apr 22, 2020
Catching the message strings up with the softening from bcd58d8
(loosen upgradeable condition to allow z-level upgrades, 2020-01-02, openshift#291),
addressing [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1823306
wking added a commit to wking/cluster-version-operator that referenced this pull request May 6, 2020
Catching the message strings up with the softening from bcd58d8
(loosen upgradeable condition to allow z-level upgrades, 2020-01-02, openshift#291),
addressing [1].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1823306
wking added a commit to wking/cluster-version-operator that referenced this pull request Dec 8, 2021
…gradeable_test

To match our preferred spelling.  The test file has had the outgoing
spelling since it landed in bcd58d8 (loosen upgradeable condition
to allow z-level upgrades, 2020-01-02, openshift#291).
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
10 participants