
pkg/cli/admin/upgrade: Clarify client-side vs. server-side docs #1181

Conversation

@wking (Member) commented Jun 25, 2022


The outgoing docs were frequently conflated. The incoming docs hopefully make the distinction clearer, and they list the client-side checkForUpgrade conditions. I haven't mentioned the Invalid check, because that trips so rarely, but I have used the open-ended "include checks for...", so folks aren't too surprised if they hit a client-side guard that's not listed in the docs. The "failing clusters" bit is a lie, but that's already tracked in [1], and eventually I'll be able to talk someone into reviewing the fix.

The cluster-side guards are also open-ended, although I do call out both release verification (signature checks and similar) and upgradeable conditions, since those are declared in the force godocs [2]. I do not mention additional guards like the etcd backups [3], because the version of oc requesting the update may diverge from the version of the cluster being asked to update, so we don't really know which checks that cluster will run cluster-side.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1992680
[2]: https://github.com/openshift/api/blob/22eb4f6f4385a0183a5eee4c8ca6d49eecda8120/config/v1/types_cluster_version.go#L378-L386
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1997347
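
For readers newer to this distinction: the cluster-side escape hatch referenced above is the Force field on the desired update declared in [2]; when set, the cluster-version operator skips its own guards, such as release signature verification and Upgradeable conditions. Below is a minimal sketch, not code from this PR, assuming the github.com/openshift/api/config/v1 types; the release image pullspec is illustrative.

// Sketch only: how a forced update is expressed on the ClusterVersion spec.
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
)

func main() {
	cv := &configv1.ClusterVersion{}
	cv.Spec.DesiredUpdate = &configv1.Update{
		// Illustrative pullspec, not a real digest.
		Image: "quay.io/openshift-release-dev/ocp-release@sha256:...",
		// Force asks the cluster-version operator to skip its cluster-side
		// guards, e.g. release verification and Upgradeable conditions.
		Force: true,
	}
	fmt.Printf("desired update: %+v\n", *cv.Spec.DesiredUpdate)
}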
@openshift-ci openshift-ci bot requested review from deads2k and soltysh June 25, 2022 03:34
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 25, 2022
  flags.BoolVar(&o.AllowExplicitUpgrade, "allow-explicit-upgrade", o.AllowExplicitUpgrade, "Upgrade even if the upgrade target is not listed in the available versions list.")
- flags.BoolVar(&o.AllowUpgradeWithWarnings, "allow-upgrade-with-warnings", o.AllowUpgradeWithWarnings, "Upgrade even if an upgrade is in process or a cluster error is blocking the update.")
+ flags.BoolVar(&o.AllowUpgradeWithWarnings, "allow-upgrade-with-warnings", o.AllowUpgradeWithWarnings, "Upgrade regardless of client-side guard failures, such as upgrades in progress or failing clusters.")
Contributor

I'm surprised that "failing clusters" is a client-side check (I assume "failing clusters" means the ClusterVersion status condition Failing=True) and not a server-side check. Does that mean that if someone edits the ClusterVersion directly to request an upgrade, the Failing condition will not block the upgrade?

Member

That is right. If someone directly modifies the ClusterVersion resource, they can bypass the client-side checks. Also, I think the web console does not have these checks. That's why I am not supportive of client-side checks: they are not consistent across different interfaces.

Contributor

That's why I am not supportive of client-side checks: they are not consistent across different interfaces.

+1
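
To make the bypass described above concrete: anything that writes spec.desiredUpdate on the ClusterVersion object skips oc's client-side checkForUpgrade guards entirely; only the guards the cluster-version operator applies itself still stand in the way. A hedged sketch, assuming the openshift/client-go typed config client; the helper name and the target version are illustrative, not from oc:

// Sketch: requesting an update by editing the ClusterVersion spec directly.
package main

import (
	"context"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

// requestUpdateDirectly is a hypothetical helper, not part of oc.
func requestUpdateDirectly(ctx context.Context, kubeconfig, version string) error {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return err
	}
	client, err := configclient.NewForConfig(cfg)
	if err != nil {
		return err
	}
	cv, err := client.ConfigV1().ClusterVersions().Get(ctx, "version", metav1.GetOptions{})
	if err != nil {
		return err
	}
	// No client-side guard runs on this path; only cluster-side guards apply.
	cv.Spec.DesiredUpdate = &configv1.Update{Version: version}
	_, err = client.ConfigV1().ClusterVersions().Update(ctx, cv, metav1.UpdateOptions{})
	return err
}

func main() {
	// The target version is illustrative.
	if err := requestUpdateDirectly(context.Background(), clientcmd.RecommendedHomeFile, "4.13.0"); err != nil {
		panic(err)
	}
}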

Member

(I assume "failing clusters" means the ClusterVersion status condition Failing=True)

Failing here means that cluster operators are degraded.

func checkForUpgrade(cv *configv1.ClusterVersion) error {
	results := []string{}
	if c := findClusterOperatorStatusCondition(cv.Status.Conditions, "Invalid"); c != nil && c.Status == configv1.ConditionTrue {
		results = append(results, fmt.Sprintf("the cluster version object is invalid, you must correct the invalid state first:\n\n Reason: %s\n Message: %s\n\n", c.Reason, strings.ReplaceAll(c.Message, "\n", "\n ")))
	}
	if c := findClusterOperatorStatusCondition(cv.Status.Conditions, configv1.OperatorDegraded); c != nil && c.Status == configv1.ConditionTrue {
		results = append(results, fmt.Sprintf("the cluster is experiencing an upgrade-blocking error:\n\n Reason: %s\n Message: %s\n\n", c.Reason, strings.ReplaceAll(c.Message, "\n", "\n ")))
	}
	if c := findClusterOperatorStatusCondition(cv.Status.Conditions, configv1.OperatorProgressing); c != nil && c.Status == configv1.ConditionTrue {
		results = append(results, fmt.Sprintf("the cluster is already upgrading:\n\n Reason: %s\n Message: %s\n\n", c.Reason, strings.ReplaceAll(c.Message, "\n", "\n ")))
	}
	if len(results) == 0 {
		return nil
	}
	return errors.New(strings.Join(results, ""))
}

such as upgrades in progress or failing clusters.

@wking we should change this to something like "such as upgrades in progress or situations where the upgrade is stuck because of an error condition".
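
The excerpt above calls findClusterOperatorStatusCondition without showing it. A minimal sketch of what such a lookup presumably does (the real helper in oc may differ), assuming the github.com/openshift/api/config/v1 types:

package example

import (
	configv1 "github.com/openshift/api/config/v1"
)

// findClusterOperatorStatusCondition returns the condition with the given
// type from a list of status conditions, or nil if it is not present.
// Hedged sketch, not necessarily the exact implementation in oc.
func findClusterOperatorStatusCondition(conditions []configv1.ClusterOperatorStatusCondition, name configv1.ClusterStatusConditionType) *configv1.ClusterOperatorStatusCondition {
	for i := range conditions {
		if conditions[i].Type == name {
			return &conditions[i]
		}
	}
	return nil
}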


@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 4, 2023
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 7, 2023
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Apr 6, 2023
@openshift-ci
Contributor

openshift-ci bot commented Apr 6, 2023

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close


@bparees
Contributor

bparees commented Apr 6, 2023

@wking @LalatenduMohanty did this get improved via another PR? If not, can we reopen this and land it?

@LalatenduMohanty
Member

/reopen

@LalatenduMohanty
Member

/lifecycle frozen

@openshift-ci
Contributor

openshift-ci bot commented Apr 6, 2023

@LalatenduMohanty: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen


@openshift-ci openshift-ci bot reopened this Apr 6, 2023
@openshift-ci
Contributor

openshift-ci bot commented Apr 6, 2023

@LalatenduMohanty: Reopened this PR.

In response to this:

/reopen


@LalatenduMohanty (Member) left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 6, 2023
@openshift-ci
Contributor

openshift-ci bot commented Apr 6, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: LalatenduMohanty, wking


@LalatenduMohanty
Member

/remove-lifecycle stale

@petr-muller
Member

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 6, 2023
@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 8228cb2 and 2 for PR HEAD 10bd469 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 539d06f and 1 for PR HEAD 10bd469 in total

@openshift-ci-robot

/retest-required

Remaining retests: 0 against base HEAD 8964f36 and 0 for PR HEAD 10bd469 in total

@openshift-ci
Contributor

openshift-ci bot commented Apr 10, 2023

@wking: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name:     ci/prow/e2e-aws
Commit:        10bd469
Required:      true
Rerun command: /test e2e-aws


@openshift-ci-robot

/hold

Revision 10bd469 was retested 3 times: holding

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 10, 2023
@LalatenduMohanty
Member

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 15, 2023
@LalatenduMohanty
Member

/retest

@wking
Member Author

wking commented Apr 15, 2023

/test e2e-aws

@openshift-ci
Contributor

openshift-ci bot commented Apr 15, 2023

@wking: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test build-rpms-from-tar
  • /test e2e-agnostic-ovn-cmd
  • /test e2e-aws-ovn
  • /test e2e-aws-ovn-builds
  • /test e2e-aws-ovn-serial
  • /test e2e-aws-ovn-upgrade
  • /test images
  • /test rpm-build
  • /test unit
  • /test verify
  • /test verify-deps

The following commands are available to trigger optional jobs:

  • /test e2e-metal-ipi-ovn-ipv6

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-oc-master-build-rpms-from-tar
  • pull-ci-openshift-oc-master-e2e-agnostic-ovn-cmd
  • pull-ci-openshift-oc-master-e2e-aws-ovn
  • pull-ci-openshift-oc-master-e2e-aws-ovn-serial
  • pull-ci-openshift-oc-master-e2e-aws-ovn-upgrade
  • pull-ci-openshift-oc-master-e2e-metal-ipi-ovn-ipv6
  • pull-ci-openshift-oc-master-images
  • pull-ci-openshift-oc-master-rpm-build
  • pull-ci-openshift-oc-master-unit
  • pull-ci-openshift-oc-master-verify
  • pull-ci-openshift-oc-master-verify-deps

In response to this:

/test e2e-aws


@openshift-merge-robot openshift-merge-robot merged commit 5f8c36d into openshift:master Apr 17, 2023
13 checks passed
@wking wking deleted the doc-distinguish-upgrade-client-vs-server-guards branch April 17, 2023 22:20