pkg/cli/admin/upgrade: Clarify client-side vs. server-side docs #1181
Conversation
The outgoing docs were frequently conflated. The incoming docs hopefully make the distinction clearer, and they list the client-side checkForUpgrade conditions. I haven't mentioned the Invalid check, because that trips so rarely, but I have used the open-ended "include checks for...", so folks aren't too surprised if they hit a client-side guard that's not listed in the docs. The "failing clusters" bit is a lie, but that's already tracked in [1], and eventually I'll be able to talk someone into reviewing the fix.

The cluster-side guards are also open-ended, although I do call out both release verification (signature checks and similar) and upgradeable conditions, since those are declared in the force godocs [2]. I do not mention additional guards like the etcd backups [3], because the version of oc requesting the update may diverge from the version of the cluster being asked to update, so we don't really know which checks that cluster will run cluster-side.

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1992680
[2]: https://github.com/openshift/api/blob/22eb4f6f4385a0183a5eee4c8ca6d49eecda8120/config/v1/types_cluster_version.go#L378-L386
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1997347
flags.BoolVar(&o.AllowExplicitUpgrade, "allow-explicit-upgrade", o.AllowExplicitUpgrade, "Upgrade even if the upgrade target is not listed in the available versions list.")
flags.BoolVar(&o.AllowUpgradeWithWarnings, "allow-upgrade-with-warnings", o.AllowUpgradeWithWarnings, "Upgrade even if an upgrade is in process or a cluster error is blocking the update.")
flags.BoolVar(&o.AllowUpgradeWithWarnings, "allow-upgrade-with-warnings", o.AllowUpgradeWithWarnings, "Upgrade regardless of client-side guard failures, such as upgrades in progress or failing clusters.")
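To make the flag wiring concrete, here is a minimal, self-contained sketch of how a bool flag like `--allow-upgrade-with-warnings` gets registered and parsed. The real command uses cobra/pflag, and `upgradeOptions` and `parseUpgradeFlags` here are hypothetical stand-ins, but stdlib `flag.BoolVar` has the same shape as the `pflag` call above.

```go
package main

import (
	"flag"
	"fmt"
)

// upgradeOptions loosely mirrors the options struct the PR's flags hang off
// of; only the two fields visible in the snippet above are reproduced.
type upgradeOptions struct {
	AllowExplicitUpgrade     bool
	AllowUpgradeWithWarnings bool
}

// parseUpgradeFlags registers the two guard-bypass flags, using the current
// field values as defaults (the same pattern as the snippet above), and
// parses the given arguments.
func parseUpgradeFlags(args []string) *upgradeOptions {
	o := &upgradeOptions{}
	fs := flag.NewFlagSet("upgrade", flag.PanicOnError)
	fs.BoolVar(&o.AllowExplicitUpgrade, "allow-explicit-upgrade", o.AllowExplicitUpgrade,
		"Upgrade even if the upgrade target is not listed in the available versions list.")
	fs.BoolVar(&o.AllowUpgradeWithWarnings, "allow-upgrade-with-warnings", o.AllowUpgradeWithWarnings,
		"Upgrade regardless of client-side guard failures, such as upgrades in progress.")
	fs.Parse(args)
	return o
}

func main() {
	o := parseUpgradeFlags([]string{"--allow-upgrade-with-warnings"})
	fmt.Println(o.AllowUpgradeWithWarnings)
}
```

Passing the current field value as the default means callers can pre-seed the options struct before flag registration, which is why the snippet reads `o.AllowUpgradeWithWarnings` in the default position.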
I'm surprised that "failing clusters" is a client-side check (I assume "failing clusters" means ClusterVersion status.conditions Failing=True) and not a server-side check. Does that mean that if someone edits the ClusterVersion directly to request an upgrade, the Failing condition will not block the upgrade?
That is right. If someone modifies the ClusterVersion resource directly, they can bypass the client-side checks. Also, I think the web console does not have these checks. That's why I am not supportive of client-side checks: they are not consistent across different interfaces.
That's why I am not supportive of client side checks as it is not consistent across different interfaces.
+1
(I assume "failing clusters" means ClusterVersion status.condition failing=true)
Failing here means that operators are degraded:
oc/pkg/cli/admin/upgrade/upgrade.go
Lines 560 to 577 in 1eaad48
func checkForUpgrade(cv *configv1.ClusterVersion) error {
	results := []string{}
	if c := findClusterOperatorStatusCondition(cv.Status.Conditions, "Invalid"); c != nil && c.Status == configv1.ConditionTrue {
		results = append(results, fmt.Sprintf("the cluster version object is invalid, you must correct the invalid state first:\n\n  Reason: %s\n  Message: %s\n\n", c.Reason, strings.ReplaceAll(c.Message, "\n", "\n  ")))
	}
	if c := findClusterOperatorStatusCondition(cv.Status.Conditions, configv1.OperatorDegraded); c != nil && c.Status == configv1.ConditionTrue {
		results = append(results, fmt.Sprintf("the cluster is experiencing an upgrade-blocking error:\n\n  Reason: %s\n  Message: %s\n\n", c.Reason, strings.ReplaceAll(c.Message, "\n", "\n  ")))
	}
	if c := findClusterOperatorStatusCondition(cv.Status.Conditions, configv1.OperatorProgressing); c != nil && c.Status == configv1.ConditionTrue {
		results = append(results, fmt.Sprintf("the cluster is already upgrading:\n\n  Reason: %s\n  Message: %s\n\n", c.Reason, strings.ReplaceAll(c.Message, "\n", "\n  ")))
	}
	if len(results) == 0 {
		return nil
	}
	return errors.New(strings.Join(results, ""))
}
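For illustration, here is a self-contained sketch of the same guard logic. The `condition` and `findCondition` stand-ins are hypothetical substitutes for `configv1.ClusterVersion` and `findClusterOperatorStatusCondition`, so the behavior can be run without the openshift/api dependency: the function refuses to proceed when the ClusterVersion is Invalid, Degraded (the "failing" case discussed above), or already Progressing.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// condition is a stand-in for configv1.ClusterOperatorStatusCondition.
type condition struct {
	Type, Status, Reason, Message string
}

// findCondition is a stand-in for findClusterOperatorStatusCondition.
func findCondition(conds []condition, condType string) *condition {
	for i := range conds {
		if conds[i].Type == condType {
			return &conds[i]
		}
	}
	return nil
}

// checkForUpgrade, re-sketched against the stand-ins. Each guard appends a
// message; all failing guards are reported together in a single error.
func checkForUpgrade(conds []condition) error {
	var results []string
	for _, guard := range []struct{ condType, text string }{
		{"Invalid", "the cluster version object is invalid"},
		{"Degraded", "the cluster is experiencing an upgrade-blocking error"},
		{"Progressing", "the cluster is already upgrading"},
	} {
		if c := findCondition(conds, guard.condType); c != nil && c.Status == "True" {
			results = append(results, fmt.Sprintf("%s: %s: %s", guard.text, c.Reason, c.Message))
		}
	}
	if len(results) == 0 {
		return nil
	}
	return errors.New(strings.Join(results, "\n"))
}

func main() {
	conds := []condition{{Type: "Progressing", Status: "True", Reason: "WorkingTowards", Message: "Working towards 4.9.0"}}
	fmt.Println(checkForUpgrade(conds))
}
```

Note that collecting all guard failures before returning, rather than bailing on the first one, is what lets `--allow-upgrade-with-warnings` report every client-side objection at once.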
such as upgrades in progress or failing clusters.
@wking we should change this to "such as upgrades in progress or some situation where the upgrade is stuck because of an error condition".
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle stale. If this issue is safe to close now please do so with /close. /lifecycle stale
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting /remove-lifecycle rotten. If this issue is safe to close now please do so with /close. /lifecycle rotten
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /reopen. /close
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@wking @LalatenduMohanty did this get improved via another PR? If not, can we reopen this and land it?
/reopen |
/lifecycle frozen |
@LalatenduMohanty: The In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@LalatenduMohanty: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: LalatenduMohanty, wking. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/remove-lifecycle stale
/remove-lifecycle rotten
@wking: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/hold Revision 10bd469 was retested 3 times: holding
/hold cancel
/retest
/test e2e-aws
@wking: The specified target(s) for /test were not found.
The following commands are available to trigger optional jobs:
Use /test all to run all jobs.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.