New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
status: Hide generic operator status in favor of more specific errors #192
status: Hide generic operator status in favor of more specific errors #192
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hrm, this doesn't seem to be working like I expected |
Oh wait, nm, that was another PR. |
When upgrading and a sync loop exits, we often see: * Operator A not upgraded yet * Operator B not upgraded yet * Operator C not upgraded yet * Some more specific error * Operator D not upgraded yet Since the not available messages carry little value and are expected when we are processing parallel output, if we have one or more errors during sync that are not the generic error, only report those. This focuses the user's attention on the actual problem, rather than distracting them with noise. In the ideal case, the user immediately sees "router is totally broken" vs "a, b, c, d, e, router is broken, f".
a46aaea
to
bdd4545
Compare
if filtered := filterErrors(errs, isClusterOperatorNotAvailable); len(filtered) > 0 { | ||
return newMultipleError(filtered) | ||
} | ||
// if we're only waiting for operators, condense the error down to a singleton | ||
if err := newClusterOperatorsNotAvailable(errs); err != nil { | ||
return err | ||
} | ||
return newMultipleError(errs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the only newMultipleErrors
consumer. It feels strange to have some preprocessing locally; can we add this new logic inside newMultipleErrors
(and drop the local collapse, since newMultipleErrors
already does that internally)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s three different flows - I expect there to be more in the future. MultipleErrors is for when we don’t know anything
Reapplying label (here's where we are after fixing a couple of the ugly):
|
/retest Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
/retest Please review the full test history for this PR and help us cut down flakes. |
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error.
newClusterOperatorsNotAvailable is from c2ac20f (status: Report the operators that have not yet deployed, 2019-04-09, openshift#158). And the not-available filtering is from bdd4545 (status: Hide generic operator status in favor of more specific errors, 2019-05-19, openshift#192). But in ce1eda1 (pkg/cvo/internal/operatorstatus: Change nested message, 2021-02-04, openshift#514), we moved "waiting on status.versions" from ClusterOperatorNotAvailable to ClusterOperatorUpdating. And we want to avoid uncollapsed errors like: Multiple errors are preventing progress: * Cluster operator machine-api is updating versions * Cluster operator openshift-apiserver is updating versions where we are waiting on multiple ClusterOperator which are in similar situations. This commit drops the filtering, because cluster operators are important. It does sort those errors to the end of the list though, so the first error is the non-ClusterOperator error. TestCVO_ParallelError no longer tests the consolidated error message, because the consolidation is now restricted to ClusterOperator resources. I tried moving the pkg/cvo/testdata/paralleltest/release-manifests manifests to ClusterOperator, but then the test struggled with: I0802 16:04:18.133935 2005 sync_worker.go:945] Unable to precreate resource clusteroperator so now TestCVO_ParallelError is excercising the fact that non-ClusterOperator failures are not aggregated.
When upgrading and a sync loop exits, we often see:
Since the not available messages carry little value and are expected
when we are processing parallel output, if we have one or more errors
during sync that are not the generic error, only report those. This
focuses the user's attention on the actual problem, rather than
distracting them with noise. In the ideal case, the user immediately
sees "router is totally broken" vs "a, b, c, d, e, router is broken, f".
E.g. https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/1724/
The "still updating" errors are just noise.
/cherrypick release-4.1