-
Notifications
You must be signed in to change notification settings - Fork 363
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WRKLDS-593: pkg/cli/admin/mustgather: Include must-gather namespace in inspect fallback #1333
Conversation
@wking: This pull request references WRKLDS-593 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
…llback In a CI run [1], the must-gather failed [2]: [must-gather ] OUT namespace/openshift-must-gather-9dzzn created [must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-9kdhs created [must-gather ] OUT pod for plug-in image registry.build01.ci.openshift.org/ci-op-4ryqvzt5/stable@sha256:c68fcf3161719f600a803627bca6d0ebf7894bf75282937aebe908a9ef54dedb created [must-gather-lxz2x] POD 2022-10-21T10:17:43.213934752Z Gathering data for ns/openshift-cluster-version... ... [must-gather-lxz2x] POD 2022-10-21T10:18:07.577813555Z Gathering data for ns/openshift-monitoring... [must-gather-lxz2x] OUT gather logs unavailable: read tcp 10.131.188.56:50182->3.12.171.103:6443: read: connection reset by peer [must-gather-lxz2x] OUT waiting for gather to complete [must-gather-lxz2x] OUT gather never finished: pods "must-gather-lxz2x" not found [must-gather ] OUT namespace/openshift-must-gather-9dzzn deleted [must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-9kdhs deleted Gathering data for ns/openshift-config... And from the build log [3]: Running must-gather... Error running must-gather collection: gather never finished for pod must-gather-lxz2x: pods "must-gather-lxz2x" not found Falling back to `oc adm inspect clusteroperators.v1.config.openshift.io` to collect basic cluster information. This commit shifts the inspection fallback from 971dcc6 (inspect clusteroperators as a backup to must-gather if it fails, 2021-02-23, openshift#749) to come before that temporary namespace is deleted, and to include the temporary namespace, so we capture events and such to help debug why the pod failed to launch or if/when something else deleted it. I'm also adding ClusterVersion to the fallback inspection, because it's often useful to know what the cluster-version operator is up to. [1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/836/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1583384716713660416 [2]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/836/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1583384716713660416/artifacts/e2e-aws-operator/gather-must-gather/artifacts/must-gather/must-gather.log [3]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/836/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1583384716713660416/artifacts/e2e-aws-operator/gather-must-gather/build-log.txt
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: wking The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
949031b
to
288b52d
Compare
func (o *MustGatherOptions) BackupGathering(ctx context.Context, errs []error) { | ||
func (o *MustGatherOptions) BackupGathering(ctx context.Context, namespace string, errs []error) { | ||
targets := []string{ | ||
"clusterversion.v1.config.openshift.io", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also adding ClusterVersion to the fallback inspection, because it's often useful to know what the cluster-version operator is up to. But I'm happy to spin that out into a separate pull request if folks would prefer distinct pivots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to do that in separate PR.
Thanks for this patch @wking. Has your change any impact on when |
@wking: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Carrying and adjusting #1273.
In a CI run, the must-gather failed:
And from the build log:
This commit shifts the inspection fallback from 971dcc6 (#749) to come before that temporary namespace is deleted, and to include the temporary namespace, so we capture events and such to help debug why the pod failed to launch or if/when something else deleted it.