Error reporting in a collection script causes "oc adm must-gather" to fail #85

Closed
djfjeff opened this issue Apr 26, 2019 · 13 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


djfjeff commented Apr 26, 2019

I created a custom image with a new collection script called gather-cnv (https://github.com/djfjeff/must-gather/blob/gather-cnv-support/collection-scripts/gather-cnv).

For now, the new collection script only runs: openshift-must-gather inspect ns/kubevirt
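
Roughly, the script looks like this (simplified sketch; see the linked repo for the exact contents):

#!/bin/bash
# Simplified reconstruction of gather-cnv: it only wraps the inspect call.
# Because this is the last command in the script, the inspect command's
# non-zero exit status becomes the exit status of the whole script.
openshift-must-gather inspect ns/kubevirt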

Manually running the inspect command above fetches the information but still produces some error output:

openshift-must-gather inspect ns/kubevirt

2019/04/26 14:02:40 Gathering data for ns/kubevirt...
2019/04/26 14:02:40     Collecting resources for namespace "kubevirt"...
2019/04/26 14:02:40     Gathering pod data for namespace "kubevirt"...
2019/04/26 14:02:40         Gathering data for pod "virt-api-f8d6cd97-5l68b"
2019/04/26 14:02:41         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-5l68b" not found
2019/04/26 14:02:42         Skipping /version info gathering for pod "virt-api-f8d6cd97-5l68b". Endpoint not found...
2019/04/26 14:02:42         Gathering data for pod "virt-api-f8d6cd97-6h4kx"
2019/04/26 14:02:43         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-6h4kx" not found
2019/04/26 14:02:43         Skipping /version info gathering for pod "virt-api-f8d6cd97-6h4kx". Endpoint not found...
2019/04/26 14:02:43         Gathering data for pod "virt-controller-865b95f6c6-gtplf"
2019/04/26 14:02:43         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-gtplf" not found
2019/04/26 14:02:44         Gathering data for pod "virt-controller-865b95f6c6-vl6cc"
2019/04/26 14:02:44         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-vl6cc" not found
2019/04/26 14:02:44         Gathering data for pod "virt-handler-5zd2z"
2019/04/26 14:02:44         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-5zd2z" not found
2019/04/26 14:02:44         Gathering data for pod "virt-handler-9q9cp"
2019/04/26 14:02:44         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-9q9cp" not found
2019/04/26 14:02:45         Gathering data for pod "virt-handler-m9q8r"
2019/04/26 14:02:45         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-m9q8r" not found
2019/04/26 14:02:45         Gathering data for pod "virt-handler-t6kz2"
2019/04/26 14:02:45         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-t6kz2" not found
2019/04/26 14:02:45         Gathering data for pod "virt-operator-76b568d986-6x9rc"
2019/04/26 14:02:45         Unable to gather previous container logs: previous terminated container "virt-operator" in pod "virt-operator-76b568d986-6x9rc" not found
Error: one or more errors ocurred while gathering pod-specific data for namespace: kubevirt

    [one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-5l68b:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-5l68b", one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-6h4kx:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-6h4kx", one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-gtplf:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-vl6cc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-5zd2z:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-9q9cp:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-m9q8r:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-t6kz2:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-operator-76b568d986-6x9rc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource]]

The issue I have is that when I run it with oc adm must-gather, the error reporting seems to break the gather init container and cause the whole process to fail:

oc adm must-gather --image=quay.io/jsaucier/must-gather-cnv:latest -- gather-cnv

namespace/openshift-must-gather-h9m84 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-q8xwf created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-q8xwf deleted
namespace/openshift-must-gather-h9m84 deleted
error: pod is not running: Failed

To confirm, I tried changing the namespace gathered in gather-cnv from ns/kubevirt to ns/default (which does not produce any error output), and everything worked fine.


djfjeff commented Apr 29, 2019

To add more information:

oc adm must-gather --keep --image=quay.io/jsaucier/must-gather-cnv:latest -- gather-cnv                                                                                       

namespace/openshift-must-gather-cr82g created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-qf46z created
error: pod is not running: Failed
oc project openshift-must-gather-cr82g
oc logs -f must-gather-jrpll

Error from server (BadRequest): container "copy" in pod "must-gather-jrpll" is waiting to start: PodInitializing
oc logs -f must-gather-jrpll -c gather

2019/04/29 10:56:27 Gathering data for ns/kubevirt...
2019/04/29 10:56:27     Collecting resources for namespace "kubevirt"...
2019/04/29 10:56:27     Gathering pod data for namespace "kubevirt"...
2019/04/29 10:56:27         Gathering data for pod "virt-api-f8d6cd97-5l68b"
2019/04/29 10:56:28         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-5l68b" not found
2019/04/29 10:56:28         Skipping /version info gathering for pod "virt-api-f8d6cd97-5l68b". Endpoint not found...
2019/04/29 10:56:28         Gathering data for pod "virt-api-f8d6cd97-6h4kx"
2019/04/29 10:56:29         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-6h4kx" not found
2019/04/29 10:56:29         Skipping /version info gathering for pod "virt-api-f8d6cd97-6h4kx". Endpoint not found...
2019/04/29 10:56:29         Gathering data for pod "virt-controller-865b95f6c6-gtplf"
2019/04/29 10:56:30         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-gtplf" not found
2019/04/29 10:56:30         Gathering data for pod "virt-controller-865b95f6c6-vl6cc"
2019/04/29 10:56:30         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-vl6cc" not found
2019/04/29 10:56:30         Gathering data for pod "virt-handler-5zd2z"
2019/04/29 10:56:30         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-5zd2z" not found
2019/04/29 10:56:30         Gathering data for pod "virt-handler-9q9cp"
2019/04/29 10:56:31         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-9q9cp" not found
2019/04/29 10:56:31         Gathering data for pod "virt-handler-m9q8r"
2019/04/29 10:56:31         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-m9q8r" not found
2019/04/29 10:56:31         Gathering data for pod "virt-handler-t6kz2"
2019/04/29 10:56:31         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-t6kz2" not found
2019/04/29 10:56:31         Gathering data for pod "virt-operator-76b568d986-6x9rc"
2019/04/29 10:56:32         Unable to gather previous container logs: previous terminated container "virt-operator" in pod "virt-operator-76b568d986-6x9rc" not found
Error: one or more errors ocurred while gathering pod-specific data for namespace: kubevirt

    [one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-5l68b:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-5l68b", one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-6h4kx:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-6h4kx", one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-gtplf:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-vl6cc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-5zd2z:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-9q9cp:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-m9q8r:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-t6kz2:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-operator-76b568d986-6x9rc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource]]
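
For completeness, since --keep leaves the failed pod around, the gather init container's exit status can also be checked directly (a debugging sketch, not taken from the output above; the jsonpath query is just one way to do it):

oc get pod must-gather-jrpll -n openshift-must-gather-cr82g \
  -o jsonpath='{.status.initContainerStatuses[?(@.name=="gather")].state.terminated.exitCode}'

# A non-zero value here confirms that it is the gather script's exit status,
# not the copy step, that marks the pod as Failed.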

@sferich888 (Contributor)

Do you get the same thing with:

$ oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt


djfjeff commented Apr 29, 2019

Same result:

oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt

namespace/openshift-must-gather-68r65 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-vfc6l created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-vfc6l deleted
namespace/openshift-must-gather-68r65 deleted
error: pod is not running: Failed


djfjeff commented Apr 29, 2019

And just to show that the command works under normal circumstances:

oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/default

namespace/openshift-must-gather-8qwh4 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-8c9sp created
receiving incremental file list
created directory must-gather.local.1940580096616587914
./
namespaces/
namespaces/default/
namespaces/default/default.yaml
namespaces/default/apps.openshift.io/
namespaces/default/apps.openshift.io/deploymentconfigs.yaml
namespaces/default/apps/
namespaces/default/apps/daemonsets.yaml
namespaces/default/apps/deployments.yaml
namespaces/default/apps/replicasets.yaml
namespaces/default/apps/statefulsets.yaml
namespaces/default/autoscaling/
namespaces/default/autoscaling/horizontalpodautoscalers.yaml
namespaces/default/batch/
namespaces/default/batch/cronjobs.yaml
namespaces/default/batch/jobs.yaml
namespaces/default/build.openshift.io/
namespaces/default/build.openshift.io/buildconfigs.yaml
namespaces/default/build.openshift.io/builds.yaml
namespaces/default/core/
namespaces/default/core/configmaps.yaml
namespaces/default/core/events.yaml
namespaces/default/core/pods.yaml
namespaces/default/core/replicationcontrollers.yaml
namespaces/default/core/secrets.yaml
namespaces/default/core/services.yaml
namespaces/default/image.openshift.io/
namespaces/default/image.openshift.io/imagestreams.yaml
namespaces/default/route.openshift.io/
namespaces/default/route.openshift.io/routes.yaml

sent 436 bytes  received 94,291 bytes  63,151.33 bytes/sec
total size is 92,489  speedup is 0.98
clusterrolebinding.rbac.authorization.k8s.io/must-gather-8c9sp deleted
namespace/openshift-must-gather-8qwh4 deleted

@sferich888 (Contributor)

Unfortunately, the pod's logs don't tell you much. This is what I get:

Error from server (BadRequest): container "copy" in pod "must-gather-x7vgf" is waiting to start: PodInitializing
...
Error: namespaces "kubevirt" not found
...
Error from server (NotFound): pods "must-gather-x7vgf" not found

I got this by running:

$ oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt

In a different terminal, run the following:

$ MG_NAMESPACE="FILL_ME_IN_FROM_OUTPUT_ABOVE"; while true; do oc logs $(oc get pods -n $MG_NAMESPACE -o name) -n $MG_NAMESPACE -f; done

@sanchezl we may want to improve the logging around https://github.com/openshift/origin/blob/master/pkg/oc/cli/admin/mustgather/mustgather.go#L239-L241 to output the pod's logs to the user (or at the very least save them to a file).

@djfjeff can you try setting the following at the top of your bash script:

set +e


djfjeff commented Apr 29, 2019

@sferich888 Same result with set +e; it seems this is the default in bash, so it did not change the error reporting.

@sanchezl (Contributor)

If the script fails, the pod fails.

  • Don't use set -e
  • Add an exit 0 at the end (see the sketch below).
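
For example, the end of the collection script could look like this (the surrounding contents are illustrative, not the actual gather-cnv script):

#!/bin/bash
# Illustrative sketch: run the gather step as before...
openshift-must-gather inspect ns/kubevirt

# ...but always report success, so the gather init container does not fail the
# pod even when some resources could not be collected.
exit 0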


djfjeff commented Apr 29, 2019

@sanchezl Adding the exit 0 fixed the issue with the script.

However, it would be great if oc adm must-gather still reported back what it was able to gather instead of failing. For example, oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt still fails and reports nothing instead of returning what it was able to gather.

@sferich888 (Contributor)

@djfjeff @sanchezl and I discussed that today, and I think we want to make it so that all commands run from the 'gather' container return 0. This keeps commands like the one you describe from failing to produce some type of archive.

For now, adding exit 0, as I am doing in #88, is the solution we are taking to avoid these issues.
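
As a rough illustration of that approach (not the exact change in #88), individual gather steps can also be made to tolerate their own failures:

# Illustrative only: each gather step tolerates its own failure so later steps
# still run, and the explicit exit 0 keeps the gather container from failing the pod.
openshift-must-gather inspect ns/kubevirt || true

exit 0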

sferich888 mentioned this issue Apr 29, 2019
@openshift-bot (Contributor)

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label Sep 9, 2020
@openshift-bot (Contributor)

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 9, 2020
@openshift-bot (Contributor)

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
