Error reporting in a collection script causes "oc adm must-gather" to fail #85

Closed
djfjeff opened this issue Apr 26, 2019 · 13 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments


djfjeff commented Apr 26, 2019

I created a custom image with a new collection script called gather-cnv (https://github.com/djfjeff/must-gather/blob/gather-cnv-support/collection-scripts/gather-cnv).

For now, the new collection script only runs: openshift-must-gather inspect ns/kubevirt
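
Roughly, the script looks like this (simplified sketch; see the linked repo for the exact contents):

#!/bin/bash
# Simplified reconstruction of gather-cnv: it only wraps the inspect call.
# Because this is the last command in the script, the inspect command's
# non-zero exit status becomes the exit status of the whole script.
openshift-must-gather inspect ns/kubevirt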

Manually running the inspect command above fetches the information but still produces some error output:

openshift-must-gather inspect ns/kubevirt

2019/04/26 14:02:40 Gathering data for ns/kubevirt...
2019/04/26 14:02:40     Collecting resources for namespace "kubevirt"...
2019/04/26 14:02:40     Gathering pod data for namespace "kubevirt"...
2019/04/26 14:02:40         Gathering data for pod "virt-api-f8d6cd97-5l68b"
2019/04/26 14:02:41         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-5l68b" not found
2019/04/26 14:02:42         Skipping /version info gathering for pod "virt-api-f8d6cd97-5l68b". Endpoint not found...
2019/04/26 14:02:42         Gathering data for pod "virt-api-f8d6cd97-6h4kx"
2019/04/26 14:02:43         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-6h4kx" not found
2019/04/26 14:02:43         Skipping /version info gathering for pod "virt-api-f8d6cd97-6h4kx". Endpoint not found...
2019/04/26 14:02:43         Gathering data for pod "virt-controller-865b95f6c6-gtplf"
2019/04/26 14:02:43         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-gtplf" not found
2019/04/26 14:02:44         Gathering data for pod "virt-controller-865b95f6c6-vl6cc"
2019/04/26 14:02:44         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-vl6cc" not found
2019/04/26 14:02:44         Gathering data for pod "virt-handler-5zd2z"
2019/04/26 14:02:44         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-5zd2z" not found
2019/04/26 14:02:44         Gathering data for pod "virt-handler-9q9cp"
2019/04/26 14:02:44         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-9q9cp" not found
2019/04/26 14:02:45         Gathering data for pod "virt-handler-m9q8r"
2019/04/26 14:02:45         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-m9q8r" not found
2019/04/26 14:02:45         Gathering data for pod "virt-handler-t6kz2"
2019/04/26 14:02:45         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-t6kz2" not found
2019/04/26 14:02:45         Gathering data for pod "virt-operator-76b568d986-6x9rc"
2019/04/26 14:02:45         Unable to gather previous container logs: previous terminated container "virt-operator" in pod "virt-operator-76b568d986-6x9rc" not found
Error: one or more errors ocurred while gathering pod-specific data for namespace: kubevirt

    [one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-5l68b:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-5l68b", one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-6h4kx:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-6h4kx", one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-gtplf:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-vl6cc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-5zd2z:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-9q9cp:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-m9q8r:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-t6kz2:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-operator-76b568d986-6x9rc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource]]

The issue I have is that when I run it with oc adm must-gather, the error reporting seems to break the gather init container and cause the whole process to fail:

oc adm must-gather --image=quay.io/jsaucier/must-gather-cnv:latest -- gather-cnv

namespace/openshift-must-gather-h9m84 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-q8xwf created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-q8xwf deleted
namespace/openshift-must-gather-h9m84 deleted
error: pod is not running: Failed

To confirm, I tried changing the namespace gathered in gather-cnv from ns/kubevirt to ns/default (which does not produce any error output), and everything worked fine.


djfjeff commented Apr 29, 2019

To add more information:

oc adm must-gather --keep --image=quay.io/jsaucier/must-gather-cnv:latest -- gather-cnv                                                                                       

namespace/openshift-must-gather-cr82g created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-qf46z created
error: pod is not running: Failed
oc project openshift-must-gather-cr82g
oc logs -f must-gather-jrpll

Error from server (BadRequest): container "copy" in pod "must-gather-jrpll" is waiting to start: PodInitializing
oc logs -f must-gather-jrpll -c gather

2019/04/29 10:56:27 Gathering data for ns/kubevirt...
2019/04/29 10:56:27     Collecting resources for namespace "kubevirt"...
2019/04/29 10:56:27     Gathering pod data for namespace "kubevirt"...
2019/04/29 10:56:27         Gathering data for pod "virt-api-f8d6cd97-5l68b"
2019/04/29 10:56:28         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-5l68b" not found
2019/04/29 10:56:28         Skipping /version info gathering for pod "virt-api-f8d6cd97-5l68b". Endpoint not found...
2019/04/29 10:56:28         Gathering data for pod "virt-api-f8d6cd97-6h4kx"
2019/04/29 10:56:29         Unable to gather previous container logs: previous terminated container "virt-api" in pod "virt-api-f8d6cd97-6h4kx" not found
2019/04/29 10:56:29         Skipping /version info gathering for pod "virt-api-f8d6cd97-6h4kx". Endpoint not found...
2019/04/29 10:56:29         Gathering data for pod "virt-controller-865b95f6c6-gtplf"
2019/04/29 10:56:30         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-gtplf" not found
2019/04/29 10:56:30         Gathering data for pod "virt-controller-865b95f6c6-vl6cc"
2019/04/29 10:56:30         Unable to gather previous container logs: previous terminated container "virt-controller" in pod "virt-controller-865b95f6c6-vl6cc" not found
2019/04/29 10:56:30         Gathering data for pod "virt-handler-5zd2z"
2019/04/29 10:56:30         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-5zd2z" not found
2019/04/29 10:56:30         Gathering data for pod "virt-handler-9q9cp"
2019/04/29 10:56:31         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-9q9cp" not found
2019/04/29 10:56:31         Gathering data for pod "virt-handler-m9q8r"
2019/04/29 10:56:31         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-m9q8r" not found
2019/04/29 10:56:31         Gathering data for pod "virt-handler-t6kz2"
2019/04/29 10:56:31         Unable to gather previous container logs: previous terminated container "virt-handler" in pod "virt-handler-t6kz2" not found
2019/04/29 10:56:31         Gathering data for pod "virt-operator-76b568d986-6x9rc"
2019/04/29 10:56:32         Unable to gather previous container logs: previous terminated container "virt-operator" in pod "virt-operator-76b568d986-6x9rc" not found
Error: one or more errors ocurred while gathering pod-specific data for namespace: kubevirt

    [one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-5l68b:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-5l68b", one or more errors ocurred while gathering container data for pod virt-api-f8d6cd97-6h4kx:

    unable to gather container /healthz: unable to find any available /healthz paths hosted in pod "virt-api-f8d6cd97-6h4kx", one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-gtplf:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-controller-865b95f6c6-vl6cc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-5zd2z:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-9q9cp:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-m9q8r:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-handler-t6kz2:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource], one or more errors ocurred while gathering container data for pod virt-operator-76b568d986-6x9rc:

    [unable to gather container /healthz: the server could not find the requested resource, unable to gather container /version: the server could not find the requested resource]]
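
For completeness, since --keep leaves the failed pod around, the gather init container's exit status can also be checked directly (a debugging sketch, not taken from the output above; the jsonpath query is just one way to do it):

oc get pod must-gather-jrpll -n openshift-must-gather-cr82g \
  -o jsonpath='{.status.initContainerStatuses[?(@.name=="gather")].state.terminated.exitCode}'

# A non-zero value here confirms that it is the gather script's exit status,
# not the copy step, that marks the pod as Failed.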

@sferich888 (Contributor)

Do you get the same thing with:

$ oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt


djfjeff commented Apr 29, 2019

Same result:

oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt

namespace/openshift-must-gather-68r65 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-vfc6l created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-vfc6l deleted
namespace/openshift-must-gather-68r65 deleted
error: pod is not running: Failed


djfjeff commented Apr 29, 2019

And just to show that the command works under normal circumstances:

oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/default

namespace/openshift-must-gather-8qwh4 created
clusterrolebinding.rbac.authorization.k8s.io/must-gather-8c9sp created
receiving incremental file list
created directory must-gather.local.1940580096616587914
./
namespaces/
namespaces/default/
namespaces/default/default.yaml
namespaces/default/apps.openshift.io/
namespaces/default/apps.openshift.io/deploymentconfigs.yaml
namespaces/default/apps/
namespaces/default/apps/daemonsets.yaml
namespaces/default/apps/deployments.yaml
namespaces/default/apps/replicasets.yaml
namespaces/default/apps/statefulsets.yaml
namespaces/default/autoscaling/
namespaces/default/autoscaling/horizontalpodautoscalers.yaml
namespaces/default/batch/
namespaces/default/batch/cronjobs.yaml
namespaces/default/batch/jobs.yaml
namespaces/default/build.openshift.io/
namespaces/default/build.openshift.io/buildconfigs.yaml
namespaces/default/build.openshift.io/builds.yaml
namespaces/default/core/
namespaces/default/core/configmaps.yaml
namespaces/default/core/events.yaml
namespaces/default/core/pods.yaml
namespaces/default/core/replicationcontrollers.yaml
namespaces/default/core/secrets.yaml
namespaces/default/core/services.yaml
namespaces/default/image.openshift.io/
namespaces/default/image.openshift.io/imagestreams.yaml
namespaces/default/route.openshift.io/
namespaces/default/route.openshift.io/routes.yaml

sent 436 bytes  received 94,291 bytes  63,151.33 bytes/sec
total size is 92,489  speedup is 0.98
clusterrolebinding.rbac.authorization.k8s.io/must-gather-8c9sp deleted
namespace/openshift-must-gather-8qwh4 deleted

@sferich888 (Contributor)

Unfortunately, the pod's logs don't tell you much. This is what I get:

Error from server (BadRequest): container "copy" in pod "must-gather-x7vgf" is waiting to start: PodInitializing
...
Error: namespaces "kubevirt" not found
...
Error from server (NotFound): pods "must-gather-x7vgf" not found

I got this by running:

$ oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt

In a different terminal, run the following:

$ MG_NAMESPACE="FILL_ME_IN_FROM_OUTPUT_ABOVE"; while true; do oc logs $(oc get pods -n $MG_NAMESPACE -o name) -n $MG_NAMESPACE -f; done

@sanchezl we may want to improve the logging around https://github.com/openshift/origin/blob/master/pkg/oc/cli/admin/mustgather/mustgather.go#L239-L241 to output the pod's logs to the user (or at the very least save them to a file).

@djfjeff can you try setting the following at the top of your bash script:

set +e


djfjeff commented Apr 29, 2019

@sferich888 Same result with set +e; it seems this is the default in bash, so it did not change the error reporting.

@sanchezl (Contributor)

If the script fails, the pod fails.

  • Don't use set -e
  • Add an exit 0 at the end (see the sketch below).
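
For example, the end of the collection script could look like this (the surrounding contents are illustrative, not the actual gather-cnv script):

#!/bin/bash
# Illustrative sketch: run the gather step as before...
openshift-must-gather inspect ns/kubevirt

# ...but always report success, so the gather init container does not fail the
# pod even when some resources could not be collected.
exit 0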


djfjeff commented Apr 29, 2019

@sanchezl Adding the exit 0 fixed the issue with the script.

However, it would be great if oc adm must-gather still reported back what it was able to gather instead of failing. For example, oc adm must-gather -- /usr/bin/openshift-must-gather inspect ns/kubevirt still fails and reports nothing instead of returning what it was able to gather.

@sferich888 (Contributor)

@djfjeff @sanchezl and I discussed that today, and I think we want to make it so that all commands run from the 'gather' container return 0. This keeps commands like the one you describe from failing to produce some type of archive.

For now, adding exit 0, as I am doing in #88, is the solution we are taking to avoid these issues.
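
As a rough illustration of that approach (not the exact change in #88), individual gather steps can also be made to tolerate their own failures:

# Illustrative only: each gather step tolerates its own failure so later steps
# still run, and the explicit exit 0 keeps the gather container from failing the pod.
openshift-must-gather inspect ns/kubevirt || true

exit 0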

sferich888 mentioned this issue Apr 29, 2019
@openshift-bot (Contributor)

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label Sep 9, 2020
@openshift-bot (Contributor)

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Oct 9, 2020
@openshift-bot (Contributor)

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci-robot

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
