latest images of must-gather is not collecting OLM data #237

Closed
mtulio opened this issue Jun 11, 2021 · 9 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@mtulio (Contributor) commented Jun 11, 2021

I believe the gather_olm script is not collecting data in the 4.6 images (I have tested only this Y-stream).

I expect the gather_olm script to collect the OLM operator objects (Subscriptions, InstallPlans, CatalogSources, CSVs) and save them in their respective directories; we can see this happening when the command below is executed directly:

oc adm inspect --dest-dir=must-gather-olm -A olm
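
For reference, a minimal sketch of what a gather_olm collection script could look like (an assumption for illustration; the shipped script may use a different destination path):

#!/bin/bash
# Sketch only: collect OLM-related resources from all namespaces
# into the must-gather output directory.
oc adm inspect --dest-dir=/must-gather -A olm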

Steps to reproduce

I ran three scenarios using the most recent image for the 4.6 Y-stream:
A) script [0]+[1]: default must-gather - should run all collection-scripts/gather_* (including gather_olm)
B) script [0]+[2]: must-gather calling the script directly: -- /usr/bin/gather_olm
C) script [0]+[3]: running the same command defined in gather_olm

As the outputs below show, A and B do not collect the OLM objects, while running the command directly in C does collect them.

[0] Get OCP version

$ declare -r OCP_VERSION=$(oc get clusterversion version --template='{{.status.desired.version}}' |sed -n -r 's/([[:digit:]]\.[[:digit:]]).*/\1/p')

[1] default must-gather

$ echo ${OCP_VERSION}
4.6

$ oc adm must-gather \
    --dest-dir=must-gather-general \
    --image=quay.io/openshift/origin-must-gather:"${OCP_VERSION}"

[must-gather      ] OUT Using must-gather plugin-in image: quay.io/openshift/origin-must-gather:${OCP_VERSION}
[...]

$ tree -a must-gather-general/*cluster-scoped-resources/operators.coreos.com/operators
must-gather-general/*cluster-scoped-resources/operators.coreos.com/operators [error opening dir]

0 directories, 0 files

$ tree -a must-gather-general/*namespaces/openshift-marketplace/operators.coreos.com/catalogsources
must-gather-general/*namespaces/openshift-marketplace/operators.coreos.com/catalogsources [error opening dir]

0 directories, 0 files

[2] must-gather calling the script directly: -- /usr/bin/gather_olm

oc adm must-gather \
    --dest-dir=must-gather-general-olm \
    --image=quay.io/openshift/origin-must-gather:"${OCP_VERSION}" -- /usr/bin/gather_olm

$ tree -a must-gather-general-olm/*cluster-scoped-resources/operators.coreos.com/operators
must-gather-general-olm/*cluster-scoped-resources/operators.coreos.com/operators [error opening dir]

[3] running the same command defined in gather_olm

$ oc adm inspect --dest-dir=must-gather-olm -A olm
Wrote inspect data to must-gather-olm.

$ tree -a must-gather-olm/*cluster-scoped-resources/operators.coreos.com/operators
must-gather-olm/cluster-scoped-resources/operators.coreos.com/operators
├── 3scale-community-operator.3scale.yaml
├── 3scale-operator.3scale.yaml
├── cluster-logging.openshift-logging.yaml
├── compliance-operator.openshift-compliance.yaml
├── elasticsearch-operator.openshift-operators-redhat.yaml
└── web-terminal.openshift-operators.yaml

$ tree -a must-gather-olm/*namespaces/openshift-marketplace/operators.coreos.com/catalogsources
must-gather-olm/namespaces/openshift-marketplace/operators.coreos.com/catalogsources
├── certified-operators.yaml
├── community-operators.yaml
├── redhat-marketplace.yaml
└── redhat-operators.yaml

Questions

  1. What is the best way to use the most recent must-gather image with a different Y-stream?
  2. Why is OLM data not being collected when running the latest must-gather images?

I had already hit the issue described in #230 in my local development: the gather_olm script uses the --dest-dir option and raises errors due to the old CLI bundled in the image.

@mtulio (Contributor, Author) commented Jun 15, 2021

The client does not seem to follow the cluster's Y version, but this does not appear to be related to argument errors, since --dest-dir is already available in 4.2:

$ podman run --rm -it quay.io/openshift/origin-must-gather:4.6 /bin/bash -c 'oc version'
Client Version: v4.2.0-alpha.0-859-gaaa9ca3

$ podman run --rm -it quay.io/openshift/origin-must-gather:4.6 /bin/bash -c 'oc adm inspect -h' |grep dest-dir
      --dest-dir='': Root directory used for storing all gathered cluster operator data. Defaults to

@mtulio (Contributor, Author) commented Jun 15, 2021

Same issue on OCP 4.7

[1] default must-gather

$ declare -r OCP_VERSION=$(oc get clusterversion version --template='{{.status.desired.version}}' |sed -n -r 's/([[:digit:]]\.[[:digit:]]).*/\1/p')

$ echo ${OCP_VERSION}
4.7

$ oc adm must-gather     --dest-dir=must-gather-general     --image=quay.io/openshift/origin-must-gather:"${OCP_VERSION}"

$ tree -a must-gather-general/*cluster-scoped-resources/operators.coreos.com/operators
must-gather-general/*cluster-scoped-resources/operators.coreos.com/operators [error opening dir]

0 directories, 0 files

[quicklab@upi-0 mrb]$ tree -a must-gather-general/*namespaces/openshift-marketplace/operators.coreos.com/catalogsources
must-gather-general/*namespaces/openshift-marketplace/operators.coreos.com/catalogsources [error opening dir]

0 directories, 0 files

[2]

$ oc adm must-gather     --dest-dir=must-gather-general-olm     --image=quay.io/openshift/origin-must-gather:"${OCP_VERSION}" -- /usr/bin/gather_olm
$ tree -a must-gather-general-olm/*cluster-scoped-resources/operators.coreos.com/operators
must-gather-general-olm/*cluster-scoped-resources/operators.coreos.com/operators [error opening dir]

0 directories, 0 files

[3]

$ oc adm inspect --dest-dir=must-gather-olm -A olm
Wrote inspect data to must-gather-olm.
$ tree -a must-gather-olm/*cluster-scoped-resources/operators.coreos.com/operators
must-gather-olm/cluster-scoped-resources/operators.coreos.com/operators
└── metering-ocp.openshift-metering.yaml

0 directories, 1 file

$ tree -a must-gather-olm/*namespaces/openshift-marketplace/operators.coreos.com/catalogsources
must-gather-olm/namespaces/openshift-marketplace/operators.coreos.com/catalogsources
├── certified-operators.yaml
├── community-operators.yaml
├── redhat-marketplace.yaml
└── redhat-operators.yaml

0 directories, 4 files

$ ls 
must-gather-general  must-gather-general-olm  must-gather-olm

@mtulio (Contributor, Author) commented Jun 16, 2021

Why are the newer scripts not ported to the currently supported OCP images? E.g. gather_olm is not included in the 4.6 tag image:

$ podman run --rm -v ${HOME}/.kube/config:/root/.kube/config:z -it quay.io/openshift/origin-must-gather:4.6 /bin/bash -c "ls -sl /usr/bin/gather*"
 4 -rwxr-xr-x. 1 root root 1397 Mar 11 15:33 /usr/bin/gather
 4 -rwxr-xr-x. 1 root root 1205 Mar 11 15:33 /usr/bin/gather_audit_logs
 4 -rwxr-xr-x. 1 root root 1814 Mar 11 15:33 /usr/bin/gather_core_dumps
 4 -rwxr-xr-x. 1 root root 1935 Mar 11 15:33 /usr/bin/gather_etcd
12 -rwxr-xr-x. 1 root root 9557 Mar 11 15:33 /usr/bin/gather_network_logs
 8 -rwxr-xr-x. 1 root root 8030 Mar 11 15:33 /usr/bin/gather_network_ovn_trace
 4 -rwxr-xr-x. 1 root root 1871 Mar 11 15:33 /usr/bin/gather_service_logs
 4 -rwxr-xr-x. 1 root root 1252 Mar 11 15:33 /usr/bin/gather_windows_node_logs

$ podman run --rm -v ${HOME}/.kube/config:/root/.kube/config:z -it quay.io/openshift/origin-must-gather:4.7 /bin/bash -c "ls -sl /usr/bin/gather*"

 4 -rwxr-xr-x. 1 root root  1999 Jan 25 18:45 /usr/bin/gather
 4 -rwxr-xr-x. 1 root root   328 Jan 25 18:45 /usr/bin/gather_admission_webhooks
 4 -rwxr-xr-x. 1 root root  1205 Jan 25 18:45 /usr/bin/gather_audit_logs
 4 -rwxr-xr-x. 1 root root  1812 Jan 25 18:45 /usr/bin/gather_core_dumps
 4 -rwxr-xr-x. 1 root root  1935 Jan 25 18:45 /usr/bin/gather_etcd
 4 -rwxr-xr-x. 1 root root  1075 Jan 25 18:45 /usr/bin/gather_haproxy_config
16 -rwxr-xr-x. 1 root root 13760 Jan 25 18:45 /usr/bin/gather_network_logs
 8 -rwxr-xr-x. 1 root root  8030 Jan 25 18:45 /usr/bin/gather_network_ovn_trace
 4 -rwxr-xr-x. 1 root root    66 Jan 25 18:45 /usr/bin/gather_olm
 4 -rwxr-xr-x. 1 root root  1539 Jan 25 18:45 /usr/bin/gather_service_logs
 4 -rw-r--r--. 1 root root  1424 Jan 25 18:45 /usr/bin/gather_service_logs_util
 4 -rwxr-xr-x. 1 root root  1252 Jan 25 18:45 /usr/bin/gather_windows_node_logs

$ podman run --rm -v ${HOME}/.kube/config:/root/.kube/config:z -it quay.io/openshift/origin-must-gather:4.8 /bin/bash -c "ls -sl /usr/bin/gather*"

 4 -rwxr-xr-x. 1 root root  2227 Jun 10 15:04 /usr/bin/gather
 4 -rwxr-xr-x. 1 root root   328 Jun 10 15:04 /usr/bin/gather_admission_webhooks
 4 -rwxr-xr-x. 1 root root  1232 Jun 10 15:04 /usr/bin/gather_audit_logs
 4 -rwxr-xr-x. 1 root root  1812 Jun 10 15:04 /usr/bin/gather_core_dumps
 4 -rwxr-xr-x. 1 root root  1935 Jun 10 15:04 /usr/bin/gather_etcd
 4 -rwxr-xr-x. 1 root root  1075 Jun 10 15:04 /usr/bin/gather_haproxy_config
 4 -rwxr-xr-x. 1 root root  1063 Jun 10 15:04 /usr/bin/gather_monitoring
16 -rwxr-xr-x. 1 root root 15650 Jun 10 15:04 /usr/bin/gather_network_logs
 8 -rwxr-xr-x. 1 root root  8030 Jun 10 15:04 /usr/bin/gather_network_ovn_trace
 4 -rwxr-xr-x. 1 root root    66 Jun 10 15:04 /usr/bin/gather_olm
 4 -rwxr-xr-x. 1 root root  3363 Jun 10 15:04 /usr/bin/gather_priority_and_fairness
 4 -rwxr-xr-x. 1 root root  1628 Jun 10 15:04 /usr/bin/gather_service_logs
 4 -rw-r--r--. 1 root root  1424 Jun 10 15:04 /usr/bin/gather_service_logs_util
 4 -rwxr-xr-x. 1 root root  1252 Jun 10 15:04 /usr/bin/gather_windows_node_logs
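
A quick way to check which image tags ship the script (a sketch, assuming podman is available and the images can be pulled) could be:

$ for tag in 4.6 4.7 4.8; do
    echo -n "${tag}: "
    podman run --rm quay.io/openshift/origin-must-gather:"${tag}" \
      /bin/bash -c 'test -x /usr/bin/gather_olm && echo present || echo missing'
  done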

@sferich888 (Contributor)

Correct; https://github.com/openshift/must-gather/tree/release-4.6/collection-scripts doesn't have the script, so it's not collecting the proper information. We would need to backport #182 to 4.6 for this to work.

I created #242 to start this backport process.
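
Until that backport lands, a possible workaround (a sketch based on the commands above, not an official procedure) is to run the default must-gather and then collect the OLM data separately with oc adm inspect:

$ oc adm must-gather \
    --dest-dir=must-gather-general \
    --image=quay.io/openshift/origin-must-gather:"${OCP_VERSION}"

$ oc adm inspect --dest-dir=must-gather-general/olm -A olm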

@openshift-bot (Contributor)

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label (Denotes an issue or PR has remained open with no activity and has become stale.) on Sep 19, 2021
@openshift-bot (Contributor)

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label on Oct 19, 2021
@sferich888 (Contributor)

I have to resolve #242 (comment) before it can merge and, as a result, resolve this issue.

@sferich888 (Contributor)

The backport to fix this is in flight and will be resolved with #242

/close

openshift-ci bot closed this as completed on Oct 29, 2021

openshift-ci bot commented Oct 29, 2021

@sferich888: Closing this issue.

In response to this:

The backport to fix this is in flight and will be resolved with #242

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
