KATA-815 Document must-gather for developers #365

Merged on Jan 9, 2024 (5 commits).
Changes from 4 commits are shown.
must-gather/collection-scripts/gather_nodes: 9 changes (7 additions, 2 deletions)

```diff
@@ -113,14 +113,19 @@ wait
 
 # Collect journal logs for specified units for all nodes
 NODE_UNITS=(kubelet crio)
-for NODE in $(oc get nodes --no-headers -o custom-columns=':metadata.name'); do
+NODES=$(oc get nodes --no-headers -o custom-columns=':metadata.name')
+for NODE in $NODES; do
     NODE_PATH=${NODES_PATH}/$NODE
     mkdir -p "${NODE_PATH}"
     for UNIT in "${NODE_UNITS[@]}"; do
         oc adm node-logs "$NODE" -u "$UNIT" > "${NODE_PATH}/${NODE}_logs_$UNIT" &
     done
+    # virtiofsd is in syslog and started by shim so -u won't work
+    # Rust regression prevents oc adm node-logs "$NODE" -t virtiofsd # 20240104
+    oc adm node-logs "$NODE" -g 'virtiofsd\[[[:digit:]]+\]:' > "${NODE_PATH}/${NODE}_logs_virtiofsd" &
+
+    oc debug node/${NODE} -- sh -c "(chroot /host rpm -qa | egrep '(kata-containers|qemu|virtiofsd)' | sort)" > "${NODE_PATH}/version"
 
-    oc debug node/${NODE} -- sh -c "(chroot /host rpm -qa | egrep '(kata-containers|qemu)' | sort)" > "${NODE_PATH}/version"
 done
 
 oc delete -f $DAEMONSET_MANIFEST
```
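
`gather_nodes` now collects virtiofsd messages with `oc adm node-logs -g` and an extended regular expression instead of `-u` or `-t`. As a minimal sketch, assuming virtiofsd lines carry the usual syslog-style `virtiofsd[PID]:` prefix, the same expression can be applied locally with `grep -E` to check which lines it selects. The function name `match_virtiofsd` is hypothetical and not part of the script.

```sh
#!/bin/sh
# Hypothetical helper: print only the virtiofsd syslog-style lines from text on
# stdin, using the same extended regular expression that gather_nodes passes
# to `oc adm node-logs -g`.
match_virtiofsd() {
    grep -E 'virtiofsd\[[[:digit:]]+\]:'
}
```

For example, `match_virtiofsd < nodes/<nodename>/<nodename>_logs_virtiofsd` re-checks a collected file, where `<nodename>` is a placeholder. Lines that merely mention virtiofsd (for example in a CRI-O message) are not selected, because the pattern requires the bracketed PID followed by a colon.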
must-gather/must-gather-requirements.md: 54 changes (54 additions, 0 deletions; new file)
# <center>Requirements for OpenShift Sandboxed Containers (OSC) must-gather</center>


### Usage
The `KataConfig` resource must have `logLevel: debug` set before `must-gather` is run.

OSC `must-gather` should gather all OSC information and logs needed for debugging into a directory:
```sh
oc adm must-gather --image=registry.redhat.io/openshift-sandboxed-containers/osc-must-gather-rhel9:latest
```
Data about other parts of the cluster is gathered with `oc adm must-gather`. Run `oc adm must-gather -h` to see more options.
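
A minimal sketch of the two usage steps, assuming you are logged in with `oc` as a cluster administrator and that `logLevel` lives under the `spec` of each `KataConfig` resource (an assumption to verify against your OSC version). The function names and the default destination directory are hypothetical; `--dest-dir` is the standard `oc adm must-gather` option for choosing where results are written.

```sh
#!/bin/sh
# Sketch only. Assumes `oc` is logged in with cluster-admin rights and that
# debug logging is controlled by the KataConfig field spec.logLevel.

# Set logLevel: debug on every KataConfig in the cluster.
enable_kata_debug() {
    for kc in $(oc get kataconfig -o name); do
        oc patch "$kc" --type merge -p '{"spec":{"logLevel":"debug"}}' || return 1
    done
}

# Run the OSC must-gather image, writing results to the given directory
# (defaults to the hypothetical ./osc-must-gather).
run_osc_must_gather() {
    dest=${1:-./osc-must-gather}
    mkdir -p "$dest" &&
    oc adm must-gather \
        --image=registry.redhat.io/openshift-sandboxed-containers/osc-must-gather-rhel9:latest \
        --dest-dir="$dest"
}
```

Call `enable_kata_debug` first, then `run_osc_must_gather` once the problem has been reproduced with debug logging active, so that the collected logs contain debug-level output.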

### OpenShift Sandboxed Containers
The Kata runtime is the `containerd-shim-kata-v2` process, which talks to the Kata agent running in the VM.
See also the [official OSC 1.5 troubleshooting documentation](https://access.redhat.com/documentation/en-us/openshift_sandboxed_containers/1.5/html-single/openshift_sandboxed_containers_user_guide/index#troubleshooting-sandboxed-containers).

#### Gathered Data
- Resource definitions
- Service logs
- All namespaces and child objects with OSC resources
- All OSC custom resource definitions (CRDs)
  - sandboxed-containers/namespaces/openshift-sandboxed-containers-operator/**_*_**\_description
- Package versions in nodes/**_nodename_**/version
  - kata-containers
  - qemu
  - virtiofsd

> **gkurz** (Member), Jan 9, 2024: Out of curiosity, what is the motivation for having nodename in bold italics?
>
> **Reply from the PR author** (Contributor): Because it varies. For my cluster today, the node names are:
> - tbuskey-240109-2-d9zsw-master-0
> - tbuskey-240109-2-d9zsw-master-1
> - tbuskey-240109-2-d9zsw-master-2
> - tbuskey-240109-2-d9zsw-worker-eastus1-82g44
> - tbuskey-240109-2-d9zsw-worker-eastus2-s5d6r
> - tbuskey-240109-2-d9zsw-worker-eastus3-6xdl9
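
Each node's `version` file is produced by filtering `rpm -qa` output for these packages. The filter can be sketched as a standalone function, assuming `grep -E` in place of the deprecated `egrep` used by the script and C-locale sorting for a stable order; the name `osc_versions` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: reduce `rpm -qa` output (one package per line on stdin)
# to the kata-containers, qemu and virtiofsd packages, sorted in C-locale order
# so the result is stable across machines.
osc_versions() {
    grep -E '(kata-containers|qemu|virtiofsd)' | LC_ALL=C sort
}
```

For example, `oc debug node/<nodename> -- chroot /host rpm -qa | osc_versions` would produce a comparable list on the workstation, with `<nodename>` as a placeholder.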


#### Locations
- CRI-O logs (from the Kata runtime)
  - nodes/**_nodename_**/**_nodename_**\_logs\_crio
- QEMU
  - logs are part of the **CRI-O** logs, identified by _subsystem=qemu_, _subsystem=qmp_, and/or _qemuPID=**PID**_
- virtiofsd
  - nodes/**_nodename_**/**_nodename_**\_logs\_virtiofsd
- Audits
  - audit_logs/**_nodename_**-audit.log.gz
- Logs
  - sandboxed-containers/namespaces/openshift-sandboxed-containers-operator/controller-manager-**_*\_logs_**
  - sandboxed-containers/namespaces/openshift-sandboxed-containers-operator/install-**_*\_logs_**
  - sandboxed-containers/namespaces/openshift-sandboxed-containers-operator/openshift-sandboxed-containers-monitor-**_*\_logs_**
  - sandboxed-containers/namespaces/openshift-sandboxed-containers-operator/peerpodconfig-ctrl-caa-daemon-**_*\_logs_**
  - sandboxed-containers/namespaces/openshift-sandboxed-containers-operator/peer-pods-webhook-**_*\_logs_**
- OSC CRDs
  - sandboxed-containers/namespaces/openshift-sandboxed-containers-operator/**_*_**\_description
  - sandboxed-containers/clusterserviceversion_description
  - sandboxed-containers/kataconfig_description
  - sandboxed-containers/services_description
  - sandboxed-containers/subscription_description
  - sandboxed-containers/validatingwebhookconfigurations_description
  - apiservices/v1.kataconfiguration.openshift.io.yaml
  - cluster-scoped-resources/apiextensions.k8s.io/customresourcedefinitions/kataconfigs.kataconfiguration.openshift.io.yaml
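
After `oc adm must-gather` finishes, the locations above are typically relative to an image-specific directory it creates under the destination. A minimal sketch of helpers for browsing them, assuming `MG` is that directory (its exact name depends on the image and is not given here) and that operator pod logs are entries whose names end in `_logs`. It builds a node's CRI-O log path, extracts QEMU lines by the `subsystem=qemu`, `subsystem=qmp`, and `qemuPID=` markers listed above, and lists operator pod logs. All function names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for browsing an unpacked OSC must-gather.
# $1 (MG) is assumed to be the directory that contains nodes/ and
# sandboxed-containers/.

# Print the CRI-O log path for one node: crio_log MG NODENAME
crio_log() {
    printf '%s/nodes/%s/%s_logs_crio\n' "$1" "$2" "$2"
}

# Keep only QEMU-related lines (subsystem=qemu, subsystem=qmp, qemuPID=...)
# from CRI-O log text on stdin.
qemu_lines() {
    grep -E 'subsystem=(qemu|qmp)|qemuPID='
}

# List the *_logs entries collected from the OSC operator namespace, sorted.
operator_logs() {
    find "$1/sandboxed-containers/namespaces/openshift-sandboxed-containers-operator" \
        -name '*_logs' | LC_ALL=C sort
}
```

For example, `qemu_lines < "$(crio_log "$MG" <nodename>)"` prints the QEMU and QMP entries for one node, where `<nodename>` is a placeholder.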