must-gather: add more info for ceph crash #818
@@ -159,4 +162,15 @@ for ns in $namespaces; do
{ timeout 120 oc debug nodes/"${node}" -- bash -c "test -f /host/var/lib/rook/log/${ns}/ceph-volume.log && cat /host/var/lib/rook/log/${ns}/ceph-volume.log" > "${NODE_OUTPUT_DIR}"/ceph-volume.log; } >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1
done
oc delete -f pod_helper.yaml

# Collecting ceph prepare volume logs
The comment needs to be changed to capturing crash dumps.
done
@@ -159,4 +162,15 @@ for ns in $namespaces; do
{ timeout 120 oc debug nodes/"${node}" -- bash -c "test -f /host/var/lib/rook/log/${ns}/ceph-volume.log && cat /host/var/lib/rook/log/${ns}/ceph-volume.log" > "${NODE_OUTPUT_DIR}"/ceph-volume.log; } >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1
done
oc delete -f pod_helper.yaml

# Collecting ceph crash dump
for node in $(oc get nodes -l cluster.ocs.openshift.io/openshift-storage='' --no-headers | grep -w 'Ready' | awk '{print $1}'); do
Suggested change:
- for node in $(oc get nodes -l cluster.ocs.openshift.io/openshift-storage='' --no-headers | grep -w 'Ready' | awk '{print $1}'); do
+ for node in $(oc get nodes -l cluster.ocs.openshift.io/openshift-storage='' --no-headers | awk '/Ready/ {print $1}'); do
done
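One thing worth checking in the suggestion above: `awk '/Ready/'` is a substring match, so it would also select nodes whose status is `NotReady`, which `grep -w 'Ready'` excluded. Below is a minimal sketch of a single-process filter that keeps the whole-word behavior, assuming the usual `oc get nodes --no-headers` column order (name first, status second). `ready_nodes` is a hypothetical helper name, not part of this PR, and the sample node names are made up.

```shell
# ready_nodes: read `oc get nodes --no-headers` style lines on stdin and print
# the names of nodes whose STATUS column contains Ready as a whole,
# comma-separated token ("Ready,SchedulingDisabled" counts, "NotReady" does not).
ready_nodes() {
    awk '$2 ~ /(^|,)Ready(,|$)/ {print $1}'
}

# Illustrative input only; these node names are invented for the example.
printf '%s\n' \
    'node-a Ready worker 10d v1.19.0' \
    'node-b NotReady worker 10d v1.19.0' \
    'node-c Ready,SchedulingDisabled worker 10d v1.19.0' | ready_nodes
```

With the sample input this selects node-a and node-c and skips node-b.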
printf "collecting crash logs from node %s \n" "${node}" | tee -a "${BASE_COLLECTION_PATH}"/gather-debug.log
CRASH_OUTPUT_DIR=${CEPH_COLLECTION_PATH}/crash_${node}
mkdir -p "${CRASH_OUTPUT_DIR}"
oc debug nodes/"${node}" --to-namespace="${ns}" -- bash -c "sleep 5m" & >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1
Suggested change:
- oc debug nodes/"${node}" --to-namespace="${ns}" -- bash -c "sleep 5m" & >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1
+ oc debug nodes/"${node}" --to-namespace="${ns}" -- bash -c "sleep 5m" & >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1
done
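A note on the `oc debug ... & >> ...gather-debug.log 2>&1` line discussed here: in shell, `&` terminates the command, so redirections written after it apply to an empty command and the backgrounded process keeps writing to the terminal. The sketch below shows the ordering that does capture both streams. `noisy_job` is a hypothetical stand-in for the `oc debug` call so the example runs anywhere.

```shell
log=$(mktemp)

# Stand-in for the long-running `oc debug ... -- bash -c "sleep 5m"` command.
noisy_job() {
    echo "stdout line"
    echo "stderr line" >&2
}

# With `cmd & >> log 2>&1` the `&` ends the command and the redirections that
# follow belong to an empty command, so nothing from cmd reaches the log.
# Placing the redirections before `&` sends both streams to the log.
noisy_job >> "$log" 2>&1 &
job_pid=$!

wait "$job_pid"
cat "$log"
```

Keeping the PID from `$!` also gives the script a handle for waiting on or stopping the background job later, if that is wanted.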
mkdir -p "${CRASH_OUTPUT_DIR}"
oc debug nodes/"${node}" --to-namespace="${ns}" -- bash -c "sleep 5m" & >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1
sleep 60
oc rsync -n "${ns}" `oc get pods -n "${ns}"| grep debug| awk '{print $1}'`:/host/var/lib/rook/openshift-storage/crash/ "${CRASH_OUTPUT_DIR}"
Suggested change:
- oc rsync -n "${ns}" `oc get pods -n "${ns}"| grep debug| awk '{print $1}'`:/host/var/lib/rook/openshift-storage/crash/ "${CRASH_OUTPUT_DIR}"
+ oc rsync -n "${ns}" $(oc get pods -n "${ns}"|awk '/debug/ {print $1}'):/host/var/lib/rook/openshift-storage/crash/ "${CRASH_OUTPUT_DIR}"
done
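The fixed `sleep 60` before `oc rsync` assumes the debug pod is usable within a minute. One hedged alternative is to poll until a readiness check succeeds or a time budget runs out. `retry_until` below is a hypothetical helper, not something in the must-gather scripts, and the demonstration waits on a temporary file rather than a pod so it runs without a cluster.

```shell
# retry_until SECONDS CMD [ARGS...]
# Re-run CMD once per second until it exits 0; give up (return 1) once
# SECONDS have elapsed.
retry_until() {
    _ru_deadline=$(( $(date +%s) + $1 ))
    shift
    until "$@"; do
        if [ "$(date +%s)" -ge "$_ru_deadline" ]; then
            return 1
        fi
        sleep 1
    done
}

# Self-contained demonstration: wait for a file that appears after ~2 seconds.
tmpdir=$(mktemp -d)
marker="${tmpdir}/ready"
( sleep 2; : > "$marker" ) &
retry_until 10 test -e "$marker" && echo "marker appeared"
rm -rf "$tmpdir"
```

In the gather script the polled command could be something like `oc -n "${ns}" exec "$pod" -- true`, where `$pod` is an assumed variable holding the debug pod name.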
@@ -136,6 +136,9 @@ for ns in $namespaces; do
for i in $(timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "ceph osd lspools --connect-timeout=15"|awk '{print $2}'); do
{ timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "rbd ls -p $i" >> "${COMMAND_OUTPUT_DIR}/pools_rbd_$i"; } >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1;
done
for i in $(timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "ceph crash ls --connect-timeout=15"|awk '{print $1}'); do
{ timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "ceph crash info $i" >> "${COMMAND_OUTPUT_DIR}/pools_rbd_$i"; } >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1;
Don't you need --connect-timeout=15 here as well?
added
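Two further details in this loop may deserve a look. The `ceph crash info` output is appended to `pools_rbd_$i`, which looks like a carry-over from the rbd loop above it, and `awk '{print $1}'` passes through any header line that `ceph crash ls` prints. The sketch below filters the listing down to crash IDs and derives a per-crash file name. It assumes crash IDs start with a date stamp; `crash_ids`, the `crash_` file prefix, and the sample listing are all hypothetical.

```shell
# crash_ids: read `ceph crash ls` output on stdin and print only the crash IDs.
# Assumption: IDs begin with a date stamp (YYYY-MM-DD_...), so header,
# separator, and blank lines are skipped.
crash_ids() {
    awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_/ {print $1}'
}

# Illustrative listing; the IDs and entities are made up.
sample='ID ENTITY NEW
2020-10-01_10:00:00.000000Z_aaaaaaaa-0000-0000-0000-000000000001 osd.0 *
2020-10-02_11:30:00.000000Z_aaaaaaaa-0000-0000-0000-000000000002 mon.a'

out_dir=$(mktemp -d)
for id in $(printf '%s\n' "$sample" | crash_ids); do
    # In the real script this would be the `ceph crash info "$id"` call;
    # here we only record which file each crash would be written to.
    echo "would collect $id" > "${out_dir}/crash_${id}"
done
ls "$out_dir"
```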
mkdir -p "${CRASH_OUTPUT_DIR}"
oc debug nodes/"${node}" --to-namespace="${ns}" -- bash -c "sleep 5m" & >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1
sleep 60
oc rsync -n "${ns}" `oc get pods -n "${ns}"| awk '/debug/{print $1}'`:/host/var/lib/rook/openshift-storage/crash/ "${CRASH_OUTPUT_DIR}"
Backticks are deprecated; use $() instead.
done
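For context on the backticks comment: both forms perform command substitution, but `$()` nests without escaping and is easier to quote. A small illustration follows, using a made-up path rather than anything from the script.

```shell
# Nested substitution with backticks needs the inner backticks escaped.
legacy=`basename \`dirname /var/lib/rook/crash\``

# The same with $() nests directly, and the inner result can be quoted.
modern=$(basename "$(dirname /var/lib/rook/crash)")

printf '%s %s\n' "$legacy" "$modern"
```

Both variables end up holding the same value; only the readability differs.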
/retest Please review the full test history for this PR and help us cut down flakes.
@@ -136,6 +136,7 @@ for ns in $namespaces; do
for i in $(timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "ceph osd lspools --connect-timeout=15"|awk '{print $2}'); do
{ timeout 120 oc -n "${ns}" exec "${HOSTNAME}"-helper -- bash -c "rbd ls -p $i" >> "${COMMAND_OUTPUT_DIR}/pools_rbd_$i"; } >> "${BASE_COLLECTION_PATH}"/gather-debug.log 2>&1;
done
<<<<<<< HEAD
Remove this.
done
061008f to 53dc021
add command to check for ceph crash info for every crash and collect core dump of rook crash. Signed-off-by: crombus <pkundra@redhat.com>
/retest
lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: agarwal-mudit, jarrpa
/retest
/retest Please review the full test history for this PR and help us cut down flakes.
/cherry-pick release-4.6
@crombus: new pull request created: #830