add initial sriov debug script #8
/cc @pliurh
log_nodeinfo () {
    # Outputs a list of nodes in the form "nodename IP"
    oc get nodes --template '{{range .items}}{{$name := .metadata.name}}{{range .status.addresses}}{{if eq .type "InternalIP"}}{{$name}} {{.address}}{{"\n"}}{{end}}{{end}}{{end}}' > $logdir/meta/nodeinfo
Shall we also collect the node labels? Then we can tell whether the sriov pods are running on the selected nodes.
Yes, I need to think about how to save the node labels in a file so that they can be easily iterated over or consumed.
The current nodeinfo file records one line per node, with the individual fields separated by spaces, so it can be looped over for any later node check.
If we were to add the node labels, it might be better to create a new file, since one node has multiple labels.
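One possible shape for such a separate label file is sketched below. It assumes one "nodename key=value" line per label, so the file stays loopable in the same way as nodeinfo. The names `log_nodelabels`, `node_has_label`, and the `nodelabels` file are hypothetical and not part of this PR.

```shell
# Sketch only: log_nodelabels and the nodelabels file name are hypothetical.
# Writes one "nodename key=value" line per label, so a node with several
# labels simply produces several lines.
log_nodelabels () {
    oc get nodes --template '{{range .items}}{{$name := .metadata.name}}{{range $key, $value := .metadata.labels}}{{$name}} {{$key}}={{$value}}{{"\n"}}{{end}}{{end}}' > "$logdir/meta/nodelabels"
}

# Hypothetical helper: returns 0 if the given node carries the given label key.
node_has_label () {
    local node=$1
    local key=$2
    local name label
    while read -r name label; do
        # Strip "=value" from the label to compare the key only.
        if [ "$name" = "$node" ] && [ "${label%%=*}" = "$key" ]; then
            return 0
        fi
    done < "$logdir/meta/nodelabels"
    return 1
}
```

A later check could then call, for example, `node_has_label "$node" node-role.kubernetes.io/worker` inside the existing loop over nodeinfo.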
    done
    done < $logdir/meta/nodeinfo
}
I think we should also collect the net-att-def objects generated by the operator.
In the current script, I tried to collect only the info that is used directly to analyse the correctness of the sriov configuration.
Right now, we don't have a check on the net-attach-def or the network of an sriov interface, so I didn't add net-attach-def collection.
I think once we know what to check for net-attach-def, they can be collected.
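If collection is added later, a minimal sketch could follow the same one-line-per-object convention as the other meta files. It assumes the `net-attach-def` short name is available on the cluster and that the `k8s.v1.cni.cncf.io/resourceName` annotation is the field of interest. The function name `log_netattachdef` and the `netattachdef` file are hypothetical.

```shell
# Sketch only: log_netattachdef and the netattachdef file name are hypothetical.
# Writes one "namespace name resourceName" line per net-attach-def object.
# The "with" guard keeps the template from failing on objects that have
# no annotations at all.
log_netattachdef () {
    oc get net-attach-def --all-namespaces --template '{{range .items}}{{.metadata.namespace}} {{.metadata.name}} {{with .metadata.annotations}}{{index . "k8s.v1.cni.cncf.io/resourceName"}}{{end}}{{"\n"}}{{end}}' > "$logdir/meta/netattachdef"
}
```

Keeping the resource name on the same line would make it straightforward to compare against the resources configured by the operator once a check is defined.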
len=$(oc get sriovnetworknodestate $node -n $SRIOV_NAMESPACE --template '{{len .spec.interfaces}}' 2>/dev/null)
# Outputs a list of node PF devices in the form "nodename interfacename pciaddress linktype numvfs"
for i in $(seq 0 $(($len-1))); do
    oc get sriovnetworknodestate $node -n $SRIOV_NAMESPACE --template "{{.metadata.name}} {{(index .spec.interfaces $i).name}} {{(index .spec.interfaces $i).pciAddress}} {{(index .spec.interfaces $i).linkType}} {{(index .spec.interfaces $i).numVfs}}{{\"\n\"}}" >> $logdir/meta/state
We may also need to have the status of the sriovnetworknodestate CRs.
The status of sriovnetworknodestate is checked in do_operator, before the node is checked in do_node.
The info here is used only for comparing the configured state with the actual interfaces on the node.
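For readers without the full diff, a status check of this kind might look roughly like the sketch below. It is not the PR's do_operator. It assumes the CR exposes `.status.syncStatus` with `Succeeded` as the healthy value, and `check_sync_status` is a hypothetical name.

```shell
# Sketch only: check_sync_status is a hypothetical helper, not the PR's do_operator.
# Assumes .status.syncStatus exists and "Succeeded" means the node is in sync.
check_sync_status () {
    local node=$1
    local status
    status=$(oc get sriovnetworknodestate "$node" -n "$SRIOV_NAMESPACE" --template '{{.status.syncStatus}}' 2>/dev/null)
    if [ "$status" = "Succeeded" ]; then
        echo "$node: sync succeeded"
        return 0
    fi
    # An empty result (for example, the CR does not exist) is reported as unknown.
    echo "$node: sync status is '${status:-unknown}'"
    return 1
}
```

Running the status check first means the per-interface comparison only has to deal with nodes whose configuration has already been applied.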
log_operatorconfig () {
    # Output default operator config in the form "daemonNodeSelector enableInjector enableOperatorWebhook logLevel"
    oc get sriovoperatorconfig default -n $SRIOV_NAMESPACE --template '{{.spec.enableInjector}} {{.spec.enableOperatorWebhook}} {{.spec.logLevel}}' > $logdir/meta/operatorconfig
The configDaemonNodeSelector is missing.
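One way to include the selector is sketched below. Because the selector is a map and may render with spaces (for example `map[key:value other:value]`), this sketch places it last so that a `read` can capture the remainder of the line. That ordering differs from the order in the code comment and is an assumption of this sketch. The names `log_operatorconfig_with_selector` and `show_operatorconfig` are hypothetical.

```shell
# Sketch only: both function names are hypothetical.
# Output form: "enableInjector enableOperatorWebhook logLevel configDaemonNodeSelector"
# The selector goes last because its rendered map form may contain spaces.
log_operatorconfig_with_selector () {
    oc get sriovoperatorconfig default -n "$SRIOV_NAMESPACE" --template '{{.spec.enableInjector}} {{.spec.enableOperatorWebhook}} {{.spec.logLevel}} {{.spec.configDaemonNodeSelector}}{{"\n"}}' > "$logdir/meta/operatorconfig"
}

# Reads the file back; the last variable receives the rest of the line.
show_operatorconfig () {
    local injector webhook loglevel selector
    read -r injector webhook loglevel selector < "$logdir/meta/operatorconfig"
    echo "enableInjector=$injector enableOperatorWebhook=$webhook logLevel=$loglevel configDaemonNodeSelector=$selector"
}
```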
    fi
    done < $logdir/meta/state
}
Maybe we should also collect the kernel cmdline of the nodes.
done
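The updated diff is not shown in this thread, so the following is only a sketch of how the kernel cmdline could be collected per node, assuming `oc debug node/<name>` is usable on the cluster. The function name `log_cmdline` and the `cmdline` file are hypothetical.

```shell
# Sketch only: log_cmdline and the cmdline file name are hypothetical.
# Writes one "nodename <kernel cmdline>" line per node listed in nodeinfo.
log_cmdline () {
    local node ip cmdline
    : > "$logdir/meta/cmdline"
    while read -r node ip; do
        # stdin is redirected from /dev/null so the command cannot swallow
        # the remaining lines of nodeinfo that feed this loop.
        cmdline=$(oc debug "node/$node" -- chroot /host cat /proc/cmdline < /dev/null 2>/dev/null)
        echo "$node $cmdline" >> "$logdir/meta/cmdline"
    done < "$logdir/meta/nodeinfo"
}
```

The resulting file can then be searched for parameters such as `intel_iommu=on` when checking whether a node is prepared for SR-IOV.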
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: rcarrillocruz, zshi-redhat
/retest
Please review the full test history for this PR and help us cut down flakes.
No description provided.