add initial sriov debug script #8

Merged
merged 2 commits into openshift:master on Oct 21, 2020

Conversation

zshi-redhat

No description provided.

@zshi-redhat
Contributor Author

/cc @pliurh


log_nodeinfo () {
# Outputs a list of nodes in the form "nodename IP"
oc get nodes --template '{{range .items}}{{$name := .metadata.name}}{{range .status.addresses}}{{if eq .type "InternalIP"}}{{$name}} {{.address}}{{"\n"}}{{end}}{{end}}{{end}}' > $logdir/meta/nodeinfo

Shall we also collect the node labels? That way we can tell whether the SR-IOV pods are running on the selected nodes.

Contributor Author

Yes, I need to think about how to save the node labels in a file so that they can be easily iterated over or consumed.
The current nodeinfo file records one line per node, with the individual fields separated by spaces, so it can be looped over for any later node check.
If we add the node labels, it might be better to create a new file, since one node has multiple labels.
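
A minimal sketch of the separate-file idea, not part of this PR: it writes one "nodename key=value" line per label so the file can be looped over the same way as nodeinfo. The function names `log_nodelabels` and `node_has_label`, the `nodelabels` file name, and the default `logdir` are all hypothetical.

```shell
#!/bin/bash
# Sketch only: logdir default and the nodelabels file name are assumptions.
logdir=${logdir:-/tmp/sriov-debug}

log_nodelabels () {
    mkdir -p "$logdir/meta"
    # Outputs one line per label in the form "nodename key=value"
    oc get nodes --template '{{range .items}}{{$name := .metadata.name}}{{range $k, $v := .metadata.labels}}{{$name}} {{$k}}={{$v}}{{"\n"}}{{end}}{{end}}' > "$logdir/meta/nodelabels"
}

# Succeeds if the given node carries the given label key (any value).
# awk does exact string comparison, so dots in names are not treated as wildcards.
node_has_label () {
    local node=$1 key=$2
    awk -v n="$node" -v k="$key" \
        '$1 == n && index($2, k "=") == 1 { found = 1 } END { exit !found }' \
        "$logdir/meta/nodelabels"
}
```

With this layout a later check could call, for instance, `node_has_label "$node" some.label/key` inside the existing loop over nodeinfo.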

done
done < $logdir/meta/nodeinfo
}

I think we should also collect the net-attach-def objects that are generated by the operator.

Contributor Author

In the current script, I tried to collect only the info that is used directly to analyse the correctness of the sriov configuration.
Right now we don't have a check on the net-attach-def or the network of the sriov interface, so I didn't add net-attach-def collection.
Once we know what to check for net-attach-def, they can be collected.

len=$(oc get sriovnetworknodestate $node -n $SRIOV_NAMESPACE --template '{{len .spec.interfaces}}' 2>/dev/null)
# Outputs a list of node PF devices in the form "nodename interfacename pciaddress linktype numvfs"
for i in $(seq 0 $(($len-1))); do
oc get sriovnetworknodestate $node -n $SRIOV_NAMESPACE --template "{{.metadata.name}} {{(index .spec.interfaces $i).name}} {{(index .spec.interfaces $i).pciAddress}} {{(index .spec.interfaces $i).linkType}} {{(index .spec.interfaces $i).numVfs}}{{\"\n\"}}" >> $logdir/meta/state

We may also need the status of the sriovnetworknodestate CRs.

Contributor Author

The status of sriovnetworknodestate is checked in do_operator, before the per-node checks in do_node.
The info here is used only for comparing the configured state with the actual interfaces on the node.


log_operatorconfig () {
# Output default operator config in the form "daemonNodeSelector enableInjector enableOperatorWebhook logLevel"
oc get sriovoperatorconfig default -n $SRIOV_NAMESPACE --template '{{.spec.enableInjector}} {{.spec.enableOperatorWebhook}} {{.spec.logLevel}}' > $logdir/meta/operatorconfig

The configDaemonNodeSelector is missing.
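
One possible way to address this, as a hedged sketch rather than the change that was merged: append `.spec.configDaemonNodeSelector` as the last field. The selector is a map, and the Go template prints it in `map[key:value ...]` form, which can contain spaces, so placing it last lets `read` absorb the whole remainder into the final variable. The `read_operatorconfig` helper, the default namespace value, and the default `logdir` are assumptions.

```shell
#!/bin/bash
# Sketch only: defaults below are assumptions, not values taken from the PR.
SRIOV_NAMESPACE=${SRIOV_NAMESPACE:-openshift-sriov-network-operator}
logdir=${logdir:-/tmp/sriov-debug}

log_operatorconfig () {
    mkdir -p "$logdir/meta"
    # Output default operator config in the form
    # "enableInjector enableOperatorWebhook logLevel configDaemonNodeSelector".
    # The selector prints as a map and may contain spaces, so it goes last.
    oc get sriovoperatorconfig default -n "$SRIOV_NAMESPACE" \
        --template '{{.spec.enableInjector}} {{.spec.enableOperatorWebhook}} {{.spec.logLevel}} {{.spec.configDaemonNodeSelector}}{{"\n"}}' \
        > "$logdir/meta/operatorconfig"
}

# Hypothetical consumer: read assigns the rest of the line to the last variable,
# so a multi-key selector stays in one piece.
read_operatorconfig () {
    read -r injector webhook loglevel selector < "$logdir/meta/operatorconfig"
}
```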

fi
done < $logdir/meta/state
}

Maybe we should also collect the kernel cmdline of the nodes.

Contributor Author

done
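
For illustration, kernel cmdline collection could look roughly like the sketch below; the merged script may do it differently. It loops over the nodeinfo file and runs `oc debug node/<name>` to read `/proc/cmdline`. The function name `log_cmdline`, the `cmdline` output file, and the default `logdir` are hypothetical.

```shell
#!/bin/bash
# Sketch only: logdir default and the cmdline file name are assumptions.
logdir=${logdir:-/tmp/sriov-debug}

log_cmdline () {
    mkdir -p "$logdir/meta"
    : > "$logdir/meta/cmdline"
    # nodeinfo holds one "nodename IP" line per node
    while read -r node rest; do
        # oc debug starts a debug pod on the node; /host is the node's root filesystem.
        # stdin is redirected so oc cannot swallow the rest of nodeinfo.
        cmdline=$(oc debug "node/$node" -- chroot /host cat /proc/cmdline 2>/dev/null < /dev/null)
        # Outputs one line per node in the form "nodename <kernel cmdline>"
        echo "$node $cmdline" >> "$logdir/meta/cmdline"
    done < "$logdir/meta/nodeinfo"
}
```

A later check could then grep this file for parameters such as `intel_iommu=on` on the nodes that are expected to host VFs.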

@rcarrillocruz

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 21, 2020
@openshift-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rcarrillocruz, zshi-redhat

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [rcarrillocruz,zshi-redhat]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 21, 2020
@openshift-bot

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit cfa9af9 into openshift:master Oct 21, 2020