adds log when gpuManager.start() failed #44727
Hi @x1957. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `@k8s-bot ok to test`. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Possible dup of #44463
SGTM:)
@cmluciano I find that your PR doesn't include what this PR does. Since that PR is a WIP, do you plan to include this? :)
The linked PR does log here. It then retries the GPU initialization process in a goroutine.
/lgtm It's better to surface errors. I agree with @cmluciano. The issue might be caused by parallel driver installation.
@k8s-bot ok to test
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vishh, x1957
Thanks for the patch @x1957
@cmluciano nice work
@k8s-bot kops aws e2e test this
@x1957 check your CLA status, please.
@xiangpengzhao How do I check my CLA status? Thanks
@fejta any thoughts? Do we have some manual way to check the CLA and apply the label?
@emsearcy I believe this may be an after-effect of the CLAbot bug from a couple of weeks ago. Could you please trigger a re-run?
@dankohn our CLA tool doesn't touch the labels: it's only responsible for the cla/linuxfoundation status check result. There is another Kubernetes-specific bot that uses the cla/linuxfoundation status to set labels.
@caniszczyk ???
@k8s-bot pull-kubernetes-federation-e2e-gce test this
Added the label manually. It may have been a GitHub glitch. The label automation is on our end. cc @kubernetes/test-infra-maintainers
Automatic merge from submit-queue
If gpuManager.start() returns an error, nothing is logged.
We were confused when the scheduler did not schedule any pods (requesting GPUs) onto one node.
`kubectl describe node xxx` showed no GPUs on that node because the GPU driver was not working there. gpuManager.start() had failed, but nothing appeared in the log.
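The fix described above amounts to checking the error returned by the GPU manager's start call and logging it rather than discarding it. Below is a minimal, self-contained Go sketch of that pattern. The `gpuManager` interface, `failingGPUManager`, `workingGPUManager`, and `startGPUManager` names are hypothetical stand-ins, not the kubelet's actual types, and the standard library `log` package stands in for whatever logger the kubelet uses.

```go
package main

import (
	"errors"
	"log"
)

// gpuManager is a hypothetical stand-in for the kubelet's GPU manager;
// only the Start method matters for this sketch.
type gpuManager interface {
	Start() error
}

// failingGPUManager simulates a node whose GPU driver is not working.
type failingGPUManager struct{}

func (failingGPUManager) Start() error {
	return errors.New("GPU driver is not available")
}

// workingGPUManager simulates a node whose GPU driver starts cleanly.
type workingGPUManager struct{}

func (workingGPUManager) Start() error { return nil }

// startGPUManager calls Start and, on failure, logs the error through logf
// before returning it, so the cause shows up in the log instead of being
// silently dropped.
func startGPUManager(m gpuManager, logf func(format string, args ...interface{})) error {
	if err := m.Start(); err != nil {
		logf("Failed to start gpuManager: %v", err)
		return err
	}
	return nil
}

func main() {
	// log.Printf matches the logf signature and writes to stderr.
	if err := startGPUManager(failingGPUManager{}, log.Printf); err != nil {
		// The caller decides what to do next; this sketch simply continues.
		log.Println("continuing without GPU support")
	}
}
```

Passing the logging function in as a parameter is only a convenience for this sketch, so the behavior can be checked without capturing log output.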