Using in clusters which contains both GPU nodes and non-GPU nodes #9
Sure, it's a problem. But it looks like you also installed …
@RenaudWasTaken when deploying to a node with no GPU, we should wait indefinitely, right? We already have a case for this, but it assumes NVML is present and working: …
The way I expected device plugins to work is to stop when they detect that no devices are available on the node.
Fixed by 9b54e91
@RenaudWasTaken “A Pod Template in a DaemonSet must have a RestartPolicy equal to Always, or be unspecified, which defaults to Always.” https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#pod-template As of now, a node without a GPU will be in a crash loop.
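The constraint quoted above means a DaemonSet pod template cannot opt out of restarts, which is why a plugin that exits on a non-GPU node keeps getting restarted. A minimal manifest sketch (the names and image tag here are illustrative, not taken from this thread):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      # Must be Always or omitted (which defaults to Always);
      # any other value is rejected for DaemonSet pod templates.
      restartPolicy: Always
      containers:
        - name: nvidia-device-plugin-ctr
          image: nvidia/k8s-device-plugin:1.9  # illustrative tag
```

Because `restartPolicy: Always` is forced, "exit when no devices are found" turns into a crash loop on non-GPU nodes rather than a clean stop.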
Looks like we'll have to update the k8s docs.
I just pushed a fix for that :)
@RenaudWasTaken I'm getting …
@idealhack thanks for noticing this mistake.
So crashing is actually the right behavior for the GPU plugin on a non-GPU node, right? Speaking of docs, how about we add a note to the README about using taints to handle this kind of cluster, like https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/#example-use-cases
When using daemon sets in this kind of cluster, non-GPU nodes will complain
It's straightforward to use taints (which could be documented), but how about also handling it in this plugin (i.e. better error handling)?
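Following the taint documentation linked above, one possible setup for a mixed cluster is to label and taint the GPU nodes, then restrict the plugin DaemonSet to them. A hedged sketch (the `hardware=gpu` key and label are illustrative choices, not prescribed by the plugin):

```yaml
# 1. Label and taint the GPU nodes, e.g.:
#      kubectl label nodes <gpu-node> hardware=gpu
#      kubectl taint nodes <gpu-node> hardware=gpu:NoSchedule
# 2. In the device plugin DaemonSet pod template, schedule only on
#    those nodes and tolerate the taint:
spec:
  nodeSelector:
    hardware: gpu
  tolerations:
    - key: hardware
      operator: Equal
      value: gpu
      effect: NoSchedule
```

The `nodeSelector` keeps the DaemonSet off non-GPU nodes entirely (avoiding the crash loop), while the toleration lets it run on the tainted GPU nodes that ordinary workloads now avoid.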