gpu: plugin exit gracefully on host without particular HW device. #230

Closed
SidneyAn opened this issue Nov 4, 2019 · 1 comment

Comments

@SidneyAn

SidneyAn commented Nov 4, 2019

intel-gpu-plugin crashes on a host without an Intel GPU (i915 driver), as shown in the following log.
controller-1:~$ kubectl logs intel-gpu-plugin-8sqch -n kube-system
GPU device plugin started
Device scan failed: open /sys/class/drm: no such file or directory
Can't read sysfs folder
main.(*devicePlugin).scan
/go/src/github.com/intel/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin.go:83
main.(*devicePlugin).Scan
/go/src/github.com/intel/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin.go:69
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*Manager).Run.func1
/go/src/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin/manager.go:96
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1333

Should we handle this error and print a helpful message instead, such as "Failed to scan device. If this is a GPU node, you can check the prerequisites at: https://....."?

@grahamwhaley

Looks like we should at least fail more gracefully.
Out of interest, @SidneyAn, how did you deploy the plugin to the node(s)? That may affect how this should be handled: if the plugin is deployed via a DaemonSet and the failure is not handled carefully, I think the DaemonSet can get stuck in a retry loop.

@rojkov rojkov closed this as completed in 6537e38 Jan 30, 2020
askervin pushed a commit to askervin/intel-device-plugins-for-kubernetes that referenced this issue May 6, 2020
2 participants