intel-gpu-plugin crashes on a host without an Intel GPU (i915 driver), as shown in the following log:
controller-1:~$ kubectl logs intel-gpu-plugin-8sqch -n kube-system
GPU device plugin started
Device scan failed: open /sys/class/drm: no such file or directory
Can't read sysfs folder
main.(*devicePlugin).scan
/go/src/github.com/intel/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin.go:83
main.(*devicePlugin).Scan
/go/src/github.com/intel/intel-device-plugins-for-kubernetes/cmd/gpu_plugin/gpu_plugin.go:69
github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin.(*Manager).Run.func1
/go/src/github.com/intel/intel-device-plugins-for-kubernetes/pkg/deviceplugin/manager.go:96
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1333
Could we handle this error and print a helpful message instead, such as "Failed to scan device. If this is a GPU node, you can check the prerequisites at: https://....."?
Looks like we should at least fail more gracefully.
Out of interest, @SidneyAn, how did you deploy the plugin to the node(s)? That may affect how this should be handled: if it was deployed via a DaemonSet and the failure is not handled carefully, I think the DaemonSet can get stuck in a retry loop.