Unexpected Kubernetes node feature-labeling as sgx-capable when SGX feature is turned off in BIOS and host has no /dev/sgx_* devices #638
Comments
@mythi any thoughts? In principle, this is not a bug in NFD itself, but merely in the rule configuration. Where is that config maintained? However, there might be something worth adding to NFD to make proper detection of SGX easier (possibly without custom hooks or sidecar containers).
@marquiz, I think the rule mentioned above in nfd-worker.conf was added by "Intel Software Guard Extensions (SGX) device plugin for Kubernetes" (https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/deployments/sgx_nfd/nfd-worker.conf), so you are right: NFD itself is not responsible for this behavior.
It looks like I should have addressed this issue to the SGX device plugin maintainers rather than to NFD.
Yeah, I suggest submitting an issue there, too. But we can keep this one open until a solution is found.
By the way, I can't find how to detect the presence of /dev/sgx_* devices with the attributes/methods available in this Doc.
That's correct. It's not possible without using hooks or side-car containers (doing the dev node detection).
I wonder if there is some other reliable way of detecting this apart from looking at the devices directly. @mythi ?? |
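For illustration, the dev-node detection that a hook or side-car would have to do could look roughly like the sketch below. It is not part of NFD: the function names and the feature name `sgx.devnodes` are made up for this example, and it assumes the driver exposes character devices matching `/dev/sgx_*` as described in this issue.

```python
#!/usr/bin/env python3
"""Sketch of an SGX dev-node check (hypothetical helper, not part of NFD)."""
import glob
import os
import stat


def find_sgx_devices(dev_dir="/dev"):
    """Return sorted paths under dev_dir that match sgx_* and are character devices."""
    found = []
    for path in sorted(glob.glob(os.path.join(dev_dir, "sgx_*"))):
        try:
            mode = os.stat(path).st_mode
        except OSError:
            continue  # node vanished or cannot be inspected; skip it
        if stat.S_ISCHR(mode):
            found.append(path)
    return found


def feature_lines(devices):
    """Build output lines; 'sgx.devnodes' is a hypothetical feature name."""
    return ["sgx.devnodes=true"] if devices else []


if __name__ == "__main__":
    # Print one line per detected feature, or nothing when no device nodes exist.
    for line in feature_lines(find_sgx_devices()):
        print(line)
```

Checking for character devices rather than any matching path avoids a false positive from a stray regular file with a similar name.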
You can enumerate cpuid leaf 12h for SGX EPC sections; a non-zero value means the BIOS has set aside memory for SGX. Our |
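To make the EPC enumeration idea concrete, here is a small sketch of how the register values of CPUID leaf 12h sub-leaves (index 2 and up) could be decoded. It assumes the register layout as I understand it from the Intel SDM (low 4 bits of EAX give the section type, with 1 meaning a valid EPC section; base and size are split across EAX/EBX and ECX/EDX). Python cannot execute CPUID itself, so the raw register values are assumed to come from elsewhere, and the function names are hypothetical.

```python
def decode_epc_subleaf(eax, ebx, ecx, edx):
    """Decode one CPUID.(EAX=12H, ECX>=2) sub-leaf into (valid, base, size)."""
    valid = (eax & 0xF) == 1  # type 1 marks a valid EPC section
    # Bits 31:12 come from the first register, bits 51:32 from the second.
    base = (eax & 0xFFFFF000) | ((ebx & 0xFFFFF) << 32)
    size = (ecx & 0xFFFFF000) | ((edx & 0xFFFFF) << 32)
    return valid, base, size


def total_epc_bytes(subleaves):
    """Sum EPC section sizes; enumeration stops at the first invalid sub-leaf."""
    total = 0
    for regs in subleaves:
        valid, _base, size = decode_epc_subleaf(*regs)
        if not valid:
            break
        total += size
    return total


if __name__ == "__main__":
    # Made-up register values: one valid section followed by the terminator.
    sample = [(0x70200001, 0x0, 0x05D80001, 0x0), (0x0, 0x0, 0x0, 0x0)]
    print(total_epc_bytes(sample) > 0)
```

A total of zero would indicate that no memory has been reserved for SGX, which is the condition described in the comment above.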
Hmm, sounds like we'd need to add more capabilities to the cpu source 🧐 |
I found some interesting behaviour on the NFD side after running some tests. I have a server with SGX support enabled by default (meaning the BIOS is set up properly and the kernel driver works). After I started NFD the same way as described by @MustDie95, everything worked fine. Then I disabled SGX in the BIOS and rebooted the system. It looks like an issue to me. Lastly, I did the same thing on another server, which runs the same host OS as the server with SGX support and on which SGX has never been enabled. It looks like NFD works fine if the CPU flags are not changed, but if the CPU flags change, NFD doesn't update the label accordingly.
That's because the kernel also checks the MSR registers for "BIOS enabled" and the cpuid package checks cpuid leafs only. |
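As a rough sketch of the difference, the "BIOS enabled" state lives in the IA32_FEATURE_CONTROL MSR (0x3A) rather than in a cpuid leaf. The bit positions below (bit 0 lock, bit 17 SGX launch control, bit 18 SGX enable) are my reading of the Intel SDM, the "locked and enabled" test is a simplification of what the kernel does, and the helper names are hypothetical. Reading the MSR from user space assumes the `msr` kernel module is loaded and the process runs as root.

```python
import os
import struct

IA32_FEATURE_CONTROL = 0x3A          # MSR address
FEAT_CTL_LOCKED = 1 << 0             # firmware has locked the register
FEAT_CTL_SGX_LC_ENABLED = 1 << 17    # SGX launch control enabled
FEAT_CTL_SGX_ENABLED = 1 << 18       # SGX globally enabled


def sgx_enabled_in_feature_control(msr_value):
    """Simplified check: the MSR is locked and the SGX enable bit is set."""
    locked = bool(msr_value & FEAT_CTL_LOCKED)
    enabled = bool(msr_value & FEAT_CTL_SGX_ENABLED)
    return locked and enabled


def read_msr(register, cpu=0):
    """Read a 64-bit MSR via /dev/cpu/<n>/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, register))[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    try:
        value = read_msr(IA32_FEATURE_CONTROL)
        print(sgx_enabled_in_feature_control(value))
    except OSError as err:
        print(f"cannot read MSR: {err}")
```

A check based only on cpuid flags would miss this bit, which matches the behaviour reported above after SGX was disabled in the BIOS.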
Yeah, nfd's cpuid does not parse |
Fixed by #647 (not sure why Fixes did not work) /close |
@mythi: You can't close an active issue/PR unless you authored it or you are a collaborator. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Closing this now as fixed/implemented. @MustDie95 please report back if you still have concerns /close |
@marquiz: Closing this issue. In response to this:
What happened:
Kubernetes node has feature.node.kubernetes.io/custom-intel.sgx: 'true' even if SGX support was forcibly turned off in BIOS
What you expected to happen:
The Kubernetes node is not feature-labeled with SGX, or the label is removed, when SGX is off in the BIOS and there are no /dev/sgx_* devices on the host (even if the feature is supported by the CPU and OS).
How to reproduce it (as minimally and precisely as possible):
Install a server, set up Kubernetes, and deploy Node Feature Discovery with 'kubectl apply -k https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.22.0'
All nodes are marked with feature.node.kubernetes.io/custom-intel.sgx: 'true' regardless of BIOS settings.
The Intel sgxdeviceplugin-sample also keeps trying to start on this node (because it has a nodeSelector based on this feature) but fails.
Anything else we need to know?:
nfd-worker.conf :
sources:
  custom:
    matchOn:
      cpuId: ["SGX", "SGXLC"]
I think we should add more rules to make sure Intel SGX is really functional on the host, for example by checking for the presence of /dev/sgx_* devices.
Environment: