cgroup v2: Tracking of the allowed devices #7710
Comments
@iholder-redhat, FYI
Hey @vasiliy-ul, thanks for raising this issue and thanks for letting me know about it. This implementation was actually a sort of workaround, since runc does not provide any mechanism for not overriding rules that were defined in the past. In fact, I've already opened an issue for runc about it. IMHO this shouldn't be a KubeVirt solution, but a runc one. I assume that many of runc's users who define cgroup v2 rules wouldn't want past rules to be deleted when defining a new one, so it's wrong for every user to reinvent the same wheel. The right solution is to implement this within runc, IMO. Regarding surviving a crash: I'm not sure this can be guaranteed by a fix merged into runc, so we might end up needing to also back the rules up in a file. Maybe runc can provide such functionality? Not sure. WDYT?
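For context, a rough sketch of my understanding of why updates override each other (this is not runc internals, just the shape of the OCI runtime config, using the runtime-spec Go types; the device numbers are only examples): device cgroup rules are carried as one complete list, so each update effectively replaces the previously applied set.

```go
package main

import (
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// Example device: char device 10:200 (commonly /dev/net/tun).
	major, minor := int64(10), int64(200)
	rules := []specs.LinuxDeviceCgroup{
		// The usual pattern: deny everything, then allow specific devices.
		{Allow: false, Access: "rwm"},
		{Allow: true, Type: "c", Major: &major, Minor: &minor, Access: "rwm"},
	}
	// An update carries the full list, so rules allowed by an earlier
	// update disappear unless they are repeated here.
	fmt.Printf("%+v\n", rules)
}
```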
Yeah, I remember that you opened that issue while working on the implementation PR. Indeed, it would be nice to have this fixed somewhere outside KubeVirt. However, I think that even with a workaround in place we need to ensure it is stable and robust. I haven't tested that yet, but consider the following scenario: […]
Anyway, what definitely needs to be fixed is that stale data from non-existent VMs should be removed. Otherwise the map will keep growing, consuming more and more memory. It will effectively be a leak (even though there is still a reference to that map).
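A minimal sketch of the pruning being suggested here (all names are hypothetical, not the actual KubeVirt code): periodically drop entries whose VM no longer exists.

```go
package cgroup

// pruneStaleRules is a hypothetical helper: it removes tracked device
// rules for VMs that no longer exist, so the in-memory map does not
// grow without bound.
func pruneStaleRules(trackedRules map[string][]string, liveVMs map[string]struct{}) {
	for vmUID := range trackedRules {
		if _, alive := liveVMs[vmUID]; !alive {
			delete(trackedRules, vmUID)
		}
	}
}
```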
Your concerns are valid and important. Regarding the scenario you've listed: it should be tested; I'm not entirely sure that we don't re-allow rules in case of a crash. Regarding the size of the map: I agree, we should prune data that is no longer relevant.
/triage-accepted |
Just a small update: I tested the scenario mentioned in #7710 (comment). After killing […]. However, when exec'ing into the […]. So this is similar to what was discussed in the cgroupv2 PR regarding […]
Very interesting, thanks for sharing that. I think that eventually you're right: we would have to back this up in a file to avoid problems after crashes. I think that runc's part in all of this is to have some kind of a […]
Unfortunately, a file is not a perfect solution either. There are issues that need to be handled, like concurrent reads/writes and timely cleanup... Also, we cannot apply a device rule and write to a file atomically. It's good that we identified this problem, but I would think more about how to properly solve it. At least this GitHub issue should serve as a reminder and placeholder for ideas.
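To illustrate the atomicity gap: the file write itself can be made crash-safe with the usual temp-file-plus-rename pattern (a sketch under that assumption, with a hypothetical helper name), but there is still no way to make the cgroup update and the file update a single transaction.

```go
package main

import (
	"os"
	"path/filepath"
)

// writeStateAtomically replaces the state file in one step: write to a
// temp file, flush it, then rename it over the target. Rename is atomic
// on the same filesystem, so readers never observe a half-written file.
// Note: this does NOT make "apply cgroup rule + persist state" atomic;
// a crash between the two steps still leaves them out of sync.
func writeStateAtomically(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".devices-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```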
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale

/remove-lifecycle stale

Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale

Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten

/remove-lifecycle rotten

Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale

/remove-lifecycle stale

This is still very relevant, and unfortunately no good solution has been found yet.

Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale

Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /lifecycle rotten

/remove-lifecycle stale
What happened:
Cgroup v2 does not expose any information about the currently allowed devices for a container. The implementation in KubeVirt tracks this internally by using a global var:
kubevirt/pkg/virt-handler/cgroup/cgroup_v2_manager.go, line 10 (at 86f6b7d)
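Conceptually, the tracking looks something like this (an illustrative sketch with hypothetical names, not the verbatim source at the line referenced above):

```go
package cgroup

// Illustrative only: an in-memory map from a VM/process identifier to
// the device rules that have been allowed so far. Because cgroup v2
// cannot be queried for the currently allowed devices, this map is the
// only record of them, and it vanishes when virt-handler restarts.
var allowedDeviceRules = make(map[string][]string)
```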
This potentially leads to the following problems:
- A restart or crash of the virt-handler pod: the state will be lost

What you expected to happen:
One solution might be to store the state in a file, e.g. /var/run/kubevirt-private/<vm-uuid>/devices.list. The file needs to be updated each time a device is added or removed. There is also a need to remove the file when the corresponding VM is destroyed (or just to clean it up periodically). The file can follow the same data format as devices.list on cgroup v1 hosts (thus the same code can be used to parse the current state for both v1 and v2); see the sketch below.

The approach with the file, however, brings the problem of doing a transaction, i.e. applying the actual device rules and writing the state to the file atomically.
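For illustration, a minimal sketch of reading that v1-style format (the type and helper names are hypothetical, not existing KubeVirt code): each line is `<type> <major>:<minor> <access>`, e.g. `c 10:200 rwm`.

```go
package main

import (
	"fmt"
	"strings"
)

// DeviceRule mirrors one line of a cgroup v1 devices.list file,
// e.g. "c 10:200 rwm" (type, major:minor, access flags).
type DeviceRule struct {
	Type   string // "a", "b" or "c"
	Nodes  string // "major:minor", "*" wildcards allowed
	Access string // subset of "rwm"
}

func parseDevicesList(data string) ([]DeviceRule, error) {
	data = strings.TrimSpace(data)
	if data == "" {
		return nil, nil
	}
	var rules []DeviceRule
	for _, line := range strings.Split(data, "\n") {
		fields := strings.Fields(line)
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed rule: %q", line)
		}
		rules = append(rules, DeviceRule{fields[0], fields[1], fields[2]})
	}
	return rules, nil
}

func main() {
	rules, err := parseDevicesList("c 10:200 rwm\nc 1:3 rwm")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", rules)
}
```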
How to reproduce it (as minimally and precisely as possible):
The issue was spotted when looking through the code.
Additional context:
N/A
Environment:
- KubeVirt version (use virtctl version): N/A
- Kubernetes version (use kubectl version): N/A
- Kernel (e.g. uname -a): N/A