cgroup v2: Tracking of the allowed devices #7710

Closed
Tracked by #602
vasiliy-ul opened this issue May 9, 2022 · 20 comments · Fixed by #10689

@vasiliy-ul
Contributor

What happened:

Cgroup v2 does not expose any information about the currently allowed devices for a container. The implementation in KubeVirt tracks this internally by using a global var:

var rulesPerPid = make(map[string][]*devices.Rule)

This potentially leads to the following problems:

  • The var will not survive a crash or restart of the virt-handler pod: the state will be lost
  • The state is stored in a dynamic structure (a map) and stale data is never removed: the map keeps growing, consuming more and more memory

What you expected to happen:

One solution might be to store the state in a file, e.g. /var/run/kubevirt-private/<vm-uuid>/devices.list. The file needs to be updated each time a device is added or removed. There is also a need to remove the file when the corresponding VM is destroyed (or to clean up periodically). The file can follow the same data format as devices.list on cgroup v1 hosts, so the same code can be used to parse the current state for both v1 and v2.
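To make the file idea concrete, here is a minimal Go sketch of serializing and parsing rules in the cgroup v1 devices.list line format (`<type> <major>:<minor> <permissions>`, e.g. `b 252:16 rwm`). The package, function names and error handling are illustrative only, not existing KubeVirt code, and wildcard entries are not handled:

```go
package devicestate

import (
	"fmt"
	"os"
	"strconv"
	"strings"

	"github.com/opencontainers/runc/libcontainer/devices"
)

// writeDevicesList persists the allowed rules in the cgroup v1
// devices.list format, e.g. "b 252:16 rwm". Path handling and file
// layout are illustrative; KubeVirt does not currently write such a file.
func writeDevicesList(path string, rules []*devices.Rule) error {
	var b strings.Builder
	for _, r := range rules {
		if !r.Allow {
			continue // devices.list only lists allowed devices
		}
		fmt.Fprintf(&b, "%c %d:%d %s\n", r.Type, r.Major, r.Minor, r.Permissions)
	}
	return os.WriteFile(path, []byte(b.String()), 0o600)
}

// parseDevicesList reads the same format back, so the parsing code
// could be shared between cgroup v1 and v2 hosts. Wildcard entries
// such as "a *:* rwm" are not handled in this sketch.
func parseDevicesList(data string) ([]*devices.Rule, error) {
	var rules []*devices.Rule
	for _, line := range strings.Split(strings.TrimSpace(data), "\n") {
		if line == "" {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) != 3 {
			return nil, fmt.Errorf("malformed devices.list line %q", line)
		}
		majStr, minStr, ok := strings.Cut(fields[1], ":")
		if !ok {
			return nil, fmt.Errorf("malformed major:minor in %q", line)
		}
		major, err := strconv.ParseInt(majStr, 10, 64)
		if err != nil {
			return nil, err
		}
		minor, err := strconv.ParseInt(minStr, 10, 64)
		if err != nil {
			return nil, err
		}
		rules = append(rules, &devices.Rule{
			Type:        devices.Type(fields[0][0]),
			Major:       major,
			Minor:       minor,
			Permissions: devices.Permissions(fields[2]),
			Allow:       true,
		})
	}
	return rules, nil
}
```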

The file-based approach, however, introduces a transaction problem: applying the actual device rules and writing the state to the file need to happen atomically.

How to reproduce it (as minimally and precisely as possible):

The issue was spotted when looking through the code.

Additional context:

N/A

Environment:

  • KubeVirt version (use virtctl version): N/A
  • Kubernetes version (use kubectl version): N/A
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A
@vasiliy-ul
Contributor Author

@iholder-redhat, fyi

@iholder101
Contributor

Hey @vasiliy-ul, thanks for raising this issue and thanks for letting me know about it.

This implementation was actually a sort of workaround, since runc does not provide any mechanism to avoid overriding rules that were defined in the past.

In fact, I've already opened an issue against runc about it. IMHO this shouldn't be a KubeVirt solution, but a runc one. I assume that many of runc's users who define cgroup v2 device rules wouldn't want past rules to be deleted when defining a new one, so it's wrong for every user to reinvent the same wheel. The right solution is to implement this within runc, in my opinion.
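For context on why the workaround exists at all: on cgroup v2 the device controller is programmed as a whole (runc compiles the rule set into an eBPF program), so there is no way to add a single rule without resubmitting everything that should stay allowed. A rough sketch of the merge a caller currently has to perform, with hypothetical names and no wildcard handling:

```go
package devicecgroup

import "github.com/opencontainers/runc/libcontainer/devices"

// mergeRules builds the full rule set that has to be re-applied on a
// cgroup v2 host: everything tracked so far plus the new rules.
// Hypothetical helper; wildcard rules are ignored for simplicity and a
// later rule for the same device wins (e.g. a deny replacing an allow).
func mergeRules(tracked []*devices.Rule, newRules ...*devices.Rule) []*devices.Rule {
	type key struct {
		t            devices.Type
		major, minor int64
	}
	merged := make(map[key]*devices.Rule, len(tracked)+len(newRules))
	for _, r := range tracked {
		merged[key{r.Type, r.Major, r.Minor}] = r
	}
	for _, r := range newRules {
		merged[key{r.Type, r.Major, r.Minor}] = r
	}
	out := make([]*devices.Rule, 0, len(merged))
	for _, r := range merged {
		out = append(out, r)
	}
	return out
}
```

The merged slice is what would then be handed to whatever applies the cgroup configuration, instead of only the new rule; otherwise the previously allowed devices disappear from the regenerated rule set.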

Regarding surviving a crash: I'm not sure this can be guaranteed by a fix merged into runc. So we might end up also needing to back the rules up in a file. Maybe runc can provide such functionality? Not sure.

WDYT?

@vasiliy-ul
Contributor Author

Yeah, I remember that you opened that issue while working on the implementation PR. Indeed, it would be nice to have this fixed somewhere outside KubeVirt. However, I think that even with a workaround in place we need to ensure it is stable and robust. I haven't tested this yet, but consider the following scenario:

  1. We start a VM as usual
  2. Attach some hotplug volume (block storage; so this gets written to that global map)
  3. Then simulate the crash of virt-handler (e.g. by just deleting the pod)
  4. The virt-handler pod will be re-created, but the global map will be freshly initialized and will not contain the previously attached storage device
  5. Then, if we try to attach one more storage volume, what happens with the previously attached one? I think the 'allow' rule will be lost. Though I'm not yet 100% sure whether it will cause errors for an already attached device; this still needs to be verified.

Anyway, what definitely needs to be fixed is that stale data from non-existent VMs should be removed. Otherwise the map will grow in size, consuming more and more memory. It will effectively be a leak (even though there is still a reference to that map).
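A sketch of the kind of pruning this would need, assuming the map key is the PID of the tracked process (as the variable name rulesPerPid suggests) and leaving out the locking the real code would need around the shared map:

```go
package devicecgroup

import (
	"os"
	"path/filepath"

	"github.com/opencontainers/runc/libcontainer/devices"
)

// rulesPerPid mirrors the global tracking map mentioned above.
var rulesPerPid = make(map[string][]*devices.Rule)

// pruneStaleRules drops entries whose process no longer exists, so the
// map cannot grow without bound. It assumes the key is the PID as a
// string; the real key format would need to be checked, and concurrent
// access to the map would need synchronization.
func pruneStaleRules() {
	for pid := range rulesPerPid {
		if _, err := os.Stat(filepath.Join("/proc", pid)); os.IsNotExist(err) {
			delete(rulesPerPid, pid)
		}
	}
}
```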

@iholder101
Contributor

Your concerns are valid and important.

Regarding the scenario you've listed - it should be tested. I'm not entirely sure that we don't re-allow rules in case of a crash.

Regarding the size of the map - I agree, we should prune data that is no longer relevant.

@xpivarc
Member

xpivarc commented May 11, 2022

/triage-accepted

@vasiliy-ul
Contributor Author

vasiliy-ul commented May 12, 2022

Just a small update: I tested the scenario mentioned in #7710 (comment). After killing virt-handler and attaching one more hotplug disk, the allow rule for the first volume gets lost. The device is still accessible from the running VM (since cgroup rules are checked only on open, and the device was actually opened while the allow rule was in place) and it remains operational (mounting, reading, writing, etc.).

However when exec'ing into the virt-launcher pod, the device is not accessible anymore:

$ k exec -ti virt-launcher-vmi-fedora-2bd29 -- bash
bash-4.4# cat /run/kubevirt/hotplug-disks/ceph-block-vol > /dev/null
cat: /run/kubevirt/hotplug-disks/ceph-block-vol: Operation not permitted

So this is similar to what was discussed in the cgroup v2 PR regarding /dev/kvm and friends (i.e. the "default" devices). Even though this situation does not look very good, it's not clear how critical it is and whether it is going to cause trouble. As long as the device stays attached, it also remains accessible to the guest.

@iholder101
Contributor

> Just a small update: I tested the scenario mentioned in #7710 (comment). After killing virt-handler and attaching one more hotplug disk, the allow rule for the first volume gets lost. [...]

Very interesting, thanks for sharing that.

I think that eventually you're right, we would have to back this up in a file to avoid problems after crashes. I think that runc's part in all of this is to provide some kind of GetCurrentRules() method, which we would use to both append new rules and back them up in a file.
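No such method exists in runc today; purely to illustrate the shape of the API being asked for, it could look something like this (name and signature entirely hypothetical):

```go
package devicecgroup

import "github.com/opencontainers/runc/libcontainer/devices"

// CurrentRulesGetter sketches the kind of interface the comment above
// asks runc to provide. Nothing like this exists in runc today; the
// name and signature are hypothetical.
type CurrentRulesGetter interface {
	// GetCurrentRules would return the device rules currently in effect
	// for the managed cgroup, so a caller could append new rules to them
	// and/or persist the snapshot to a state file.
	GetCurrentRules() ([]*devices.Rule, error)
}
```

With something like that, "adding" a rule would become get-current + append + apply, and the same snapshot could be written to the backup file.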

@vasiliy-ul
Contributor Author

vasiliy-ul commented May 12, 2022

Unfortunately, a file is not a perfect solution either. There are issues that need to be handled, like concurrent reads/writes and timely cleanup... also, we cannot apply a device rule and write to a file atomically. It's good that we identified this problem, but I would think more about how to properly solve it. At least this GitHub issue should serve as a reminder and a placeholder for ideas.
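The per-file races could at least be contained with the usual temp-file-plus-rename pattern and an advisory lock, although, as noted above, that still does not make the cgroup update and the file update a single transaction. A hedged sketch with hypothetical paths and names:

```go
package devicestate

import (
	"fmt"
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// saveStateAtomically writes the serialized rules to a temp file and
// renames it over the target, so readers never see a half-written
// file. An flock on a side-car lock file serializes concurrent
// writers. Note that this still does NOT make the cgroup change and
// the file update one transaction: a crash in between leaves them out
// of sync.
func saveStateAtomically(path string, data []byte) error {
	lock, err := os.OpenFile(path+".lock", os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer lock.Close() // closing the lock file also releases the flock

	if err := unix.Flock(int(lock.Fd()), unix.LOCK_EX); err != nil {
		return fmt.Errorf("locking %s: %w", path, err)
	}

	tmp, err := os.CreateTemp(filepath.Dir(path), ".devices.list.*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup on error paths

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```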

@kubevirt-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@stu-gott
Member

/remove-lifecycle stale

@xpivarc
Member

xpivarc commented Nov 18, 2022

/remove-lifecycle stale

@kubevirt-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@iholder101
Contributor

/remove-lifecycle rotten

@iholder101
Contributor

/remove-lifecycle stale

This is still very relevant, and unfortunately no good solution has been found yet.
Now that cgroup v2 is more popular and widespread than before, I would imagine that others have faced the same problem. It would be interesting to see how others coped with it.

@akalenyu
Contributor

akalenyu commented Nov 3, 2023

/remove-lifecycle stale
/remove-lifecycle rotten
