interfaces: add microstack-support interface #8926
I didn't review the policy yet. Some quick comments:
- there needs to be a new plug/slot added to a spread test that checks each kind of interface - the test failures usually indicate which
- many of the tests in the interfaces/policy package need changes
Thanks for submitting this. Lots of things to change, but much is just removing things that exist in other interfaces. Please do see various questions inline.
Commented where things needed changing or clarifying and submitted a follow-up commit.
I also added most of the rules needed for LVM and loop device interaction but I need to test a few more things to complete that.
Other basic functionality worked the same way with the updated interface (booting a VM and connecting to it via ssh over the networking layout created for/via OVN).
60402cc to 987b286
Rebased on top of recent snapd changes and added rules for the LVM support. One more thing besides the LVM support that's needed for MicroStack to have an LVM-backed volume backend is running tgtd. This has to do with the way Cinder exposes access to LVM volumes to VMs - it is done via iSCSI. There are multiple reasons why that's the case:
In the absence of a better volume backend, LVM provides a way to test volume creation, which is what is needed to improve MicroStack gate tests, since certain tempest tests require volume-backed instances to be created. I am going to check which rules will be needed for running tgtd and whether the interface will need to be modified further to make that work.
987b286 to 4db6c1a
Added the iSCSI bits in 4db6c1a. Unfortunately, upstream OpenStack does not currently allow one to leverage QEMU's built-in iSCSI support; therefore, MicroStack has to include Using
For reference, when tested with snapd built with this PR applied and a test-time workaround for LP #1892895 applied, I managed to get 0 failures of Refstack tests. https://review.opendev.org/#/c/738242/9
This is to show the applicability of the interface code to the target workload.
Add an interface to enable MicroStack to work in a confined environment.
OpenStack relies on the following components when working with LVM volumes:
* tgtd - a daemon that exposes block devices via iSCSI;
* iscsid - a control plane daemon for the iSCSI initiator data plane implemented in the iscsi_tcp module;
* iscsi_tcp kernel module.
Working with the iSCSI kernel stack requires more privileged access to sysfs so that iscsiadm and iscsid can do their job.
After several tempest test runs, based on the kernel logs, it became apparent that libvirt requires a wider rw access to the hierarchy under /sys/fs/cgroup/*/machine.
4db6c1a to 3584a4b
Major changes:
* Plumbing necessary for strict confinement with the microstack-support interface snapcore/snapd#8926
  * Until the interface is merged, devmode will be used and kernel modules will be loaded via an auxiliary service.
* upgraded OpenStack components to Focal (20.04) and OpenStack Ussuri;
* reworked the old patches;
* added the Placement service since it is now separate;
* addressed various build issues due to changes in snapcraft and build dependencies:
  * e.g. libvirt requires the build directory to be separate from the source directory, and LP: #1882255;
  * LP: #1882535 and pypa/pip#8414
  * LP: #1882839
  * LP: #1885294
  * https://storyboard.openstack.org/#!/story/2007806
  * LP: #1864589
  * LP: #1777121
  * LP: #1881590
* ML2/OVS replaced with ML2/OVN;
  * dnsmasq is not used anymore;
  * neutron l3 and DHCP agents are not used anymore;
  * Linux network namespaces are only used for neutron-ovn-metadata-agent.
  * ML2 DNS support is done via native OVN mechanisms;
  * OVN-related database services (southbound and northbound dbs);
  * OVN-related control plane services (ovn-controller, ovn-northd);
* core20 base support (bionic hosts are supported);
* the removal procedure now relies on the "remove" hook since `snap remove` cannot be used from the confined environment anymore;
* prerequisites to enabling AppArmor confinement for QEMU processes created by the confined libvirtd.
* Added the Spice html5 console proxy service to enable clients to retrieve and use it via `microstack.openstack console url show --spice <servername>`.
* Added missing Cinder templates and DB migrations for the Cinder DB.
* Added experimental support for a loop device-based LVM backend for Cinder. Due to LP: #1892895 this is not recommended to be used in production except for tempest testing with an applied workaround;
  * includes iscsid and iscsi-tcp kernel module loading;
  * includes LIO and loading of relevant kernel modules;
  * An LVM PV is created on top of a loop device with a backing file present in $SNAP_COMMON/cinder-lvm.img;
  * A VG is created on top of the PV;
  * LVs are created by Cinder and exported via LIO over iSCSI to iscsid, which hot-plugs new SCSI devices. Those SCSI devices are then propagated by Nova to libvirt and QEMU during volume attachment;
* Added post-deployment testing via rally and tempest (via the microstack-test snap). A set of tests included in Refstack 2018.02 is executed (except for object storage tests due to the lack of object storage support).

Change-Id: Ic70770095860a57d5e0a55a8a9451f9db6be7448
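The loop-device LVM layout described in the commit message above can be sketched as a short command plan. This is only an illustration and not MicroStack's actual setup code: the function name `lvm_loop_plan`, the 10G size, the VG name, the loop device path and the expanded `$SNAP_COMMON` path are assumptions made for the example. Only the `cinder-lvm.img` backing file name comes from the text. The function builds argument lists and executes nothing.

```python
from typing import List


def lvm_loop_plan(backing_file: str, loop_dev: str, vg_name: str,
                  size: str = "10G") -> List[List[str]]:
    """Return the commands that would build a VG on a loop-backed PV.

    Nothing is executed here; a caller would run each command as root.
    All argument values are hypothetical and chosen by the caller.
    """
    return [
        # Create a sparse backing file of the requested size.
        ["truncate", "--size", size, backing_file],
        # Attach the file to the first free loop device and print its path.
        ["losetup", "--find", "--show", backing_file],
        # Initialise the loop device as an LVM physical volume.
        ["pvcreate", loop_dev],
        # Create the volume group on top of that physical volume.
        ["vgcreate", vg_name, loop_dev],
    ]


if __name__ == "__main__":
    # Assumed expansion of $SNAP_COMMON for a snap named "microstack".
    plan = lvm_loop_plan("/var/snap/microstack/common/cinder-lvm.img",
                         "/dev/loop0", "microstack-example-vg")
    for argv in plan:
        print(" ".join(argv))
```

In a real run the loop device path would be taken from the output of `losetup --find --show` rather than passed in up front. Cinder would then create LVs inside the VG on its own.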
I've written some comments inline.
Hi again :-) I've been thinking a bit about this interface, and it seems to me that, no matter how carefully we write it, we are always leaving the door open for a process running inside the snap to break out of the confinement, if we allow libvirt to create apparmor profiles and run new processes with the newly created profiles. That's because the process running libvirt could be compromised, or there might be bugs that make it so that the generated apparmor profiles are way more permissive than what we'd like them to be. This would not be an issue if we could rely on the So, I wonder if it wouldn't be better, until we have the
NoNewPrivileges interferes with profile transitions by snap-confine which is why we do not set it there. There has been some work in this area upstream but it is not yet available everywhere. As for setting it after snap-confine has transitioned, because of the interactions between historic apparmor and NNP, this can alter the behavior of the applications that try to use NNP themselves, etc. We've felt that the combination of the existing sandbox technologies (apparmor, seccomp, etc) achieved many of the same goals as NNP but with added flexibility for (privileged) interfaces that need it.
As for the observation about libvirt, you are correct. If an application can load apparmor policy and transition into it, it can trivially break out of confinement (note, libvirt (and docker, containerd, lxd, etc, etc) have many more ways to break out (consider mounting a raw disk that would allow writes that changed policy, apparmor_parser/snapd itself, etc)), so how we've decided to handle that is to call these 'superprivileged' interfaces, where the interfaces allow the application to do its job but we disallow installation/connection/etc through snapd interface policy in the snap declarations. In this manner, we might grant microk8s-support to Canonical since it is the upstream for microk8s, but disallow it to everyone else. We then document that the interface is advisory and grants ownership to the system.
In general, IMHO approaching interfaces that require lots of privilege can be done in phases. The first phase is like I describe above (lenient policy that lets them get the job done, with some guardrails (eg, let them only transition to policy with a certain naming convention, etc)). As technologies mature, later phases could further restrict (where the interface could have attributes that allow choosing the 'flavor' of the interface policy, and these attributes are mediatable in the snap declaration) and take advantage of improvements to sandboxing technologies.
It's also possible to make adjustments to the internal components of the snap to understand snap sandboxing (eg, in the case of libvirt, perhaps there is a snap helper that libvirtd could call out to do policy writes/transitions or libvirt is modified to have reduced functionality when running in a snap sandbox (ie, it can't or can only attach a subset of devices, etc)).
CC @alexmurray for awareness
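The "superprivileged" idea in the comment above, where an interface is denied to everyone unless a specific snap is explicitly granted it, can be illustrated with a toy model. This is a sketch under stated assumptions and not how snapd implements it: snapd expresses this policy through declaration assertions, whereas the `InterfacePolicy` class, its fields and the snap names below are hypothetical and exist only to show the deny-by-default shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfacePolicy:
    """Toy deny-by-default policy for a superprivileged interface.

    Hypothetical model: real snapd policy lives in declaration
    assertions, not in a Python object like this one.
    """
    interface: str
    granted_snaps: frozenset = frozenset()

    def may_connect(self, snap_name: str) -> bool:
        # Only snaps that were explicitly granted the interface may use it.
        return snap_name in self.granted_snaps


if __name__ == "__main__":
    policy = InterfacePolicy("microstack-support",
                             granted_snaps=frozenset({"microstack"}))
    for snap in ("microstack", "some-other-snap"):
        verdict = "allowed" if policy.may_connect(snap) else "denied"
        print(f"{snap}: {verdict}")
```

The point of the design is that the sandbox rules themselves stay lenient enough for the application to work, while the decision about who may hold the interface is made outside the snap.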
This is a high-level remark, but it seems some of the accesses here like LVM or iscsi (maybe) and maybe libvirt (not sure, Jamie made that remark a while ago) could be extracted into separate -control or -support interfaces. This makes sense only if the sets are cohesive and not too specific to what microstack does vs what a generic consumer of those (kernel) interfaces would do.
There is something going in that direction in #9585
Sorry about the long delay in response. Answered some in-line comments in the review.
* refactoring;
* added Delegate=true since libvirt manages its own cgroup subtree.
I just did another functional test run (Tempest, refstack subset) with locally built snapd that had All volume tests failed due to https://bugs.launchpad.net/snapd/+bug/1892895 which is not something I can fix in this interface (see the details in the bug, including comment #4). When I manually allowed the necessary devices via wildcard rules to avoid the race condition like this
I was able to get a full pass: https://paste.ubuntu.com/p/TB6VtxThSz/
In summary: I do not see any new rules that need to be added to this interface in order for us to get our functional tests to pass (except for volume tests for experimental/optional volume functionality).
The snapd team requested a less permissive rule to be used for block device access. While MicroStack considers volume support experimental due to https://bugs.launchpad.net/snapd/+bug/1892895, it seems acceptable to limit the set of VG names to the ones prefixed with "microstack-" to avoid blocking the whole review on this. Lifting this naming restriction will be considered for future versions of this interface.
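To make the naming restriction concrete, here is a small sketch of how a VG name relates to the device-mapper name that LVM creates for a logical volume (hyphens inside VG and LV names are doubled, and the two parts are joined by a single hyphen). The helper names and the example VG/LV names are assumptions for illustration; the interface itself enforces the restriction through AppArmor policy, which is not reproduced here.

```python
VG_PREFIX = "microstack-"  # prefix taken from the commit message above


def vg_allowed(vg_name: str) -> bool:
    """Return True if the VG name carries the required prefix."""
    return vg_name.startswith(VG_PREFIX)


def dm_name(vg_name: str, lv_name: str) -> str:
    """Return the device-mapper name LVM uses for vg_name/lv_name.

    LVM doubles every hyphen inside the VG and LV names so that the
    single hyphen joining them stays unambiguous.
    """
    return f"{vg_name.replace('-', '--')}-{lv_name.replace('-', '--')}"


if __name__ == "__main__":
    # Hypothetical names, chosen only for the example.
    for vg in ("microstack-cinder", "ubuntu-vg"):
        print(vg, vg_allowed(vg), "/dev/mapper/" + dm_name(vg, "volume-1"))
```

Because of the hyphen doubling, a path-based rule keyed on the prefix has to account for the escaped form of the name (for example `microstack--...` under `/dev/mapper`).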
Looks fine to me, just minor comments. Given that the interface is relatively trivial in logic (it's complex as far as AppArmor rules are concerned, but it's a very typical interface as far as snapd is concerned) I wouldn't insist on a spread test, but let's see what other people think.
Excellent! Let's wait, though, for the security team review (as well as for some more expert team-mates of mine). :-)
Overall this looks pretty solid - the only glaring bit which stood out was the iptables related change to the network-control interface - this doesn't seem right to me, but otherwise this seems fine as is (whilst I like the idea of a libvirt-support or similar interface which could encapsulate some of this, I don't think it is fair to block this PR on waiting for that, especially given we don't seem to have other users for such an interface at the moment).
I did another functional test run after ebc2f4f (without volume testing since those rules were not changed). Results: https://paste.ubuntu.com/p/rPtZFv6DVg/. Unit tests are passing and most of the spread tests too (the failures are not related to this change).
lgtm, one question about non-systemd layouts and a test suggestion
Codecov Report
@@ Coverage Diff @@
## master #8926 +/- ##
==========================================
- Coverage 80.71% 78.32% -2.39%
==========================================
Files 727 883 +156
Lines 58158 99322 +41164
==========================================
+ Hits 46940 77797 +30857
- Misses 7544 16634 +9090
- Partials 3674 4891 +1217
Hey @jdstrand, any chance you could approve this or dismiss the change request? I think what was requested originally has been addressed but the change request is still active. Thanks in advance!
looks alright