
LXC and AppArmor -- a lost cause? #10166

Closed
mbiebl opened this issue Sep 24, 2018 · 19 comments

@mbiebl
Contributor

mbiebl commented Sep 24, 2018

All the recent back and forth trying to get the test suite to pass in an AA-confined LXC container makes me wonder if this is worth the trouble.
Should I/we give up on this idea and declare this setup unsupported?
Especially given the feedback from @brauner and @stgraber, which indicates that Ubuntu itself disables AA when running its own CI on LXC.
I'm a bit at a loss at the moment about what to do here. Should I continue to file bug reports when I run into failures using autopkgtest with the LXC backend and AA enabled?

@poettering
Member

Well, for all the failures between AA, LXC, and systemd, I am pretty sure some need to be fixed in AA (or its policy), others in LXC, and others still in systemd. If there are good reasons, I'm all for fixing the latter here upstream. That said, I don't run Ubuntu/LXC/AA myself, hence I am not going to be the biggest help, but I can certainly suggest fixes and such.

So, it's really up to you whether you want to spend the time on it; all I can offer is my technical input and maybe a patch or two for systemd once things have been sufficiently tracked down.

@brauner
Contributor

brauner commented Oct 1, 2018

Same here. I'm more than willing to help out.

@mbiebl
Contributor Author

mbiebl commented Oct 5, 2018

See #9700 (comment)
and #10011

With git master, running the test suite under LXC+AA triggers a huge number of failures.

@brauner
Contributor

brauner commented Oct 7, 2018

Two problems seem to have been identified so far:

  • changing mount propagation aka remounting certain paths
  • using DynamicUser feature in combination with containers

Both seem to be addressed by various commits. Have you tested with systemd git master, and how does the test suite fare?
If you haven't, please test. Additionally, please report back clear error messages so that we can zoom in on any additional issues.
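(For anyone following along, a rough way to reproduce this locally would be something like the sketch below; it assumes an autopkgtest LXC container has been built with autopkgtest-build-lxc and that AA confinement is left enabled, and the container name is purely illustrative:

$ autopkgtest-build-lxc debian sid amd64
$ autopkgtest systemd -- lxc autopkgtest-sid-amd64

)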

@mbiebl
Contributor Author

mbiebl commented Oct 7, 2018

@brauner the problem actually exists only in git master, not in v239. Here are logs from v239-1142-gad1bf59c6:
log.confined.txt -- AA enabled

log.unconfined.txt -- AA turned off in the LXC container via lxc.aa_profile = unconfined
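(For reference, that setting lives in the container's config file. A minimal sketch, assuming a classic LXC container under /var/lib/lxc; note that LXC 3.x renamed the key to lxc.apparmor.profile:

# /var/lib/lxc/<name>/config
lxc.aa_profile = unconfined

)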

@brauner
Contributor

brauner commented Oct 7, 2018 via email

@mbiebl
Contributor Author

mbiebl commented Oct 7, 2018

@brauner sure, no problem.
Attached is the dmesg output of a test-suite run:
dmesg.txt

I made a second run with auditd installed, which produced a more verbose log:
audit.log.txt

@brauner
Contributor

brauner commented Oct 8, 2018

So, for networkd there's the

type=AVC msg=audit(1538948870.928:1533): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default-cgns" name="/" pid=11956 comm="(networkd)" flags="rw, rslave"

problem again. So that needs to be allowed in the profile.
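(For reference, the denied operation is systemd remounting / with recursive-slave propagation, MS_REC|MS_SLAVE, which shows up as "rw, rslave" above. Allowing it would mean adding a mount rule to the container profile, roughly like this sketch; the rule syntax follows LXC's own profile abstractions and is not the actual Ubuntu/Debian profile:

# inside the lxc-container-default-cgns profile body:
# allow changing mount propagation to (recursive) slave anywhere
mount options=(rw, make-rslave) -> **,

)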

localed-locale seems odd

FAIL: 'System Locale:' not found in:

Hm...
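(Presumably the localed-locale test greps localectl output for that line, so a quick manual check inside the container would be something like:

$ localectl | grep 'System Locale:'

If localed fails to start under the profile, that line never shows up.)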
@xnox, whom can @mbiebl talk to if he wants to find out how the autopkgtest suite is run on Ubuntu? Do you know what our AppArmor profile looks like for this?

@xnox
Member

xnox commented Oct 10, 2018

In Ubuntu (for the distro builds, not the ubuntu-ci that is visible on GitHub) we run autopkgtests across all 6 architectures in the current development release. All are executed "unconfined" inside OpenStack KVM instances, apart from armhf. The armhf tests are executed inside an LXC container, on top of an arm64 (xenial?!) instance, with some command-line options to make uname report armhf inside the armhf containers.

The code that does the slave setup/management is at https://git.launchpad.net/autopkgtest-cloud

I think you care about setup-adt-lxc.commands and the custom AA profile that is applied to LXC itself there....

https://git.launchpad.net/autopkgtest-cloud/tree/lxc-slave-admin/setup-adt-lxc.commands#n56
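(For the curious: the usual mechanism for the uname trick is the 32-bit personality, e.g. via the linux32 wrapper from util-linux, or LXC's lxc.arch config key; whether autopkgtest-cloud uses exactly one of these is visible in the linked scripts. As a quick illustration,

$ linux32 uname -m

runs uname with the PER_LINUX32 personality, so it reports a 32-bit machine string.)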

@poettering
Member

Let's close this one and instead focus on individual issues, e.g. #9700 and such. There's little actionable in this issue itself.

I figure part of the issue is outside of systemd's own scope anyway and needs to be fixed in AA/LXC on Debian. For everything else: please file individual issues instead.

@mbiebl
Contributor Author

mbiebl commented Oct 24, 2018

I don't plan to run autopkgtest with AA enabled on LXC in the future, nor to file further bug reports for it.

From what I understand, even Ubuntu (which is heavily invested in AA) either turns off AA completely or uses a custom AA profile.
This feels wrong to me: why should tests run in a separate, heavily modified environment compared to what's later run in production? Sure, we can make the tests pass this way, but it doesn't give any guarantees that the executables will actually run later in an AA-confined LXC container.

To me this looks like the combination LXC+AA is unsupported, so spending more time on it doesn't seem useful.

@xnox
Member

xnox commented Oct 26, 2018

@mbiebl some of the tests exercise things that are never executed in the initial namespace as root. And it is correct, for normal operation, to deny running things that can escape a container and affect the host.

Your assessment of Ubuntu CI is incomplete. By default, we run Ubuntu CI in full VMs, launched by OpenStack, with AA fully turned on and enforcing. That's our preferred choice on all architectures. Initially, when we were bringing up architectures, we did not have OpenStack for all of them and unfortunately had to resort to running tests in containers. This was the case for s390x, but it is now fixed. Of the 6 architectures Ubuntu CI is executed on today, only armhf remains in a container, on an arm64 KVM instance. This is because at the moment we still do not have GRUB on EFI working correctly on armhf with our patches. As soon as we have armhf cloud images working in our multi-arch OpenStack deployment, that architecture will switch to full VMs too. The profiles I have linked to are not the default, but the fallback we use in case of LXD/LXC confinement, and only when we must.

There are conflicting goals between what is possible in containers and what the systemd test suite needs to be able to execute, given that PID 1 on the host system typically has access to do more things than containers will ever be allowed to.

Running the test suite in containers is useful, but it doesn't sufficiently test all the things that need to be tested.

I don't understand why, for example, debian-ci does not use the autopkgtest ssh runner to spin up EC2 VMs and execute autopkgtests on them as root, given the EC2 resources available to the Debian project.
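(For reference, such an ssh-runner invocation would look roughly like this; the hostname and login are purely illustrative:

$ autopkgtest systemd -- ssh -H some-ec2-host.example.com -l root

)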

@xnox
Member

xnox commented Oct 26, 2018

Investigating things that should work in containers, and making sure they don't regress in containers, is useful. In Ubuntu there are other Jenkins instances that validate this, e.g. booting default cloud-image containers and checking that they are operational, boot non-degraded, and so on. But that's integration testing, rather than unit testing.

@evverx
Member

evverx commented Oct 26, 2018

By default, we run ubuntu CI in full VMs, launched by openstack, with AA fully turned on and enforcing.

I have never seen Ubuntu CI fail due to the AppArmor profile, which makes me think that either the profile is too relaxed to be meaningful or it is actually relaxed before the tests are run.

@mbiebl
Contributor Author

mbiebl commented Oct 26, 2018

@xnox thanks for the further clarifications regarding Ubuntu CI.
@evverx It's the combination LXC+AA that is problematic, and Ubuntu only runs armhf via LXC (with a modified AA policy), iiuc.

Me filing bug reports from time to time, whenever I happen to run the autopkgtests via LXC+AA, is not sustainable; we'd have to do that automatically for it to be useful.
Before that, we'd have to clarify whether the systemd autopkgtest suite is even supposed to work inside an AA-confined LXC container (or which parts of it are). This bug report was an attempt to clarify that, but tbh, I'm none the wiser.

@evverx
Member

evverx commented Oct 26, 2018

@mbiebl what I was trying to say is that any policy that blocks anything should at some point become problematic, given the pace at which new features are added to systemd and the nature of some of those features. The fact that Ubuntu CI has never failed due to the AppArmor profile (assuming it's enforced) surprises me, to say the least.

@mbiebl
Contributor Author

mbiebl commented Oct 26, 2018

@evverx LXC containers apply different AA policies than QEMU-based VMs do, at least that is my understanding. These files look specific to LXC:

$ find /etc/apparmor.d -name "*lxc*" 
/etc/apparmor.d/libvirt/TEMPLATE.lxc
/etc/apparmor.d/lxc-containers
/etc/apparmor.d/usr.bin.lxc-start
/etc/apparmor.d/lxc
/etc/apparmor.d/lxc/lxc-default-cgns
/etc/apparmor.d/lxc/lxc-default-with-mounting
/etc/apparmor.d/lxc/lxc-default
/etc/apparmor.d/lxc/lxc-default-with-nesting
/etc/apparmor.d/abstractions/lxc
/etc/apparmor.d/abstractions/libvirt-lxc
/etc/apparmor.d/local/usr.bin.lxc-start
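(To check which of these profiles are actually loaded and in which mode, something like the following should do, assuming the AppArmor userspace tools are installed:

$ sudo aa-status | grep -i lxc

)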

Then again, I'm far from an AA or LXC expert, so what I say might be totally bogus.

@mbiebl
Contributor Author

mbiebl commented Oct 27, 2018

The profiles I have linked to are not the default, but the fallback we use in case of LXD/LXC confinement, and only when we must.

@xnox I'm interested in what that means exactly. When and how exactly is the fallback profile used?
Isn't there a risk that you are then testing something that doesn't match the later production environment (in case the software runs in LXC/LXD)?

@xnox
Member

xnox commented Jan 11, 2019

The profiles I have linked to are not the default, but the fallback we use in case of LXD/LXC confinement, and only when we must.

@xnox I'm interested in what that means exactly. When and how exactly is the fallback profile used?
Isn't there a risk that you are then testing something that doesn't match the later production environment (in case the software runs in LXC/LXD)?

@mbiebl

I'm not sure how to word this better. We prefer to run autopkgtests in KVM VMs, as that most closely emulates bare-metal / production environments, and we have that available for all architectures apart from armhf. Eventually we will have it for armhf as well. While KVM VMs were not available, we utilised LXD containers for armhf testing of autopkgtests (for all Ubuntu packages, not just systemd). Based on trial and error, we have relaxed the LXD confinement to get more autopkgtests to pass, given that these LXD containers run on virtualized hosts and are quite restricted in other ways. But making such conscious confinement changes was done with the goal of bringing the armhf LXD-confined testing slightly closer to the KVM one. It is not the goal of armhf-LXD testing to exercise all autopkgtests in a "default LXD" environment.

I understand that the above changes make local reproducibility of armhf LXD test results harder.

But equally, it has never been my personal goal to make all upstream systemd tests work correctly under such a test environment.

There is no "fallback profile": either KVM is used on architectures for which we have OpenStack, or the relaxed LXD profile shown above is used on manually provisioned machines (currently armhf only).

Is the above clearer now?
