systemd-journald-audit.socket vs lxd #6519
My recommendation: simply block the socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) call through seccomp in lxd, and make it return -EPERM. That way, the case where audit is off in the kernel and the case where it is blocked in a container are handled exactly the same way. Moreover, systemd will then just work, as it already makes the necessary checks. And lxd would simply be doing what nspawn already does.
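To illustrate why a seccomp-injected EPERM would be harmless, here is a minimal C sketch of the kind of check the comment refers to. A failed socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) whose errno is EAFNOSUPPORT or EPROTONOSUPPORT (no audit in the kernel) or EPERM (blocked, e.g. by a seccomp filter) is treated as "audit unavailable". The helper names are hypothetical and this is not systemd's actual code. Installing the seccomp filter itself (for example with libseccomp) is not shown.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: does this errno from a failed
 * socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) mean "no audit here"?
 * EAFNOSUPPORT / EPROTONOSUPPORT: kernel without netlink audit support.
 * EPERM: the call was denied, e.g. by a seccomp filter in the container. */
static bool audit_errno_means_unavailable(int err) {
    return err == EAFNOSUPPORT || err == EPROTONOSUPPORT || err == EPERM;
}

/* Hypothetical probe: returns false when audit should be treated as
 * unavailable. Unexpected errors return true so that later code, which
 * actually uses the socket, gets to report them. */
static bool audit_socket_available(void) {
    int fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_AUDIT);
    if (fd < 0)
        return !audit_errno_means_unavailable(errno);
    close(fd); /* only probing; the real user opens its own socket */
    return true;
}
```

The point of the design is that the container case and the "audit compiled out" case land in the same branch, so no container-specific detection is needed.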
Ack. Also, the audit/apparmor DENIED message is a red herring, as that is for something else (a wrong sock_type). Audit namespacing is being worked on in the kernel, so the argument against filtering the socket is that it will start working on newer kernels. My other question, from IRC chats with stgraber, was this:
Somehow I would have expected the kernel to return EPERM when opening the socket in the first place, rather than waiting until bind() to return EPERM.
Please keep this issue open for comments for a little while, and maybe tag it as a discussion or similar.
Well, I am pretty sure that audit namespacing will mean using a new API (e.g. CLONE_NEWAUDIT or so), hence it appears to me that nothing is lost if auditing is blocked entirely for now, and only turned back on when it can actually work and the CLONE_NEWAUDIT stuff is used. And even if audit namespacing is piggybacked on some other clone() bit, I still think it's the duty of the container manager to grok that, unmask auditing in that case, and only do so when the kernel can safely support it...
The plan is to piggyback on the user namespace and thus be transparent. But indeed that is speculation until it is all merged and stable. And it has been "worked on" for a while now, so there are no expectations as to when audit namespacing will land.
I wonder if there's an actual workaround for that problem nowadays?
@asbachb simply mask it using
I proposed a proper fix here: that should work even on lxc where sandboxing is not done. |
If a container manager does not follow the guidance in https://systemd.io/CONTAINER_INTERFACE/ regarding audit capabilities, then the current check may not be sufficient to determine that audit will function properly. In particular, when calling bind() on the audit fd, we will get EPERM if running in a user-namespaced container. Expand the check to make an AUDIT_GET_FEATURE request on the audit fd to test if it is working. If this fails with ECONNREFUSED, we know it is because the kernel does not support the use of audit outside of the initial user namespace. Note that the approach of this patch was suggested here: systemd#19443 (comment) Fixes: systemd#6519
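The check described in this commit message can be sketched in C roughly as follows. This is a minimal illustration, not the patch itself: the function names are hypothetical, the 1-second receive timeout is our own assumption, and it relies on the Linux UAPI headers <linux/audit.h> and <linux/netlink.h>. It sends one AUDIT_GET_FEATURE request to the kernel and reads the first reply; an NLMSG_ERROR reply carrying -ECONNREFUSED is the signal that audit cannot be used from this user namespace.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/netlink.h>

/* Interpret the first netlink message in a reply buffer.
 * Returns 0 if the kernel answered normally, the (negative) errno carried
 * by an NLMSG_ERROR message, or -EIO if the buffer cannot be parsed. */
static int audit_parse_reply(const void *buf, size_t len) {
    const struct nlmsghdr *hdr = buf;

    if (len < sizeof(*hdr) || hdr->nlmsg_len < sizeof(*hdr) || hdr->nlmsg_len > len)
        return -EIO;
    if (hdr->nlmsg_type == NLMSG_ERROR) {
        const struct nlmsgerr *e;
        if (hdr->nlmsg_len < NLMSG_LENGTH(sizeof(struct nlmsgerr)))
            return -EIO;
        e = (const struct nlmsgerr *)NLMSG_DATA(hdr);
        return e->error; /* already negative, e.g. -ECONNREFUSED; 0 is a plain ack */
    }
    return 0; /* e.g. an AUDIT_GET_FEATURE reply carrying struct audit_features */
}

/* Hypothetical probe: send one AUDIT_GET_FEATURE request and report the verdict.
 * Returns 0 if audit answered, otherwise a negative errno. */
static int audit_probe_get_feature(void) {
    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK }; /* nl_pid 0 addresses the kernel */
    struct nlmsghdr req = {
        .nlmsg_len = NLMSG_LENGTH(0),
        .nlmsg_type = AUDIT_GET_FEATURE,
        .nlmsg_flags = NLM_F_REQUEST, /* the kernel still acks when the request fails */
    };
    struct timeval tv = { .tv_sec = 1 }; /* assumed timeout so the probe cannot hang */
    long reply[1024];                     /* long-aligned receive buffer */
    ssize_t n;
    int fd, r;

    fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_AUDIT);
    if (fd < 0)
        return -errno;
    (void)setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

    if (sendto(fd, &req, req.nlmsg_len, 0, (struct sockaddr *)&kernel, sizeof(kernel)) < 0) {
        r = -errno;
        close(fd);
        return r;
    }
    n = recv(fd, reply, sizeof(reply), 0);
    r = n < 0 ? -errno : audit_parse_reply(reply, (size_t)n);
    close(fd);
    return r;
}
```

A caller would treat -ECONNREFUSED like a kernel without audit support, and could log other errors (for instance -EPERM, which we would expect when the caller lacks the required capability) separately.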
Submission type
systemd version the issue has been seen with
234
Used distribution
Ubuntu
In case of bug report: Expected behaviour you didn't see
System boots non-degraded
In case of bug report: Unexpected behaviour you saw
System boots degraded; the status of systemd-journald-audit.socket is failed (Result: resources).
In case of bug report: Steps to reproduce the problem
On Ubuntu:
$ lxc launch ubuntu-daily:a degradedboot
$ lxc exec degradedboot bash
$ systemctl status systemd-journald-audit.socket
....
The container in question is an apparmor-protected, unprivileged (user-namespaced) lxd container (systemd-detect-virt reports lxc).
I did the manual check mentioned in #6508 (comment): no errno is set, and fd 4 is opened.
To debug this further, I tweaked the .socket unit to be tied to some other unit rather than ordered before systemd-journald, as otherwise there are no useful logs showing why starting the socket unit failed.
Here are more detailed logs:
Which means we get EPERM upon the bind() call. And on the host I get:
I'm trying to resolve a user-experience issue: the default container comes up degraded. So at the moment I have no preference on how to solve this.
Given the above: should the audit checks try to bind() and watch for EPERM coming from the LSM? Or, for example, should the host LSM (apparmor) rules be tightened to prevent opening a NETLINK_AUDIT socket when binding to it is going to fail anyway? By default, lxd doesn't limit or filter the capabilities that are available in the container.
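One way to read the first question is as a two-step probe that mirrors what the socket unit does: open the NETLINK_AUDIT socket, then bind() it to the audit multicast group, and record which step failed. The sketch below assumes the unit listens on multicast group 1 (AUDIT_NLGRP_READLOG); all names are hypothetical. It deliberately does not try to tell whether an EPERM comes from an LSM or from a capability check, since the caller only needs to know that the step was refused.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/netlink.h>

/* Which step of setting up the audit listener failed (hypothetical names). */
enum probe_stage { PROBE_OK, PROBE_SOCKET_FAILED, PROBE_BIND_FAILED };

/* Open the socket, then bind to AUDIT_NLGRP_READLOG, like a socket unit
 * listening on audit group 1 would. On failure *err_out gets the errno. */
static enum probe_stage audit_listen_probe(int *err_out) {
    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_groups = 1u << (AUDIT_NLGRP_READLOG - 1), /* group 1 -> bit 0 */
    };
    int fd;

    *err_out = 0;
    fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_AUDIT);
    if (fd < 0) {
        *err_out = errno;
        return PROBE_SOCKET_FAILED;
    }
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        *err_out = errno;
        close(fd);
        return PROBE_BIND_FAILED;
    }
    close(fd);
    return PROBE_OK;
}

/* Decision rule: skip the listener when either step was refused or audit is
 * absent; keep it for unexpected errors so they still surface loudly. */
static int audit_listener_wanted(enum probe_stage stage, int err) {
    if (stage == PROBE_OK)
        return 1;
    return !(err == EPERM || err == EAFNOSUPPORT || err == EPROTONOSUPPORT);
}
```

In the unprivileged container described above, one would expect this probe to report PROBE_BIND_FAILED with EPERM, matching the logs.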
Please advise on the best strategy. I'll ping the lxd and apparmor people to read this bug report, even though this is neither the lxd nor the apparmor bug tracker. I suspect this could be fixed in any of the three projects, and all three can point fingers at the others =)
Also, as a side note: if the audit fd was not passed to journald, journald will try to open it itself, but will ignore any failure to do so. So maybe the audit.socket unit could be adjusted to be "non-fatal", so that it does not cause a degraded state if and when bind() fails. Ideally, though, in the scenario above the audit.socket unit should not be started at all, since it is, in a way, known in advance that it will fail in this use case.
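The fallback behaviour described in this side note can be modelled roughly as below: prefer an fd handed over by socket activation, otherwise open and bind one directly, and treat any failure as "run without audit". This is a hypothetical sketch, not journald's code; it assumes audit multicast group 1 (AUDIT_NLGRP_READLOG) and logs to stderr instead of the journal.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/netlink.h>

/* Hypothetical helper: return a usable audit fd, or -1 meaning "continue
 * without audit". passed_fd < 0 means no fd was handed over. */
static int acquire_audit_fd(int passed_fd) {
    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_groups = 1u << (AUDIT_NLGRP_READLOG - 1), /* assumed group 1 */
    };
    int fd;

    if (passed_fd >= 0)
        return passed_fd; /* socket activation already did the work */

    fd = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC | SOCK_NONBLOCK, NETLINK_AUDIT);
    if (fd < 0) {
        /* Not fatal: report and carry on without audit. */
        fprintf(stderr, "audit socket unavailable, continuing without: %s\n", strerror(errno));
        return -1;
    }
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        /* E.g. EPERM in a user-namespaced container; also not fatal. */
        fprintf(stderr, "audit bind refused, continuing without: %s\n", strerror(errno));
        close(fd);
        return -1;
    }
    return fd;
}
```

The design point is that the failure is reported and then ignored, which is the property the side note suggests the socket unit itself should share.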