-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: add TemporaryFileSystem= setting and 'tmpfs' option to ProtectHome= #7908
Conversation
4333915
to
46c682a
Compare
src/core/execute.c
Outdated
@@ -2295,6 +2294,12 @@ static int compile_bind_mounts( | |||
|
|||
assert(h == n); | |||
|
|||
if (!strv_isempty(empty_directories) && !strv_isempty(context->empty_directories)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I admit I haven't spent much time looking at this code, but at a casual glance shouldn't context->empty_directories still be applied when empty_directories is empty at this point? It looks like a non-empty context->empty_directories will just be ignored by this function if empty_directories is empty. Is that intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i think @vcaputo is right
src/core/execute.c
Outdated
@@ -2369,7 +2374,7 @@ static int apply_mount_namespace( | |||
&ns_info, context->read_write_paths, | |||
needs_sandboxing ? context->read_only_paths : NULL, | |||
needs_sandboxing ? context->inaccessible_paths : NULL, | |||
empty_directories, | |||
empty_directories ? : context->empty_directories, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, is this why compile_bind_mounts() doesn't return empty_directories when context->empty_directories is the only one?
That's kind of ugly IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be. But this logic can reduce creating unnecessary copy of context->empty_directories
.
@@ -0,0 +1,22 @@ | |||
[Unit] | |||
Description=Test for BindPaths= and BindReadOnlyPaths= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unimportant nit, but I think this is stale from whatever was used as a template in creating this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks. I will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
I gave a casual look over the code, made some comments, but it generally lgtm. |
3f3ec5c
to
a34df8d
Compare
Also added |
Hmm, what's the difference and benefit over InaccessiblePaths=? |
Hi @poettering, To my understanding, it's impossible to do that with InaccessiblePaths=: It works for read-only access limitations on the file system, but not for a complete black-out, like you have in a chroot, where you can see only specific folders. The main interest: With this option, it will be possible to build easily a new complete view on the filesystem without the need to maintain a separate chroot. For the sysadmins, they can apply the security upgrades on the "host", all daemons will have an updated view of their part of root, not need to upgrade each container, with more or less the same security level of a "complete" container with a separate root. We use intensively systemd features to reduce our number of VMs and containers. I have started this project with bind, I'm discussing on their mailing-list about this: https://lists.isc.org/pipermail/bind-users/2018-January/099437.html Thanks for your remarks. |
I think #2780 is also related. |
What about DAC permissions, ACLs, SELinux labels, mount flags (noexec/nodev/nosuid/ro/rw) of the original directory? Shouldn't they also be reapplied to the empty directory and/or let the admin configure them in the service file? |
Documentation here doesn't mention the permissions of the created tmpfs, i.e. who can write to it. I'm not saying it's wrong to set it to 0755, but that's not the default permission for a tmpfs, so I think the docs should be more explicit somehow. EDIT: ninja'd |
Updated. Mainly adds more documentation. @poettering
This hides any configuration under /etc except for /etc/systemd. Several RFEs request such a function, and we already internally use it for '/private' @topimiettinen Sorry, I do not have enough knowledge about them. What kind of information should be inherit from the original directory? @sourcejedi Initially they are mounted with mode=755, but after that they are remounted as read-only. |
@yuwata I think safest would be all of them, so that the result is not different from taking the original directory and removing its contents. |
@topimiettinen Do you concern them on just EmptyDirectories= setting? Or on BindPaths= ? |
I can see the usecase, but I am not sure the name "EmptyDirectory=" is really that great for this, after all for it to be useful (i.e. to be more than InaccessibleDirectory=) it will not be empty. I mean, the fact that it doesn't have to be entirely empty, and doesn't cancel out any mounts below it is kinda the key feature of it, no? In "systemd-nspawn" we already have the |
@poettering Yeah, sounds good. I will rename them. |
(also I have the suspicion this should take an optional mount option string too, the same way as nspawn's --tmpfs= takes one, separated by a colon) |
TRANSIENT-SETTINGS.md
Outdated
@@ -168,6 +168,7 @@ All execution-related settings are available for transient units. | |||
✓ ReadWritePaths= | |||
✓ ReadOnlyPaths= | |||
✓ InaccessiblePaths= | |||
✓ EmptyDirectories= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the commit's title is wrong btw, it says "ExecDirectories=", but should say EmptyDirectories=
src/core/execute.c
Outdated
@@ -2250,7 +2251,7 @@ static int compile_bind_mounts( | |||
* directory. For that we overmount the usually inaccessible "private" subdirectory with a | |||
* tmpfs that makes it accessible and is empty except for the submounts we do this for. */ | |||
|
|||
private_root = strjoin(params->prefix[t], "/private"); | |||
private_root = strjoin("+", params->prefix[t], "/private"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, why this?
src/core/execute.c
Outdated
@@ -2295,6 +2294,12 @@ static int compile_bind_mounts( | |||
|
|||
assert(h == n); | |||
|
|||
if (!strv_isempty(empty_directories) && !strv_isempty(context->empty_directories)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i think @vcaputo is right
Hi @yuwata, I permitted me to send you a gentle ping in case of you have some time to answer on review questions :-) Have a nice week. |
@GMLudo I was tackling other issues recently. I guess that I can update this not so far future, I hope this weekend :-) Testing, commenting, etc., any feedbacks are of course welcome. Thank you. |
It is not necessary to re-assign error code.
This is used in the later commits.
Each path in `Bind{ReadOnly}Paths=` accept '-' prefix. However, the prefix is completely ignored. This makes it work as expected.
This introduces a new setting TemporaryFileSystem=. This is useful to hide files not relevant to the processes invoked by unit, while necessary files or directories can be still accessed by combining with Bind{,ReadOnly}Paths=.
This make ProtectHome= setting can take 'tmpfs'. This is mostly equivalent to `TemporaryFileSystem=/home /run/user /root`.
@poettering and @sourcejedi Thank you for reviewing this so many times. I've force-pushed a new version. Main changes are
Note that still |
The one CI instance is just another kernel GPF. I'll be interested to see what's stopping |
|
||
# Check /proc/self/mountinfo | ||
ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$6 }\' /proc/self/mountinfo) = "ro,nodev,relatime"' | ||
ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$11 }\' /proc/self/mountinfo) = "ro,mode=700"' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering why test-execute
didn't fail on Fedora
. There is usually seclabel
in the 11th field of /proc/self/mountinfo
, which makes it impossible for ro,mode=700
to pass the check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exec-temporaryfilesystem-options.service: Passing 0 fds to service
exec-temporaryfilesystem-options.service: About to execute: /bin/sh -c 'test $$(awk '$$5 == "/var" { print $$6 }' /proc/self/mountinfo) = "ro,nodev,relatime"'
exec-temporaryfilesystem-options.service: Forked /bin/sh as 10524
Followed symlinks /run/systemd/unit-root/var → /run/systemd/unit-root/var.
Applying namespace mount on /run/systemd/unit-root/var
exec-temporaryfilesystem-options.service: Executing: /bin/sh -c 'test $(awk '$5 == "/var" { print $6 }' /proc/self/mountinfo) = "ro,nodev,relatime"'
Received SIGCHLD from PID 10524 (sh).
Child 10524 (sh) died (code=exited, status=0/SUCCESS)
exec-temporaryfilesystem-options.service: Child 10524 belongs to exec-temporaryfilesystem-options.service.
exec-temporaryfilesystem-options.service: Main process exited, code=exited, status=0/SUCCESS
exec-temporaryfilesystem-options.service: Running next main command for state start.
exec-temporaryfilesystem-options.service: Passing 0 fds to service
exec-temporaryfilesystem-options.service: About to execute: /bin/sh -c 'test $$(awk '$$5 == "/var" { print $$11 }' /proc/self/mountinfo) = "ro,mode=700"'
exec-temporaryfilesystem-options.service: Forked /bin/sh as 10527
Followed symlinks /run/systemd/unit-root/var → /run/systemd/unit-root/var.
Applying namespace mount on /run/systemd/unit-root/var
exec-temporaryfilesystem-options.service: Executing: /bin/sh -c 'test $(awk '$5 == "/var" { print $11 }' /proc/self/mountinfo) = "ro,mode=700"'
Received SIGCHLD from PID 10527 (sh).
Child 10527 (sh) died (code=exited, status=1/FAILURE)
exec-temporaryfilesystem-options.service: Child 10527 belongs to exec-temporaryfilesystem-options.service.
exec-temporaryfilesystem-options.service: Main process exited, code=exited, status=1/FAILURE
exec-temporaryfilesystem-options.service: Failed with result 'exit-code'.
exec-temporaryfilesystem-options.service: Changed start -> failed
exec-temporaryfilesystem-options.service: Unit entered failed state.
PID: 10527
Start Timestamp: Wed 2018-02-21 09:24:08 UTC
Exit Timestamp: Wed 2018-02-21 09:24:08 UTC
Exit Code: exited
Exit Status: 1
Assertion 'service->main_exec_status.status == status_expected' failed at ../src/test/test-execute.c:74, function check(). Aborting.
Aborted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worked for me after changing to
# The mount options default to "mode=0755,nodev,strictatime".
# Let's override some of them, and test the behaviour of "ro".
TemporaryFileSystem=/var:ro,mode=0700,nostrictatime
# Check /proc/self/mountinfo
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$11 !~ /(^|,)ro(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$11 !~ /(^|,)mode=700(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$6 !~ /(^|,)ro(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$6 !~ /(^|,)nodev(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$6 ~ /(^|,)strictatime(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
@yuwata are you able to resolve this soon or should I submit something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. Can I include this in a PR that makes ReadWritePaths=
can work with TemporaryFileSystem=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are welcome to take the above code, I just want to make sure the tests not left broken (on Fedora) for too long. If you think the ReadWritePaths= will be quick to merge then I guess you can include them together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just changed my mind. This is now pushed as #8241. Please take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sourcejedi, @yuwata thank you for looking at this. I've already fixed the test locally, but I hesitated to show it because, well, here it is:
diff --git a/test/test-execute/exec-temporaryfilesystem-options.service b/test/test-execute/exec-temporaryfilesystem-options.service
index 1d5d76c81..d63f66fd2 100644
--- a/test/test-execute/exec-temporaryfilesystem-options.service
+++ b/test/test-execute/exec-temporaryfilesystem-options.service
@@ -5,7 +5,13 @@ Description=Test for TemporaryFileSystem with mount options
Type=oneshot
# Check /proc/self/mountinfo
-ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$6 }\' /proc/self/mountinfo) = "ro,nodev,relatime"'
-ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$11 }\' /proc/self/mountinfo) = "ro,mode=700"'
+ExecStart=/usr/bin/perl -alne 'if ($$F[4] eq "/var") { \
+ my %%a = map { $$_ => 1 } split(",", $$F[10]); \
+ if (!@a{ro,"mode=700"}) {next}; \
+ my %%b = map { $$_ => 1 } split(",", $$F[5]); \
+ if (@b{ro,nodev,relatime}) {$$found=1; exit}; \
+} \
+END {exit($$found ? 0 : 1)}' \
+/proc/self/mountinfo
TemporaryFileSystem=/var:ro,mode=0700,nostrictatime
Don't ask me how and why it works. I don't know :-)
This makes test-execute work on SELinux enabled systems. Fixes the issue reported at systemd#7908 (comment)
This makes test-execute work on SELinux enabled systems. Fixes the issue reported at #7908 (comment)
…stem= This makes Read{Write,Only}Paths= work with TemporaryFileSystem= or ProtectHome=tmpfs. Follow-up for systemd#7908. Closes systemd#7895, systemd#7153, and systemd#2780.
…stem= This makes Read{Write,Only}Paths= work with TemporaryFileSystem= or ProtectHome=tmpfs. Follow-up for systemd#7908. Closes systemd#7895, systemd#7153, and systemd#2780.
…stem= This makes Read{Write,Only}Paths= work with TemporaryFileSystem= or ProtectHome=tmpfs. Follow-up for systemd#7908. Closes systemd#7895, systemd#7153, and systemd#2780.
This introduces a new setting
TemporaryFileSystem=
. The unit process can access the specified directories, but cannot access the files stored in the directories.Also, this adds a new option
tmpfs
toProtectHome=
, which is equivalent toTemporaryFileSystem=/root /run/user /home
.This is an important preparation to support #7895, #7153 and #2780.