Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: add TemporaryFileSystem= setting and 'tmpfs' option to ProtectHome= #7908

Merged
merged 21 commits into from
Feb 21, 2018

Conversation

yuwata
Copy link
Member

@yuwata yuwata commented Jan 17, 2018

This introduces a new setting TemporaryFileSystem=. The unit process can access the specified directories, but cannot access the files stored in the directories.
Also, this adds a new option tmpfs to ProtectHome=, which is equivalent to TemporaryFileSystem=/root /run/user /home.

This is an important preparation to support #7895, #7153 and #2780.

@@ -2295,6 +2294,12 @@ static int compile_bind_mounts(

assert(h == n);

if (!strv_isempty(empty_directories) && !strv_isempty(context->empty_directories)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I haven't spent much time looking at this code, but at a casual glance shouldn't context->empty_directories still be applied when empty_directories is empty at this point? It looks like a non-empty context->empty_directories will just be ignored by this function if empty_directories is empty. Is that intentional?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think @vcaputo is right

@@ -2369,7 +2374,7 @@ static int apply_mount_namespace(
&ns_info, context->read_write_paths,
needs_sandboxing ? context->read_only_paths : NULL,
needs_sandboxing ? context->inaccessible_paths : NULL,
empty_directories,
empty_directories ? : context->empty_directories,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, is this why compile_bind_mounts() doesn't return empty_directories when context->empty_directories is the only one?

That's kind of ugly IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be. But this logic can reduce creating unnecessary copy of context->empty_directories.

@@ -0,0 +1,22 @@
[Unit]
Description=Test for BindPaths= and BindReadOnlyPaths=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unimportant nit, but I think this is stale from whatever was used as a template in creating this...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks. I will update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@vcaputo
Copy link
Member

vcaputo commented Jan 17, 2018

I gave a casual look over the code, made some comments, but it generally lgtm.

@yuwata yuwata force-pushed the rfe-7895 branch 2 times, most recently from 3f3ec5c to a34df8d Compare January 17, 2018 06:59
@yuwata
Copy link
Member Author

yuwata commented Jan 17, 2018

Also added ProtectHome=empty, which is equivalent to EmptyDirectories=/root /run/user /home.

@poettering
Copy link
Member

Hmm, what's the difference and benefit over InaccessiblePaths=?

@ludovic-gasc
Copy link

Hi @poettering,

To my understanding, it's impossible to do that with InaccessiblePaths=: It works for read-only access limitations on the file system, but not for a complete black-out, like you have in a chroot, where you can see only specific folders.
Another example : https://unix.stackexchange.com/questions/305586/how-to-whitelist-directories-for-units-in-systemd/387051#387051
As you can read in the answer: "According to this quote, it doesn't work".

The main interest: With this option, it will be possible to build easily a new complete view on the filesystem without the need to maintain a separate chroot.

For the sysadmins, they can apply the security upgrades on the "host", all daemons will have an updated view of their part of root, not need to upgrade each container, with more or less the same security level of a "complete" container with a separate root.

We use intensively systemd features to reduce our number of VMs and containers.
Our final goal is to enable the maximum of hardening options available in systemd to increase the security of daemons we are using on production, to try to reduce the risk of 0-day attack.

I have started this project with bind, I'm discussing on their mailing-list about this: https://lists.isc.org/pipermail/bind-users/2018-January/099437.html

Thanks for your remarks.

@topimiettinen
Copy link
Contributor

I think #2780 is also related.

@topimiettinen
Copy link
Contributor

What about DAC permissions, ACLs, SELinux labels, mount flags (noexec/nodev/nosuid/ro/rw) of the original directory? Shouldn't they also be reapplied to the empty directory and/or let the admin configure them in the service file?

@sourcejedi
Copy link
Contributor

sourcejedi commented Jan 17, 2018

Documentation here doesn't mention the permissions of the created tmpfs, i.e. who can write to it. I'm not saying it's wrong to set it to 0755, but that's not the default permission for a tmpfs, so I think the docs should be more explicit somehow.

EDIT: ninja'd

@yuwata yuwata changed the title core: introduce EmptyDirectories= option core: add EmptyDirectories= setting and 'empty' option to ProtectHome= Jan 18, 2018
@yuwata
Copy link
Member Author

yuwata commented Jan 18, 2018

Updated. Mainly adds more documentation.

@poettering
Example,

EmptyDirectories=/etc
ReadOnlyPaths=/etc/systemd

This hides any configuration under /etc except for /etc/systemd. Several RFEs request such a function, and we already internally use it for '/private' StateDirectory=. This just expose it as a new setting.

@topimiettinen Sorry, I do not have enough knowledge about them. What kind of information should be inherit from the original directory?

@sourcejedi Initially they are mounted with mode=755, but after that they are remounted as read-only.
I've updated the documents to mention that.

@topimiettinen
Copy link
Contributor

@yuwata I think safest would be all of them, so that the result is not different from taking the original directory and removing its contents.

@yuwata
Copy link
Member Author

yuwata commented Jan 22, 2018

@topimiettinen Do you concern them on just EmptyDirectories= setting? Or on BindPaths= ?

@poettering
Copy link
Member

I can see the usecase, but I am not sure the name "EmptyDirectory=" is really that great for this, after all for it to be useful (i.e. to be more than InaccessibleDirectory=) it will not be empty. I mean, the fact that it doesn't have to be entirely empty, and doesn't cancel out any mounts below it is kinda the key feature of it, no?

In "systemd-nspawn" we already have the --tmpfs= option (which is exposes as TemporaryFileSystem= in .nspawn files), which pretty much is what you are trying to do here too, right? Maybe we should just reuse that name for the same concept here? i.e. introduce this as TemporaryFileSystem= option in unit files?

@yuwata
Copy link
Member Author

yuwata commented Jan 22, 2018

@poettering Yeah, sounds good. I will rename them.

@poettering
Copy link
Member

(also I have the suspicion this should take an optional mount option string too, the same way as nspawn's --tmpfs= takes one, separated by a colon)

@@ -168,6 +168,7 @@ All execution-related settings are available for transient units.
✓ ReadWritePaths=
✓ ReadOnlyPaths=
✓ InaccessiblePaths=
✓ EmptyDirectories=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the commit's title is wrong btw, it says "ExecDirectories=", but should say EmptyDirectories=

@@ -2250,7 +2251,7 @@ static int compile_bind_mounts(
* directory. For that we overmount the usually inaccessible "private" subdirectory with a
* tmpfs that makes it accessible and is empty except for the submounts we do this for. */

private_root = strjoin(params->prefix[t], "/private");
private_root = strjoin("+", params->prefix[t], "/private");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, why this?

@@ -2295,6 +2294,12 @@ static int compile_bind_mounts(

assert(h == n);

if (!strv_isempty(empty_directories) && !strv_isempty(context->empty_directories)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think @vcaputo is right

@poettering poettering added the reviewed/needs-rework 🔨 PR has been reviewed and needs another round of reworks label Jan 22, 2018
@ludovic-gasc
Copy link

Hi @yuwata, I permitted me to send you a gentle ping in case of you have some time to answer on review questions :-)
I'm interested by this feature, but I don't have the technical level to contribute on it :-(
However, I can do some tests with it.

Have a nice week.

@yuwata
Copy link
Member Author

yuwata commented Feb 8, 2018

@GMLudo I was tackling other issues recently. I guess that I can update this not so far future, I hope this weekend :-) Testing, commenting, etc., any feedbacks are of course welcome. Thank you.

It is not necessary to re-assign error code.
This is used in the later commits.
Each path in `Bind{ReadOnly}Paths=` accept '-' prefix. However,
the prefix is completely ignored.
This makes it work as expected.
This introduces a new setting TemporaryFileSystem=. This is useful
to hide files not relevant to the processes invoked by unit, while
necessary files or directories can be still accessed by combining
with Bind{,ReadOnly}Paths=.
This make ProtectHome= setting can take 'tmpfs'. This is mostly
equivalent to `TemporaryFileSystem=/home /run/user /root`.
@yuwata
Copy link
Member Author

yuwata commented Feb 21, 2018

@poettering and @sourcejedi Thank you for reviewing this so many times. I've force-pushed a new version. Main changes are

  • drop bind_remount_recursive_with_mountinfo() for TMPFS and EMPTY_DIR,
  • drop unnecessary !! to cast from int to bool, and
  • man page fixes.

Note that still Read{Write,Only}Paths= cannot use with TemporaryFileSystem=. I have a plan to support this, but I want to do it in another PR.

@sourcejedi
Copy link
Contributor

The one CI instance is just another kernel GPF.

I'll be interested to see what's stopping Read{Write,Only}Paths=, I didn't guess it. You can tag me in to that PR, if you're not fed up reading my comments :). Ok, let's try merging this now.

@sourcejedi sourcejedi merged commit 59e00b2 into systemd:master Feb 21, 2018
@yuwata yuwata deleted the rfe-7895 branch February 21, 2018 08:58

# Check /proc/self/mountinfo
ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$6 }\' /proc/self/mountinfo) = "ro,nodev,relatime"'
ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$11 }\' /proc/self/mountinfo) = "ro,mode=700"'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering why test-execute didn't fail on Fedora. There is usually seclabel in the 11th field of /proc/self/mountinfo, which makes it impossible for ro,mode=700 to pass the check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exec-temporaryfilesystem-options.service: Passing 0 fds to service
exec-temporaryfilesystem-options.service: About to execute: /bin/sh -c 'test $$(awk '$$5 == "/var" { print $$6 }' /proc/self/mountinfo) = "ro,nodev,relatime"'
exec-temporaryfilesystem-options.service: Forked /bin/sh as 10524
Followed symlinks /run/systemd/unit-root/var → /run/systemd/unit-root/var.
Applying namespace mount on /run/systemd/unit-root/var
exec-temporaryfilesystem-options.service: Executing: /bin/sh -c 'test $(awk '$5 == "/var" { print $6 }' /proc/self/mountinfo) = "ro,nodev,relatime"'
Received SIGCHLD from PID 10524 (sh).
Child 10524 (sh) died (code=exited, status=0/SUCCESS)
exec-temporaryfilesystem-options.service: Child 10524 belongs to exec-temporaryfilesystem-options.service.
exec-temporaryfilesystem-options.service: Main process exited, code=exited, status=0/SUCCESS
exec-temporaryfilesystem-options.service: Running next main command for state start.
exec-temporaryfilesystem-options.service: Passing 0 fds to service
exec-temporaryfilesystem-options.service: About to execute: /bin/sh -c 'test $$(awk '$$5 == "/var" { print $$11 }' /proc/self/mountinfo) = "ro,mode=700"'
exec-temporaryfilesystem-options.service: Forked /bin/sh as 10527
Followed symlinks /run/systemd/unit-root/var → /run/systemd/unit-root/var.
Applying namespace mount on /run/systemd/unit-root/var
exec-temporaryfilesystem-options.service: Executing: /bin/sh -c 'test $(awk '$5 == "/var" { print $11 }' /proc/self/mountinfo) = "ro,mode=700"'
Received SIGCHLD from PID 10527 (sh).
Child 10527 (sh) died (code=exited, status=1/FAILURE)
exec-temporaryfilesystem-options.service: Child 10527 belongs to exec-temporaryfilesystem-options.service.
exec-temporaryfilesystem-options.service: Main process exited, code=exited, status=1/FAILURE
exec-temporaryfilesystem-options.service: Failed with result 'exit-code'.
exec-temporaryfilesystem-options.service: Changed start -> failed
exec-temporaryfilesystem-options.service: Unit entered failed state.
	PID: 10527
	Start Timestamp: Wed 2018-02-21 09:24:08 UTC
	Exit Timestamp: Wed 2018-02-21 09:24:08 UTC
	Exit Code: exited
	Exit Status: 1
Assertion 'service->main_exec_status.status == status_expected' failed at ../src/test/test-execute.c:74, function check(). Aborting.
Aborted

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ouch!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worked for me after changing to

# The mount options default to "mode=0755,nodev,strictatime".
# Let's override some of them, and test the behaviour of "ro".
TemporaryFileSystem=/var:ro,mode=0700,nostrictatime

# Check /proc/self/mountinfo
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$11 !~ /(^|,)ro(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$11 !~ /(^|,)mode=700(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'

ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$6 !~ /(^|,)ro(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$6 !~ /(^|,)nodev(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'
ExecStart=/bin/sh -x -c 'test "$$(awk \'$$5 == "/var" && $$6 ~ /(^|,)strictatime(,|$)/ { print $$6 }\' /proc/self/mountinfo)" == ""'

@yuwata are you able to resolve this soon or should I submit something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. Can I include this in a PR that makes ReadWritePaths= can work with TemporaryFileSystem=.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are welcome to take the above code, I just want to make sure the tests not left broken (on Fedora) for too long. If you think the ReadWritePaths= will be quick to merge then I guess you can include them together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed my mind. This is now pushed as #8241. Please take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sourcejedi, @yuwata thank you for looking at this. I've already fixed the test locally, but I hesitated to show it because, well, here it is:

diff --git a/test/test-execute/exec-temporaryfilesystem-options.service b/test/test-execute/exec-temporaryfilesystem-options.service
index 1d5d76c81..d63f66fd2 100644
--- a/test/test-execute/exec-temporaryfilesystem-options.service
+++ b/test/test-execute/exec-temporaryfilesystem-options.service
@@ -5,7 +5,13 @@ Description=Test for TemporaryFileSystem with mount options
 Type=oneshot

 # Check /proc/self/mountinfo
-ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$6 }\' /proc/self/mountinfo) = "ro,nodev,relatime"'
-ExecStart=/bin/sh -c 'test $$(awk \'$$5 == "/var" { print $$11 }\' /proc/self/mountinfo) = "ro,mode=700"'
+ExecStart=/usr/bin/perl -alne 'if ($$F[4] eq "/var") { \
+  my %%a = map { $$_ => 1 } split(",", $$F[10]); \
+  if (!@a{ro,"mode=700"}) {next}; \
+  my %%b = map { $$_ => 1 } split(",", $$F[5]); \
+  if (@b{ro,nodev,relatime}) {$$found=1; exit}; \
+} \
+END {exit($$found ? 0 : 1)}' \
+/proc/self/mountinfo

 TemporaryFileSystem=/var:ro,mode=0700,nostrictatime

Don't ask me how and why it works. I don't know :-)

yuwata pushed a commit to yuwata/systemd that referenced this pull request Feb 21, 2018
This makes test-execute work on SELinux enabled systems.

Fixes the issue reported at
systemd#7908 (comment)
poettering pushed a commit that referenced this pull request Feb 21, 2018
This makes test-execute work on SELinux enabled systems.

Fixes the issue reported at
#7908 (comment)
yuwata added a commit to yuwata/systemd that referenced this pull request Feb 22, 2018
…stem=

This makes Read{Write,Only}Paths= work with TemporaryFileSystem= or
ProtectHome=tmpfs.

Follow-up for systemd#7908.
Closes systemd#7895, systemd#7153, and systemd#2780.
yuwata added a commit to yuwata/systemd that referenced this pull request Feb 22, 2018
…stem=

This makes Read{Write,Only}Paths= work with TemporaryFileSystem= or
ProtectHome=tmpfs.

Follow-up for systemd#7908.
Closes systemd#7895, systemd#7153, and systemd#2780.
yuwata added a commit to yuwata/systemd that referenced this pull request Feb 26, 2018
…stem=

This makes Read{Write,Only}Paths= work with TemporaryFileSystem= or
ProtectHome=tmpfs.

Follow-up for systemd#7908.
Closes systemd#7895, systemd#7153, and systemd#2780.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging this pull request may close these issues.

None yet

7 participants