New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stage1/init: create a new mount ns for each app #2603

Merged
merged 3 commits into from May 12, 2016

Conversation

Projects
None yet
3 participants
@iaguis
Member

iaguis commented May 10, 2016

Up to this point, you could escape the app's chroot easily by using a
simple program downloaded from the internet [1](http://www.unixwiz.net/techtips/chroot-practices.html). To avoid this, we now
create a new mount namespace per each app.

You can still escape the chroot if you have CAP_SYS_PTRACE and access
/proc/1/root but this fixes the issue for images that don't need this
capability.

@jonboulle

This comment has been minimized.

Show comment
Hide comment
@jonboulle

jonboulle May 11, 2016

Contributor

Awesome.
How confident are we that this won't have any impact?
Do we have a test for multiple apps in a pod w/external volumes mounted for example?

Contributor

jonboulle commented May 11, 2016

Awesome.
How confident are we that this won't have any impact?
Do we have a test for multiple apps in a pod w/external volumes mounted for example?

@iaguis

This comment has been minimized.

Show comment
Hide comment
@iaguis

iaguis May 11, 2016

Member

How confident are we that this won't have any impact?

As confident as we trust our functional tests...

Do we have a test for multiple apps in a pod w/external volumes mounted for example?

I just tried it and it works but I don't think we have a test, I'll add one.

Member

iaguis commented May 11, 2016

How confident are we that this won't have any impact?

As confident as we trust our functional tests...

Do we have a test for multiple apps in a pod w/external volumes mounted for example?

I just tried it and it works but I don't think we have a test, I'll add one.

@alban

This comment has been minimized.

Show comment
Hide comment
@alban

alban May 11, 2016

Member

systemd will create a new mount namespace anyway when we start using ReadOnlyDirectories= from #2487. Then, it will be disconnected from stage1 by configuring it as slave even if MountFlags is not set to slave.

It seems that systemd does not have an option to explicitely set MountFlags as slave+shared. If we want slave+shared, it seems we can do that with a combination of ReadOnlyDirectories/ReadWriteDirectories and MountFlags=shared.

Maybe we need both could always use "ReadWriteDirectories=/" (or "ReadOnlyDirectories=/" for #2487) and "MountFlags=shared" to ensure slave+shared.

Member

alban commented May 11, 2016

systemd will create a new mount namespace anyway when we start using ReadOnlyDirectories= from #2487. Then, it will be disconnected from stage1 by configuring it as slave even if MountFlags is not set to slave.

It seems that systemd does not have an option to explicitely set MountFlags as slave+shared. If we want slave+shared, it seems we can do that with a combination of ReadOnlyDirectories/ReadWriteDirectories and MountFlags=shared.

Maybe we need both could always use "ReadWriteDirectories=/" (or "ReadOnlyDirectories=/" for #2487) and "MountFlags=shared" to ensure slave+shared.

@iaguis

This comment has been minimized.

Show comment
Hide comment
@iaguis

iaguis May 11, 2016

Member

Ack.

I'll change my patch to ReadWriteDirectories="/" + MountFlags=shared.

That ensures the mount ns is mounted as slave+shared. Even though the default for MountFlags is shared and we don't need to explicitly set it, let's be explicit and set it anyway.

Using ReadWriteDirectories="/" makes our read-only mounts read-write so that doesn't work.

I'll use MountFlags=shared which also creates a new mount ns and sets the mount as shared+slave.

I'll also add tests that check:

  • External volumes work for pods with multiple apps
  • Apps are in a mount ns that's different from stage1's
  • Apps' mounts are shared+slave of stage1's peer group
Member

iaguis commented May 11, 2016

Ack.

I'll change my patch to ReadWriteDirectories="/" + MountFlags=shared.

That ensures the mount ns is mounted as slave+shared. Even though the default for MountFlags is shared and we don't need to explicitly set it, let's be explicit and set it anyway.

Using ReadWriteDirectories="/" makes our read-only mounts read-write so that doesn't work.

I'll use MountFlags=shared which also creates a new mount ns and sets the mount as shared+slave.

I'll also add tests that check:

  • External volumes work for pods with multiple apps
  • Apps are in a mount ns that's different from stage1's
  • Apps' mounts are shared+slave of stage1's peer group

iaguis added some commits May 9, 2016

stage1/init: create a new mount ns for each app
Up to this point, you could escape the app's chroot easily by using a
simple program downloaded from the internet [[1]][1]. To avoid this, we
now create a new mount namespace per each app.

You can still escape the chroot if you have CAP_SYS_PTRACE and access
`/proc/1/root` but this fixes the issue for images that don't need this
capability.

[1]: http://www.unixwiz.net/techtips/chroot-practices.html
@jonboulle

This comment has been minimized.

Show comment
Hide comment
@jonboulle

jonboulle May 12, 2016

Contributor

still waiting on some tests?

Contributor

jonboulle commented May 12, 2016

still waiting on some tests?

@iaguis

This comment has been minimized.

Show comment
Hide comment
@iaguis

iaguis May 12, 2016

Member

Yep, the last two tests mentioned in #2603 (comment)

Member

iaguis commented May 12, 2016

Yep, the last two tests mentioned in #2603 (comment)

@iaguis

This comment has been minimized.

Show comment
Hide comment
@iaguis

iaguis May 12, 2016

Member

Last test seems more complicated to do, maybe we can do it later. PTAL

Member

iaguis commented May 12, 2016

Last test seems more complicated to do, maybe we can do it later. PTAL

@iaguis

This comment has been minimized.

Show comment
Hide comment
@iaguis
Member

iaguis commented May 12, 2016

cc @alban

Show outdated Hide outdated tests/inspect/inspect.go Outdated
@jonboulle

This comment has been minimized.

Show comment
Hide comment
@jonboulle

jonboulle May 12, 2016

Contributor

LGTM

External volumes work for pods with multiple apps

this is the main one I care about.

Apps' mounts are shared+slave of stage1's peer group

Let's do this in a follow up.

Contributor

jonboulle commented May 12, 2016

LGTM

External volumes work for pods with multiple apps

this is the main one I care about.

Apps' mounts are shared+slave of stage1's peer group

Let's do this in a follow up.

@iaguis iaguis merged commit ee8458c into rkt:master May 12, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
semaphoreci The build passed on Semaphore.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment