Support cgroups with limits as rootless #1540

Merged
merged 2 commits into from Oct 16, 2017

6 participants
@williammartin
Contributor

williammartin commented Aug 1, 2017

Following on from the rootless cgroups discussion in #1457, this PR provides an initial implementation for rootless cgroups that fulfils Garden's needs. This does not enable all cgroup-related functionality, but we think it's a good starting point for other currently disabled features to work.

In more detail, this PR removes the current rootless cgroup implementation, and using cgroups in rootless mode should now work broadly the same as rootful, providing that runc has permissions on the cgroup path.

Differences with Rootful

Typically, without permissions on the cgroup path, we would expect an error when applying (entering a pid into a controller). However, we now support a kind of "opportunistic" cgroup usage when no limits have been set and no cgroup path has been provided. Since child processes are entered into the same cgroup as their parent, this is effectively the same (regarding cgroup enforcement) as not providing a cgroup path and not setting limits. Attempting to set (change a resource limit) will still result in a permission denied error, either during creation or via runc update, but hopefully with a slightly more informative error.

The devices cgroup doesn't have all functionality available for setting limits. This is because there is a requirement on CAP_SYS_ADMIN for the devices.allow and devices.deny files. We're not sure of a way to provide different device white/blacklisting per container, but a possible solution for some use cases is to set a static list in a parent cgroup that is inherited. This works because applying does not require CAP_SYS_ADMIN.

Disabled features

We haven't enabled OOM notification or memory pressure notification, but this is hopefully as simple as removing the rootless conditional:

if c.config.Rootless {

We haven't enabled CRIU features because we aren't familiar with what is required.

Notes

The BATS tests we added seem to cover the right features; however, we aren't super happy with the changes to the Makefile. Thoughts on a nicer way to satisfy the requirement that a cgroup exists and is chowned to the rootless user would be much appreciated.

Signed-off-by: Ed King eking@pivotal.io

@@ -83,8 +83,7 @@ func (p *setnsProcess) start() (err error) {
if err = p.execSetns(); err != nil {
return newSystemErrorWithCause(err, "executing setns process")
}
// We can't join cgroups if we're in a rootless container.
if !p.config.Rootless && len(p.cgroupPaths) > 0 {

@cyphar

cyphar Aug 3, 2017

Member

This check is still correct in some cases, but I guess erroring out is acceptable if someone explicitly asked for an impossible cgroup configuration (now that we could in principle nest things). I would like to see a test for this though.

@teddyking

teddyking Aug 16, 2017

Contributor

We're not clear under what circumstances the rootless check still makes sense? Can you give an example please?

p.cgroupPaths is loaded from state.json, which as best we can tell has to be the same as the cgroup paths the init process is in. So, barring people doing weird things, if you succeeded on create, you should succeed now?

@cyphar

cyphar Aug 21, 2017

Member

I'm actually not sure what I meant either, sorry about that. This change is fine, but I would like to see a runc exec test with rootless cgroups in use, to make sure this works fine.

@cyphar

cyphar Aug 3, 2017

Member

Just to be clear, I'm very impressed that there were only two changes necessary to make the cgroups/fs driver opportunistic and this looks like it's on the right track. My main gripes are just with how we're testing stuff.

I'll look over the limitations you've listed when I get a chance.

@cyphar cyphar added this to the 1.1.0 milestone Aug 11, 2017

@yastij

yastij Aug 11, 2017

Thanks so much guys for the PR !

@teddyking

teddyking Aug 16, 2017

Contributor

Thanks very much for the review. Hope we responded to all the comments.

--
@williammartin and Me.

@teddyking

teddyking Sep 1, 2017

Contributor

We've rebased and pushed a few changes, specifically:

  • Separated out rootless and rootless+cgroups integration tests
  • Added a rootless+cgroups exec test
  • Shortened the very long error message
  • Moved the m.Paths[sys.Name()] = p back to where it was originally

Hopefully that addresses most of the comments. We can also squash the additional commits if that'd be preferred.

@cyphar

cyphar Sep 2, 2017

Member

First-pass this looks good, squashing into relevant commits would be preferred. I'm going to test this out over the next few days, and then LGTM if it all looks good. Thanks so much for working on this @williammartin, @jszroberto, and @teddyking! ❤️

@williammartin

williammartin Sep 3, 2017

Contributor

@cyphar No problem, thanks for keeping on top of this and providing useful pointers!

@cyphar

cyphar Sep 7, 2017

Member

The issues with test extensibility and making sure opportunistic features are tested correctly are solved in #1529. I can help you rebase this PR's tests on the new tests/rootless.sh setup once it's merged.

@teddyking

teddyking Sep 7, 2017

Contributor

Awesome, sounds good to me. I've squashed the commits here into one so hopefully it shouldn't be too difficult to rebase once #1529 is merged in.

@teddyking

teddyking Sep 13, 2017

Contributor

We've rebased on master and taken a stab at updating this PR's tests to fit the new tests/rootless.sh way of doing things.
We had a few issues with the update.bats tests (and I've just seen the Jenkins build has failed on these ... will take a look at that now).

Here's a brief overview of the changes since last push:

  • Add $TESTFLAGS to tests/rootless.sh
  • Add "cgroups" to ALL_FEATURES with corresponding enable/disable_ funcs
  • Remove requires root from cgroup integration tests that previously had it
  • In helpers.bash we updated the runc_spec function to ensure that the config.json includes a Linux.Resources object, as this is expected by the sed command in the setup() func in update.bats
  • This does raise the question as to whether runc spec --rootless should generate this by default?

@cyphar

cyphar Sep 13, 2017

Member

@teddyking Cool, thanks. I'm currently at OSS but I'll take a look at this next week.

@teddyking

teddyking Sep 14, 2017

Contributor

@cyphar great! I pushed a fix for the failing travis build, but it's now failing on the go 1.8.x job only for a totally unrelated reason (error downloading busybox.tar.xz), so it probably just needs to be kicked off again.

@cyphar

Here are my comments. They are very minor, and I will do my final review when I get back from OSS (and I have a chance to play with it some more). Overall it looks pretty great, thanks!

}
@test "runc create (rootless + no limits + cgrouppath + no permission) fails with permission error" {
requires rootless

@cyphar

cyphar Sep 14, 2017

Member

I will think about this a little more when I get back from OSS, but I am not sure that we'd want the "fails with permission error" tests.

@williammartin

williammartin Sep 25, 2017

Contributor

Why wouldn't you want these? Test bloat?

@cyphar

cyphar Sep 25, 2017

Member

If it ends up working in the future (which may happen if systemd decides to enable the cgroupv2 delegation code), then the test will break. On the other hand, we could just include these and deal with that later.

@williammartin

williammartin Sep 25, 2017

Contributor

I can buy that. As long as getting a permission error isn't the desired behaviour right now (rather than say, some other error that might bubble up) and we don't want this test for regression, then I say kill the test and save test bloat/potential confusion later.

Would you mind pointing me to some info on the delegation code?

@cyphar

cyphar Sep 25, 2017

Member

There's a delegation section in the cgroupv2 documentation. Effectively you can set -o nsdelegate on the root mount of cgroupv2 and then you get permissions to write to a sub-cgroup when you're in a cgroup namespace. I did a quick search for the actual patch but couldn't find it.
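As a rough way to check whether a host has this option enabled, one can look for `nsdelegate` among the superblock options of the cgroup2 mount in `/proc/self/mountinfo`. The sketch below parses text in that format; the `hasNsdelegate` helper is a hypothetical name and the sample line is made up for illustration, so treat it as an assumption about a typical host rather than output from a real system.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasNsdelegate scans text in /proc/self/mountinfo format and reports whether
// any cgroup2 mount lists "nsdelegate" among its superblock options.
func hasNsdelegate(mountinfo string) bool {
	scanner := bufio.NewScanner(strings.NewReader(mountinfo))
	for scanner.Scan() {
		// After the " - " separator come: filesystem type, source, super options.
		parts := strings.SplitN(scanner.Text(), " - ", 2)
		if len(parts) != 2 {
			continue
		}
		fields := strings.Fields(parts[1])
		if len(fields) < 3 || fields[0] != "cgroup2" {
			continue
		}
		for _, opt := range strings.Split(fields[2], ",") {
			if opt == "nsdelegate" {
				return true
			}
		}
	}
	return false
}

func main() {
	// Made-up sample line; on a real host, pass the contents of /proc/self/mountinfo.
	sample := "35 24 0:30 / /sys/fs/cgroup rw,nosuid,nodev,noexec,relatime shared:9 - cgroup2 cgroup2 rw,nsdelegate\n"
	fmt.Println(hasNsdelegate(sample))
}
```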

@cyphar

cyphar Sep 14, 2017

Member

Note to myself: We should add some documentation on how to use the lxcfs PAM module to allow a "normal" user to set up delegated cgroups on their host.

@cyphar

cyphar Sep 29, 2017

Member

Okay, the one thing left is the notification scheme for OOM and memory pressure. Effectively I think that we should not allow someone to register for notifications if we did not join a cgroup (and instead inherited the original cgroups), so we'd have to save somewhere whether the cgroup manager (silently) failed to set up cgroups.

I'll work on this after we merge this, since it's not a blocker IMO.

williammartin and others added some commits Sep 15, 2017

Support cgroups with limits as rootless
Signed-off-by: Ed King <eking@pivotal.io>
Signed-off-by: Gabriel Rosenhouse <grosenhouse@pivotal.io>
Signed-off-by: Konstantinos Karampogias <konstantinos.karampogias@swisscom.com>

tests: improve rootless_cg testing
This ensures that we don't hard-code the set of cgroups on the host, as
well as making the permissions granted by rootless.sh much more
restrictive (to improve the scope of testing).

Signed-off-by: Aleksa Sarai <asarai@suse.de>

@cyphar

cyphar Oct 16, 2017

Member

LGTM. The other issues can be fixed in future patches (that I'm working on).

/cc @opencontainers/runc-maintainers

Approved with PullApprove

@crosbymichael
Member

crosbymichael commented Oct 16, 2017

LGTM

Approved with PullApprove

@crosbymichael crosbymichael merged commit ff4481d into opencontainers:master Oct 16, 2017

2 checks passed

code-review/pullapprove Approved by crosbymichael, cyphar
continuous-integration/travis-ci/pr The Travis CI build passed

@williammartin

williammartin Oct 16, 2017

Contributor

Great stuff. Thanks @cyphar and all!
