New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for setting sysctls #19265

Merged
merged 1 commit into from Apr 13, 2016

Conversation

Projects
None yet
@rhatdan
Contributor

rhatdan commented Jan 12, 2016

This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

Dan Walsh dwalsh@redhat.com

Signed-off-by: Dan Walsh dwalsh@redhat.com

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jan 12, 2016

Contributor

This pull request replaces #16632

Contributor

rhatdan commented Jan 12, 2016

This pull request replaces #16632

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Jan 12, 2016

Contributor

Opened pull request for engine-api docker/engine-api#38

Contributor

rhatdan commented Jan 12, 2016

Opened pull request for engine-api docker/engine-api#38

Show outdated Hide outdated opts/opts.go
// ValidateSysctl validates an sysctl and returns it.
func ValidateSysctl(val string) (string, error) {
validSysctlMap := map[string]bool{
"kernel.msgmax": true,

This comment has been minimized.

@cpuguy83

cpuguy83 Jan 29, 2016

Contributor

Are all these actually namespaced from 3.10+?

@cpuguy83

cpuguy83 Jan 29, 2016

Contributor

Are all these actually namespaced from 3.10+?

This comment has been minimized.

@rhatdan

rhatdan Jan 29, 2016

Contributor
man namespaces
...
   IPC namespaces (CLONE_NEWIPC)
...

       *  The System V IPC interfaces in /proc/sys/kernel, namely: msgmax,
          msgmnb, msgmni, sem, shmall, shmmax, shmmni, and shm_rmid_forced.
@rhatdan

rhatdan Jan 29, 2016

Contributor
man namespaces
...
   IPC namespaces (CLONE_NEWIPC)
...

       *  The System V IPC interfaces in /proc/sys/kernel, namely: msgmax,
          msgmnb, msgmni, sem, shmall, shmmax, shmmni, and shm_rmid_forced.

This comment has been minimized.

@mysterlune

mysterlune Feb 6, 2016

would you consider adding kernel.perf_event_paranoid and kernel.kptr_restrict to the whitelist?

as a use case: i'm working on a proof of concept around flame graphs, visualizing perf event data in a containerized node.js app following along the work here https://gist.github.com/trevnorris/9616784.

have had a devil of a time sorting out getting these kernel parameters to stick in a running container. came upon your PR, and am stoked to see a --sysctl flag in the works.

@mysterlune

mysterlune Feb 6, 2016

would you consider adding kernel.perf_event_paranoid and kernel.kptr_restrict to the whitelist?

as a use case: i'm working on a proof of concept around flame graphs, visualizing perf event data in a containerized node.js app following along the work here https://gist.github.com/trevnorris/9616784.

have had a devil of a time sorting out getting these kernel parameters to stick in a running container. came upon your PR, and am stoked to see a --sysctl flag in the works.

This comment has been minimized.

@rhatdan

rhatdan Feb 8, 2016

Contributor

@mysterlune Do you know if these sysctls are namespaced? If you set them inside of the container, do the settings show up outside of the container? If not, then we would add them, if yes then we will not add them.

You could test this by running a --privileged container.

@rhatdan

rhatdan Feb 8, 2016

Contributor

@mysterlune Do you know if these sysctls are namespaced? If you set them inside of the container, do the settings show up outside of the container? If not, then we would add them, if yes then we will not add them.

You could test this by running a --privileged container.

This comment has been minimized.

@mysterlune

mysterlune Feb 8, 2016

hey @rhatdan,

so i ran the container with the --privileged flag as such (as you can see, i'm on a mac; unsure if this matters to this experiment):

docker run -d -p 8080:8080 --name foobar -v /Users/rlune/gitproj/observable_node_poc:/src -v /Users/rlune/gitproj/observable_node_poc:/tmp --privileged observable-node-poc

... and got:

d137a82d7731ab095c484d0e70dd96081c275cf43c50100128ad5ea44e40cb01

... then, exec'd a bash session:

docker exec -it d137a82d7731ab095c484d0e70dd96081c275cf43c50100128ad5ea44e40cb01 bash

... checked the host machine for the kernel flags next...:

16:24 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.perf_event_paranoid
16:25 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.kptr_restrict

... received no output for each lookup. then, set the flags on the instance running in the container:

[root@d137a82d7731 app]# sysctl -w kernel.kptr_restrict=1
kernel.kptr_restrict = 1
[root@d137a82d7731 app]# sysctl -w kernel.perf_event_paranoid=1
kernel.perf_event_paranoid = 1

... read the keys on the image:

[root@d137a82d7731 app]# sysctl -a | grep kernel.kptr_restrict      
kernel.kptr_restrict = 1
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"

[and got the same sort of output for the `kernel.perf_event_paranoid` also]

... then checked the host machine again:

16:24 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.perf_event_paranoid
16:25 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.kptr_restrict

... and did not see any lookup output for those keys.

is this test sufficient to determine that the kernel.perf_event_paranoid and kernel.kptr_restrict keys are namespaced in a way that will allow their addition to this PR?

@mysterlune

mysterlune Feb 8, 2016

hey @rhatdan,

so i ran the container with the --privileged flag as such (as you can see, i'm on a mac; unsure if this matters to this experiment):

docker run -d -p 8080:8080 --name foobar -v /Users/rlune/gitproj/observable_node_poc:/src -v /Users/rlune/gitproj/observable_node_poc:/tmp --privileged observable-node-poc

... and got:

d137a82d7731ab095c484d0e70dd96081c275cf43c50100128ad5ea44e40cb01

... then, exec'd a bash session:

docker exec -it d137a82d7731ab095c484d0e70dd96081c275cf43c50100128ad5ea44e40cb01 bash

... checked the host machine for the kernel flags next...:

16:24 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.perf_event_paranoid
16:25 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.kptr_restrict

... received no output for each lookup. then, set the flags on the instance running in the container:

[root@d137a82d7731 app]# sysctl -w kernel.kptr_restrict=1
kernel.kptr_restrict = 1
[root@d137a82d7731 app]# sysctl -w kernel.perf_event_paranoid=1
kernel.perf_event_paranoid = 1

... read the keys on the image:

[root@d137a82d7731 app]# sysctl -a | grep kernel.kptr_restrict      
kernel.kptr_restrict = 1
sysctl: reading key "net.ipv6.conf.all.stable_secret"
sysctl: reading key "net.ipv6.conf.default.stable_secret"
sysctl: reading key "net.ipv6.conf.eth0.stable_secret"
sysctl: reading key "net.ipv6.conf.lo.stable_secret"

[and got the same sort of output for the `kernel.perf_event_paranoid` also]

... then checked the host machine again:

16:24 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.perf_event_paranoid
16:25 rlune@oakm111430f01:~/gitproj/observable_node_poc $ sysctl -a | grep kernel.kptr_restrict

... and did not see any lookup output for those keys.

is this test sufficient to determine that the kernel.perf_event_paranoid and kernel.kptr_restrict keys are namespaced in a way that will allow their addition to this PR?

This comment has been minimized.

@rhatdan

rhatdan Feb 8, 2016

Contributor

Not sure if it is enough, but it certainly looks good. I guess if you ran another container and made sure they were different.

@jeremyeder PTAL

@rhatdan

rhatdan Feb 8, 2016

Contributor

Not sure if it is enough, but it certainly looks good. I guess if you ran another container and made sure they were different.

@jeremyeder PTAL

This comment has been minimized.

@mysterlune

mysterlune Feb 8, 2016

dang it... it doesn't look like those flags are namespaced if i run two containers and set the flags within one of them... they show up as such in the other...

so, how does a flag get "namespaced" so that this collision doesn't happen?

@mysterlune

mysterlune Feb 8, 2016

dang it... it doesn't look like those flags are namespaced if i run two containers and set the flags within one of them... they show up as such in the other...

so, how does a flag get "namespaced" so that this collision doesn't happen?

This comment has been minimized.

@rhatdan

rhatdan Feb 8, 2016

Contributor

You write a kernel patch. :^(

@rhatdan

rhatdan Feb 8, 2016

Contributor

You write a kernel patch. :^(

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Feb 9, 2016

Contributor

Now that we are in docker-1.11 could we get this pull request moving. @calavera @cpuguy83

Contributor

rhatdan commented Feb 9, 2016

Now that we are in docker-1.11 could we get this pull request moving. @calavera @cpuguy83

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Feb 18, 2016

Member

We're okay with this, moving to code review, unless @crosbymichael has major concerns

Member

thaJeztah commented Feb 18, 2016

We're okay with this, moving to code review, unless @crosbymichael has major concerns

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah
Member

thaJeztah commented Feb 29, 2016

ping @calavera @cpuguy83 PTAL

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Feb 29, 2016

Contributor

Just need to update the engine-api vendor, otherwise code LGTM.

Contributor

cpuguy83 commented Feb 29, 2016

Just need to update the engine-api vendor, otherwise code LGTM.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Mar 2, 2016

Member

Looks like this needs another rebase @rhatdan 😢

Member

thaJeztah commented Mar 2, 2016

Looks like this needs another rebase @rhatdan 😢

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Mar 2, 2016

Contributor

Done.

Contributor

rhatdan commented Mar 2, 2016

Done.

@thaJeztah thaJeztah referenced this pull request Mar 8, 2016

Closed

sysctl tunables #4717

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Mar 8, 2016

Contributor

We are looking into allowing ping to work when running in prctl(NO_NEW_PRIVS) mode.

If we get this patch you would be able to set net.ipv4.ping_group_range "0 4294967295", and NO_NEW_PRIVS and users could still use ping inside of their container, without using any SETUID apps.

Contributor

rhatdan commented Mar 8, 2016

We are looking into allowing ping to work when running in prctl(NO_NEW_PRIVS) mode.

If we get this patch you would be able to set net.ipv4.ping_group_range "0 4294967295", and NO_NEW_PRIVS and users could still use ping inside of their container, without using any SETUID apps.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Mar 8, 2016

Member

@rhatdan can you rebase? looks like your change to engine-api is in master now, so the "vendor" check should also pass after a rebase (https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/engine-api/types/container/host_config.go#L266)

Member

thaJeztah commented Mar 8, 2016

@rhatdan can you rebase? looks like your change to engine-api is in master now, so the "vendor" check should also pass after a rebase (https://github.com/docker/docker/blob/master/vendor/src/github.com/docker/engine-api/types/container/host_config.go#L266)

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Mar 8, 2016

Contributor

@thaJeztah done

Contributor

rhatdan commented Mar 8, 2016

@thaJeztah done

@justincormack

This comment has been minimized.

Show comment
Hide comment
@justincormack

justincormack Mar 9, 2016

Contributor

@rhatdan yes I am interested in the ping use case, mainly as it would allow dropping cap_net_raw. Not sure if it would be best presented as a specific option though in addition to a general facility.

Contributor

justincormack commented Mar 9, 2016

@rhatdan yes I am interested in the ping use case, mainly as it would allow dropping cap_net_raw. Not sure if it would be best presented as a specific option though in addition to a general facility.

@rhatdan

This comment has been minimized.

Show comment
Hide comment
@rhatdan

rhatdan Mar 9, 2016

Contributor

One problem with switching the ping, is I am not sure if all ping commands can handle it. We see this working on Fedora machines, but I am not sure if it works on RHEL7. Probably won't on older distributions like RHEL6.

Contributor

rhatdan commented Mar 9, 2016

One problem with switching the ping, is I am not sure if all ping commands can handle it. We see this working on Fedora machines, but I am not sure if it works on RHEL7. Probably won't on older distributions like RHEL6.

@justincormack

This comment has been minimized.

Show comment
Hide comment
@justincormack

justincormack Mar 9, 2016

Contributor

Yes, I haven't tested it on many distros yet, it would clearly have to be an option, but being able to drop cap_net_raw is a bit advantage.

Contributor

justincormack commented Mar 9, 2016

Yes, I haven't tested it on many distros yet, it would clearly have to be an option, but being able to drop cap_net_raw is a bit advantage.

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Mar 10, 2016

Contributor

LGTM

Contributor

cpuguy83 commented Mar 10, 2016

LGTM

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Mar 10, 2016

Contributor

@LK4D4 Found an issue I didn't think of.
When using --net=host or --ipc=host, the --sysctl settings do adjust their respective host properties.

I think this is generally undesirable to allow.

Contributor

cpuguy83 commented Mar 10, 2016

@LK4D4 Found an issue I didn't think of.
When using --net=host or --ipc=host, the --sysctl settings do adjust their respective host properties.

I think this is generally undesirable to allow.

@LK4D4

This comment has been minimized.

Show comment
Hide comment
@LK4D4

LK4D4 Mar 10, 2016

Contributor

@cpuguy83 @rhatdan however, libcontainer does validation in a newer version. So we will be ok after an update. @mrunalp wrote the patch to libcontainer, so I think he can help with a review here, too.

Contributor

LK4D4 commented Mar 10, 2016

@cpuguy83 @rhatdan however, libcontainer does validation in a newer version. So we will be ok after an update. @mrunalp wrote the patch to libcontainer, so I think he can help with a review here, too.

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Apr 13, 2016

Member

thanks! LGTM

Member

thaJeztah commented Apr 13, 2016

thanks! LGTM

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah
Member

thaJeztah commented Apr 13, 2016

ping @vdemeester ptal

@vdemeester

This comment has been minimized.

Show comment
Hide comment
@vdemeester

vdemeester Apr 13, 2016

Member

LGTM 🐰
gccgo is SUCCESS and windowsTP5 fails with a known flakey test.. merging 😉

Member

vdemeester commented Apr 13, 2016

LGTM 🐰
gccgo is SUCCESS and windowsTP5 fails with a known flakey test.. merging 😉

@vdemeester vdemeester merged commit 988508a into moby:master Apr 13, 2016

6 of 8 checks passed

windowsTP5 Jenkins build Docker-PRs-WoW-TP5 483 has failed
Details
gccgo Jenkins build Docker-PRs-gccgo 4358 is running
Details
docker/dco-signed All commits signed
Details
documentation success
Details
experimental Jenkins build Docker-PRs-experimental 17530 has succeeded
Details
janky Jenkins build Docker-PRs 26344 has succeeded
Details
userns Jenkins build Docker-PRs-userns 8564 has succeeded
Details
win2lin Jenkins build Docker-PRs-Win2Lin 24784 has succeeded
Details
@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Apr 13, 2016

Member

🎉 thanks @rhatdan!

Member

thaJeztah commented Apr 13, 2016

🎉 thanks @rhatdan!

@brunoborges

This comment has been minimized.

Show comment
Hide comment
@brunoborges

brunoborges Apr 28, 2016

Has anybody tested --sysctl to tweak net.ipv4.tcp_syn_retries to reduce global timeout for TCP connections inside the container? Does it work?

brunoborges commented Apr 28, 2016

Has anybody tested --sysctl to tweak net.ipv4.tcp_syn_retries to reduce global timeout for TCP connections inside the container? Does it work?

ingvagabund referenced this pull request in hodovska/kubernetes May 9, 2016

@jimmyxian jimmyxian referenced this pull request Jun 6, 2016

Open

Keep track of the missing API in Swarm #2333

4 of 23 tasks complete

runcom added a commit to runcom/docker that referenced this pull request Jun 8, 2016

BACKPORT: Add support for setting sysctls
Upstream reference: moby#19265

This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
@vingrad

This comment has been minimized.

Show comment
Hide comment
@vingrad

vingrad Jun 9, 2016

How can I use sysctl option with a compose file?

vingrad commented Jun 9, 2016

How can I use sysctl option with a compose file?

@thaJeztah

This comment has been minimized.

Show comment
Hide comment
@thaJeztah

thaJeztah Jun 9, 2016

Member

@vingrad this feature is not released yet (it will be in docker 1.12), also, that's a better question for the docker compose issue tracker; https://github.com/docker/compose/issues

Member

thaJeztah commented Jun 9, 2016

@vingrad this feature is not released yet (it will be in docker 1.12), also, that's a better question for the docker compose issue tracker; https://github.com/docker/compose/issues

@mdshuai mdshuai referenced this pull request Jul 12, 2016

Closed

Add sysctl proposal #26057

12 of 12 tasks complete

@BillMills BillMills referenced this pull request Aug 13, 2016

Closed

Docker help #185

@ebr ebr referenced this pull request Aug 15, 2016

Closed

Support --sysctl #5730

@afirth afirth referenced this pull request Jul 20, 2017

Closed

How to solve this warning #82

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

Add support for setting sysctls
This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

cherry-pick from: moby#19265

conflicts:
        contrib/completion/bash/docker
        docs/reference/commandline/create.md
        docs/reference/commandline/run.md
        man/docker-create.1.md
        man/docker-run.1.md
        runconfig/opts/parse.go

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Lei Jitang <leijitang@huawei.com>

liusdu pushed a commit to liusdu/moby that referenced this pull request Oct 30, 2017

Merge branch 'sysctl' into 'huawei-1.11.2'
Add support for setting sysctls

This patch will allow users to specify namespace specific "kernel parameters"
for running inside of a container.

cherry-pick from: moby#19265
conflicts:
        contrib/completion/bash/docker
        docs/reference/commandline/create.md
        docs/reference/commandline/run.md
        man/docker-create.1.md
        man/docker-run.1.md
        runconfig/opts/parse.go

Signed-off-by: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: Lei Jitang <leijitang@huawei.com>

This is a feature request from euleros.


See merge request docker/docker!385
@merryisfriend

This comment has been minimized.

Show comment
Hide comment
@merryisfriend

merryisfriend Dec 6, 2017

Now can we use sysctl option with a dockerfile? I have some of parameters to set in container. @rhatdan

merryisfriend commented Dec 6, 2017

Now can we use sysctl option with a dockerfile? I have some of parameters to set in container. @rhatdan

@cpuguy83

This comment has been minimized.

Show comment
Hide comment
@cpuguy83

cpuguy83 Dec 6, 2017

Contributor

No, you can't persist sysctls in an image.

Contributor

cpuguy83 commented Dec 6, 2017

No, you can't persist sysctls in an image.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment