Allow setting some "safe" per-container sysctls #5095

Closed
erictune opened this issue Mar 5, 2015 · 14 comments
Labels
area/isolation, area/kubelet-api, area/security, lifecycle/rotten, priority/backlog, sig/node

Comments

@erictune
Member

erictune commented Mar 5, 2015

A GCE ContainerVM user on IRC asked how to set SOMAXCONN within his container.
That is not possible in an unprivileged container because /proc/sys is not writable.

If the user tries to set containerManifest.containers[i].privileged=true, a validation error occurs.
I can't tell whether that validation error occurs only at the kubelet or also at some layer of the ContainerVM system. At any rate, the user would at least need to set --allow_privileged=true on the kubelet. I'm not sure whether that is possible with ContainerVM.

For Kubernetes, we would not want the user to have to set a blanket allow_privileged on the kubelet and apiserver just to tune this one parameter.
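
For readers unfamiliar with the mechanism described above, here is a minimal sketch of how a dotted sysctl name such as net.core.somaxconn maps onto a file under /proc/sys, and how one might check whether the current process may write it. The helper names (sysctl_path, read_sysctl, is_writable) are illustrative assumptions and not part of Kubernetes or Docker; the sketch also assumes simple names whose components contain no dots of their own.

```python
import os
from pathlib import PurePosixPath

PROC_SYS = PurePosixPath("/proc/sys")


def sysctl_path(name: str) -> PurePosixPath:
    """Map a dotted sysctl name to its file under /proc/sys."""
    parts = name.split(".")
    if not all(parts):
        # Reject empty components such as "net..core" or a trailing dot.
        raise ValueError(f"malformed sysctl name: {name!r}")
    return PROC_SYS.joinpath(*parts)


def read_sysctl(name: str) -> str:
    """Return the current value of a sysctl as a stripped string (Linux only)."""
    with open(str(sysctl_path(name))) as f:
        return f.read().strip()


def is_writable(name: str) -> bool:
    """Report whether this process may write the sysctl file."""
    return os.access(str(sysctl_path(name)), os.W_OK)
```

In an unprivileged container, is_writable("net.core.somaxconn") would be expected to return False, which is the situation the reporter ran into.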

@erictune added the area/security, area/kubelet-api, sig/node, and priority/backlog labels on Mar 5, 2015
@bgrant0607
Member

This (and similar) is a system-wide setting, no? A quick search shows that several applications (e.g., riak, php, mysql) advise tuning /proc/sys parameters. We'll likely need a model for setting these parameters based on pod/container requirements (e.g., black-box configuration vs. semantic awareness of the knobs), and will need to decide what to do when different applications request different values (e.g., scheduling exclusion vs. max. of all requirements).
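
To make the two options mentioned above concrete, here is a rough sketch of what "scheduling exclusion" and "max. of all requirements" could look like for a system-wide setting. Everything here is hypothetical: the function names, the shape of the requests mapping (application name to requested sysctls), and the sample values are assumptions for illustration, and the sketch assumes single-integer sysctl values.

```python
from collections import defaultdict


def collect(requests):
    """Group requested integer values by sysctl name across all applications."""
    values = defaultdict(set)
    for app_sysctls in requests.values():
        for name, value in app_sysctls.items():
            values[name].add(int(value))
    return values


def conflicts(requests):
    """Sysctls requested with more than one distinct value.

    A scheduler using exclusion would refuse to co-locate these applications.
    """
    return {n: sorted(v) for n, v in collect(requests).items() if len(v) > 1}


def resolve_by_max(requests):
    """Alternative policy: satisfy everyone by taking the largest request."""
    return {n: max(v) for n, v in collect(requests).items()}


# Hypothetical requests from two co-located applications.
requests = {
    "riak": {"net.core.somaxconn": "4096"},
    "mysql": {"net.core.somaxconn": "1024", "vm.swappiness": "0"},
}
print(conflicts(requests))
print(resolve_by_max(requests))
```

Taking the maximum only makes sense for knobs where a larger value is a superset of a smaller one; for other parameters, exclusion may be the only safe choice.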

@thockin
Member

thockin commented Mar 5, 2015

It's not clear whether this one in particular is inherited from the system or re-initialized for each netns.


@erictune
Member Author

The reporter made it sound like it was a per-interface setting.

@erictune
Member Author

I have it on good authority that the sysctls in netns_core_table in net/core/sysctl_net_core.c are namespaced, and the others, which are in net_core_table, are not. As of 3.10, only somaxconn is per-netns.

@edevil
Contributor

edevil commented Jul 2, 2015

Any progress on this?

@janosi
Contributor

janosi commented May 3, 2016

Docker 1.12 appears set to gain support for per-container sysctls:
moby/moby#19265
Could this be the way forward for Kubernetes, i.e., managing sysctls in containers through the Docker API?

@thockin
Member

thockin commented May 4, 2016

@dchen1107 @vishh It would be nice to queue this up, even though we'll probably NOT get Docker 1.12 qualified for our 1.3.


@vishh
Contributor

vishh commented May 4, 2016

@thockin @erictune Given that moby/moby#19265 is done, how do you think sysctl options will be represented in our API?

@thockin
Member

thockin commented May 4, 2016

I expect it would be similar in structure: either a per-pod or per-container (or both) map of string to string. Can you think of any reason not to keep it simple? I don't think we can or should abstract it much further. Maybe we don't want to spell it "sysctl"; maybe "osParams" or something plausibly generic? I didn't look at Docker's API, but presumably Windows and Mac will have to handle this...
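
As a rough illustration of the "map of string to string" idea, the sketch below attaches such a map to a hypothetical pod object and translates it into the --sysctl name=value arguments that docker run accepts as of Docker 1.12. The PodSpec class and the docker_sysctl_args helper are assumptions made up for this example; they are not the Kubernetes API, which had not been designed at this point in the thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PodSpec:
    """Hypothetical pod description carrying a plain string-to-string map."""
    name: str
    sysctls: Dict[str, str] = field(default_factory=dict)


def docker_sysctl_args(pod: PodSpec) -> List[str]:
    """Translate the map into repeated `--sysctl name=value` arguments."""
    args: List[str] = []
    # Sort for deterministic output regardless of insertion order.
    for name, value in sorted(pod.sysctls.items()):
        args += ["--sysctl", f"{name}={value}"]
    return args


pod = PodSpec(name="web", sysctls={"net.core.somaxconn": "1024"})
print(docker_sysctl_args(pod))
```

Keeping the values as opaque strings matches the suggestion not to abstract further: the runtime, not the API layer, interprets them.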

On Wed, May 4, 2016 at 11:07 AM, Vish Kannan notifications@github.com wrote:

@thockin @erictune Given that moby/moby#192 is done, how do you think sysctl options will be represented in our API?



@janosi
Contributor

janosi commented May 4, 2016

@vishh I do not know whether it matters, but you referenced the wrong Docker GitHub issue.

On pod vs. container level: I would think it depends on whether a given parameter lives in a pod-level namespace or not, so both levels (pod and container) may eventually be needed. According to the Docker issue above, the following sysctls are currently supported:
IPC namespace: kernel.msgmax, kernel.msgmnb, kernel.msgmni, kernel.sem, kernel.shmall, kernel.shmmax, kernel.shmmni, kernel.shm_rmid_forced, fs.mqueue.*
Network namespace: net.*

As I understand it, both the IPC and network namespaces are at pod level. So, for the time being, if one considers only the current set of sysctls, a pod-level option would be fine. Of course, it is your decision whether you want to prepare for future sysctls that live in a per-container namespace.
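
The list above can be turned into a small lookup that answers "which namespace does this sysctl belong to?". This is only a sketch built from the names quoted in this comment; the function name sysctl_namespace is an assumption, and the real set of namespaced sysctls depends on the kernel and Docker versions in use.

```python
from typing import Optional

# IPC-namespaced sysctls, exactly as listed in the comment above.
IPC_SYSCTLS = {
    "kernel.msgmax", "kernel.msgmnb", "kernel.msgmni", "kernel.sem",
    "kernel.shmall", "kernel.shmmax", "kernel.shmmni",
    "kernel.shm_rmid_forced",
}


def sysctl_namespace(name: str) -> Optional[str]:
    """Return "ipc" or "net" for a listed sysctl, or None if it is not listed."""
    if name in IPC_SYSCTLS or name.startswith("fs.mqueue."):
        return "ipc"
    if name.startswith("net."):
        return "net"
    return None


for candidate in ("net.core.somaxconn", "kernel.shmmax", "vm.swappiness"):
    print(candidate, "->", sysctl_namespace(candidate))
```

Since both namespaces returned here are shared by all containers in a pod, every sysctl this function recognizes would be a pod-level setting.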

One more thing: Docker does not allow these sysctls to be set if the container uses the host's namespace for the relevant parameter. I wonder whether you would like to implement similar logic based on the other parameters of the pod, or just let Docker do its job.
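
If the check were done on the Kubernetes side, it might look something like the sketch below: reject any sysctl whose namespace the pod shares with the host. The function name validate_sysctls, the host_network and host_ipc parameters, and the error wording are all hypothetical; the prefix table is a simplified stand-in for the list quoted earlier in this comment.

```python
from typing import Dict, List, Optional

# Simplified, assumed mapping from name prefix to namespace.
NAMESPACE_BY_PREFIX = {
    "net.": "net",
    "fs.mqueue.": "ipc",
    "kernel.msgmax": "ipc", "kernel.msgmnb": "ipc", "kernel.msgmni": "ipc",
    "kernel.sem": "ipc", "kernel.shmall": "ipc", "kernel.shmmax": "ipc",
    "kernel.shmmni": "ipc", "kernel.shm_rmid_forced": "ipc",
}


def namespace_of(name: str) -> Optional[str]:
    for prefix, ns in NAMESPACE_BY_PREFIX.items():
        # Prefixes ending in "." match a subtree; others must match exactly.
        if name == prefix or (prefix.endswith(".") and name.startswith(prefix)):
            return ns
    return None


def validate_sysctls(sysctls: Dict[str, str],
                     host_network: bool = False,
                     host_ipc: bool = False) -> List[str]:
    """Return a list of error strings; an empty list means the request is valid."""
    uses_host = {"net": host_network, "ipc": host_ipc}
    errors = []
    for name in sorted(sysctls):
        ns = namespace_of(name)
        if ns is None:
            errors.append(f"{name}: not a namespaced sysctl")
        elif uses_host[ns]:
            errors.append(f"{name}: pod uses the host {ns} namespace")
    return errors
```

For example, validate_sysctls({"net.core.somaxconn": "1024"}, host_network=True) reports one error, while the same request without host networking returns an empty list.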

@sttts mentioned this issue on May 23, 2016
@fejta-bot

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 15, 2017
@est

est commented Dec 25, 2017

Looks like it's already in the safe list?

#54132

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 24, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close


9 participants