
Add ShmSize support #34928

Closed · wants to merge 1 commit into from

Conversation

@janosi (Contributor) commented Oct 17, 2016

What this PR does / why we need it: Support for defining the size of /dev/shm in the pod's IPC namespace

Which issue this PR fixes: fixes #28272

Special notes for your reviewer:

Release note:


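For illustration, a minimal sketch of what the proposed PodSpec addition could look like. The field name, pointer type, and JSON tag here are assumptions based on the review diff quoted below, not necessarily the exact content of this PR:

package api

// Hypothetical sketch of the proposed PodSpec addition; field name,
// type, and JSON tag are illustrative assumptions.
type PodSpec struct {
	// If specified, the fully qualified Pod hostname will be
	// "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>".
	Subdomain string `json:"subdomain,omitempty"`
	// Optional: defines the size of /dev/shm in the pod's IPC
	// namespace, in bytes. If unset, a default is applied.
	ShmSize *int64 `json:"shmSize,omitempty"`
}

A pointer type keeps "not specified" distinguishable from an explicit zero, which matters for the defaulting discussion below.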

@k8s-ci-robot (Contributor)

Can a kubernetes member verify that this patch is reasonable to test? If so, please reply with "@k8s-bot ok to test" on its own line.

Regular contributors should join the org to skip this step.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed, please reply here (e.g. I signed it!) and we'll verify. Thanks.


  • If you've already signed a CLA, it's possible we don't have your GitHub username or you're using a different email address. Check your existing CLA data and verify that your email is set on your git commits.
  • If you signed the CLA as a corporation, please let us know the company's name.

@janosi (Contributor, Author) commented Oct 17, 2016

Nokia has signed the CNCF CLA for Kubernetes contributions, so no Google CLA is needed.

@k8s-github-robot added the kind/api-change, size/M, and release-note-label-needed labels Oct 17, 2016
@bgrant0607 assigned vishh and dchen1107 and unassigned bgrant0607 Oct 17, 2016
@bgrant0607 added the release-note label and removed the release-note-label-needed label Oct 17, 2016
@bgrant0607 self-assigned this Oct 17, 2016
@bgrant0607 (Member)

cc @derekwaynecarr @Random-Liu

@derekwaynecarr (Member) left a comment

If we choose to add support for this feature, it needs to be plumbed through the CRI interfaces as well.

see:
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/api/v1alpha1/runtime/api.pb.go#L1229

/cc @euank

@@ -1566,6 +1566,11 @@ type PodSpec struct {
// If specified, the fully qualified Pod hostname will be "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>".
// If not specified, the pod will not have a domainname at all.
Subdomain string `json:"subdomain,omitempty"`
//Optional: Docker "--shm-size" support. Defines the size of /dev/shm in the IPC namespace of the pod.
@derekwaynecarr (Member):

We are generally trying to avoid Docker-specific defaulting behaviors in the API as we move towards the CRI model supporting multiple container runtimes. Docker appears to default to 64MB, so if we did the same, the defaulting would have to happen in our own defaults.go workflow.

@janosi (Contributor, Author):

Excuse me, I am not sure I understand the point here. The planned functionality was: if the user defines ShmSize in the pod manifest, the defined size is sent to the container runtime. If ShmSize is not defined in the pod manifest, no size definition is sent to the container runtime, and the container runtime applies its default value.
In this case the container runtime is Docker, that is true. But I do not understand "avoid docker specific defaulting behaviors in the API". Where did we apply Docker-specific defaulting behavior in the API? ShmSize is optional; is that the problem?

@derekwaynecarr (Member):

@janosi -- to clarify, rather than default to the container runtime default shm size, we would define a Kubernetes default shm size. This way, the API is consistent across any container runtime (rkt, docker, cri-o, hyper, etc.). So I am asking that you remove "docker" from the documentation text.

@janosi (Contributor, Author):

@derekwaynecarr That is clear now.

@janosi (Contributor, Author) commented Oct 18, 2016

@derekwaynecarr As ShmSize is the size of /dev/shm in the pod's IPC namespace, do you mean "PodSandbox" is the proper place for ShmSize on the CRI interface, instead of "LinuxContainerConfig"?

@euank (Contributor) commented Oct 18, 2016

Yeah, this would be a pod sandbox thing since the containers all share the same IPC namespace.
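To make the sandbox-level plumbing concrete, here is a purely illustrative sketch of where such a field could live in a CRI config type. The ShmSize field and its placement are assumptions for illustration; the actual v1alpha1 runtime API linked above does not contain it:

package runtime

// Illustrative only: a hypothetical sandbox-level Linux config carrying
// ShmSize, since every container in the sandbox shares the same IPC
// namespace and therefore the same /dev/shm.
type LinuxPodSandboxConfig struct {
	// CgroupParent is the parent cgroup of the sandbox (an existing
	// CRI concept, shown for context).
	CgroupParent string
	// ShmSize would set the size of /dev/shm for the whole sandbox,
	// in bytes.
	ShmSize int64
}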

@vishh (Contributor) commented Oct 19, 2016

Given that /dev/shm is a tmpfs and we have memcg already in place, I'd prefer setting the shm size to a much higher value by default in the kubelet rather than exposing it via the API. Ideally, we'd set it to the pod's memory limit.

I'm tempted to close this PR unless you want to re-purpose it to set a higher limit by default, @janosi.

@derekwaynecarr (Member)

I am fine with a higher cluster-wide default that is configurable. I don't really want a scheduler to have to use this in a scheduling decision. I always found it fun that your /dev/shm could be higher than your configured memory limit. That said, I don't know how high I want to set the value by default. Do apps look at the configured size and tune? I would not want a BestEffort pod to attempt to use much more of tmpfs than usual...


@janosi (Contributor, Author) commented Oct 24, 2016

@vishh @derekwaynecarr Do I understand correctly that you would accept any solution except this per-pod shm size? That is, @vishh would prefer an automatic behavior that sets the shm size to the pod's memory size, while @derekwaynecarr would prefer making the shm size configurable, but at the cluster level (I guess the Kubernetes cluster, or namespace, level)?
And because you are the decision makers here, this PR is headed nowhere but the closed state?
Is my understanding correct?

@derekwaynecarr (Member)

@janosi -- I am trying to collect information on how applications look at /dev/shm to tune their behavior by default, to know whether it's better to set a high value by default or a value explicitly per pod. My concern with a high value by default is that applications may induce OOMs with the change. I know that Oracle users, for example, would want to tune it (https://docs.oracle.com/cd/B28359_01/server.111/b32009/appc_linux.htm).

@bgrant0607 removed their assignment Oct 24, 2016
@janosi (Contributor, Author) commented Oct 25, 2016

My opinion is that the pod-level configuration option would provide more fine-grained resource control. If we consider e.g. namespace quotas: how big should a memory quota be if we assume that the size of /dev/shm has to be calculated into the memory consumption of the pods? If I were an administrator creating a namespace, it would be easier to calculate the required memory quota if the pod definitions explicitly declared the required /dev/shm size for each pod.
If the size of /dev/shm were a "one size fits all" value, namespaces would most probably end up with overestimated memory quotas, which prevents optimal usage of hardware resources.

On the other hand, if someone would like a cluster-wide default /dev/shm size, that is still possible even with the option to define the size of /dev/shm at the pod level. In that case it has to be defined which setting takes precedence (most probably the pod-level one).

@k8s-github-robot added and then removed the needs-rebase label Oct 26, 2016
@janosi (Contributor, Author) commented Oct 26, 2016

@derekwaynecarr @vishh I would like to know your decision on this PR before I start working on the changes @derekwaynecarr requested.

@k8s-github-robot

@janosi PR needs rebase

@k8s-github-robot added the needs-rebase label Oct 27, 2016
@vishh (Contributor) commented Oct 27, 2016

To move forward with this PR/proposal, it will be useful to understand if applications alter their behavior based on the size of /dev/shm. Quota is less of a concern for me.


@derekwaynecarr (Member)

Like @vishh, my concern is whether applications will do bad things by default with a larger /dev/shm; in the case of unbounded Burstable pods, havoc could ensue. Absent that knowledge, I am inclined to have a per-pod specified value, with clear defaulting applied at the Kubernetes level if not specified. It would be good to get more data to reach a conclusion. I know we have users that requested the core ability to specify --shm-size on docker run before that option even existed, and I am trying to get their input as well before settling on this, to understand how their apps will respond.


@derekwaynecarr (Member)

@vishh -- rather than a cluster-wide default, if we did not expose this on the pod spec (which I have not settled on), we could default the value to the pod memory limit (if specified), or local node allocatable. In this manner it would be consistent with the downward API. For most applications, the concern is that the value is at least as large as the request.

@vishh (Contributor) commented Oct 28, 2016

@derekwaynecarr Yeah that is what I suggested earlier - #34928 (comment)

I'd be OK with bounding it to the pod's memory limits for now. On the other hand, unless applications infer limits via yet another API (shm), as long as the limits are large enough it should be OK, right?
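A minimal sketch of the defaulting rule being converged on here, assuming hypothetical helper and parameter names: sum the container memory limits, and fall back to node allocatable when any container has no limit.

package defaults

import (
	v1 "k8s.io/api/core/v1"
)

// defaultShmSizeBytes sizes /dev/shm to the pod's total memory limit,
// falling back to node allocatable memory when the pod is unbounded.
// Names and the fallback choice are illustrative assumptions.
func defaultShmSizeBytes(pod *v1.Pod, nodeAllocatableBytes int64) int64 {
	var total int64
	for _, c := range pod.Spec.Containers {
		limit, ok := c.Resources.Limits[v1.ResourceMemory]
		if !ok {
			// A container without a memory limit makes the pod's
			// memory effectively unbounded, so fall back.
			return nodeAllocatableBytes
		}
		total += limit.Value()
	}
	return total
}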

@k8s-github-robot added size/L and removed size/M labels Nov 16, 2016
@k8s-github-robot

[APPROVALNOTIFIER] Needs approval from an approver in each of these OWNERS Files:

We suggest the following people:
cc @bgrant0607
You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

This PR hasn't been active in 90 days. Closing this PR. Please reopen if you would like to work towards merging this change, if/when the PR is ready for the next round of review.

cc @davidopp @dchen1107 @derekwaynecarr @janosi @vishh

You can add 'keep-open' label to prevent this from happening again, or add a comment to keep it open another 90 days

@RXminuS commented Feb 1, 2017

Where are we on this one? I've been eagerly waiting to finally be able to run our pods... we NEED more /dev/shm!

@vishh (Contributor) commented Feb 1, 2017 via email

@RXminuS commented Feb 1, 2017

For those stuck with this issue: you can create a hostPath volume pointing at /dev/shm, mount it at the same location in the container, and set privileged: true in the container's securityContext. Not perfect, but it does the trick for now.
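A sketch of that workaround expressed with the core v1 Go types (names and image are placeholders; import paths assume the current k8s.io/api layout):

package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// shmWorkaroundPod builds a pod that bind-mounts the node's /dev/shm
// into the container via a hostPath volume; privileged mode is what
// the comment above relies on. All names here are placeholders.
func shmWorkaroundPod() *v1.Pod {
	privileged := true
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "shm-workaround"},
		Spec: v1.PodSpec{
			Volumes: []v1.Volume{{
				Name: "dshm",
				VolumeSource: v1.VolumeSource{
					HostPath: &v1.HostPathVolumeSource{Path: "/dev/shm"},
				},
			}},
			Containers: []v1.Container{{
				Name:  "app",
				Image: "example.com/app:latest", // placeholder image
				VolumeMounts: []v1.VolumeMount{{
					Name:      "dshm",
					MountPath: "/dev/shm",
				}},
				SecurityContext: &v1.SecurityContext{
					Privileged: &privileged,
				},
			}},
		},
	}
}

Note this shares the node's /dev/shm across every pod using it, which is part of why it is "not perfect".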

@asteroidcow

For those stuck with this issue, the OpenShift documentation has something which I use for getting a large SHM - https://docs.openshift.org/latest/dev_guide/shared_memory.html

I commented it here - #31613 (comment)
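The approach in that OpenShift document boils down to a memory-backed emptyDir mounted over /dev/shm. A sketch in the same style as the previous one (names and image are again placeholders):

package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// shmEmptyDirPod mounts a tmpfs-backed emptyDir at /dev/shm, giving the
// pod its own shared-memory area without privileged mode. Its usage is
// charged against the pod's memory, as the next comment points out.
func shmEmptyDirPod() *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "shm-emptydir"},
		Spec: v1.PodSpec{
			Volumes: []v1.Volume{{
				Name: "dshm",
				VolumeSource: v1.VolumeSource{
					EmptyDir: &v1.EmptyDirVolumeSource{
						Medium: v1.StorageMediumMemory,
					},
				},
			}},
			Containers: []v1.Container{{
				Name:  "app",
				Image: "example.com/app:latest", // placeholder image
				VolumeMounts: []v1.VolumeMount{{
					Name:      "dshm",
					MountPath: "/dev/shm",
				}},
			}},
		},
	}
}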

@palonsoro

This is a very interesting workaround; however, it only lets you change the default limit from Docker's 64MB to the memory limit of the pod (shared with the memory used by the container processes themselves).

Nice if you need something working now, but a good solution would let you choose the shm size independently.

@JoshBroomberg

@vishh what happened to this? I don't see either the podspec API option or a higher default.

Labels
kind/api-change: Categorizes issue or PR as related to adding, removing, or otherwise changing an API.
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support Posix Shared Memory across containers in a pod