Support sharing /dev/shm among containers in a pod. #4823

Closed
mrunalp opened this Issue Feb 25, 2015 · 15 comments

mrunalp (Contributor) commented Feb 25, 2015

The containers in a pod use the IPC namespace of the pod infra container. However, this only allows sharing of System V IPC objects. POSIX shared memory objects need access to /dev/shm.

I propose that we do this by mounting a tmpfs outside of the pod and then bind-mounting it into all the containers in the pod as /dev/shm.

@thockin mentioned in #4625 that there could be a tmpfs volume per pod.

I think this use case is a good fit for such volumes.

ping @thockin @bgrant0607
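As a rough sketch of the proposal (the pod ID and paths here are made up for illustration, and the privileged steps are shown only as comments since they need root on a real node):

```shell
# Hypothetical per-pod identifier and a stand-in host directory.
pod_id="pod-1234"
shm_dir="/tmp/pods/${pod_id}/shm"

# The kubelet would create one directory on the host per pod...
mkdir -p "$shm_dir"

# ...then (as root, on a real node) mount a tmpfs on it and bind-mount
# that single tmpfs into every container in the pod as /dev/shm:
#   mount -t tmpfs -o size=64m shm "$shm_dir"
#   docker run -v "$shm_dir":/dev/shm ...   # repeated for each container

echo "$shm_dir"
```

Because every container binds the same host tmpfs, a POSIX shm object created by one container is visible to all the others in the pod.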


pmorie (Member) commented Feb 25, 2015

@mrunalp @thockin @bgrant0607 This seems like a case like the one discussed in #4602, where it would be convenient to use tmpfs storage for an arbitrary volume plugin, i.e., EmptyDir backed by tmpfs instead of local storage.


bgrant0607 (Member) commented Feb 27, 2015

FWIW, System V shm has caused a number of resource-accounting problems for us in the past, though I'm not totally opposed to it if people are willing to live with that. We may need to add a policy control for whether it's allowed in the cluster if those problems still exist.

I agree that we should make tmpfs generally available.


thockin (Member) commented Mar 1, 2015

I agree this is useful and correct. I was aware of it, but blocked on a general tmpfs solution.

I'm going to argue this is a stretch for 1.0, unless we cobble something together that isn't so general-purpose, e.g. mount a tmpfs for every pod in a known place and always add an extra volume mount to every container.


mrunalp (Contributor) commented Mar 2, 2015

@thockin @bgrant0607 @pmorie Thanks for your comments.
How about /var/lib/pods/<pod_id>/path/to/tmpfs for the location on host?
Also, does this look good for representing a tmpfs volume?

type VolumeSource struct {
	// Tmpfs is the proposed new field for a tmpfs-backed volume.
	Tmpfs *TmpfsVolumeSource
}

// TmpfsVolumeSource represents a tmpfs on the host meant to be mounted
// into all the containers in a pod.
type TmpfsVolumeSource struct {
	Path     string `json:"path" description:"path of the directory on the host"`
	Size     int    `json:"size" description:"size of the tmpfs in bytes"`
	ReadOnly bool   `json:"readOnly,omitempty" description:"read-only if true, read-write otherwise (false or unspecified)"`
}
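For illustration only, a pod declaring such a volume might look roughly like the fragment below; the field names follow the proposed struct, while the surrounding shape, path, and size are hypothetical:

```json
{
  "volumes": [
    {
      "name": "shm",
      "tmpfs": {
        "path": "/var/lib/pods/pod-1234/shm",
        "size": 67108864,
        "readOnly": false
      }
    }
  ]
}
```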

thockin (Member) commented Mar 4, 2015

If we layer this on the work that Paul is doing for secrets, we could simply do something like:

os.Mkdir(kubelet.getTmpfsRootDir() + "/shm")

And then just pass that in via Docker. The real problem is garbage collection: we need to clean it up when a pod goes away, and a simple rm -rf is not good enough.
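A minimal sketch of why plain rm -rf falls short (paths are hypothetical): on a real node a tmpfs is still mounted at the pod's shm directory, so rm -rf would only empty the mounted filesystem rather than release it; the cleanup has to unmount first. In this unprivileged demo nothing is actually mounted, so the umount error is deliberately ignored:

```shell
# Hypothetical per-pod shm directory, stood up just for the demonstration.
shm_dir="/tmp/demo-pod/shm"
mkdir -p "$shm_dir"
touch "$shm_dir/segment"   # stand-in for a leftover shared-memory object

cleanup_pod_shm() {
    dir="$1"
    # Unmount the tmpfs before removing the directory; ignore the error
    # when nothing is mounted there (as in this unprivileged sketch).
    umount "$dir" 2>/dev/null || true
    rm -rf "$dir"
}

cleanup_pod_shm "$shm_dir"
```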


bgrant0607 (Member) commented Mar 4, 2015

@thockin Why aren't you keen on a volume source? (Not that we need to do this at all for 1.0.)


thockin (Member) commented Mar 4, 2015

If we offer a volume for tmpfs, I don't want to see a tmpfs volume; I want to see a way of speccing an EmptyDir that is backed by volatile media. I want to talk in terms of QoS, not in terms of implementation technology. Or at least I think we do.

That is a bigger issue and not urgent. This scaled-down form, however, could get done before 1.0 fairly easily.



mrunalp (Contributor) commented Mar 5, 2015

@thockin @bgrant0607 Thanks for your comments. So, IIUC, in the scaled-down version it will be a matter of requesting tmpfs storage in the kubelet and then mounting it into pods. We need to make sure the cleanup happens correctly.

There is the question of tmpfs size, though. Do we add a field to allow pods to request the size of /dev/shm?


pmorie (Member) commented Mar 5, 2015

@mrunalp I believe you could use the new method I added to the volume.Host interface, GetTmpfsPodVolumeDir, in #4625


mrunalp (Contributor) commented Mar 6, 2015

I think ideally we would want separate mounts per pod for this scenario, and to allow each pod to specify the size it wants for its instance.


mrunalp (Contributor) commented Mar 21, 2015

I started making changes for this utilizing #5166. One question I have is: should we just add /dev/shm by default? It would mean adding a tmpfs-backed EmptyDir to pod.Spec.Volumes and also adding it to each container's VolumeMounts.


mrunalp (Contributor) commented Mar 21, 2015

Also, Docker changes were necessary for this to work. I made PR moby/moby#11353, which was merged last week.


fejta-bot commented Dec 17, 2017

Issues go stale after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale


fejta-bot commented Jan 16, 2018

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle rotten
/remove-lifecycle stale


fejta-bot commented Feb 15, 2018

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
