
[Feature Request] Support disk quota for docker runtime #54384

Open
Colstuwjx opened this issue Oct 23, 2017 · 25 comments
Labels
area/reliability, kind/feature, lifecycle/frozen, needs-triage, sig/node, sig/storage

Comments

@Colstuwjx
Contributor

Since Docker already supports limiting a container's disk usage with docker run --storage-opt size=1G xxx, it would be good if k8s introduced this feature.

Alternatively, it would also be fine if k8s supported passing --storage-opt in the container spec.
Thanks.
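
For reference, the Docker-level knob being requested works roughly like this (a sketch; busybox is only a placeholder image, and as noted later in this thread, the per-container size option needs the overlay2 driver on an xfs backing filesystem):

```sh
# Cap this container's writable layer at 1 GiB.
docker run --rm -it --storage-opt size=1G busybox sh

# Inside the container, writes beyond the cap fail with
# "No space left on device" instead of filling the node's disk.
```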

@k8s-github-robot added the needs-sig label Oct 23, 2017
@k82cn
Member

k82cn commented Oct 23, 2017

/sig node

@k8s-ci-robot added the sig/node label Oct 23, 2017
@k82cn
Member

k82cn commented Oct 23, 2017

/kind feature

@k8s-ci-robot added the kind/feature label Oct 23, 2017
@k8s-github-robot removed the needs-sig label Oct 23, 2017
@andyxning
Member

Any updates?

@M0nsieurChat

We would be interested as well! We are a service provider with little to no control over what data our customers store where. This would be a nice safeguard against disk exhaustion on nodes whenever a customer writes outside its persistent volumes.

@andyxning
Member

andyxning commented Dec 13, 2017

Just a note about --storage-opt size=10g: it only works with the overlay2 storage driver when the backing filesystem is xfs.
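
A quick way to verify that prerequisite on a node (a sketch; the exact output labels can vary a little across Docker versions):

```sh
# Storage driver must be overlay2 and the backing filesystem xfs
docker info | grep -iE 'storage driver|backing filesystem'

# The xfs mount also needs project quotas enabled
mount | grep xfs | grep -i prjquota
```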

@andyxning
Member

Before we add support for configuring the size option, we first need to think about:

  • How to specify the value for size.
  • Where to specify the value for size (a hypothetical sketch follows below).
  • This feature is only available with overlay2 on xfs. How do we correctly detect this prerequisite?

Maybe @kubernetes/sig-storage-feature-requests @jingxu97 @vishh can give some advice.
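
To make the first two questions concrete, one purely hypothetical shape (no such field exists in the Pod API; this only illustrates where such a knob could live) would sit next to the other per-container resources:

```yaml
# HYPOTHETICAL sketch: "storage-opt.size" is NOT a real Kubernetes
# field; it is only a strawman for the "where to specify" question.
apiVersion: v1
kind: Pod
metadata:
  name: quota-demo
spec:
  containers:
  - name: app
    image: busybox
    resources:
      limits:
        # imaginary mapping to docker's --storage-opt size=
        storage-opt.size: 1Gi
```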

@k8s-ci-robot added the sig/storage label Dec 13, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 13, 2018
@andyxning
Member

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Mar 13, 2018
@BSWANG
Contributor

BSWANG commented Apr 10, 2018

@andyxning Kubernetes 1.8 added the local ephemeral-storage resource type. Is it possible to pass the ephemeral-storage value to --storage-opt size when the primary partition supports xfs+pquota?

@Colstuwjx
Contributor Author

Colstuwjx commented Apr 10, 2018

In my opinion, ephemeral-storage is not a quota for the pod; it would be better described as a limit (the kubelet uses du to gather metrics).

The kubelet evicts the pod when the ephemeral-storage limit is exceeded, whereas adding --storage-opt to docker run would set a quota on the container's overlay filesystem and surface a "no space left" error when the quota is exceeded, rather than killing the pod.

Still, for now I think ephemeral-storage is also an option for limiting disk resources.
Thanks.
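
For comparison, this is how the existing ephemeral-storage limit is declared (this field is real Kubernetes API; exceeding the limit leads to eviction rather than a write error):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 3600"]
    resources:
      requests:
        ephemeral-storage: 1Gi
      limits:
        # the kubelet evicts the pod when usage exceeds this,
        # rather than returning ENOSPC to the application
        ephemeral-storage: 2Gi
```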

@NickrenREN
Contributor

Is it possible to pass the ephemeral-storage value to storage-opt size when the primary partition supports xfs+pquota?

I do not think we need to pass ephemeral-storage to storage-opt, since ephemeral-storage also limits container overlay usage. But as @Colstuwjx said, we will evict the pod if container usage exceeds the limit.

@BSWANG
Contributor

BSWANG commented Apr 10, 2018

@Colstuwjx @NickrenREN Other resource limits such as cpu and memory are hard limits: a container can't use more than the limit value at the system/kernel layer, so those limits are not tied to pod eviction.
Since ephemeral-storage does not use a disk quota to implement its limit, the kubelet must evict the pod when disk usage exceeds the ephemeral-storage limit. If a disk quota were used instead, applications in containers would get a "no space left" error and could then act to reduce their disk usage. I think that behavior would be more consistent with the other limits.
Moreover, the kubelet's periodic accounting isn't reliable enough: the du command often times out when a container has lots of files in its read-write layer or exceeds the device's IOPS/bandwidth capacity.
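
For illustration, this is roughly how an xfs project quota produces the hard "no space left" behavior described above (a sketch assuming an xfs filesystem mounted at /data with the prjquota option; project ID 42 and the paths are arbitrary):

```sh
# Assign directory /data/ctr to project 42 and cap it at 1 GiB
xfs_quota -x -c 'project -s -p /data/ctr 42' /data
xfs_quota -x -c 'limit -p bhard=1g 42' /data

# Writes beyond the cap now fail immediately with ENOSPC,
# with no du-style accounting needed:
dd if=/dev/zero of=/data/ctr/fill bs=1M count=2048
# dd: error writing '/data/ctr/fill': No space left on device
```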

@NickrenREN
Contributor

@BSWANG I agree that Kubernetes acts differently for the container ephemeral-storage limit than for the cpu and memory limits.
But regarding the parameter passing, I want to point out that ephemeral-storage is somewhat different from cpu and memory: it only covers the root partition; if an optional runtime partition is used, Kubernetes will not count the image layers or the containers' writable layers.

BTW, referring to the Docker docs (https://docs.docker.com/engine/reference/commandline/run/#set-storage-driver-options-per-container), there are still some restrictions on storage-opt.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 9, 2018
@RobertKrawitz
Contributor

RobertKrawitz commented Jul 24, 2018

The problem with ephemeral storage is even worse than the du latency: it's possible for a pod (see the attached diskhog.yaml.txt) to create a file and delete it, but keep the file descriptor around. Without quota support, any pod, or any container within it, can use as much ephemeral storage as it wants and never get caught doing so. File descriptors can be passed around in various ways (e.g. over a UNIX domain socket), so containers that share an emptyDir volume but otherwise have no parent/child process relationship can still take part in this; the disk space won't be freed until everything holding the file open has released it.

Remove the unlink of "/tmp/foo" and the pod gets evicted as it should (though not before chewing through an enormous amount of disk space, if it slips past the reaper!)

diskhog.yaml.txt
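
The deleted-but-open trick is easy to reproduce outside Kubernetes (a minimal sketch; the path is arbitrary). du misses the space because the directory entry is gone, while df still sees it, since the inode stays allocated as long as any file descriptor holds it open:

```sh
exec 3<>/tmp/foo                          # keep fd 3 on the inode
dd if=/dev/zero of=/tmp/foo bs=1M count=1024
rm /tmp/foo                               # remove the directory entry

du -sh /tmp    # does not include the 1 GiB (nothing left to walk)
df -h /tmp     # still shows the 1 GiB as used

exec 3<&-      # closing the last fd finally frees the space
```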

@wongma7
Contributor

wongma7 commented Jul 27, 2018

@RobertKrawitz it looks like @jingxu97 was working on this as "Local Ephemeral Storage Capacity Isolation" (kubernetes/enhancements#361), but I don't know its current status; you may wish to contact her.

@nikhita
Member

nikhita commented Aug 10, 2018

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Aug 10, 2018
@bgrant0607 added the lifecycle/frozen label Aug 15, 2018
@shinebayar-g

shinebayar-g commented Feb 9, 2019

Any update on this? This feature would definitely be useful.

@RobertKrawitz
Contributor

In progress (not this issue per se); see #66928

@xuchenCN

Hi @RobertKrawitz, thanks for the PR. I notice that the PR will implement quota enforcement, right?
Is '--storage-opt size=xxx' coming soon?

@Colstuwjx
Contributor Author

I have checked the document, and I think that, for now, it focuses on monitoring, not enforcement of a quota limit.

From my point of view, there are some considerations:

  1. it should limit pod storage space to avoid abuse of storage in multi-tenant cases; monitoring is the basic building block, but limiting and enforcement are what our users really need;

  2. we are suffering from performance issues using du for monitoring, so I want LocalStorageCapacityIsolationFSQuotaMonitoring to go beta ASAP, but the beta criteria still need to be determined, as @dashpole said (enabling the gate is sketched below);

  3. the current implementation of LocalStorageCapacityIsolationFSQuotaMonitoring directly uses kernel- and filesystem-level features. I think that is okay for emptyDir and local storage, but it splits CSI and local storage into two separate paths; I am wondering whether we should add a unified DiskQuota field, just like CPU and Memory, and let each implementation handle the details.

Great job, thanks @RobertKrawitz!
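
For anyone who wants to try the monitoring path from point 2, the feature gate is enabled in the kubelet configuration (a sketch; the gate was alpha at the time, and quota-based monitoring additionally requires the backing filesystem to support project quotas):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # alpha gate: use filesystem quotas to monitor emptyDir usage
  # instead of periodic du scans
  LocalStorageCapacityIsolationFSQuotaMonitoring: true
```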

@pacoxu
Member

pacoxu commented Jun 24, 2021

If I understand correctly, this should be solved by kubernetes/enhancements#1029. Currently, it is an alpha-level feature.

I asked to promote it to beta, and after some discussion we agreed there are several things to do before promoting it. As for this issue, I think we can close it as resolved; it should be tracked in kubernetes/enhancements#1029.

See kubernetes/enhancements#1029 (comment) for more details.

Action Items in 1.22~1.23

  • add some metrics or otherwise improve visibility
  • benchmark the feature
  • check e2e tests: e2e evolution

Action Items in 1.24-1.25

  • promote it to beta

@adisky
Contributor

adisky commented Jun 25, 2021

/triage accepted

@k8s-ci-robot added the triage/accepted label Jun 25, 2021
@k8s-triage-robot

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot removed the triage/accepted label Feb 8, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label Feb 8, 2023