-
Notifications
You must be signed in to change notification settings - Fork 40.6k
Ephemeral storage monitoring via filesystem quotas #66928
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ephemeral storage monitoring via filesystem quotas #66928
Conversation
bebd4e6
to
ac80c5f
Compare
12644a2
to
50aaa4e
Compare
dddaee9
to
cfa9fbc
Compare
b0fb1ae
to
82e0eb9
Compare
82376ef
to
ab5f1f7
Compare
/retest |
ping @jingxu97 |
I spoke with @jingxu97 offline, who took another pass and said it looked good. |
Thanks @dashpole ping @derekwaynecarr |
@dashpole thanks. /hold cancel |
/assign @msau42 |
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
/hold to give you a chance to address any feedback
if volumeSpec.Volume.EmptyDir != nil && | ||
volumeSpec.Volume.EmptyDir.SizeLimit != nil && | ||
volumeSpec.Volume.EmptyDir.SizeLimit.Value() > 0 && | ||
volumeSpec.Volume.EmptyDir.SizeLimit.Value() < sizeLimit.Value() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we not have SizeLimits for other "ephermeral volume types", e.g. for SecretVolume
, ConfigMapVolume
, etc.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This project only applies to EmptyDir volumes (and potentially anything else covered by local ephemeral storage, but there are no plans to cover other volume types under that umbrella).
Local ephemeral storage is defined to be EmptyDir volumes, writable layers, and logs. Writable layers and logs aren't volumes and would have to be covered through cadvisor. Secrets and configmaps are not defined to be local ephemeral storage.
if err == nil { | ||
volumeutil.SetReady(ed.getMetaDir()) | ||
if mounterArgs.DesiredSize != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to keep in mind, if this will be supported for other ephemeral volume types (e.g. SecretVolumes
, ConfigMapVolumes
, etc.) in the future, those volume types do periodic "remounting" to ensure the data is fresh. Might be worth thinking through what happens in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
@@ -397,9 +411,14 @@ func (ed *emptyDir) TearDownAt(dir string) error { | |||
} | |||
|
|||
func (ed *emptyDir) teardownDefault(dir string) error { | |||
// Remove any quota | |||
err := quota.ClearQuota(ed.mounter, dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to worry about accumulating quota IDs in these error scenarios? Meaning if this happens can the accumulation cause other parts of the system to stop functioning?
mountErr := volumeMounter.SetUp(volume.MounterArgs{ | ||
FsGroup: fsGroup, | ||
DesiredSize: volumeToMount.DesiredSizeLimit, | ||
PodUID: string(volumeToMount.Pod.UID), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't have to add this argument here. It should be available via the emptyDir
data structure in the SetUp
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, I can remove it; I agree it's not really needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for accumulating quotas, it's not impossible that something could go wrong with the teardown and leave the quota in place applying to nothing. It is possible to remove quotas later if need be. We leave records in /etc/projects and /etc/projid with clearly defined names to make it possible to trace back to Kubernetes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, I can remove it; I agree it's not really needed here.
If you have time go for it, happy to reapply lgtm/approval. If not, no big deal.
@@ -0,0 +1,105 @@ | |||
// +build linux | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: pkg/volume/util/quota/
seems like a very generic name. Worth being more specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could potentially be fsquota or the like, if you feel strongly about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight concern about confusing readers. But don't feel strongly. We could rename in the future if there is a collision.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: derekwaynecarr, RobertKrawitz, saad-ali The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold |
@saad-ali please let me know whether you deem any of the changes you discussed to be critical here. |
/hold cancel On further thought, /approve means that it's OK, and this was just to let me reply. |
@RobertKrawitz: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/test pull-kubernetes-bazel-test |
|
||
// DesiredSizeLimit indicates the desired upper bound on the size of the volume | ||
// (if so implemented) | ||
DesiredSizeLimit *resource.Quantity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to comment the late. But where this DesiredSizeLimit is used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be used when we implement quota enforcement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we hold from introducing this until we have a plan to use it? It seems like we introduced this field and almost an year later nobody is using it.
Use XFS-style quotas to monitor ephemeral storage consumption where possible. Reference https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0030-20180906-quotas-for-ephemeral-storage.md