Ephemeral storage monitoring via filesystem quotas #66928

Merged

Conversation

RobertKrawitz
Contributor

@RobertKrawitz RobertKrawitz commented Aug 2, 2018

Use XFS-style quotas to monitor ephemeral storage consumption where possible. Reference https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/0030-20180906-quotas-for-ephemeral-storage.md

kubelet now allows use of XFS quotas (on XFS and suitably configured ext4fs filesystems) to monitor storage consumption for ephemeral storage (currently for emptydir volumes only). This method of monitoring consumption is faster and more accurate than the old method of walking the filesystem tree. It does not enforce limits, only monitors consumption. To utilize this functionality, you must set the feature gate `LocalStorageCapacityIsolationFSQuotaMonitoring=true`. For ext4fs filesystems, you must create the filesystem with `mkfs.ext4 -O project <block_device>` and run `tune2fs -Q prjquota <block_device>`; XFS filesystems need no additional preparation. The filesystem must be mounted with option `project` in `/etc/fstab`. If your primary partition is the root filesystem, you must also add `rootflags=pquota` to your GRUB config file.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 2, 2018
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 7, 2018
@RobertKrawitz RobertKrawitz force-pushed the ephemeral-storage-quota-exp branch from bebd4e6 to ac80c5f Compare August 7, 2018 21:49
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 7, 2018
@RobertKrawitz RobertKrawitz force-pushed the ephemeral-storage-quota-exp branch 6 times, most recently from 12644a2 to 50aaa4e Compare August 8, 2018 18:21
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Aug 10, 2018
@RobertKrawitz RobertKrawitz force-pushed the ephemeral-storage-quota-exp branch 6 times, most recently from dddaee9 to cfa9fbc Compare August 14, 2018 18:55
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2018
@RobertKrawitz RobertKrawitz force-pushed the ephemeral-storage-quota-exp branch from b0fb1ae to 82e0eb9 Compare August 15, 2018 20:15
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 15, 2018
@RobertKrawitz RobertKrawitz force-pushed the ephemeral-storage-quota-exp branch from 82376ef to ab5f1f7 Compare August 16, 2018 20:26
@RobertKrawitz
Contributor Author

/retest

@RobertKrawitz
Contributor Author

ping @jingxu97

@dashpole
Contributor

I spoke with @jingxu97 offline, who took another pass and said it looked good.

@RobertKrawitz
Contributor Author

Thanks @dashpole

ping @derekwaynecarr

@derekwaynecarr
Member

@dashpole thanks.

/hold cancel
/approve

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2019
@RobertKrawitz
Contributor Author

/assign @childsb @luxas

@childsb @luxas can you look at this -- we're trying to get this into 1.15. Thanks!

@dashpole
Contributor

/assign @msau42
for approvals in /pkg/volume and /test/utils

@saad-ali
Member

/assign

Member

@saad-ali saad-ali left a comment

/approve
/hold to give you a chance to address any feedback

if volumeSpec.Volume.EmptyDir != nil &&
	volumeSpec.Volume.EmptyDir.SizeLimit != nil &&
	volumeSpec.Volume.EmptyDir.SizeLimit.Value() > 0 &&
	volumeSpec.Volume.EmptyDir.SizeLimit.Value() < sizeLimit.Value() {
Member

Do we not have SizeLimits for other "ephemeral volume types", e.g. for SecretVolume, ConfigMapVolume, etc.?

Contributor Author

@RobertKrawitz RobertKrawitz May 30, 2019

This project only applies to EmptyDir volumes (and potentially anything else covered by local ephemeral storage, but there are no plans to cover other volume types under that umbrella).

Local ephemeral storage is defined to be EmptyDir volumes, writable layers, and logs. Writable layers and logs aren't volumes and would have to be covered through cadvisor. Secrets and configmaps are not defined to be local ephemeral storage.

if err == nil {
	volumeutil.SetReady(ed.getMetaDir())
	if mounterArgs.DesiredSize != nil {
Member

Something to keep in mind, if this will be supported for other ephemeral volume types (e.g. SecretVolumes, ConfigMapVolumes, etc.) in the future, those volume types do periodic "remounting" to ensure the data is fresh. Might be worth thinking through what happens in that case.

Contributor Author

See above.

@@ -397,9 +411,14 @@ func (ed *emptyDir) TearDownAt(dir string) error {
}

func (ed *emptyDir) teardownDefault(dir string) error {
	// Remove any quota
	err := quota.ClearQuota(ed.mounter, dir)
Member

Do we have to worry about accumulating quota IDs in these error scenarios? Meaning if this happens can the accumulation cause other parts of the system to stop functioning?

mountErr := volumeMounter.SetUp(volume.MounterArgs{
	FsGroup:     fsGroup,
	DesiredSize: volumeToMount.DesiredSizeLimit,
	PodUID:      string(volumeToMount.Pod.UID),
Member

You don't have to add this argument here. It should be available via the emptyDir data structure in the SetUp method

Contributor Author

If you want, I can remove it; I agree it's not really needed here.

Contributor Author

As for accumulating quotas, it's not impossible that something could go wrong with the teardown and leave the quota in place applying to nothing. It is possible to remove quotas later if need be. We leave records in /etc/projects and /etc/projid with clearly defined names to make it possible to trace back to Kubernetes.

Member

> If you want, I can remove it; I agree it's not really needed here.

If you have time go for it, happy to reapply lgtm/approval. If not, no big deal.

@@ -0,0 +1,105 @@
// +build linux

Member

nit: pkg/volume/util/quota/ seems like a very generic name. Worth being more specific?

Contributor Author

It could potentially be fsquota or the like, if you feel strongly about it.

Member

Slight concern about confusing readers. But don't feel strongly. We could rename in the future if there is a collision.

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: derekwaynecarr, RobertKrawitz, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 30, 2019
@saad-ali
Member

/hold
To give you a chance to address any feedback

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2019
@RobertKrawitz
Contributor Author

@saad-ali please let me know whether you deem any of the changes you discussed to be critical here.

@RobertKrawitz
Contributor Author

/hold cancel

On further thought, /approve means that it's OK, and this was just to let me reply.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 30, 2019
@k8s-ci-robot
Contributor

k8s-ci-robot commented May 31, 2019

@RobertKrawitz: The following test failed, say /retest to rerun them all:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubernetes-local-e2e-containerized | 31edcd999489109e15cefdaa3551d4016e62eb4e | link | /test pull-kubernetes-local-e2e-containerized |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@RobertKrawitz
Contributor Author

/test pull-kubernetes-bazel-test

@k8s-ci-robot k8s-ci-robot merged commit cf76868 into kubernetes:master May 31, 2019
@RobertKrawitz RobertKrawitz deleted the ephemeral-storage-quota-exp branch May 31, 2019 16:56

// DesiredSizeLimit indicates the desired upper bound on the size of the volume
// (if so implemented)
DesiredSizeLimit *resource.Quantity
Contributor

Sorry for the late comment, but where is this DesiredSizeLimit used?

Contributor Author

It will be used when we implement quota enforcement.

Member

Shouldn't we hold off on introducing this until we have a plan to use it? It seems like we introduced this field and almost a year later nobody is using it.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/kubeadm area/kubectl area/kubelet area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.