
Reserve overhead when validating that a Filesystem has enough space #1319

Merged

merged 8 commits from maya-r:reserve-fs-overhead into kubevirt:master on Oct 1, 2020

Conversation

@maya-r (Contributor) commented Jul 29, 2020

When validating disk space, reserve space for filesystem overhead

The amount of available space in a filesystem is not exactly
the advertised amount. Things like indirect blocks or metadata
may use up some of this space. Reserve some of it by default
to avoid reaching full capacity.

This value is configurable from the CDIConfig object spec,
both globally and per-storageclass.

The default value is 0.055, i.e. "5.5% of the space is
reserved". This value was chosen because some filesystems
reserve 5% of the space as overhead for the root user, and
that reservation also doubles as headroom for worst-case,
hard-to-predict space usage, so I've chosen a value that is
slightly higher than 5%.

This validation is only necessary because we use sparse
images instead of fallocated ones, which was done to have
reasonable alerts regarding space usage from various
storage providers.


Update CDIConfig filesystemOverhead status, validate, and
pass the final value to importer/upload pods.

Only the status values controlled by the config controller
are used, and it's filled out for all available storage
classes in the cluster.

Use this value in Validate calls to ensure that some of the
space is reserved for the filesystem overhead, to guard against
accidents.
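To make the arithmetic concrete, here is a minimal sketch of how a
reserved fraction cuts into the space an image may occupy; the helper
name and the truncation are illustrative assumptions, not necessarily
CDI's exact implementation:

package main

import "fmt"

// usableSpace returns how many bytes remain for the image after reserving
// the configured overhead fraction (e.g. 0.055 for 5.5%).
func usableSpace(filesystemOverhead float64, availableBytes int64) int64 {
	return int64((1 - filesystemOverhead) * float64(availableBytes))
}

func main() {
	// With the default 5.5% overhead, a 10 GiB filesystem leaves roughly 9.45 GiB.
	fmt.Println(usableSpace(0.055, 10*1024*1024*1024))
}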

Caveats:

Doesn't use Default: to define the default of 0.055; instead
it is hard-coded in the reconcile loop. It seems like we can't
use a schema default here.

Release note:

By default, reserve 5.5% of a filesystem for filesystem overhead.
This value is configurable from the CDIConfig object spec, both globally and per-storageclass.

kubectl edit cdiconfig config:
spec:
  filesystemOverhead:
    global: "override-0.055-for-all-values"
    storageClass:
      mystorageclass: "override-per-storageclass"

@kubevirt-bot added the release-note (Denotes a PR that will be considered when it comes time to generate release notes.) and dco-signoff: yes (Indicates the PR's author has DCO signed all their commits.) labels Jul 29, 2020
@maya-r marked this pull request as draft July 29, 2020 20:46
@kubevirt-bot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) and size/XL labels Jul 29, 2020
@maya-r (author) commented Jul 29, 2020

Still missing docs, unit tests, and functional tests.
The code works, so I figured it's time to put it up as a WIP pull request.

@coveralls commented Jul 29, 2020

Coverage Status

Coverage decreased (-0.2%) to 74.611% when pulling 89a137d on maya-r:reserve-fs-overhead into 93fa687 on kubevirt:master.

@awels (Member) commented Jul 30, 2020

/retest

@@ -211,6 +216,62 @@ func (r *CDIConfigReconciler) reconcileDefaultPodResourceRequirements(config *cd
return nil
}

func (r *CDIConfigReconciler) reconcileFilesystemOverhead(config *cdiv1.CDIConfig) error {
var globalOverhead cdiv1.Percent = "0.055"
Member:

Can you make "0.055" a const name defaultGlobalOverhead or something?

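A minimal sketch of the suggested constant (the name is the reviewer's suggestion; the package and import path mirror the files touched in this PR and are illustrative, not the merged code):

package controller // illustrative placement, alongside the config controller above

import cdiv1 "kubevirt.io/containerized-data-importer/pkg/apis/core/v1beta1"

// defaultGlobalOverhead is the default fraction of a filesystem reserved
// for overhead (5.5%), replacing the bare "0.055" literal above.
const defaultGlobalOverhead cdiv1.Percent = "0.055"

reconcileFilesystemOverhead would then start from globalOverhead := defaultGlobalOverhead instead of the literal.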
@@ -412,6 +413,10 @@ func (r *ImportReconciler) createImportEnvVar(pvc *corev1.PersistentVolumeClaim)
if err != nil {
return nil, err
}
podEnvVar.filesystemOverhead, err = GetFilesystemOverhead(r.uncachedClient, pvc)
Member:

Any reason you are using the uncachedClient instead of the client here?

@@ -226,7 +226,7 @@ var _ = Describe("Validate", func() {
table.Entry("should return error on bad json", mockExecFunction(badValidateJSON, "", expectedLimits), "unexpected end of JSON input", imageName),
table.Entry("should return error on bad format", mockExecFunction(badFormatValidateJSON, "", expectedLimits), fmt.Sprintf("Invalid format raw2 for image %s", imageName), imageName),
table.Entry("should return error on invalid backing file", mockExecFunction(backingFileValidateJSON, "", expectedLimits), fmt.Sprintf("Image %s is invalid because it has backing file backing-file.qcow2", imageName), imageName),
table.Entry("should return error when PVC is too small", mockExecFunction(hugeValidateJSON, "", expectedLimits), fmt.Sprintf("Virtual image size %d is larger than available size %d. A larger PVC is required.", 52949672960, 42949672960), imageName),
table.Entry("should return error when PVC is too small", mockExecFunction(hugeValidateJSON, "", expectedLimits), fmt.Sprintf("Virtual image size %d is larger than available size %d (PVC size %d, reserved overhead %f%%). A larger PVC is required.", 52949672960, 42949672960, 52949672960, 0.0), imageName),
Member:

Can we add a non-zero overhead test as well?

Type: "object",
Properties: map[string]extv1beta1.JSONSchemaProps{
"global": {
Description: "How much space of a Filesystem volume should be reserved for safety. This value is the global one to be used unless a per-storageClass value is chosen.",
Member:

You can add "Pattern", here with a regular expression, this will stop invalid values from reaching our controllers.

Description: "How much space of a Filesystem volume should be reserved for safety. This value is the global one to be used unless a per-storageClass value is chosen.",
Type: "string",
},
"storageClass": {
Member:

We need to specify more here about what goes in; in particular we want the pattern on the value as well. It should look something like this:

"storageClass": {
  Items: &extv1beta1.JSONSchemaPropsOrArray{
    Schema: &extv1beta1.JSONSchemaProps{
      Description: "Some description explaining what the storage class map does, same as in types.go",
      Type: "array",
      Properties: map[string]extv1beta1.JSONSchemaProps{
      "storageClass": {
        Description: "Name of the storage class",
	Type:        "object",
       },
       "overhead": {
          Description: "The overhead as a decimal",
	  Type:        "object",
          Pattern: "regex for pattern",
	},
    },

Not entirely sure this will work since you used a map. But I think the validation is useful
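Since the field is a map keyed by storage class name rather than an array, one way to attach a pattern to the values is AdditionalProperties instead of Items/Properties. A rough sketch under that assumption (descriptions and the regex are placeholders, not necessarily the schema that was merged):

"storageClass": {
	Description: "Per-storageClass filesystem overhead overrides, keyed by storage class name",
	Type:        "object",
	AdditionalProperties: &extv1beta1.JSONSchemaPropsOrBool{
		Allows: true,
		Schema: &extv1beta1.JSONSchemaProps{
			Description: "The overhead as a decimal fraction between 0 and 1",
			Type:        "string",
			Pattern:     `^(0\.[0-9]+|0)$`,
		},
	},
},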

// XXX: validation doesn't actually seem to do anything.
// Actually relies on the reconcile not to use bogus values.
// +kubebuilder:validation:Pattern=`^(0\.[0-9]+|0)$`
type Percent string
Member:

Why is a string better than using a float?

May be nice to have a NewPercent function that validates and returns a Percent.

@maya-r (author):

An actual float is problematic for portability reasons, and so the tooling discourages it.
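A minimal sketch of the NewPercent constructor the reviewer suggests, reusing the pattern from the diff above (illustrative only; this is not the code that was merged, and it assumes it sits next to the Percent type definition):

package v1beta1 // illustrative placement, next to the Percent type above

import (
	"fmt"
	"regexp"
)

// percentPattern mirrors the kubebuilder validation pattern above.
var percentPattern = regexp.MustCompile(`^(0\.[0-9]+|0)$`)

// NewPercent validates that s encodes a fraction in [0, 1) and returns it as a Percent.
func NewPercent(s string) (Percent, error) {
	if !percentPattern.MatchString(s) {
		return "", fmt.Errorf("invalid Percent %q: expected a decimal fraction such as \"0.055\"", s)
	}
	return Percent(s), nil
}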

storageClassNameOverhead, found := perStorageConfig[storageClassName]

if found {
valid, err := validOverhead(storageClassNameOverhead)
Member:

Can validation be enforced by the CRD schema?

Member:

There is a regexp on the field now, to enforce 0 <= overhead <= 1

@maya-r (author):

btw: I only managed to add it to the global overhead one, not to the map[storageclass]overhead.

@kubevirt-bot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Aug 4, 2020
@kubevirt-bot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Sep 9, 2020
@maya-r (author) commented Sep 9, 2020

I think I'm going to have to add validation to a lot of other data sources, but that's probably a separate PR.

@maya-r marked this pull request as ready for review September 9, 2020 15:33
@kubevirt-bot removed the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label Sep 9, 2020
@maya-r (author) commented Sep 10, 2020

The tests are fine if run independently; I just introduced some flakiness.

@kubevirt-bot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Sep 13, 2020
@kubevirt-bot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Sep 23, 2020
@maya-r (author) commented Sep 23, 2020

The NFS tests are still flaky when run locally. Simply adding a wait between the tests helps, but I don't know what else I could do to un-flake them.

@kubevirt-bot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Sep 24, 2020
@kubevirt-bot added the size/XL label and removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) and size/XXL labels Sep 24, 2020
(Commit message, same as the description above, with one additional caveat:)

Validates the per-storageClass values in reconcile, and
doesn't reject bad values.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
@kubevirt-bot (Contributor) commented Sep 24, 2020

@maya-r: The following test failed, say /retest to rerun all failed tests:

Test name: pull-containerized-data-importer-e2e-os-3.11.0-crio
Commit: d53bec6
Rerun command: /test pull-containerized-data-importer-e2e-os-3.11.0-crio


@maya-r (author) commented Sep 30, 2020

/test pull-containerized-data-importer-e2e-k8s-1.17-ceph

@awels (Member) left a comment:

Looking pretty good; a few comments and one request. I would like to see a functional test in cdiconfig_test.go that verifies that if you add an invalid value for the percent field, the update fails: values like -0.04, 1.00, or 0.9999, which don't match the regex.

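For reference, a standalone check of the kind of values such a test would exercise. The pattern below is only an illustration of one that rejects all of the reviewer's example values; it may differ from the kubebuilder pattern that was actually merged:

package main

import (
	"fmt"
	"regexp"
)

func main() {
	// Illustrative pattern: "0", "1", or "0." followed by one to three digits.
	percent := regexp.MustCompile(`^(0(?:\.\d{1,3})?|1)$`)
	for _, v := range []string{"0.055", "0", "1", "-0.04", "1.00", "0.9999"} {
		fmt.Printf("%-7s valid=%v\n", v, percent.MatchString(v))
	}
}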
if err := utils.DeletePV(client, pv); err != nil {
return err
}
if err := utils.WaitTimeoutForPVDeleted(client, pv, timeout); err != nil {
Member:

Optimization here, right now you are doing the following:

  1. Delete PV
  2. Wait for PV to actually be gone
  3. Go to 1

So you are sequentially waiting for the PV to be deleted before starting a new delete. How about you do:
for i := 1; i <= pvCount; i++ {
  pv := nfsPVDef(strconv.Itoa(i), utils.NfsService.Spec.ClusterIP)
  if err := utils.DeletePV(client, pv); err != nil {
    return err
  }
}
// kicked off deletion of all PVs I care about
for i := 1; i <= pvCount; i++ {
  if err := utils.WaitTimeoutForPVDeleted(client, pv, timeout); err != nil {
    return err
  }
}

In this order you start the deletion of the PVs before waiting for any of them to really be deleted. That way the deletion is in parallel while the waiting is sequential. And hopefully the wait will be less, because when you get to the wait for the later PVs, they will be gone already.

@maya-r (author):

Good idea, done.

Member:

So the way it is now, it looks like you are waiting for just the last PV (since the pv variable is set to the last one at the end of the first loop). But there is no guarantee in the order in which the PVs are deleted; it is possible at this point that some of the earlier ones aren't gone yet. I would either re-create the PV defs in the second loop, or store them in a slice and loop over the slice in the second loop, to ensure all the PVs are gone at the end.
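A sketch of the slice-based variant described above, assuming nfsPVDef returns a *corev1.PersistentVolume and using the same helpers and variables as the earlier snippet:

// Kick off deletion of every PV first, remembering the definitions.
pvs := make([]*corev1.PersistentVolume, 0, pvCount)
for i := 1; i <= pvCount; i++ {
	pv := nfsPVDef(strconv.Itoa(i), utils.NfsService.Spec.ClusterIP)
	if err := utils.DeletePV(client, pv); err != nil {
		return err
	}
	pvs = append(pvs, pv)
}
// Then wait for all of them, not just the last one, to actually be gone.
for _, pv := range pvs {
	if err := utils.WaitTimeoutForPVDeleted(client, pv, timeout); err != nil {
		return err
	}
}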


@@ -440,6 +442,11 @@ func (r *ImportReconciler) createImportEnvVar(pvc *corev1.PersistentVolumeClaim)
if err != nil {
return nil, err
}
fsOverhead, err := GetFilesystemOverhead(r.uncachedClient, pvc)
Member:

Any particular reason to use the uncached client here? I see the cached client being used in the uploadServer.

@maya-r (author):

Nope, missed it. Thanks.

Intended to help with flakes, but didn't make a difference.
Probably still worth doing.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
Signed-off-by: Maya Rashish <mrashish@redhat.com>
Note that this change isn't expected to make a difference, as we
check if the targetStorageClass is nil later on and have the same
behaviour, but this is probably more correct API usage.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
Signed-off-by: Maya Rashish <mrashish@redhat.com>
@awels (Member) left a comment:

I think there is a small logic error in the nfs deletion logic. Otherwise I am happy.


Wait for all of them, not just the last one.

Signed-off-by: Maya Rashish <mrashish@redhat.com>
@awels (Member) left a comment:

/lgtm
/approve

@kubevirt-bot added the lgtm (Indicates that a PR is ready to be merged.) label Oct 1, 2020
@kubevirt-bot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: awels


@kubevirt-bot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label Oct 1, 2020
@kubevirt-bot merged commit b91887e into kubevirt:master Oct 1, 2020