Skip to content

Conversation

wongma7
Copy link
Contributor

@wongma7 wongma7 commented Jul 26, 2018

What this PR does / why we need it: currently, storage classes are allowed to be created with reclaimPolicy=="" which has no documented meaning. Our docs say that if storageclass.reclaimPolicy is unspecified/nil, the default reclaimPolicy is Delete but they say nothing about empty string/"". So we should make "" invalid.

If the empty string from storageClass.ReclaimPolicy gets passed to a PV, the PV defaulting resets it to Retain. IMO this is okay since PV.Spec.PersistentVolumeReclaimPolicy is not a pointer and we can consider this PV empty string behaviour "legacy" behaviour. But storageclasses should know better and should not be able to have reclaimPolicy=="" when we have documented that it must be either "nil" or one of the 2 valid policies.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Special notes for your reviewer: commit 1 fixes an issue where the error validation tests would succeed no matter what because none of them had volumeBindingMode set.

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 26, 2018
@k8s-ci-robot k8s-ci-robot requested review from deads2k and mbohlool July 26, 2018 16:53
@wongma7
Copy link
Contributor Author

wongma7 commented Jul 26, 2018

@kubernetes/sig-storage-pr-reviews @kubernetes/sig-storage-api-reviews
/kind bug

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. kind/bug Categorizes issue or PR as related to a bug. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Jul 26, 2018
@childsb
Copy link
Contributor

childsb commented Jul 27, 2018

/approve

@jsafrane
Copy link
Member

/lgtm
for the code

/assign @liggitt
for approval, validation will reject StorageClasses that were working before, but IMO it's a good idea here.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 27, 2018
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: childsb, jsafrane, wongma7
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: smarterclayton

If they are not already assigned, you can assign the PR to them by writing /assign @smarterclayton in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@liggitt
Copy link
Member

liggitt commented Jul 27, 2018

/hold

Tightening validation in a way that causes existing persisted data to be considered invalid is a breaking API change.

What is the current behavior of a storage class with reclaimPolicy:""?

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 27, 2018
@wongma7
Copy link
Contributor Author

wongma7 commented Jul 27, 2018

The current behaviour is that PVs provisioned for the storageclass are created with reclaimPolicy Retain

@liggitt
Copy link
Member

liggitt commented Jul 27, 2018

A possible approach to improve validation for objects which are fatally flawed is being explored in #64907, but I'm not sure that applies here (the produced objects do work, they just behave in potentially unexpected ways).

@liggitt
Copy link
Member

liggitt commented Jul 27, 2018

The options I see:

  1. making "" invalid breaks existing data, which isn't acceptable
  2. changing "" to default to "Delete" in API defaulting means a storage class read from a v1.12 and written to a v1.11 server (HA cluster version skew) would be rejected by the v1.11 server as trying to mutate the retainPolicy on update.
  3. making the PV controller turn a storage class with retainPolicy:"" into a PV with retainPolicy: Delete is an API-invisible behavior change. It would not affect existing PVs, but would mean a storage class from 1.11 would start to work differently in 1.12
  4. update documentation to indicate that a storage class with retainPolicy:"" produces PVs with retainPolicy:"Retain"

@wongma7
Copy link
Contributor Author

wongma7 commented Jul 27, 2018

@liggitt
I would add 1 more option which I like the best:

  1. Similar to 2 but we change "" to default to "Retain". The argument for this is that this is what PVs do (I neglected to say in comment above, but the reason storageClass.reclaimPolicy=="" results in PV.spec.reclaimPolicy is because of pv defaulting code https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/core/v1/defaults.go#L238). And we avoid the "a storage class from 1.11 would start to work differently in 1.12" problem. Argument against this is that users typically think of "" as equivalent to nil/"unspecified" (see: confusion over "" vs nil pvc.spec.storageClassName). But I would rather they be surprised that their "" has become Retain than surprised that their "" has become Delete, because the latter case might result in unexpected data loss.

3 is also okay I think. Our docs say the default provisioning policy is Delete and again, users think of nil=="". In other words, instead of blindly passing "" to the pv defaulting code, we change it to Delete first.

4 I guess you mean "Retain" in the last word there. I know the "" vs nil pvc.spec.storageClassName documentation sets the precedent that it's possible (but not easy) to teach users ""!=nil.

Also, I reeeally wish 1 were possible. Isn't there some way to automatically update all storageClasses with "" to "Retain" during an upgrade?

@wongma7
Copy link
Contributor Author

wongma7 commented Jul 27, 2018

On second thought the “unexpected data loss” risk might be overstated :) so options2 and 3 get some more points from me, but I still want 5 the most

@k8s-github-robot
Copy link

/test all

Tests are more than 96 hours old. Re-running tests.

@k8s-ci-robot
Copy link
Contributor

@wongma7: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-kops-aws 64e63e2 link /test pull-kubernetes-e2e-kops-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 3, 2018
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 3, 2018
@fejta-bot
Copy link

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Copy link
Contributor

@fejta-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. release-note-none Denotes a PR that doesn't merit a release note. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants