Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retroactive default StorageClass assignement #3333

Open
2 of 4 tasks
jsafrane opened this issue Jun 2, 2022 · 25 comments
Open
2 of 4 tasks

Retroactive default StorageClass assignement #3333

jsafrane opened this issue Jun 2, 2022 · 25 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lead-opted-in Denotes that an issue has been opted in to a release sig/storage Categorizes an issue or PR as relevant to SIG Storage. stage/beta Denotes an issue tracking an enhancement targeted for Beta status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team
Milestone

Comments

@jsafrane
Copy link
Member

jsafrane commented Jun 2, 2022

Enhancement Description

@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 2, 2022
@jsafrane
Copy link
Member Author

jsafrane commented Jun 2, 2022

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 2, 2022
@jsafrane
Copy link
Member Author

jsafrane commented Jun 2, 2022

/milestone v1.25

@k8s-ci-robot k8s-ci-robot added this to the v1.25 milestone Jun 2, 2022
@jsafrane
Copy link
Member Author

jsafrane commented Jun 2, 2022

/kind feature
/stage alpha

@k8s-ci-robot k8s-ci-robot added stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status kind/feature Categorizes issue or PR as related to a new feature. labels Jun 2, 2022
@Priyankasaggu11929 Priyankasaggu11929 added the tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team label Jun 3, 2022
@jsafrane jsafrane changed the title Reconcile default StorageClass in PersistentVolumeClaims Retroactive default StorageClass assignement Jun 3, 2022
@jasonbraganza
Copy link
Member

jasonbraganza commented Jun 6, 2022

Hello @jsafrane, @RomanBednar 👋, 1.25 Enhancements team here.

Just checking in as we approach enhancements freeze on 18:00 PST on Thursday June 16, 2022.

For note, This enhancement is targeting for stage alpha for 1.25 (correct me, if otherwise)

Here's where this enhancement currently stands:

  • KEP file using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable
  • KEP has a updated detailed test plan section filled out
  • KEP has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

Looks like for this one, we would need to update the following:

For note, the status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well. Thank you!

@jasonbraganza
Copy link
Member

jasonbraganza commented Jun 13, 2022

Hello @jsafrane, @RomanBednar 👋, just a quick check-in again, as we approach the 1.25 enhancements freeze.

Please plan to get the lone item above that is pending, done before enhancements freeze on Thursday, June 16, 2022 at 18:00 PM PT.

For note, the current status of the enhancement is atat-risk. Thank you!

@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Jun 17, 2022

With KEP PR #3337 merged, the enhancement is ready for the upcoming enhancements freeze.

For note, the status is now marked as tracked in the enhancements tracking sheet. Thank you!

@didicodes
Copy link

didicodes commented Jul 13, 2022

Hello @jsafrane 👋, 1.25 Release Docs Shadow here.

This enhancement is marked as ‘Needs Docs’ for the 1.25 release. Please follow the steps detailed in the documentation to open a PR against the dev-1.25 branch in the k/website repo. This PR can be just a placeholder at this time and must be created by August 4.


Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release. Thank you!

@Atharva-Shinde
Copy link

Atharva-Shinde commented Jul 25, 2022

Hi @jsafrane, Enhancements team here again 👋

Checking in as we approach Code Freeze at 01:00 UTC on Wednesday, 3rd August 2022.

Please ensure that the following items are completed before the code-freeze:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

Currently, the status of the enhancement is marked as at-risk

Thanks :)

@Atharva-Shinde
Copy link

Atharva-Shinde commented Aug 1, 2022

Hey @jsafrane, reaching out again as we approach Code Freeze at 01:00 UTC on this Wednesday i.e 3rd August 2022.
Try to get all the action items which are mentioned in the comment above done before the code-freeze :)
The status of the enhancement is still marked as at-risk

@jsafrane
Copy link
Member Author

jsafrane commented Aug 1, 2022

I linked this PR: kubernetes/kubernetes#111467

@Priyankasaggu11929
Copy link
Member

Priyankasaggu11929 commented Aug 3, 2022

Thanks @jsafrane. With k/k PR kubernetes/kubernetes#111467 merged now, the enhancement is marked as tracked.

@xing-yang
Copy link
Contributor

xing-yang commented Sep 7, 2022

/milestone v1.26

@RomanBednar
Copy link
Contributor

RomanBednar commented Sep 19, 2022

Performance was tested with kind on single node with 10 pods and 10 volumes per pod (=100 PVCs). Dynamic provisioning was enabled during the test. Cluster has been modified to use Immediate volume binding mode.

Results show that enabling the feature does not have negative impact on performance.

Test cluster modifications:

1) Custom PVC template with selected-node label (required for immediate volume binding):

$ cat ./testing/experimental/storage/pod-startup/volume-types/persistentvolume/pvc-custom.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: {{.Name}}
  labels:
    app: {{.Name}}
    group: {{.Group}}
{{ if .Provisioner }}
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: {{.Provisioner}}
    volume.kubernetes.io/selected-node: kind-control-plane
{{ end }}
spec:
  accessModes:
    - ReadWriteOnce
  {{ if .StorageClass }}
  storageClassName: {{.StorageClass}}
  {{ end }}
  resources:
    requests:
      storage: {{.VolSize}}

2) Storage class with Immediate volume binding mode:

apiVersion: storage.k8s.io/v1 
kind: StorageClass 
metadata: 
 annotations: 
   storageclass.kubernetes.io/is-default-class: "true" 
 name: standard 
provisioner: rancher.io/local-path 
reclaimPolicy: Delete 
volumeBindingMode: Immediate

These are the test overrides I used:

PODS_PER_NODE: 10
VOLUMES_PER_POD: 10
NODES_PER_NAMESPACE: 1
START_PODS: false
VOL_SIZE: 128Mi
STORAGE_CLASS: standard
PROVISIONER: rancher.io/local-path
GATHER_METRICS: false
WAIT_FOR_PVS_DELETED: false
STEP_TIME_SECONDS: 500
# The custom PVC template mentioned above:
VOLUME_TEMPLATE_PATH: "volume-types/persistentvolume/pvc-custom.yaml"

Results:

1) Cluster with RetroactiveDefaultStorageClass feature disabled

$ kubectl -n kube-system get pod/kube-controller-manager-kind-no-feature-control-plane -o json | jq '.spec.containers[0].command[12]'
"--feature-gates=KubeletInUserNamespace=true"

$ kubectl -n kube-system get pod/kube-apiserver-kind-no-feature-control-plane -o json | jq '.spec.containers[0].command[11]'
"--feature-gates=KubeletInUserNamespace=true"

$ go run cmd/clusterloader.go -v=3 --report-dir=/tmp/clusterloader2-prov-no-feature --kubeconfig=/tmp/kubeconfig-no-feature --provider=kind --nodes=1 --testconfig=testing/experimental/storage/pod-startup/config.yaml --mastername=kind-no-feature-control-plane --master-internal-ip=10.89.0.3 --testoverrides=testing/experimental/storage/pod-startup/volume_binding/override.yaml --testoverrides=testing/experimental/storage/pod-startup/volume-types/persistentvolume/override.yaml --testoverrides=provision-override.yaml

Test result (junit):

<?xml version="1.0" encoding="UTF-8"?>
  <testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="290.548">
      <testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="290.542663876"></testcase>
      <testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.064247582"></testcase>
      <testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.084696883"></testcase>
      <testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.113884574"></testcase>
  </testsuite>

2) cluster with RetroactiveDefaultStorageClass feature enabled

$ kubectl -n kube-system get pod/kube-controller-manager-kind-control-plane -o json | jq '.spec.containers[0].command[12]'
"--feature-gates=KubeletInUserNamespace=true,RetroactiveDefaultStorageClass=true"

$ kubectl -n kube-system get pod/kube-apiserver-kind-control-plane -o json | jq '.spec.containers[0].command[11]'
"--feature-gates=KubeletInUserNamespace=true,RetroactiveDefaultStorageClass=true"

$ go run cmd/clusterloader.go -v=3 --report-dir=/tmp/clusterloader2-prov --kubeconfig=/tmp/kubeconfig --provider=kind --nodes=1 --testconfig=testing/experimental/storage/pod-startup/config.yaml --mastername=kind-control-plane --master-internal-ip=10.89.0.2 --testoverrides=testing/experimental/storage/pod-startup/volume_binding/override.yaml --testoverrides=testing/experimental/storage/pod-startup/volume-types/persistentvolume/override.yaml --testoverrides=provision-override.yaml

Test result (junit):

<?xml version="1.0" encoding="UTF-8"?>
  <testsuite name="ClusterLoaderV2" tests="0" failures="0" errors="0" time="285.547">
      <testcase name="storage overall (testing/experimental/storage/pod-startup/config.yaml)" classname="ClusterLoaderV2" time="285.542948724"></testcase>
      <testcase name="storage: [step: 01] Provisioning volumes" classname="ClusterLoaderV2" time="10.082875582"></testcase>
      <testcase name="storage: [step: 02] Waiting for PVs to be bound [00] - WaitForPVCsToBeBound" classname="ClusterLoaderV2" time="220.08929458"></testcase>
      <testcase name="storage: [step: 03] Deleting volumes" classname="ClusterLoaderV2" time="10.097375571"></testcase>
  </testsuite>

@Atharva-Shinde
Copy link

Atharva-Shinde commented Sep 21, 2022

Hey @jsafrane 👋, 1.26 Enhancements team here!

Just checking in as we approach Enhancements Freeze on 18:00 PDT on Thursday 6th October 2022.

This enhancement is targeting for stage beta for 1.26 (correct me, if otherwise)

Here's where this enhancement currently stands:

  • KEP file using the latest template has been merged into the k/enhancements repo.
  • KEP status is marked as implementable
  • KEP has an updated detailed test plan section filled out
  • KEP has up to date graduation criteria
  • KEP has a production readiness review that has been completed and merged into k/enhancements.

For this KEP, we would need to:

  • Update the kep.yaml to reflect the current milestone information
  • Update the production readiness review with latest stage information
  • Include the new updated PR of this KEP in the Issue Description and get it merged before Enhancements Freeze to make this enhancement eligible for 1.26 release.

The status of this enhancement is marked as at risk. Please keep the issue description up-to-date with appropriate stages as well.
Thank you :)

@rhockenbury
Copy link

rhockenbury commented Sep 21, 2022

/stage beta

@k8s-ci-robot k8s-ci-robot added stage/beta Denotes an issue tracking an enhancement targeted for Beta status and removed stage/alpha Denotes an issue tracking an enhancement targeted for Alpha status labels Sep 21, 2022
@Atharva-Shinde
Copy link

Atharva-Shinde commented Oct 5, 2022

Hello @jsafrane 👋, just a quick check-in again, as we approach the 1.26 Enhancements freeze.

Please plan to get the action items mentioned in my comment above done before Enhancements freeze on 18:00 PDT on Thursday 6th October 2022 i.e tomorrow

For note, the current status of the enhancement is marked at-risk :)

@rhockenbury
Copy link

rhockenbury commented Oct 7, 2022

With #3544 merged, we have this down as tracked for v1.26. Thanks!

@RomanBednar
Copy link
Contributor

RomanBednar commented Oct 18, 2022

Rollout-rollback-rollout testing was performed, feature behaves as expected and no issues were observed. Posting my tests results:

Perform pre-upgrade tests

Set default storage class:

$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched

PVC does not get updated and remains pending:

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
csi-pvc   Pending                              

Upgrade cluster

Check available 1.25 versions:

$ yum search kubeadm --showduplicates --quiet | grep 1.25
kubeadm-1.25.0-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
kubeadm-1.25.1-0.x86_64 : Command-line utility for administering a Kubernetes cluster.
kubeadm-1.25.2-0.x86_64 : Command-line utility for administering a Kubernetes cluster.

Install/update kubeadm:

$ sudo yum install -y kubeadm-1.25.2-0

Prepare kubeadm config file that enables FeatureGate:

$ cat /mnt/clusterconf-examples/featuregate.yaml 
## Example kubeadm configuration for enabling a feature gate.
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: RetroactiveDefaultStorageClass=true
controllerManager:
  extraArgs:
    cluster-cidr: 10.244.0.0/16
    feature-gates: RetroactiveDefaultStorageClass=true

Perform kubeadm upgrade:

$ sudo kubeadm upgrade plan --config /mnt/clusterconf-examples/featuregate.yaml
$ sudo kubeadm upgrade apply --config /mnt/clusterconf-examples/featuregate.yaml v1.25.2

Perform kubelet upgrade:

$ sudo yum install -y kubelet-1.25.2-0
$ sudo systemctl daemon-reload 
$ sudo systemctl restart kubelet

Perform post-upgrade tests

Verify that PVC got SC assigned right after upgrade and PV was provisioned and bound:

$ kc get pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
csi-pvc   Bound    pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8   1Gi        RWO            csi-hostpath-sc   87m

Downgrade cluster

$ yum history | grep -E "kubeadm|kubelet"
    10 | install -y kubelet-1.25.2-0                                                                                                                                                                                                                                       | 2022-10-12 11:06 | Upgrade        |    1   
     8 | install -y kubeadm-1.25.2-0                                                                                                                                                                                                                                       | 2022-10-12 09:45 | Upgrade        |    1   
     7 | install -y kubelet-1.24.5-0 kubeadm-1.24.5-0 kubectl

$ sudo yum -y history undo 8 && sudo yum -y history undo 10

Perform post-rollback tests

Remove default SC:

$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched

Create new PVC without SC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-pvc-2
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
$ kc get pvc
NAME        STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
csi-pvc     Bound     pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8   1Gi        RWO            csi-hostpath-sc   96m
csi-pvc-2   Pending     

Add default SC again:

$ kc patch sc/csi-hostpath-sc -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/csi-hostpath-sc patched

Verify that the new PVC did not get updated with SC this time:

$ kc get pvc/csi-pvc-2
NAME        STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
csi-pvc-2   Pending    

Upgrade cluster again

Install/update kubeadm:

$ sudo yum install -y kubeadm-1.25.2-0

Perform kubeadm upgrade:

$ sudo kubeadm upgrade plan --config /mnt/clusterconf-examples/featuregate.yaml
$ sudo kubeadm upgrade apply --config /mnt/clusterconf-examples/featuregate.yaml v1.25.2

Perform kubelet upgrade:

$ sudo yum install -y kubelet-1.25.2-0
$ sudo systemctl daemon-reload 
$ sudo systemctl restart kubelet

Perform post-upgrade tests again

Verify that PVC got SC assigned right after upgrade and PV was provisioned and bound:

$ kc get pvc
NAME        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
csi-pvc     Bound    pvc-06a964ca-f997-4780-8627-b5c3bf5a87d8   1Gi        RWO            csi-hostpath-sc   117m
csi-pvc-2   Bound    pvc-2e765394-f32c-42fb-b3db-ffe203612bac   1Gi        RWO            csi-hostpath-sc   24m

@RomanBednar
Copy link
Contributor

RomanBednar commented Oct 20, 2022

Tested version skew and sharing results below. First case is with feature enabled in API server and disabled in KCM and the second case is the other way around.

Case 1 - API on / KCM off

API server KCM Behavior
on off Existing Kubernetes behavior, only users can change pvc.spec.storageClassName=nil to a SC name.

Controller will not update PVC:

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending                                                     4s

$ kc patch sc/standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/standard patched

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending  

But API server will allow change from nil value:

$ kc get pvc/myclaim -o json | jq '.spec.storageClassName'
null

$ kc patch pvc/myclaim -p '{"spec":{"storageClassName":"standard"}}'
persistentvolumeclaim/myclaim patched (no change)

$ kc get pvc/myclaim -o json | jq '.spec.storageClassName'
"standard"

$ kc patch pvc/myclaim -p '{"spec":{"storageClassName":"test"}}'
The PersistentVolumeClaim "myclaim" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claim

Case 2 - API off / KCM on

API server KCM Behavior
off on PV controller may try to change pvc.spec.storageClassName=nil to a new default SC name, which will fail on the API server.

Attempt retroactive SC update:

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending                                                     4s

$ kc patch sc/standard -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/standard patched

$ kc get pvc
NAME      STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
myclaim   Pending                                                     13s

Check KCM to verify PV controller attempted the update but it was rejected by API server as expected:

I1019 08:42:12.056353  579577 storageclass.go:50] GetDefaultClass added: standard
I1019 08:42:12.056372  579577 pv_controller.go:953] assigning StorageClass[standard] to PersistentVolumeClaim[default/myclaim]
E1019 08:42:12.079545  579577 pv_controller_base.go:275] could not sync volume "default/myclaim": can't update PersistentVolumeClaim["default/myclaim"]: PersistentVolumeClaim "myclaim" is invalid: spec: Forbidden: spec is immutable after creation except resources.requests for bound claims
  core.PersistentVolumeClaimSpec{
  	... // 2 identical fields
  	Resources:        {Requests: {s"storage": {i: {...}, Format: "BinarySI"}}},
  	VolumeName:       "",
- 	StorageClassName: nil,
+ 	StorageClassName: &"standard",
  	VolumeMode:       &"Filesystem",
  	DataSource:       nil,
  	DataSourceRef:    nil,
  }

@rhockenbury
Copy link

rhockenbury commented Oct 29, 2022

Hi @RomanBednar 👋,

Checking in as we approach 1.26 code freeze at 17:00 PDT on Tuesday 8th November 2022.

Please ensure the following items are completed:

  • All PRs to the Kubernetes repo that are related to your enhancement are linked in the above issue description (for tracking purposes).
  • All PRs are fully merged by the code freeze deadline.

For this enhancement, it looks like the following PRs are open and need to be merged before code freeze:

Let me know if there's any other PRs that we should be tracking.

As always, we are here to help should questions come up. Thanks!

@rhockenbury
Copy link

rhockenbury commented Nov 9, 2022

With kubernetes/kubernetes#113329 merged, I have this marked as tracked for code freeze.

@krol3
Copy link

krol3 commented Nov 9, 2022

Hello @jsafrane 👋, 1.26 Release Docs Lead here. This enhancement is marked as ‘Needs Docs’ for 1.26 release.

Please follow the steps detailed in the documentation to open a PR against dev-1.26 branch in the k/website repo. This PR can be just a placeholder at this time, and must be created by November 9. Also, take a look at Documenting for a release to familiarize yourself with the docs requirement for the release.

Any doubt, reach us! Thank you!

@krol3
Copy link

krol3 commented Nov 14, 2022

Hello @RomanBednar and @jsafrane 👋 please take a look at Documenting for a release - PR Ready for Review to get your PR ready for review before deadline Tuesday 15th November 2022. Thank you!

@krol3
Copy link

krol3 commented Nov 16, 2022

Hi @RomanBednar and @jsafrane, about Documentation for 1.26, this KEP needs to be updated to state that Retroactive default StorageClass assignment is beta in 1.26
cc: @reylejano @Rishit-dagli

@krol3
Copy link

krol3 commented Nov 17, 2022

Hi @RomanBednar , @jsafrane , @xing-yang
Thank you for your doc PR here

cc: @Rishit-dagli @reylejano

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lead-opted-in Denotes that an issue has been opted in to a release sig/storage Categorizes an issue or PR as relevant to SIG Storage. stage/beta Denotes an issue tracking an enhancement targeted for Beta status tracked/yes Denotes an enhancement issue is actively being tracked by the Release Team
Projects
Status: Graduating
Development

No branches or pull requests

10 participants