
storage capacity GA #710

Merged
merged 5 commits into from Apr 27, 2022
Merged

Conversation

pohly
Contributor

@pohly pohly commented Mar 3, 2022

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

Kubernetes 1.24 will mark the v1beta1 CSIStorageCapacity API as deprecated and introduce it as v1. To avoid deprecation warnings from client-go, external-provisioner must use the v1 API when running on Kubernetes >= 1.24.

Special notes for your reviewer:

To avoid making Kubernetes 1.24 a hard requirement for the next external-provisioner release, a small(ish) shim layer converts objects back and forth. external-provisioner automatically detects whether the v1 API is available.
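The shim idea can be sketched as follows. This is a minimal illustration with simplified local struct types standing in for the real k8s.io/api/storage/v1beta1 and v1 CSIStorageCapacity types (the actual shim converts the full API objects via client-go); all type and function names here are hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins for the two API versions. The real types live in
// k8s.io/api/storage/v1beta1 and k8s.io/api/storage/v1 and use
// resource.Quantity for capacity instead of a plain int64.
type V1beta1CSIStorageCapacity struct {
	Name             string
	StorageClassName string
	Capacity         int64 // bytes
}

type V1CSIStorageCapacity struct {
	Name             string
	StorageClassName string
	Capacity         int64 // bytes
}

// toV1 converts a v1beta1 object to v1. The two versions have the same
// fields, so conversion is a plain field-by-field copy.
func toV1(in *V1beta1CSIStorageCapacity) *V1CSIStorageCapacity {
	return &V1CSIStorageCapacity{
		Name:             in.Name,
		StorageClassName: in.StorageClassName,
		Capacity:         in.Capacity,
	}
}

// fromV1 converts back, so code written against the v1 types can run
// on a cluster that only serves v1beta1.
func fromV1(in *V1CSIStorageCapacity) *V1beta1CSIStorageCapacity {
	return &V1beta1CSIStorageCapacity{
		Name:             in.Name,
		StorageClassName: in.StorageClassName,
		Capacity:         in.Capacity,
	}
}

func main() {
	beta := &V1beta1CSIStorageCapacity{Name: "csisc-c6wxg", StorageClassName: "csi-hostpath-fast", Capacity: 107374182400}
	roundTrip := fromV1(toV1(beta))
	// A round trip through the shim must be lossless.
	fmt.Println(roundTrip.Name == beta.Name && roundTrip.Capacity == beta.Capacity)
}
```

Because the fields are identical across versions, the shim is mechanical; the controller code itself only ever sees v1 objects.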

This PR has to be updated once client-go 1.24(pre?) is available.

Does this PR introduce a user-facing change?:

When running on a Kubernetes cluster where the v1 CSIStorageCapacity API is available, external-provisioner automatically switches to that version instead of using the deprecated v1beta1 API. Kubernetes 1.24 marks the v1beta1 CSIStorageCapacity API as deprecated and introduces it as v1.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 3, 2022
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 29, 2022
@pohly pohly force-pushed the storage-capacity-ga branch 2 times, most recently from aef196e to 3a790ee Compare April 1, 2022 07:26
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 1, 2022
@k8s-ci-robot
Contributor

k8s-ci-robot commented Apr 1, 2022

@pohly: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-1-21
Commit: 38ce766
Required: true
Rerun command: /test pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-1-21


@pohly
Contributor Author

pohly commented Apr 1, 2022

/hold

Let's test with pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-master once it is available: kubernetes/test-infra#25840

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 1, 2022
@pohly
Contributor Author

pohly commented Apr 1, 2022

We also need (?) to wait for Kubernetes 1.24.0, unless we are fine with client-go 1.24.0-beta.1.

@pohly
Contributor Author

pohly commented Apr 1, 2022

/test pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-master

@pohly
Contributor Author

pohly commented Apr 1, 2022

Works as intended:

I0401 09:52:26.183417       1 csi-provisioner.go:458] producing CSIStorageCapacity objects with fixed topology segment [topology.hostpath.csi/node: csi-prow-worker2]
I0401 09:52:26.187797       1 csi-provisioner.go:498] using the CSIStorageCapacity v1beta1 API
I0401 09:52:26.187958       1 capacity.go:339] Capacity Controller: topology changed: added [0xc0009300a8 = topology.hostpath.csi/node: csi-prow-worker2], removed []
I0401 09:52:26.188838       1 controller.go:732] Using saving PVs to API server in background
I0401 09:52:26.189254       1 reflector.go:219] Starting reflector *v1beta1.CSIStorageCapacity (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0401 09:52:26.189272       1 reflector.go:255] Listing and watching *v1beta1.CSIStorageCapacity from k8s.io/client-go/informers/factory.go:134
I0401 09:52:26.189333       1 reflector.go:219] Starting reflector *v1.StorageClass (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0401 09:52:26.189348       1 reflector.go:255] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:134
I0401 09:52:26.189493       1 reflector.go:219] Starting reflector *v1.PersistentVolumeClaim (15m0s) from k8s.io/client-go/informers/factory.go:134
I0401 09:52:26.189507       1 reflector.go:255] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:134
I0401 09:52:26.193029       1 capacity.go:373] Capacity Controller: storage class csi-hostpath-fast was updated or added
I0401 09:52:26.193120       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc0009300a8 storageClassName:csi-hostpath-fast}
I0401 09:52:26.193158       1 capacity.go:373] Capacity Controller: storage class csi-hostpath-slow was updated or added
I0401 09:52:26.193193       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc0009300a8 storageClassName:csi-hostpath-slow}
I0401 09:52:26.289487       1 shared_informer.go:285] caches populated
I0401 09:52:26.289549       1 shared_informer.go:285] caches populated
I0401 09:52:26.289598       1 capacity.go:243] Starting Capacity Controller
I0401 09:52:26.289634       1 controller.go:811] Starting provisioner controller hostpath.csi.k8s.io_csi-hostpathplugin-22bd2_b7174ed0-949b-4f86-8152-472b81ccdd28!
I0401 09:52:26.289749       1 clone_controller.go:66] Starting CloningProtection controller
I0401 09:52:26.289652       1 shared_informer.go:285] caches populated
I0401 09:52:26.289778       1 clone_controller.go:82] Started CloningProtection controller
I0401 09:52:26.289777       1 capacity.go:339] Capacity Controller: topology changed: added [0xc0009300a8 = topology.hostpath.csi/node: csi-prow-worker2], removed []
I0401 09:52:26.289879       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc0009300a8 storageClassName:csi-hostpath-fast}
I0401 09:52:26.289900       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc0009300a8 storageClassName:csi-hostpath-slow}
I0401 09:52:26.289937       1 volume_store.go:97] Starting save volume queue
I0401 09:52:26.289955       1 capacity.go:279] Initial number of topology segments 1, storage classes 3, potential CSIStorageCapacity objects 2
I0401 09:52:26.289965       1 capacity.go:290] Checking for existing CSIStorageCapacity objects
I0401 09:52:26.289964       1 reflector.go:219] Starting reflector *v1.StorageClass (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:848
I0401 09:52:26.289976       1 reflector.go:255] Listing and watching *v1.StorageClass from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:848
I0401 09:52:26.290031       1 reflector.go:219] Starting reflector *v1.PersistentVolume (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:845
I0401 09:52:26.290047       1 reflector.go:255] Listing and watching *v1.PersistentVolume from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:845
I0401 09:52:26.290065       1 capacity.go:255] Started Capacity Controller
I0401 09:52:26.290088       1 capacity.go:518] Capacity Controller: enqueuing {segment:0xc0009300a8 storageClassName:csi-hostpath-slow} for periodic update
I0401 09:52:26.290098       1 capacity.go:518] Capacity Controller: enqueuing {segment:0xc0009300a8 storageClassName:csi-hostpath-fast} for periodic update
I0401 09:52:26.290132       1 capacity.go:574] Capacity Controller: refreshing {segment:0xc0009300a8 storageClassName:csi-hostpath-fast}
I0401 09:52:26.290170       1 connection.go:183] GRPC call: /csi.v1.Controller/GetCapacity
I0401 09:52:26.290178       1 connection.go:184] GRPC request: {"accessible_topology":{"segments":{"topology.hostpath.csi/node":"csi-prow-worker2"}},"parameters":{"kind":"fast"},"volume_capabilities":[{"AccessType":{"Mount":null},"access_mode":{}}]}
I0401 09:52:26.292848       1 connection.go:186] GRPC response: {"available_capacity":107374182400,"maximum_volume_size":{"value":107374182400},"minimum_volume_size":{}}
I0401 09:52:26.293794       1 connection.go:187] GRPC error: <nil>
I0401 09:52:26.293868       1 capacity.go:643] Capacity Controller: creating new object for {segment:0xc0009300a8 storageClassName:csi-hostpath-fast}, new capacity 100Gi
I0401 09:52:26.300154       1 capacity.go:648] Capacity Controller: created csisc-c6wxg with resource version 911 for {segment:0xc0009300a8 storageClassName:csi-hostpath-fast} with capacity 100Gi
I0401 13:37:56.189893       1 csi-provisioner.go:439] using v1/Pod csi-hostpathplugin-tmhjg as owner of CSIStorageCapacity objects
I0401 13:37:56.189942       1 csi-provisioner.go:458] producing CSIStorageCapacity objects with fixed topology segment [topology.hostpath.csi/node: csi-prow-worker]
I0401 13:37:56.193863       1 csi-provisioner.go:502] using the CSIStorageCapacity v1 API
I0401 13:37:56.194592       1 capacity.go:339] Capacity Controller: topology changed: added [0xc000623bd8 = topology.hostpath.csi/node: csi-prow-worker], removed []
I0401 13:37:56.195596       1 controller.go:732] Using saving PVs to API server in background
I0401 13:37:56.196194       1 reflector.go:219] Starting reflector *v1.CSIStorageCapacity (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0401 13:37:56.196206       1 reflector.go:255] Listing and watching *v1.CSIStorageCapacity from k8s.io/client-go/informers/factory.go:134
I0401 13:37:56.196201       1 reflector.go:219] Starting reflector *v1.PersistentVolumeClaim (15m0s) from k8s.io/client-go/informers/factory.go:134
I0401 13:37:56.196222       1 reflector.go:255] Listing and watching *v1.PersistentVolumeClaim from k8s.io/client-go/informers/factory.go:134
I0401 13:37:56.196267       1 reflector.go:219] Starting reflector *v1.StorageClass (1h0m0s) from k8s.io/client-go/informers/factory.go:134
I0401 13:37:56.196283       1 reflector.go:255] Listing and watching *v1.StorageClass from k8s.io/client-go/informers/factory.go:134
I0401 13:37:56.201265       1 capacity.go:373] Capacity Controller: storage class csi-hostpath-fast was updated or added
I0401 13:37:56.201316       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc000623bd8 storageClassName:csi-hostpath-fast}
I0401 13:37:56.201365       1 capacity.go:373] Capacity Controller: storage class csi-hostpath-slow was updated or added
I0401 13:37:56.201372       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc000623bd8 storageClassName:csi-hostpath-slow}
I0401 13:37:56.296125       1 shared_informer.go:285] caches populated
I0401 13:37:56.296212       1 shared_informer.go:285] caches populated
I0401 13:37:56.296234       1 controller.go:811] Starting provisioner controller hostpath.csi.k8s.io_csi-hostpathplugin-tmhjg_7b7ed5d8-d3bc-4bff-9f0a-3b89add4b7fd!
I0401 13:37:56.296291       1 capacity.go:243] Starting Capacity Controller
I0401 13:37:56.296326       1 shared_informer.go:285] caches populated
I0401 13:37:56.296342       1 capacity.go:339] Capacity Controller: topology changed: added [0xc000623bd8 = topology.hostpath.csi/node: csi-prow-worker], removed []
I0401 13:37:56.296408       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc000623bd8 storageClassName:csi-hostpath-fast}
I0401 13:37:56.296433       1 capacity.go:480] Capacity Controller: enqueuing {segment:0xc000623bd8 storageClassName:csi-hostpath-slow}
I0401 13:37:56.296452       1 capacity.go:279] Initial number of topology segments 1, storage classes 3, potential CSIStorageCapacity objects 2
I0401 13:37:56.296471       1 capacity.go:290] Checking for existing CSIStorageCapacity objects
I0401 13:37:56.296540       1 capacity.go:255] Started Capacity Controller
I0401 13:37:56.296575       1 capacity.go:518] Capacity Controller: enqueuing {segment:0xc000623bd8 storageClassName:csi-hostpath-fast} for periodic update
I0401 13:37:56.296592       1 capacity.go:518] Capacity Controller: enqueuing {segment:0xc000623bd8 storageClassName:csi-hostpath-slow} for periodic update
I0401 13:37:56.296615       1 clone_controller.go:66] Starting CloningProtection controller
I0401 13:37:56.296667       1 clone_controller.go:82] Started CloningProtection controller
I0401 13:37:56.296696       1 volume_store.go:97] Starting save volume queue
I0401 13:37:56.296936       1 reflector.go:219] Starting reflector *v1.PersistentVolume (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:845
I0401 13:37:56.296966       1 reflector.go:255] Listing and watching *v1.PersistentVolume from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:845
I0401 13:37:56.297324       1 reflector.go:219] Starting reflector *v1.StorageClass (15m0s) from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:848
I0401 13:37:56.297362       1 reflector.go:255] Listing and watching *v1.StorageClass from sigs.k8s.io/sig-storage-lib-external-provisioner/v8/controller/controller.go:848
I0401 13:37:56.297445       1 capacity.go:574] Capacity Controller: refreshing {segment:0xc000623bd8 storageClassName:csi-hostpath-fast}
I0401 13:37:56.297509       1 connection.go:183] GRPC call: /csi.v1.Controller/GetCapacity
I0401 13:37:56.297518       1 connection.go:184] GRPC request: {"accessible_topology":{"segments":{"topology.hostpath.csi/node":"csi-prow-worker"}},"parameters":{"kind":"fast"},"volume_capabilities":[{"AccessType":{"Mount":null},"access_mode":{}}]}
I0401 13:37:56.299712       1 connection.go:186] GRPC response: {"available_capacity":107374182400,"maximum_volume_size":{"value":107374182400},"minimum_volume_size":{}}
I0401 13:37:56.300024       1 connection.go:187] GRPC error: <nil>
I0401 13:37:56.300042       1 capacity.go:643] Capacity Controller: creating new object for {segment:0xc000623bd8 storageClassName:csi-hostpath-fast}, new capacity 100Gi
I0401 13:37:56.307787       1 capacity.go:725] Capacity Controller: CSIStorageCapacity csisc-bsq4x with resource version 1373 matches {segment:0xc000623bd8 storageClassName:csi-hostpath-fast}
I0401 13:37:56.308313       1 capacity.go:648] Capacity Controller: created csisc-bsq4x with resource version 1373 for {segment:0xc000623bd8 storageClassName:csi-hostpath-fast} with capacity 100Gi

@pohly pohly changed the title WIP: storage capacity GA storage capacity GA Apr 4, 2022
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2022
@pohly
Contributor Author

pohly commented Apr 4, 2022

We discussed this today and concluded that this can be merged with client-go 1.24-beta.0. We can update to 1.24 once it is available.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2022
The new API is needed because it includes CSIStorageCapacity v1.
This ensures that we stay up-to-date. In particular this includes CSI 1.6.
We only need the interface for CSIStorageCapacity. This enables writing a shim
between v1beta1 and v1 of that API.
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 25, 2022
@pohly
Contributor Author

pohly commented Apr 25, 2022

@xing-yang: this PR is ready for merging again after I rebased. Can you review?

/hold cancel

Testing with pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-master was successful, but we can also test once more:

/test pull-kubernetes-csi-external-provisioner-distributed-on-kubernetes-master

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 25, 2022
clientFactory := capacity.NewV1ClientFactory(clientset)
cInformer := factoryForNamespace.Storage().V1().CSIStorageCapacities()

invalidCapacity := &storagev1.CSIStorageCapacity{
Contributor
Can you add some comments before line 486 explaining why you are trying to create an object with an invalid name? Although people will get it when reading the code below, it is better to clarify earlier.

Contributor Author
Done.

@@ -0,0 +1,177 @@
/*
Copyright 2020 The Kubernetes Authors.
Contributor

2022

Contributor Author

Done.

@xing-yang
Contributor

xing-yang commented Apr 27, 2022

To avoid making Kubernetes 1.24 a hard requirement for the next external-provisioner release, a small(is) shim layer converts objects back and forth. external-provisioner detects automatically whether the v1 API is available.

When are you planning to remove this shim layer? 3 releases after 1.24 when we can remove v1beta1?
When we release the next external-provisioner, we'll document in the release note that 1.24 is the recommended version for this feature because that's when it moves to GA.

@pohly
Contributor Author

pohly commented Apr 27, 2022

When are you planning to remove this shim layer? 3 releases after 1.24 when we can remove v1beta1?

As soon as the oldest supported release is guaranteed to have the v1 API.

The code itself uses the v1 API and directly uses the normal client-go API when
v1 is supported by the server. If the server doesn't support that API, wrappers
around the v1beta1 API convert CSIStorageCapacity objects back and forth as
needed.

The reason for this more complex solution is that it avoids a breaking change
in the external-provisioner and thus provides a smoother transition path: CSI
driver developers can update to the next external-provisioner release and use
it both with Kubernetes < 1.24 and >= 1.24.
@xing-yang
Contributor

Can you also add this line "Kubernetes 1.24 will mark the v1beta1 CSIStorageCapacity API as deprecated and introduces it as v1." to the release note?

@pohly
Contributor Author

pohly commented Apr 27, 2022

Done.

@xing-yang
Contributor

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 27, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pohly, xing-yang


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 27, 2022
@k8s-ci-robot k8s-ci-robot merged commit 3d0a019 into kubernetes-csi:master Apr 27, 2022
@humblec humblec mentioned this pull request Jun 10, 2022
@nbalacha
Contributor

This PR assumes that the pod will have access to the "default" namespace in order to create the CSIStorageCapacity object, which is not always the case. Our pods run in a different namespace, and the "NAMESPACE" env variable contains the namespace in which the pod has permissions.
Shouldn't this create check be performed in the namespace from "NAMESPACE"?

@humblec
Contributor

humblec commented Jun 27, 2022

This PR assumes that the pod will have access to the "default" namespace in order to create the CSIStorageCapacity object, which is not always the case. Our pods run in a different namespace, and the "NAMESPACE" env variable contains the namespace in which the pod has permissions. Shouldn't this create check be performed in the namespace from "NAMESPACE"?

hmmmm https://github.com/kubernetes-csi/external-provisioner/pull/710/files#diff-963c2f5e2076b500e70c47c13f135cdbb29b81f270bc76d1c99f823fecd1b510R493 creates the capacity object in the default namespace..
Cc @pohly

@pohly
Contributor Author

pohly commented Jun 27, 2022

That particular call is meant to fail. If it failed for "default" namespace with a permission error, then the check would have been successful... if the permission error had been accepted as one of the allowed error outcomes. That's not currently the case.

I assume you get the unexpected error when checking for the v1 CSIStorageCapacity API?

Either using the configured namespace or adding the error code should solve this. Using the namespace is simpler.

@nbalacha: Can you try that fix and prepare a PR?

@nbalacha
Contributor

That particular call is meant to fail. If it failed for "default" namespace with a permission error, then the check would have been successful... if the permission error had been accepted as one of the allowed error outcomes. That's not currently the case.

I assume you get the unexpected error when checking for the v1 CSIStorageCapacity API?

Either using the configured namespace or adding the error code should solve this. Using the namespace is simpler.

@nbalacha: Can you try that fix and prepare a PR?

That is correct. I get
"F0625 23:37:41.981066 1 csi-provisioner.go:509] unexpected error when checking for the V1 CSIStorageCapacity API: csistoragecapacities.storage.k8s.io is forbidden: User "system:serviceaccount:openshift-storage:topolvm-controller" cannot create resource "csistoragecapacities" in API group "storage.k8s.io" in the namespace "default""

The call is expected to fail, but not with a permission issue.

I would expect the fix to be to use the configured ns instead of default.

createdCapacity, err := clientset.StorageV1().CSIStorageCapacities(namespace).Create(ctx, invalidCapacity, metav1.CreateOptions{})

If this is fine, I can send a fix.

@humblec
Contributor

humblec commented Jun 27, 2022

imo, the namespace fix looks reasonable. Also, this may hit more users, as most deployments run in namespaces other than default and RBAC is not available in the default ns. In that case, we have to do a minor release with the fix.

@pohly
Contributor Author

pohly commented Jun 27, 2022

I agree on the fix. It just would be good to know that it has been tested, because our CI obviously doesn't cover that case.

@nbalacha
Contributor

I agree on the fix. It just would be good to know that it has been tested, because our CI obviously doesn't cover that case.

#753

Thanks @pohly . I have verified that it works on my setup.
