Allow disabling controller/node service via command line flags #354
Conversation
Welcome @rfranzke!
Hi @rfranzke. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: rfranzke. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/assign @feiskyer
/ok-to-test
### `disk.csi.azure.com` driver services

Traditionally, you run the CSI controllers together with the Azure Disk driver in the same Kubernetes cluster.
However, there may be cases where you only want to run a subset of the available driver services (for example, one scenario is running the controllers outside of the cluster they serve, while the Azure Disk driver still runs inside the served cluster, but there may be other scenarios).
hi @rfranzke, thanks for the contribution.
Your scenario is running the controller outside of the cluster, so what about just modifying the following script and running csi-azuredisk-controller outside of the cluster?
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/deploy/csi-azuredisk-controller.yaml
Then you don't need to modify the CSI driver code.
Thanks @andyzhangx, sure, if that's possible, even better. What exactly do you suggest modifying?
So you have two k8s clusters? Just run this script in the outside cluster:
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/deploy/csi-azuredisk-controller.yaml
You may need to take care of the identity; it's currently an azure.json file:
value: "/etc/kubernetes/azure.json"
Here are the tips to set up that identity file: https://github.com/kubernetes-sigs/azuredisk-csi-driver#prerequisite
Yes, maybe a quick description of the setup: Cluster `A` runs the control plane (KCM, CCM, CSI controllers (provisioner, attacher, etc.)) for cluster `B`. In cluster `B`, only the node part of the CSI driver shall run (driver w/ node service, driver-registrar, liveness-probe). Hence, I will basically end up with two instances of the Azure Disk CSI plugin responsible for cluster `B`: the controller service runs outside, in cluster `A`, and the node service runs inside, in cluster `B`.
I tried what you suggested, but then the CSI driver in `A` also starts the unused node service, and the CSI driver in `B` also starts the unused controller service (here, it's probably even more dangerous, because now two controller services are running and might conflict with each other (?)).
WDYT?
@rfranzke actually, that's the design goal of the CSI driver: one CSI driver plugin can be used for both the controller and the node service, which makes deployment easy. There won't be a conflict, since the controller deployment contains the `csi-provisioner` sidecar container, and the node deployment contains the `csi-node-driver-registrar` sidecar container.
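For context, a minimal sketch of how a single plugin binary serves the identity, controller, and node services on one gRPC endpoint, which is why the same image can back both deployments. This is illustrative only (not this repository's actual wiring) and assumes a CSI spec Go binding version that ships the `Unimplemented*Server` embeddings; the socket path and type names are made up for the example.

```go
package main

import (
	"net"

	csi "github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// driver is an illustrative stand-in for a CSI plugin; the embedded
// Unimplemented*Server types satisfy the three CSI service interfaces.
type driver struct {
	csi.UnimplementedIdentityServer
	csi.UnimplementedControllerServer
	csi.UnimplementedNodeServer
}

func main() {
	// One endpoint, all three services. The controller Deployment pairs this
	// binary with csi-provisioner/csi-attacher sidecars, while the node
	// DaemonSet pairs it with csi-node-driver-registrar, so the two
	// deployments don't step on each other.
	listener, err := net.Listen("unix", "/tmp/csi.sock")
	if err != nil {
		panic(err)
	}
	server := grpc.NewServer()
	d := &driver{}
	csi.RegisterIdentityServer(server, d)
	csi.RegisterControllerServer(server, d)
	csi.RegisterNodeServer(server, d)
	if err := server.Serve(listener); err != nil {
		panic(err)
	}
}
```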
@andyzhangx I tried it again with v0.6.0 and it indeed already works out-of-the-box for Azure Disk CSI. Probably I misconfigured something yesterday. So, we don't need the code changes for now. I still think that allowing this optional separation makes sense and would also help ensure that future changes don't break this scenario. However, if you don't think it's a valuable improvement right now, I'll close this PR for now.
Hm, just now the CSI driver running inside the cluster got a `nil` pointer dereference:
I0409 09:34:34.984567 1 utils.go:113] GRPC request: volume_id:"/subscriptions/0abd6c55-1c19-4eab-b070-913dec9f2ab0/resourceGroups/my-resource-group/providers/Microsoft.Compute/disks/pv-1317b820-7092-4da1-a83e-d5312e351d45" volume_path:"/var/lib/kubelet/pods/ef50f85b-8b81-4c5a-880e-be9132baf7dc/volumes/kubernetes.io~csi/pv-1317b820-7092-4da1-a83e-d5312e351d45/mount"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x126ea8f]
goroutine 80 [running]:
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).checkDiskExists(0xc00011d4d0, 0x17f74c0, 0xc0002d9860, 0xc000192000, 0xba, 0x469702, 0x1)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk/azuredisk.go:178 +0xcf
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).NodeGetVolumeStats(0xc00011d4d0, 0x17f74c0, 0xc0002d9860, 0xc000336320, 0xc00011d4d0, 0x22e44a2, 0xc00011aa80)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk/nodeserver.go:331 +0x25e
sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetVolumeStats_Handler.func1(0x17f74c0, 0xc0002d9860, 0x151ba00, 0xc000336320, 0x1, 0x1, 0x17d, 0x18)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5630 +0x86
sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common.logGRPC(0x17f74c0, 0xc0002d9860, 0x151ba00, 0xc000336320, 0xc0002f23c0, 0xc0002f23e0, 0xc00001fb30, 0x48b578, 0x14d8960, 0xc0002d9860)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common/utils.go:115 +0x1ed
sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetVolumeStats_Handler(0x159c8a0, 0xc00011d4d0, 0x17f74c0, 0xc0002d9860, 0xc0000b6ea0, 0x167d9b8, 0x17f74c0, 0xc0002d9860, 0xc0000c89a0, 0x159)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5632 +0x14b
sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc000310000, 0x1813b80, 0xc000310900, 0xc0000fa000, 0xc0001418f0, 0x2365b00, 0x0, 0x0, 0x0)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:1024 +0x4f4
sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).handleStream(0xc000310000, 0x1813b80, 0xc000310900, 0xc0000fa000, 0x0)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:1313 +0xd97
sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0000a2900, 0xc000310000, 0x1813b80, 0xc000310900, 0xc0000fa000)
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:722 +0xbb
created by sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:720 +0xa1
Do you think it's related to my setup?
Thanks for the test. About the cloud's metadata service: in the controller, we already disabled it:
azuredisk-csi-driver/pkg/azuredisk/controllerserver.go (lines 405 to 407 in 7948c4e):
// Disable UseInstanceMetadata to mitigate a timeout issue from IMDS
// https://github.com/kubernetes-sigs/azuredisk-csi-driver/issues/168
d.cloud.Config.UseInstanceMetadata = false
I am ok to close this PR now since it already works with your testing, thanks.
@rfranzke the error could be due to disksClient being nil:
azuredisk-csi-driver/pkg/azuredisk/azuredisk.go (lines 178 to 179 in 22f446b):
if _, rerr := d.cloud.DisksClient.Get(ctx, resourceGroup, diskName); rerr != nil {
    return rerr.Error()
How did you set the credentials? Read-from-secret only works on the master branch right now.
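To make that failure mode concrete, here is a small self-contained sketch (stand-in types only, not the repository's actual code) of why a missing or unreadable credentials file can lead to exactly this kind of panic, and how a nil guard would surface it as an error instead:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// disksClient stands in for the Azure DisksClient of the cloud provider;
// it stays nil when the cloud configuration / credentials were never loaded.
type disksClient struct{}

func (c *disksClient) Get(ctx context.Context, resourceGroup, diskName string) error {
	return nil // a real client would call the Azure API here
}

type cloudProvider struct {
	DisksClient *disksClient
}

type driver struct {
	cloud *cloudProvider
}

// checkDiskExists mirrors the failing call path: in the real driver, calling
// Get on an uninitialized client dereferences a nil pointer and the whole
// plugin process panics, as in the stack trace above; the guard below turns
// that into a descriptive error.
func (d *driver) checkDiskExists(ctx context.Context, resourceGroup, diskName string) error {
	if d.cloud == nil || d.cloud.DisksClient == nil {
		return errors.New("azure cloud provider not initialized; check how the credentials are provided")
	}
	return d.cloud.DisksClient.Get(ctx, resourceGroup, diskName)
}

func main() {
	d := &driver{cloud: &cloudProvider{DisksClient: nil}} // simulates missing credentials
	fmt.Println(d.checkDiskExists(context.Background(), "my-resource-group", "pv-1317b820"))
}
```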
/close as discussed in #354 (review)
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR adds two new command line flags, `--run-controller-service` and `--run-node-service` (both defaulting to `true`), which allow disabling individual services if they are not required. This enables the use-case of running the CSI controllers (csi-provisioner, csi-attacher, etc., plus the driver's controller service) separately, outside of the cluster they are serving. The disk plugin (the node service part of the CSI driver) still runs inside the cluster.
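A rough sketch of what these flags amount to (illustrative only; the actual implementation in this PR may wire things differently):

```go
package main

import (
	"flag"
	"fmt"
)

var (
	// Flags as described above; both default to true so the existing
	// combined deployment keeps working unchanged.
	runControllerService = flag.Bool("run-controller-service", true, "run the CSI controller service")
	runNodeService       = flag.Bool("run-node-service", true, "run the CSI node service")
)

func main() {
	flag.Parse()

	// In the split setup discussed above, the controller deployment in
	// cluster A would pass --run-node-service=false, and the node DaemonSet
	// in cluster B would pass --run-controller-service=false.
	if *runControllerService {
		fmt.Println("registering CSI controller service")
	}
	if *runNodeService {
		fmt.Println("registering CSI node service")
	}
}
```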
Release note: