Allow disabling controller/node service via command line flags #354

Closed
wants to merge 1 commit

Conversation


@rfranzke rfranzke commented Apr 9, 2020

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR adds two new command line flags, --run-controller-service and --run-node-service (both defaulting to true), which allow individual services to be disabled when they are not required.

This enables the use case of running the CSI controllers (csi-provisioner, csi-attacher, etc., plus the driver's controller service) separately, outside of the cluster they are serving. The disk plugin (the node service part of the CSI driver) still runs inside the cluster.

Release note:

It is now possible to disable the controller service by setting `--run-controller-service=false`. Similarly, it is possible to disable the node service by setting `--run-node-service=false`. The latter enables running the controller server of the Azure Disk driver separately/outside of the cluster it is serving.
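
As a rough illustration (not the actual patch), two such boolean flags could be wired up with Go's standard flag package. The flag names below follow the PR description; the package layout and logging are only a sketch:

```go
package main

import (
	"flag"
	"log"
)

var (
	// Flag names follow the PR description; the defaults of true preserve
	// today's behavior of running both services.
	runControllerService = flag.Bool("run-controller-service", true,
		"start the CSI controller service")
	runNodeService = flag.Bool("run-node-service", true,
		"start the CSI node service")
)

func main() {
	flag.Parse()
	// A real driver would pass these booleans into its gRPC server setup so
	// that only the requested services get registered.
	log.Printf("controller service enabled: %v, node service enabled: %v",
		*runControllerService, *runNodeService)
}
```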

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 9, 2020
@k8s-ci-robot
Contributor

Welcome @rfranzke!

It looks like this is your first PR to kubernetes-sigs/azuredisk-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/azuredisk-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @rfranzke. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 9, 2020
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rfranzke
To complete the pull request process, please assign feiskyer
You can assign the PR to them by writing /assign @feiskyer in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rfranzke
Author

rfranzke commented Apr 9, 2020

/assign @feiskyer

@andyzhangx
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 9, 2020
### `disk.csi.azure.com` driver services

Traditionally, the CSI controllers and the Azure Disk driver run in the same Kubernetes cluster.
However, there may be cases where you only want to run a subset of the available driver services. One scenario is running the controllers outside of the cluster they are serving, while the node part of the Azure Disk driver still runs inside the served cluster; other scenarios are possible as well.
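
For illustration only (this is neither part of this PR nor the driver's actual csi-common code, and the function and parameter names are made up), running a subset of the driver services could look like conditionally registering the CSI services on the gRPC server:

```go
package driver

import (
	"net"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// serve registers only the requested CSI services on a gRPC server.
// ids, cs and ns stand in for the driver's identity, controller and node
// server implementations; runController/runNode would come from the
// proposed command line flags.
func serve(lis net.Listener, ids csi.IdentityServer, cs csi.ControllerServer,
	ns csi.NodeServer, runController, runNode bool) error {
	server := grpc.NewServer()

	// The identity service is always registered so that probes and
	// GetPluginInfo keep working regardless of which services are enabled.
	csi.RegisterIdentityServer(server, ids)

	if runController {
		csi.RegisterControllerServer(server, cs)
	}
	if runNode {
		csi.RegisterNodeServer(server, ns)
	}
	return server.Serve(lis)
}
```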

Member

Hi @rfranzke, thanks for the contribution.
Your scenario is running the controller outside of the cluster, so what about just modifying the following script and running csi-azuredisk-controller outside of the cluster?
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/deploy/csi-azuredisk-controller.yaml
Then you don't need to modify the CSI driver code.

Author

Thanks @andyzhangx, sure, if that's possible, even better. What exactly do you suggest modifying?

Member

So you have two k8s clusters? Just run this script in the outside cluster:
https://github.com/kubernetes-sigs/azuredisk-csi-driver/blob/master/deploy/csi-azuredisk-controller.yaml

You may need to take care of the identity; it's currently provided as an azure.json file:

value: "/etc/kubernetes/azure.json"

Here are the tips for setting up that identity file: https://github.com/kubernetes-sigs/azuredisk-csi-driver#prerequisite

Author

Yes, maybe a quick description of the setup: Cluster A runs the control plane (KCM, CCM, and the CSI controllers (provisioner, attacher, etc.)) for cluster B. In cluster B, only the node part of the CSI driver shall run (driver with node service, driver-registrar, liveness-probe). Hence, I basically end up with two instances of the Azure Disk CSI plugin responsible for cluster B: the controller service runs outside in cluster A, and the node service runs inside in cluster B.

I tried what you suggested, but then the CSI driver in A also starts the unused node service, and the CSI driver in B also starts the unused controller service (the latter is probably even more dangerous, because two controller services are then running and might conflict with each other (?)).

WDYT?

Member

@rfranzke actually that's the design goal of the CSI driver: one CSI driver plugin can be used for both the controller and the node service, which makes deployment easy. There won't be a conflict with that, since the controller deployment has the csi-provisioner sidecar container and the node deployment has the csi-node-driver-registrar sidecar container.

Author

@andyzhangx I tried it again with v0.6.0 and it indeed already works out of the box for Azure Disk CSI. Probably I misconfigured something yesterday, so we don't need the code changes for now. I still think that allowing this optional separation makes sense and helps ensure that future changes don't break this scenario. Though, if you don't think it's a valuable improvement right now, I'll close this PR for now.

Author

Hm, just now the CSI driver running inside the cluster got a nil pointer exception:

I0409 09:34:34.984567       1 utils.go:113] GRPC request: volume_id:"/subscriptions/0abd6c55-1c19-4eab-b070-913dec9f2ab0/resourceGroups/my-resource-group/providers/Microsoft.Compute/disks/pv-1317b820-7092-4da1-a83e-d5312e351d45" volume_path:"/var/lib/kubelet/pods/ef50f85b-8b81-4c5a-880e-be9132baf7dc/volumes/kubernetes.io~csi/pv-1317b820-7092-4da1-a83e-d5312e351d45/mount"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x126ea8f]

goroutine 80 [running]:
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).checkDiskExists(0xc00011d4d0, 0x17f74c0, 0xc0002d9860, 0xc000192000, 0xba, 0x469702, 0x1)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk/azuredisk.go:178 +0xcf
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).NodeGetVolumeStats(0xc00011d4d0, 0x17f74c0, 0xc0002d9860, 0xc000336320, 0xc00011d4d0, 0x22e44a2, 0xc00011aa80)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk/nodeserver.go:331 +0x25e
sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetVolumeStats_Handler.func1(0x17f74c0, 0xc0002d9860, 0x151ba00, 0xc000336320, 0x1, 0x1, 0x17d, 0x18)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5630 +0x86
sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common.logGRPC(0x17f74c0, 0xc0002d9860, 0x151ba00, 0xc000336320, 0xc0002f23c0, 0xc0002f23e0, 0xc00001fb30, 0x48b578, 0x14d8960, 0xc0002d9860)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common/utils.go:115 +0x1ed
sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetVolumeStats_Handler(0x159c8a0, 0xc00011d4d0, 0x17f74c0, 0xc0002d9860, 0xc0000b6ea0, 0x167d9b8, 0x17f74c0, 0xc0002d9860, 0xc0000c89a0, 0x159)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5632 +0x14b
sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).processUnaryRPC(0xc000310000, 0x1813b80, 0xc000310900, 0xc0000fa000, 0xc0001418f0, 0x2365b00, 0x0, 0x0, 0x0)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:1024 +0x4f4
sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).handleStream(0xc000310000, 0x1813b80, 0xc000310900, 0xc0000fa000, 0x0)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:1313 +0xd97
sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0000a2900, 0xc000310000, 0x1813b80, 0xc000310900, 0xc0000fa000)
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:722 +0xbb
created by sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc.(*Server).serveStreams.func1
	/root/go/src/sigs.k8s.io/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:720 +0xa1

Do you think it's related to my setup?

Member

Thanks for the test. About the cloud's instance metadata service: in the controller we have already disabled it,

// Disable UseInstanceMetadata to mitigate a timeout issue from IMDS
// https://github.com/kubernetes-sigs/azuredisk-csi-driver/issues/168
d.cloud.Config.UseInstanceMetadata = false

so is that the reason why the Azure CSI driver works?

Member

I am ok to close this PR now since it already works with your testing, thanks.

Member

@rfranzke the error could be due to DisksClient being nil:

if _, rerr := d.cloud.DisksClient.Get(ctx, resourceGroup, diskName); rerr != nil {
	return rerr.Error()
}

How did you set the credentials? Reading them from a secret only works on the master branch right now.
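
For reference, here is a minimal sketch of the kind of guard that would turn such a nil DisksClient into a readable gRPC error instead of a panic. The struct definitions are stand-ins for the driver's real types; this is only an illustration, not a fix proposed in this PR:

```go
package driver

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Minimal stand-ins for the real Driver and cloud provider structs; only the
// fields needed for the nil check are sketched here.
type cloudProvider struct {
	DisksClient interface{}
}

type Driver struct {
	cloud *cloudProvider
}

// validateDisksClient returns an error when the cloud provider or its disks
// client has not been initialized, so callers such as checkDiskExists can
// fail gracefully instead of dereferencing a nil pointer.
func (d *Driver) validateDisksClient() error {
	if d.cloud == nil || d.cloud.DisksClient == nil {
		return status.Error(codes.Internal, "cloud provider not initialized: DisksClient is nil")
	}
	return nil
}
```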

@rfranzke
Author

rfranzke commented Apr 9, 2020

/close as discussed in #354 (review)

@rfranzke rfranzke closed this Apr 9, 2020
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.