Add support to only run selected CSI services #2316

Contributor

@NotTheEvilOne NotTheEvilOne commented Aug 5, 2023

What this PR does / why we need it:
This PR adds support to both cinder-csi-plugin and manila-csi-plugin to execute the controller and node services independently.

This enables use cases where the CSI driver controller deployment does not run on an OpenStack cloud, e.g. in a multi-cloud environment. It also decouples node ID discovery (from a given value or from the metadata server) from the controller service, which makes workarounds like --nodeid=fake-id unnecessary. The node ID parameter is therefore no longer required.

Which issue this PR fixes (if applicable):
Fixes #2523

Release note:

Add support to cinder-csi-plugin and manila-csi-plugin to only run selected services
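
A minimal sketch of how such service selection could look on the command line, assuming the boolean flags are spelled --provide-controller-service and --provide-node-service (the discussion only shows the pattern --provide-*-service). It uses Go's standard flag package rather than the plugins' actual command-line setup, and serviceOptions, parseServiceOptions, the defaults, and the rule that at least one service must be selected are all hypothetical.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// serviceOptions holds the hypothetical service-selection settings.
type serviceOptions struct {
	ProvideControllerService bool
	ProvideNodeService       bool
	NodeID                   string
}

// parseServiceOptions parses args with the standard flag package.
// Flag names and defaults are assumptions made for illustration.
func parseServiceOptions(args []string) (serviceOptions, error) {
	var opts serviceOptions
	fs := flag.NewFlagSet("csi-plugin", flag.ContinueOnError)
	fs.BoolVar(&opts.ProvideControllerService, "provide-controller-service", true,
		"register the CSI controller service")
	fs.BoolVar(&opts.ProvideNodeService, "provide-node-service", true,
		"register the CSI node service")
	fs.StringVar(&opts.NodeID, "nodeid", "",
		"node ID; only needed when the node service is provided")
	if err := fs.Parse(args); err != nil {
		return opts, err
	}
	// Assumed rule: a plugin that provides no service at all is a misconfiguration.
	if !opts.ProvideControllerService && !opts.ProvideNodeService {
		return opts, errors.New("at least one CSI service must be provided")
	}
	return opts, nil
}

func main() {
	// Controller-only deployment: no node ID and no --nodeid=fake-id workaround.
	opts, err := parseServiceOptions([]string{"--provide-node-service=false"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("controller=%t node=%t nodeid=%q\n",
		opts.ProvideControllerService, opts.ProvideNodeService, opts.NodeID)
}
```

In this sketch both services default to enabled, so a deployment that passes no new flags would behave as before; whether the real plugins choose the same defaults is not stated here.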

@k8s-ci-robot k8s-ci-robot added the do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. label Aug 5, 2023
@linux-foundation-easycla

linux-foundation-easycla bot commented Aug 5, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Aug 5, 2023
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 5, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 5, 2023
@dulek
Contributor

dulek commented Aug 7, 2023

/ok-to-test

I understand the intent here, but it feels smelly to me. Shouldn't we just provide 2 binaries instead? What's the point of running 2 services in a single container anyway?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 7, 2023
@NotTheEvilOne
Contributor Author

Thanks for your valuable questions. I'm sorry to come back to the topic late as I've been on holiday. I both agree and disagree with your idea of two separate binaries.

Looking at the Go code:

  • Splitting both CSI driver implementations would require breaking changes (including, of course, the different binary names), and a large amount of boilerplate code (OpenStack client initialization) would still be shared (copy and paste) by the main entry points of the controller and node binaries. Furthermore, it's common to run both the controller and the node service on the same k8s node, so UNIX sockets may become another issue.
  • The implementations for GCP [1] and AWS [2] provide similar command-line parameters to select what the CSI driver should offer.

For that reason, the one-binary approach seems to me to be the least invasive implementation for the cinder and manila CSI drivers.

[1] https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/96226fcfe521d0f295c2d258e0c37954e93cfcc9/cmd/gce-pd-csi-driver/main.go#L41
[2] https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/a85fb6358eae7b83a083eb8003cf929b3f31d413/charts/aws-ebs-csi-driver/templates/controller.yaml#L72

@dulek
Contributor

dulek commented Aug 22, 2023

Alright, this is a solid argument that other CSI drivers do a similar thing. Could we try to maintain some kind of consistency and rename --provide-*-service to --run-*-service then?

@NotTheEvilOne
Contributor Author

This consistency does not exist yet. In fact, even another Google CSI driver (gcp-filestore-csi-driver) names it differently (only controller and node, as bools), and the Alibaba Cloud CSI driver implements it as "run-as-controller".

I have the feeling that "run" is the wrong word, as it implies that another process or service is spawned, which is not the case. Only the set of registered services changes based on the bool parameters; that's why I named it "provide". I'll change it on request, of course :). Just let me know.
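
To illustrate the point about "provide" versus "run", here is a rough sketch in which one process does its shared setup once and the two booleans only decide which services get registered. Everything in it (sharedClient, driver, buildDriver, the service names) is hypothetical and not taken from the plugins' code; the always-present identity entry reflects the CSI specification's requirement that both controller and node plugins implement the Identity service.

```go
package main

import (
	"fmt"
	"sort"
)

// sharedClient stands in for the OpenStack client setup both services need.
type sharedClient struct{ cloud string }

// driver collects the services that a single process registers on its endpoint.
type driver struct {
	client   *sharedClient
	services map[string]bool
}

// newDriver performs the shared initialization exactly once.
func newDriver(cloud string) *driver {
	return &driver{
		client:   &sharedClient{cloud: cloud},
		services: map[string]bool{"identity": true},
	}
}

func (d *driver) setupControllerService() { d.services["controller"] = true }
func (d *driver) setupNodeService()       { d.services["node"] = true }

// registered returns the registered service names in sorted order.
func (d *driver) registered() []string {
	names := make([]string, 0, len(d.services))
	for name := range d.services {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// buildDriver registers only the selected services; no extra process is spawned.
func buildDriver(provideController, provideNode bool) *driver {
	d := newDriver("example-cloud")
	if provideController {
		d.setupControllerService()
	}
	if provideNode {
		d.setupNodeService()
	}
	return d
}

func main() {
	fmt.Println(buildDriver(true, false).registered()) // [controller identity]
	fmt.Println(buildDriver(false, true).registered()) // [identity node]
}
```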

Some tests currently fail; I'll need to check whether proper tests can be implemented for the newly introduced methods.

@dulek
Contributor

dulek commented Aug 24, 2023

/lgtm

Please take care of the formal requirements: add a release note and sign the CLA.

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Aug 24, 2023
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 23, 2023
@NotTheEvilOne NotTheEvilOne force-pushed the pr/csi-split-controller-and-node-server branch from fb21909 to a95a7bf Compare January 17, 2024 08:41
@k8s-ci-robot k8s-ci-robot removed lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 17, 2024
@NotTheEvilOne
Contributor Author

Hi,

I rebased the changes to be compatible with the current state again. I would highly appreciate your support to get this merged :)

Contributor

@dulek dulek left a comment

Looks pretty good to me, question inline.

Comment on lines -109 to -111
if err := cmd.MarkPersistentFlagRequired("nodeid"); err != nil {
	klog.Fatalf("Unable to mark flag nodeid to be required: %v", err)
}
Contributor

Should we still do it when provideNodeService is true?

Contributor Author

There's a new validation in Driver.SetupNodeService() ensuring that the nodeid is not empty if required.

Marking it as required sounds wrong if provideNodeService is false, and marking it as required only if provideNodeService is true seems odd. Perhaps the description of the persistent flag could be extended to note that it is required for the node service?
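
As a hedged illustration of the validation described above, the sketch below rejects an empty node ID only at the point where the node service is set up. The Driver type, its fields, and the SetupNodeService signature are assumptions made for this example and may differ from the code in the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// Driver is a stand-in type for this example, not the plugin's real driver.
type Driver struct {
	nodeID           string
	nodeServiceReady bool
}

// SetupNodeService validates the node ID before enabling the node service.
// The signature is hypothetical; the check mirrors the idea that the node ID
// is only mandatory when the node service is actually provided.
func (d *Driver) SetupNodeService(nodeID string) error {
	if nodeID == "" {
		return errors.New("node ID must not be empty when the node service is provided")
	}
	d.nodeID = nodeID
	d.nodeServiceReady = true
	return nil
}

func main() {
	d := &Driver{}
	// A controller-only run never calls SetupNodeService, so no node ID is needed.
	if err := d.SetupNodeService(""); err != nil {
		fmt.Println("setup failed:", err)
	}
	if err := d.SetupNodeService("node-1"); err == nil {
		fmt.Println("node service ready for", d.nodeID)
	}
}
```

Checking at setup time rather than at flag-parsing time keeps the flag optional for controller-only deployments while still failing early for node deployments.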

Contributor

Yeah, true, we could update the description, otherwise looks good. Please update that, I'll do /approve anyway.

Contributor

Looks like changes in this (and other) file are caused by your IDE formatting the code. It's fine, but it makes reviewing more difficult.

Contributor Author

Sorry for that. I think the last commit got a bit mixed up as well when I rebased to HEAD. Let me know if I should restructure the commits :)

@dulek
Contributor

dulek commented Jan 18, 2024

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dulek


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 18, 2024
@dulek
Contributor

dulek commented Jan 19, 2024

@jichenjc, @kayrus, @zetaab: Hi! Can you give your outlook here?

@jichenjc
Contributor

@NotTheEvilOne can you help create an issue to track this change? We may also need to update the docs to reflect it.

@NotTheEvilOne
Contributor Author

I opened issue #2523 for this PR. Should I add the documentation to this PR or to a separate one, so that this one can be merged if no further issues exist?

@dulek
Contributor

dulek commented Jan 23, 2024

We would probably prefer updating the docs in this PR. It can be done in a new commit.

This commit updates the documentation for the CSI controller and node service providing parameters.
@dulek
Contributor

dulek commented Jan 23, 2024

Docs look good to me, leaving the final approval to others.

@dulek
Contributor

dulek commented Jan 23, 2024

These CI errors look unrelated.

/retest

@dulek
Contributor

dulek commented Jan 24, 2024

/retest

@NotTheEvilOne
Contributor Author

/retest

@NotTheEvilOne
Contributor Author

/retest

@dulek
Contributor

dulek commented Jan 29, 2024

#2529 should fix the CIs.

@dulek
Contributor

dulek commented Jan 30, 2024

/retest

@NotTheEvilOne
Contributor Author

Looks like we finally made it :) Do you think we can merge it then? :)

@dulek
Contributor

dulek commented Feb 2, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 2, 2024
@k8s-ci-robot k8s-ci-robot merged commit 7f1daa8 into kubernetes:master Feb 2, 2024
8 checks passed
Successfully merging this pull request may close these issues.

Support only running selected CSI services for the cinder-csi-plugin and manila-csi-plugin
4 participants