Adding new volume plugin for Rook #46843
Conversation
Hi @kokhang. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: kokhang. Assign the PR to suggested approvers by writing /assign in a comment when ready. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files.
You can indicate your approval by writing /approve in a comment.
minor nitpicks.
api/openapi-spec/swagger.json (Outdated)

@@ -2,7 +2,7 @@
   "swagger": "2.0",
   "info": {
     "title": "Kubernetes",
-    "version": "v1.7.0"
+    "version": "v1.5.0"
umm. :-) (not a great start on the review).
Oops. I don't know how that happened. I just ran the script under /hack. I'll re-run it.
api/openapi-spec/swagger.json (Outdated)

        }
      },
      "readOnly": {
        "description": "ReadOnly here will force the ReadOnly setting in VolumeMounts. Defaults to false.",
ReadOnly vs readOnly in the description; I'd think the first instance should be lowercased like the key.
This file is generated. Not sure if I'm able to change this key.
@kokhang, would you mind splitting this PR into two commits, one with the "real" changes and a second one with the generated stuff? Right now they're mixed together and it's hard to spot an error. In addition, I don't like the use of TPRs in Kubernetes code; they're obsoleted by CustomResourceDefinitions.
@kokhang Do you know that TPRs are going away? kubernetes/enhancements#95
pkg/volume/rook/attacher.go (Outdated)

	}

	// Create a VolumeAttach TPR instance
	tprName := generateTPRName(volumeSource.Cluster, volumeSource.VolumeGroup, volumeSource.VolumeID, string(nodeName))
A TPR-based volume plugin doesn't seem appropriate in-tree, for several reasons:
- TPRs are being deprecated in favor of CustomResourceDefinitions
- the core depending on extension APIs seems like a layer violation. I'd expect something that wanted to manage attachment data this way to be added as a flex volume plugin.
- this exposes attachment data for PVs in namespaces based on user input in a pod spec, which would allow cross-namespace access.
@kokhang is out on vacation, so to keep the momentum going on this PR, @travisn and I will be helping out. We would also like to propose this change for 1.8-alpha so we have a bit more time to get it right and factor in all your feedback. @travisn will be splitting out the generated code commit to make this PR more readable.

Regarding TPR vs. CRD: we're happy to change to using CRDs. We'll address that in a follow-up commit once all the CRD changes are merged.

Regarding the layer violation: we agree that TPRs and CRDs are extension APIs. We'd argue that volume plugins are vendor-specific extensions too that happen to live in-tree today. It would seem equally odd if we were to create a first-class API object for volume attachments that only Rook uses. TPR/CRD seemed like a reasonable alternative: it enables the Rook operator to remain declarative and exposes volume attachments to admins (for example, via kubectl).

Regarding flex volumes: we don't think it is a good experience for Rook users to install software at the host/kubelet level in order to use Rook. On certain cloud providers this requires creating new images, which is a high burden to adoption. Unfortunately, the only way around that today is to be an in-tree plugin, as is the case with other plugins. More context here rook/rook#432 (comment) if you're interested.

Regarding exposing attachment data for PVs in namespaces -- good catch. It's not our intent that a pod spec could leak volume attachment info into an arbitrary namespace. We're going to change it so that the namespace used for the volume attachment TPR comes unconditionally from the StorageClass (volumeSpec -> PV -> StorageClass -> clusterNamespace) or falls back to a default namespace.
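For concreteness, the namespace resolution described in the last paragraph might look something like the hedged Go sketch below. The helper name, the `clusterNamespace` parameter key, and the `rook` fallback are illustrative assumptions, not the PR's actual code.

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// defaultClusterNamespace is a hypothetical fallback; the PR text does not
// confirm the actual default value.
const defaultClusterNamespace = "rook"

// getClusterNamespace resolves the namespace for the volume-attachment
// object from the StorageClass that provisioned the PV -- never from user
// input in the pod spec, which is the cross-namespace leak being avoided.
func getClusterNamespace(ctx context.Context, client kubernetes.Interface, pv *v1.PersistentVolume) (string, error) {
	scName := pv.Spec.StorageClassName
	if scName == "" {
		return defaultClusterNamespace, nil
	}
	sc, err := client.StorageV1().StorageClasses().Get(ctx, scName, metav1.GetOptions{})
	if err != nil {
		return "", err
	}
	if ns, ok := sc.Parameters["clusterNamespace"]; ok { // assumed parameter key
		return ns, nil
	}
	return defaultClusterNamespace, nil
}
```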
@liggitt @luxas @bassam @travisn: Even though TPRs are being deprecated, the REST API is the same as for CRDs. In this plugin, I'm using a REST client to access these "TPR" resources. https://github.com/kubernetes/kubernetes.github.io/blob/release-1.7/docs/tasks/access-kubernetes-api/extend-api-custom-resource-definitions.md#creating-a-customresourcedefinition
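As a rough illustration of that approach, here is a hedged client-go sketch that creates such a custom resource through a plain REST client. The group/version ("rook.io/v1alpha1"), resource name ("volumeattachments"), and the VolumeAttach shape are assumptions for illustration, not the PR's actual code.

```go
package sketch

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
)

// VolumeAttach mirrors the custom resource's JSON shape (illustrative).
type VolumeAttach struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Node              string `json:"node"`
	VolumeID          string `json:"volumeID"`
}

// createVolumeAttach records the desired attachment by POSTing the object:
// POST /apis/rook.io/v1alpha1/namespaces/<ns>/volumeattachments
func createVolumeAttach(cfg *rest.Config, ns string, va *VolumeAttach) error {
	cfg.GroupVersion = &schema.GroupVersion{Group: "rook.io", Version: "v1alpha1"} // assumed group/version
	cfg.APIPath = "/apis"
	cfg.NegotiatedSerializer = scheme.Codecs.WithoutConversion()
	client, err := rest.RESTClientFor(cfg)
	if err != nil {
		return err
	}
	body, err := json.Marshal(va)
	if err != nil {
		return err
	}
	return client.Post().
		Namespace(ns).
		Resource("volumeattachments").
		Body(body).
		Do(context.TODO()).
		Error()
}
```

The same call works whether the resource is backed by a TPR or a CRD, which is the point being made above: only the registration mechanism differs, not the REST API.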
(force-pushed from c73a621 to 6f73aad)
Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). It may take a couple of minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This is a very interesting use of TPRs.

I'm going to side with @liggitt, @luxas, and @jsafrane on this. I don't think a first-party, in-tree volume plugin should be creating TPRs. I understand the argument that you can provide a better user experience (and maybe we should even consider an attachment object in the core API), but if you'd still like to do this, I'd like to see stronger justification.

I suggest having the in-tree volume plugin make blocking calls to the Rook control plane for attach/detach; once CSI is mature and this volume plugin moves out-of-tree, it can revert to the original proposed declarative model (there is nothing wrong with an external component creating TPRs/CRDs).
@saad-ali thanks for the feedback.
One supporting point that I left out: the Rook operator is currently designed like other K8s controllers -- it watches API objects and reconciles desired and actual state. It does not currently listen on an API endpoint. We followed the design principles of other controllers and made it level-triggered (see the architectural diagram below). While we can certainly add an edge-triggered RESTful API for attach/detach, the implementation would likely turn around and create the TPRs/CRDs anyway. It would also mean that the plugin would need to discover the new Rook API endpoints -- in its current state the Rook plugin only communicates with the K8s API. This seems like an unnecessary complication, especially when all of this is transitional and will move out-of-tree when CSI matures.

Let us know if this discussion helps our case. We're committed to getting this work into the 1.8 release either way.
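For readers unfamiliar with the pattern, here is a minimal hedged Go sketch of such a level-triggered reconcile loop. A real operator would use informers/watches rather than polling, and every name here (VolumeAttach, listDesired, isAttached, attach) is illustrative, not Rook's actual code.

```go
package sketch

import (
	"context"
	"log"
	"time"
)

// VolumeAttach is an illustrative stand-in for the desired-state object.
type VolumeAttach struct {
	VolumeID string
	Node     string
}

// reconcileLoop repeatedly compares desired state (attachment objects in
// the API) against actual state and converges them -- level-triggered, so
// missed events are harmless: the next pass sees the same desired state.
func reconcileLoop(ctx context.Context,
	listDesired func() ([]VolumeAttach, error),
	isAttached func(VolumeAttach) bool,
	attach func(VolumeAttach) error) {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			desired, err := listDesired()
			if err != nil {
				log.Printf("list failed: %v", err)
				continue
			}
			for _, va := range desired {
				if !isAttached(va) { // actual state lags desired state
					if err := attach(va); err != nil {
						log.Printf("attach %s on %s failed: %v", va.VolumeID, va.Node, err)
					}
				}
			}
		}
	}
}
```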
This has come up a few times, but I really think that the Kubernetes extension APIs, particularly CRDs, are something that plugin controllers should be able to rely upon. Inevitably these plugins require external configuration (e.g. auth, IPs, etc.) or storage of reconciliation state that is being reconciled on the target system (e.g. async storage requests). I feel that CRDs are exactly the place controllers should be storing this stuff -- and doubly so as we think about the architecture of these plugins moving out of tree. CRDs are built into the core API server and persist to the exact same data store as the rest of the API. So what is the layering issue? That this controller is making requests to the API server based on a user-initiated request?
- The rook cluster namespace is where VolumeAttach TPRs are created
- Removing volume source metadata since it is not needed
- Getting devicePath from TPR
(force-pushed from 6f73aad to 7478822)
@saad-ali @smarterclayton @liggitt I value the discussion and feedback from the community. At this point the pushback seems to be in the "preference" or "historically different" category. Given the impact on the user experience (see the YAML above), we would like to proceed with the current design. We don't want our users to manage endpoints and secrets when they don't need to. We hope that this approach can even influence other plugins and CSI (which we are participating in). Would it be helpful to get on a hangout call to discuss? Please advise on next steps.
CRDs are available, but the custom API used by the volume plugin is not. A core API depending on a custom API is the issue.
I disagree. The mismatch is an in-tree volume plugin that requires the existence of an API in the Kubernetes server that is not actually defined by all Kubernetes servers.
That's true for all plugins today. For example, the pv-controller (in-tree, core API) calls the EBS plugin (in-tree), which in turn calls the custom AWS API (out-of-tree) to create/attach a volume.
It's a plugin -- i.e., optional. A user wanting to use this plugin expects that any external dependencies (custom API, etc.) need to be made available. This is true regardless of whether the plugin goes directly to the custom API or indirectly through CRDs. It's also true for add-ons like DNS. @liggitt, if there are alternative designs that don't push users to manage endpoints and secrets, we're happy to explore them. Also, I'd like to understand the process the K8s team follows when there is a disagreement on design. Does this need an LGTM from all or some @kubernetes/api-approvers? Is this the correct list: https://github.com/kubernetes/kubernetes/blob/master/pkg/api/OWNERS?
The important difference is that the EBS plugin assumes the AWS API is provided by servers which actually do provide that API. The gluster plugin requires specifying the service that provides the gluster API. As written, the rook volumeSource and backing plugin assume the rook API is provided by a server which does not actually provide that API.
If rook were a flex volume plugin (or equivalent) that required setting up a rook custom resource as part of installation, it would layer cleanly on top of the existing Kubernetes API and architecture.
Why is this an important difference? This separation is at the heart of level-triggered, controller-based approaches, where CRUD operations are performed on a set of API objects representing desired state and a controller reconciles that state.
I would hope that gluster and other plugins could also simplify the user experience by removing the need for an admin to manage endpoints and secrets. Deeper integration helps the overall Kubernetes ecosystem. @saad-ali hinted earlier that we should explore adding a first-class volume attachment API object, for example.
As I mentioned above, flex volumes unfortunately do not improve the user experience: they require installation of software on the host. On certain cloud providers this requires creating new images, which is a high burden to adoption. Also, it's my understanding that flex volumes don't have a full-fidelity implementation of dynamic volume provisioning.
@liggitt and I had a good discussion on Slack (https://kubernetes.slack.com/messages/C09QZFCE5/). The TL;DR version is as follows. @liggitt, please chime in if I missed something.
Proposal A: Rook plugin calls a new blocking API ("more core"). The Rook controller implements a new API server. Storage classes would need to specify an endpoint to the API server along with secrets/credentials.
Proposal B: Rook plugin uses CRDs ("more extension"). The Rook plugin continues to use CRDs (as coded above). The plugin creates an instance of the CRD representing a volume attachment. The Rook controller reconciles the CRD and attaches the volume to the requested node. The storage class does not need any endpoints or secrets.
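To make the contrast concrete, here is a hedged Go sketch of the two attach flows; every endpoint, type, field, and callback named below is hypothetical, not code from this PR.

```go
package sketch

import (
	"context"
	"fmt"
	"time"
)

// AttachStatus is an illustrative stand-in for the operator-reported state.
type AttachStatus struct {
	Attached   bool
	DevicePath string
}

// Proposal A: edge-triggered. The plugin makes a blocking call to a Rook
// API endpoint taken from the StorageClass (endpoint + secret must be
// user-managed) and gets the device path back synchronously.
func attachViaEndpoint(callRookAPI func(volumeID, node string) (string, error),
	volumeID, node string) (string, error) {
	return callRookAPI(volumeID, node)
}

// Proposal B: declarative. The plugin creates a VolumeAttach custom
// resource, then polls until the Rook operator reconciles it and reports
// the device path -- no endpoints or secrets in the StorageClass.
func attachViaCRD(ctx context.Context,
	createCR func(volumeID, node string) error,
	getStatus func(volumeID, node string) (AttachStatus, error),
	volumeID, node string) (string, error) {
	if err := createCR(volumeID, node); err != nil {
		return "", err
	}
	for {
		st, err := getStatus(volumeID, node)
		if err == nil && st.Attached {
			return st.DevicePath, nil // filled in by the operator once attached
		}
		select {
		case <-ctx.Done():
			return "", fmt.Errorf("timed out waiting for attach of %s on %s", volumeID, node)
		case <-time.After(2 * time.Second):
		}
	}
}
```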
I hope this sums it up for the reviewers. My perspective: this looks like a grey area, and the root cause is the tight coupling of volume plugins (an extension) to the core. Instead of complicating the user experience or asking that Rook be delayed until CSI is implemented, I would hope the reviewers focus on expanding the Kubernetes ecosystem and supporting the growing Rook user base. If the Rook plugin is an eyesore, then let's focus this energy on fixing volume plugins with CSI rather than penalizing the Rook users/community.
There are two elements: the representation in the Kubernetes API (the pod spec or persistent volume source) and the backing volume plugin. If a volume plugin requires compiled-in, always-present representation in the Kubernetes v1 API, it cannot reasonably be considered an extension. Flex volumes (both the API volumeSource representation and the volume plugin mechanism) are the only actual extension point for volume plugins today. If those are deficient, I agree efforts should be directed toward improving those extension points, as they are with CSI.
If the proper extension mechanism isn't ready, it seems short-sighted to add features anyway in a way we know we don't want long-term (especially if it involves APIs, which have long-term implications) |
	return false
}

func (plugin *rookPlugin) SupportsMountOption() bool {
BTW, to support mount options for Rook, you will also have to add the Rook volume here: https://github.com/kubernetes/kubernetes/blob/master/pkg/volume/validation/pv_validation.go#L50
Great, I will. Thank you.
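For illustration, the kind of whitelist change the reviewer points to might look like the sketch below. The names are assumptions; the actual pv_validation.go code may be structured differently.

```go
package sketch

// pluginsSupportingMountOptions stands in for the whitelist of volume
// plugins that support mount options, which validation consults before
// accepting mountOptions on a PV (illustrative, not the real code).
var pluginsSupportingMountOptions = map[string]bool{
	"glusterfs": true,
	"nfs":       true,
	"rbd":       true,
	"rook":      true, // the addition the reviewer is asking for
}

func supportsMountOptions(pluginName string) bool {
	return pluginsSupportingMountOptions[pluginName]
}
```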
I agree with @liggitt. I would prefer not to add any more hardcoded volume sources to Kubernetes. (BTW, I don't receive GitHub notifications.)
We (the Rook team) have received a lot of feedback on this PR, some of it passionate and all of it polite. This has been helpful and we are grateful. It feels like we've reached a sort of stalemate, and I thought sharing more of our motivations/goals would help in the search for a solution.

With Rook we are trying to dramatically simplify the provisioning and consumption of storage, and our vision goes well beyond the storage presentations/technologies currently part of Rook. We believe the path to success involves embracing ease of use by administrators, developers, DevOps, SREs, etc. To that end we have been embracing Kubernetes and honestly want to be the best storage provider on the platform.

With respect to this PR, we are looking for a solution that spares the user from knowing virtually anything about the internal workings of the storage layer. We don't want the user to specify obscure endpoints, addresses, ports, or secrets. We literally want the user to say "Hey Rook, I need 100TB of block storage" in the simplest way possible. Furthermore, we want to leverage Kubernetes' authentication mechanism rather than implement our own.

In the fullness of time there are several possible paths (e.g. CSI), but for 1.8 we need to find a compromise solution. We understand the layering concern with CRDs and are now proposing a solution using ConfigMaps. This is a compromise because, unlike CRDs, ConfigMaps are not first-class objects (that is, you can't view volume attachments with kubectl). Yet there is resistance on this path too.

Here's the ask: given our goals for Rook, how should we proceed? We are willing to do the lifting here because we see giving users the best experience as the goal. Does it make sense to have a meeting on this? /cc @saad-ali
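As a rough illustration of the ConfigMap compromise mentioned above (not the actual proposal's code; the key names and namespace handling are assumptions):

```go
package sketch

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// recordAttachment stores a volume attachment as ConfigMap data instead of
// a custom resource. Unlike a CRD, this gives admins no typed
// "kubectl get volumeattach" view; they would inspect raw ConfigMaps.
func recordAttachment(ctx context.Context, client kubernetes.Interface,
	ns, volumeID, node string) error {
	cm := &v1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "rook-attach-" + volumeID, // hypothetical naming scheme
			Namespace: ns,
		},
		Data: map[string]string{
			"volumeID": volumeID,
			"node":     node,
			"state":    "pending", // the operator flips this once attached
		},
	}
	_, err := client.CoreV1().ConfigMaps(ns).Create(ctx, cm, metav1.CreateOptions{})
	return err
}
```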
@DanKerns @bassam Rook team, I understand the frustration, and extend my apologies. We are still early in the 1.8 schedule, so this is not at risk of missing the release yet. And rest assured, I really want to see this get into 1.8 and will do my best to make that happen.

This is the first volume plugin doing things differently. My natural instinct when I see that is to ask why, and whether the added complexity is justified. In this case, I'm not yet sure. The added scrutiny is not because this approach is necessarily wrong, but simply because it is different, and accepting it sets a precedent. So please bear with us.

For next steps, instead of going back and forth online, how about we hold a meeting early next week with the folks in this thread to see if we can all agree on a solution? Does 10 AM PT Tuesday, July 11 work for you? For those of you without a public email address in your profile, please email me with the subject line "Rook Volume Plugin Meeting" so that I can add you to the invite. CC @thockin
We held a meeting earlier today between the Rook team, members of the Kubernetes Steering Committee, and the Kubernetes Storage SIG.

A lot of the tension in this PR arises from the fact that this is an in-tree volume plugin and therefore inherits the powers of its callers (the attach/detach controller and the kubelet). Combined with the unorthodox approach of a volume plugin creating API objects, this raises questions around abstraction-layer violations, possible dependency issues, and security. These issues, we agree, can be circumvented if the volume plugin were out-of-tree. However, CSI, the future mechanism for out-of-tree volume plugins, is not yet ready; and Flex volume, the existing mechanism for out-of-tree volume plugins, has usability issues: most notably, it requires a driver file to be dropped somewhere on each node machine and a kubelet restart (direct machine access is a bar that may be too high for some users and counter to the k8s ideal).

As a compromise, we agreed that the Storage SIG will pursue a solution to improve the deployment story for Flex volume drivers in 1.8 in order to improve the situation until CSI is ready. Specifically, we would introduce a mechanism to deploy Flex volume drivers to machines through the Kubernetes API (e.g. DaemonSets) -- no direct machine access or kubelet reboots required. Around one month before code freeze, we will reassess with the Rook team whether the proposed improvements are sufficient to satisfy the usability concerns. If they are, the Rook team will scrap this PR and move toward a Flex volume driver. Otherwise, this PR will be reconsidered for 1.8 pending a thorough security review.

Looking beyond, Bassam and I will work together to make sure the storage-controller-centric model is something that works with CSI.
Hi all, sorry that I wasn't able to join the meeting yesterday. The Rook team already wanted to "go core" in v1.6, IIRC for their v0.4.0 release, but we talked design and the proposal to implement the already in-tree […]. I can't say I support that, given the layer violations and other possible side effects, but as pointed out, k8s doesn't make it easy for Rook to go the full external route either. I want Rook to be a standalone project with its own release schedule, codebase, and priorities; I think both projects will greatly benefit from that separation. Last time I talked to @bassam there were still some unresolved discussions regarding how we can go the external route; I'll sync up with him now to get the newest updates on that matter and help out. I'm sorry that I couldn't make the meeting and if I've reiterated something here that was discussed yesterday. I'll do my best to sync with the team and find a solution that is viable for everyone.
@luxas we are still aiming for an out-of-tree solution, but one that does not compromise our goal of providing the best UX for Rook users. That is why the k8s Storage SIG agreed to improve Flex volume for 1.8 -- especially, but not limited to, deploying drivers via DaemonSets/containers without having to restart the kubelet or mess with the k8s nodes.
Rook is doing a flexvolume. Closing this.
What this PR does / why we need it: This adds a new volume plugin to Kubernetes to attach and mount Rook volumes. Provisioning Rook volumes is done externally via the external-provisioner. https://www.github.com/rook/rook
Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #46451

Special notes for your reviewer:
This plugin is only for attaching and mounting the Rook volume; Rook will use the external volume provisioner for creating and deleting. The plugin is very thin: the attach/detach functions just create/delete TPRs to declare the desired state of getting the volume attached. Rook watches for this TPR and attaches the volume appropriately, so all the attach and detach logic is done by Rook. The TPR is registered by Rook.
Release note: