AWS CSI driver #10467

Merged
merged 1 commit into from
Jan 12, 2021

olemarkus
Member

No description provided.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Dec 18, 2020
@olemarkus olemarkus force-pushed the ebs-csi branch 2 times, most recently from 450370d to 143a8e7 Compare December 18, 2020 19:21
@olemarkus olemarkus force-pushed the ebs-csi branch 5 times, most recently from 0adde02 to 9ae6852 Compare December 19, 2020 10:42
@olemarkus
Member Author

/retest

if c.KubeControllerManager == nil || c.KubeControllerManager.ExternalCloudVolumePlugin != "aws" {
	if c.CloudConfig == nil || c.CloudConfig.AWSEBSCSIDriver == nil || !fi.BoolValue(c.CloudConfig.AWSEBSCSIDriver.Enabled) {
		allErrs = append(allErrs, field.Forbidden(field.NewPath("spec", "externalCloudControllerManager"),
			"AWS external CCM cannot be used without enabling spec.cloudConfig.AWSEBSCSIDriver or setting spec.kubeControllerManager.externalCloudVolumePlugin to `aws`"))
Member

I'm wondering if this needs to be a requirement. Users may have clusters that don't run any persistent volumes, for example.

Member Author

The challenge right now is that everyone else needs to actively set one of these values to get a working cluster. I think that once we enable the external CCM for AWS and/or the external CSI driver by default, we can relax the requirement.

I suspect that the CSI driver will be more popular than the external CCM for the time being anyway. CSI driver does bring in additional features.
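
The rule discussed in this thread can be sketched in isolation. The following is a minimal, self-contained illustration, not the kOps implementation: the struct types are hypothetical, trimmed-down stand-ins for the cluster spec, and a plain `error` replaces the `field.ErrorList` used in the diff. It rejects a spec only when neither the EBS CSI driver is enabled nor `externalCloudVolumePlugin` is set to `aws`.

```go
package main

import "fmt"

// Hypothetical stand-ins for the relevant parts of the cluster spec.
type AWSEBSCSIDriver struct {
	Enabled *bool
}

type CloudConfig struct {
	AWSEBSCSIDriver *AWSEBSCSIDriver
}

type KubeControllerManager struct {
	ExternalCloudVolumePlugin string
}

type ClusterSpec struct {
	CloudConfig           *CloudConfig
	KubeControllerManager *KubeControllerManager
}

// csiDriverEnabled reports whether the EBS CSI driver is explicitly enabled.
func csiDriverEnabled(c *ClusterSpec) bool {
	return c.CloudConfig != nil &&
		c.CloudConfig.AWSEBSCSIDriver != nil &&
		c.CloudConfig.AWSEBSCSIDriver.Enabled != nil &&
		*c.CloudConfig.AWSEBSCSIDriver.Enabled
}

// awsVolumePluginSet reports whether externalCloudVolumePlugin is "aws".
func awsVolumePluginSet(c *ClusterSpec) bool {
	return c.KubeControllerManager != nil &&
		c.KubeControllerManager.ExternalCloudVolumePlugin == "aws"
}

// validateExternalCCM returns an error when no volume path is configured.
func validateExternalCCM(c *ClusterSpec) error {
	if csiDriverEnabled(c) || awsVolumePluginSet(c) {
		return nil
	}
	return fmt.Errorf("AWS external CCM requires spec.cloudConfig.AWSEBSCSIDriver " +
		"to be enabled or spec.kubeControllerManager.externalCloudVolumePlugin set to `aws`")
}

func main() {
	enabled := true
	withCSI := &ClusterSpec{
		CloudConfig: &CloudConfig{AWSEBSCSIDriver: &AWSEBSCSIDriver{Enabled: &enabled}},
	}
	fmt.Println("with CSI driver:", validateExternalCCM(withCSI))
	fmt.Println("empty spec:", validateExternalCCM(&ClusterSpec{}))
}
```

Relaxing the requirement later, as suggested above, would amount to dropping this check once one of the two options is on by default.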

@olemarkus
Member Author

Not sure how much of this we really want to be configurable.
In particular, a large number of images are involved, and it could get messy to allow each of them to be configured.

Member

@rifelpet rifelpet left a comment

Looking at the helm chart's values file, I don't see any parameters that would be particularly important to expose as api fields for an initial implementation.

I think there are additional IAM permissions needed. The docs mention ec2:CreateSnapshot which neither the control plane nor node roles have, and the ebs-snapshot-controller StatefulSet isn't currently restricted to just the control plane nodes. Perhaps this is a good opportunity to use the UseServiceAccountIAM feature flag functionality.

- --endpoint=$(CSI_ENDPOINT)
- --logtostderr
- --k8s-tag-cluster-id={{ ClusterName }}
- --extra-volume-tags=KubernetesCluster={{ ClusterName }}
Member

This flag is deprecated in favor of --extra-tags:
https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/f54138034204850e54c26f67dde3f8217339c09c/cmd/options/controller_options.go#L39-L40

Also we could consider populating this with the standard set of tags we include on resources, including ClusterSpec.CloudLabels

Member Author

Good catch.
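
The suggestion to populate `--extra-tags` from the cluster's standard tags could look roughly like the sketch below. It is only an illustration: `buildExtraTagsFlag` is a hypothetical helper, and the comma-separated `key=value` format is an assumption about what the driver flag accepts. Keys are sorted so the rendered manifest is stable between runs.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildExtraTagsFlag is a hypothetical helper that renders an --extra-tags
// argument from the cluster name and user-supplied cloud labels.
// Assumption: the flag takes comma-separated key=value pairs.
func buildExtraTagsFlag(clusterName string, cloudLabels map[string]string) string {
	tags := make(map[string]string, len(cloudLabels)+1)
	for k, v := range cloudLabels {
		tags[k] = v
	}
	// The cluster tag is set last so a cloud label cannot override it.
	tags["KubernetesCluster"] = clusterName

	// Sort keys so the output is deterministic.
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+tags[k])
	}
	return "--extra-tags=" + strings.Join(pairs, ",")
}

func main() {
	labels := map[string]string{"Team": "storage", "Owner": "platform"}
	fmt.Println(buildExtraTagsFlag("example.k8s.local", labels))
	// prints: --extra-tags=KubernetesCluster=example.k8s.local,Owner=platform,Team=storage
}
```

Tag values containing commas or equals signs would need escaping or rejection, which this sketch does not handle.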

@olemarkus
Member Author

> I think there are additional IAM permissions needed. The docs mention ec2:CreateSnapshot which neither the control plane nor node roles have, and the ebs-snapshot-controller StatefulSet isn't currently restricted to just the control plane nodes. Perhaps this is a good opportunity to use the UseServiceAccountIAM feature flag functionality.

I like the IRSA concept, and I think strategically we should move in that direction, but is it mature enough to start depending on it now?

@olemarkus
Member Author

Reading https://github.com/kubernetes-csi/external-snapshotter in more detail, it seems worth having a dedicated addon for this, including the snapshotter, the CRDs, and the webhook admission controller.

@olemarkus
Member Author

olemarkus commented Dec 20, 2020

Looking through https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/container-storage-interface.md and the docs of the various plugins, I wonder if for kOps, it makes most sense to:

  • Deploy the vendor CSI driver itself as a standalone addon, with the controller running as a daemonset on masters.
  • Deploy the snapshot controller as a standalone addon. It should also be a daemonset rather than a statefulset. (The use of statefulset was somewhat inconsistent. The reason is to avoid running two of them at the same time, but the controller supports leader election, so for us a daemonset makes more sense.) There should only be one installation of this controller per cluster. This addon should also contain the webhook and CRDs.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 21, 2020
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 22, 2020
@olemarkus
Member Author

This is more or less done now.

Since we need a cert for the admission controller, this is now blocked by #10321

@mikesplain
Contributor

This is awesome, unfortunately I only found this PR after having set this up myself as well haha.

Only comment, would we want to also modify the storage class for this? For this to be usable you need to modify the provisioner: ebs.csi.aws.com in

I could see that being a follow up PR or a separate flag but just thought I'd mention it. Also an easy time to enable volume expansion by default.

@olemarkus
Copy link
Member Author

Updating the provisioner field is forbidden, so we need a new storageclass here. Or force-change the old one.

Do you know if changing this really is needed when CSIMigrationAWS feature gate is set?

@olemarkus
Copy link
Member Author

CSIMigrationAWS: Enables shims and translation logic to route volume operations from the AWS-EBS in-tree plugin to EBS CSI plugin. Supports falling back to in-tree EBS plugin if a node does not have EBS CSI plugin installed and configured. Requires CSIMigration feature flag enabled.
CSIMigrationAWSComplete: Stops registering the EBS in-tree plugin in kubelet and volume controllers and enables shims and translation logic to route volume operations from the AWS-EBS in-tree plugin to EBS CSI plugin. Requires CSIMigration and CSIMigrationAWS feature flags enabled and EBS CSI plugin installed and configured on all nodes in the cluster.

Looks like we should not create a new storage class. And we probably should set the CSIMigrationAWSComplete flag to ensure our setup works as expected.
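
A minimal sketch of what setting these gates might involve, assuming feature gates are held as a map from gate name to string value. `enableEBSCSIMigration` is a hypothetical helper, not the code in this PR. Because the quoted descriptions say `CSIMigrationAWSComplete` requires `CSIMigration` and `CSIMigrationAWS`, the sketch defaults all three and leaves any value the user already set untouched.

```go
package main

import "fmt"

// setDefaultGate sets a feature gate only if the user has not set it already.
func setDefaultGate(gates map[string]string, name, value string) {
	if _, found := gates[name]; !found {
		gates[name] = value
	}
}

// enableEBSCSIMigration is a hypothetical helper that defaults the three
// migration gates to "true". CSIMigrationAWSComplete depends on the other
// two, so they are set together.
func enableEBSCSIMigration(gates map[string]string) map[string]string {
	if gates == nil {
		gates = map[string]string{}
	}
	for _, name := range []string{
		"CSIMigration",
		"CSIMigrationAWS",
		"CSIMigrationAWSComplete",
	} {
		setDefaultGate(gates, name, "true")
	}
	return gates
}

func main() {
	// An explicit user choice is preserved; missing gates are filled in.
	gates := enableEBSCSIMigration(map[string]string{"CSIMigrationAWSComplete": "false"})
	fmt.Println(gates)
}
```

The same gates would have to be applied consistently to every component that registers the in-tree plugin, which the quoted text names as the kubelet and the volume controllers.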

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olemarkus

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2021
@olemarkus
Member Author

I have removed the snapshotter from the PR now, so we shouldn't have any webhook issues.
I also set the feature flag that blocks the in-tree EBS plugin, so we know with certainty that the out-of-tree plugin is used.

@olemarkus olemarkus changed the title WIP: AWS CSI driver AWS CSI driver Jan 8, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 8, 2021
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 9, 2021
Member

@rifelpet rifelpet left a comment

A few minor comments, but this looks good overall. We'll want docs for this as well as a release note. Be sure to mention the migration steps involving the feature gates, or just link to wherever that process is documented.

k8s/crds/kops.k8s.io_clusters.yaml
Comment on lines 221 to 220
if _, found := clusterSpec.Kubelet.FeatureGates["ExperimentalCriticalPodAnnotation"]; !found {
	if b.IsKubernetesLT("1.16") {
		clusterSpec.Kubelet.FeatureGates["ExperimentalCriticalPodAnnotation"] = "true"
	}
}
Member

I think this is duplicated from Line 210

Member Author

Thanks for catching!

@rifelpet
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 12, 2021
@k8s-ci-robot k8s-ci-robot merged commit a140168 into kubernetes:master Jan 12, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Jan 12, 2021
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons area/api cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants