Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aws-cloud-controller-manager config to addons #9704

Merged
merged 1 commit into from
Dec 2, 2020

Conversation

nckturner
Copy link
Contributor

@nckturner nckturner commented Aug 7, 2020

  • Added addon for aws-cloud-controller (aws-cloud-controller.addons.k8s.io)
  • Initial manifest: upup/models/cloudup/resources/addons/aws-cloud-controller.addons.k8s.io/k8s-1.18.yaml.template
  • --external-cloud-volume-plugin flag for KCM
  • AWSCCMTag to hold the logic to determine the CCM tag. This will change as soon as we have stable releases (soon).

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 7, 2020
@nckturner
Copy link
Contributor Author

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 7, 2020
@k8s-ci-robot k8s-ci-robot added area/addons size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 7, 2020
Version: fi.String(version),
Manifest: fi.String(location),
Selector: map[string]string{"k8s-addon": key},
KubernetesVersion: ">=1.18.0",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the correct behavior for k8s < 1.18? Previously it would install core.addons.k8s.io.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we should probably keep that behavior, as we don't have any published images pre-1.18.

serviceAccountName: cloud-controller-manager
containers:
- name: aws-cloud-controller-manager
image: gcr.io/k8s-staging-provider-aws/cloud-controller-manager:v1.18.0-alpha.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the version need to track that of Kubernetes? The cloud-provider-aws repo doesn't say what the version compatibility policy is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can most likely just copy the OpenStackCCMTag function in template_functions.go.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 11, 2020
@olemarkus
Copy link
Member

Need any help to progress on this?

@nckturner
Copy link
Contributor Author

@olemarkus so I ran into a ton of failures running e2e tests on clusters created with this change, and finally had a chance to dig in and discovered that the issue is that aws volumes need kubelet to start up with --cloud-provider=aws in order to pass. Otherwise, the volumes fail to be mounted.

Error: "MountVolume.SetUp failed for volume \"aws-j6cxz\" (UniqueName: \"kubernetes.io/aws-ebs/ddc37e26-a4ea-4800-b9cf-89f4c4ba2e06-aws-j6cxz\") pod \"aws-injector\" (UID: \"ddc37e26-a4ea-4800-b9cf-89f4c4ba2e06\") : mount failed: exit status 32\nMounting command: systemd-run\nMounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/ddc37e26-a4ea-4800-b9cf-89f4c4ba2e06/volumes/kubernetes.io~aws-ebs/aws-j6cxz --scope -- mount -o bind /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-2a/vol-0927658c7047d003e/var/lib/kubelet/pods/ddc37e26-a4ea-4800-b9cf-89f4c4ba2e06/volumes/kubernetes.io~aws-ebs/aws-j6cxz\nOutput: Running scope as unit: run-r0506619461a2436f9b6de03226e313b4.scope\nmount: /var/lib/kubelet/pods/ddc37e26-a4ea-4800-b9cf-89f4c4ba2e06/volumes/kubernetes.io~aws-ebs/aws-j6cxz: special device /var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/us-west-2a/vol-0927658c7047d003e does not exist.\n"

Interestingly, with --cloud-provider=external set, code paths using pkg/volume/awsebs/aws_ebs.go were being triggered, which was unexpected to me.

This issue looks like the same: kubernetes/kubernetes#71018. So possibly the options as the code stands today are either 1. use CSI or 2. run kubelet with --cloud-provider=aws? Note that I tested with 1.18.6.

@olemarkus
Copy link
Member

I think it makes sense to use CSI. It is something that would be interesting to support anyway.

@nckturner
Copy link
Contributor Author

@olemarkus I'd actually like to allow for both, as one use case for this change is testing upgrade scenarios. Right now this is configurable with:

  kubelet:
    cloudProvider: aws

@nckturner
Copy link
Contributor Author

The other KCM flag that I've exposed in this PR is the --external-cloud-volume-plugin which allows the KCM to run the existing cloud volume control loops even when the other cloud loops are disabled due to setting --cloud-provider=external.

})

// Ensure ExternalCloudVolumePlugin is set to AWS
b.Cluster.Spec.KubeControllerManager.ExternalCloudVolumePlugin = "aws"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type of logic can be added in ./pkg/model/components. That way changes done are also subject to validation and such. There is already something similar in place for openstack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, that's very helpful. Will add some validation here.

@olemarkus
Copy link
Member

Sounds good. It may be an idea to add a flag to CloudConfig controlling if the CSI addon is installed or not, and then ensure users are configuring the flags you mention correctly through validation.

@nckturner
Copy link
Contributor Author

/cc @andrewsykim

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 12, 2020
@nckturner nckturner changed the title WIP Add aws-cloud-controller-manager config to addons Add aws-cloud-controller-manager config to addons Oct 12, 2020
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 12, 2020
@nckturner
Copy link
Contributor Author

  • I've added a template function for the CCM tag: AWSCCMTag
  • I've added a propsed CSI API so that we can detect and at least log a message when the user has a configuration that will break volumes. I'm not sure if we want to overwrite the user's value for --external-cloud-volume-plugin or not, I think we shouldn't do it in this PR because CSI is not yet implemented. (I can also remove the proposed CSI API from this PR if we want to discuss that separately, I'm not planning on implementing it in this PR).


// CSI defines container storage interface configuration.
// +optional
CSI *CSIConfiguration `json:"csi,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other providers we have similar types of configuration under a struct in CloudConfiguration. AWS doesn't have its own struct here; now may be the time to add it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but CloudConfig seems to be tightly mapped to the actual config file that is placed on disk, which is deprecated according to documentation I found. Shouldn't we move on from that object? Also, it kindof combines/conflates various addons, like CCM, CSI, etc. (even though, yes they are cloud related, there are many combinations that users might want to use and combining all of the configuration makes it quite difficult to understand what configuration is configuring which addon).

@olemarkus
Copy link
Member

/retest

@nckturner
Copy link
Contributor Author

I think I missed the versioned v1alpha2 API, will update that now.

@olemarkus
Copy link
Member

And make apimachinery it seems. Running make verify && ./hack/verify-staticcheck.sh locally may be a faster feedback loop.

@justinsb
Copy link
Member

This looks really good. We discussed that probably the CSI fields are better in a separate PR, and that we can just leave the warning when running without CSI & without the CCM flag in the code for now but commented out (maybe just the CSI bit commented out?) until the CSI PR lands separately.

There won't be much in the codepath then unless the feature-flag is set, so I'm in favor of merging and getting this under testing.

Thanks @nckturner for this :-)

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Nov 4, 2020
@nckturner
Copy link
Contributor Author

Updated the PR per @justinsb's comments. I decided to leave out docs at his suggestion since its early/feature gate protected, we will add them in a follow on PR. We can start with some basic usage docs in the aws-cloud-provider repository.

@nckturner
Copy link
Contributor Author

/test pull-kops-e2e-cni-weave

@nckturner
Copy link
Contributor Author

nckturner commented Nov 5, 2020

/hold
Fixing bug...

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 9, 2020
@olemarkus
Copy link
Member

/test pull-kops-e2e-cni-flannel

@olemarkus
Copy link
Member

/retest

@nckturner
Copy link
Contributor Author

@justinsb ready for another look

@nckturner
Copy link
Contributor Author

/hold cancel

Any chance we can merge this @justinsb? It will make testing the CCM easier..

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 25, 2020
@nckturner
Copy link
Contributor Author

/retest

- Config at aws-cloud-controller.addons.k8s.io/k8s-1.18.yaml.template
- AWSCCMTag function for CCM image tag
@justinsb
Copy link
Member

justinsb commented Dec 2, 2020

Sorry about the delay here, and thanks for doing this @nckturner !

This looks to be entirely additive and feature flagged, and I believe the current target for external cloud-controller-manager is 1.21, so I think it's important that we start to get this under testing (not least because my understanding is that it doesn't yet entirely work!).

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 2, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, nckturner

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 2, 2020
@olemarkus
Copy link
Member

/retest

@k8s-ci-robot k8s-ci-robot merged commit 4435674 into kubernetes:master Dec 2, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Dec 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/addons area/api cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants