
Allow NodeGroup/ASG targeting for ?LB instances #58

Closed
bassco opened this issue Sep 18, 2019 · 19 comments


bassco commented Sep 18, 2019

What would you like to be added:

Our cluster has been created with kops.

We operate multiple instance groups in our cluster to partition the different workloads we run.

I would like an annotation on the aws-load-balancer Service that uses a tag to select which instances are assigned to the ELB/ALB/NLB TargetGroup.

E.g.:

service.beta.kubernetes.io/aws-load-balancer-node-groups: "asg-node-group-comma-separated-list"

Where the annotation value is a comma-separated list of ASG groupName tags, used to filter which host instance IDs are added to the load balancer.

E.g. in our environment, our four node groups have the tag aws:autoscaling:groupName set to one of:

  • nodes
  • ml-cpu
  • ml-gpu
  • search

Using the ASG names above, to target only the nodes of the ml-* ASGs I would use the following annotation:

service.beta.kubernetes.io/aws-load-balancer-node-groups: "ml-cpu,ml-gpu"

Currently, I suspect that the tag k8s.io/role/node with a value of 1 is used to populate the instances on the LB.
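
For illustration, a Service using the proposed annotation might look like the sketch below; the annotation key and the node-group values are the hypothetical ones from this proposal, not an existing API, and the Service name and ports are made up:

apiVersion: v1
kind: Service
metadata:
  name: ml-inference
  annotations:
    # Proposed (hypothetical) annotation: register only instances whose
    # aws:autoscaling:groupName tag matches one of the listed ASG names.
    service.beta.kubernetes.io/aws-load-balancer-node-groups: "ml-cpu,ml-gpu"
spec:
  type: LoadBalancer
  selector:
    app: ml-inference
  ports:
    - port: 443
      targetPort: 8443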

Why is this needed:

When creating an ELB, ALB, or NLB, the complete list of node instances associated with the cluster is assigned to the TargetGroup. It is really inefficient to perform health checks against instances that will never host a Pod of the Service the load balancer is being created for.

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 18, 2019
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 17, 2019
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 16, 2020

bassco commented Jan 17, 2020

Anyone able to read through this and provide feedback?

@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ari-becker

/reopen

Not sure why this was allowed to close - this is a must-have feature for large clusters where the number of nodes reaches the quota on the number of targets permitted per load balancer. Scaling further requires limiting the load balancer's targets to a specific subset of nodes and configuring the deployment/statefulset to schedule only on those nodes (see the sketch below). Currently the only workaround is to create the load balancer by hand, outside of Kubernetes.
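
For reference, the scheduling half of that workaround is just a nodeSelector (or node affinity) on the workload. A minimal sketch, assuming kops's kops.k8s.io/instancegroup node label and a made-up app name; the load-balancer half (limiting which nodes get registered as targets) is what this issue asks for:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: edge-proxy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: edge-proxy
  template:
    metadata:
      labels:
        app: edge-proxy
    spec:
      # Assumed label: kops labels nodes with their instance group name;
      # substitute whatever label your node groups actually carry.
      nodeSelector:
        kops.k8s.io/instancegroup: edge-nodes
      containers:
        - name: edge-proxy
          image: example.com/edge-proxy:latest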

@k8s-ci-robot

@ari-becker: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Not sure why this was allowed to close - this is a must-have feature for large clusters where the number of nodes in the cluster reaches the quota for the number of targets permitted per load balancer. Scaling further requires limiting the load balancer's targets to a specific subset of servers, and configuring the deployment/statefulset to only schedule on those nodes. Currently the only workaround is to create the loadbalancer by hand outside of Kubernetes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ari-becker

@bassco as the author, do you mind re-opening?

@leakingtapan

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Apr 21, 2020
@k8s-ci-robot

@leakingtapan: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ari-becker

An additional gotcha that popped up for me:

Let's say that you have 450 nodes and a service that you'd like to expose with an external load balancer.
If you expose a single port on the service - a load balancer is created that targets all 450 servers, and it works.
If you expose multiple additional ports - despite the number of servers not changing, each additional port registers the full set of instances again, so every instance/port pair counts as a separate target. So if you expose three ports you now have 3 * 450 = 1350 targets, which is above the limit, and AWS will simply refuse to add the listeners for the new ports, complaining about TooManyTargets.
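
As a concrete sketch (names and ports are made up), a Service like the following against a 450-node cluster would end up registering 3 * 450 = 1350 instance/port targets:

apiVersion: v1
kind: Service
metadata:
  name: big-service
spec:
  type: LoadBalancer
  selector:
    app: big-service
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https
      port: 443
      targetPort: 8443
    - name: metrics
      port: 9090
      targetPort: 9090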

@foobarfran

This would be super useful.
I can contribute a pull request for this feature if it helps.


bassco commented May 6, 2020

That would be fantastic if you could, @foobarfran

@leakingtapan

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 8, 2020
@foobarfran

/assign @foobarfran

@foobarfran

The feature for this issue is already merged in kubernetes/kubernetes#90943

/close

@k8s-ci-robot

@foobarfran: You can't close an active issue/PR unless you authored it or you are a collaborator.

In response to this:

The feature for this issue is already merged in kubernetes/kubernetes#90943

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@leakingtapan

/close as the PR is merged in the legacy provider


bassco commented Jul 20, 2020

@foobarfran - legend!
For those who reach this comment and don't follow the PR: the feature will be released in v1.19.
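
As far as I can tell, the merged change filters targets by node label rather than by ASG tag, via the service.beta.kubernetes.io/aws-load-balancer-target-node-labels annotation. The sketch below assumes that key and a kops instance-group label, so please double-check the exact key and value format against kubernetes/kubernetes#90943 before relying on it:

apiVersion: v1
kind: Service
metadata:
  name: ml-inference
  annotations:
    # Register only nodes whose labels match this selector
    # (my understanding of the annotation added by kubernetes/kubernetes#90943;
    # verify against the merged PR).
    service.beta.kubernetes.io/aws-load-balancer-target-node-labels: "kops.k8s.io/instancegroup=ml-gpu"
spec:
  type: LoadBalancer
  selector:
    app: ml-inference
  ports:
    - port: 443
      targetPort: 8443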

@bassco bassco closed this as completed Jul 20, 2020
csrwng pushed a commit to csrwng/cloud-provider-aws that referenced this issue Dec 19, 2023
OCPBUGS-24135: Updating ose-aws-cloud-controller-manager-container image to be consistent with ART (…penshift-4.15-ose-aws-cloud-controller-manager)