Introduce Shared Security Group to Allow Traffic from Unlimited Number of ELBs. #26670
Comments
We hit the same limit. With the default AWS limits you can have only 50 services that need an ELB. Our workaround will be to increase the maximum number of inbound rules per security group to 100, but this only works if you do not have other EC2 instances with more than 2 security groups per network interface. Hopefully there will be a fix before we hit ~100 ELB services.
Have either of you tried out the DisableSecurityGroupIngress setting? From https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L379-L386
This is a pretty serious limitation we ran into today. Luckily we are fixing it by using
It looks like this is already in the Go SDK, but it doesn't look like it's hitting the correct methods to actually work.
@Krylon360 I don't work on the codebase, but it seems to me the code should work already. The code you've linked gets called by
P.S. For anyone else who, like me, didn't know how to set DisableSecurityGroupIngress, you can pass in
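For anyone trying this, a minimal sketch of what the configuration might look like, assuming DisableSecurityGroupIngress is read from the kube-controller-manager --cloud-config file (gcfg/INI format) like the other [Global] options in the linked aws.go; the exact key spelling is an assumption:

```ini
# Hypothetical --cloud-config file for kube-controller-manager.
# The key name mirrors the DisableSecurityGroupIngress struct field in aws.go;
# treat it as an assumption and verify against your Kubernetes version.
[Global]
DisableSecurityGroupIngress = true
```

If it behaves as the name suggests, the provider stops opening the node security group for each ELB, so you have to allow ELB-to-node traffic yourself.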
Hey guys, I was also facing the same roadblock but was able to resolve it by editing the minion security group rules to allow traffic from the Kubernetes VPC CIDR and removing all other rules. Hope this helps. Regards,
@prakash1991, can you elaborate?
We were able to resolve this issue. We dug into the vendored aws-go-sdk and saw that the cluster resources get tagged with the key "KubernetesCluster" and the clusterName as the value. To prevent a security group per ELB, you have to tag the VPC subnets assigned to the minions (you would need to tag all subnets associated with the cluster, internal/private and external/public) with the same KubernetesCluster=clusterName tag.
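For anyone wanting to apply the tag described above, a minimal sketch with the AWS CLI; the subnet ID and cluster name are placeholders, and you would repeat this for every subnet the cluster uses:

```sh
# Tag each subnet (internal/private and external/public) used by the cluster.
# subnet-0123456789abcdef0 and my-cluster.example.com are placeholder values.
aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=KubernetesCluster,Value=my-cluster.example.com
```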
I have the VPC subnets and nodes/minions tagged with the correct KubernetesCluster tag, but I'm still seeing our node SG get an entry for each ELB that is created. @Krylon360, is there anything else I might be missing?
We (same team as @hjacobs) are running into this issue again for all non-HTTP traffic. We use Ingress for HTTP traffic, but for Postgres it is not an option.
@cbluth I think @prakash1991 manually modified the SG, which can be done as a workaround because nothing resets the SG afterwards. A simple manual delete works, but it is not a solution for Kubernetes.
For the record, you could use the following configuration change to fix your "Too many ELBs: RulesPerSecurityGroupLimitExceeded" issue: https://github.com/zalando-incubator/kubernetes-on-aws/pull/390/files
DISCLAIMER: we do not use it in production yet, so please make sure you understand the change and test it.
/sig aws
@justinsb, has any work been done on this?
@chrislovecnm is this relevant to kops? (re: #26670 (comment))
I've observed something else. My cluster has SG entries for ELBs for services that no longer exist. It seems in my case I'm left with dangling ELBs and (perhaps) thus dangling SG entries.
We have an issue whereby we frequently deploy new versions of services from our CI pipeline. We delete services and recreate them as part of this process. ELBs, security groups, rules, and network interfaces don't get deleted, and we end up hitting our AWS account limits very quickly, forcing manual deletion via the AWS console. Can anyone point me at which logs I should be looking at to see what might be going wrong during the deletes?
@jeb5-ccl You should have a look into the logs of the controller-manager.
This is supported through the cloud controller manager configuration now. You can name a single security group to be used for the ELBs.
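If the option being referred to is the ElbSecurityGroup field in the AWS cloud provider config (an assumption on my part), usage would look roughly like this; the security group ID is a placeholder and the group has to exist already:

```ini
# Hypothetical --cloud-config for the (cloud) controller manager: reuse one
# pre-created security group for every ELB instead of creating one per Service.
[Global]
ElbSecurityGroup = sg-0123456789abcdef0
```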
Glad to see this is addressed (as alpha) in 1.8. Does anyone have a programmatic workaround on legacy cluster versions (K8s 1.5 or older)?
@lypht not sure what exactly you mean, but we are happy with https://github.com/zalando-incubator/kube-ingress-aws-controller/
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Hi, does anyone know how to work around this issue on AWS? We have a lot of TCP-based apps and no HTTP, so we can't use Ingress.
@racyber if you are using kops then you can apply the cloudConfig setting sketched below.
Kubernetes will still create a security group per ELB, but it won't add it to the node security group.
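A sketch of that kops cluster spec change, based on the cloudConfig section of the kops docs; double-check the field name against your kops version:

```yaml
# Excerpt of a kops cluster spec (kops edit cluster <name>).
spec:
  cloudConfig:
    disableSecurityGroupIngress: true
```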
@hsingh6764 just found the documentation today, thanks! https://github.com/kubernetes/kops/blob/release-1.9/docs/cluster_spec.md#cloudconfig
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/remove-lifecycle rotten
@hsingh6764 @racyber
@luckymagic7 it is part of the Kubernetes setup, not these objects.
@hsingh6764 Many thanks! I created a new cluster and it works fine ^^
Hi, any more word on this? I'm creating NLBs (for TCP services) and getting issues around too many rules for my worker security groups. Is there any way to reference a specific security group for an NLB?
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
For reference, this has been implemented in #62774.
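Assuming that PR is the change that lets a Service name the security groups its ELB should use, a usage sketch would look roughly like the following; the annotation is the one documented for the in-tree AWS provider, and the security group ID is a placeholder:

```yaml
# Hypothetical Service that reuses a pre-created security group for its ELB,
# instead of letting Kubernetes create one per Service.
apiVersion: v1
kind: Service
metadata:
  name: postgres-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-security-groups: sg-0123456789abcdef0
spec:
  type: LoadBalancer
  selector:
    app: postgres
  ports:
    - port: 5432
      targetPort: 5432
```

With a shared, pre-created group like this, the node security group only needs a single rule allowing that group, independent of how many Services exist.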
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Currently, Kubernetes adds every ELB's security group rule to the instances, which means the number of rules in an instance's security group grows as the number of ELBs grows.
AWS supports up to 50 inbound rules per security group and 5 security groups per network interface. http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html#vpc-limits-security-groups
With this AWS limit and the current setup of Kubernetes, there is a hard limit of 250 services (50 rules per security group × 5 security groups per interface) for each Kubernetes cluster. This problem arises mainly because every ELB creation results in a new rule being added to every instance's security group.
We can resolve this issue by introducing a shared security group per Kubernetes cluster. A simple solution is to modify the code in aws.go so that, when an ELB is created, it either finds or creates a shared security group with no rules and attaches it to the ELB. Then, instead of adding the ELB's own security group rule to the instances, it checks whether each instance already lists the shared security group ID as one of its source group IDs; if some instances do not have it yet, it adds it. We also have to make sure each instance accepts all traffic coming from the shared security group source.
With this revision, the number of security group rules on the instances becomes independent of the number of ELBs, which means the number of services a cluster can support is not bounded by the AWS limit.
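To make the proposal concrete, here is a rough sketch of the find-or-create logic described above, written against aws-sdk-go; the function name ensureSharedELBSecurityGroup and the group naming scheme are invented for illustration, and this is not the actual aws.go implementation:

```go
// Sketch of the shared-security-group idea described in this issue.
// Not the real aws.go code: function and group names are illustrative only.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

// ensureSharedELBSecurityGroup finds or creates a rule-less security group that
// every ELB in the cluster shares, and makes sure the node security group has a
// single rule allowing all traffic whose source is that shared group.
func ensureSharedELBSecurityGroup(svc *ec2.EC2, vpcID, nodeSGID, clusterName string) (string, error) {
	groupName := "k8s-elb-shared-" + clusterName // hypothetical naming scheme

	// 1. Look for an existing shared group by name within the cluster VPC.
	out, err := svc.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
		Filters: []*ec2.Filter{
			{Name: aws.String("vpc-id"), Values: []*string{aws.String(vpcID)}},
			{Name: aws.String("group-name"), Values: []*string{aws.String(groupName)}},
		},
	})
	if err != nil {
		return "", err
	}

	var sharedID string
	if len(out.SecurityGroups) > 0 {
		sharedID = aws.StringValue(out.SecurityGroups[0].GroupId)
	} else {
		// 2. Create it with no ingress rules; it only acts as a source group
		//    that gets attached to each ELB.
		created, err := svc.CreateSecurityGroup(&ec2.CreateSecurityGroupInput{
			GroupName:   aws.String(groupName),
			Description: aws.String("shared source group for all Kubernetes ELBs"),
			VpcId:       aws.String(vpcID),
		})
		if err != nil {
			return "", err
		}
		sharedID = aws.StringValue(created.GroupId)
	}

	// 3. Ensure the node security group allows all traffic from the shared
	//    group. A real implementation would first check whether the rule
	//    already exists instead of ignoring the duplicate-rule error.
	if _, err := svc.AuthorizeSecurityGroupIngress(&ec2.AuthorizeSecurityGroupIngressInput{
		GroupId: aws.String(nodeSGID),
		IpPermissions: []*ec2.IpPermission{{
			IpProtocol:       aws.String("-1"),
			UserIdGroupPairs: []*ec2.UserIdGroupPair{{GroupId: aws.String(sharedID)}},
		}},
	}); err != nil {
		log.Printf("authorize ingress (possibly already present): %v", err)
	}
	return sharedID, nil
}

func main() {
	svc := ec2.New(session.Must(session.NewSession()))
	// All IDs below are placeholders.
	sgID, err := ensureSharedELBSecurityGroup(svc, "vpc-0123456789abcdef0", "sg-0fedcba9876543210", "my-cluster")
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("shared ELB security group: %s (attach this to every ELB)", sgID)
}
```

The ELB creation path would then attach the shared group instead of a per-Service group, which is what keeps the node rule count constant.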