
Introduce Shared Security Group to Allow Traffic from Unlimited Number of ELBs. #26670

Closed
kevinkim9264 opened this issue Jun 2, 2016 · 37 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@kevinkim9264
Contributor

Currently, Kubernetes adds a rule for every ELB's security group to the instances' security group, which means the number of rules in the instance security group grows as the number of ELBs grows.

AWS supports up to 50 inbound rules per security group and 5 security groups per network interface. http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html#vpc-limits-security-groups

With these AWS limits and the current Kubernetes setup, there is a hard limit of 250 services per Kubernetes cluster (50 rules × 5 security groups = 250 rules per interface). The problem arises mainly because every ELB creation results in a new rule being added to every instance's security group.

We can resolve this by introducing a shared security group per Kubernetes cluster. A simple solution is to modify the code in aws.go so that when an ELB is created, it finds or creates a shared security group (with no rules of its own) and attaches it to the ELB. Then, instead of adding each ELB's own security group rule to the instances, it checks whether each instance's security group already allows all traffic from the shared security group as a source, and adds that single rule to any instance that is missing it.

With this revision, the number of security group rules on the instances becomes independent of the number of ELBs, so the number of services a cluster can support is no longer constrained by this AWS limit.
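
For readers who want to see the shape of the proposal in code, here is a minimal sketch against the aws-sdk-go EC2 API. The function names, the "k8s-elb-shared-" naming convention, and the error handling are illustrative assumptions, not the actual aws.go implementation:

// Sketch of the proposed shared-security-group flow (not the real aws.go code).
package awssketch

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/service/ec2"
)

// ensureSharedSecurityGroup finds or creates one rule-less security group per
// cluster; every ELB would get this group attached instead of a unique one.
func ensureSharedSecurityGroup(svc *ec2.EC2, clusterName, vpcID string) (string, error) {
    name := "k8s-elb-shared-" + clusterName // hypothetical naming convention

    out, err := svc.DescribeSecurityGroups(&ec2.DescribeSecurityGroupsInput{
        Filters: []*ec2.Filter{
            {Name: aws.String("group-name"), Values: []*string{aws.String(name)}},
            {Name: aws.String("vpc-id"), Values: []*string{aws.String(vpcID)}},
        },
    })
    if err != nil {
        return "", err
    }
    if len(out.SecurityGroups) > 0 {
        return aws.StringValue(out.SecurityGroups[0].GroupId), nil
    }

    created, err := svc.CreateSecurityGroup(&ec2.CreateSecurityGroupInput{
        GroupName:   aws.String(name),
        Description: aws.String("shared SG attached to every ELB of cluster " + clusterName),
        VpcId:       aws.String(vpcID),
    })
    if err != nil {
        return "", err
    }
    return aws.StringValue(created.GroupId), nil
}

// ensureNodeIngressFromShared adds a single rule to the node security group
// allowing all traffic whose source is the shared group, so the node SG no
// longer needs one rule per ELB.
func ensureNodeIngressFromShared(svc *ec2.EC2, nodeSGID, sharedSGID string) error {
    _, err := svc.AuthorizeSecurityGroupIngress(&ec2.AuthorizeSecurityGroupIngressInput{
        GroupId: aws.String(nodeSGID),
        IpPermissions: []*ec2.IpPermission{{
            IpProtocol: aws.String("-1"), // all protocols and ports
            UserIdGroupPairs: []*ec2.UserIdGroupPair{{
                GroupId: aws.String(sharedSGID),
            }},
        }},
    })
    // An InvalidPermission.Duplicate error here means the rule already exists,
    // which a real implementation would treat as success.
    return err
}

The ELB-facing half of the change would attach the shared group's ID to the load balancer at creation time, so the node security group never needs to reference individual ELB security groups.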

@dgolja

dgolja commented Sep 19, 2016

We hit the same limit. As it stands, with the default AWS limits you can only have 50 services that need an ELB.

Our workaround will be to increase the maximum inbound rules to 100, but this can only be used if you do not have other EC2 instances with more than 2 SGs per network interface.

Hopefully there will be a fix before we hit ~100 ELB services.

@jswoods

jswoods commented Oct 4, 2016

Have either of you tried out the setting DisableSecurityGroupIngress? From https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L379-L386

@hjacobs

hjacobs commented Dec 13, 2016

This is a pretty serious limitation we ran into today. Luckily we are fixing it by using Ingress: zalando-incubator/kubernetes-on-aws#169

@Krylon360

It looks like this is already in the Go SDK, but it doesn't seem to be hitting the correct methods to actually work.
https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L1836

@dimpavloff
Contributor

dimpavloff commented Feb 13, 2017

@Krylon360 I don't work on the codebase, but it seems to me the code should already work. The code you've linked gets called by setSecurityGroupIngress https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L2730 with the ELB's SG as the argument.
The issue has to do with modifying the node's SG, which happens in updateInstanceSecurityGroupsForLoadBalancer https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/aws/aws.go#L2757

P.S. For anyone else who, like me, didn't know how to set DisableSecurityGroupIngress: you can pass --cloud-config=<filepath> to the master components, with contents matching https://godoc.org/gopkg.in/kubernetes/kubernetes.v1/pkg/cloudprovider/providers/aws#CloudConfig . I haven't yet confirmed whether this solves the issue.
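
For reference, the file that --cloud-config points at is a gcfg-style INI file. A minimal example that should toggle this option (assuming the [Global] section maps onto the CloudConfig struct linked above) would be:

[Global]
DisableSecurityGroupIngress = true

The file then has to be passed to the master components together with --cloud-provider=aws, e.g. --cloud-config=/etc/kubernetes/cloud.config (the path is just an example).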

@prakash1991

Hey guys,

I was also facing the same roadblock, but was able to resolve it by editing the minion SG rules to allow traffic from the Kubernetes VPC CIDR and removing all other rules.

Hope this helps.

Regards,
Prakash
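
For anyone wanting to script that workaround rather than do it in the console, a minimal aws-sdk-go sketch is below; the security group ID and the 10.0.0.0/16 CIDR are placeholders for your node/minion SG and your cluster VPC, not values from this thread:

// Sketch of the workaround above: one CIDR rule on the node security group
// instead of one rule per ELB. The SG ID and VPC CIDR are placeholders.
package awssketch

import (
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ec2"
)

func allowVPCCIDROnNodeSG() error {
    svc := ec2.New(session.Must(session.NewSession()))
    _, err := svc.AuthorizeSecurityGroupIngress(&ec2.AuthorizeSecurityGroupIngressInput{
        GroupId: aws.String("sg-0123456789abcdef0"), // placeholder: node/minion SG
        IpPermissions: []*ec2.IpPermission{{
            IpProtocol: aws.String("-1"), // all protocols and ports
            IpRanges: []*ec2.IpRange{{
                CidrIp:      aws.String("10.0.0.0/16"), // placeholder: cluster VPC CIDR
                Description: aws.String("allow all traffic from the cluster VPC"),
            }},
        }},
    })
    return err
}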

@cbluth

cbluth commented Feb 28, 2017

@prakash1991 , can you elaborate?

@Krylon360

Krylon360 commented Feb 28, 2017 via email

@rexc

rexc commented Mar 23, 2017

I have VPC subnets and nodes/minions tagged with the correct KubernetesCluster tag, but I'm still seeing our node SG get an entry for each ELB that is created.

@Krylon360 is there anything else I might be missing?

@szuecs
Member

szuecs commented Apr 27, 2017

We (same team as @hjacobs) are running into this issue again for all non-HTTP traffic. We use Ingress for HTTP traffic, but for Postgres it is not an option.

@szuecs
Member

szuecs commented May 2, 2017

@cbluth I think @prakash1991 manually modified the SG, which works as a workaround because nothing resets the SG. A simple manual delete works, but it is not a solution for Kubernetes.

@szuecs
Member

szuecs commented May 2, 2017

For the record, you could use the following configuration change to fix your "Too many ELBs: RulesPerSecurityGroupLimitExceeded" issue:

https://github.com/zalando-incubator/kubernetes-on-aws/pull/390/files

DISCLAIMER: we do not use it in production yet, so please make sure you understand the change and test it.

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 31, 2017
@chrislovecnm
Contributor

/sig aws

@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 19, 2017
@chrislovecnm
Contributor

@justinsb has any work been done on this?

@ahawkins

To prevent an SG per ELB, you have to tag the VPC subnets assigned to the minions (you would need to tag all subnets associated with the cluster, internal/private and external/public) with the same Name=KubernetesCluster,Value=clusterName tag.

@chrislovecnm is this relevant to kops? (re: #26670 (comment))

@ahawkins

I've observed something else. My cluster has SG entries for ELBs for services that no longer exist. It seems in my case I'm left with dangling ELBs and (perhaps) thus dangling SG entries.

@jeb5-ccl

We have an issue whereby we frequently deploy new versions of services from our CI pipeline, deleting and recreating services as part of the process. The ELBs, SGs, rules, and network interfaces don't get deleted, and we end up hitting our AWS account limits very quickly, forcing manual deletion via the AWS console. Can anyone point me at which logs I should be looking at to see what might be going wrong during the deletes?

@szuecs
Member

szuecs commented Jul 30, 2017

@jeb5-ccl You should have a look at the controller-manager logs.

@chrislovecnm
Contributor

This is supported through the cloud controller manager configuration. You can now name a single security group to be used.

@lypht

lypht commented Nov 10, 2017

Glad to see this is addressed (as alpha) in 1.8. Does anyone have a programmatic workaround for legacy cluster versions (K8s 1.5 or older)?

@szuecs
Member

szuecs commented Nov 11, 2017

@lypht not sure exactly what you mean, but we are happy with https://github.com/zalando-incubator/kube-ingress-aws-controller/
We use one SG shared for all ALBs https://github.com/zalando-incubator/kube-ingress-aws-controller/blob/master/aws/adapter.go#L238
If you have any problems with using it, please let us know in an issue in the kube-ingress-aws-controller repository.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 9, 2018
@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2018
@racyber

racyber commented Apr 15, 2018

Hi, does anyone know how to work around this issue on AWS? We have a lot of TCP-based apps and no HTTP, so we can't use Ingress.

@hsingh6764

hsingh6764 commented Apr 15, 2018

@racyber if you are using kops then you can apply this setting:

spec:
  cloudConfig:
    disableSecurityGroupIngress: true

Kubernetes will still create a security group per ELB but won't add it to the node security group.
You will have to add a rule to your node security group to allow all ELBs access (typically by allowing your VPC CIDR).

@racyber

racyber commented Apr 15, 2018

@hsingh6764 just found the documentation today! thanks!

https://github.com/kubernetes/kops/blob/release-1.9/docs/cluster_spec.md#cloudconfig

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 14, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 13, 2018
@szuecs
Member

szuecs commented Aug 13, 2018

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Aug 13, 2018
@luckymagic7

@hsingh6764 @racyber
Where should I apply that configuration?
I have a YAML file whose kinds are Service, Deployment, and HorizontalPodAutoscaler.

@hsingh6764

@luckymagic7 it is part of the Kubernetes cluster setup, not of these objects.

@luckymagic7

@hsingh6764 Many Thanks!! I created a new cluster and it works fine^^

@ghost

ghost commented Feb 4, 2019

Hi, any more word on this? I'm creating NLBs (for TCP services) and running into issues with too many rules on my worker security groups. Is there any way to reference a specific security group for an NLB?

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 6, 2019
@ghost

ghost commented May 6, 2019

For reference, this has been implemented in #62774.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 5, 2019
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
