New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS node/instance security group - misuse of cluster tag #73906

Open
Benjamin-Dobell opened this Issue Feb 11, 2019 · 6 comments

Comments

Projects
None yet
3 participants
@Benjamin-Dobell
Copy link

Benjamin-Dobell commented Feb 11, 2019

What happened:

Load balancer -> instance inbound security rules not configured on instances.

W0211 00:02:32.503572       1 aws.go:3627] Error opening ingress rules for the load balancer to the instances: "Multiple tagged security groups found for instance i-0ba118132440e527f; ensure only the k8s security group is tagged; the tagged groups were sg-0a47f194362fa26a4(nodes.staging.kops.snaploader.com) sg-04b52fd571603668f(ftp.nodes.staging.kops.snaploader.com) "
E0211 00:02:32.503805       1 service_controller.go:219] error processing service ingress-nginx/ingress-nginx (will retry): failed to ensure load balancer for service ingress-nginx/ingress-nginx: Multiple tagged security groups found for instance i-0ba118132440e527f; ensure only the k8s security group is tagged; the tagged groups were sg-0a47f194362fa26a4(nodes.staging.kops.snaploader.com) sg-04b52fd571603668f(ftp.nodes.staging.kops.snaploader.com)

What you expected to happen:

The instances to have their security rules configured correctly.

How to reproduce it (as minimally and precisely as possible):

On AWS backed k8s cluster:

  1. Create an additional security group for your instances and tag it as owned by the cluster. key=kubernetes.io/cluster/<CLUSTER_NAME, value=owned.

  2. Define an ingress resource with an AWS ELB load balancer (the specifics are not important).

Anything else we need to know?:

The semantics for cluster tag, and its value, in particular "owned" are well defined:

    // ResourceLifecycleOwned is the value we use when tagging resources to indicate
    // that the resource is considered owned and managed by the cluster,
    // and in particular that the lifecycle is tied to the lifecycle of the cluster.
    ResourceLifecycleOwned = "owned"

i.e. we need to tag our resources such that k8s knows to clean them up when our cluster is destroyed.

However, the cloud provider is reusing the cluster tag for an unrelated purpose i.e. Determining the primary security group of an instance:

// If users create multiple SGs, they must tag one of them as being k8s owned

Presumably, this is a not what the tag was intended for; it appears to be semantically inaccurate to use it in this fashion. It has the unintended side-effect that we can't have k8s automatically clean up multiple security groups (that are attached to instances).

Tags are cheap; the k8s AWS provider integration should be using an additional tag to identify the primary security group. This can be implemented in a backwards compatible fashion i.e. Look for the new semantically accurate "primary" tag; if it doesn't exist fall back to using the semantically inaccurate "owned" tag.

Environment:

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.2", GitCommit:"cff46ab41ff0bb44d8584413b598ad8360ec1def", GitTreeState:"clean", BuildDate:"2019-01-13T23:16:58Z", GoVersion:"go1.11.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.5", GitCommit:"51dd616cdd25d6ee22c83a858773b607328a18ec", GitTreeState:"clean", BuildDate:"2019-01-16T18:14:49Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"linux/amd64"}

  • Cloud provider or hardware configuration:

AWS

  • OS (e.g. from /etc/os-release):

Debian GNU/Linux 9 (stretch)

  • Kernel (e.g. uname -a):

Linux ip-172-20-60-92 4.9.0-7-amd64 #1 SMP Debian 4.9.110-3+deb9u2 (2018-08-13) x86_64 GNU/Linux

  • Install tools:

Kops 1.11.0

@Benjamin-Dobell

This comment has been minimized.

Copy link
Author

Benjamin-Dobell commented Feb 11, 2019

@kubernetes/sig-aws-bugs

... One of these days I'll work out how to tag sigs 😉

@k8s-ci-robot k8s-ci-robot added sig/aws and removed needs-sig labels Feb 11, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Feb 11, 2019

@Benjamin-Dobell: Reiterating the mentions to trigger a notification:
@kubernetes/sig-aws-bugs

In response to this:

@kubernetes/sig-aws-bugs

... One of these days I'll work out how to tag sigs 😉

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@zetaab

This comment has been minimized.

Copy link
Member

zetaab commented Feb 11, 2019

this is kops issue not k8s? so this is reported in wrong repo?

@Benjamin-Dobell

This comment has been minimized.

Copy link
Author

Benjamin-Dobell commented Feb 12, 2019

@zetaab I've linked to the problematic code within this repository in my OP. Once the AWS cloud provider is updated to be aware of a new tag that signifies which security group is the "primary" security group, then Kops will need to be updated.

So the issue within this repository needs to be resolved first. Then I will open an issue PR against Kops to support the new tag. EKS will likely want to follow suit.

@Benjamin-Dobell

This comment has been minimized.

Copy link
Author

Benjamin-Dobell commented Feb 12, 2019

Hmm, I see there's a separate AWS cloud provider repository.

I'll open the issue against it as well. Presumably the changes would need to be merged (pkg updated) here as well. So I'll leave this issue open.

@Benjamin-Dobell Benjamin-Dobell changed the title AWS node/instance security group - abuse of k8s "owned" tag AWS node/instance security group - misuse of cluster tag Feb 12, 2019

@Benjamin-Dobell

This comment has been minimized.

Copy link
Author

Benjamin-Dobell commented Feb 12, 2019

I've updated this issue to indicate that it's the "cluster tag" (kubernetes.io/cluster/<CLUSTER_NAME>) that's having its semantics mistreated - technically it need not be "owned" (it could be shared).

It's worth noting that the legacy cluster tag KubernetesCluster must also not be present.

func (t *awsTagging) hasClusterTag(tags []*ec2.Tag) bool {

I'd be happy to submit a PR to resolve this issue, but seems as I'm talking about introducing an additional tag I suspect there ought to be a discussion around this.

P.S. I'll be providing updates in this repo until informed otherwise, as the other AWS cloud provider repository seems to simply be a mirror i.e. This is the correct place for this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment