New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AWS NLB security group updates #68422

Merged
merged 2 commits into from Nov 29, 2018

Conversation

@kellycampbell
Copy link
Contributor

kellycampbell commented Sep 7, 2018

This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

What this PR does / why we need it:

There is a bug in the existing logic that causes valid security group port mappings to be removed incorrectly.

Which issue(s) this PR fixes:

Fixes #60825

Special notes for your reviewer:

This still needs some unit tests. The existing loadbalancer tests are very minimal. Any pointers to how to mock security group actions?

Release note:

Fix AWS NLB security group updates where valid security group ports were incorrectly removed
when updating a service or when node changes occur.

Root cause was an assumption that the list of security groups was actually a set.

Here's some psuedo code of the cause:

existingGroups = [g1:80, g1:80]  // <-- NOT a Set
desiredGroups = [g1:80]

portChanges = { desiredGroups : ports }
// portChanges = [g1:80 = true]

for g in existingGroups:
    desiredSet.Insert(portChanges[g] == true)
    // iter1: desiredSet = [ g1:80 ]
    // iter2: desiredSet = [ ]
    existingSet = portsForGroup() 
    // i1: existingSet = [ g1:80 ]
    // i2: existingSet = [ g1:80 ]

    alreadyExists = desiredSet.Intersection(existingSet)
    removeAlreadyExistsFromPortChanges()
    // i1: portChanges = []
    // i2: portChanges = []

    notDesired = existingSet.Minus(desiredSet)
    // i1: notDesired = []
    // i2: notDesired = [g1:80]
    setPortChangeToFalseForNotDesired
@kellycampbell

This comment has been minimized.

Copy link
Contributor Author

kellycampbell commented Sep 7, 2018

Really simple test case that reproduced the issue easily for me:

  1. Create two services with NLB load balancer - SG's get update to allow the node ports to 0.0.0.0/0
  2. Alter one of the services e.g. by adding a new label in the metadata
  3. Apply the alteration. The old controller logic removes its node port from the SG.
  4. You can continue toggling the SG just by applying label updates to the service.
@kellycampbell

This comment has been minimized.

Copy link
Contributor Author

kellycampbell commented Sep 11, 2018

/sig aws
/kind bug

@rosskukulinski

This comment has been minimized.

Copy link
Contributor

rosskukulinski commented Sep 26, 2018

/assign @kris-nova

@rosskukulinski

This comment has been minimized.

Copy link
Contributor

rosskukulinski commented Sep 26, 2018

@kris-nova related to 1.12/1.13 AWS goals: kubernetes/enhancements#423 (comment)

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Oct 7, 2018

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://git.k8s.io/community/CLA.md#the-contributor-license-agreement to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@geojaz

geojaz approved these changes Oct 10, 2018

Copy link
Member

geojaz left a comment

/lgtm

@Freyert

This comment has been minimized.

Copy link
Contributor

Freyert commented Nov 2, 2018

/test pull-kubernetes-e2e-kops-aws
/test pull-kubernetes-e2e-gce-100-performance

@Freyert

This comment has been minimized.

Copy link
Contributor

Freyert commented Nov 5, 2018

What's the procedure with merging this? I see it needs the lgtm label again, but does someone have to push the button afterward?

@acaire

This comment has been minimized.

Copy link

acaire commented Nov 7, 2018

Thanks @kellycampbell this is great. We're running private clusters in VPCs with multiple CIDRs and the health check ingress rules only factor in the main CIDR, not the CIDR for the relevant target group instances. I'm still stepping through the code to see where the CIDRs originate from, just wanted to give you a heads up as I was hoping to figure out a fix before this is merged.

kellycampbell and others added some commits Sep 7, 2018

Fix AWS NLB security group updates
This corrects a problem where valid security group ports were removed
unintentionally when updating a service or when node changes occur.

Fixes #60825, #64148

@kellycampbell kellycampbell force-pushed the kellycampbell:fix-nlb-secgroups branch from 3a94c65 to 996c37c Nov 11, 2018

@kellycampbell

This comment has been minimized.

Copy link
Contributor Author

kellycampbell commented Nov 15, 2018

/label priority/important-soon

@micahhausler I'd like to get this bugfix into the v1.13 milestone and release if possible.

Also I would like to work on backporting it to 1.12 and 1.11 since NLB doesn't work at all for us without this and we'd like to use one of the features NLB provides.

@kellycampbell

This comment has been minimized.

Copy link
Contributor Author

kellycampbell commented Nov 21, 2018

/assign @d-nishi

}
} else {
if clientTraffic {
klog.V(2).Infof("Removing rule for client MTU discovery from the network load balancer (%s) to instances (%s)", clientCidrs, instanceSecurityGroupID)
klog.V(2).Infof("Removing rule for client traffic from the network load balancer (%s) to instance (%s)", clientCidrs, instanceSecurityGroupID)
klog.V(2).Infof("Removing rule for client traffic from the network load balancer (%s) to instance (%s), port (%v)", clientCidrs, instanceSecurityGroupID, port)
}

This comment has been minimized.

@justinsb

justinsb Nov 28, 2018

Member

Is there an else block missing here?

OK to do in a follow-on PR

This comment has been minimized.

@kellycampbell

kellycampbell Nov 28, 2018

Author Contributor

Yeah, that looks like it was overlooked in the original code from @micahhausler f9445b9#diff-b390ae3de185e9f13a631a5c07b8f3ffR646

@justinsb

This comment has been minimized.

Copy link
Member

justinsb commented Nov 28, 2018

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Nov 28, 2018

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Nov 28, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb, kellycampbell, micahhausler

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 69bab0d into kubernetes:master Nov 29, 2018

18 checks passed

cla/linuxfoundation Freyert authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details

k8s-ci-robot added a commit that referenced this pull request Jan 10, 2019

Merge pull request #72749 from M00nF1sh/automated-cherry-pick-of-#684…
…22-upstream-release-1.12

Automated cherry pick of #68422: Fix AWS NLB security group updates
@brenix

This comment has been minimized.

Copy link

brenix commented Jan 11, 2019

Did this get cherry-picked to the 1.11 release? I see the cherry-pick for 1.12, but not 1.11. It would be nice if it was as the latest release kops supports is 1.11

@kellycampbell

This comment has been minimized.

Copy link
Contributor Author

kellycampbell commented Jan 11, 2019

Did this get cherry-picked to the 1.11 release? I see the cherry-pick for 1.12, but not 1.11. It would be nice if it was as the latest release kops supports is 1.11

I don't think this has been picked into 1.13 yet either.

@d-nishi

This comment has been minimized.

Copy link

d-nishi commented Jan 11, 2019

This has not been delivered yet.

Slated for 1.14 and is with @M00nF1sh

@M00nF1sh

This comment has been minimized.

Copy link
Contributor

M00nF1sh commented Jan 14, 2019

Did this get cherry-picked to the 1.11 release? I see the cherry-pick for 1.12, but not 1.11. It would be nice if it was as the latest release kops supports is 1.11

I didn't see any reason that this cannot be cherry-picked into v1.11 and v1.13, i'll create two PRs for them 😄

k8s-ci-robot added a commit that referenced this pull request Jan 17, 2019

Merge pull request #72981 from M00nF1sh/automated-cherry-pick-of-#684…
…22-upstream-release-1.11

Automated cherry pick of #68422: Fix AWS NLB security group updates

k8s-ci-robot added a commit that referenced this pull request Jan 18, 2019

Merge pull request #72982 from M00nF1sh/automated-cherry-pick-of-#684…
…22-upstream-release-1.13

Automated cherry pick of #68422: Fix AWS NLB security group updates
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment