Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix additionalSecurityGroups support for NLB #10162

Merged
merged 1 commit into from
Nov 3, 2020

Conversation

rifelpet
Copy link
Member

@rifelpet rifelpet commented Nov 3, 2020

We were correctly adding the security groups to the master ASGs but identified them incorrectly.

ref: #10158 (comment)

We were correctly adding the security groups to the master ASGs but identified them incorrectly.
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Nov 3, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rifelpet

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the area/provider/aws Issues or PRs related to aws provider label Nov 3, 2020
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 3, 2020
@rifelpet
Copy link
Member Author

rifelpet commented Nov 3, 2020

/cc @seh this should fix your most recent report

@k8s-ci-robot
Copy link
Contributor

@rifelpet: GitHub didn't allow me to request PR reviews from the following users: most, recent, report, seh, this, should, fix, your.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @seh this should fix your most recent report

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@seh
Copy link
Contributor

seh commented Nov 3, 2020

this should fix your most recent report

Thank you. Does this also require #10161 to work correctly?

@rifelpet
Copy link
Member Author

rifelpet commented Nov 3, 2020

If you're using terraform output, yes you'll need both commits for a test build

@hakman
Copy link
Member

hakman commented Nov 3, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 3, 2020
@k8s-ci-robot k8s-ci-robot merged commit 578920e into kubernetes:master Nov 3, 2020
@k8s-ci-robot k8s-ci-robot added this to the v1.20 milestone Nov 3, 2020
@seh
Copy link
Contributor

seh commented Nov 3, 2020

Thank you. This solved the name problem, but then I ran into this: When the master ASGs try to create new instances for the master machines, they fail with an error like this:

Launching a new EC2 instance. Status Reason: There is no ACTIVE Load Balancer named 'api-redacted-cluster-name--55chum'. Validating load balancer configuration failed.

I don't understand the complaint, because there is an NLB with that exact same name.

@seh
Copy link
Contributor

seh commented Nov 3, 2020

Ah, I think I see at least part of the problem: In the AWS Web console, inspecting one of these ASGs, it shows that it's still associated both with the previous Classic Load Balancer and a load balancer target group. Both the load balancer and the target group have the same name.

Looking at the generated Terraform configuration, the aws_autoscaling_group resources each mention one entry in their "target_group_arns" attribute. I don't see any mention there of the previous Classic Load Balancer. Running terraform plan doesn't threaten any changes.

k8s-ci-robot added a commit that referenced this pull request Nov 3, 2020
#10162-origin-release-1.19

Automated cherry pick of #10161: Move NLB's VPC CIDR security group rule logic into model #10162: Fix additionalSecurityGroups support for NLB
@seh
Copy link
Contributor

seh commented Nov 3, 2020

I think this may have to do with the manual removal of the Terraform aws_autoscaling_attachment resources that was necessary when continuing to use the Classic Load Balancer:

# Accommodate the migration to using inline attachments of ASGs to
# load balancers, per the required action documented here:
# https://github.com/rdrgmnzs/kops/blob/41adf07e15b1d4204647684e78ebde8bfcd41782/docs/releases/1.19-NOTES.md#required-actions:
for attachment in $(terraform state list | grep '^aws_autoscaling_attachment\.'); do
  terraform state rm "${attachment}"
done

I'll try removing that workaround—leaving those resources in place—in another test to see if we can get our ASGs working again.

@seh
Copy link
Contributor

seh commented Nov 3, 2020

That fixed that problem.

Next is that the "master" security group accepts inbound HTTPS traffic only from a CIDR block that looks to be related to my VPC, but the traffic in from the (new) NLB is apparently not coming from any addresses in that block.

Opening up HTTPS ingress from any IPv4 address (as a test) fixed that problem. I'm not sure if we can be any more particular about which addresses we'll see for the downstream NLB listeners opening connections to our master machines. @hakman, this sort of relates to #10142.

@rifelpet
Copy link
Member Author

rifelpet commented Nov 3, 2020

@seh the security group on the master should be configured to allow 443/TCP traffic from the VPC's CIDR block(s) as well as any CIDRs defined in the cluster's kubernetesAPIAccess list. The VPC CIDRs are for the NLB target groups' health checks as recommended by the AWS docs. There might be a less permissive way to allow that traffic, but we can handle that down the road.

What was the exact error you were experiencing that was fixed by opening up the security group to 0.0.0.0/0 ? I'm troubleshooting a connection refused issue in our end to end prow jobs for a related PR and I'm wondering if its related.

@seh
Copy link
Contributor

seh commented Nov 3, 2020

as well as any CIDRs defined in the cluster's kubernetesAPIAccess list

Oh, I had never noticed that field before. We're not using it today, and had been granting access from a CIDR block for our VPN—one we were able to attach to the Classic Load Balancer, but not the NLB. Only after I posted my message earlier and went out for a long walk did it occur to me that NLB is preserving client IP addresses (since we register our targets by instance ID, rather than by IP address), so the IP addresses I needed to authorize are from our VPN-related block.

What was the exact error you were experiencing that was fixed by opening up the security group to 0.0.0.0/0?

There was no error message, per se; all of my API server clients just hung trying to connect. It turned out that the NLB was passing the traffic through to any of the three master machines, but then their firewalls were blocking the connection.

I'll try specifying a CIDR block for the "kubernetesAPIAccess" field and report back on how I fare.

@seh
Copy link
Contributor

seh commented Nov 4, 2020

So far using the "kubernetesApiIAccess" field has fixed that problem. It turns out that we have another longstanding security group granting ingress to our master machines with this same CIDR block, but it's allowing port 80 instead of port 443! I presume that's a vestige of having terminated TLS previously at the Classic ELB, and not configuring use of HTTPS out the back.

@rifelpet rifelpet deleted the nlb-sg branch May 5, 2021 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/aws Issues or PRs related to aws provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants