Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

data/aws/vpc: Restrict ICMP ingress to our security groups #2412

Closed
wants to merge 1 commit into from

Conversation

wking
Copy link
Member

@wking wking commented Sep 26, 2019

And in the UPI CloudFormation templates too.

We've allowed ICMP ingress for OpenStack since 6f76298 (coreos/tectonic-installer#1), which did not motivate the ICMP ingress. Allowing ICMP ingress for AWS dates back to b620c16 (coreos/tectonic-installer#264). The master rule was restricted to the VPC in e7bd29a (coreos/tectonic-installer#2147). And the worker rules was restricted to the VPC in e131a74 (#1550) in-cluster ICMP ingress.

There are reasons to allow in-cluster ICMP, including Path MTU Discovery (PMTUD). Folks also use ping to [troubleshoot connectivity][4]. Restricting this to in-cluster security groups will avoid exposing ICMP ports to siblings living in shared VPCs, as we move towards allowing the installer to launch clusters in a pre-existing VPC. It might also block ICMP ingress from our load balancers, where we probably want PMTUD and possibly other ICMP calls. I'm not sure if there's a convenient way to allow access from the load-balancers while excluding it from sibling clusters that share the same VPC, but this commit is my attempt to get that.

CC @cuppett, @squeed

@openshift-ci-robot openshift-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 26, 2019
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 26, 2019
@wking wking force-pushed the awc-icmp-ingress-restriction branch from 018c446 to a213842 Compare September 26, 2019 04:34
@abhinavdahiya
Copy link
Contributor

personally i think ICMP is okay in the virtual network, it is an important debugging tool and would block ping from a bastion not part of the cluster.

would like to see @cuppett and @squeed 's opinion.

@squeed
Copy link
Contributor

squeed commented Sep 26, 2019

I'd rather not block ICMP. Blocking it for non-Internet-exposed hosts has essentially no security benefit, and comes at great cost of usability. As you identified, PMTUD (which is critical, as AWS has non-1500-MTU) is one casualty of blocking ICMP.

If, nevertheless, end-users demand this, then we should still allow the types+codes specified in http://shouldiblockicmp.com/ throughout the VPC.

It might also block ICMP ingress from our load balancers...

Fortunately, AWS is clever enough and auto-adds a SG always allowing ICMP between the VPC and a xLB.

@wking
Copy link
Member Author

wking commented Sep 26, 2019

Fortunately, AWS is clever enough and auto-adds a SG always allowing ICMP between the VPC and a xLB.

This was my concern. With this in place, why would we not want this PR? To ping from a bastion, you could put your bastion in the worker group. I see no need to support ping, PMTUD, etc. between machines in our cluster and unrelated machines in the same VPC. They should be coming in through LBs like external users (although they could use internal LBs too).

And in the UPI CloudFormation templates too.

We've allowed ICMP ingress for OpenStack since 6f76298 (OpenStack
prototype, 2017-02-16, coreos/tectonic-installer#1), which did not
motivate the ICMP ingress.  Allowing ICMP ingress for AWS dates back
to b620c16 (modules/aws: tighten security groups, 2017-04-19,
coreos/tectonic-installer#264).  The master rule was restricted to the
VPC in e7bd29a (modules/aws/vpc - Better security for master nodes,
2017-10-16, coreos/tectonic-installer#2147).  And the worker rules was
restricted to the VPC in e131a74 (aws: fix ICMP ACL, 2019-04-08, openshift#1550),
before which a typo had blocked all ICMP ingress.

There are reasons to allow in-cluster ICMP, including Path MTU
Discovery (PMTUD) [1,2,3].  Folks also use ping to troubleshoot
connectivity [4].  Restricting this to in-cluster security groups will
avoid exposing ICMP ports to siblings living in shared VPCs, as we
move towards allowing the installer to launch clusters in a
pre-existing VPC.  It might also block ICMP ingress from our load
balancers, where we probably want PMTUD and possibly other ICMP calls.
I'm not sure if there's a convenient way to allow access from the
load-balancers while excluding it from sibling clusters that share the
same VPC, but this commit is my attempt to get that.

[1]: http://shouldiblockicmp.com/
[2]: https://tools.ietf.org/html/rfc1191
[3]: https://tools.ietf.org/html/rfc8201
[4]: https://bugzilla.redhat.com/show_bug.cgi?id=1689857#c2
@wking wking force-pushed the awc-icmp-ingress-restriction branch from a213842 to aff01b9 Compare September 26, 2019 16:13
@wking
Copy link
Member Author

wking commented Sep 26, 2019

Fortunately, AWS is clever enough and auto-adds a SG always allowing ICMP between the VPC and a xLB.

Do you have a doc reference for that? The classic, application and network LB docs make it sound like you need to set these up manually.

I've pushed a213842 -> aff01b9 to fix the tf-fmt issue.

@wking
Copy link
Member Author

wking commented Sep 26, 2019

Hmm, here is Terraform saying that security_group is only valid for application load balancers and classic load balancers. So maybe our network load balancers get some auto-added magic, but we should check to see if the Kubernetes AWS cloud provider has security-group magic for its classic LBs.

@abhinavdahiya
Copy link
Contributor

To ping from a bastion, you could put your bastion in the worker group.

i don't think the users should be required to add the bastian to worker security group, that has more privilege access... than just ping.

@wking
Copy link
Member Author

wking commented Sep 26, 2019

i don't think the users should be required to add the bastian to worker security group, that has more privilege access... than just ping.

They could certainly create a new security group to allow pings. And really, I think the best test for node connectivity is oc get nodes through the Kubernetes API, falling back to SSH access. I don't see a need to support out-of-cluster pings.

@jstuever
Copy link
Contributor

Looks like you are allowing workers to masters and masters to workers... but not masters to masters or workers to workers?

@cuppett
Copy link
Member

cuppett commented Oct 1, 2019

Restricting ping-of-deaths to be an intra-cluster/SDN issue only by default has my vote. This will be a thing when we have multiple clusters in the same VPC (either UPI today or the feature tomorrow).

We don't need ping from ELBs. If allowed, I'd do it just WorkerSecurityGroup <-> WorkerSecurityGroup and Master<->Master and Master -> Worker. I wouldn't allow Worker->Master for same ping-of-death from workload killing control plane reasons. (Not sure if you can control via network policies similar kinds of things within the SDN as well for our control plane operators.)

@openshift-ci-robot
Copy link
Contributor

@wking: The following tests failed, say /retest to rerun them all:

Test name Commit Details Rerun command
ci/prow/e2e-aws aff01b9 link /test e2e-aws
ci/prow/e2e-aws-scaleup-rhel7 aff01b9 link /test e2e-aws-scaleup-rhel7
ci/prow/e2e-aws-disruptive aff01b9 link /test e2e-aws-disruptive

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@jstuever
Copy link
Contributor

I might be confused here... I see masters can ping workers, and workers can ping masters. Is it somehow inherent that masters can ping masters and workers can ping workers?

IMHO it's better to implement this to improve cluster isolation even if it doesn't fix the internal cluster ping ddos issue mentioned above.

@abhinavdahiya
Copy link
Contributor

So i think we are dropping improving isolation wrt ICMP in a network, and keeping things as-is. Unless we see reports of users requesting that.

/close

@openshift-ci-robot
Copy link
Contributor

@abhinavdahiya: Closed this PR.

In response to this:

So i think we are dropping improving isolation wrt ICMP in a network, and keeping things as-is. Unless we see reports of users requesting that.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@wking wking deleted the awc-icmp-ingress-restriction branch November 7, 2019 02:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants