
required security group ports depend on choice of network plugin #1218

Closed
danwinship opened this issue Feb 8, 2019 · 18 comments

@danwinship (Contributor) commented Feb 8, 2019

Currently data/data/aws/vpc/sg-{master,worker}.tf hardcode the fact that nodes need UDP port 4789 (VXLAN) open between them for the SDN. But with ovn-kubernetes, they will instead need UDP port 6081 (Geneve aka "Generic Network Virtualization Encapsulation") for the pod network, as well as TCP ports 6641 and 6642 for communication between OVN daemons. Other network plugins might use other ports. So there will eventually need to be some sort of abstraction. (Or I guess we could just open all of those ports regardless of network plugin. I'm not sure if we care about supporting third-party plugins on AWS anyway.)
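To make the difference concrete, here is a rough Terraform sketch of the rules in question. The resource names and the `self`-scoped sources are illustrative, not the actual contents of sg-{master,worker}.tf, and real rules would also need cross-group variants (source_security_group_id between the master and worker groups).

```hcl
# Illustrative sketch only; not the installer's actual sg-{master,worker}.tf.

# What is effectively hardcoded today: VXLAN between nodes.
resource "aws_security_group_rule" "worker_ingress_vxlan" {
  type              = "ingress"
  protocol          = "udp"
  from_port         = 4789
  to_port           = 4789
  self              = true
  security_group_id = aws_security_group.worker.id
}

# What ovn-kubernetes would need instead: Geneve between nodes...
resource "aws_security_group_rule" "worker_ingress_geneve" {
  type              = "ingress"
  protocol          = "udp"
  from_port         = 6081
  to_port           = 6081
  self              = true
  security_group_id = aws_security_group.worker.id
}

# ...plus the OVN database ports (6641 northbound, 6642 southbound) on masters.
resource "aws_security_group_rule" "master_ingress_ovn_db" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 6641
  to_port           = 6642
  self              = true
  security_group_id = aws_security_group.master.id
}
```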

@squeed (Contributor) commented Feb 21, 2019

I've been doing some thinking about this, and I wonder if the answer is for the network operator to manage the security groups.

We're going to have to do something like this if we ever support migrating between SDN providers anyway. I believe the Kuryr team's approach is similar: they've moved almost all networking setup code out of Terraform and into the network operator.

This does mean that we need to give the network operator AWS credentials, but there are already other operators that do similar things.

Ideally we could abstract this away and create a desired list of open ports, and the machine-api-operator would handle this. However, we have a bootstrapping problem because the network needs to come up before the MAO in our current architecture.

Are there long-term plans to move cloud creation out of the installer?

@danwinship (Contributor, Author)

> This does mean that we need to give the network operator AWS credentials

and knowledge of how to open ports on each supported cloud platform... It seems weird to have cloud-specific knowledge in CNO itself.

@wking (Member) commented Feb 21, 2019

> and knowledge of how to open ports on each supported cloud platform... It seems weird to have cloud-specific knowledge in CNO itself.

You may be able to add a port-opening abstraction under the cluster-API umbrella. But somebody is going to have to know how to do this on all platforms, so I don't know how much we save by shuffling it around within the cluster. Moving it from the installer into the cluster does seem like something you'd need in order to support migrations.

@squeed (Contributor) commented Feb 28, 2019

QE found an (annoyingly obvious) bug: we give users a knob in the network operator to change the VXLAN port, which is critical on certain platforms and deployments. Of course, they can't actually test it, because all UDP ports except 4789 are blocked.

As a quick fix, I'm going to expand the "system services" range to also allow UDP. But we will probably just need to bite the bullet and do this in the operator.
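A minimal sketch of that quick fix, assuming the "system services" range is the internal cluster-communication range; the 9000-9999 bounds and the resource name here are assumptions rather than the installer's actual values:

```hcl
# Sketch only: mirror the existing TCP internal/system-services rule with a UDP
# counterpart so a relocated VXLAN port has something to pass through.
# The 9000-9999 range and the resource name are assumptions, not real values.
resource "aws_security_group_rule" "worker_ingress_internal_udp" {
  type              = "ingress"
  protocol          = "udp"
  from_port         = 9000
  to_port           = 9999
  self              = true
  security_group_id = aws_security_group.worker.id
}
```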

This is looking like a bit of a mess. I'd like there to be a clean handoff point to the operator, which means a single security group for all machines. Right now there are entirely disjoint SGs for workers and masters.

@danwinship (Contributor, Author)

That feature is only needed on VMware though. We could just drop it for 4.0. (We didn't provide any way to open non-standard VXLAN ports on AWS in 3.x either.)

@wking (Member) commented Feb 28, 2019

On Feb 28, 2019 04:54, Casey Callendrello wrote:

> I'd like there to be a clean handoff point to the operator, which means a single security group for all machines. Right now there are entirely disjoint SGs for workers and masters.

Or have the operator create a new networking security group that it associates with all nodes? The installer could keep providing separate compute and control-plane groups for the other ports.

@squeed (Contributor) commented Feb 28, 2019

> Or have the operator create a new networking security group that it associates with all nodes? The installer could keep providing separate compute and control-plane groups for the other ports.

Then the operator would need to know how to associate the security group with all nodes, which strikes me as something we don't want to do.

@danwinship (Contributor, Author)

I think @wking might have meant "have the installer create a new networking security group"?

@wking (Member) commented Feb 28, 2019

> I think @wking might have meant "have the installer create a new networking security group"?

No, but that would work too. If we created it in Terraform (which we run after rendering our manifests), then we wouldn't have the ID to put into a manifest. But we could put a name- or tag-based filter into the manifest; the networking operator could look the group up and then just manage rules on it, while the usual machine-API tooling took care of keeping it associated with new nodes.
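As a sketch of that approach (the names here are hypothetical, not anything the installer actually creates today), the Terraform side could look roughly like this, with the rules themselves left for the networking operator to manage:

```hcl
# Hypothetical sketch: a dedicated networking security group the installer
# creates empty and tags, so a manifest can reference it by name/tag instead of
# by ID. cluster_id and cluster_vpc are illustrative names.
resource "aws_security_group" "networking" {
  name   = "${var.cluster_id}-networking-sg"
  vpc_id = aws_vpc.cluster_vpc.id

  tags = {
    Name = "${var.cluster_id}-networking-sg"
  }
}
```

The networking operator would then look the group up by that tag and manage only its ingress/egress rules, while the machine-API side keeps the group attached to new machines.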

@danwinship (Contributor, Author)

> This does mean that we need to give the network operator AWS credentials, but there are already other operators that do similar things.

It looks like https://github.com/openshift/cloud-credential-operator is what you're supposed to use. But it doesn't start until well after the network is up, so we can't use it to help bring the network up.

I guess the CNO already has cluster-admin, so it can just request the kube-system/aws-creds Secret itself and then use that.

@pecameron

Can we start with kube-system/aws-creds and after the ports are set up, drop them? We don't need them on an ongoing basis.

@wking (Member) commented Apr 10, 2019

> Can we start with kube-system/aws-creds and after the ports are set up, drop them? We don't need them on an ongoing basis.

If you have cluster-admin, can't you always drop them and then re-fetch them whenever you need them (e.g. for subsequent migrations)? In the future, you could try requesting a cred from the cred operator and fall back to grabbing kube-system/aws-creds, but you don't have to start there.

@dcbw (Contributor) commented Apr 24, 2019

@squeed so what's the consensus here? Time is getting short for 4.1.

@danwinship (Contributor, Author)

@wking so actually, we're going to want to distinguish masters and workers in the network plugin rules; e.g., for ovn-kubernetes, we want to open the Geneve port between all nodes, but the ovn-northd port should only be open from masters to masters, and the OVN southbound port should be open from masters-or-workers to masters. So maybe we should just modify the existing master and worker security groups rather than having a separate networking security group?

If so, how can we reliably find those groups? It looks like right now we could fetch the Infrastructure.config.openshift.io object, look at its infrastructureName, and then search for security groups tagged with Key=Name and Value=${infrastructureName}-master-sg or Value=${infrastructureName}-worker-sg, but is that guaranteed to keep working?
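For illustration, the lookup described above expressed as a Terraform data source (the operator would issue the equivalent DescribeSecurityGroups call through the AWS SDK; infrastructure_name here stands in for the infrastructureName field):

```hcl
# Illustration only of the tag-based lookup described above.
data "aws_security_group" "master" {
  filter {
    name   = "tag:Name"
    values = ["${var.infrastructure_name}-master-sg"]
  }
}
```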

@tmjd (Contributor) commented May 24, 2019

The changes needed for network plugins are not restricted to security group rules. At least for Calico, it is desirable to be able to disable source/destination checks on masters and nodes.
For nodes I hope this will be possible by updating the AWSMachineProviderConfig (with configuration options that do not exist yet), but for the masters I am less sure that is an option. Source/destination checks can at least be disabled while an instance is running, but are there cases where that is not possible?

I wanted to bring this up to make sure the full scope of what needs to be managed/updated is considered.

Sorry, I'm not totally clear on how machine configuration is done or handled, so please correct me if it seems I'm misunderstanding something.
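For reference, a sketch of the source/destination-check piece if it were handled at provisioning time; the attribute follows the Terraform AWS provider, and the resource and variable names are hypothetical:

```hcl
# Hypothetical sketch: disable the EC2 source/destination check at provision
# time, which plugins like Calico need when routing pod traffic via node IPs.
resource "aws_instance" "master" {
  ami               = var.master_ami           # hypothetical variables
  instance_type     = var.master_instance_type
  source_dest_check = false
}
```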

@squeed (Contributor) commented May 24, 2019

So, for 4.1 and 4.2, the plan is to stick with static VPC and SG management. There's just not enough time in the 4.2 cycle to change anything up.

For 4.3, the openshift network operator will be managing the security group settings necessary for the overlay network to function. (The installer will still be expected to install the initial basic infrastructure). I'll make sure that the design includes the ability for third-party components to request open ports.

As for the source/destination checks, what would be a reasonable API for this? I do think it might have to be part of the installer, since it's a machine property and needs to be set on the control plane before the CNO comes up.

@abhinavdahiya (Contributor)

So it looks like we have a consensus that the networking operators will be doing this.

/close

@openshift-ci-robot (Contributor)

@abhinavdahiya: Closing this issue.

In response to this:

> So it looks like we have a consensus that the networking operators will be doing this.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
