required security group ports depend on choice of network plugin #1218
I've been doing some thinking about this, and I wonder if the answer is for the network operator to manage the security groups. We're going to have to do something like this if we ever support migrating between SDN providers anyway. I believe the Kuryr team's approach is similar: they've moved almost all networking setup code out of Terraform and into the network operator. This does mean that we need to give the network operator AWS credentials, but there are already other operators that do similar things.

Ideally we could abstract this away and create a desired list of open ports, and the machine-api-operator would handle this. However, we have a bootstrapping problem, because the network needs to come up before the MAO in our current architecture. Are there long-term plans to move cloud creation out of the installer?
and knowledge of how to open ports on each supported cloud platform... It seems weird to have cloud-specific knowledge in CNO itself.
You may be able to add a port-opening abstraction under the cluster-API umbrella. But somebody is going to have to know how to do this on all platforms, so I don't know that we save much by shuffling it around within the cluster. Moving this from the installer into the cluster seems like something you'd need in order to support migrations.
QE found an (annoyingly obvious) bug: we give users a knob in the network operator to change the VXLAN port, which is critical in certain platforms and deployments. Of course, they can't actually test it, because all UDP ports are blocked except 4789. As a quick fix, I'm going to expand the "system services" range to also allow UDP.

But we will probably just need to bite the bullet and do this in the operator. This is looking like a bit of a mess. I'd like there to be a clean handoff point to the operator, which means a single security group for all machines. Right now there are entirely disjoint SGs for workers and masters.
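For concreteness, making the VXLAN port configurable would mean a rule shaped roughly like this. This is only a sketch in the style of the existing Terraform; the variable and resource names are invented, not what the installer actually uses:

```hcl
# Hypothetical sketch only: a user-settable VXLAN port instead of a
# hardcoded 4789. Names are illustrative, not the installer's.
variable "vxlan_port" {
  type    = number
  default = 4789
}

resource "aws_security_group_rule" "master_ingress_vxlan" {
  type              = "ingress"
  protocol          = "udp"
  from_port         = var.vxlan_port
  to_port           = var.vxlan_port
  security_group_id = aws_security_group.master.id
  # Allow traffic from other members of the same security group.
  self              = true
}
```

A matching rule would be needed on the worker group too, which is part of why a single shared group would make a cleaner handoff point.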
That feature is only needed on VMware though. We could just drop it for 4.0. (We didn't provide any way to open non-standard VXLAN ports on AWS in 3.x either.)
On Feb 28, 2019 04:54, Casey Callendrello wrote:
Or have the operator create a new networking security group that it associates with all nodes? The installer could keep providing separate compute and control-plane groups for the other ports.
Then the operator would need to know how to associate the security group with all nodes, which strikes me as something we don't want to do.
I think @wking might have meant "have the installer create a new networking security group"?
No, but that would work too. If we created it in Terraform (which we run after rendering our manifests), then we wouldn't have the ID to put into a manifest. But we could put a name- or tag-based filter into the manifest, the networking operator could look it up, and then just manage rules on that security group (while the usual machine-API tooling took care of keeping it associated with new nodes).
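As a hedged sketch of that idea (the tag keys, `var.cluster_id`, and the VPC reference here are all hypothetical), the installer could create the group with a well-known tag, and the operator could then find it with a tag filter (e.g. via `ec2:DescribeSecurityGroups`) instead of needing the ID in a manifest:

```hcl
# Illustrative only: a dedicated "networking" security group, tagged so the
# network operator can discover it without knowing its ID at render time.
resource "aws_security_group" "networking" {
  name_prefix = "${var.cluster_id}-networking-"
  vpc_id      = aws_vpc.cluster.id

  tags = {
    "kubernetes.io/cluster/${var.cluster_id}" = "owned"
    # Hypothetical marker tag the operator's filter would match on.
    "openshift.io/network-operator" = "managed"
  }
}
```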
It looks like https://github.com/openshift/cloud-credential-operator is what you're supposed to use. But it doesn't start until well after the network is up, so we can't use it to help bring the network up. I guess the CNO already has…
Can we start with kube-system/aws-creds and, after the ports are set up, drop them? We don't need them on an ongoing basis.
If you have cluster-admin, can't you always drop them and then re-fetch them whenever you need them (e.g. for subsequent migrations)? In the future, you could try requesting a cred from the cred operator and fall back to grabbing kube-system/aws-creds, but you don't have to start there.
@squeed so what's the consensus here? Time is getting short for 4.1.
@wking so actually, we're going to want to distinguish masters and workers in the network plugin rules; e.g., for ovn-kubernetes, we want to open the GENEVE port between all nodes, but the ovn-northd port should only be open from masters to masters, and the ovn-southd port should be open from masters-or-workers to masters. So maybe we should just modify the existing master and worker security groups rather than having a separate networking security group? If so, how can we reliably find those groups? It looks like right now we could fetch the…
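If we do modify the existing groups, the asymmetry described above would look roughly like this (a sketch only; the group names stand in for the installer's actual master/worker groups, and the mirror rules on the worker group are omitted for brevity):

```hcl
# GENEVE overlay (UDP 6081): open between all nodes. This rule covers
# worker -> master; master self, worker self, and master -> worker
# rules would be needed as well.
resource "aws_security_group_rule" "master_ingress_geneve_from_worker" {
  type                     = "ingress"
  protocol                 = "udp"
  from_port                = 6081
  to_port                  = 6081
  security_group_id        = aws_security_group.master.id
  source_security_group_id = aws_security_group.worker.id
}

# OVN northbound DB (TCP 6641): masters to masters only.
resource "aws_security_group_rule" "master_ingress_ovn_nb" {
  type              = "ingress"
  protocol          = "tcp"
  from_port         = 6641
  to_port           = 6641
  security_group_id = aws_security_group.master.id
  self              = true
}

# OVN southbound DB (TCP 6642): masters or workers to masters.
resource "aws_security_group_rule" "master_ingress_ovn_sb_from_worker" {
  type                     = "ingress"
  protocol                 = "tcp"
  from_port                = 6642
  to_port                  = 6642
  security_group_id        = aws_security_group.master.id
  source_security_group_id = aws_security_group.worker.id
}
# ...plus a self = true rule on the master group for master -> master 6642.
```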
The changes needed for network operators are not restricted to security group allowances. At least for Calico, it is desirable to be able to disable source/destination checks on masters and nodes. I wanted to bring this up to make sure the full scope of what needs to be managed/updated is considered. Sorry, I'm not totally clear on how machine configuration is done/handled, so please correct me if it seems I'm misunderstanding something.
So, for 4.1 and 4.2, the plan is to stick with static VPC and SG management. There's just not enough time in the 4.2 cycle to change anything up.

For 4.3, the openshift network operator will be managing the security group settings necessary for the overlay network to function. (The installer will still be expected to install the initial basic infrastructure.) I'll make sure that the design includes the ability for third-party components to request open ports.

As for source/destination checks, what would be a reasonable API for this? I do think it might have to be part of the installer, since it's a machine property and needs to be set on the control plane before the CNO comes up.
So it looks like we have a consensus that the networking operators will be doing this.

/close
@abhinavdahiya: Closing this issue. In response to this:
Currently `data/data/aws/vpc/sg-{master,worker}.tf` hardcode the fact that nodes need UDP port 4789 (VXLAN) open between them for the SDN. But with ovn-kubernetes, they will instead need UDP port 6081 (Geneve, aka "Generic Network Virtualization Encapsulation") for the pod network, as well as TCP ports 6641 and 6642 for communication between OVN daemons. Other network plugins might use other ports. So there will eventually need to be some sort of abstraction. (Or I guess we could just open all of those ports regardless of network plugin. I'm not sure if we care about supporting third-party plugins on AWS anyway.)
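One possible shape for that abstraction, sketched in Terraform 0.12-style syntax (the variable, the map contents, and the resource names here are hypothetical, not a committed design): a per-plugin port map driving the rules.

```hcl
# Hypothetical sketch of a per-plugin port abstraction.
variable "network_plugin" {
  type    = string
  default = "openshift-sdn"
}

locals {
  # Ports each plugin needs open between nodes; other plugins would
  # add their own entries.
  plugin_ports = {
    "openshift-sdn" = [
      { protocol = "udp", port = 4789 }, # VXLAN
    ]
    "ovn-kubernetes" = [
      { protocol = "udp", port = 6081 }, # Geneve
      { protocol = "tcp", port = 6641 }, # OVN northbound DB
      { protocol = "tcp", port = 6642 }, # OVN southbound DB
    ]
  }
}

resource "aws_security_group_rule" "master_ingress_plugin" {
  count             = length(local.plugin_ports[var.network_plugin])
  type              = "ingress"
  protocol          = local.plugin_ports[var.network_plugin][count.index].protocol
  from_port         = local.plugin_ports[var.network_plugin][count.index].port
  to_port           = local.plugin_ports[var.network_plugin][count.index].port
  security_group_id = aws_security_group.master.id
  self              = true
}
```

This flattens the plugin-specific knowledge into data rather than separate resources, though it doesn't by itself capture the master-vs-worker asymmetry discussed above.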