Network design idea: use a private network. #9

Closed
rbo opened this issue Jun 4, 2021 · 12 comments
Assignees
Labels
kind/design Design decision lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@rbo
Member

rbo commented Jun 4, 2021

In addition to #8, it's hard to secure the public interfaces of the Hetzner servers.

Idea A: Move SDN / OpenShift traffic to a private network. At Hetzner you can attach dedicated servers to a vSwitch.

This would end up in this network setup:

image

Initially, I tried this kind of setup and failed.

The OpenShift installation decides which interface is the main one by looking for the interface with a default gateway. We therefore changed the default gateway to 172.22.2.1 on the vlan4000 interface, to force the installation to use the VLAN interface IP.
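
To make the selection rule above concrete, here is a minimal sketch. It is not the installer's actual code. The `Interface` class, `pick_main_interface`, the interface name `eth0`, and all addresses except the gateway 172.22.2.1 are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interface:
    name: str
    address: str
    # Set only when a default route points out of this interface.
    gateway: Optional[str] = None


def pick_main_interface(interfaces: List[Interface]) -> Interface:
    """Return the first interface that carries a default gateway."""
    for iface in interfaces:
        if iface.gateway is not None:
            return iface
    raise ValueError("no interface with a default gateway")


# Before the change: the default route sits on the public NIC (hypothetical addresses).
before = [
    Interface("eth0", "203.0.113.10", gateway="203.0.113.1"),
    Interface("vlan4000", "172.22.2.10"),
]

# After the change: the default route is moved to the VLAN interface.
after = [
    Interface("eth0", "203.0.113.10"),
    Interface("vlan4000", "172.22.2.10", gateway="172.22.2.1"),
]

print(pick_main_interface(before).name)
print(pick_main_interface(after).name)
```

With the default route on the public NIC the sketch selects `eth0`; once the gateway moves to `vlan4000`, that interface is selected instead.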

In addition, we decided to disable the public IP completely. The bootstrap node picks the first IP, which is the public one, so the bootstrap etcd member ends up on the public IP, but none of the other nodes have access to public IPs anymore.
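
A small sketch of why the address order matters, under stated assumptions: the /24 prefix length, the node addresses, and both helper functions are hypothetical and are not taken from the installer. It contrasts "take the first IP" with "take the first IP inside the private network".

```python
import ipaddress
from typing import List, Optional

# Assumed private network; only 172.22.2.1 appears in this issue, the /24 is a guess.
PRIVATE_NET = ipaddress.ip_network("172.22.2.0/24")


def first_ip(addresses: List[str]) -> str:
    """Naive choice: whatever address happens to be listed first."""
    return addresses[0]


def first_private_ip(addresses: List[str],
                     network=PRIVATE_NET) -> Optional[str]:
    """Choose the first address that lies inside the private network."""
    for addr in addresses:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None


# Hypothetical node with the public address listed first.
node_ips = ["203.0.113.10", "172.22.2.10"]

print(first_ip(node_ips))
print(first_private_ip(node_ips))
```

With the public address listed first, the naive choice returns it, which matches the behaviour described above. Removing the public IP from the node leaves only the private address to pick.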

Source commit #4b7523ec11161c20e4a2e851e4f2e732185e96f1

@rbo
Member Author

rbo commented Jun 4, 2021

Moving SDN/OVN to the secondary interface is not possible at the moment. There is an RFE for it, planned for 4.9: Support Migration of OVN to a Secondary Cluster Host Interface

Another related RFE: Multiple NIC Support for OVN-Kubernetes Deployments

@rbo rbo added the kind/design Design decision label Jun 4, 2021
@rbo
Member Author

rbo commented Jun 4, 2021

Idea B:

image

It is close to Idea A, but with its own default router

network-overview-v2

@rbo rbo added the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 4, 2021
@rbo rbo removed the priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. label Jun 17, 2021
@sesheta
Member

sesheta commented Oct 15, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2021
@rbo
Member Author

rbo commented Oct 18, 2021

/remove-lifecycle stale

4.9 will be released this month; then we can plan to use the secondary interface feature.

@sesheta sesheta removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 18, 2021
@rbo rbo self-assigned this Oct 18, 2021
@rbo
Member Author

rbo commented Oct 29, 2021

Brought up Idea A with SDN engineering in our internal Slack and asked for a solution: https://coreos.slack.com/archives/CDCP2LA9L/p1635522685150000

/cc @durandom

@durandom
Member

durandom commented Nov 1, 2021

@rbo since 4.9 is out now, would you recommend upgrading rick to 4.9 and trying this solution?
Or would we rather deploy another 4.9 test cluster and try the solution there?

@rbo
Member Author

rbo commented Nov 2, 2021

It looks like SDN-1813 is the wrong approach. We are not the only ones with this kind of problem: https://issues.redhat.com/browse/OCPBUGSM-27829

@rbo since 4.9 is out now. Would you recommend upgrading rick to 4.9 and try this solution? Or would we rather deploy another 4.9 test cluster and try the solution there?

As far as I know, the upgrade path is only available in the unsupported candidate-4.9 channel. First, we have to clean up the rick cluster; some operators are stuck in the "Progressing" state.

@durandom
Member

durandom commented Nov 4, 2021

@larsks can you open a ticket with RH support for this and add @rbo to it?

@larsks

larsks commented Nov 4, 2021

@durandom I think someone else should probably manage the support ticket for this issue. It doesn't seem to touch on the MOC/BU environment, which is generally how I masquerade as a customer w/r/t support, and I'm not familiar with the Hetzner environment at all. There needs to be someone else who is able to interact with the support system (or we should pursue this by interacting directly with the SDN team rather than treating it as a support issue -- which might be for the best, because OpenShift support has been a mixed bag so far).

@rbo
Member Author

rbo commented Nov 18, 2021

@sesheta
Member

sesheta commented Feb 16, 2022

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 16, 2022
@rbo
Member Author

rbo commented Feb 25, 2022

The private network works pretty well:

image

PR #17 contains everything. I guess we can close this issue.

@rbo rbo closed this as completed Feb 25, 2022
4 participants