
Ability to specify mon endpoints when using host networking #12363

Closed
echel0n-TR opened this issue Jun 12, 2023 · 20 comments · Fixed by #13500

@echel0n-TR

Is this a bug report or feature request?

  • Feature Request

What should the feature do:
Allow specifying either a list of IPs or a subnet range that the mon endpoints are allowed to bind to. If nothing is specified in the Cluster CRD, fall back to using the node's InternalIP, followed by the ExternalIP, which is the current logic.

What is use case behind this feature:
Currently the mon endpoints are used by the CSI driver to mount Ceph Filesystem volumes into pods. This feature would allow those mon endpoints to be bound to other networks instead of only the Kubernetes cluster network. In our case the Kubernetes network is on a 10Gb switch, but our Ceph public network uses a 40Gb switch, which is ideal for Ceph filesystem traffic.

Environment:
Kubernetes v1.27.1
Rook v1.11.7
Ceph v17.2.6

@travisn
Member

travisn commented Jun 13, 2023

Rook currently gets the host IP from the addresses on the K8s Node, using the InternalIP and falling back to the ExternalIP. If the desired host address is not on the K8s node, Rook needs to be told which IP to use for each node. I'm thinking the CephCluster CR would need a map of all hosts to IPs where mons may run, but I'm still open to ideas.
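The fallback order described here can be sketched in a few lines. This is an illustration of the described behavior, not Rook's actual Go implementation; the input shape mirrors `Node.status.addresses`:

```python
# Sketch of the address-selection order the comment describes:
# prefer the node's InternalIP, fall back to ExternalIP.
def pick_host_ip(node_addresses):
    """node_addresses: list of {"type": ..., "address": ...} dicts,
    shaped like Kubernetes Node.status.addresses."""
    for wanted in ("InternalIP", "ExternalIP"):
        for addr in node_addresses:
            if addr["type"] == wanted:
                return addr["address"]
    return None  # no usable address found on the Node resource

addresses = [
    {"address": "10.0.2.15", "type": "InternalIP"},
    {"address": "minikube", "type": "Hostname"},
]
pick_host_ip(addresses)  # -> "10.0.2.15"
```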

@echel0n-TR
Author

echel0n-TR commented Jun 13, 2023

In our case all of our nodes have InternalIPs set, but those belong to our 10Gb K8s network; we want to bind the mon endpoints to our 40Gb switch subnet instead. So there needs to be a way to specify in the CephCluster CRD a list of IPs or a subnet that Rook can use for the mon endpoints to bind to. Rook may need to build an IP address mapping of all the nodes first to determine where to place the mon endpoints dynamically. If nothing is specified, it should fall back to the original logic.
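The subnet-based variant of this proposal could look roughly like the following. This is a hypothetical helper, not anything in Rook; the subnet and IPs are illustrative values based on this thread (10.0.1.0/24 for K8s, 10.0.2.0/24 for Ceph):

```python
import ipaddress

# Hypothetical helper: given a node's candidate IPs and an
# operator-configured public-network CIDR (the 40Gb subnet in this
# report), pick the first IP inside that subnet. Returning None lets
# the caller fall back to today's InternalIP/ExternalIP behavior.
def pick_ip_in_subnet(candidate_ips, cidr):
    net = ipaddress.ip_network(cidr)
    for ip in candidate_ips:
        if ipaddress.ip_address(ip) in net:
            return ip
    return None

pick_ip_in_subnet(["10.0.1.20", "10.0.2.5"], "10.0.2.0/24")  # -> "10.0.2.5"
```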

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@travisn
Member

travisn commented Aug 14, 2023

@echel0n-TR In your K8s nodes, what do you see in the status.addresses? What type do the desired addresses show? For example, in minikube I see the following. Do the desired addresses show up in the list with a different type?

 $ kubectl get node -o yaml | grep addresses -A 4
    addresses:
    - address: 10.0.2.15
      type: InternalIP
    - address: minikube
      type: Hostname

@echel0n-HX

echel0n-HX commented Aug 14, 2023

@travisn Below is the output of the command you used, but in our cluster the IP address range we have assigned for Ceph is 10.0.2.0/24, and it is not shown in the output; instead we see our K8s IP address range.

$ kubectl get node -o yaml | grep addresses -A 4
    addresses:
    - address: 10.0.1.20
      type: InternalIP
    - address: k8s-w01
      type: Hostname

@travisn
Member

travisn commented Aug 14, 2023

I see, it seems we would have to configure this purely based on custom settings in the CephCluster CR.

@github-actions github-actions bot removed the wontfix label Aug 15, 2023

@github-actions

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.

@github-actions github-actions bot closed this as not planned (stale) Oct 24, 2023
@echel0n-HX

Any update on this request?

@travisn
Member

travisn commented Oct 24, 2023

The design hasn't felt right, so this hasn't made progress. Here is another idea...

Instead of settings in the CephCluster CR, what about node labels? This feels like a topology question very similar to the OSD topology labels. For example, the flow could be:

  1. When starting a new mon, Rook schedules the mon canary pod (as already done today)
  2. If using host networking and the node has a label network.rook.io/mon-ip, Rook uses that IP address for the mon endpoints. If the node does not have the label, fall back to today's behavior.
  3. Start the mon daemon with the desired IP address.
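The label-based flow above can be sketched as follows. The label name `network.rook.io/mon-ip` comes from the comment; everything else here is illustrative, not Rook's implementation:

```python
# Sketch of the proposed flow: if host networking is in use and the
# node carries the mon-ip label, use that IP for the mon endpoint;
# otherwise fall back to today's behavior (the Node's InternalIP or
# ExternalIP, passed in here as default_ip).
MON_IP_LABEL = "network.rook.io/mon-ip"

def mon_endpoint_ip(node_labels, host_networking, default_ip):
    if host_networking and MON_IP_LABEL in node_labels:
        return node_labels[MON_IP_LABEL]
    return default_ip

mon_endpoint_ip({"network.rook.io/mon-ip": "10.0.2.5"}, True, "10.0.1.20")  # -> "10.0.2.5"
```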

@travisn travisn reopened this Oct 24, 2023
@github-actions github-actions bot removed the wontfix label Oct 25, 2023

@echel0n-HX

The design hasn't felt right, so this hasn't made progress. Here is another idea...

Instead of settings in the CephCluster CR, what about node labels? This feels like a topology question very similar to the OSD topology labels. For example, the flow could be:

  1. When starting a new mon, Rook schedules the mon canary pod (as already done today)
  2. If using host networking and the node has a label network.rook.io/mon-ip, Rook uses that IP address for the mon endpoints. If the node does not have the label, fall back to today's behavior.
  3. Start the mon daemon with the desired IP address.

This sounds like a cleaner approach to the solution. When can it be implemented?

@github-actions github-actions bot removed the wontfix label Dec 28, 2023
@BlaineEXE
Member

I think the reason we have the mons specifically bind to the K8s Node resource's IP is that it is guaranteed to be routable from the Rook operator. The Rook operator pod isn't guaranteed to know how to reach the Ceph public network to query mon health status, which means Rook could create a cluster that it doesn't know how to manage.

I think what you are suggesting would work, but we need some time to consider whether there are usage patterns where users would be able to shoot themselves in the foot easily by adding this option in.

But as Rook is implemented today, the Ceph cluster should still work even with mons on the K8s node IP, and there should be no bandwidth loss. The mons aren't directly in the Ceph data path for any Ceph client; mons are only contacted when the client makes initial contact, after which the client communicates with OSDs (or MDSes). So the cluster doesn't lose bandwidth by having the mons communicate on a different network.

Both of these reasons are why it hasn't been a high priority to fix the situation. There isn't really enough downside to force our hand into doing the surprisingly large amount of work needed to plan through all of the use-cases and eventualities.

@travisn
Member

travisn commented Jan 4, 2024

@echel0n-HX With #13500 this change is implemented. I verified in minikube that I could apply an IP specified by the annotation, but I don't know whether it will work in your networked environment. Could you test this change? I've pushed an image with it.

  1. Set the image travisn/ceph:custom-mon-ip as your Rook operator image
  2. Apply the annotation to the node(s) where the mons will be created: network.rook.io/mon-ip: <IPAddress>
  3. Create the cluster
  4. Verify that the mons form quorum with the desired IPs and the cluster is otherwise functional.
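Before step 3, it may help to sanity-check that the annotation value parses as a valid IP address. This check is purely illustrative and not part of Rook:

```python
import ipaddress

# Illustrative pre-flight check: confirm the value placed in the
# network.rook.io/mon-ip annotation is a well-formed IP address
# before creating the cluster.
def valid_mon_ip(value):
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False

valid_mon_ip("172.19.90.10")  # -> True
valid_mon_ip("k8s-w01")       # -> False (hostnames are not accepted)
```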

If you need any additional configuration such as running the operator on the host network, please comment here (or in #13500) to be clear about what it took to run as expected in your network configuration.

@travisn
Member

travisn commented Jan 22, 2024

@echel0n-TR @echel0n-HX Did you get a chance to look at this?

@echel0n-HX

Unfortunately we had to switch to a different storage system because the performance we got from rook-ceph was not adequate for our requirements, so I am unable to test this for you, sorry.

@travisn
Member

travisn commented Jan 22, 2024

Ok, thanks for the response. I'll close this for now until someone else has this requirement.

@travisn travisn closed this as completed Feb 12, 2024
@klippo

klippo commented Jun 5, 2024

@travisn We are interested in trying this. Currently running rook 1.14.4

We tested your code on 1.14.4 and it seems to work as expected.

$ kubectl get node ceph-node01 -o json | jq '.metadata.annotations["network.rook.io/mon-ip"]'
"172.19.90.10"
$ ss -alpt | grep 6789
LISTEN 0      512     172.19.90.10:6789        0.0.0.0:*    users:(("ceph-mon",pid=1890639,fd=26))

@travisn
Member

travisn commented Jun 5, 2024

@klippo Great to hear it's working for you. With that, I will reopen assuming you are interested in the feature getting into a release.

@travisn travisn reopened this Jun 5, 2024
@klippo

klippo commented Jun 6, 2024

Thanks @travisn , looking forward to this
