Multus and CSI #5356

Closed
leseb opened this issue Apr 28, 2020 · 12 comments · Fixed by #5740

@leseb
Member

leseb commented Apr 28, 2020

The CSI templates need their annotations updated if Multus is detected as the network provider.

@leseb leseb added the feature label Apr 28, 2020
@Madhu-1 Madhu-1 added the csi label Apr 28, 2020
@leseb leseb self-assigned this Apr 29, 2020
@leseb leseb added this to To do in v1.4 May 20, 2020
@leseb leseb assigned rohan47 and unassigned leseb Jun 8, 2020
@leseb
Member Author

leseb commented Jun 9, 2020

  1. Read the CephCluster CR before deploying CSI.
  2. Discover whether Multus is used.
  3. Propagate the annotations to the CSI pods when deploying.
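
The three steps above can be sketched roughly as follows. This is only an illustration in Python, not Rook's actual implementation (the operator is written in Go). It assumes the CephCluster CR exposes `spec.network.provider` and `spec.network.selectors.public`, and that Multus reads the `k8s.v1.cni.cncf.io/networks` annotation; the function names and the `rook-ceph/public-net` value are hypothetical.

```python
from copy import deepcopy
from typing import Optional

# Annotation key that Multus reads to attach additional networks to a pod.
MULTUS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def multus_public_network(ceph_cluster: dict) -> Optional[str]:
    """Steps 1 and 2: read the CR and return the public selector if Multus is used."""
    network = (ceph_cluster.get("spec") or {}).get("network") or {}
    if network.get("provider") != "multus":
        return None
    return (network.get("selectors") or {}).get("public") or None


def with_multus_annotation(pod_template: dict, ceph_cluster: dict) -> dict:
    """Step 3: return a copy of the CSI pod template carrying the Multus annotation."""
    result = deepcopy(pod_template)
    public = multus_public_network(ceph_cluster)
    if public is None:
        return result  # not a Multus cluster: leave the template untouched
    metadata = result.setdefault("metadata", {})
    annotations = metadata.get("annotations") or {}
    annotations[MULTUS_ANNOTATION] = public
    metadata["annotations"] = annotations
    return result


# Hypothetical CR and CSI pod template, trimmed to the fields used here.
cluster = {
    "spec": {
        "network": {
            "provider": "multus",
            "selectors": {"public": "rook-ceph/public-net"},
        }
    }
}
template = {"metadata": {"name": "csi-rbdplugin"}, "spec": {}}
print(with_multus_annotation(template, cluster)["metadata"]["annotations"])
```

Returning a copy rather than mutating the template keeps the original reusable for clusters that do not use Multus.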

rohan47 added a commit to rohan47/rook that referenced this issue Jun 30, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: rook#5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
@leseb leseb moved this from To do to In Review in v1.4 Jul 1, 2020
rohan47 added a commit to rohan47/rook that referenced this issue Jul 1, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: rook#5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
rohan47 added a commit to rohan47/rook that referenced this issue Jul 8, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: rook#5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
@travisn travisn moved this from In Review to In progress in v1.4 Jul 14, 2020
@bengland2

question: does this mean that if I'm running an fio pod using CSI that the fio pod would automatically add the multus annotation and receive an extra network interface on the fly? I wasn't sure if multus interfaces could be hot-plugged. So this means that I don't have to change application yamls to know about multus? If so, that would be great news. Thx for help from Sebastien and Rohan. -ben

@rohan47
Member

rohan47 commented Jul 16, 2020

> question: does this mean that if I'm running an fio pod using CSI that the fio pod would automatically add the multus annotation and receive an extra network interface on the fly? I wasn't sure if multus interfaces could be hot-plugged. So this means that I don't have to change application yamls to know about multus? If so, that would be great news. Thx for help from Sebastien and Rohan. -ben

The fix for this issue applies Multus annotations only to the CSI component pods.
Multus annotations on the fio pods still have to be added manually.
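
As an illustration of adding the annotation by hand, here is a minimal sketch that builds an fio pod manifest with the Multus annotation set. It assumes Multus reads the `k8s.v1.cni.cncf.io/networks` annotation; the pod name, image reference, PVC name, mount path, and network attachment name are all placeholders, not values taken from this thread's cluster.

```python
import json

# Annotation key that Multus reads to attach additional networks to a pod.
MULTUS_ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def fio_pod_manifest(name: str, image: str, pvc_name: str, network_attachment: str) -> dict:
    """Build a Pod manifest that mounts a PVC and requests an extra Multus network."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": name,
            # Added manually: the CSI fix does not touch application pods.
            "annotations": {MULTUS_ANNOTATION: network_attachment},
        },
        "spec": {
            "containers": [
                {
                    "name": "fio",
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": "/mnt/data"}],
                }
            ],
            "volumes": [
                {"name": "data", "persistentVolumeClaim": {"claimName": pvc_name}}
            ],
        },
    }


manifest = fio_pod_manifest(
    name="fio-test",                            # placeholder
    image="quay.io/cloud-bulldozer/fio",        # placeholder image reference
    pvc_name="fio-pvc",                         # placeholder
    network_attachment="rook-ceph/public-net",  # placeholder NAD name
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.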

@rohan47
Member

rohan47 commented Jul 16, 2020

Is the fio binary inside the csi image that you are using? @bengland2

@bengland2

The fio binary is inside the https://quay.io/repository/cloud-bulldozer/fio image. Sorry, it is not clear to me what you mean by a "CSI image". fio does not know about Ceph; it just uses a mountpoint on a volume handed to it by Kubernetes, and the mountpoint is created from a storageclass, either rbd or cephfs. Isn't that what CSI does -- provide access to storage resources independent of the storageclass implementation?

@bengland2

Another problem: OSD pods come up with 2 multus interfaces, 1 public 1 cluster, but do not use these interfaces. This is because the "cluster network" parameter appears not to be set. Rohan seems to think that rook is not dealing with the whereabouts NAD correctly. But there is a workaround - either using ceph_config_overrides configmap or using "ceph config-key" feature, we can set the "cluster network" parameter and restart the OSDs to force it to use them. I have actually seen this work, just that rook.io should be doing this automatically (according to Seb). Does this need a separate github issue or can we piggyback on top of this one ;-)
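
A minimal sketch of the config-override workaround described above: it renders the `[global]` fragment that sets `cluster network`. The subnets are placeholders. Where the fragment goes is an assumption here: in Rook it would typically be the `config` key of the `rook-config-override` ConfigMap, and the runtime alternative would be something like `ceph config set global cluster_network <cidr>`, followed by an OSD restart as described.

```python
import ipaddress
from typing import Optional


def render_network_override(cluster_cidr: str, public_cidr: Optional[str] = None) -> str:
    """Render a ceph.conf [global] fragment that pins the Ceph networks."""
    lines = ["[global]"]
    if public_cidr:
        # ip_network() raises ValueError for a malformed or non-network CIDR.
        lines.append(f"public network = {ipaddress.ip_network(public_cidr)}")
    lines.append(f"cluster network = {ipaddress.ip_network(cluster_cidr)}")
    return "\n".join(lines) + "\n"


# Placeholder subnet; use the range served by the cluster NetworkAttachmentDefinition.
print(render_network_override("192.168.20.0/24"))
```

Validating the CIDR up front avoids restarting OSDs with a value Ceph would not be able to match against any interface.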

@bengland2

Today I got a multus cluster to technically come up (HEALTH_OK with 12 OSDs), but when I tried to run a fio workload, the PVC would not bind. I got this error; is this what you expect to happen?

http://pastebin.test.redhat.com/885502

Also, I do not understand the mechanics of how the kernel Ceph modules learn to access the correct Multus subnet when they are not part of the OpenShift network namespace. Specifically, when I run the fio benchmark with a Ceph storage class, the pod must access a kernel RBD or CephFS mountpoint, and these are implemented by kernel modules that the Ceph CSI module does not directly control. But doesn't this problem exist already, since the SDN network is part of a network namespace too? I understand better how Ceph works on bare metal, but I haven't learned K8s networking yet.

@bengland2

I tried creating just a cluster network and no public network, which is an unsupported configuration. It fails with this crash. Could we support this configuration? It would allow some benefit from Multus even before we resolve this issue.

@leseb
Member Author

leseb commented Jul 20, 2020

> I tried just creating a cluster network and no public network, which is an unsupported configuration. It fails with this crash. Could we support this configuration because it would allow some benefit from Multus even before we resolve this issue.

Two things:

  1. Rook should not crash like this.
  2. Why should we support this? The "cluster" network is only the replication network. If you have a single network to dedicate, why not just specify "public"?

rohan47 added a commit to rohan47/rook that referenced this issue Jul 20, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: rook#5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
@bengland2

Sebastien, if multus only specifies the cluster network, then the implication is that we continue to use the SDN network for the public network, but at least you still have > 1 network (and > 1 physical NIC port) that you could use this way. I would prefer to have the public network on Multus as well, but until Ceph-CSI can set that up, I thought this might be a short-term solution.

@leseb
Member Author

leseb commented Jul 22, 2020

OK, so basically a slow client network but a fast replication network 🤔. I believe it's an acceptable short-term solution.
But since CSI uses host networking, it should already be able to connect to any network. So declaring the public network should work IMO.

rohan47 added a commit to rohan47/rook that referenced this issue Jul 23, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: rook#5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
@jbw976 jbw976 moved this from In progress to In Review in v1.4 Jul 28, 2020
rohan47 added a commit to rohan47/rook that referenced this issue Aug 5, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: rook#5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
@travisn travisn moved this from In Review to In progress in v1.4 Aug 10, 2020
@rohan47
Member

rohan47 commented Aug 12, 2020

The issue we are currently facing with Multus and CSI is that the rbd commands get stuck inside the csi-nodeplugin/csi-rbdplugin container. The details are captured in ceph/ceph-csi#1323.

rohan47 added a commit to rohan47/rook that referenced this issue Aug 20, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: rook#5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
mergify bot pushed a commit that referenced this issue Aug 24, 2020
CSI pods now utilize multus networking and connect to public
network specified in the CephCluster CR.

Closes: #5356
Signed-off-by: rohan47 <rohgupta@redhat.com>
(cherry picked from commit 8c4ede1)
@travisn travisn moved this from In progress to Done in v1.4 Aug 24, 2020