Allow configuration of two Ceph mgr daemons #3076
Conversation
Signed-off-by: wangxiao86 <wangxiao1@sensetime.com>
A few overall suggestions for the PR:
- Could you reference issue #1048 (Run two MGRs to have one in standby mode) as the issue to be resolved by the PR? I just reopened that issue.
- In the PR subject and commit messages, please describe the issue. There is no need to reference issue numbers or other PRs in the subject or commit message; the body of the PR is fine for that.
- We will need this feature added to PendingReleaseNotes.txt and the documentation in ceph-cluster-crd.yaml.
- Can you add notes to the PR message about the testing you've performed? There are comments in issue #1048 about past challenges with multiple mgrs.
Thanks!
@@ -32,6 +32,10 @@ spec:
  mon:
    count: 3
    allowMultiplePerNode: false
  # set the number of mgrs to be started; the maximum value of mgr.count is 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can enforce a max with schema validation similar to the mons. See this example
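Until that schema validation lands, the same bound can be expressed as a plain validation check in the operator. A minimal sketch, assuming a limit of 1–2 mgrs as discussed in this thread; the function name is an assumption for illustration, not Rook's actual code:

```go
package main

import "fmt"

// validateMgrCount mirrors the bound that CRD schema validation would enforce
// (hypothetical helper; the 1..2 range follows the comment in the spec above).
func validateMgrCount(count int) error {
	if count < 1 || count > 2 {
		return fmt.Errorf("mgr count must be 1 or 2, got %d", count)
	}
	return nil
}
```

Schema validation in the CRD is still preferable, since it rejects an invalid count at `kubectl apply` time rather than during reconciliation.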
@@ -134,6 +137,12 @@ type MonSpec struct {
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
}

type MgrSpec struct {
	Count          int `json:"count"`
	PreferredCount int `json:"preferredCount"`
We don't need `PreferredCount`; that's specific to the mons.
type MgrSpec struct {
	Count                int  `json:"count"`
	PreferredCount       int  `json:"preferredCount"`
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
I wonder if we could just not add this property. It isn't as critical for the mgr as the mons to be running on separate nodes, but the operator should certainly attempt to schedule them on different nodes if possible.
The mgrs will still need to be on separate nodes for host networking I believe. Though I'm still interested in removing this property. I would rather set an anti-affinity on the mgr pods when host networking is enabled so Kubernetes won't schedule them together.
I don't know what the behavior would be if there is only a single node the mgr could be scheduled on, however.
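The anti-affinity idea could look something like the following Kubernetes pod spec fragment. This is a sketch, not the PR's code; the `app: rook-ceph-mgr` label value is an assumption for illustration:

```yaml
# sketch: hard anti-affinity so the scheduler won't co-locate two mgr pods
# on one node (useful when host networking forces one mgr per host)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: rook-ceph-mgr   # assumed mgr pod label
      topologyKey: kubernetes.io/hostname
```

Note that with `requiredDuringScheduling...`, the second mgr pod would stay Pending on a single-node cluster; a `preferredDuringScheduling...` rule would instead allow co-location when no other node exists, which trades strictness for schedulability.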
@@ -72,6 +72,9 @@ type ClusterSpec struct {
	// A spec for mon related options
	Mon MonSpec `json:"mon"`

	// A spec for mgr related options
	Mgr MgrSpec `json:"mgr"`
Do you see any updates to the generated code if you run `make codegen`?
@@ -106,6 +110,11 @@ func (c *Cluster) Start() error {
		return fmt.Errorf("%v", err)
	}

	// fail if we were instructed to deploy more than one mgr on the same machine with host networking
	if c.HostNetwork && c.allowMultiplePerNode && c.Replicas > 1 {
		return fmt.Errorf("refusing to deploy %d managers on the same host since hostNetwork is %v and allowMultiplePerNode is %v. Only one manager per node is allowed", c.Replicas, c.HostNetwork, c.allowMultiplePerNode)
We should also check the number of nodes available for the mgr (those that match the placement policy). If there are actually multiple nodes available, it will be fine.
Yes, you are right. When we set the mgr count with host networking enabled, we must check the number of nodes available for the mgr.
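A pre-flight check along these lines could combine both conditions: with host networking, each mgr needs its own node, so the count of nodes matching the placement policy bounds the replica count. A hedged sketch; the function and parameter names are assumptions, not the PR's actual code:

```go
package main

import "fmt"

// checkMgrPlacement is a hypothetical pre-flight check: with host networking
// enabled, two mgrs cannot share a node, so the number of nodes matching the
// placement policy must be at least the requested replica count.
func checkMgrPlacement(hostNetwork bool, availableNodes, replicas int) error {
	if hostNetwork && replicas > availableNodes {
		return fmt.Errorf("cannot place %d mgrs with host networking on %d available node(s)", replicas, availableNodes)
	}
	return nil
}
```

Without host networking the check passes regardless, since multiple mgr pods can share a node's network namespace-free pod networking.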
	if c.HostNetwork && c.allowMultiplePerNode && c.Replicas > 1 {
		return fmt.Errorf("refusing to deploy %d managers on the same host since hostNetwork is %v and allowMultiplePerNode is %v. Only one manager per node is allowed", c.Replicas, c.HostNetwork, c.allowMultiplePerNode)
	}
Could we add a taint to the mgr deployments so they won't deploy on the same node? There isn't anything right now that ensures they are scheduled independently IIRC.
@@ -134,6 +137,12 @@ type MonSpec struct {
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
}

type MgrSpec struct {
	Count int `json:"count"`
@liewegas or @leseb, are there situations where (1) more than 2 mgrs makes sense or (2) a single mgr is preferable to having two mgrs?
If more than two mgrs doesn't make sense, and if a single mgr only makes sense for single-node clusters with host networking enabled, I think it would be more user-friendly to always create two and not make users configure this in the CRD.
Running multiple mgrs via kube is tricky because of the routing of services to the active mgr. Ceph does its own internal selection of which mgr is the master, and the non-master will (usually) do HTTP redirects for things like the dashboard to the active mgr. This doesn't mesh at all with the private/public addrs and the Service machinery that Kubernetes provides. I think before we can do this we need to do something like
...but I'm not sure that will actually result in a net improvement in availability (e.g., latency after a mgr crash), since it leaves Rook polling Ceph for mgr changes and updating the Service definition. Is that really faster than creating a new mgr pod and letting a replacement mgr start up?
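The polling approach being debated here can be sketched as a small reconcile step: Rook periodically asks Ceph which mgr daemon is active and retargets the Service selector when it changes. All names below are assumptions for illustration (with in-memory fakes standing in for the Ceph status query and the Kubernetes Service update), not Rook's actual code:

```go
package main

import "fmt"

// cephStatus abstracts "ask Ceph which mgr is active" (e.g. via `ceph mgr stat`).
type cephStatus interface {
	ActiveMgr() (string, error) // returns the active mgr ID, e.g. "a" or "b"
}

// serviceUpdater abstracts "point the mgr Service at one daemon's pod".
type serviceUpdater interface {
	SetSelector(mgrID string) error
}

// reconcileActiveMgr returns the currently active mgr ID, retargeting the
// Service selector if the active daemon changed since the last poll.
func reconcileActiveMgr(ceph cephStatus, svc serviceUpdater, lastActive string) (string, error) {
	active, err := ceph.ActiveMgr()
	if err != nil {
		return lastActive, fmt.Errorf("failed to query active mgr: %v", err)
	}
	if active != lastActive {
		if err := svc.SetSelector(active); err != nil {
			return lastActive, err
		}
	}
	return active, nil
}

// in-memory fakes for illustration
type fakeCeph struct{ active string }

func (f fakeCeph) ActiveMgr() (string, error) { return f.active, nil }

type fakeSvc struct{ selector string }

func (f *fakeSvc) SetSelector(id string) error { f.selector = id; return nil }
```

The availability concern raised above applies directly: failover latency is bounded by the polling interval plus the Service update, which may not beat simply letting a standby mgr take over.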
Relates to http://tracker.ceph.com/issues/24662
Can I set the mgr Replicas to 2 directly for HA?
See ceph/ceph#29088
It looks like we are still uncertain of the benefit of having more than one mgr at this point. The PR hasn't gotten much attention, so I believe we should close it for now and resume once a decision has been made. Thanks for your understanding @wangxiao86
Based on PR #3028, this configures the mgr in the CephCluster spec.
Description of your changes:
Based on PR #3028, this adds the configuration of the mgr to the CephCluster spec. The user can then set the number of mgr deployments in the CephCluster spec.
Which issue is resolved by this Pull Request:
Resolves #
Checklist:
- Code generation (`make codegen`) has been run to update object specifications, if necessary.