
Allow configuration of two Ceph mgr daemons #3076

Closed

Conversation

wangxiao86

Based on PR #3028, configure mgr in the CephCluster spec.

Description of your changes:
Based on PR #3028, this adds mgr configuration to the CephCluster spec, so the user can set the number of mgr deployments.

Which issue is resolved by this Pull Request:
Resolves #

Checklist:

  • Documentation has been updated, if necessary.
  • Pending release notes updated with breaking and/or notable changes, if necessary.
  • Upgrade from previous release is tested and upgrade user guide is updated, if necessary.
  • Code generation (make codegen) has been run to update object specifications, if necessary.
  • Comments have been added or updated based on the standards set in CONTRIBUTING.md

Signed-off-by: wangxiao86 <wangxiao1@sensetime.com>

@travisn travisn left a comment


A few overall suggestions for the PR:

  • Could you reference #1048 (Run two MGRs to have one in standby mode) as the issue to be resolved by the PR? I just reopened this issue.
  • In the PR subject and commit messages please describe the issue. There is no need to reference issue numbers or other PRs in the subject or commit message. The body of the PR is ok for that.
  • We will need this feature added to PendingReleaseNotes.txt and the documentation in the ceph-cluster-crd.yaml.
  • Can you add notes to the PR message about the testing you've performed? There are comments in #1048 about past challenges with multiple mgrs.

Thanks!

@@ -32,6 +32,10 @@ spec:
  mon:
    count: 3
    allowMultiplePerNode: false
  # set the number of mgrs to be started; the maximum mgr count is 2
Member

We can enforce a max with schema validation similar to the mons. See this example

@@ -134,6 +137,12 @@ type MonSpec struct {
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
}

type MgrSpec struct {
	Count          int `json:"count"`
	PreferredCount int `json:"preferredCount"`
Member

We don't need PreferredCount. That's specific to the mons.

type MgrSpec struct {
	Count                int  `json:"count"`
	PreferredCount       int  `json:"preferredCount"`
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
Member

I wonder if we could just not add this property. It isn't as critical for the mgrs as for the mons to run on separate nodes, but the operator should certainly attempt to schedule them on different nodes if possible.

Member

The mgrs will still need to be on separate nodes for host networking I believe. Though I'm still interested in removing this property. I would rather set an anti-affinity on the mgr pods when host networking is enabled so Kubernetes won't schedule them together.

I don't know what the behavior would be if there is only a single node the mgr could be scheduled on however.

@@ -72,6 +72,9 @@ type ClusterSpec struct {
	// A spec for mon related options
	Mon MonSpec `json:"mon"`

	// A spec for mgr related options
	Mgr MgrSpec `json:"mgr"`
Member

Do you see any updates to the generated code if you run make codegen?

@@ -106,6 +110,11 @@ func (c *Cluster) Start() error {
		return fmt.Errorf("%v", err)
	}

	// fail if we were instructed to deploy more than one mgr on the same machine with host networking
	if c.HostNetwork && c.allowMultiplePerNode && c.Replicas > 1 {
		return fmt.Errorf("refusing to deploy %d managers on the same host since hostNetwork is %v and allowMultiplePerNode is %v. Only one manager per node is allowed", c.Replicas, c.HostNetwork, c.allowMultiplePerNode)
Member

We should also check the number of nodes available for the mgr (those that match the placement policy). If there are actually multiple nodes available, it will be fine.

Author

Yes, you are right. When we set the mgr count with host networking, we must check the number of nodes available for the mgr.

Member

Could we add a taint to the mgr deployments so they won't deploy on the same node? There isn't anything right now that ensures they are scheduled independently IIRC.

@travisn changed the title from "base on this PR: https://github.com/rook/rook/pull/3028 , configure m…" to "Allow configuration of two Ceph mgr daemons" Apr 29, 2019

@@ -134,6 +137,12 @@ type MonSpec struct {
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
}

type MgrSpec struct {
	Count int `json:"count"`
Member

@liewegas or @leseb, are there situations where (1) more than 2 mgrs makes sense or (2) a single mgr is preferable to having two mgrs?

If more than two mgrs doesn't make sense, and if a single mgr only makes sense for single-node clusters with host networking enabled, I think it would be more user-friendly to always create two and not make users configure this in the CRD.

@liewegas
Member

Running multiple mgrs via kube is tricky because of the routing of services to the active mgr. Ceph does its own internal selection of which mgr is the master, and the non-master will (usually) issue HTTP redirects for things like the dashboard to the active mgr. This doesn't mesh at all with the private/public addrs and the Service stuff that kubernetes does.

I think before we can do this we need to do something like

  • disable the redirect to master behavior
  • make rook communicate the mgr master to kubernetes and update the Service definitions to only send connections to the master mgr

...but I'm not sure that will actually result in a net improvement in availability (e.g., latency after a mgr crash), since it leaves rook polling ceph for mgr changes and updating the service definition. Is that really faster than creating a new mgr pod and letting a replacement mgr start up?

@sebastian-philipp
Member

Relates to http://tracker.ceph.com/issues/24662

@wangxiao86
Author

Can I set the mgr Replicas to 2 directly for HA?

@sebastian-philipp
Member

> I think before we can do this we need to do something like
>
>   • disable the redirect to master behavior

See ceph/ceph#29088

>   • make rook communicate the mgr master to kubernetes and update the Service definitions to only send connections to the master mgr

@galexrt added the ceph and ceph-mgr labels Sep 23, 2019
@leseb
Member

leseb commented Sep 27, 2019

It looks like we are still uncertain about the benefit of having more than one mgr at this point. The PR hasn't received much attention, so I believe we should close it for now and resume once a decision has been made. Thanks for your understanding @wangxiao86

@leseb leseb closed this Sep 27, 2019