Allow configuration of two Ceph mgr daemons #3076
Conversation
Signed-off-by: wangxiao86 <wangxiao1@sensetime.com>
A few overall suggestions for the PR:
- Could you reference issue #1048 (Run two MGRs to have one in standby mode) as the issue to be resolved by the PR? I just reopened that issue.
- In the PR subject and commit messages, please describe the issue. There is no need to reference issue numbers or other PRs in the subject or commit message; the body of the PR is fine for that.
- We will need this feature added to PendingReleaseNotes.txt and the documentation in ceph-cluster-crd.yaml.
- Can you add notes to the PR message about the testing you've performed? There are comments in issue #1048 about past challenges with multiple mgrs.
Thanks!
@@ -32,6 +32,10 @@ spec:
  mon:
    count: 3
    allowMultiplePerNode: false
  # set the number of mgrs to be started; the maximum value of mgr.count is 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can enforce a max with schema validation similar to the mons. See this example
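Until that schema validation lands, the same bound can be expressed as a plain validation check in the operator. A minimal sketch, assuming a limit of 1–2 mgrs as discussed in this thread; the function name is an assumption for illustration, not Rook's actual code:

```go
package main

import "fmt"

// validateMgrCount mirrors the bound that CRD schema validation would enforce
// (hypothetical helper; the 1..2 range follows the comment in the spec above).
func validateMgrCount(count int) error {
	if count < 1 || count > 2 {
		return fmt.Errorf("mgr count must be 1 or 2, got %d", count)
	}
	return nil
}
```

Schema validation in the CRD is still preferable, since it rejects an invalid count at `kubectl apply` time rather than during reconciliation.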
@@ -134,6 +137,12 @@ type MonSpec struct {
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
}

type MgrSpec struct {
	Count          int `json:"count"`
	PreferredCount int `json:"preferredCount"`
We don't need `PreferredCount`; that's specific to the mons.
type MgrSpec struct {
	Count                int  `json:"count"`
	PreferredCount       int  `json:"preferredCount"`
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
I wonder if we could just not add this property. It isn't as critical for the mgr as the mons to be running on separate nodes, but the operator should certainly attempt to schedule them on different nodes if possible.
The mgrs will still need to be on separate nodes for host networking I believe. Though I'm still interested in removing this property. I would rather set an anti-affinity on the mgr pods when host networking is enabled so Kubernetes won't schedule them together.
I don't know what the behavior would be if there is only a single node the mgr could be scheduled on, however.
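The anti-affinity idea could look something like the following Kubernetes pod spec fragment. This is a sketch, not the PR's code; the `app: rook-ceph-mgr` label value is an assumption for illustration:

```yaml
# sketch: hard anti-affinity so the scheduler won't co-locate two mgr pods
# on one node (useful when host networking forces one mgr per host)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: rook-ceph-mgr   # assumed mgr pod label
      topologyKey: kubernetes.io/hostname
```

Note that with `requiredDuringScheduling...`, the second mgr pod would stay Pending on a single-node cluster; a `preferredDuringScheduling...` rule would instead allow co-location when no other node exists, which trades strictness for schedulability.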
@@ -72,6 +72,9 @@ type ClusterSpec struct {
	// A spec for mon related options
	Mon MonSpec `json:"mon"`

	// A spec for mgr related options
	Mgr MgrSpec `json:"mgr"`
Do you see any updates to the generated code if you run `make codegen`?
@@ -106,6 +110,11 @@ func (c *Cluster) Start() error {
		return fmt.Errorf("%v", err)
	}

	// fail if we were instructed to deploy more than one mgr on the same machine with host networking
	if c.HostNetwork && c.allowMultiplePerNode && c.Replicas > 1 {
		return fmt.Errorf("refusing to deploy %d managers on the same host since hostNetwork is %v and allowMultiplePerNode is %v. Only one manager per node is allowed", c.Replicas, c.HostNetwork, c.allowMultiplePerNode)
We should also check the number of nodes available for the mgr (those that match the placement policy). If there are actually multiple nodes available, it will be fine.
Yes, you are right. When we set the mgr count with host networking enabled, we must check the number of nodes available for the mgr.
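A pre-flight check along these lines could combine both conditions: with host networking, each mgr needs its own node, so the count of nodes matching the placement policy bounds the replica count. A hedged sketch; the function and parameter names are assumptions, not the PR's actual code:

```go
package main

import "fmt"

// checkMgrPlacement is a hypothetical pre-flight check: with host networking
// enabled, two mgrs cannot share a node, so the number of nodes matching the
// placement policy must be at least the requested replica count.
func checkMgrPlacement(hostNetwork bool, availableNodes, replicas int) error {
	if hostNetwork && replicas > availableNodes {
		return fmt.Errorf("cannot place %d mgrs with host networking on %d available node(s)", replicas, availableNodes)
	}
	return nil
}
```

Without host networking the check passes regardless, since multiple mgr pods can share a node's network namespace-free pod networking.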
	if c.HostNetwork && c.allowMultiplePerNode && c.Replicas > 1 {
		return fmt.Errorf("refusing to deploy %d managers on the same host since hostNetwork is %v and allowMultiplePerNode is %v. Only one manager per node is allowed", c.Replicas, c.HostNetwork, c.allowMultiplePerNode)
	}
Could we add a taint to the mgr deployments so they won't deploy on the same node? There isn't anything right now that ensures they are scheduled independently IIRC.
@@ -134,6 +137,12 @@ type MonSpec struct {
	AllowMultiplePerNode bool `json:"allowMultiplePerNode"`
}

type MgrSpec struct {
	Count int `json:"count"`
@liewegas or @leseb, are there situations where (1) more than 2 mgrs makes sense or (2) a single mgr is preferable to having two mgrs?
If more than two mgrs doesn't make sense, and if a single mgr only makes sense for single-node clusters with host networking enabled, I think it would be more user-friendly to always create two and not make users configure this in the CRD.
Running multiple mgrs via kube is tricky because of the routing of services to the active mgr. Ceph does its own internal selection of which mgr is the master, and the non-master will (usually) do HTTP redirects for things like the dashboard to the active mgr. This doesn't mesh at all with the private/public addrs and the Service machinery that Kubernetes provides. I think before we can do this we need to do something like
...but I'm not sure that will actually result in a net improvement in availability (e.g., latency after a mgr crash), since it leaves Rook polling Ceph for mgr changes and updating the Service definition. Is that really faster than creating a new mgr pod and letting a replacement mgr start up?
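The polling approach being debated here can be sketched as a small reconcile step: Rook periodically asks Ceph which mgr daemon is active and retargets the Service selector when it changes. All names below are assumptions for illustration (with in-memory fakes standing in for the Ceph status query and the Kubernetes Service update), not Rook's actual code:

```go
package main

import "fmt"

// cephStatus abstracts "ask Ceph which mgr is active" (e.g. via `ceph mgr stat`).
type cephStatus interface {
	ActiveMgr() (string, error) // returns the active mgr ID, e.g. "a" or "b"
}

// serviceUpdater abstracts "point the mgr Service at one daemon's pod".
type serviceUpdater interface {
	SetSelector(mgrID string) error
}

// reconcileActiveMgr returns the currently active mgr ID, retargeting the
// Service selector if the active daemon changed since the last poll.
func reconcileActiveMgr(ceph cephStatus, svc serviceUpdater, lastActive string) (string, error) {
	active, err := ceph.ActiveMgr()
	if err != nil {
		return lastActive, fmt.Errorf("failed to query active mgr: %v", err)
	}
	if active != lastActive {
		if err := svc.SetSelector(active); err != nil {
			return lastActive, err
		}
	}
	return active, nil
}

// in-memory fakes for illustration
type fakeCeph struct{ active string }

func (f fakeCeph) ActiveMgr() (string, error) { return f.active, nil }

type fakeSvc struct{ selector string }

func (f *fakeSvc) SetSelector(id string) error { f.selector = id; return nil }
```

The availability concern raised above applies directly: failover latency is bounded by the polling interval plus the Service update, which may not beat simply letting a standby mgr take over.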
Relates to http://tracker.ceph.com/issues/24662
Can I set the mgr Replicas to 2 directly for HA?
See ceph/ceph#29088
It looks like we are still uncertain of the benefit of having more than one mgr at this point. The PR hasn't gotten much attention, so I believe we should close it for now and resume once a decision has been made. Thanks for your understanding @wangxiao86
Based on PR #3028, this configures the mgr in the CephCluster spec.
Description of your changes:
Based on PR #3028, this adds the configuration of the mgr to the CephCluster spec. The user can then set the number of mgr deployments in the CephCluster spec.
Which issue is resolved by this Pull Request:
Resolves #
Checklist:
- Code generation (`make codegen`) has been run to update object specifications, if necessary.