ceph: Set the location on the mon daemon for stretch clusters #7535
until testing works
This pull request has merge conflicts that must be resolved before it can be merged. @travisn please rebase it. https://rook.io/docs/rook/master/development-flow.html#updating-your-fork
c88623a to 21b9fe2
These changes are working when testing against the latest ceph master tag: After we get a new Pacific release with that merged, we can get this PR merged.
8eda3eb to 63e0432
This is confirmed working with
63e0432 to 1355ba5
    if monConfig.Zone != "" {
        desiredLocation := fmt.Sprintf("%s=%s", c.stretchFailureDomainName(), monConfig.Zone)
        container.Args = append(container.Args, []string{"--set-crush-location", desiredLocation}...)
        if monConfig.Zone == c.getArbiterZone() {
            // remember the arbiter mon to be set later in the reconcile after the OSDs are configured
            c.arbiterMon = monConfig.DaemonName
        }
    }
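The hunk above can be tried in isolation with a minimal sketch like the one below. It assumes a simplified `monConfig` struct and a plain `failureDomain` string in place of `c.stretchFailureDomainName()`; both are hypothetical stand-ins rather than Rook's actual types, and the `--id` argument is only there so the slice is not empty.

```go
package main

import "fmt"

// monConfig is a hypothetical stand-in for Rook's mon configuration;
// only the fields this sketch needs are included.
type monConfig struct {
	DaemonName string
	Zone       string
}

// crushLocationArgs returns the extra CLI args for a mon that has a zone.
// failureDomain stands in for c.stretchFailureDomainName() in the diff.
// A mon without a zone gets no extra args.
func crushLocationArgs(failureDomain string, mon monConfig) []string {
	if mon.Zone == "" {
		return nil
	}
	location := fmt.Sprintf("%s=%s", failureDomain, mon.Zone)
	return []string{"--set-crush-location", location}
}

func main() {
	mon := monConfig{DaemonName: "a", Zone: "us-east-1a"}
	// Illustrative starting args; the real mon container has more.
	args := []string{"--id", mon.DaemonName}
	args = append(args, crushLocationArgs("zone", mon)...)
	fmt.Println(args)
}
```

Because the location is part of the daemon's arguments, a replacement mon created during failover starts with its location already set, which is the behavior the PR description is after.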
Do we need to do a check and enforce that the user is using v16.2.5 or higher here? That might be good.
Actually I'll add a note in the pending release notes that stretch requires 16.2.5 or higher. I can't add a code check because it will block downstream from running on a custom build of nautilus for stretch clusters.
We could do the check but have an env var option that would disable the check for downstream. I think providing clear feedback for upstream users who try to use the wrong version is a good experience for them.
Stretch clusters are a much less common scenario where I haven't seen upstream feedback, so I think we'll be fine upstream without a check.
I think it's acceptable to have different checks for upstream and downstream. I agree stretch mode is a downstream focus, but I don't want downstream to drive upstream code, especially if the check brings clarity to users. Also, it's now considered stable. We already do this for rbd and cephfs mirror, which caused some small issues downstream; we just need to track this so that during the resync we change the version for downstream Nautilus.
Agreed downstream shouldn't drive the upstream since upstream is first. I'll go ahead and add the check, then we can remove it downstream as needed.
Added as a new commit
1355ba5 to 37afa29
The mon daemon in a stretch cluster now can have its location set as a CLI param instead of setting it with a separate command. This enables mon failover to set the location of a mon immediately when it is joining quorum instead of having a delayed command to set the location. Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
Show an example of configuring a stretch cluster in AWS. Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
37afa29 to bd316a0
nit
pkg/operator/ceph/cluster/cluster.go
Outdated
@@ -240,6 +240,12 @@ func (c *ClusterController) configureLocalCephCluster(cluster *cluster) error {
        return errors.Wrap(err, "failed the ceph version check")
    }

    if cluster.Spec.IsStretchCluster() {
        if !cephVersion.IsAtLeast(cephver.CephVersion{Major: 16, Minor: 2, Build: 5}) {
            return fmt.Errorf("stretch clusters minimum ceph version is v16.2.5, but is running %s", cephVersion.String())
use errors.Errorf()
The stretch clusters pass a new parameter to the mon daemons which is only available in v16.2.5 and newer. Older versions of Ceph will fail to run with stretch clusters in rook. Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
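To make the minimum-version rule concrete, here is a small stand-alone sketch of a major/minor/build comparison. The `version` type and `isAtLeast` method are local stand-ins written for this example; Rook's own `cephver.CephVersion.IsAtLeast` is what the diff calls, and its details may differ. The sketch uses `fmt.Errorf` only to stay within the standard library, whereas the review asks for `errors.Errorf`.

```go
package main

import "fmt"

// version is a hypothetical stand-in for cephver.CephVersion.
type version struct {
	Major, Minor, Build int
}

func (v version) String() string {
	return fmt.Sprintf("%d.%d.%d", v.Major, v.Minor, v.Build)
}

// isAtLeast compares major, then minor, then build.
func (v version) isAtLeast(min version) bool {
	if v.Major != min.Major {
		return v.Major > min.Major
	}
	if v.Minor != min.Minor {
		return v.Minor > min.Minor
	}
	return v.Build >= min.Build
}

// stretchMinimum is the minimum version named in the commit message.
var stretchMinimum = version{Major: 16, Minor: 2, Build: 5}

// checkStretchVersion mirrors the shape of the check in the diff.
func checkStretchVersion(running version) error {
	if !running.isAtLeast(stretchMinimum) {
		return fmt.Errorf("stretch clusters minimum ceph version is v%s, but is running %s", stretchMinimum, running)
	}
	return nil
}

func main() {
	for _, v := range []version{{15, 2, 13}, {16, 2, 4}, {16, 2, 5}, {17, 0, 0}} {
		fmt.Printf("%s: %v\n", v, checkStretchVersion(v))
	}
}
```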
bd316a0 to 3d629ea
Description of your changes:
The mon daemon in a stretch cluster now can have its location set as a CLI param instead of setting it with a separate command. This enables mon failover to set the location of a mon immediately when it is joining quorum instead of having a delayed command to set the location.
The mon is not yet joining quorum with these changes, still testing
Checklist:
make codegen has been run to update object specifications, if necessary.