-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pool: Update the desired failure domain when changed #9214
Conversation
pkg/daemon/ceph/client/pool.go
Outdated
return errors.Wrapf(err, "failed to get pool %q details", poolName) | ||
} | ||
|
||
if details.FailureDomain == pool.FailureDomain { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the pool has an existing rule and a new one is applied? Should we remove the old rule? Or at least print a message that the previous rule might need to be manually removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about removing the old rule, but decided not to in case some other pool was using the same rule. Are there any side effects of old rules staying in the crush map? I'll at least log it though.
558c8d4
to
5ea6677
Compare
pkg/daemon/ceph/client/crush_rule.go
Outdated
|
||
err = json.Unmarshal(buf, &rule) | ||
if err != nil { | ||
return rule, errors.Wrap(err, "failed to unmarshal crush rule") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Printing buf
would be nice too for debugging.
5ea6677
to
db54338
Compare
pkg/daemon/ceph/client/crush_rule.go
Outdated
|
||
err = json.Unmarshal(buf, &rule) | ||
if err != nil { | ||
return rule, errors.Wrapf(err, "failed to unmarshal crush rule. %v", string(buf)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return rule, errors.Wrapf(err, "failed to unmarshal crush rule. %v", string(buf)) | |
return rule, errors.Wrapf(err, "failed to unmarshal crush rule. %s", string(buf)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed?
pkg/daemon/ceph/client/pool.go
Outdated
func extractFailureDomain(rule ruleSpec) string { | ||
// find the failure domain in the crush rule, which is the first step where the | ||
// "type" property is set | ||
for _, step := range rule.Steps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach better. Can we use sets.NewString()
with Has()
instead for this loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this might or might not work. If we have a 2 steps rule or more we might find the step but it could be in a different rack. Maybe what we had earlier was stricter and we should stick to it, if the rule steps go deeper we should fail. This would mean the rule was not created by Rook and modified. So I think we should go back to using rule.Steps[1]
with a length check.
pkg/daemon/ceph/client/pool.go
Outdated
|
||
_, err := NewCephCommand(context, clusterInfo, args).Run() | ||
if err != nil { | ||
return errors.Wrapf(err, "failed to set crush rule %s", crushRule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return errors.Wrapf(err, "failed to set crush rule %s", crushRule) | |
return errors.Wrapf(err, "failed to set crush rule %q", crushRule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed?
pkg/daemon/ceph/client/pool.go
Outdated
func extractFailureDomain(rule ruleSpec) string { | ||
// find the failure domain in the crush rule, which is the first step where the | ||
// "type" property is set | ||
for _, step := range rule.Steps { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this might or might not work. If we have a 2 steps rule or more we might find the step but it could be in a different rack. Maybe what we had earlier was stricter and we should stick to it, if the rule steps go deeper we should fail. This would mean the rule was not created by Rook and modified. So I think we should go back to using rule.Steps[1]
with a length check.
db54338
to
0f51bb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just small nits if you want to address them :)
pkg/daemon/ceph/client/crush_rule.go
Outdated
|
||
err = json.Unmarshal(buf, &rule) | ||
if err != nil { | ||
return rule, errors.Wrapf(err, "failed to unmarshal crush rule. %v", string(buf)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed?
pkg/daemon/ceph/client/pool.go
Outdated
|
||
_, err := NewCephCommand(context, clusterInfo, args).Run() | ||
if err != nil { | ||
return errors.Wrapf(err, "failed to set crush rule %s", crushRule) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed?
The failure domain is baked into the crush rule that is created for a pool. To allow for an updated failure domain on the pool, create a new crush rule specific for that failure domain and update the pool with the new rule. Signed-off-by: Travis Nielsen <tnielsen@redhat.com>
0f51bb5
to
8e55dd9
Compare
pool: Update the desired failure domain when changed (backport #9214)
Description of your changes:
The failure domain is baked into the crush rule that is created for a pool. To allow for an updated failure domain on the pool, create a new crush rule specific for that failure domain and update the pool with the new rule.
Previously the advanced docs indicated how to update the failure domain. Now the failure domain can simply be updated on the pool CR.
Checklist:
make codegen
) has been run to update object specifications, if necessary.