Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Skip reconcile if override configmap is unchanged #13652

Merged
merged 1 commit into from Feb 5, 2024

Conversation

travisn
Copy link
Member

@travisn travisn commented Jan 30, 2024

If the configmap rook-config-override is empty, there is no need to trigger the reconcile to update the ceph daemons. This configmap update is causing unnecessary reconciles periodically in some clusters even when it is empty.

Issue resolved by this Pull Request:
Resolves #13478

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

// Only reconcile on rook-config-override CM changes
isCMTConfigOverride := isCMTConfigOverride(e.ObjectNew)
// Only reconcile on rook-config-override CM changes if the configmap is not empty
isCMTConfigOverride, configmapIsEmpty := isCMTConfigOverride(e.ObjectNew)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this. What if a user has a non-empty CM and then deletes the content? In that case, wouldn't we still want to reconcile to clear out what was previously set?

Copy link
Member

@BlaineEXE BlaineEXE Jan 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented this on the issue as well, buthere is a little bit more to consider as well:

... this is a feature of Controller Runtime. It sets all resources to re-reconcile every 10 hours plus/minus a jitter factor. Reconciling periodically means that an operator can make sure that the current state of the system doesn't drift over time when there are no user changes to configs.

I don't think we have seen issues where users complain of drift over time with Rook, which is good. I think it's safe to make this change. But this may be an important note for the future if users ever start seeing system drift over time in otherwise stable clusters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this. What if a user has a non-empty CM and then deletes the content? In that case, wouldn't we still want to reconcile to clear out what was previously set?

In that case the workaround would be to restart the operator. This seems like enough of a corner case to justify the change if it helps the common case skip the unnecessary reconcile.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't think we want to restart the dashboard unnecessarily, and this is definitely an improvement to help that. It seemed like relevant info to note what is causing the reconcile.

I found a thread today where a controller-runtime main dev offered the opinion that the resync risks masking bugs, so not resyncing without user input is actually a good thing.

The problem was that nobody actually understood what this knob did -- it doesn't relist from the API server (which is what most people thought it did). All it does is force a re-reconcile of everything in the cache. Generally, you probably don't want to do this -- it's often expensive, it'll mask subtle bugs, and it's usually not necessary. It's actually probably fine to set this to never, except it might save you in production if you've got bugs in your reconcile logic because you didn't write a level-based controller.

(kubernetes-sigs/controller-runtime#521 (comment))

@@ -747,15 +750,23 @@ func isDeployment(obj runtime.Object, appName string) bool {
return false
}

func isCMTConfigOverride(obj runtime.Object) bool {
func isCMTConfigOverride(obj runtime.Object) (bool, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about passing both the new and existing object to this function and compare the data part of the config map?

If data is same, then don't reconcile, else reconcile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'll make that change to help catch the case of it being changed to empty. Not sure it's worth a deep comparison of the map, but will take a look.

Copy link
Member Author

@travisn travisn Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, using reflect.DeepEqual() keeps it very simple and gives a full comparison

If the configmap rook-config-override is empty,
there is no need to trigger the reconcile to update
the ceph daemons. This configmap update is causing
unnecessary reconciles periodically in some clusters
even when it is empty.

Signed-off-by: travisn <tnielsen@redhat.com>
@travisn travisn requested a review from sp98 February 2, 2024 17:49
@sp98 sp98 merged commit f5fcf96 into rook:master Feb 5, 2024
49 of 50 checks passed
@travisn travisn deleted the cm-skip-reconcile branch February 5, 2024 14:36
@travisn travisn changed the title core: Skip reconcile if override configmap is empty core: Skip reconcile if override configmap is unchanged Feb 5, 2024
travisn added a commit that referenced this pull request Feb 6, 2024
core: Skip reconcile if override configmap is empty (backport #13652)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ceph cluster reconciling triggers every 9h10m
4 participants