fix addon controller reconciling too frequently #13252

Merged: 2 commits merged into kubermatic:main from the fix-addon-reconciles branch on Apr 9, 2024

@xrstf commented Apr 8, 2024

What this PR does / why we need it:
In #4773 we tried to make the addon controller reconcile less frequently. The logic was to only react to Cluster changes if the AddonControllerReconcilingSuccess condition changed. The only controller setting this condition is the addon controller itself (looking back at the original PR, I don't really understand it... but I was young and dumber).

However, this condition on the Cluster object isn't really saying much. If you have 3 addons and one of them fails, it is basically random which status the AddonControllerReconcilingSuccess condition on the Cluster object will have.

Even worse: Suppose all your addons are happy and healthy. Now you change a field like the Cluster Owner or any other field that is part of the addon TemplateData. Even though you changed the Cluster object, no reconciliation will happen because neither the CNI values nor the condition has changed.
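
To make that gap concrete, here is a minimal Go sketch of a condition-only filter like the one described. The `cluster` type, its fields, and `oldFilterTriggers` are hypothetical stand-ins built from the description in this PR, not the actual KKP code.

```go
package main

import "fmt"

// cluster is a hypothetical, heavily simplified stand-in for the KKP Cluster
// object. The real type has many more fields.
type cluster struct {
	Owner           string // part of the addon TemplateData
	CNIVersion      string // stands in for the "CNI values"
	AddonReconciled string // status of the AddonControllerReconcilingSuccess condition
}

// oldFilterTriggers mimics the previous behaviour as described in the text:
// an update only leads to a reconciliation if the CNI values or the
// condition changed. This is an assumption, not the real implementation.
func oldFilterTriggers(oldC, newC cluster) bool {
	return oldC.CNIVersion != newC.CNIVersion ||
		oldC.AddonReconciled != newC.AddonReconciled
}

func main() {
	before := cluster{Owner: "alice", CNIVersion: "v1", AddonReconciled: "True"}

	// Change a field that addons are rendered with, but that the filter ignores.
	after := before
	after.Owner = "bob"

	fmt.Println("owner change triggers reconcile:", oldFilterTriggers(before, after))
}
```

With such a filter, the owner change goes unnoticed until something else happens to trigger a reconciliation.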

Additionally, Addon objects are reconciled even when only their status changes. This isn't much of a problem right now, because there is only 1 Condition and it gets set exactly once and never heartbeats, but if the status is extended (and I have a branch ready for that 😁), then status changes on an addon should also be ignored.

To that effect, this PR adjusts the watches:

  • Clusters are reconciled when the TemplateData we derive from them changes. TemplateData is the only thing that could influence how an addon is rendered. Note that external resources, like the kubeconfig, are not taken into account: We'd need to somehow store the revision of each related object, so we can diff those as well (recursively), and at that point we begin to re-implement the Applications feature. So for addons I chose the simple approach of relying on the auto-resync interval of controller-runtime to reconcile all addons after a number of hours, regardless of state.
  • Addons are only reconciled if their generation changes, i.e. not when just the status changes.
  • I also noticed some diffs where we removed the UID we set ourselves. In Addons, the cluster ref doesn't care about the UID and it should be left blank. I adjusted the code a bit to ensure we ourselves do not set it, saving one more reconciliation per addon.
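
The first two watch rules above can be sketched in plain Go as follows. All types and function names here (`cluster`, `templateData`, `addon`, `clusterUpdateTriggers`, `addonUpdateTriggers`) are hypothetical simplifications for illustration; the real TemplateData has far more fields, and the actual change may be structured differently. The addon check assumes that `metadata.generation` only increases on spec changes, which holds for custom resources when the status subresource is enabled.

```go
package main

import (
	"fmt"
	"reflect"
)

// cluster is a simplified, hypothetical stand-in for the Cluster object.
type cluster struct {
	Name       string
	Owner      string
	CNIVersion string
	Conditions map[string]string // status-like data that is not part of TemplateData
}

// templateData is a hypothetical subset of the data addons are rendered with.
type templateData struct {
	ClusterName string
	Owner       string
	CNIVersion  string
}

// newTemplateData derives the rendering input from a cluster.
func newTemplateData(c cluster) templateData {
	return templateData{ClusterName: c.Name, Owner: c.Owner, CNIVersion: c.CNIVersion}
}

// clusterUpdateTriggers reports whether an update changes the derived
// TemplateData and therefore could change how addons are rendered.
func clusterUpdateTriggers(oldC, newC cluster) bool {
	return !reflect.DeepEqual(newTemplateData(oldC), newTemplateData(newC))
}

// addon is a simplified stand-in; Generation mirrors metadata.generation.
type addon struct {
	Generation int64
	Phase      string // status-only field
}

// addonUpdateTriggers ignores updates that leave the generation untouched,
// which is the case for status-only updates under the stated assumption.
func addonUpdateTriggers(oldA, newA addon) bool {
	return oldA.Generation != newA.Generation
}

func main() {
	before := cluster{Name: "c1", Owner: "alice", CNIVersion: "v1"}

	conditionOnly := before
	conditionOnly.Conditions = map[string]string{"AddonControllerReconcilingSuccess": "False"}

	ownerChanged := before
	ownerChanged.Owner = "bob"

	fmt.Println("condition-only change:", clusterUpdateTriggers(before, conditionOnly))
	fmt.Println("owner change:", clusterUpdateTriggers(before, ownerChanged))
	fmt.Println("addon status-only change:",
		addonUpdateTriggers(addon{Generation: 1, Phase: "New"}, addon{Generation: 1, Phase: "Healthy"}))
}
```

In a controller-runtime based controller, checks like these would typically be wired in as update predicates, for example via `predicate.Funcs` or the built-in `predicate.GenerationChangedPredicate`. Whether this PR uses exactly those helpers is not stated in the description.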

What type of PR is this?
/kind cleanup

Does this PR introduce a user-facing change? Then add your Release Note here:

Addon reconciliation is now triggered more consistently for changes to Cluster objects, reducing the overall number of unnecessary addon reconciliations.

Documentation:

NONE

@kubermatic-bot added the docs/none, release-note, dco-signoff: yes, kind/cleanup, sig/cluster-management, and size/M labels Apr 8, 2024
@xrstf self-assigned this Apr 8, 2024
@xrstf requested a review from embik April 8, 2024 15:01
xrstf commented Apr 9, 2024

/retest

@embik left a comment

/approve

@kubermatic-bot added the lgtm label Apr 9, 2024
@kubermatic-bot

LGTM label has been added.

Git tree hash: 7c0daa622e1bcdf674ef842a5a1661239bc8f163

@kubermatic-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: embik


@kubermatic-bot added the approved label Apr 9, 2024
xrstf commented Apr 9, 2024

/retest

@kubermatic-bot merged commit 4a0ef6f into kubermatic:main Apr 9, 2024
19 checks passed
@kubermatic-bot added this to the KKP 2.26 milestone Apr 9, 2024
@xrstf deleted the fix-addon-reconciles branch April 9, 2024 10:39
xrstf commented Apr 10, 2024

/cherrypick release/v2.25

@kubermatic-bot

@xrstf: new pull request created: #13262

xrstf commented Apr 10, 2024

/cherrypick release/v2.24

@kubermatic-bot

@xrstf: new pull request created: #13263
