Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[node-agent] Restrict Node watches via label/field selector to prevent watching all nodes #9672

Merged
merged 10 commits into from
Apr 29, 2024

Conversation

rfranzke
Copy link
Member

How to categorize this PR?

/area cost scalability
/kind enhancement

What this PR does / why we need it:
Today, each gardener-node-agent watches all Nodes which is quite costly in terms of network I/O. Actually, each of them has two watches for Nodes: one with the full object, and one with metadata-only.

This is because of #8885. In this PR, we added a linear mapping approach for reconciliation delay (to prevent too many kubelet restarts at the same time, e.g. when a shoot changes its patch Kubernetes version). Back then, we were very keen on using metadata-only for this which made the approach acceptable.
However, #8786 was developed in parallel, and in this PR, the full Node object was required:

node := &corev1.Node{}
if err := r.Client.Get(ctx, request.NamespacedName, node); err != nil {

This invalidated the approach of listing all nodes in #8885 since it's simply too costly.

With this PR, we introduce a new controller in gardener-resource-manager which still follows the linear mapping approach for computing the reconciliation delays for nodes. It adds its computation result to the node annotations with key node-agent.gardener.cloud/reconciliation-delay. gardener-node-agent can now simply read this value and use it for its reconciliation delay. This allows us to restrict its WATCH to only only the Node it is responsible for (with the help of label/field selectors).

In addition, I removed the metadata-only usage of Node objects in gardener-node-agent to prevent having two watches for Nodes as explained above. Since at least one controller requires the full object, we can always work with the full object only.

Which issue(s) this PR fixes:
Follow-up of #8885 and #8786
Related to #8023

Special notes for your reviewer:
/cc @oliver-goetz

Release note:

`gardener-node-agent` no longer watches all `Node`s in the cluster but restricts to only the `Node` it is responsible for (with the help of label/field selectors). This should lead to a significant reduction of network I/O, especially for shoot clusters with many nodes.

@gardener-prow gardener-prow bot added area/cost Cost related area/scalability Scalability related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 25, 2024
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 26, 2024
@oliver-goetz
Copy link
Member

/assign

Copy link
Member

@oliver-goetz oliver-goetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice PR, thanks 🚀

pkg/resourcemanager/apis/config/types.go Show resolved Hide resolved
Copy link
Member

@oliver-goetz oliver-goetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 26, 2024
Copy link
Contributor

gardener-prow bot commented Apr 26, 2024

LGTM label has been added.

Git tree hash: 0362278f7a0f456397faf8fd6c81bfd5ef4bfebc

Copy link
Contributor

gardener-prow bot commented Apr 26, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: oliver-goetz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gardener-prow gardener-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 26, 2024
…/resourcemanager/controller/node/criticalcomponents`

We will introduce another controller for nodes, so let's prepare the folder structure
The health check controller in `node-agent` needs the full `Node` object anyways: https://github.com/gardener/gardener/blob/2cdfaa5545f3fc07e93997d6bb52fafdcc57d4ef/pkg/nodeagent/controller/healthcheck/reconciler.go#L45-L46
Hence, we can always work with the full object. Currently/without this, `node-agent` starts two watches for `Node`s: one with the full object, another one with metadata-only.
When the node name is known, we can use a field selector for its name. Otherwise, we use a label selector for the hostname.
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2024
@gardener-prow gardener-prow bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 29, 2024
@oliver-goetz
Copy link
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label Apr 29, 2024
Copy link
Contributor

gardener-prow bot commented Apr 29, 2024

LGTM label has been added.

Git tree hash: d1fb9e1fb3dea815fb373db25e2c60115f834590

@gardener-prow gardener-prow bot merged commit f82a1e6 into gardener:master Apr 29, 2024
17 checks passed
@rfranzke rfranzke deleted the gna-node-watch branch April 29, 2024 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/cost Cost related area/scalability Scalability related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. kind/enhancement Enhancement, improvement, extension lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants