Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: indexer cache error when default evictor is re-initialized #1452

Merged
merged 1 commit into from
Jul 6, 2024

Conversation

a7i
Copy link
Contributor

@a7i a7i commented Jun 28, 2024

It appears that DefaultEvictor is re-initialized, which causes a conflict issue when adding the same indexer again.

closes #1440

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 28, 2024
Signed-off-by: Amir Alavi <amiralavi7@gmail.com>
@a7i
Copy link
Contributor Author

a7i commented Jun 28, 2024

/cc @ingvagabund

any ideas on how to make this better? to be honest, I'm not sure why DefaultEvictor is getting re-initialized. But perhaps this is possible in case of multiple profiles

@ingvagabund ingvagabund self-assigned this Jul 1, 2024
@ingvagabund
Copy link
Contributor

I'm not sure why DefaultEvictor is getting re-initialized. But perhaps this is possible in case of multiple profiles

Yeah. Each profile gets initialized separately. The same shared informer, a new instance of the default evictor.

@ingvagabund
Copy link
Contributor

Some thoughts:

  • what if two plugins register the same indexer? We should prefix each indexer with a plugin name (some time in the future)
  • should we register generic indexer so the default plugins do not need to register it on their own? This will not resolve the current issue. Yet, it will get rid of the error
  • should we create a separate shared informer for each profile? We don't limit the number of profiles. We probably should, e.g. to 100 or 1000 and extend later when needed?

@ingvagabund
Copy link
Contributor

ingvagabund commented Jul 1, 2024

The current implementation creates a new profile in each descheduling cycle. Yet, the created indexer never gets removed from the shared informer. Making every other descheduling cycle to error when minReplicas is set to a non-zero number.

@ingvagabund
Copy link
Contributor

ingvagabund commented Jul 1, 2024

Worth checking if the code can be cleanly refactored to initialize each plugin only once per the whole descheduler run. The primary motivation for re-initializing each plugin was to start clean. If we can guarantee no plugin will keep a state then this will be a nice improvement to the code base.

@a7i
Copy link
Contributor Author

a7i commented Jul 1, 2024

what if two plugins register the same indexer? We should prefix each indexer with a plugin name (some time in the future)

Why is that necessarily? This is no plugin logic inside the shared informer. And ideally we share the shared informer.

should we register generic indexer so the default plugins do not need to register it on their own? This will not resolve the current issue. Yet, it will get rid of the error

we certainly could, but we then need to compare it against the current implementation. I personally prefer this PR but open to feedback from others. For example, no need to initialize the shared informer for policies that do not need minReplicas check.

should we create a separate shared informer for each profile? We don't limit the number of profiles. We probably should, e.g. to 100 or 1000 and extend later when needed?

is there a reason for this? ideally we share it to reduce memory footprint.

@a7i
Copy link
Contributor Author

a7i commented Jul 1, 2024

Worth checking if the code can be cleanly refactored to initialize each plugin only once per the whole descheduler run. The primary motivation for re-initializing each plugin was to start clean. If we can guarantee no plugin will keep a state then this will be a nice improvement to the code base.

I think the issue is that the params/config for each can be different. Correct?

@ingvagabund
Copy link
Contributor

ingvagabund commented Jul 1, 2024

We could inject the params/config as "data" (alongside nodes and pods). Assuming the args do not interfere with a plugin initialization. Though, that will change the plugin API.

@ingvagabund
Copy link
Contributor

I think the issue is that the params/config for each can be different. Correct?

You are right. That was another motivation for it.

@ingvagabund
Copy link
Contributor

Note: We share a single shared informer through all the plugins. As long as the plugins are part of this repository we have a full control over how each indexer gets created. We don't have such control over custom plugins. Thus, we need to rely on good citizenship principles and assume anyone assembling their own deschedulers from the descheduling framework will take the extra step of making sure all the extra plugins create their indexes properly. Plugin owners can already define as many indexers as they like without following any good practices. The same holds for updating items in either of the informer caches without making a deep copy.

@ingvagabund
Copy link
Contributor

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 6, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ingvagabund

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 6, 2024
@k8s-ci-robot k8s-ci-robot merged commit 7657345 into kubernetes-sigs:master Jul 6, 2024
10 checks passed
@a7i a7i deleted the defaultevictor-reinit branch July 28, 2024 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to create a profile" err="unable to build DefaultEvictor plugin
3 participants