
Support evict annotation for namespaces #176

Closed
damemi opened this issue Mar 25, 2021 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

damemi commented Mar 25, 2021

The operator currently auto-excludes all namespaces with openshift-* or kube-* prefixes from eviction. This makes sense to prevent users from breaking their cluster with the Descheduler, and those are reserved prefixes so users should not be able to create their own namespaces that match the pattern.

However, it may be useful for administrators and support to be able to include certain system namespaces for rebalancing (for example, during and after upgrades). Perhaps we could add a check for the same descheduler.alpha.kubernetes.io/evict annotation on namespaces before assuming they should be excluded. Pods within such a namespace would still be subject to the same eviction rules.

cc @ingvagabund wdyt?
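A minimal sketch of the check proposed above, assuming hypothetical helper and constant names (the operator's actual exclusion code may be structured differently): a system namespace stays excluded unless an administrator explicitly opts it in with the evict annotation.

```go
package main

import (
	"fmt"
	"strings"
)

// evictAnnotation is the annotation discussed above; setting it to "true"
// on a namespace would opt that namespace back in to descheduler eviction.
const evictAnnotation = "descheduler.alpha.kubernetes.io/evict"

// excludeNamespace is a hypothetical sketch of the proposed logic:
// namespaces with reserved openshift-* or kube-* prefixes remain
// auto-excluded unless the evict annotation is set to "true".
func excludeNamespace(name string, annotations map[string]string) bool {
	isSystem := strings.HasPrefix(name, "openshift-") || strings.HasPrefix(name, "kube-")
	if !isSystem {
		return false // user namespaces are never auto-excluded
	}
	return annotations[evictAnnotation] != "true"
}

func main() {
	// System namespace without the annotation: still excluded.
	fmt.Println(excludeNamespace("openshift-monitoring", nil)) // true
	// System namespace explicitly opted in: no longer excluded.
	fmt.Println(excludeNamespace("openshift-monitoring",
		map[string]string{evictAnnotation: "true"})) // false
	// User namespace: not subject to the prefix exclusion at all.
	fmt.Println(excludeNamespace("my-app", nil)) // false
}
```

Pods in an opted-in namespace would still go through the normal eviction checks; this sketch only covers the namespace-level gate.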

ingvagabund (Member) commented Apr 9, 2021

We might also introduce a new profile that does this right before/after the upgrade, if this is the only use case; make it part of the upgrade itself (pre/post-upgrade steps).

ravitri (Member) commented Apr 20, 2021

@damemi, @ingvagabund - In addition to upgrades, the other scenarios I can think of that could disrupt existing core workloads are node/machine replacement, addition, and removal. For now, worker and infra nodes are supported as part of this, but in the future it will be extended to the control plane too (subject to change). Another scenario is a MachineConfig change causing a rolling reboot of the nodes in the respective MachineConfigPool (a triggered change that is not an upgrade).

Considering the scenarios above, I think we might need to stretch the openshift-* namespace inclusion criteria further. WDYT?

wking (Member) commented May 21, 2021

Descheduler is very polite by using the eviction API. We have efforts like openshift/origin#26160 underway to improve our PDB coverage. If folks using the eviction API can cause excessive disruption in the OpenShift core, that sounds like it's really a missing/misconfigured PDB situation to me. I expect we have some bugs like that today. Hopefully openshift/origin#26160 turns them up, and we get them fixed. Once we get them fixed, can we pivot to having the descheduler cover the kube-system and openshift-* namespaces by default? Because "don't ask about evicting us, we don't handle that well" doesn't seem like a good long-term plan.

openshift-bot (Contributor) commented Aug 20, 2021

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 20, 2021
openshift-bot (Contributor) commented Sep 19, 2021

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 19, 2021
openshift-bot (Contributor) commented Oct 19, 2021

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this as completed Oct 19, 2021
openshift-ci bot (Contributor) commented Oct 19, 2021

@openshift-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Development

No branches or pull requests

5 participants