Proposal: Long-lived deployment: Triggering Descheduler with Events (SharedIndexInformer rules, endpoints, etc) #696
Comments
Hi @Dentrax, if there's any interest, feel free to read through some of the feedback there and pick this up, as it's been on the back burner for a while. It would be great to get this rolling again. So, if you don't mind, we can close this issue and keep the discussion for this feature in one spot (#489)
This is cool! I didn't notice that. Do you need any help on your PR? I can give a hand with lifting and shifting this feature. You mentioned the motivation here is to make the Descheduler run as scheduled as-is, plus event watching with informers. As a caveat, we can not run as
Hi ~ @damemi and @denkensk
Then we can create one interface for all strategies; each can have a run() method for the default mode and an informed method for informer mode.
Another way to trigger the descheduler would be to use custom endpoints, such as:
Kindly ping @damemi 🤞 (#488 (comment))
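A rough sketch of that single-interface idea, in Go. All names here are hypothetical illustrations, not types from the descheduler codebase: each strategy implements one method for the default (timed) pass and one for reacting to informer events.

```go
package main

import "fmt"

// Strategy is a hypothetical common interface for all strategies:
// Run covers the default/cron mode, OnEvent covers informer mode.
type Strategy interface {
	// Run executes one full descheduling pass (default mode).
	Run() error
	// OnEvent reacts to a single watched object change (informer mode).
	OnEvent(eventType string, obj interface{}) error
}

// removeDuplicates is a toy strategy implementing both modes.
type removeDuplicates struct{}

func (removeDuplicates) Run() error {
	fmt.Println("full descheduling pass")
	return nil
}

func (removeDuplicates) OnEvent(eventType string, obj interface{}) error {
	fmt.Printf("reacting to %s event\n", eventType)
	return nil
}

func main() {
	var s Strategy = removeDuplicates{}
	s.Run()                    // timed mode
	s.OnEvent("NodeDeleted", nil) // informer mode
}
```

The point of the shared interface is that the run loop can treat every strategy uniformly, while each strategy decides which informer events it actually cares about.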
So sorry for the delay... I had actually started working on a response to this and got completely distracted! To answer your questions:
@Dentrax this seems like a fine option for me. I'm not entirely sure why I originally made the run mode design mutually exclusive. I think this may bring up some challenges such as a timed strategy running at the same time as an informed strategy, but this could probably be handled with good multithreading practices.
@JaneLiuL this is the general idea, yeah. And I think that we should definitely initially start with a basic list of informers (like just pod and node, as you mentioned). However, the implementation details are where this gets tricky. Not all strategies will need all the informers, and in the future new strategies might need different informers.
That's kind of how I was originally trying to address this need, by wrapping the existing default mode functions within a new interface. That interface can be defined with the generic informer functions that will be needed by all strategies, but the actual implementation of the interface can vary for each strategy by only calling upon the relevant informers.
@Dentrax this is an interesting idea, but not one that I think we should immediately pursue. I think it is safer to keep the descheduler reactive to objects and events that are known to the API server. Allowing direct requests like this could bypass the need to know the "state" of the cluster and end up with conflicting or unpredictable results. Ultimately I'd like to revisit my implementation, or if there are alternatives/cherry-picks that others would like to work on, please feel free. I think we are converging on a similar basic idea for the design that was started there
I'm concerned about how this would work out when adding multiple nodes. It might be nice to think about implementing a grace period. For example, if I were to add multiple nodes with a few seconds between each, it should wait a while to ensure all nodes are in the equation. An initial delay (e.g. 30s) after each join event is a way to resolve this issue. Queuing them will not cut it for our use case, considering it's common for us to add 40-50 nodes. We could reset the countdown after each node join/update/delete event, and run the descheduler once the delay time of the last joined node has elapsed.
Hey @damemi, thanks for clarifying. One point that I may have missed: I want to pass
At the end of the day, as @necatican pointed out, we expect the descheduler to run each strategy whenever a node is joined/removed/updated, just like a cronjob.
That's a good point. I think a configurable batch-delay would help the informed strategies run more efficiently and effectively for use cases with high-frequency bursts of updates.

Regarding each strategy, I think this will still need to be considered on a case-by-case basis. For simplicity and safety, I would prefer to roll this out in just one or two strategies to start (with the intent of adding more/all later) to give the design soak time. If we do that, then we won't need a policy-level

This would set us up for a more backwards-friendly development pattern (adding new fields is always backward-compatible, but if at some point we decide to change or remove the policy-level
+1, sounds good to me!
So we need to create a new
Some questions that might need some clarification:
PTAL @necatican
Join and leave events are the obvious ones for sure, and we should also consider adding support for status changes and label/taint changes.
This could be a nice feature, but I cannot think of an instance where this is necessary.
How about this: should we consider adding a check to determine the maximum allotted time? What happens if changes keep occurring in a cluster?
Kind ping @damemi. Waiting for your final call, boss, so we can get to it!
Sorry for the delay @Dentrax
Thank you so much for informing us! I'm definitely interested and want to take a look at that proposal. On the other hand, we'd like to send a PoC PR for this proposal if you want. There will surely be needs we haven't covered/proposed here. At least we can continue to work on this idea and get feedback/review after submitting a PR.
/cc
@Dentrax sure thing, it never hurts to send a PoC PR to get some more detailed discussion rolling. Thanks!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/remove-lifecycle stale
Hey @damemi @ingvagabund, although we couldn't contribute to the Descheduler framework migration, we followed what was done closely. Months have passed since the initial proposal, and the internal code base has changed. We're running Deschedulers in high-scale environments, are still looking forward to this feature, and want to get into it.

In the discussion above, I mostly clarified my concerns about what we're going to do and how. The remaining question is where we are going to put the logic in the current codebase. The plugins directory is brand-new, and I'm not sure this proposal counts as a plugin; I think it is something that should live at the execution level rather than the plugin level. I'm a bit confused and lost here. Can you please enlighten us about where we should write the actual logic in the current codebase? Thanks! cc'ing @eminaktas for awareness
@Dentrax thanks for bumping this. This is definitely one of the features we had in mind with the framework refactor. You're right that this will be an execution-level change rather than a "plugin". But it will still need to be extendable/modifiable in similar ways to the plugins (for example, writing a new strategy plugin that needs to be triggered on custom events). In the old codebase there wasn't a good mechanism for wiring this kind of information through to the strategies that need it.

For now, we should focus on migrating the existing code into the framework model so we don't overload ourselves. I think the big blocker for this will be migrating the internal code to a plugin registry (right now the strategies are migrated to "plugins", but the internals are still basically just wrapping them in the old code). With that, we will be able to start working on an event registry that can be mapped to corresponding plugins.

If you want to put together a preliminary design for how this could work in the new codebase, please do! Having this discussion now, while not a primary focus, will still help inform design decisions for the rest of the framework migration.
Dropping a concern so I don't forget! One small thing worth mentioning: there could be a slight time window between a Node delete event and the descheduling cycle, so there is a possibility of a race condition between the scheduler and the descheduler. An example case is:
My idea here is that we could listen to the scheduler to check whether any scheduling event is ongoing (I don't know how yet). As soon as a Node gets deleted, we should NOT trigger the descheduler; instead we can wait a while, until all scheduling work is done (all missing Pods are up and running), plus a grace period. Does it make sense? 🤔
Is your feature request related to a problem? Please describe.
When I think about a problem I have that requires taking action when the Descheduler does something, my first target is one of the events it triggers, as follows: [1]
(Just randomly filled these as an example, not meant to represent the use-cases.)
Describe the solution you'd like
To stay informed about when these events get triggered, we can use a primitive exposed by Kubernetes client-go called NewSharedIndexInformer, in the cache package.
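A minimal sketch of that approach, assuming in-cluster configuration and a hypothetical triggerDescheduling helper (this is illustrative glue code, not the descheduler's actual wiring): a SharedIndexInformer watches Nodes through the API server and calls our handler on add/update/delete.

```go
package main

import (
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

// triggerDescheduling is a placeholder: a real integration would reset a
// debounce timer and eventually kick off a descheduling cycle.
func triggerDescheduling(reason string) {}

func main() {
	cfg, err := rest.InClusterConfig() // assumes running inside the cluster
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List/watch all Nodes via the API server.
	lw := cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "nodes", metav1.NamespaceAll, fields.Everything())

	informer := cache.NewSharedIndexInformer(lw, &v1.Node{}, 30*time.Second, cache.Indexers{})
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { triggerDescheduling("node added") },
		UpdateFunc: func(oldObj, newObj interface{}) { triggerDescheduling("node updated") },
		DeleteFunc: func(obj interface{}) { triggerDescheduling("node deleted") },
	})

	stop := make(chan struct{})
	defer close(stop)
	go informer.Run(stop)
	// Wait for the local cache to warm up before trusting events.
	cache.WaitForCacheSync(stop, informer.HasSynced)
	select {} // block forever; real code would hook into the run loop
}
```

The same pattern extends to other informers (pods, endpoints) by changing the list/watch resource and object type; combining the handlers with a batch-delay like the one discussed above would avoid one cycle per event.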
Describe alternatives you've considered
In the current architecture, node.go already handles this concern when we schedule the Descheduler to run as a cron job. But consider the motivation here: if we set the Descheduler to run every hour, or once a week, and we delete a service, add a new node, etc., there will be a time window between when the event happened and the last time the descheduler ran. That means we have to wait hours or days until the next run.
What version of descheduler are you using?
descheduler version:
v0.22.1
Additional context
Footnotes
[1]: https://www.cncf.io/blog/2019/10/15/extend-kubernetes-via-a-shared-informer/