
Add circuit breaking (immediately scale to 0) on specific Trigger #3949

Open
kencieszykowski opened this issue Dec 3, 2022 · 15 comments
Labels: feature-request, needs-discussion, stale-bot-ignore

@kencieszykowski

Proposal

Proposing a circuit-breaking feature for KEDA. Effectively, if a specific circuitBreak trigger crosses a threshold, it overrides the other triggers and scales the Deployment/ReplicaSet/StatefulSet down to 0.

This would be useful in situations where an upstream or downstream microservice releases bad data into the target process: the circuit breaker effectively stops the data transmission at the top of the funnel.

Possible recovery options could include:

  • Manual intervention (a redeploy or annotation deletion)
  • Automatic recovery once the circuitBreak metric falls back under the target value

Use-Case

For example, we have a Kafka stream worker that ingests messages and pushes them downstream to APIs and Seldon models. We have implemented a dead-letter queue, and if the rate of messages going to the DLQ is above a certain threshold, we would like to stop the stream worker in its tracks.
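
To make the proposal concrete, here is a purely hypothetical sketch of what this could look like on a ScaledObject. None of the circuitBreak fields below exist in KEDA today; the resource names, thresholds, and the Prometheus query are made up for illustration only.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stream-worker-scaler
spec:
  scaleTargetRef:
    name: stream-worker
  triggers:
    # Normal scaling trigger: scale the worker on lag of the main topic.
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: stream-worker
        topic: ingest
        lagThreshold: "50"
    # Hypothetical circuit-breaker trigger: if the DLQ rate crosses the
    # threshold, override every other trigger and scale the workload to 0.
    - type: prometheus
      circuitBreak: true          # proposed field, does not exist in KEDA today
      metadata:
        serverAddress: http://prometheus.monitoring:9090
        query: sum(rate(kafka_dlq_messages_total[5m]))
        threshold: "10"
```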

Anything else?

There are other ways to do this using Istio and various other tooling, but KEDA would offer much more flexibility than anything I have seen.

kencieszykowski added the feature-request and needs-discussion labels Dec 3, 2022
@JorTurFer
Member

Hi @kencieszykowski,
KEDA already supports pausing autoscaling. Isn't that enough? I mean, you could add the annotation to the ScaledObjects based on the business logic that you have.
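
For reference, pausing is driven by an annotation on the ScaledObject. A minimal sketch (the workload name and trigger are illustrative); removing the annotation resumes normal autoscaling:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stream-worker-scaler
  annotations:
    # While this annotation is present, KEDA stops autoscaling and holds
    # the target workload at the given replica count (0 here).
    autoscaling.keda.sh/paused-replicas: "0"
spec:
  scaleTargetRef:
    name: stream-worker
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: stream-worker
        topic: ingest
        lagThreshold: "50"
```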

@kencieszykowski
Author

> Hi @kencieszykowski, KEDA already supports pausing autoscaling. Isn't that enough? I mean, you could add the annotation to the ScaledObjects based on the business logic that you have.

Appreciate the link there. I see it as a slightly different use case, though; the issue with that approach is that it would require manual intervention (or middleware) for the pause to happen autonomously.

Ideally, there would be a specific Trigger/metric that we could monitor which would automatically inject this annotation into the ScaledObject. Manual remediation afterwards is fine and expected.

We're trying to catch those times where metrics indicate poor performance (and the immediate need for a shutdown), so that no human intervention is needed to stop things in their tracks.

@zroubalik
Member

zroubalik commented Jan 2, 2023

I understand the request; I think it is covered by these feature requests:
#2440
#3330
#3567

Am I right?

@tomkerkhove
Member

> Hi @kencieszykowski, KEDA already supports pausing autoscaling. Isn't that enough? I mean, you could add the annotation to the ScaledObjects based on the business logic that you have.

> Appreciate the link there. I see it as a slightly different use case, though; the issue with that approach is that it would require manual intervention (or middleware) for the pause to happen autonomously.

💯, we should provide automation for this.

> I understand the request; I think it is covered by these feature requests: #2440 #3330 #3567
>
> Am I right?

I don't think it is, because the ask is to scale to 0 if a given threshold is met. I'm not completely sure the model should be based on a scaler, though. I do see why that could be an approach, but typically you'd check a metric on something different from what you are scaling.

Taking the example in the request:

> For example, we have a Kafka stream worker that ingests messages and pushes them downstream to APIs and Seldon models. We have implemented a dead-letter queue, and if the rate of messages going to the DLQ is above a certain threshold, we would like to stop the stream worker in its tracks.

Here the circuit is meant to be broken if the DLQ is getting messages, while the app typically is interested in the main queue.

So I'm wondering if we should support multiple providers, such as a Prometheus metric, the metrics API, and potentially talking to dependencies, but not through the trigger section.
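
To illustrate the idea (nothing below exists in KEDA; the circuitBreaker section and all of its fields are hypothetical), such a check could live next to, rather than inside, the trigger list:

```yaml
spec:
  scaleTargetRef:
    name: stream-worker
  # Hypothetical, illustration only: break the circuit on a metric that is
  # unrelated to the scaling triggers themselves (here, the DLQ rate).
  circuitBreaker:
    prometheus:
      serverAddress: http://prometheus.monitoring:9090
      query: sum(rate(kafka_dlq_messages_total[5m]))
      threshold: "10"
    onBreak:
      pausedReplicas: 0   # behave as if the pause annotation had been applied
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: stream-worker
        topic: ingest
        lagThreshold: "50"
```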

@stale

stale bot commented Mar 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Mar 4, 2023
@kencieszykowski
Author

> Here the circuit is meant to be broken if the DLQ is getting messages, while the app typically is interested in the main queue.
>
> So I'm wondering if we should support multiple providers, such as a Prometheus metric, the metrics API, and potentially talking to dependencies, but not through the trigger section.

Most definitely. I think rolling this out to the individual scalers could happen over time, but Prometheus, the metrics API, and Datadog would provide a huge benefit here.

Since posting this I've come up with a slightly different workflow using KEDA that automates a circuit-breaker effect, but it seems like all of the pieces are in place within KEDA to allow something like this. It's not a feature a large number of people would use, but for some, I think it would be a big one.

stale bot removed the stale label Mar 11, 2023
@tomkerkhove
Member

> Since posting this I've come up with a slightly different workflow using KEDA that automates a circuit-breaker effect, but it seems like all of the pieces are in place within KEDA to allow something like this. It's not a feature a large number of people would use, but for some, I think it would be a big one.

Interesting, so how does it work then?

@stale

stale bot commented May 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label May 12, 2023
@stale

stale bot commented May 19, 2023

This issue has been automatically closed due to inactivity.

stale bot closed this as completed May 19, 2023
@fgheysels

I think this is somewhat related to this issue: #1932

I think it would be quite useful to scale down a workload to avoid doing a lot of work that is most likely going to fail.
For instance, you have a process working on messages in a queue. If you need to call a certain dependency for every processed message, and that dependency is down, you'll end up with a huge number of messages being dead-lettered.
A circuit-breaker rule on the DLQ which scales down to 0 replicas might prevent this. Then the question is: when should it scale back up?

@tomkerkhove
Member

I agree

@JorTurFer
Member

JorTurFer commented Jul 17, 2023

I think that both can be done with this feature: #4583 (not merged yet).
If the dependencies are internal (inside k8s), you could use the kubernetes-workload scaler to get how many of them are ready and use that in the formula to scale to 0 if they are down. Under the hood that feature uses https://github.com/antonmedv/expr, so whatever can be calculated with it will be doable.
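
A rough sketch of how that could look once #4583 is available, assuming the scalingModifiers syntax proposed there; the selector, topic, and thresholds are illustrative:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: stream-worker-scaler
spec:
  scaleTargetRef:
    name: stream-worker
  advanced:
    scalingModifiers:
      # If the downstream dependency reports zero pods, force the composite
      # metric to 0 so the worker scales to 0; otherwise scale on Kafka lag.
      formula: "dep_pods > 0 ? kafka_lag : 0"
      target: "50"
      metricType: AverageValue
  triggers:
    - type: kubernetes-workload
      name: dep_pods
      metadata:
        podSelector: "app=downstream-api"
        value: "1"
    - type: kafka
      name: kafka_lag
      metadata:
        bootstrapServers: kafka.svc:9092
        consumerGroup: stream-worker
        topic: ingest
        lagThreshold: "50"
```

Once the dependency comes back, the formula evaluates to the lag again and the worker scales back up, which would also address the recovery question raised above.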

@tomkerkhove
Member

That's fine, but I personally believe this is only the first step, and we should improve it further so that nobody has to write expressions for this scenario.

@JorTurFer
Member

I wouldn't add two ways of doing the same thing, to be honest. I think that documenting the recommended approach with examples is better than having more than one way to do the same thing (and having to support all of them).

@tomkerkhove
Member

One can build on the other. End users writing custom formulas should really be a last resort, as it's not straightforward unless you know what you are doing. Because of that, we should streamline the experience for end users and maybe use custom formulas under the hood (which the end user does not care about; it should just get it done).

tomkerkhove reopened this Jul 20, 2023
stale bot removed the stale label Jul 20, 2023
tomkerkhove added the stale-bot-ignore label Jul 20, 2023