Keda operator restarting after failing to renew leader election #4212

sinanub18 · 2023-02-08T12:14:07Z

Report

I0207 13:31:36.792715 1 leaderelection.go:283] failed to renew lease operator/operator.keda.sh: timed out waiting for the condition
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"triggerauthentication","controllerGroup":"keda.sh","controllerKind":"TriggerAuthentication"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"scaledjob","controllerGroup":"keda.sh","controllerKind":"ScaledJob"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"clustertriggerauthentication","controllerGroup":"keda.sh","controllerKind":"ClusterTriggerAuthentication"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"All workers finished","controller":"clustertriggerauthentication","controllerGroup":"keda.sh","controllerKind":"ClusterTriggerAuthentication"}

Expected Behavior

The KEDA operator container should not restart.

Actual Behavior

The KEDA operator container is restarted every 15-20 hours.

Steps to Reproduce the Problem

Deploy KEDA
Observe the pod after a day
Check the logs of the previous keda-operator container

Logs from KEDA operator

example

KEDA Version

2.9.0

Kubernetes Version

1.25

Platform

None

Scaler Details

Kafka Scaler

Anything else?

No response

JorTurFer · 2023-02-08T23:16:53Z

Hello,
Sadly, as explained in the upstream repo, this is the intended behavior. Something causes the lease renewal to fail (a slow API server response, network issues, or similar), and the operator shuts down.

All we can do on the KEDA side is offer the option to configure the leader election values. You could try increasing them to reduce the chance of renewal failures.
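
For context, the three leader election values (lease duration, renew deadline, retry period) are not independent. As far as I know, client-go's leader election, which emitted the `leaderelection.go` log line in the report, only accepts a configuration where the lease duration is greater than the renew deadline and the renew deadline is greater than the retry period multiplied by a jitter factor of 1.2. A minimal Python sketch of that check is below. The function name `check_leader_election` and the use of plain seconds are assumptions for illustration; this is not KEDA or client-go code.

```python
JITTER_FACTOR = 1.2  # multiplier client-go applies to the retry period


def check_leader_election(lease_duration, renew_deadline, retry_period):
    """Return a list of violated constraints; an empty list means the values are consistent.

    All arguments are in seconds. Hypothetical helper, for illustration only.
    """
    problems = []
    if min(lease_duration, renew_deadline, retry_period) <= 0:
        problems.append("all values must be positive")
    if lease_duration <= renew_deadline:
        problems.append("lease duration must be greater than renew deadline")
    if renew_deadline <= retry_period * JITTER_FACTOR:
        problems.append("renew deadline must be greater than retry period * 1.2")
    return problems


if __name__ == "__main__":
    # 15s / 10s / 2s are the defaults quoted later in this thread.
    print(check_leader_election(15, 10, 2))
    # A renew deadline longer than the lease duration is rejected.
    print(check_leader_election(15, 20, 2))
```

Running a candidate set of values through a check like this before changing the deployment avoids replacing a slow-renewal restart with a configuration the operator refuses at startup.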

sinanub18 · 2023-02-09T03:36:48Z

@JorTurFer What are the recommended values for the settings below when there are close to 100 ScaledObjects?

KEDA_OPERATOR_LEADER_ELECTION_LEASE_DURATION
KEDA_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE
KEDA_OPERATOR_LEADER_ELECTION_RETRY_PERIOD 
KEDA_METRICS_LEADER_ELECTION_LEASE_DURATION 
KEDA_METRICS_LEADER_ELECTION_RENEW_DEADLINE 
KEDA_METRICS_LEADER_ELECTION_RETRY_PERIOD

JorTurFer · 2023-02-09T07:18:20Z

Hey,
This isn't related to the number of ScaledObjects (leader election uses a different API); you can just try doubling the current values.
BTW, what k8s cluster are you using? EKS? AKS? GKE?

JorTurFer · 2023-02-09T07:42:17Z

Okay, I have just noticed that the default values aren't listed 🤦
They are:
KEDA_OPERATOR_LEADER_ELECTION_LEASE_DURATION=15s
KEDA_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE=10s
KEDA_OPERATOR_LEADER_ELECTION_RETRY_PERIOD=2s
KEDA_METRICS_LEADER_ELECTION_LEASE_DURATION=15s
KEDA_METRICS_LEADER_ELECTION_RENEW_DEADLINE=10s
KEDA_METRICS_LEADER_ELECTION_RETRY_PERIOD=2s
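
To try the suggestion of setting twice the current values, the doubled settings can be derived from the defaults above. The Python sketch below does only that arithmetic. The helper names `scale_seconds` and `scaled_env` are hypothetical, and the code assumes every value is a whole number of seconds written as `<n>s`, as in the list above.

```python
import re

# Defaults as quoted in this thread.
DEFAULTS = {
    "KEDA_OPERATOR_LEADER_ELECTION_LEASE_DURATION": "15s",
    "KEDA_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE": "10s",
    "KEDA_OPERATOR_LEADER_ELECTION_RETRY_PERIOD": "2s",
    "KEDA_METRICS_LEADER_ELECTION_LEASE_DURATION": "15s",
    "KEDA_METRICS_LEADER_ELECTION_RENEW_DEADLINE": "10s",
    "KEDA_METRICS_LEADER_ELECTION_RETRY_PERIOD": "2s",
}


def scale_seconds(value, factor):
    """Multiply a '<n>s' value by an integer factor (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)s", value)
    if match is None:
        raise ValueError(f"expected a whole number of seconds like '15s', got {value!r}")
    return f"{int(match.group(1)) * factor}s"


def scaled_env(defaults, factor=2):
    """Return a new mapping with every duration multiplied by factor."""
    return {name: scale_seconds(value, factor) for name, value in defaults.items()}


if __name__ == "__main__":
    for name, value in scaled_env(DEFAULTS).items():
        print(f"{name}={value}")
```

With a factor of 2 this gives 30s, 20s and 4s for lease duration, renew deadline and retry period. Scaling all three by the same factor keeps their relative ordering unchanged. How the resulting variables are applied to the operator and metrics server deployments depends on how KEDA was installed.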

sinanub18 · 2023-02-09T10:17:31Z

K8s cluster is GKE

JorTurFer · 2023-03-13T11:00:17Z

Have you tried modifying the default values? Does it work now?

sinanub18 · 2023-03-16T09:52:40Z

@JorTurFer I have yet to apply these changes. I plan to do so in the next iteration and will confirm whether it works.

stale · 2023-05-15T14:48:34Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2023-05-23T12:01:38Z

This issue has been automatically closed due to inactivity.

sinanub18 added the bug label on Feb 8, 2023

stale bot added the stale label on May 15, 2023

stale bot closed this as completed on May 23, 2023
