Keda operator restarting after failing to renew leader election #4212

Closed
sinanub18 opened this issue Feb 8, 2023 · 9 comments
Labels
bug: Something isn't working
stale: All issues that are marked as stale due to inactivity

Comments

@sinanub18

Report

I0207 13:31:36.792715 1 leaderelection.go:283] failed to renew lease operator/operator.keda.sh: timed out waiting for the condition
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for caches"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Stopping and waiting for webhooks"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Wait completed, proceeding to shutdown the manager"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"triggerauthentication","controllerGroup":"keda.sh","controllerKind":"TriggerAuthentication"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"scaledjob","controllerGroup":"keda.sh","controllerKind":"ScaledJob"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"clustertriggerauthentication","controllerGroup":"keda.sh","controllerKind":"ClusterTriggerAuthentication"}
{"level":"info","ts":"2023-02-07T13:31:36Z","msg":"All workers finished","controller":"clustertriggerauthentication","controllerGroup":"keda.sh","controllerKind":"ClusterTriggerAuthentication"}

Expected Behavior

Keda operator container should not restart

Actual Behavior

The KEDA operator container gets restarted every 15-20 hours

Steps to Reproduce the Problem

  1. Deploy KEDA
  2. Observe the pod after a day
  3. Check the logs of the previous keda-operator container

Logs from KEDA operator

example

KEDA Version

2.9.0

Kubernetes Version

1.25

Platform

None

Scaler Details

Kafka Scaler

Anything else?

No response

sinanub18 added the bug label Feb 8, 2023

@JorTurFer
Member

Hello
Sadly, as explained in the upstream repo, this is the intended behavior. Something causes the lease renewal to fail (a slow API server response, network issues, or similar).

All we can do on the KEDA side is offer the option to configure the leader election values; you could try increasing them to reduce the chance of renewal failures.

@sinanub18
Author

@JorTurFer What would be the standard recommended values for the config below if there are close to 100 ScaledObjects?

KEDA_OPERATOR_LEADER_ELECTION_LEASE_DURATION
KEDA_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE
KEDA_OPERATOR_LEADER_ELECTION_RETRY_PERIOD 
KEDA_METRICS_LEADER_ELECTION_LEASE_DURATION 
KEDA_METRICS_LEADER_ELECTION_RENEW_DEADLINE 
KEDA_METRICS_LEADER_ELECTION_RETRY_PERIOD

@JorTurFer
Member

Hey,
This isn't related to the ScaledObjects (this is a different API); you can just try setting the values to twice the current ones.
BTW, which k8s cluster are you using? EKS? AKS? GKE?

@JorTurFer
Member

Okay, I have just noticed that the default values aren't stated anywhere 🤦
They are:
KEDA_OPERATOR_LEADER_ELECTION_LEASE_DURATION=15s
KEDA_OPERATOR_LEADER_ELECTION_RENEW_DEADLINE=10s
KEDA_OPERATOR_LEADER_ELECTION_RETRY_PERIOD=2s
KEDA_METRICS_LEADER_ELECTION_LEASE_DURATION=15s
KEDA_METRICS_LEADER_ELECTION_RENEW_DEADLINE=10s
KEDA_METRICS_LEADER_ELECTION_RETRY_PERIOD=2s

@sinanub18
Author

Hey, This isn't related to the ScaledObjects (this is a different API); you can just try setting the values to twice the current ones. BTW, which k8s cluster are you using? EKS? AKS? GKE?

K8s cluster is GKE

@JorTurFer
Member

Have you tried modifying the default values? Does it work now?

@sinanub18
Author

Have you tried modifying the default values? Does it work now?

@JorTurFer I have yet to apply these changes; I will do so in the next iteration and confirm whether it works.

@stale

stale bot commented May 15, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label May 15, 2023

@stale

stale bot commented May 23, 2023

This issue has been automatically closed due to inactivity.

stale bot closed this as completed May 23, 2023