
KEDA treats un-paused ScaledObject as paused when the operator restarts after going down #5490

Closed
tr1et opened this issue Feb 7, 2024 · 4 comments
Labels
bug (Something isn't working) · stale (All issues that are marked as stale due to inactivity)

Comments

@tr1et

tr1et commented Feb 7, 2024

Report

We are using KEDA 2.12.1 on Azure Kubernetes Service version 1.27.3. When the keda-operator hit an issue, went down, and restarted, it flagged the previously un-paused ScaledObjects as paused and scaled their targets to 0 pods instead of keeping them running with 1 pod (as set in minReplicaCount).
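For context, this is roughly how the configured floor and the current replica count can be checked on one of the affected services (the names below match the masked names in the logs; adjust to your own objects):

kubectl get scaledobject a-service-green -n prd -o jsonpath='{.spec.minReplicaCount}'
kubectl get deployment a-service-green -n prd -o jsonpath='{.spec.replicas}'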

Expected Behavior

The operator should keep the pods running after it restarts.

Actual Behavior

It scaled the Deployments to 0 replicas and killed all running pods of those Deployments.

Steps to Reproduce the Problem

  1. Pause a ScaledObject: kubectl annotate scaledobject "$deployment" autoscaling.keda.sh/paused-replicas="0" -n "$NAMESPACE" --overwrite
  2. Un-pause the ScaledObject: kubectl annotate scaledobject "$deployment" autoscaling.keda.sh/paused-replicas- -n "$NAMESPACE" --overwrite
  3. Take the keda-operator down so it restarts (see the example below).

Logs from KEDA operator

Service names are masked.
Operator down logs:

2024-02-06T20:06:25Z    ERROR    Reconciler error    {"controller": "cert-rotator", "object": {"name":"kedaorg-certs","namespace":"keda"}, "namespace": "keda", "name": "kedaorg-certs", "reconcileID": "cd95ed0d-c848-4a17-8b1a-d4b2c21724e4", "error": "Operation cannot be fulfilled on apiservices.apiregistration.k8s.io \"v1beta1.external.metrics.k8s.io\": the object has been modified; please apply your changes to the latest version and try again"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:329
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/workspace/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227
2024-02-06T20:06:25Z    INFO    cert-rotation    no cert refresh needed
2024-02-06T20:06:25Z    INFO    cert-rotation    Ensuring CA cert    {"name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration", "name": "keda-admission", "gvk": "admissionregistration.k8s.io/v1, Kind=ValidatingWebhookConfiguration"}
2024-02-06T20:06:33Z    INFO    cert-rotation    Ensuring CA cert    {"name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService", "name": "v1beta1.external.metrics.k8s.io", "gvk": "apiregistration.k8s.io/v1, Kind=APIService"}
E0206 20:06:35.635590       1 leaderelection.go:332] error retrieving resource lock keda/operator.keda.sh: Get "https://10.8.0.1:443/apis/coordination.k8s.io/v1/namespaces/keda/leases/operator.keda.sh": context deadline exceeded
I0206 20:06:35.635622       1 leaderelection.go:285] failed to renew lease keda/operator.keda.sh: timed out waiting for the condition
2024-02-06T20:06:35Z    ERROR    setup    problem running manager    {"error": "leader election lost"}
main.main
/workspace/cmd/operator/main.go:300
runtime.main
/usr/local/go/src/runtime/proc.go:250

Operator restarted logs:

2024-02-06T20:06:37Z    INFO    setup    Starting manager
2024-02-06T20:06:37Z    INFO    setup    KEDA Version: 2.12.1
2024-02-06T20:06:37Z    INFO    setup    Git Commit: dc76ca70f19c22e8f0c806f84d95256d771f3dc9
2024-02-06T20:06:37Z    INFO    setup    Go Version: go1.20.8
2024-02-06T20:06:37Z    INFO    setup    Go OS/Arch: linux/amd64
2024-02-06T20:06:37Z    INFO    setup    Running on Kubernetes 1.27    {"version": "v1.27.3"}
2024-02-06T20:06:37Z    INFO    starting server    {"kind": "health probe", "addr": "[::]:8081"}
I0206 20:06:37.889130       1 leaderelection.go:250] attempting to acquire leader lease keda/operator.keda.sh...
2024-02-06T20:06:37Z    INFO    controller-runtime.metrics    Starting metrics server
2024-02-06T20:06:37Z    INFO    controller-runtime.metrics    Serving metrics server    {"bindAddress": ":8080", "secure": false}
I0206 20:06:56.178956       1 leaderelection.go:260] successfully acquired lease keda/operator.keda.sh
...
2024-02-06T20:06:58Z    INFO    scaleexecutor    Successfully scaled target to paused replicas count    {"scaledobject.Name": "a-service-green", "scaledObject.Namespace": "prd", "scaleTarget.Name": "a-service-green", "paused replicas": 0}
2024-02-06T20:06:58Z    INFO    scaleexecutor    Successfully scaled target to paused replicas count    {"scaledobject.Name": "b-service-green", "scaledObject.Namespace": "prd", "scaleTarget.Name": "b-service-green", "paused replicas": 0}
2024-02-06T20:06:58Z    INFO    scaleexecutor    Successfully scaled target to paused replicas count    {"scaledobject.Name": "c-service-green", "scaledObject.Namespace": "prd", "scaleTarget.Name": "c-service-green", "paused replicas": 0}
2024-02-06T20:06:58Z    INFO    scaleexecutor    Successfully scaled target to paused replicas count    {"scaledobject.Name": "d-service-green", "scaledObject.Namespace": "prd", "scaleTarget.Name": "d-service-green", "paused replicas": 0}

KEDA Version

2.12.1

Kubernetes Version

1.27

Platform

Microsoft Azure

Scaler Details

Azure Service Bus, CPU, Memory

Anything else?

No response

@tr1et added the bug label Feb 7, 2024
@JorTurFer
Member

JorTurFer commented Feb 7, 2024

Hi,
I've tested it and I can't reproduce the issue :(

I don't think this is related to KEDA, because KEDA is stateless: after a restart, KEDA pulls the ScaledObjects from the API Server and works based on them.
Based on these lines:

I0206 20:06:35.635622       1 leaderelection.go:285] failed to renew lease keda/operator.keda.sh: timed out waiting for the condition
2024-02-06T20:06:35Z    ERROR    setup    problem running manager    {"error": "leader election lost"}

I think something has happened on the control-plane side, and maybe there was a rollback in etcd that caused the annotation to be placed again (permanently or just for a while). Have you checked whether the annotation is still there?
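
For example, something like this dumps the annotations on one of the affected ScaledObjects (masked names taken from your logs), so you can see whether autoscaling.keda.sh/paused-replicas is still set:

kubectl get scaledobject a-service-green -n prd -o jsonpath='{.metadata.annotations}'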

@tomkerkhove, do you know if this is something @tr1et can ask about via a support ticket?

@tr1et
Author

tr1et commented Feb 19, 2024

Thanks @JorTurFer, let me try to reproduce the issue in our dev environment.

I don't have the exact annotation info from after the restart right now (we deleted and re-created the ScaledObjects after finding the issue, to be safe), but I remember that the "paused" annotations were not there.


stale bot commented Apr 19, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Apr 19, 2024

stale bot commented Apr 27, 2024

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Apr 27, 2024