
ScaledObject reconciliation delay due to redundant reconciles triggered by HPA status updates #5281

Closed · deefreak opened this issue Dec 12, 2023 · 4 comments · Fixed by #5282
Labels: bug (Something isn't working)

@deefreak (Contributor)

In our organisation, we have large clusters where workload autoscaling is managed by KEDA, with 350+ ScaledObjects per cluster.

Lately, we have observed that a change to a ScaledObject takes around 10 minutes to be reflected in its child HPA, which hampers autoscaler responsiveness.
For example, if we update scaledobject.spec.maxReplicaCount, it takes around 10 minutes to be reflected in hpa.spec.maxReplicas.

After debugging and analyzing the KEDA operator pod logs and scaledobject_controller.go, we identified that ScaledObjects are continuously reconciled whenever the child HPA is updated. This includes status updates made by other controllers, such as the HPA controller that is part of the kube-controller-manager. The HPA controller keeps updating the HPA status with information such as conditions and the controller-perceived resource metrics (.status.currentMetrics). The volume of these updates is high, causing unnecessary and redundant reconciles of the ScaledObject and further delaying genuine updates.

Expected Behavior

Only changes to the HPA spec or labels/annotations should trigger a ScaledObject reconcile; status updates should be ignored.

Actual Behavior

In the controller initialisation, any update to the HPA triggers a ScaledObject reconcile, which can be redundant and unnecessary.
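
For reference, a controller-runtime registration that owns the HPA without any event filter enqueues a reconcile for every HPA event, status-only updates included. Below is a minimal sketch of that watch pattern (simplified and hedged, not the exact KEDA setup; ScaledObjectReconciler and the KEDA import path are assumed from the codebase):

```go
package controllers

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	ctrl "sigs.k8s.io/controller-runtime"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// Simplified sketch, not the exact KEDA code: with no predicate on the Owns()
// watch, every create/update/delete event on the child HPA (including pure
// status updates written by the kube-controller-manager) is mapped back to the
// owning ScaledObject and enqueues a reconcile.
func (r *ScaledObjectReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kedav1alpha1.ScaledObject{}).
		Owns(&autoscalingv2.HorizontalPodAutoscaler{}).
		Complete(r)
}
```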

Steps to Reproduce the Problem

Have a cluster with 100+ ScaledObjects managed by KEDA. The time taken for the HPA to be updated after a ScaledObject spec update is on the order of minutes.

Logs from KEDA operator

We are seeing many log entries like the following, triggered by repeated HPA status updates (.status.currentMetrics):

2023-12-08T13:44:04.647+0530 INFO controllers.ScaledObject Reconciling ScaledObject {"ScaledObject.Namespace": "xx", "ScaledObject.Name": "yy"}

2023-12-08T13:51:08.931+0530 INFO controllers.ScaledObject Reconciling ScaledObject {"ScaledObject.Namespace": "xx", "ScaledObject.Name": "yy"}

KEDA Version

2.12.0

Kubernetes Version

1.26

Platform

Other

Scaler Details

CPU

Anything else?

This issue is reproducible in older as well as the latest KEDA versions. Tweaking KEDA_SCALEDOBJECT_CTRL_MAX_RECONCILES does not fully resolve it either, as the redundant updates are still processed.

deefreak added the bug (Something isn't working) label on Dec 12, 2023
@zroubalik (Member)
@deefreak thanks for reporting. We should add a predicate there.

I wonder, should we really trigger the reconcile loop on HPA annotation changes?
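
For illustration, the predicate idea could look roughly like the following. This is a hedged sketch, not necessarily the change that landed in #5282: only spec changes (metadata.generation bumps), label changes, or annotation changes on the child HPA would enqueue a reconcile, while status-only updates are filtered out. Whether annotation changes should be included at all is exactly the open question above.

```go
package controllers

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/predicate"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// Sketch only, not necessarily the fix in #5282: filter HPA events so that
// only spec changes (generation bumps), label changes, or annotation changes
// trigger a ScaledObject reconcile; status-only updates are dropped.
func (r *ScaledObjectReconciler) SetupWithManager(mgr ctrl.Manager) error {
	hpaEventFilter := predicate.Or(
		predicate.GenerationChangedPredicate{},
		predicate.LabelChangedPredicate{},
		predicate.AnnotationChangedPredicate{}, // whether this belongs here is the open question above
	)
	return ctrl.NewControllerManagedBy(mgr).
		For(&kedav1alpha1.ScaledObject{}).
		Owns(&autoscalingv2.HorizontalPodAutoscaler{}, builder.WithPredicates(hpaEventFilter)).
		Complete(r)
}
```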

zroubalik self-assigned this on Dec 12, 2023
@zroubalik (Member)
I can contribute this 😄

@zroubalik (Member)
At the moment we only replace Spec and Labels, if I am not mistaken:

```go
func (r *ScaledObjectReconciler) updateHPAIfNeeded(ctx context.Context, logger logr.Logger, scaledObject *kedav1alpha1.ScaledObject, foundHpa *autoscalingv2.HorizontalPodAutoscaler, gvkr *kedav1alpha1.GroupVersionKindResource) error {
	hpa, err := r.newHPAForScaledObject(ctx, logger, scaledObject, gvkr)
	if err != nil {
		logger.Error(err, "Failed to create new HPA resource", "HPA.Namespace", scaledObject.Namespace, "HPA.Name", getHPAName(scaledObject))
		return err
	}

	// DeepDerivative ignores extra entries in arrays which makes removing the last trigger not update things, so trigger and update any time the metrics count is different.
	if len(hpa.Spec.Metrics) != len(foundHpa.Spec.Metrics) || !equality.Semantic.DeepDerivative(hpa.Spec, foundHpa.Spec) {
		logger.V(1).Info("Found difference in the HPA spec according to ScaledObject", "currentHPA", foundHpa.Spec, "newHPA", hpa.Spec)
		if err = r.Client.Update(ctx, hpa); err != nil {
			foundHpa.Spec = hpa.Spec
			logger.Error(err, "Failed to update HPA", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
			return err
		}
		logger.Info("Updated HPA according to ScaledObject", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
	}

	if !equality.Semantic.DeepDerivative(hpa.ObjectMeta.Labels, foundHpa.ObjectMeta.Labels) {
		logger.V(1).Info("Found difference in the HPA labels according to ScaledObject", "currentHPA", foundHpa.ObjectMeta.Labels, "newHPA", hpa.ObjectMeta.Labels)
		if err = r.Client.Update(ctx, hpa); err != nil {
			foundHpa.ObjectMeta.Labels = hpa.ObjectMeta.Labels
			logger.Error(err, "Failed to update HPA", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
			return err
		}
		logger.Info("Updated HPA according to ScaledObject", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
	}

	return nil
}
```

Can't recall if this is by design 🤷‍♂️ 😄

@deefreak (Contributor, Author) commented Dec 12, 2023

@zroubalik @JorTurFer, I am willing to contribute and have the PR ready. I will link the issue to the PR.
