
ScaledObject reconciliation delay due to redundant reconciles triggered by HPA status updates #5281

Closed · deefreak opened this issue Dec 12, 2023 · 4 comments · Fixed by #5282
Labels: bug (Something isn't working)

@deefreak (Contributor)

In our organisation, we have large clusters where workload autoscaling is managed by KEDA, with 350+ ScaledObjects per cluster.

Lately, we have observed that a change to a ScaledObject takes around 10 minutes to be reflected in its child HPA, which hampers autoscaler responsiveness.
For example, if we update scaledobject.spec.maxReplicaCount, it takes around 10 minutes to be reflected in hpa.spec.maxReplicas.

After debugging and analyzing the KEDA operator pod logs and scaledobject_controller.go, we identified that ScaledObjects are continuously reconciled whenever the child HPA is updated. This includes status updates made by other controllers, such as the HPA controller that is part of the kube-controller-manager. The HPA controller keeps updating the HPA status with information such as conditions and the controller-perceived resource metrics (.status.currentMetrics). The volume of these updates is high, causing unnecessary and redundant reconciles of the ScaledObject and further delaying genuine updates.

Expected Behavior

Only changes to the HPA spec or labels/annotations should trigger a ScaledObject reconcile; status updates should be ignored.

Actual Behavior

In the controller initialisation, any update to the HPA triggers a ScaledObject reconcile, which can be redundant and unnecessary.
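
For reference, a controller-runtime registration that owns the HPA without any event filter enqueues a reconcile for every HPA event, status-only updates included. Below is a minimal sketch of that watch pattern (simplified and hedged, not the exact KEDA setup; ScaledObjectReconciler and the KEDA import path are assumed from the codebase):

```go
package controllers

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	ctrl "sigs.k8s.io/controller-runtime"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// Simplified sketch, not the exact KEDA code: with no predicate on the Owns()
// watch, every create/update/delete event on the child HPA (including pure
// status updates written by the kube-controller-manager) is mapped back to the
// owning ScaledObject and enqueues a reconcile.
func (r *ScaledObjectReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kedav1alpha1.ScaledObject{}).
		Owns(&autoscalingv2.HorizontalPodAutoscaler{}).
		Complete(r)
}
```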

Steps to Reproduce the Problem

Have a cluster with 100+ ScaledObjects managed by KEDA. The time taken for the HPA to be updated after a ScaledObject spec update is on the order of minutes.

Logs from KEDA operator

We are seeing many log entries like the following, triggered by repeated HPA status updates (.status.currentMetrics):

2023-12-08T13:44:04.647+0530 INFO controllers.ScaledObject Reconciling ScaledObject {"ScaledObject.Namespace": "xx", "ScaledObject.Name": "yy"}

2023-12-08T13:51:08.931+0530 INFO controllers.ScaledObject Reconciling ScaledObject {"ScaledObject.Namespace": "xx", "ScaledObject.Name": "yy"}

KEDA Version

2.12.0

Kubernetes Version

1.26

Platform

Other

Scaler Details

CPU

Anything else?

This issue is reproducible in older as well as the latest KEDA versions. Tweaking KEDA_SCALEDOBJECT_CTRL_MAX_RECONCILES does not fully resolve it either, as the redundant updates are still processed.

deefreak added the bug (Something isn't working) label on Dec 12, 2023
@zroubalik (Member)
@deefreak thanks for reporting. We should add a predicate there.

I wonder, should we really trigger the reconcile loop on HPA annotation changes?
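
For illustration, the predicate idea could look roughly like the following. This is a hedged sketch, not necessarily the change that landed in #5282: only spec changes (metadata.generation bumps), label changes, or annotation changes on the child HPA would enqueue a reconcile, while status-only updates are filtered out. Whether annotation changes should be included at all is exactly the open question above.

```go
package controllers

import (
	autoscalingv2 "k8s.io/api/autoscaling/v2"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/builder"
	"sigs.k8s.io/controller-runtime/pkg/predicate"

	kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1"
)

// Sketch only, not necessarily the fix in #5282: filter HPA events so that
// only spec changes (generation bumps), label changes, or annotation changes
// trigger a ScaledObject reconcile; status-only updates are dropped.
func (r *ScaledObjectReconciler) SetupWithManager(mgr ctrl.Manager) error {
	hpaEventFilter := predicate.Or(
		predicate.GenerationChangedPredicate{},
		predicate.LabelChangedPredicate{},
		predicate.AnnotationChangedPredicate{}, // whether this belongs here is the open question above
	)
	return ctrl.NewControllerManagedBy(mgr).
		For(&kedav1alpha1.ScaledObject{}).
		Owns(&autoscalingv2.HorizontalPodAutoscaler{}, builder.WithPredicates(hpaEventFilter)).
		Complete(r)
}
```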

zroubalik self-assigned this on Dec 12, 2023
@zroubalik (Member)
I can contribute this 😄

@zroubalik (Member)
At the moment we only replace Spec and Labels, if I am not mistaken:

```go
func (r *ScaledObjectReconciler) updateHPAIfNeeded(ctx context.Context, logger logr.Logger, scaledObject *kedav1alpha1.ScaledObject, foundHpa *autoscalingv2.HorizontalPodAutoscaler, gvkr *kedav1alpha1.GroupVersionKindResource) error {
	hpa, err := r.newHPAForScaledObject(ctx, logger, scaledObject, gvkr)
	if err != nil {
		logger.Error(err, "Failed to create new HPA resource", "HPA.Namespace", scaledObject.Namespace, "HPA.Name", getHPAName(scaledObject))
		return err
	}

	// DeepDerivative ignores extra entries in arrays which makes removing the last trigger not update things, so trigger and update any time the metrics count is different.
	if len(hpa.Spec.Metrics) != len(foundHpa.Spec.Metrics) || !equality.Semantic.DeepDerivative(hpa.Spec, foundHpa.Spec) {
		logger.V(1).Info("Found difference in the HPA spec according to ScaledObject", "currentHPA", foundHpa.Spec, "newHPA", hpa.Spec)
		if err = r.Client.Update(ctx, hpa); err != nil {
			foundHpa.Spec = hpa.Spec
			logger.Error(err, "Failed to update HPA", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
			return err
		}
		logger.Info("Updated HPA according to ScaledObject", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
	}

	if !equality.Semantic.DeepDerivative(hpa.ObjectMeta.Labels, foundHpa.ObjectMeta.Labels) {
		logger.V(1).Info("Found difference in the HPA labels according to ScaledObject", "currentHPA", foundHpa.ObjectMeta.Labels, "newHPA", hpa.ObjectMeta.Labels)
		if err = r.Client.Update(ctx, hpa); err != nil {
			foundHpa.ObjectMeta.Labels = hpa.ObjectMeta.Labels
			logger.Error(err, "Failed to update HPA", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
			return err
		}
		logger.Info("Updated HPA according to ScaledObject", "HPA.Namespace", foundHpa.Namespace, "HPA.Name", foundHpa.Name)
	}

	return nil
}
```

Can't recall if this is by design 🤷‍♂️ 😄

@deefreak (Contributor, Author) commented Dec 12, 2023

@zroubalik @JorTurFer, I am willing to contribute and have the PR ready. I will link the issue to the PR.
