
[loki-distributed] Add configurable scaling behaviour and KEDA autoscaler #2126

zaldnoay opened this issue Jan 16, 2023 · 1 comment
zaldnoay commented Jan 16, 2023

Loki's documentation recommends using KEDA for the querier to configure autoscaling based on Prometheus metrics. In addition, the default HPA scaling behaviour is too aggressive for Loki's components. I recommend adding a configurable scaling behaviour to the values and templates to make deployments more stable and flexible. Here are some examples I wrote:

values.yaml:

querier:
  autoscaling:
    scaler: native # native or keda
    behavior: {}
    # Configure KEDA Prometheus trigger.
    # See also: https://keda.sh/docs/latest/scalers/prometheus/
    targetMetricsConfigure:
      query: sum(max_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[2m]))
      serverAddress: http://prometheus.default:9090/prometheus
      threshold: 4
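For the `behavior` field, a user could then supply standard HPA v2 behaviour policies. The following is only an illustrative sketch (the concrete numbers are assumptions, not a recommendation from the Loki docs) showing how scale-down could be slowed while keeping scale-up responsive:

```yaml
querier:
  autoscaling:
    behavior:
      scaleDown:
        # Wait 10 minutes of sustained low load before removing pods.
        stabilizationWindowSeconds: 600
        policies:
          - type: Pods
            value: 1
            periodSeconds: 180
      scaleUp:
        stabilizationWindowSeconds: 60
        policies:
          - type: Percent
            value: 50
            periodSeconds: 60
```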

templates:

# hpa.yaml
{{- if .Values.querier.autoscaling.enabled }}
{{- if eq .Values.querier.autoscaling.scaler "native" }}
{{- $apiVersion := include "loki.hpa.apiVersion" . -}}
apiVersion: {{ $apiVersion }}
kind: HorizontalPodAutoscaler
# ...
spec:
# ...
  {{- if (eq $apiVersion "autoscaling/v2") }}
  {{- with .Values.querier.autoscaling.behavior }}
  behavior:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  {{- end }}
{{- else if eq .Values.querier.autoscaling.scaler "keda" }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
# ...
spec:
# ...
  {{- with .Values.querier.autoscaling.behavior }}
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        {{- toYaml . | nindent 8 }}
  {{- end }}
  triggers:
    {{- with .Values.querier.autoscaling.targetCPUUtilizationPercentage }}
    - type: cpu
      metricType: Utilization
      metadata:
        value: {{ . | quote }}
    {{- end }}
    # ...
    {{- with .Values.querier.autoscaling.targetMetricsConfigure }}
    - type: prometheus
      metadata:
        metricName: querier_autoscaling_metric
        query: {{ .query }}
        serverAddress: {{ .serverAddress }}
        # KEDA metadata values must be strings, so quote the threshold.
        threshold: {{ .threshold | quote }}
    {{- end }}
{{- end }}
{{- end }}

Questions are welcome.

KEDA document: https://keda.sh/docs/latest/concepts/scaling-deployments/
K8S HPA document: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#default-behavior
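For reference, with the example values above the KEDA branch of the template would render roughly the following `ScaledObject`. This is only a sketch: the resource name and the `minReplicaCount`/`maxReplicaCount` values are assumptions for illustration, not output of the actual chart.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: loki-querier
spec:
  scaleTargetRef:
    name: loki-querier
  minReplicaCount: 2
  maxReplicaCount: 6
  triggers:
    - type: prometheus
      metadata:
        metricName: querier_autoscaling_metric
        query: sum(max_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[2m]))
        serverAddress: http://prometheus.default:9090/prometheus
        # Metadata values are strings, so the threshold is quoted.
        threshold: "4"
```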

@jfusterm

We had exactly the same issue.

We wanted Loki to scale up and down more steadily by tuning both the behavior.scaleUp and behavior.scaleDown policies, but we couldn't with the provided HPA resources, so we rolled out our own manifests on top of the chart.

One of the problems we had is that unless we enable the HPA with autoscaling.enabled: true (which we don't want to do, given that we use our own HPA manifests), we can't avoid setting the replicas of each component:

spec:
{{- if not .Values.distributor.autoscaling.enabled }}
  replicas: {{ .Values.distributor.replicas }}
{{- end }}
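One possible way to decouple this would be an extra values flag that suppresses the replicas field without enabling the chart's own HPA. A sketch (the useExternalAutoscaler flag is hypothetical, not part of the chart):

```yaml
spec:
{{- if not (or .Values.distributor.autoscaling.enabled .Values.distributor.useExternalAutoscaler) }}
  replicas: {{ .Values.distributor.replicas }}
{{- end }}
```

With such a flag set, the rendered Deployment would omit spec.replicas entirely, leaving replica management to whatever external autoscaler owns it.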

That's a problem when using a GitOps operator like Argo CD, because once the HPA tries to scale, Argo CD will reconcile the state, setting whatever value is in the replicas option and preventing any scale-up.

We solved it by ignoring that field in Argo CD, but it would be nice to be able to use custom HPA configurations or KEDA objects and still avoid defining the replicas field in the templates:

    ignoreDifferences:
      - group: apps
        kind: Deployment
        name: loki-distributor
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: StatefulSet
        name: loki-ingester
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: Deployment
        name: loki-querier
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: Deployment
        name: loki-query-frontend
        namespace: loki
        jsonPointers:
          - /spec/replicas
    syncPolicy:
      syncOptions:
        - RespectIgnoreDifferences=true

gritzkoo added a commit to gritzkoo/grafana-helm-charts that referenced this issue Mar 31, 2024
- grafana#2558
- grafana#2493
- grafana#1391
- grafana#2126

Signed-off-by: Gritzko Daniel Kleiner <ext.gritzko.kleiner@dafiti.com.br>