
[loki-distributed] Add configurable scaling behaviour and KEDA autoscaler #2126

zaldnoay opened this issue Jan 16, 2023 · 1 comment
zaldnoay commented Jan 16, 2023

Loki's documentation recommends using KEDA for the querier to configure autoscaling based on Prometheus metrics. In addition, the default HPA scaling behaviour is too aggressive for Loki's components. I recommend adding a configurable scaling behaviour to the values and templates to make deployments more stable and flexible. Here are some examples I wrote:

values.yaml:

querier:
  autoscaling:
    scaler: native # native or keda
    behavior: {}
    # Configure KEDA Prometheus trigger.
    # See also: https://keda.sh/docs/latest/scalers/prometheus/
    targetMetricsConfigure:
      query: sum(max_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[2m]))
      serverAddress: http://prometheus.default:9090/prometheus
      threshold: 4
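For the `behavior` field, a user could then supply standard HPA v2 behaviour policies. The following is only an illustrative sketch (the concrete numbers are assumptions, not a recommendation from the Loki docs) showing how scale-down could be slowed while keeping scale-up responsive:

```yaml
querier:
  autoscaling:
    behavior:
      scaleDown:
        # Wait 10 minutes of sustained low load before removing pods.
        stabilizationWindowSeconds: 600
        policies:
          - type: Pods
            value: 1
            periodSeconds: 180
      scaleUp:
        stabilizationWindowSeconds: 60
        policies:
          - type: Percent
            value: 50
            periodSeconds: 60
```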

templates:

# hpa.yaml
{{- if .Values.querier.autoscaling.enabled }}
{{- if eq .Values.querier.autoscaling.scaler "native" }}
{{- $apiVersion := include "loki.hpa.apiVersion" . -}}
apiVersion: {{ $apiVersion }}
kind: HorizontalPodAutoscaler
# ...
spec:
# ...
  {{- if (eq $apiVersion "autoscaling/v2") }}
  {{- with .Values.querier.autoscaling.behavior }}
  behavior:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  {{- end }}
{{- else if eq .Values.querier.autoscaling.scaler "keda" }}
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
# ...
spec:
# ...
  {{- with .Values.querier.autoscaling.behavior }}
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        {{- toYaml . | nindent 8 }}
  {{- end }}
  triggers:
    {{- with .Values.querier.autoscaling.targetCPUUtilizationPercentage }}
    - type: cpu
      metricType: Utilization
      metadata:
        value: {{ . | quote }}
    {{- end }}
    # ...
    {{- with .Values.querier.autoscaling.targetMetricsConfigure }}
    - type: prometheus
      metadata:
        metricName: querier_autoscaling_metric
        query: {{ .query }}
        serverAddress: {{ .serverAddress }}
        # KEDA metadata values must be strings, so quote the threshold.
        threshold: {{ .threshold | quote }}
    {{- end }}
{{- end }}
{{- end }}

Questions are welcome.

KEDA document: https://keda.sh/docs/latest/concepts/scaling-deployments/
K8S HPA document: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#default-behavior
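For reference, with the example values above the KEDA branch of the template would render roughly the following `ScaledObject`. This is only a sketch: the resource name and the `minReplicaCount`/`maxReplicaCount` values are assumptions for illustration, not output of the actual chart.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: loki-querier
spec:
  scaleTargetRef:
    name: loki-querier
  minReplicaCount: 2
  maxReplicaCount: 6
  triggers:
    - type: prometheus
      metadata:
        metricName: querier_autoscaling_metric
        query: sum(max_over_time(cortex_query_scheduler_inflight_requests{namespace="loki-cluster", quantile="0.75"}[2m]))
        serverAddress: http://prometheus.default:9090/prometheus
        # Metadata values are strings, so the threshold is quoted.
        threshold: "4"
```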

@jfusterm

We had exactly the same issue.

We wanted Loki to scale up and down more steadily by tuning both the behavior.scaleUp and behavior.scaleDown policies, but we couldn't with the provided HPA resources, so we rolled out our own manifests on top of the chart.

One of the problems we had is that unless we enable the HPA with autoscaling.enabled: true (which we don't want to do, given that we use our own HPA manifests), we can't avoid setting the replicas of each component:

spec:
{{- if not .Values.distributor.autoscaling.enabled }}
  replicas: {{ .Values.distributor.replicas }}
{{- end }}
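One possible way to decouple this would be an extra values flag that suppresses the replicas field without enabling the chart's own HPA. A sketch (the useExternalAutoscaler flag is hypothetical, not part of the chart):

```yaml
spec:
{{- if not (or .Values.distributor.autoscaling.enabled .Values.distributor.useExternalAutoscaler) }}
  replicas: {{ .Values.distributor.replicas }}
{{- end }}
```

With such a flag set, the rendered Deployment would omit spec.replicas entirely, leaving replica management to whatever external autoscaler owns it.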

That's a problem when using a GitOps operator like Argo CD, because once the HPA tries to scale, Argo CD will reconcile the state, setting whatever value is in the replicas option and preventing any scale-up.

We solved it by ignoring that field in Argo CD, but it would be nice to be able to use custom HPA configurations or KEDA objects and still avoid defining the replicas field in the templates:

    ignoreDifferences:
      - group: apps
        kind: Deployment
        name: loki-distributor
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: StatefulSet
        name: loki-ingester
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: Deployment
        name: loki-querier
        namespace: loki
        jsonPointers:
          - /spec/replicas
      - group: apps
        kind: Deployment
        name: loki-query-frontend
        namespace: loki
        jsonPointers:
          - /spec/replicas
    syncPolicy:
      syncOptions:
        - RespectIgnoreDifferences=true

gritzkoo added a commit to gritzkoo/grafana-helm-charts that referenced this issue Mar 31, 2024
- grafana#2558
- grafana#2493
- grafana#1391
- grafana#2126

Signed-off-by: Gritzko Daniel Kleiner <ext.gritzko.kleiner@dafiti.com.br>