Question about HPA trigger to scale #1

vsoch · 2023-05-31T20:22:46Z

Hi! I found your slides here https://indico.cern.ch/event/968726/contributions/4118126/attachments/2153775/3632238/k8s-HEP_tedeschi.pdf and was hoping you might have some insight into (what I think is) a missing step. I'm working on similar scaling functionality for our Flux Framework operator (in Kubernetes). I have a metrics server that outputs a metric for node CPUs, plus the APIService and an HPA that queries it. What I don't understand is the final step: how the HPA knows to act on a metric to tell a pod to, for example, scale up or down. This works fine for a CPU Resource:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flux-sample-hpa
  namespace: flux-operator
spec:
  scaleTargetRef:
    apiVersion: flux-framework.org/v1alpha1
    kind: MiniCluster
    name: flux-sample
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      # This is explicitly set to be very low so it triggers
      target:
        type: Utilization
        averageUtilization: 2

But when I use a custom metric (provided by the server) I'm not sure how the custom metric actually advises the autoscaler:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flux-sample-hpa
  namespace: flux-operator
spec:
  scaleTargetRef:
    apiVersion: flux-framework.org/v1alpha1
    kind: MiniCluster
    name: flux-sample
  minReplicas: 2
  maxReplicas: 4
  metrics:

  # https://docs.openshift.com/container-platform/4.11/rest_api/autoscale_apis/horizontalpodautoscaler-autoscaling-v2.html#spec-metrics-object
  - type: Object
    object:
      # This is the service we created
      target:
        value: 4
        type: "Value"

      # Where to get the data from
      describedObject:
        kind: Service
        name: custom-metrics-apiserver

      # This should scale until we hit 4
      metric:
        name: node_up_count

  # Behavior determines how to do the scaling
  # https://www.kloia.com/blog/advanced-hpa-in-kubernetes
  behavior:

    # selectPolicy chooses among the policies below: "Max", "Min", or "Disabled"
    scaleUp:
      selectPolicy: Max
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 4
        periodSeconds: 60

    scaleDown:
      selectPolicy: Max
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 4
        periodSeconds: 60

I thought it was something to do with adding behavior, but looking in your PDF, it seems like there is some other rule that needs to be created? This part:

I'm still trying to figure out where that goes. In my case, my exporter is providing the service for the HPA directly, and I see that you are using the Prometheus adapter to convert exported metrics to some standard format with a rule? Do you know if there is documentation somewhere about how the adapter provides metrics and how they trigger scaling? Right now mine look like this:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta2/namespaces/flux-operator/services/custom-metrics-service/node_up_count | jq
{
  "items": [
    {
      "metric": {
        "name": "node_up_count"
      },
      "value": 2,
      "timestamp": "2023-05-31T20:21:58+00:00",
      "windowSeconds": 0,
      "describedObject": {
        "kind": "Service",
        "namespace": "flux-operator",
        "name": "custom-metrics-apiserver",
        "apiVersion": "v1beta2"
      }
    }
  ],
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "kind": "MetricValueList",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta2"
  }
}
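
For anyone following along, here is a minimal sketch of pulling out the number the HPA would compare against the target, using a trimmed copy of the response above. The helper name `metric_value` is hypothetical, and the sketch assumes the value is a plain number as shown here; other servers may return Kubernetes quantity strings such as "500m", which this does not parse.

```python
import json

# Trimmed copy of the MetricValueList response shown above.
RESPONSE = """
{
  "items": [
    {
      "metric": {"name": "node_up_count"},
      "value": 2,
      "describedObject": {"kind": "Service", "name": "custom-metrics-apiserver"}
    }
  ],
  "apiVersion": "custom.metrics.k8s.io/v1beta2",
  "kind": "MetricValueList"
}
"""


def metric_value(payload, name):
    """Return the value of the first item whose metric name matches (hypothetical helper)."""
    for item in payload.get("items", []):
        if item["metric"]["name"] == name:
            # Assumes a plain numeric value, not a quantity string like "500m".
            return float(item["value"])
    raise KeyError(f"metric {name!r} not found in response")


print(metric_value(json.loads(RESPONSE), "node_up_count"))  # 2.0
```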

Thanks for any advice or pointers you can provide! I'm new at this so apologies if any of these questions are stupid.

vsoch · 2023-05-31T20:39:17Z

ah so I think I figured it out. What helped was looking at the autoscaler status to see what was going on:

$ kubectl get hpa -n flux-operator flux-sample-hpa -o json | jq .status.conditions
[
  {
    "lastTransitionTime": "2023-05-31T19:50:20Z",
    "message": "recommended size matches current size",
    "reason": "ReadyForNewScale",
    "status": "True",
    "type": "AbleToScale"
  },
  {
    "lastTransitionTime": "2023-05-31T19:52:35Z",
    "message": "the HPA was able to successfully calculate a replica count from Service metric node_up_count",
    "reason": "ValidMetricFound",
    "status": "True",
    "type": "ScalingActive"
  },
  {
    "lastTransitionTime": "2023-05-31T20:30:54Z",
    "message": "the desired count is within the acceptable range",
    "reason": "DesiredWithinRange",
    "status": "False",
    "type": "ScalingLimited"
  }
]
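
As a rough illustration of what "calculate a replica count" means here, the Kubernetes documentation gives the core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with the result kept between minReplicas and maxReplicas. The sketch below is a simplification under stated assumptions: the function name is hypothetical, the 0.1 tolerance is the documented default, and pod readiness, stabilization windows, and behavior policies are ignored.

```python
import math


def desired_replicas(current_replicas, metric_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA recommendation for a single 'Value' target (simplified)."""
    ratio = metric_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current size.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The recommendation is always clamped to the HPA bounds.
    return max(min_replicas, min(max_replicas, desired))


# Values from this thread: 2 replicas, node_up_count = 2, target value 4.
print(desired_replicas(2, 2, 4, min_replicas=2, max_replicas=4))  # 2
# A metric above the target pushes the recommendation up.
print(desired_replicas(2, 8, 4, min_replicas=2, max_replicas=4))  # 4
```

With the numbers shown earlier (metric 2, target 4), the raw result is 1, which is clamped to minReplicas 2; that is consistent with the "recommended size matches current size" message. Under this rule the recommendation only grows when the metric is above the target.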

And then I tweaked that config above:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flux-sample-hpa
  namespace: flux-operator
spec:
  scaleTargetRef:
    apiVersion: flux-framework.org/v1alpha1
    kind: MiniCluster
    name: flux-sample
  minReplicas: 2
  maxReplicas: 4
  metrics:

  # https://docs.openshift.com/container-platform/4.11/rest_api/autoscale_apis/horizontalpodautoscaler-autoscaling-v2.html#spec-metrics-object
  - type: Object
    object:
      # This is the service we created
      target:
        value: 4
        type: "Value"

      # Where to get the data from
      describedObject:
        kind: Service
        name: custom-metrics-apiserver

      # This should scale until we hit 4
      metric:
        name: node_up_count

  # Behavior determines how to do the scaling
  # Without this, nothing would happen
  # https://www.kloia.com/blog/advanced-hpa-in-kubernetes
  behavior:

    # selectPolicy chooses among the policies below: "Max", "Min", or "Disabled"
    scaleUp:
      selectPolicy: Max
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60

    scaleDown:
      selectPolicy: Max
      stabilizationWindowSeconds: 120
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
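
To make the scaleUp policies more concrete, here is a minimal sketch of how they cap the replica count within one period, based on my understanding of the upstream controller: a Pods policy allows adding `value` pods, a Percent policy allows growing by `value` percent (rounded up), and selectPolicy picks the largest or smallest of those proposals. The function name and dict layout are hypothetical, and stabilization windows and scaleDown are left out.

```python
import math


def scale_up_limit(period_start_replicas, policies, select_policy="Max"):
    """Upper bound on replicas that scaleUp policies allow within one period (simplified)."""
    if select_policy == "Disabled":
        # Scaling in this direction is turned off entirely.
        return period_start_replicas
    proposals = []
    for policy in policies:
        if policy["type"] == "Pods":
            proposals.append(period_start_replicas + policy["value"])
        elif policy["type"] == "Percent":
            growth = 1 + policy["value"] / 100
            proposals.append(math.ceil(period_start_replicas * growth))
    return max(proposals) if select_policy == "Max" else min(proposals)


# The tweaked config above: a single Percent policy of 100, starting from 2 pods.
tweaked = [{"type": "Percent", "value": 100}]
print(scale_up_limit(2, tweaked))  # 4
```

This cap only limits how fast the HPA may move toward its recommendation; the recommendation itself still comes from the metric and target, and the final count is clamped to maxReplicas.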

And then finally I saw a response (note I started at 2 pods in the cluster!)

$ kubectl get -n flux-operator pods
NAME                  READY   STATUS    RESTARTS   AGE
flux-sample-0-kg8mq   1/1     Running   0          42m
flux-sample-1-dntwk   1/1     Running   0          42m
flux-sample-2-p8vhn   1/1     Running   0          2m3s
flux-sample-3-pvg6l   1/1     Running   0          2m3s

Feel free to close, but thank you in advance if there is any cool discussion! Sorry for the noise otherwise.

madestro · 2023-06-07T15:06:33Z

glad to help ;-)

madestro closed this as completed Jun 7, 2023
