Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPA on custom metrics #3134

Closed
josephburnett opened this issue Feb 7, 2019 · 18 comments · Fixed by #12277
Closed

HPA on custom metrics #3134

josephburnett opened this issue Feb 7, 2019 · 18 comments · Fixed by #12277
Labels
area/autoscale kind/feature Well-understood/specified features, ready for coding. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.

Comments

@josephburnett
Copy link
Contributor

Proposal

HPA-class PodAutoscalers should be able to autoscale on any custom metric emitted from the user-container. For example, if the user serves JVM memory usage on a Promethus endpoint from their application container, they should be able to wire that up to a Kubernetes HPA with custom metrics.

For example, given this Service, Knative would create an HPA which points to the "average.memory.heartshapedbox.com" metric.

apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: love
spec:
  runLatest:
    configuration:
      revisionTemplate:
        metadata:
          annotations:
            autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
            autoscaling.knative.dev/metric: "custom"
            autoscaling.knative.dev/customMetric: "average.memory.heartshapedbox.com"
        spec:
          container:
            image: gcr.io/joe-does-knative/love:latest

Non-requirements

Knative would not configure Prometheus to scrape the right metrics from the right path. That would be left to the operator to setup. Re-configuring a custom metrics server implementation on the fly is out of scope for what Knative autoscaler should do out-of-the-box. This plumbing is just to unlock a full custom metrics solution if necessary.

Note: #3132 does propose configuring Prometheus to scrape a metric off the queue-proxy. But since it's our metric and well known, it doesn't require reconfiguration.

@knative-prow-robot knative-prow-robot added area/autoscale kind/feature Well-understood/specified features, ready for coding. labels Feb 7, 2019
@josephburnett josephburnett added this to To do in Scaling: Pluggability and HPA via automation Feb 7, 2019
@mattmoor mattmoor added this to the Ice Box milestone Feb 11, 2019
@josephburnett
Copy link
Contributor Author

TODO:

  1. upgrade Knative to autoscaling/v2beta2 for custom metrics
  2. point the HPA to a custom metric if found in the PodAutoscaler annotations in the makeHpa method.

@josephburnett
Copy link
Contributor Author

Note: you'll put the actual annotation key strings in register.go.

@josephburnett
Copy link
Contributor Author

@kevinswiber is working on this.

@kevinswiber
Copy link

kevinswiber commented Apr 15, 2019

Based on the latest autoscaling API version supported by GKE, I'm upgrading to autoscaling/v2beta1, which is similar though not identical.

To match the fields required to create an HPA resource, I propose using the following annotations.

autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
autoscaling.knative.dev/metric: "custom"
autoscaling.knative.dev/metricSourceType: "resource" # or... pods, object, external
autoscaling.knative.dev/metricName: "average.memory.heartshapedbox.com"
autoscaling.knative.dev/targetAverageUtilization: "50" # valid on resource type
autoscaling.knative.dev/targetAverageValue: 400m # valid on object, resource, pods, external types
autoscaling.knative.dev/targetValue: 800m # valid on object, external types

autoscaling.knative.dev/targetName: main-route # valid on object types
autoscaling.knative.dev/targetKind: Ingress # valid on object types
autoscaling.knative.dev/targetAPIVersion: extensions/v1beta1 # valid on object types

Reference: https://godoc.org/k8s.io/api/autoscaling/v2beta1#MetricSpec

This does add a non-trivial amount of complexity to this feature. Happy to receive feedback.

@markusthoemmes
Copy link
Contributor

Thanks for taking this on @kevinswiber.

In the last Autoscaling WG we had a discussion on the API surface of our autoscaling configuration. The bottomline was a question-mark to whether or not it is a good idea to offload the whole API surface into what is essentially a map[string]string. It'll be a lot to document, hard to get right (typos...) and hard to validate.

In this case it also becomes almost easier to leave the HPA spec alone and tell the user to manipulate that herself vs. configuring it through annotations and reconciliation of the revision.

@josephburnett What's your take here? Should we think about a new API surface as part of this?

@kevinswiber
Copy link

@markusthoemmes Thanks for your input. This all makes sense. The set-it-and-forget-it flow of just deploying a serving.knative.dev/v1alpha1/Service is nice, but I agree that this is a copy/paste of hefty config that's owned--and will always be owned--by the Kubernetes autoscaling API.

I'm not sure of the simplest solution. I think we'd also want the PodAutoscaler to retain ownership over the HPA. It might be possible for the user to add an annotation to a manually created HPA and have a reconciler auto-associate the Knative PodAutoscaler resource with the HPA. At that point, Knative could take control to include useful functionality like cascading deletes.

@kevinswiber
Copy link

FYI: I have the above proposal mostly working. I hesitate to update docs until we get through the design-prototyping cycle. I can put up a WIP PR if you like or hold off until design is solidified.

@josephburnett
Copy link
Contributor Author

I don't think we need all those annotations. We can focus on pod metrics and leave object metrics unimplemented. With that limitation, we would need the following annotations:

autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev" # preexisting
autoscaling.knative.dev/metric: "custom" # preexisting
autoscaling.knative.dev/target: 400m # preexisting, translates to averageValue
autoscaling.knative.dev/metricName: "average.memory.heartshapedbox.com" # NEW

This adds only one new annotation, the metricName. If someone needs access to the full HPA spec, they can implement a custom controller.

@markusthoemmes @kevinswiber what do you think?

@kevinswiber
Copy link

@josephburnett This is fine. I'll have to add some defaults for the other fields. I'm happy to be involved in the discussion for the next round of design on this, as well. Piecing together a PR today.

@vagababov
Copy link
Contributor

@markusthoemmes , does the work you did for HPA cover this?

@markusthoemmes
Copy link
Contributor

@vagababov it does not, no.

@vagababov
Copy link
Contributor

@kevinswiber I presume you're no longer working on this?

@gadelkareem
Copy link

Is it already possible to disable the custom-metrics-server and switch the class to https://github.com/kubernetes-sigs/metrics-server ?

@knative-housekeeping-robot

Issues go stale after 90 days of inactivity.
Mark the issue as fresh by adding the comment /remove-lifecycle stale.
Stale issues rot after an additional 30 days of inactivity and eventually close.
If this issue is safe to close now please do so by adding the comment /close.

Send feedback to Knative Productivity Slack channel or file an issue in knative/test-infra.

/lifecycle stale

@knative-prow-robot knative-prow-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 29, 2020
@markusthoemmes
Copy link
Contributor

I'm leaning towards closing this as we haven't talked about it in ages and nobody asked about it in ages. Feel free to reopen if you disagree.

Scaling: Pluggability and HPA automation moved this from To do to Done May 27, 2020
@siddharth-mitra
Copy link

siddharth-mitra commented Jul 18, 2021

Has this feature been incorporated? I am trying to use it the definition of a kfserving inference service and the following is the error I get.

image

@julz
Copy link
Member

julz commented Jul 19, 2021

Hi @siddharth-mitra - I don't believe this feature was ever implemented (the issue was closed in May 2020 since there weren't many requests for the feature). The HPA autoscaler class in knative currently only supports "cpu" (and, when #11668 lands, "memory") as a metric

@siddharth-mitra
Copy link

siddharth-mitra commented Jul 22, 2021

Hi @julz - Thank you for letting me know about the same.
Do you know of any workaround I could use to configure custom metrics to the HPA in the Knative environment?

@dprotaso dprotaso removed this from the Ice Box milestone Oct 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/autoscale kind/feature Well-understood/specified features, ready for coding. lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale.
Projects
No open projects