Improvements and structure changes for autoscaling docs #2439
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,18 @@ | ||
| One of the main features of Knative is automatic scaling of replicas for an application to closely match incoming demand, including scaling applications to zero if no traffic is being received. | ||
| Knative Serving enables this by default, using the Knative Pod Autoscaler (KPA). | ||
| The Autoscaler component watches traffic flow to the application, and scales replicas up or down based on configured metrics. | ||
|
|
||
| Knative services default to autoscaling settings that are suitable for the majority of use cases. However, some workloads may require a custom, more finely tuned configuration. | ||
| This guide provides information about configuration options that you can modify to fit the requirements of your workload. | ||
|
|
||
| For more information about how autoscaling for Knative works, see the [Autoscaling concepts](./autoscaling-concepts.md) documentation. | ||
|
|
||
| For more information about which metrics can be used to control the Autoscaler, see the [metrics](./autoscaling-metrics.md) documentation. | ||
|
|
||
| ## Optional autoscaling configuration tasks | ||
|
|
||
| * Configure your Knative deployment to use the Kubernetes [Horizontal Pod Autoscaler (HPA)](../../install/any-kubernetes-cluster.md#optional-serving-extensions) instead of the default KPA. | ||
| * Disable scale to zero functionality for your cluster ([global configuration only](./scale-to-zero.md)). | ||
| * Configure the [type of metrics](./autoscaling-metrics.md) your Autoscaler consumes. | ||
| * Configure [concurrency limits](./concurrency.md) for applications. | ||
| * Try out the [Go Autoscale Sample App](./autoscale-go/README.md). | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| title: "Autoscaling" | ||
| linkTitle: "Autoscaling" | ||
| weight: 20 | ||
| type: "docs" | ||
| aliases: | ||
| - /docs/serving/configuring-autoscaling/ | ||
| --- | ||
|
|
||
| {{% readfile file="README.md" %}} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| --- | ||
| title: "Autoscale Sample App - Go" | ||
| linkTitle: "Autoscale Sample App - Go" | ||
| weight: 100 | ||
| type: "docs" | ||
| aliases: | ||
| - /docs/serving/samples/autoscale-go | ||
| --- | ||
|
|
||
| {{% readfile file="README.md" %}} | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,135 @@ | ||
| --- | ||
| title: "Autoscaling concepts" | ||
| linkTitle: "Autoscaling concepts" | ||
| weight: 01 | ||
| type: "docs" | ||
| --- | ||
|
|
||
| This section covers conceptual information about which Autoscaler types are supported, as well as fundamental information about how autoscaling is configured. | ||
|
|
||
| ## Supported Autoscaler types | ||
|
|
||
| Knative Serving supports two Autoscaler implementations: the Knative Pod Autoscaler (KPA) and the Kubernetes Horizontal Pod Autoscaler (HPA). The features and limitations of each of these Autoscalers are listed below. | ||
|
|
||
| **IMPORTANT:** If you want to use Kubernetes Horizontal Pod Autoscaler (HPA), you must install it after you install [Knative Serving](../../install/any-kubernetes-cluster.md#optional-serving-extensions). | ||
|
|
||
| ### Knative Pod Autoscaler (KPA) | ||
|
|
||
| * Part of the Knative Serving core and enabled by default once Knative Serving is installed. | ||
| * Supports scale to zero functionality. | ||
| * Does not support CPU-based autoscaling. | ||
|
|
||
| ### Horizontal Pod Autoscaler (HPA) | ||
|
|
||
| * Not part of the Knative Serving core, and must be enabled after [Knative Serving installation](../../install/any-kubernetes-cluster.md#optional-serving-extensions). | ||
| * Does not support scale to zero functionality. | ||
| * Supports CPU-based autoscaling. | ||
|
|
||
| ### Configuring the Autoscaler implementation | ||
|
|
||
| The type of Autoscaler implementation (KPA or HPA) can be configured by using the `class` annotation. | ||
|
|
||
| * **Global settings key:** `pod-autoscaler-class` | ||
| * **Per-revision annotation key:** `autoscaling.knative.dev/class` | ||
| * **Possible values:** `"kpa.autoscaling.knative.dev"` or `"hpa.autoscaling.knative.dev"` | ||
| * **Default:** `"kpa.autoscaling.knative.dev"` | ||
|
|
||
| **Example:** | ||
| {{< tabs name="class" default="Per Revision" >}} | ||
| {{% tab name="Per Revision" %}} | ||
| ```yaml | ||
| apiVersion: serving.knative.dev/v1 | ||
| kind: Service | ||
| metadata: | ||
|   name: helloworld-go | ||
|   namespace: default | ||
| spec: | ||
|   template: | ||
|     metadata: | ||
|       annotations: | ||
|         autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev" | ||
|     spec: | ||
|       containers: | ||
|         - image: gcr.io/knative-samples/helloworld-go | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Global (ConfigMap)" %}} | ||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
|   name: config-autoscaler | ||
|   namespace: knative-serving | ||
| data: | ||
|   pod-autoscaler-class: "kpa.autoscaling.knative.dev" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Global (Operator)" %}} | ||
| ```yaml | ||
| apiVersion: operator.knative.dev/v1alpha1 | ||
| kind: KnativeServing | ||
| metadata: | ||
|   name: knative-serving | ||
| spec: | ||
|   config: | ||
|     autoscaler: | ||
|       pod-autoscaler-class: "kpa.autoscaling.knative.dev" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{< /tabs >}} | ||
|
|
||
| ## Global versus per-revision settings | ||
|
Contributor
This should probably go above the class, since it affects the choices made there as well.
Member
I think this fits here as well -- it provides more detail, but if users are happy with the first choice they found, it should work.
Contributor
Author
I think this is why we need to be able to reuse / include content - so we can have a blurb about something in more than one place in the docs but maintain it from one file. |
||
|
|
||
| Autoscaling in Knative can be configured using either global or per-revision settings. | ||
|
|
||
| 1. If no per-revision autoscaling settings are specified, the global settings are used. | ||
| 1. If per-revision settings are specified, they override the global settings. | ||
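
The precedence rule above can be sketched with two resources side by side. This is an illustrative fragment only, assuming the default `concurrency` metric: it reuses the `config-autoscaler` key and the `helloworld-go` sample from this guide, the values are examples, and the other ConfigMap keys are omitted for brevity.

```yaml
# Global default: used by any revision that does not set its own target.
# (Fragment: the remaining config-autoscaler keys are omitted here.)
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "100"
---
# Per-revision override: revisions created from this template use a
# target of 70 instead of the global default of 100.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "70"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```

Removing the annotation from the template would make new revisions fall back to the global value.
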
|
|
||
| ### Global settings | ||
|
|
||
| Global settings for autoscaling are configured using the `config-autoscaler` ConfigMap. If you installed Knative Serving using the Operator, you can set the same global settings under the `spec.config.autoscaler` field of the `KnativeServing` custom resource (CR). | ||
|
|
||
| #### Example of the default autoscaling ConfigMap | ||
|
|
||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
|   name: config-autoscaler | ||
|   namespace: knative-serving | ||
| data: | ||
|   container-concurrency-target-default: "100" | ||
|   container-concurrency-target-percentage: "0.7" | ||
|   enable-scale-to-zero: "true" | ||
|   max-scale-up-rate: "1000" | ||
|   max-scale-down-rate: "2" | ||
|   panic-window-percentage: "10" | ||
|   panic-threshold-percentage: "200" | ||
|   scale-to-zero-grace-period: "30s" | ||
|   scale-to-zero-pod-retention-period: "0s" | ||
|   stable-window: "60s" | ||
|   target-burst-capacity: "200" | ||
|   requests-per-second-target-default: "200" | ||
| ``` | ||
|
|
||
| ### Per-revision settings | ||
|
|
||
| Per-revision settings for autoscaling are configured by adding _annotations_ to a revision. | ||
|
Member
Are the annotation keys the same ones as are present in the ConfigMap?
Contributor
Author
Nope, they're different afaik, it's explained in the section for each key. |
||
|
|
||
| #### Example of a per-revision annotation | ||
|
|
||
| ```yaml | ||
| apiVersion: serving.knative.dev/v1 | ||
| kind: Service | ||
| metadata: | ||
|   name: helloworld-go | ||
|   namespace: default | ||
| spec: | ||
|   template: | ||
|     metadata: | ||
|       annotations: | ||
|         autoscaling.knative.dev/target: "70" | ||
| ``` | ||
|
|
||
| **IMPORTANT:** If you are creating revisions by using a service or configuration, you must set the annotations in the _revision template_ so that any modifications are applied to each revision as it is created. | ||
| Setting annotations in the top-level metadata of a single revision does not propagate the changes to other revisions and does not change the autoscaling configuration for your application. | ||
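
As a sketch of the note above, the fragment below marks where the autoscaling annotation is expected to go when revisions are created from a service. It reuses the `helloworld-go` sample from this guide; the comments are an interpretation of the note, not an authoritative description of how annotations propagate.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
  # Not here: autoscaling annotations do not belong in the top-level metadata.
spec:
  template:
    metadata:
      annotations:
        # Here: the revision template, so that every revision created
        # from this service picks up the setting.
        autoscaling.knative.dev/target: "70"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```
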
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| --- | ||
| title: "Metrics" | ||
| linkTitle: "Metrics" | ||
| weight: 03 | ||
| type: "docs" | ||
| --- | ||
|
|
||
| The metric configuration defines which metric type is watched by the Autoscaler. | ||
|
|
||
| ## Setting metrics per revision | ||
|
|
||
| For [per-revision](./autoscaling-concepts.md) configuration, the metric type is set using the `autoscaling.knative.dev/metric` annotation. | ||
| The possible metric types that can be configured per revision depend on the type of Autoscaler implementation you are using: | ||
|
|
||
| * The default KPA Autoscaler supports the `concurrency` and `rps` metrics. | ||
| * The HPA Autoscaler supports the `concurrency`, `rps` and `cpu` metrics. | ||
|
Member
Can you describe what these options mean?
Member
In particular, how concurrency and RPS differ. Feel free to defer with a
Contributor
Author
Added a TODO note :) I think I had pretty much the same note previously and removed it because I knew I wouldn't get to it 😂
Member
A TODO lets someone else pick up the item if you don't get back to it. 😁 |
||
|
|
||
| <!-- TODO: Add details about different metrics types, how concurrency and rps differ. Explain cpu. --> | ||
|
|
||
| For more information about KPA and HPA, see the documentation on [Supported Autoscaler types](./autoscaling-concepts.md). | ||
|
|
||
| * **Per-revision annotation key:** `autoscaling.knative.dev/metric` | ||
| * **Possible values:** `"concurrency"`, `"rps"` or `"cpu"`, depending on your Autoscaler type. The `cpu` metric is only supported on revisions with the HPA class. | ||
| * **Default:** `"concurrency"` | ||
|
|
||
| {{< tabs name="Examples of configuring metric types per revision" default="Per-revision concurrency configuration" >}} | ||
| {{% tab name="Per-revision concurrency configuration" %}} | ||
| ```yaml | ||
| apiVersion: serving.knative.dev/v1 | ||
| kind: Service | ||
| metadata: | ||
|   name: helloworld-go | ||
|   namespace: default | ||
| spec: | ||
|   template: | ||
|     metadata: | ||
|       annotations: | ||
|         autoscaling.knative.dev/metric: "concurrency" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Per-revision rps configuration" %}} | ||
| ```yaml | ||
| apiVersion: serving.knative.dev/v1 | ||
| kind: Service | ||
| metadata: | ||
|   name: helloworld-go | ||
|   namespace: default | ||
| spec: | ||
|   template: | ||
|     metadata: | ||
|       annotations: | ||
|         autoscaling.knative.dev/metric: "rps" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Per-revision cpu configuration" %}} | ||
| ```yaml | ||
| apiVersion: serving.knative.dev/v1 | ||
| kind: Service | ||
| metadata: | ||
|   name: helloworld-go | ||
|   namespace: default | ||
| spec: | ||
|   template: | ||
|     metadata: | ||
|       annotations: | ||
|         autoscaling.knative.dev/metric: "cpu" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{< /tabs >}} | ||
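
Because the `cpu` metric is only supported on revisions with the HPA class, and the default class is KPA, a `cpu` configuration would normally also set the `class` annotation. The following is a minimal sketch that combines the two annotations described in this guide; it assumes the HPA extension is already installed and reuses the `helloworld-go` sample.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Select the HPA implementation, since the default KPA class
        # does not support CPU-based autoscaling.
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "cpu"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```
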
|
|
||
| ## Next steps | ||
|
|
||
| * Configure [concurrency targets](./concurrency.md) for applications | ||
| * Configure [requests per second targets](./rps-target.md) for replicas of an application | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| --- | ||
| title: "Targets" | ||
| linkTitle: "Targets" | ||
| weight: 04 | ||
| type: "docs" | ||
| --- | ||
|
|
||
| Configuring a target provides the Autoscaler with a value that it tries to maintain for the configured metric for a revision. | ||
| See the [metrics](./autoscaling-metrics.md) documentation for more information about configurable metric types. | ||
|
|
||
| The `target` annotation, used to configure per-revision targets, is _metric agnostic_. This means the target is simply an integer value, which can be applied for any metric type. | ||
|
|
||
| ## Configuring targets | ||
|
|
||
| * **Global settings key:** `container-concurrency-target-default` for setting a concurrency target, and `requests-per-second-target-default` for setting a requests-per-second (RPS) target. For more information, see the documentation on [metrics](./autoscaling-metrics.md). | ||
| * **Per-revision annotation key:** `autoscaling.knative.dev/target` | ||
| * **Possible values:** An integer (metric agnostic). | ||
| * **Default:** `"100"` for `container-concurrency-target-default`, and `"200"` for `requests-per-second-target-default`. There is no default value set for the `target` annotation. | ||
|
|
||
| {{< tabs name="Configuring targets" default="Target annotation - Per-revision" >}} | ||
| {{% tab name="Target annotation - Per-revision" %}} | ||
| ```yaml | ||
| apiVersion: serving.knative.dev/v1 | ||
| kind: Service | ||
| metadata: | ||
|   name: helloworld-go | ||
|   namespace: default | ||
| spec: | ||
|   template: | ||
|     metadata: | ||
|       annotations: | ||
|         autoscaling.knative.dev/target: "50" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Concurrency target - Global (ConfigMap)" %}} | ||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
|   name: config-autoscaler | ||
|   namespace: knative-serving | ||
| data: | ||
|   container-concurrency-target-default: "200" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Concurrency target - Global (Operator)" %}} | ||
| ```yaml | ||
| apiVersion: operator.knative.dev/v1alpha1 | ||
| kind: KnativeServing | ||
| metadata: | ||
|   name: knative-serving | ||
| spec: | ||
|   config: | ||
|     autoscaler: | ||
|       container-concurrency-target-default: "200" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Requests per second (RPS) target - Global (ConfigMap)" %}} | ||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
|   name: config-autoscaler | ||
|   namespace: knative-serving | ||
| data: | ||
|   requests-per-second-target-default: "150" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{% tab name="Requests per second (RPS) target - Global (Operator)" %}} | ||
| ```yaml | ||
| apiVersion: operator.knative.dev/v1alpha1 | ||
| kind: KnativeServing | ||
| metadata: | ||
|   name: knative-serving | ||
| spec: | ||
|   config: | ||
|     autoscaler: | ||
|       requests-per-second-target-default: "150" | ||
| ``` | ||
| {{< /tab >}} | ||
| {{< /tabs >}} |
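
Since the `target` annotation is metric agnostic, the number only has a meaning once it is paired with a metric. The fragment below is a sketch of that pairing, using the `metric` and `target` annotations from this guide with an example value: it asks the Autoscaler to aim for 150 requests per second for revisions of the `helloworld-go` sample. Leaving the `metric` annotation out would make the same number a concurrency target, because `concurrency` is the default metric.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # The metric decides how the target value is interpreted.
        autoscaling.knative.dev/metric: "rps"
        autoscaling.knative.dev/target: "150"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```
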
Do you see doing a similar move for other samples in the future (e.g. traffic splitting samples into the networking subsection)?
Yep, I'd like to do this with all of them, it just makes more sense to me as a place where someone would look for it. WDYT?
I don't think we've discussed it; I don't have an intrinsic problem with it, but it would be good to have a documented strategy.