diff --git a/_topic_map.yml b/_topic_map.yml
index 39d099d63b50..96b037a455b4 100644
--- a/_topic_map.yml
+++ b/_topic_map.yml
@@ -3189,9 +3189,6 @@ Topics:
 # Knative services
 - Name: Serverless applications
   File: serverless-applications
-# Autoscaling
-- Name: Configuring Knative Serving autoscaling
-  File: configuring-knative-serving-autoscaling
 - Name: Traffic management
   File: serverless-traffic-management
 - Name: Cluster logging with OpenShift Serverless
@@ -3208,6 +3205,17 @@ Topics:
 - Name: Metrics
   File: serverless-serving-metrics
 #
+# Autoscaling
+- Name: Autoscaling
+  Dir: autoscaling
+  Topics:
+  - Name: About autoscaling
+    File: serverless-autoscaling
+  - Name: Scale bounds
+    File: serverless-autoscaling-scale-bounds
+  - Name: Concurrency
+    File: serverless-autoscaling-concurrency
+#
 # Knative Eventing
 - Name: Knative Eventing
   Dir: knative_eventing
diff --git a/modules/configuring-scale-bounds-knative.adoc b/modules/configuring-scale-bounds-knative.adoc
deleted file mode 100644
index d6a392c91679..000000000000
--- a/modules/configuring-scale-bounds-knative.adoc
+++ /dev/null
@@ -1,32 +0,0 @@
-// Module included in the following assemblies:
-//
-// * serverless/configuring-knative-serving-autoscaling.adoc
-
-[id="configuring-scale-bounds-knative_{context}"]
-= Configuring scale bounds Knative Serving autoscaling
-
-The `minScale` and `maxScale` annotations can be used to configure the minimum and maximum number of pods that can serve applications.
-These annotations can be used to prevent cold starts or to help control computing costs.
-
-minScale:: If the `minScale` annotation is not set, pods will scale to zero (or to 1 if enable-scale-to-zero is false per the `ConfigMap`).
-
-maxScale:: If the `maxScale` annotation is not set, there will be no upper limit for the number of pods created.
-
-`minScale` and `maxScale` can be configured as follows in the revision template:
-
-[source,yaml]
-----
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/minScale: "2"
-        autoscaling.knative.dev/maxScale: "10"
-----
-
-Using these annotations in the revision template will propagate this confguration to `PodAutoscaler` objects.
-
-[NOTE]
-====
-These annotations apply for the full lifetime of a revision. Even when a revision is not referenced by any route, the minimal Pod count specified by `minScale` will still be provided. Keep in mind that non-routeable revisions may be garbage collected, which enables Knative to reclaim the resources.
-====
diff --git a/modules/knative-serving-concurrent-autoscaling-requests.adoc b/modules/knative-serving-concurrent-autoscaling-requests.adoc
deleted file mode 100644
index 9d5462785213..000000000000
--- a/modules/knative-serving-concurrent-autoscaling-requests.adoc
+++ /dev/null
@@ -1,84 +0,0 @@
-// Module included in the following assemblies:
-//
-// * serverless/configuring-knative-serving-autoscaling.adoc
-
-[id="knative-serving-concurrent-autoscaling-requests_{context}"]
-= Configuring concurrent requests for Knative Serving autoscaling
-
-You can specify the number of concurrent requests that should be handled by each instance of a revision container, or application, by adding the `target` annotation or the `containerConcurrency` field in the revision template.
-
-.Example revision template YAML using target annotation
-
-[source,yaml]
-----
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: myapp
-spec:
-  template:
-    metadata:
-      annotations:
-        autoscaling.knative.dev/target: 50
-    spec:
-      containers:
-      - image: myimage
-----
-
-.Example revision template YAML using containerConcurrency annotation
-
-[source,yaml]
-----
-apiVersion: serving.knative.dev/v1
-kind: Service
-metadata:
-  name: myapp
-spec:
-  template:
-    metadata:
-      annotations:
-    spec:
-      containerConcurrency: 100
-      containers:
-      - image: myimage
-----
-
-Adding a value for both `target` and `containerConcurrency` will target the `target` number of concurrent requests, but impose a hard limit of the `containerConcurrency` number of requests.
-
-For example, if the `target` value is 50 and the `containerConcurrency` value is 100, the targeted number of requests will be 50, but the hard limit will be 100.
-
-If the `containerConcurrency` value is less than the `target` value, the `target` value will be tuned down, since there is no need to target more requests than the number that can actually be handled.
-
-[NOTE]
-====
-`containerConcurrency` should only be used if there is a clear need to limit how many requests reach the application at a given time. Using `containerConcurrency` is only advised if the application needs to have an enforced constraint of concurrency.
-====
-
-== Configuring concurrent requests using the target annotation
-
-The default target for the number of concurrent requests is `100`, but you can override this value by adding or modifying the `autoscaling.knative.dev/target` annotation value in the revision template.
-
-Here is an example of how this annotation is used in the revision template to set the target to `50`:
-
-[source,yaml]
-----
-autoscaling.knative.dev/target: 50
-----
-
-== Configuring concurrent requests using the containerConcurrency field
-
-`containerConcurrency` sets a hard limit on the number of concurrent requests handled.
-
-[source,yaml]
-----
-containerConcurrency: 0 | 1 | 2-N
-----
-
-0:: allows unlimited concurrent requests.
-1:: guarantees that only one request is handled at a time by a given instance of the revision container.
-2 or more:: will limit request concurrency to that value.
-
-[NOTE]
-====
-If there is no `target` annotation, autoscaling is configured as if `target` is equal to the value of `containerConcurrency`.
-====
diff --git a/modules/serverless-autoscaling-maxscale-kn.adoc b/modules/serverless-autoscaling-maxscale-kn.adoc
new file mode 100644
index 000000000000..74e9cd34f44c
--- /dev/null
+++ b/modules/serverless-autoscaling-maxscale-kn.adoc
@@ -0,0 +1,19 @@
+[id="serverless-autoscaling-maxscale-kn_{context}"]
+= Setting the maxScale annotation by using the Knative CLI
+
+You can use the `kn service` command with the `--max-scale` flag to set the maximum number of pods for a service.
+
+.Procedure
+
+* Set the maximum number of pods for the service by using the `--max-scale` flag:
++
+[source,terminal]
+----
+$ kn service create <service_name> --image <image_uri> --max-scale <integer>
+----
++
+.Example command
+[source,terminal]
+----
+$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --max-scale 10
+----
diff --git a/modules/serverless-autoscaling-minscale-kn.adoc b/modules/serverless-autoscaling-minscale-kn.adoc
new file mode 100644
index 000000000000..f28e3fc35fc5
--- /dev/null
+++ b/modules/serverless-autoscaling-minscale-kn.adoc
@@ -0,0 +1,21 @@
+[id="serverless-autoscaling-minscale-kn_{context}"]
+= Setting the minScale annotation by using the Knative CLI
+
+You can use the `kn service` command with the `--min-scale` flag to set the minimum number of pods for a service.
+
+.Procedure
+
+* Set the minimum number of pods for the service by using the `--min-scale` flag:
++
+[source,terminal]
+----
+$ kn service create <service_name> --image <image_uri> --min-scale <integer>
+----
++
+.Example command
+[source,terminal]
+----
+$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --min-scale 2
+----
+
+// TODO: Check if it can be used with update and other service commands.
diff --git a/modules/serverless-concurrency-limits-configure-hard.adoc b/modules/serverless-concurrency-limits-configure-hard.adoc
new file mode 100644
index 000000000000..7dfdd5f19e79
--- /dev/null
+++ b/modules/serverless-concurrency-limits-configure-hard.adoc
@@ -0,0 +1,42 @@
+[id="serverless-concurrency-limits-configure-hard_{context}"]
+= Configuring a hard concurrency limit
+
+You can specify a hard concurrency limit for your Knative service by modifying the `containerConcurrency` spec or by using the `kn service` command with the `--concurrency-limit` flag.
+
+// However, a default value can be set for the Revision's containerConcurrency field in config-defaults.yaml.
+// add note about this for admins to see? Need more details about config-defaults though
+
+.Procedure
+
+* Optional: Set the `containerConcurrency` spec for your Knative service in the spec of the `Service` custom resource:
++
+.Example service spec
+[source,yaml]
+----
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: example-service
+  namespace: default
+spec:
+  template:
+    spec:
+      containerConcurrency: 50
+----
++
+The default value is `0`, which means that there is no limit on the number of requests that are permitted to flow into one pod of the service at a time.
++
+A value greater than `0` specifies the exact number of requests that are permitted to flow into one pod of the service at a time. This example would enable a hard concurrency limit of 50 requests at a time.
+
+* Optional: Use the `kn service` command to specify the `--concurrency-limit` flag:
++
+[source,terminal]
+----
+$ kn service create <service_name> --image <image_uri> --concurrency-limit <integer>
+----
++
+.Example command to create a service with a concurrency limit of 50 requests
+[source,terminal]
+----
+$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --concurrency-limit 50
+----
diff --git a/modules/serverless-concurrency-limits-configure-soft.adoc b/modules/serverless-concurrency-limits-configure-soft.adoc
new file mode 100644
index 000000000000..b878b065f218
--- /dev/null
+++ b/modules/serverless-concurrency-limits-configure-soft.adoc
@@ -0,0 +1,36 @@
+[id="serverless-concurrency-limits-configure-soft_{context}"]
+= Configuring a soft concurrency target
+
+You can specify a soft concurrency target for your Knative service by setting the `autoscaling.knative.dev/target` annotation in the spec, or by using the `kn service` command with the `--concurrency-target` flag.
+
+.Procedure
+
+* Optional: Set the `autoscaling.knative.dev/target` annotation for your Knative service in the spec of the `Service` custom resource:
++
+.Example service spec
+[source,yaml]
+----
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: example-service
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/target: "200"
+----
+
+* Optional: Use the `kn service` command to specify the `--concurrency-target` flag:
++
+[source,terminal]
+----
+$ kn service create <service_name> --image <image_uri> --concurrency-target <integer>
+----
++
+.Example command to create a service with a concurrency target of 50 requests
+[source,terminal]
+----
+$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --concurrency-target 50
+----
diff --git a/modules/serverless-concurrency-limits.adoc b/modules/serverless-concurrency-limits.adoc
new file mode 100644
index 000000000000..75fe8f860a6f
--- /dev/null
+++ b/modules/serverless-concurrency-limits.adoc
@@ -0,0 +1,17 @@
+[id="serverless-concurrency-limits_{context}"]
+= Concurrency limits and targets
+
+Concurrency can be configured as either a _soft limit_ or a _hard limit_:
+
+* A soft limit is a targeted request limit, rather than a strictly enforced bound. For example, if there is a sudden burst of traffic, the soft limit target can be exceeded.
+
+* A hard limit is a strictly enforced upper bound on the number of requests. If concurrency reaches the hard limit, surplus requests are buffered and must wait until there is enough free capacity to execute the requests.
++
+[IMPORTANT]
+====
+Using a hard limit configuration is only recommended if there is a clear use case for it with your application. Having a low hard limit specified may have a negative impact on the throughput and latency of an application, and might cause cold starts.
+====
+
+If you set both a soft target and a hard limit, the autoscaler targets the soft target number of concurrent requests, but imposes the hard limit value as the maximum number of requests.
+
+If the hard limit value is less than the soft target value, the target value is tuned down, because there is no need to target more requests than the number that can actually be handled.
diff --git a/modules/serverless-workflow-autoscaling-kn.adoc b/modules/serverless-workflow-autoscaling-kn.adoc
deleted file mode 100644
index 7312014cecac..000000000000
--- a/modules/serverless-workflow-autoscaling-kn.adoc
+++ /dev/null
@@ -1,24 +0,0 @@
-[id="autoscaling-workflow-kn_{context}"]
-= Autoscaling workflows by using the Knative CLI
-
-You can edit autoscaling capabilities for your cluster by using `kn` to modify Knative services without editing YAML files directly.
-
-You can use the `kn service create` and `kn service update` commands with the appropriate flags as described below to configure autoscaling behavior.
-
-[cols=2*,options="header"]
-|===
-|Flag
-|Description
-
-|`--concurrency-limit int`
-|Sets a hard limit of concurrent requests to be processed by a single revision.
-
-|`--concurrency-target int`
-|Provides a recommendation for when to scale up revisions, based on the concurrent number of incoming requests. Defaults to `--concurrency-limit`.
-
-|`--max-scale int`
-|Maximum number of revisions.
-
-|`--min-scale int`
-|Minimum number of revisions.
-|===
diff --git a/serverless/autoscaling/serverless-autoscaling-concurrency.adoc b/serverless/autoscaling/serverless-autoscaling-concurrency.adoc
new file mode 100644
index 000000000000..30b41088e814
--- /dev/null
+++ b/serverless/autoscaling/serverless-autoscaling-concurrency.adoc
@@ -0,0 +1,13 @@
+[id="serverless-autoscaling-concurrency"]
+= Concurrency
+include::modules/common-attributes.adoc[]
+include::modules/serverless-document-attributes.adoc[]
+:context: serverless-autoscaling-concurrency
+
+toc::[]
+
+Concurrency determines the number of simultaneous requests that can be processed by each pod of an application at any given time.
+
+include::modules/serverless-concurrency-limits.adoc[leveloffset=+1]
+include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2]
+include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2]
diff --git a/serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc b/serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
new file mode 100644
index 000000000000..680173fb00ab
--- /dev/null
+++ b/serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
@@ -0,0 +1,71 @@
+[id="serverless-autoscaling-scale-bounds"]
+= Scale bounds
+include::modules/common-attributes.adoc[]
+include::modules/serverless-document-attributes.adoc[]
+:context: serverless-autoscaling-scale-bounds
+
+toc::[]
+
+Scale bounds determine the minimum and maximum numbers of pods that can serve an application at any given time.
+
+You can set scale bounds for an application to help prevent cold starts or control computing costs.
+
+[id="serverless-autoscaling-minscale"]
+== Minimum scale bounds
+
+The minimum number of pods that can serve an application is determined by the `minScale` annotation.
+
+The `minScale` value defaults to `0` pods if the following conditions are met:
+
+* The `minScale` annotation is not set
+* Scaling to zero is enabled
+* The `KPA` class is used
+
+If scaling to zero is not enabled, the `minScale` value defaults to `1`.
+
+// TODO: Document KPA if supported, link to docs about setting class
+
+// TO DO:
+// Add info / links about enabling and disabling autoscaling (admin docs)
+// if `enable-scale-to-zero` is set to `false` in the `config-autoscaler` config map.
+
+.Example service spec with `minScale` annotation
+[source,yaml]
+----
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: example-service
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/minScale: "0"
+...
+----
+
+include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+2]
+
+[id="serverless-autoscaling-maxscale"]
+== Maximum scale bounds
+
+The maximum number of pods that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of pods created.
+
+.Example service spec with `maxScale` annotation
+[source,yaml]
+----
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: example-service
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/maxScale: "10"
+...
+----
+
+include::modules/serverless-autoscaling-maxscale-kn.adoc[leveloffset=+2]
diff --git a/serverless/autoscaling/serverless-autoscaling.adoc b/serverless/autoscaling/serverless-autoscaling.adoc
new file mode 100644
index 000000000000..5162888c6643
--- /dev/null
+++ b/serverless/autoscaling/serverless-autoscaling.adoc
@@ -0,0 +1,23 @@
+[id="serverless-autoscaling"]
+= About autoscaling
+include::modules/common-attributes.adoc[]
+include::modules/serverless-document-attributes.adoc[]
+:context: serverless-autoscaling
+
+toc::[]
+
+Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scaling to zero is enabled, Knative Serving scales the application down to zero pods. If scaling to zero is disabled, the application is scaled down to the minimum number of pods specified for applications on the cluster. Pods can also be scaled up to meet demand if traffic to the application increases.
+
+To enable autoscaling for Knative Serving, you must configure xref:../autoscaling/serverless-autoscaling-concurrency.adoc#serverless-autoscaling-concurrency[concurrency] and xref:../autoscaling/serverless-autoscaling-scale-bounds.adoc#serverless-autoscaling-scale-bounds[scale bounds] for your application.
+
+[NOTE]
+====
+Any limits or targets set in the revision template are measured against a single instance of your application. For example, setting the `target` annotation to `50` configures the autoscaler to scale the application so that each revision handles 50 requests at a time.
+====
+
+// TODO: Add section talking about what can be configured and why
+
+[id="additional-resources_serverless-autoscaling"]
+== Additional resources
+
+* See the documentation about xref:../admin_guide/serverless-admin-metrics.adoc#serverless-autoscaler-metrics_serverless-admin-metrics[autoscaling metrics]
diff --git a/serverless/knative_serving/configuring-knative-serving-autoscaling.adoc b/serverless/knative_serving/configuring-knative-serving-autoscaling.adoc
deleted file mode 100644
index 1ecce753fa17..000000000000
--- a/serverless/knative_serving/configuring-knative-serving-autoscaling.adoc
+++ /dev/null
@@ -1,20 +0,0 @@
-[id="configuring-knative-serving-autoscaling"]
-= Configuring Knative Serving autoscaling
-include::modules/common-attributes.adoc[]
-include::modules/serverless-document-attributes.adoc[]
-:context: configuring-knative-serving-autoscaling
-
-toc::[]
-
-{ServerlessProductName} provides capabilities for automatic pod scaling, including scaling inactive pods to zero.
-To enable autoscaling for Knative Serving, you must configure concurrency and scale bounds in the revision template.
-
-[NOTE]
-====
-Any limits or targets set in the revision template are measured against a single instance of your application. For example, setting the `target` annotation to `50` will configure the autoscaler to scale the application so that each revision will handle 50 requests at a time.
-====
-
-// Autoscaling
-include::modules/serverless-workflow-autoscaling-kn.adoc[leveloffset=+1]
-include::modules/knative-serving-autoscaling.adoc[leveloffset=+1]
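The new modules configure the soft target, hard limit, and scale bounds each in isolation. As a sketch combining them (the service name and the specific values here are illustrative assumptions, not taken from the modules above), all of the settings can coexist in a single `Service` spec:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Soft target: the autoscaler aims for 50 concurrent requests per pod.
        autoscaling.knative.dev/target: "50"
        # Scale bounds: keep at least 1 pod warm to avoid cold starts,
        # and never scale beyond 10 pods.
        autoscaling.knative.dev/minScale: "1"
        autoscaling.knative.dev/maxScale: "10"
    spec:
      # Hard limit: at most 100 concurrent requests per pod;
      # surplus requests are buffered until capacity frees up.
      containerConcurrency: 100
      containers:
      - image: quay.io/openshift-knative/knative-eventing-sources-event-display:latest
```

Because the hard limit (`100`) is higher than the soft target (`50`), the autoscaler scales on the 50-request target while 100 remains the enforced ceiling per pod. If the hard limit were lower than the soft target, the target would be tuned down to match, as described in the `serverless-concurrency-limits` module.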