Merge pull request #38284 from abrennan89/SRVKS-573
SRVKS-573: Updating autoscaling docs
abrennan89 committed Nov 15, 2021
2 parents 8cde50a + 510b974 commit 58fa259
Showing 13 changed files with 253 additions and 163 deletions.
14 changes: 11 additions & 3 deletions _topic_map.yml
@@ -3189,9 +3189,6 @@ Topics:
# Knative services
- Name: Serverless applications
File: serverless-applications
# Autoscaling
- Name: Configuring Knative Serving autoscaling
File: configuring-knative-serving-autoscaling
- Name: Traffic management
File: serverless-traffic-management
- Name: Cluster logging with OpenShift Serverless
@@ -3208,6 +3205,17 @@ Topics:
- Name: Metrics
File: serverless-serving-metrics
#
# Autoscaling
- Name: Autoscaling
Dir: autoscaling
Topics:
- Name: About autoscaling
File: serverless-autoscaling
- Name: Scale bounds
File: serverless-autoscaling-scale-bounds
- Name: Concurrency
File: serverless-autoscaling-concurrency
#
# Knative Eventing
- Name: Knative Eventing
Dir: knative_eventing
32 changes: 0 additions & 32 deletions modules/configuring-scale-bounds-knative.adoc

This file was deleted.

84 changes: 0 additions & 84 deletions modules/knative-serving-concurrent-autoscaling-requests.adoc

This file was deleted.

19 changes: 19 additions & 0 deletions modules/serverless-autoscaling-maxscale-kn.adoc
@@ -0,0 +1,19 @@
[id="serverless-autoscaling-maxscale-kn_{context}"]
= Setting the maxScale annotation by using the Knative CLI

You can use the `kn service` command with the `--max-scale` flag to set or update the `maxScale` value for a service.

.Procedure

* Set the maximum number of pods for the service by using the `--max-scale` flag:
+
[source,terminal]
----
$ kn service create <service_name> --image <image_uri> --max-scale <integer>
----
+
.Example command
[source,terminal]
----
$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --max-scale 10
----
21 changes: 21 additions & 0 deletions modules/serverless-autoscaling-minscale-kn.adoc
@@ -0,0 +1,21 @@
[id="serverless-autoscaling-minscale-kn_{context}"]
= Setting the minScale annotation by using the Knative CLI

You can use the `kn service` command with the `--min-scale` flag to set or update the `minScale` value for a service.

.Procedure

* Set the minimum number of pods for the service by using the `--min-scale` flag:
+
[source,terminal]
----
$ kn service create <service_name> --image <image_uri> --min-scale <integer>
----
+
.Example command
[source,terminal]
----
$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --min-scale 2
----

// TODO: Check if it can be used with update and other service commands.
42 changes: 42 additions & 0 deletions modules/serverless-concurrency-limits-configure-hard.adoc
@@ -0,0 +1,42 @@
[id="serverless-concurrency-limits-configure-hard_{context}"]
= Configuring a hard concurrency limit

You can specify a hard concurrency limit for your Knative service by modifying the `containerConcurrency` spec or by using the `kn service` command with the correct flags.

// However, a default value can be set for the Revision's containerConcurrency field in config-defaults.yaml.
// add note about this for admins to see? Need more details about config-defaults though

.Procedure

* Optional: Set the `containerConcurrency` field in the spec of the `Service` custom resource:
+
.Example service spec
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: example-service
namespace: default
spec:
template:
spec:
containerConcurrency: 50
----
+
The default value is `0`, which means that there is no limit on the number of requests that are permitted to flow into one pod of the service at a time.
+
A value greater than `0` specifies the exact number of requests that are permitted to flow into one pod of the service at a time. This example would enable a hard concurrency limit of 50 requests at a time.

* Optional: Use the `kn service` command to specify the `--concurrency-limit` flag:
+
[source,terminal]
----
$ kn service create <service_name> --image <image_uri> --concurrency-limit <integer>
----
+
.Example command to create a service with a concurrency limit of 50 requests
[source,terminal]
----
$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --concurrency-limit 50
----
36 changes: 36 additions & 0 deletions modules/serverless-concurrency-limits-configure-soft.adoc
@@ -0,0 +1,36 @@
[id="serverless-concurrency-limits-configure-soft_{context}"]
= Configuring a soft concurrency target

You can specify a soft concurrency target for your Knative service by setting the `autoscaling.knative.dev/target` annotation in the spec, or by using the `kn service` command with the correct flags.

.Procedure

* Optional: Set the `autoscaling.knative.dev/target` annotation for your Knative service in the spec of the `Service` custom resource:
+
.Example service spec
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: example-service
namespace: default
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/target: "200"
----

* Optional: Use the `kn service` command to specify the `--concurrency-target` flag:
+
[source,terminal]
----
$ kn service create <service_name> --image <image_uri> --concurrency-target <integer>
----
+
.Example command to create a service with a concurrency target of 50 requests
[source,terminal]
----
$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --concurrency-target 50
----
17 changes: 17 additions & 0 deletions modules/serverless-concurrency-limits.adoc
@@ -0,0 +1,17 @@
[id="serverless-concurrency-limits_{context}"]
= Concurrency limits and targets

Concurrency can be configured as either a _soft limit_ or a _hard limit_:

* A soft limit is a targeted limit on requests, rather than a strictly enforced bound. For example, if there is a sudden burst of traffic, the soft limit target can be exceeded.

* A hard limit is a strictly enforced upper bound on requests. If concurrency reaches the hard limit, surplus requests are buffered and must wait until there is enough free capacity to execute them.
+
[IMPORTANT]
====
Using a hard limit configuration is only recommended if there is a clear use case for it with your application. Having a low, hard limit specified may have a negative impact on the throughput and latency of an application, and might cause cold starts.
====

If you specify both a soft target and a hard limit, the autoscaler scales toward the soft target number of concurrent requests, while the hard limit is enforced as the maximum number of requests.

If the hard limit value is less than the soft limit value, the soft limit value is tuned down, because there is no need to target more requests than can actually be handled.
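The two settings described above can be combined in a single service spec. The following is an illustrative sketch, assuming the `autoscaling.knative.dev/target` annotation and the `containerConcurrency` field documented in the configuration modules; the service name, namespace, and values are examples only:

.Example service spec combining a soft target and a hard limit
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "200" # soft target: the autoscaler aims for 200 concurrent requests per pod
    spec:
      containerConcurrency: 250 # hard limit: never more than 250 concurrent requests per pod
----

Here the autoscaler scales toward 200 concurrent requests per pod, but no pod ever receives more than 250 at a time.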
24 changes: 0 additions & 24 deletions modules/serverless-workflow-autoscaling-kn.adoc

This file was deleted.

13 changes: 13 additions & 0 deletions serverless/autoscaling/serverless-autoscaling-concurrency.adoc
@@ -0,0 +1,13 @@
[id="serverless-autoscaling-concurrency"]
= Concurrency
include::modules/common-attributes.adoc[]
include::modules/serverless-document-attributes.adoc[]
:context: serverless-autoscaling-concurrency

toc::[]

Concurrency determines the number of simultaneous requests that can be processed by each pod of an application at any given time.

include::modules/serverless-concurrency-limits.adoc[leveloffset=+1]
include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2]
include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2]
71 changes: 71 additions & 0 deletions serverless/autoscaling/serverless-autoscaling-scale-bounds.adoc
@@ -0,0 +1,71 @@
[id="serverless-autoscaling-scale-bounds"]
= Scale bounds
include::modules/common-attributes.adoc[]
include::modules/serverless-document-attributes.adoc[]
:context: serverless-autoscaling-scale-bounds

toc::[]

Scale bounds determine the minimum and maximum numbers of pods that can serve an application at any given time.

You can set scale bounds for an application to help prevent cold starts or control computing costs.

[id="serverless-autoscaling-minscale"]
== Minimum scale bounds

The minimum number of pods that can serve an application is determined by the `minScale` annotation.

The `minScale` value defaults to `0` pods if the following conditions are met:

* The `minScale` annotation is not set
* Scaling to zero is enabled
* The `KPA` autoscaler class is used

If scale to zero is not enabled, the `minScale` value defaults to `1`.

// TODO: Document KPA if supported, link to docs about setting class

// TO DO:
// Add info / links about enabling and disabling autoscaling (admin docs)
// if `enable-scale-to-zero` is set to `false` in the `config-autoscaler` config map.

.Example service spec with the `minScale` annotation
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: example-service
namespace: default
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/minScale: "0"
...
----

include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+2]

[id="serverless-autoscaling-maxscale"]
== Maximum scale bounds

The maximum number of pods that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of pods created.

.Example service spec with the `maxScale` annotation
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: example-service
namespace: default
spec:
template:
metadata:
annotations:
autoscaling.knative.dev/maxScale: "10"
...
----

include::modules/serverless-autoscaling-maxscale-kn.adoc[leveloffset=+2]
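As a sketch combining the flags from the modules above, both scale bounds can be set in a single `kn` command; the service name and image are illustrative:

[source,terminal]
----
$ kn service create example-service \
  --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest \
  --min-scale 1 \
  --max-scale 10
----

Setting `--min-scale 1` keeps at least one pod running to avoid cold starts, while `--max-scale 10` caps computing costs.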
