Merged
8 changes: 3 additions & 5 deletions _topic_maps/_topic_map.yml
@@ -2707,11 +2707,7 @@ Topics:
  - Name: Serverless applications
    File: serverless-applications
  - Name: Autoscaling
-   File: serverless-autoscaling
- - Name: Scale bounds
-   File: serverless-autoscaling-scale-bounds
- - Name: Concurrency
-   File: serverless-autoscaling-concurrency
+   File: serverless-autoscaling-developer
  - Name: Traffic management
    File: serverless-traffic-management
  - Name: Routing
@@ -2749,6 +2745,8 @@ Topics:
    File: serverless-cluster-admin-serving
  - Name: Configuring the Knative Serving custom resource
    File: knative-serving-CR-config
+ - Name: Autoscaling
+   File: serverless-admin-autoscaling
  # Ingress options
  - Name: Integrating Service Mesh with OpenShift Serverless
    File: serverless-ossm-setup
2 changes: 1 addition & 1 deletion modules/serverless-autoscaling-minscale-kn.adoc
@@ -11,12 +11,12 @@ You can use the `kn service` command with the `--min-scale` flag to create or mo

* Set the minimum number of replicas for the service by using the `--min-scale` flag:
+
.Examples
[source,terminal]
----
$ kn service create <service_name> --image <image_uri> --min-scale <integer>
----
+
.Example command
[source,terminal]
----
$ kn service create example-service --image quay.io/openshift-knative/knative-eventing-sources-event-display:latest --min-scale 2
31 changes: 31 additions & 0 deletions modules/serverless-enable-scale-to-zero.adoc
@@ -0,0 +1,31 @@
// Module included in the following assemblies:
//
// * serverless/admin_guide/serverless-admin-autoscaling.adoc

[id="serverless-enable-scale-to-zero_{context}"]
= Enabling scale-to-zero

Cluster administrators can enable or disable scale-to-zero globally for the cluster.

.Prerequisites

* You have installed {ServerlessOperatorName} and Knative Serving on your cluster.
* You have cluster administrator permissions.
* You are using the default Knative Pod Autoscaler. The scale-to-zero feature is not available if you are using the Kubernetes Horizontal Pod Autoscaler.

.Procedure

* Modify the `enable-scale-to-zero` spec in the `KnativeServing` CR:
+
[source,yaml]
----
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      enable-scale-to-zero: "false" <1>
----
<1> The `enable-scale-to-zero` spec can be either `"true"` or `"false"`. If set to `"true"`, scale-to-zero is enabled. If set to `"false"`, applications are scaled down to the configured _minimum scale bound_. The default value is `"true"`.
31 changes: 31 additions & 0 deletions modules/serverless-scale-to-zero-grace-period.adoc
@@ -0,0 +1,31 @@
// Module included in the following assemblies:
//
// * serverless/admin_guide/serverless-admin-autoscaling.adoc

[id="serverless-scale-to-zero-grace-period_{context}"]
= Configuring the scale-to-zero grace period

This setting specifies the maximum amount of time that Knative waits for the scale-from-zero machinery to be in place before the last replica of an application is removed.

.Prerequisites

* You have installed {ServerlessOperatorName} and Knative Serving on your cluster.
* You have cluster administrator permissions.
* You are using the default Knative Pod Autoscaler. The scale-to-zero feature is not available if you are using the Kubernetes Horizontal Pod Autoscaler.

.Procedure

* Modify the `scale-to-zero-grace-period` spec in the `KnativeServing` CR:
+
[source,yaml]
----
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      scale-to-zero-grace-period: "30s" <1>
----
<1> The grace period, specified as a duration in seconds, for example `"30s"`. The default value is 30 seconds.
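
Both scale-to-zero settings sit under the same `spec.config.autoscaler` section, so they can be set in a single CR. The following is a minimal sketch rather than a recommended configuration; the `"60s"` value is an arbitrary illustration.

[source,yaml]
----
# Sketch only: the values are illustrative, not recommendations.
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      enable-scale-to-zero: "true"       # the default, shown for clarity
      scale-to-zero-grace-period: "60s"  # illustrative value; the default is 30s
----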
12 changes: 12 additions & 0 deletions serverless/admin_guide/serverless-admin-autoscaling.adoc
@@ -0,0 +1,12 @@
[id="serverless-admin-autoscaling"]
= Autoscaling
include::modules/common-attributes.adoc[]
include::modules/serverless-document-attributes.adoc[]
:context: serverless-admin-autoscaling

toc::[]

As a cluster administrator, you can set global and per-namespace default configurations for autoscaling features by modifying the `KnativeServing` custom resource (CR). This propagates changes to the relevant config maps.
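
As a rough illustration of this propagation, assume that the Operator copies the keys under `spec.config.autoscaler` into the `config-autoscaler` config map in the `knative-serving` namespace. The resulting config map might then contain entries such as the following. The values are illustrative and were not captured from a cluster.

[source,yaml]
----
# Sketch only: autoscaler keys as they might appear after propagation.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving  # assumed namespace for Knative Serving
data:
  enable-scale-to-zero: "false"
  scale-to-zero-grace-period: "30s"
----

Because the config maps are derived from the CR, direct edits to them are likely to be overwritten. Make changes in the `KnativeServing` CR instead.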

include::modules/serverless-enable-scale-to-zero.adoc[leveloffset=+1]
include::modules/serverless-scale-to-zero-grace-period.adoc[leveloffset=+1]
14 changes: 0 additions & 14 deletions serverless/develop/serverless-autoscaling-concurrency.adoc

This file was deleted.

98 changes: 98 additions & 0 deletions serverless/develop/serverless-autoscaling-developer.adoc
@@ -0,0 +1,98 @@
[id="serverless-autoscaling-developer"]
= Autoscaling
include::modules/common-attributes.adoc[]
include::modules/serverless-document-attributes.adoc[]
:context: serverless-autoscaling-developer

toc::[]

Knative Serving provides automatic scaling, or _autoscaling_, for applications to match incoming demand. For example, if an application is receiving no traffic, and scale-to-zero is enabled, Knative Serving scales the application down to zero replicas. If scale-to-zero is disabled, the application is scaled down to the xref:../../serverless/develop/serverless-autoscaling-developer.adoc#serverless-autoscaling-developer-minscale[minimum number of replicas specified for applications on the cluster]. Replicas can also be scaled up to meet demand if traffic to the application increases.

If Knative autoscaling is enabled for your cluster, you can configure concurrency and scale bounds for your application.

[NOTE]
====
Any limits or targets set in the revision template are measured against a single instance of your application. For example, setting the `target` annotation to `50` configures the autoscaler to scale the application so that each replica handles 50 requests at a time.
====
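
As a sketch of the `target` example in the preceding note, the annotation can be set in the revision template of a service. This assumes that the full annotation key is `autoscaling.knative.dev/target`, following the same prefix as the scale bound annotations. The service name `example-service` is a placeholder.

[source,yaml]
----
# Sketch only: sets a per-instance target of 50 requests.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"  # measured per application instance
...
----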

[id="serverless-autoscaling-developer-scale-bounds"]
== Scale bounds

Scale bounds determine the minimum and maximum numbers of replicas that can serve an application at any given time.

You can set scale bounds for an application to help prevent cold starts or control computing costs.

[id="serverless-autoscaling-developer-minscale"]
=== Minimum scale bounds

The minimum number of replicas that can serve an application is determined by the `minScale` annotation.

The `minScale` value defaults to `0` replicas if the following conditions are met:

* The `minScale` annotation is not set.
* Scale-to-zero is enabled.
* The `KPA` autoscaler class is used.

If scale-to-zero is not enabled, the `minScale` value defaults to `1`.

// TODO: Document KPA if supported, link to docs about setting class

// TO DO:
// Add info / links about enabling and disabling autoscaling (admin docs)
// if `enable-scale-to-zero` is set to `false` in the `config-autoscaler` config map.

.Example service spec with the `minScale` annotation
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"
...
----

include::modules/serverless-autoscaling-minscale-kn.adoc[leveloffset=+3]

[id="serverless-autoscaling-developer-maxscale"]
=== Maximum scale bounds

The maximum number of replicas that can serve an application is determined by the `maxScale` annotation. If the `maxScale` annotation is not set, there is no upper limit for the number of replicas created.

.Example service spec with the `maxScale` annotation
[source,yaml]
----
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: "10"
...
----
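
The two scale bound annotations can also be combined in one revision template. The following is a minimal sketch with illustrative values that keep at least one replica running and cap the service at five replicas.

[source,yaml]
----
# Sketch only: the bounds are illustrative values.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: example-service
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "1"  # avoids cold starts
        autoscaling.knative.dev/maxScale: "5"  # limits computing costs
...
----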

include::modules/serverless-autoscaling-maxscale-kn.adoc[leveloffset=+3]

[id="serverless-autoscaling-developer-concurrency"]
== Concurrency

Concurrency determines the number of simultaneous requests that can be processed by each replica of an application at any given time.

include::modules/serverless-concurrency-limits.adoc[leveloffset=+2]
include::modules/serverless-concurrency-limits-configure-soft.adoc[leveloffset=+2]
include::modules/serverless-concurrency-limits-configure-hard.adoc[leveloffset=+2]
include::modules/serverless-target-utilization.adoc[leveloffset=+2]

[id="additional-resources_serverless-autoscaling-developer"]
== Additional resources

* Scale-to-zero can be enabled or disabled for the cluster by cluster administrators. For more information, see xref:../../serverless/admin_guide/serverless-admin-autoscaling.adoc#serverless-enable-scale-to-zero_serverless-admin-autoscaling[Enabling scale-to-zero].
71 changes: 0 additions & 71 deletions serverless/develop/serverless-autoscaling-scale-bounds.adoc

This file was deleted.

16 changes: 0 additions & 16 deletions serverless/develop/serverless-autoscaling.adoc

This file was deleted.