knative · knative-prow-robot · Jun 4, 2020 · May 7, 2020 · evankanderson · May 28, 2020
diff --git a/docs/README.md b/docs/README.md
@@ -12,7 +12,7 @@ focus on solving mundane but difficult tasks such as:
 
 - [Deploying a container](./serving/getting-started-knative-app.md)
 - [Routing and managing traffic with blue/green deployment](./serving/samples/blue-green-deployment.md)
-- [Scaling automatically and sizing workloads based on demand](./serving/configuring-autoscaling.md)
+- [Scaling automatically and sizing workloads based on demand](./serving/autoscaling)
 - [Binding running services to eventing ecosystems](./eventing/samples/kubernetes-event-source/)
 
 Developers on Knative can use familiar idioms, languages, and frameworks to
@@ -88,7 +88,7 @@ Follow the links below to learn more about Knative.
 
 ### Samples and demos
 
-- [Autoscaling](./serving/samples/autoscale-go/README.md)
+- [Autoscaling](./serving/autoscaling/autoscale-go/)
 - [Binding running services to eventing ecosystems](./eventing/samples/kubernetes-event-source/)
 - [Telemetry](./serving/samples/telemetry-go/README.md)
 - [REST API sample](./serving/samples/rest-api-go/README.md)

diff --git a/docs/serving/README.md b/docs/serving/README.md
@@ -5,7 +5,7 @@ and scales to support advanced scenarios.
 The Knative Serving project provides middleware primitives that enable:
 
 - Rapid deployment of serverless containers
-- Automatic scaling up and down to zero
+- [Automatic scaling up and down to zero](./autoscaling/README.md)
 - Routing and network programming for Istio components
 - Point-in-time snapshots of deployed code and configurations
 
@@ -36,7 +36,7 @@ serverless workload behaves on the cluster:
   are immutable objects and can be retained for as long as useful. Knative
   Serving Revisions can be automatically scaled up and down according to
   incoming traffic. See
-  [Configuring the Autoscaler](./configuring-autoscaling.md) for more
+  [Configuring the Autoscaler](./autoscaling) for more
   information.
 
 ![Diagram that displays how the Serving resources coordinate with each other.](https://github.com/knative/serving/raw/master/docs/spec/images/object_model.png)
@@ -83,5 +83,3 @@ in the Knative Serving repository.
 
 See the [Knative Serving Issues](https://github.com/knative/serving/issues) page
 for a full list of known issues.
-
-
diff --git a/docs/serving/autoscaling/README.md b/docs/serving/autoscaling/README.md
@@ -0,0 +1,18 @@
+One of the main features of Knative is automatic scaling of replicas for an application to closely match incoming demand, including scaling applications to zero if no traffic is being received.
+Knative Serving enables this by default, using the Knative Pod Autoscaler (KPA).
+The Autoscaler component watches traffic flow to the application, and scales replicas up or down based on configured metrics.
+
+Knative services default to using autoscaling settings that are suitable for the majority of use cases. However, some workloads may require a custom, more finely tuned configuration.
+This guide provides information about configuration options that you can modify to fit the requirements of your workload.
+
+For more information about how autoscaling for Knative works, see the [Autoscaling concepts](./autoscaling-concepts.md) documentation.
+
+For more information about which metrics can be used to control the Autoscaler, see the [metrics](./autoscaling-metrics.md) documentation.
+
+## Optional autoscaling configuration tasks
+
+* Configure your Knative deployment to use the Kubernetes [Horizontal Pod Autoscaler (HPA)](../../install/any-kubernetes-cluster.md#optional-serving-extensions) instead of the default KPA.
+* Disable scale to zero functionality for your cluster ([global configuration only](./scale-to-zero.md)).
+* Configure the [type of metrics](./autoscaling-metrics.md) your Autoscaler consumes.
+* Configure [concurrency limits](./concurrency.md) for applications.
+* Try out the [Go Autoscale Sample App](./autoscale-go/README.md).
diff --git a/docs/serving/autoscaling/_index.md b/docs/serving/autoscaling/_index.md
@@ -0,0 +1,10 @@
+---
+title: "Autoscaling"
+linkTitle: "Autoscaling"
+weight: 20
+type: "docs"
+aliases:
+    - /docs/serving/configuring-autoscaling/
+---
+
+{{% readfile file="README.md" %}}
diff --git a/docs/serving/samples/autoscale-go/Dockerfile → ...rving/autoscaling/autoscale-go/Dockerfile b/docs/serving/samples/autoscale-go/Dockerfile → ...rving/autoscaling/autoscale-go/Dockerfile
diff --git a/docs/serving/samples/autoscale-go/OWNERS → docs/serving/autoscaling/autoscale-go/OWNERS b/docs/serving/samples/autoscale-go/OWNERS → docs/serving/autoscaling/autoscale-go/OWNERS
diff --git a/docs/serving/samples/autoscale-go/README.md → ...erving/autoscaling/autoscale-go/README.md b/docs/serving/samples/autoscale-go/README.md → ...erving/autoscaling/autoscale-go/README.md
diff --git a/...serving/samples/autoscale-go/autoscale.go → ...ing/autoscaling/autoscale-go/autoscale.go b/...serving/samples/autoscale-go/autoscale.go → ...ing/autoscaling/autoscale-go/autoscale.go
diff --git a/docs/serving/autoscaling/autoscale-go/index.md b/docs/serving/autoscaling/autoscale-go/index.md
@@ -0,0 +1,10 @@
+---
+title: "Autoscale Sample App - Go"
+linkTitle: "Autoscale Sample App - Go"
+weight: 100
+type: "docs"
+aliases:
+    - /docs/serving/samples/autoscale-go
+---
+
+{{% readfile file="README.md" %}}
diff --git a/...amples/autoscale-go/request-dashboard.png → ...caling/autoscale-go/request-dashboard.png b/...amples/autoscale-go/request-dashboard.png → ...caling/autoscale-go/request-dashboard.png
diff --git a/.../samples/autoscale-go/scale-dashboard.png → ...oscaling/autoscale-go/scale-dashboard.png b/.../samples/autoscale-go/scale-dashboard.png → ...oscaling/autoscale-go/scale-dashboard.png
diff --git a/...serving/samples/autoscale-go/service.yaml → ...ing/autoscaling/autoscale-go/service.yaml b/...serving/samples/autoscale-go/service.yaml → ...ing/autoscaling/autoscale-go/service.yaml
diff --git a/...serving/samples/autoscale-go/test/test.go → ...ing/autoscaling/autoscale-go/test/test.go b/...serving/samples/autoscale-go/test/test.go → ...ing/autoscaling/autoscale-go/test/test.go
diff --git a/docs/serving/autoscaling/autoscaling-concepts.md b/docs/serving/autoscaling/autoscaling-concepts.md
@@ -0,0 +1,135 @@
+---
+title: "Autoscaling concepts"
+linkTitle: "Autoscaling concepts"
+weight: 01
+type: "docs"
+---
+
+This section covers conceptual information about which Autoscaler types are supported, as well as fundamental information about how autoscaling is configured.
+
+## Supported Autoscaler types
+
+Knative Serving supports two Autoscaler implementations: the Knative Pod Autoscaler (KPA) and the Kubernetes Horizontal Pod Autoscaler (HPA). The features and limitations of each of these Autoscalers are listed below.
+
+**IMPORTANT:** If you want to use Kubernetes Horizontal Pod Autoscaler (HPA), you must install it after you install [Knative Serving](../../install/any-kubernetes-cluster.md#optional-serving-extensions).
+
+### Knative Pod Autoscaler (KPA)
+
+* Part of the Knative Serving core and enabled by default once Knative Serving is installed.
+* Supports scale to zero functionality.
+* Does not support CPU-based autoscaling.
+
+### Horizontal Pod Autoscaler (HPA)
+
+* Not part of the Knative Serving core, and must be enabled after [Knative Serving installation](../../install/any-kubernetes-cluster.md#optional-serving-extensions).
+* Does not support scale to zero functionality.
+* Supports CPU-based autoscaling.
+
+### Configuring the Autoscaler implementation
+
+The type of Autoscaler implementation (KPA or HPA) can be configured by using the `class` annotation.
+
+* **Global settings key:** `pod-autoscaler-class`
+* **Per-revision annotation key:** `autoscaling.knative.dev/class`
+* **Possible values:** `"kpa.autoscaling.knative.dev"` or `"hpa.autoscaling.knative.dev"`
+* **Default:** `"kpa.autoscaling.knative.dev"`
+
+**Example:**
+{{< tabs name="class" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```
+{{< /tab >}}
+{{% tab name="Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ pod-autoscaler-class: "kpa.autoscaling.knative.dev"
+```
+{{< /tab >}}
+{{% tab name="Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      pod-autoscaler-class: "kpa.autoscaling.knative.dev"
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+## Global versus per-revision settings
+
+Autoscaling in Knative can be configured using either global or per-revision settings.
+
+1. If no per-revision autoscaling settings are specified, the global settings are used.
+1. If both per-revision and global settings are specified, the per-revision settings override the global settings.
+
+### Global settings
+
+Global settings for autoscaling are configured using the `config-autoscaler` ConfigMap. If you installed Knative Serving using the Operator, you can set global configuration settings under `spec.config.autoscaler` in the `KnativeServing` custom resource (CR).
+
+#### Example of the default autoscaling ConfigMap
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ container-concurrency-target-default: "100"
+ container-concurrency-target-percentage: "0.7"
+ enable-scale-to-zero: "true"
+ max-scale-up-rate: "1000"
+ max-scale-down-rate: "2"
+ panic-window-percentage: "10"
+ panic-threshold-percentage: "200"
+ scale-to-zero-grace-period: "30s"
+ scale-to-zero-pod-retention-period: "0s"
+ stable-window: "60s"
+ target-burst-capacity: "200"
+ requests-per-second-target-default: "200"
+```
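+
+If Knative Serving was installed using the Operator, the same keys can instead be set in the `KnativeServing` custom resource. The following is a minimal sketch rather than a complete default configuration: it assumes the resource is named `knative-serving`, as in the other Operator examples in this guide, and sets only two of the keys from the ConfigMap above for illustration.
+
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      # Illustrative subset; values copied from the default ConfigMap example above.
+      enable-scale-to-zero: "true"
+      stable-window: "60s"
+```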
+
+### Per-revision settings
+
+Per-revision settings for autoscaling are configured by adding _annotations_ to a revision.
+
+#### Example
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/target: "70"
+```
+
+**IMPORTANT:** If you are creating revisions by using a service or configuration, you must set the annotations in the _revision template_ so that any modifications are applied to each revision as it is created.
+Setting annotations in the top-level metadata of a single revision will not propagate the changes to other revisions and will not apply changes to the autoscaling configuration for your application.
diff --git a/docs/serving/autoscaling/autoscaling-metrics.md b/docs/serving/autoscaling/autoscaling-metrics.md
@@ -0,0 +1,74 @@
+---
+title: "Metrics"
+linkTitle: "Metrics"
+weight: 03
+type: "docs"
+---
+
+The metric configuration defines which metric type is watched by the Autoscaler.
+
+## Setting metrics per revision
+
+For [per-revision](./autoscaling-concepts.md) configuration, the metric type is set using the `autoscaling.knative.dev/metric` annotation.
+The possible metric types that can be configured per revision depend on the type of Autoscaler implementation you are using:
+
+* The default KPA Autoscaler supports the `concurrency` and `rps` metrics.
+* The HPA Autoscaler supports the `concurrency`, `rps` and `cpu` metrics.
+
+<!-- TODO: Add details about different metrics types, how concurrency and rps differ. Explain cpu. -->
+
+For more information about KPA and HPA, see the documentation on [Supported Autoscaler types](./autoscaling-concepts.md).
+
+* **Per-revision annotation key:** `autoscaling.knative.dev/metric`
+* **Possible values:** `"concurrency"`, `"rps"` or `"cpu"`, depending on your Autoscaler type. The `cpu` metric is only supported on revisions with the HPA class.
+* **Default:** `"concurrency"`
+
+{{< tabs name="Examples of configuring metric types per revision" default="Per-revision concurrency configuration" >}}
+{{% tab name="Per-revision concurrency configuration" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/metric: "concurrency"
+```
+{{< /tab >}}
+{{% tab name="Per-revision rps configuration" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/metric: "rps"
+```
+{{< /tab >}}
+{{% tab name="Per-revision cpu configuration" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/metric: "cpu"
+```
+{{< /tab >}}
+{{< /tabs >}}
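+
+Because the `cpu` metric is only supported on revisions with the HPA class, a revision that uses it typically also needs the `class` annotation, unless HPA has been set as the global default. The following is a minimal sketch that combines both annotations. The service name and image are placeholders taken from the other examples in this guide, and the sketch assumes that the HPA extension is already installed.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Select the HPA implementation for this revision (assumes HPA is installed).
+        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
+        # Scale on CPU, which is only supported with the HPA class.
+        autoscaling.knative.dev/metric: "cpu"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```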
+
+## Next steps
+
+* Configure [concurrency targets](./concurrency.md) for applications
+* Configure [requests per second targets](./rps-target.md) for replicas of an application
diff --git a/docs/serving/autoscaling/autoscaling-targets.md b/docs/serving/autoscaling/autoscaling-targets.md
@@ -0,0 +1,81 @@
+---
+title: "Targets"
+linkTitle: "Targets"
+weight: 04
+type: "docs"
+---
+
+Configuring a target provides the Autoscaler with a value that it tries to maintain for the configured metric for a revision.
+See the [metrics](./autoscaling-metrics.md) documentation for more information about configurable metric types.
+
+The `target` annotation, used to configure per-revision targets, is _metric agnostic_. This means the target is simply an integer value, which can be applied to any metric type.
+
+## Configuring targets
+
+* **Global settings key:** `container-concurrency-target-default` for setting a concurrency target, and `requests-per-second-target-default` for setting a requests-per-second (RPS) target. For more information, see the documentation on [metrics](./autoscaling-metrics.md).
+* **Per-revision annotation key:** `autoscaling.knative.dev/target`
+* **Possible values:** An integer (metric agnostic).
+* **Default:** `"100"` for `container-concurrency-target-default`, and `"200"` for `requests-per-second-target-default`. There is no default value set for the `target` annotation.
+
+{{< tabs name="Configuring targets" default="Target annotation - Per-revision" >}}
+{{% tab name="Target annotation - Per-revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/target: "50"
+```
+{{< /tab >}}
+{{% tab name="Concurrency target - Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ container-concurrency-target-default: "200"
+```
+{{< /tab >}}
+{{% tab name="Concurrency target - Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      container-concurrency-target-default: "200"
+```
+{{< /tab >}}
+{{% tab name="Requests per second (RPS) target - Global (ConfigMap)" %}}
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: config-autoscaler
+ namespace: knative-serving
+data:
+ requests-per-second-target-default: "150"
+```
+{{< /tab >}}
+{{% tab name="Requests per second (RPS) target - Global (Operator)" %}}
+```yaml
+apiVersion: operator.knative.dev/v1alpha1
+kind: KnativeServing
+metadata:
+  name: knative-serving
+spec:
+  config:
+    autoscaler:
+      requests-per-second-target-default: "150"
+```
+{{< /tab >}}
+{{< /tabs >}}
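+
+Because the `target` annotation is metric agnostic, its meaning depends on the metric that is configured for the revision. The following is a minimal sketch, assuming the `rps` metric is selected with the `autoscaling.knative.dev/metric` annotation, in which case the target of `"150"` would be read as a requests-per-second value rather than a concurrency value. The service name and image are placeholders taken from the other examples in this guide.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: helloworld-go
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Choose the metric that the target value applies to.
+        autoscaling.knative.dev/metric: "rps"
+        # Metric-agnostic integer; interpreted here against the rps metric.
+        autoscaling.knative.dev/target: "150"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go
+```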