[WIP] Add load balancing docs

abrennan89 · abrennan89 · commit 199e8aed9323 · 2020-06-09T08:43:57.000-05:00
diff --git a/docs/serving/autoscaling/target-burst-capacity.md b/docs/serving/autoscaling/target-burst-capacity.md
diff --git a/docs/serving/load-balancing/README.md b/docs/serving/load-balancing/README.md
@@ -0,0 +1,33 @@
+You can configure load balancing on Knative to use either:
+
+- An ingress gateway, such as Istio or Kourier.
+- The Knative activator in the request path acting as a load balancer.
+
+For more information about load balancing using an ingress gateway, see the [Serving API](../../reference/serving-api.md) documentation.
+
+This guide explains how you can configure load balancing for your Knative system using the activator.
+
+## About the activator
+
+Knative assigns a subset of activators to each revision, depending on the revision size: the more pods a revision has, the more activators are assigned to it. Activators are scaled horizontally, so the activator deployment may run multiple replicas.
+
+In general, the system performs best when the number of revision pods is larger than the number of activators, and the number of pods is evenly divisible by the number of activators.
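+
+As a rough illustration of what dividing evenly means, assume each activator receives an equal share of a revision's pods. This is a simplification: the exact subsetting logic is not described here, and `P` (pods) and `A` (activators) are illustrative symbols.
+
+```latex
+\text{pods per activator} = \frac{P}{A}
+% P = 12, A = 4:  12 / 4 = 3    -> every activator gets 3 pods (even split)
+% P = 10, A = 4:  10 / 4 = 2.5  -> two activators get 3 pods, two get 2 (uneven)
+```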
+
+The activator load balancing algorithm works as follows:
+- If concurrency is unlimited, the request is sent to a random pod.
+- If concurrency is set to a limited value, the activator sends the request to the first pod that has available capacity.
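+
+Which case applies depends on the revision's `containerConcurrency` field, where `0` (the default) means unlimited. The following sketch shows one Service for each case; the Service names and the container image are placeholders.
+
+```yaml
+# Unlimited concurrency (containerConcurrency: 0, the default):
+# the activator sends each request to a random pod.
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: unlimited-example   # hypothetical name
+  namespace: default
+spec:
+  template:
+    spec:
+      containerConcurrency: 0
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go   # placeholder image
+---
+# Limited concurrency: each pod accepts at most 10 concurrent requests,
+# and the activator sends a request to the first pod with spare capacity.
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: limited-example   # hypothetical name
+  namespace: default
+spec:
+  template:
+    spec:
+      containerConcurrency: 10
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go   # placeholder image
+```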
+
+## Prerequisites
+
+- Ensure that there is no ingress gateway enabled.
+- Ensure that individual pods are addressable, so that the activator can send requests directly to each pod's IP address.
+
+## Configuring target burst capacity
+
+Target burst capacity mainly determines whether the activator is in the request path outside of scale-from-zero scenarios.
+
+Target burst capacity can be configured using a combination of the following parameters:
+
+* Setting the targeted concurrency limits for the revision. For more information, see the documentation on [concurrency](../../serving/autoscaling/concurrency.md).
+* Setting the target utilization parameters. For more information, see the documentation on [target utilization](../../serving/autoscaling/concurrency.md#target-utilization).
+* Setting the target burst capacity [per revision](./target-burst-capacity.md).
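+
+As a sketch, these settings can be combined on a single revision template. This assumes the per-revision annotation names `autoscaling.knative.dev/target` and `autoscaling.knative.dev/targetUtilizationPercentage`; annotation names can differ between Knative versions, so check the concurrency documentation for your release. The Service name and image are placeholders.
+
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: tbc-example   # hypothetical name
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        # Soft concurrency target per pod.
+        autoscaling.knative.dev/target: "10"
+        # Target utilization: the autoscaler aims to keep pods at 70% of the target.
+        autoscaling.knative.dev/targetUtilizationPercentage: "70"
+        # Burst size (in concurrent requests) to absorb without the activator;
+        # if spare capacity is below this, the activator stays in the path.
+        autoscaling.knative.dev/targetBurstCapacity: "100"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go   # placeholder image
+```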
diff --git a/docs/serving/load-balancing/_index.md b/docs/serving/load-balancing/_index.md
@@ -0,0 +1,7 @@
+---
+title: "Load balancing"
+weight: 30
+type: "docs"
+---
+
+{{% readfile file="README.md" %}}
diff --git a/docs/serving/load-balancing/target-burst-capacity.md b/docs/serving/load-balancing/target-burst-capacity.md
@@ -0,0 +1,55 @@
+---
+title: "Configuring target burst capacity"
+linkTitle: "Configuring target burst capacity"
+weight: 50
+type: "docs"
+aliases:
+    - /docs/serving/autoscaling/target-burst-capacity
+---
+
+_Target burst capacity_ determines the size of the traffic burst that a Knative application can handle without buffering.
+If a traffic burst is too large for the application to handle without buffering, the activator is placed in the request path to protect the revision and optimize request load balancing.
+The activator can also quickly spin up additional pods for capacity, and throttle how quickly requests are sent to pods.
+
+You can configure target burst capacity globally, by using the `target-burst-capacity` key in the `config-autoscaler` ConfigMap, or per revision, by using the `autoscaling.knative.dev/targetBurstCapacity` annotation in the revision template:
+
+* **Global key:** `target-burst-capacity`
+* **Per-revision annotation key:** `autoscaling.knative.dev/targetBurstCapacity`
+* **Possible values:** float
+* **Default:** `200`
+
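+**Note:** When the activator is in the request path, it sends requests to each replica until that replica reaches its `containerConcurrency` limit. The activator does not apply target utilization when routing requests; target utilization is currently applied only at the revision level.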
+<!-- TODO: clarify what this note means-->
+
+**Example:**
+{{< tabs name="targetBurstCapacity" default="Per Revision" >}}
+{{% tab name="Per Revision" %}}
+```yaml
+apiVersion: serving.knative.dev/v1
+kind: Service
+metadata:
+  name: s3
+  namespace: default
+spec:
+  template:
+    metadata:
+      annotations:
+        autoscaling.knative.dev/minScale: "2"
+        autoscaling.knative.dev/targetBurstCapacity: "70"
+    spec:
+      containers:
+        - image: gcr.io/knative-samples/helloworld-go   # Example image; replace with your own.
+```
+{{< /tab >}}
+{{< /tabs >}}
+
+- If `autoscaling.knative.dev/targetBurstCapacity` is set to `0`, the activator is added to the request path only when the revision scales from zero, and ingress gateway load balancing is applied.
+
+  **NOTE:** Ingress gateway load balancing requires additional configuration. For more information about load balancing using an ingress gateway, see the [Serving API](../../reference/serving-api.md) documentation.
+  
+- If `autoscaling.knative.dev/targetBurstCapacity` is set to `-1`, the activator is always in the request path, regardless of the revision size.
+
+- If `autoscaling.knative.dev/targetBurstCapacity` is set to any other positive value, the activator may be in the request path, depending on the revision scale and load.
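+
+A simplified model of how scale and load decide the last case, assuming the autoscaler compares the revision's spare capacity with the target burst capacity. This sketch ignores details such as metric averaging windows. `N` is the number of ready pods, `C` the per-pod concurrency capacity, `L` the observed concurrent load, and `TBC` the target burst capacity; all symbols are illustrative.
+
+```latex
+\text{spare} = N \cdot C - L - \mathrm{TBC}, \qquad
+\text{activator in path} \iff \text{spare} < 0
+% Example: N = 3, C = 10, L = 5
+% TBC = 200:  3 \cdot 10 - 5 - 200 = -175 < 0  -> activator in the path
+% TBC = 20:   3 \cdot 10 - 5 - 20  = 5 \ge 0   -> activator not in the path
+```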
+
+<!--Target burst capacity can alternatively be configured globally, by configuring the following settings together:
+
+* Setting the targeted concurrency limits for the revision. For more information, see the documentation on [concurrency](../../serving/autoscaling/concurrency.md).
+* Setting the target utilization parameters. For more information, see the documentation on [target utilization](../../serving/autoscaling/concurrency.md#target-utilization).-->