Merged
41 changes: 40 additions & 1 deletion docs/serving/configuring-autoscaling.md
@@ -496,7 +496,7 @@ This period is an upper bound amount of time the system waits internally for the

* **Global key:** `scale-to-zero-grace-period`
* **Per-revision annotation key:** n/a
* **Possible values:** Duration
* **Possible values:** Duration (must be at least 6s).
* **Default:** `30s`

**Example:**
@@ -526,6 +526,45 @@ spec:
{{< /tab >}}
{{< /tabs >}}


### Scale To Zero Last Pod Retention Period

The `scale-to-zero-pod-retention-period` setting determines the **minimum** amount of time that the last pod remains active after the Autoscaler has decided to scale pods to zero.

This contrasts with the `scale-to-zero-grace-period` setting, which determines the **maximum** amount of time that the last pod remains active after the Autoscaler has decided to scale pods to zero.

* **Global key:** `scale-to-zero-pod-retention-period`
* **Per-revision annotation key:** n/a
* **Possible values:** Non-negative duration string
* **Default:** `0s`

**Example:**
{{< tabs name="scale-to-zero-grace" default="Global (ConfigMap)" >}}
{{% tab name="Global (ConfigMap)" %}}
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  scale-to-zero-pod-retention-period: "42s"
```
{{< /tab >}}
{{% tab name="Global (Operator)" %}}
```yaml
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      scale-to-zero-pod-retention-period: "42s"
```
{{< /tab >}}
{{< /tabs >}}
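
The value constraints listed for the two scale-to-zero keys (a grace period of at least 6s, a non-negative retention period) can be checked before a ConfigMap is applied. The following is a minimal sketch, not Knative's own validation code: `validateScaleToZero` is a hypothetical helper, and it assumes the duration strings follow the syntax accepted by Go's `time.ParseDuration`.

```go
package main

import (
	"fmt"
	"time"
)

// minGracePeriod mirrors the documented "must be at least 6s" constraint
// on scale-to-zero-grace-period.
const minGracePeriod = 6 * time.Second

// validateScaleToZero is a hypothetical helper (not part of Knative) that
// checks both values against the constraints documented in this section.
// Assumption: values use Go duration syntax, for example "30s" or "1m30s".
func validateScaleToZero(gracePeriod, retentionPeriod string) error {
	grace, err := time.ParseDuration(gracePeriod)
	if err != nil {
		return fmt.Errorf("scale-to-zero-grace-period: %w", err)
	}
	if grace < minGracePeriod {
		return fmt.Errorf("scale-to-zero-grace-period must be at least %v, got %v",
			minGracePeriod, grace)
	}

	retention, err := time.ParseDuration(retentionPeriod)
	if err != nil {
		return fmt.Errorf("scale-to-zero-pod-retention-period: %w", err)
	}
	if retention < 0 {
		return fmt.Errorf("scale-to-zero-pod-retention-period must be non-negative, got %v",
			retention)
	}
	return nil
}

func main() {
	// The documented default grace period with the example retention period.
	fmt.Println(validateScaleToZero("30s", "42s"))
	// A grace period below the documented 6s minimum.
	fmt.Println(validateScaleToZero("5s", "0s"))
	// A negative retention period.
	fmt.Println(validateScaleToZero("30s", "-1s"))
}
```

A check like this only covers the documented value ranges. It does not model how the Autoscaler combines the two periods at runtime.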

## Modes: Stable and Panic

The KPA acts on the respective metrics (concurrency or RPS) aggregated over time-based windows. These windows define the amount of historical data the autoscaler takes into account and are used to smooth the data over the specified amount of time. The shorter these windows are, the more quickly the autoscaler reacts, but the more erratic its reactions become.