Merged
4 changes: 2 additions & 2 deletions docs/README.md
@@ -12,7 +12,7 @@ focus on solving mundane but difficult tasks such as:

- [Deploying a container](./serving/getting-started-knative-app.md)
- [Routing and managing traffic with blue/green deployment](./serving/samples/blue-green-deployment.md)
- [Scaling automatically and sizing workloads based on demand](./serving/configuring-autoscaling.md)
- [Scaling automatically and sizing workloads based on demand](./serving/autoscaling)
- [Binding running services to eventing ecosystems](./eventing/samples/kubernetes-event-source/)

Developers on Knative can use familiar idioms, languages, and frameworks to
@@ -88,7 +88,7 @@ Follow the links below to learn more about Knative.

### Samples and demos

- [Autoscaling](./serving/samples/autoscale-go/README.md)
- [Autoscaling](./serving/autoscaling/autoscale-go/)
- [Binding running services to eventing ecosystems](./eventing/samples/kubernetes-event-source/)
- [Telemetry](./serving/samples/telemetry-go/README.md)
- [REST API sample](./serving/samples/rest-api-go/README.md)
6 changes: 2 additions & 4 deletions docs/serving/README.md
@@ -5,7 +5,7 @@ and scales to support advanced scenarios.
The Knative Serving project provides middleware primitives that enable:

- Rapid deployment of serverless containers
- Automatic scaling up and down to zero
- [Automatic scaling up and down to zero](./autoscaling/README.md)
- Routing and network programming for Istio components
- Point-in-time snapshots of deployed code and configurations

@@ -36,7 +36,7 @@ serverless workload behaves on the cluster:
are immutable objects and can be retained for as long as useful. Knative
Serving Revisions can be automatically scaled up and down according to
incoming traffic. See
[Configuring the Autoscaler](./configuring-autoscaling.md) for more
[Configuring the Autoscaler](./autoscaling) for more
information.

![Diagram that displays how the Serving resources coordinate with each other.](https://github.com/knative/serving/raw/master/docs/spec/images/object_model.png)
@@ -83,5 +83,3 @@ in the Knative Serving repository.

See the [Knative Serving Issues](https://github.com/knative/serving/issues) page
for a full list of known issues.


18 changes: 18 additions & 0 deletions docs/serving/autoscaling/README.md
@@ -0,0 +1,18 @@
One of the main features of Knative is automatic scaling of replicas for an application to closely match incoming demand, including scaling applications to zero if no traffic is being received.
Knative Serving enables this by default, using the Knative Pod Autoscaler (KPA).
The Autoscaler component watches traffic flow to the application, and scales replicas up or down based on configured metrics.

Knative services default to using autoscaling settings that are suitable for the majority of use cases. However, some workloads may require a custom, more finely tuned configuration.
This guide provides information about configuration options that you can modify to fit the requirements of your workload.

For more information about how autoscaling for Knative works, see the [Autoscaling concepts](./autoscaling-concepts.md) documentation.

For more information about which metrics can be used to control the Autoscaler, see the [metrics](./autoscaling-metrics.md) documentation.

## Optional autoscaling configuration tasks

* Configure your Knative deployment to use the Kubernetes [Horizontal Pod Autoscaler (HPA)](../../install/any-kubernetes-cluster.md#optional-serving-extensions) instead of the default KPA.
* Disable scale to zero functionality for your cluster ([global configuration only](./scale-to-zero.md)).
* Configure the [type of metrics](./autoscaling-metrics.md) your Autoscaler consumes.
* Configure [concurrency limits](./concurrency.md) for applications.
* Try out the [Go Autoscale Sample App](./autoscale-go/README.md).
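
As a rough illustration of the global scale-to-zero option listed above, the fragment below sets `enable-scale-to-zero` to `"false"` in the `config-autoscaler` ConfigMap. The key and the ConfigMap name are taken from the default ConfigMap shown in the Autoscaling concepts page; treat this as a sketch and follow the linked scale-to-zero page for the actual procedure.

```yaml
# Sketch only: disables scale to zero for the whole cluster.
# Other keys of the ConfigMap are omitted for brevity.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  enable-scale-to-zero: "false"
```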
Member

Do you see doing a similar move for other samples in the future (e.g. traffic splitting samples into the networking subsection)?

Contributor Author

Yep, I'd like to do this with all of them, it just makes more sense to me as a place where someone would look for it. WDYT?

Member

I don't think we've discussed it; I don't have an intrinsic problem with it, but it would be good to have a documented strategy.

10 changes: 10 additions & 0 deletions docs/serving/autoscaling/_index.md
@@ -0,0 +1,10 @@
---
title: "Autoscaling"
linkTitle: "Autoscaling"
weight: 20
type: "docs"
aliases:
- /docs/serving/configuring-autoscaling/
---

{{% readfile file="README.md" %}}
10 changes: 10 additions & 0 deletions docs/serving/autoscaling/autoscale-go/index.md
@@ -0,0 +1,10 @@
---
title: "Autoscale Sample App - Go"
linkTitle: "Autoscale Sample App - Go"
weight: 100
type: "docs"
aliases:
- /docs/serving/samples/autoscale-go
---

{{% readfile file="README.md" %}}
135 changes: 135 additions & 0 deletions docs/serving/autoscaling/autoscaling-concepts.md
@@ -0,0 +1,135 @@
---
title: "Autoscaling concepts"
linkTitle: "Autoscaling concepts"
weight: 01
type: "docs"
---

This section covers conceptual information about which Autoscaler types are supported, as well as fundamental information about how autoscaling is configured.

## Supported Autoscaler types

Knative Serving supports two Autoscaler implementations: the Knative Pod Autoscaler (KPA) and the Kubernetes Horizontal Pod Autoscaler (HPA). The features and limitations of each of these Autoscalers are listed below.

**IMPORTANT:** If you want to use Kubernetes Horizontal Pod Autoscaler (HPA), you must install it after you install [Knative Serving](../../install/any-kubernetes-cluster.md#optional-serving-extensions).

### Knative Pod Autoscaler (KPA)

* Part of the Knative Serving core and enabled by default once Knative Serving is installed.
* Supports scale to zero functionality.
* Does not support CPU-based autoscaling.

### Horizontal Pod Autoscaler (HPA)

* Not part of the Knative Serving core, and must be enabled after [Knative Serving installation](../../install/any-kubernetes-cluster.md#optional-serving-extensions).
* Does not support scale to zero functionality.
* Supports CPU-based autoscaling.

### Configuring the Autoscaler implementation

The type of Autoscaler implementation (KPA or HPA) can be configured by using the `class` annotation.

* **Global settings key:** `pod-autoscaler-class`
* **Per-revision annotation key:** `autoscaling.knative.dev/class`
* **Possible values:** `"kpa.autoscaling.knative.dev"` or `"hpa.autoscaling.knative.dev"`
* **Default:** `"kpa.autoscaling.knative.dev"`

**Example:**
{{< tabs name="class" default="Per Revision" >}}
{{% tab name="Per Revision" %}}
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```
{{< /tab >}}
{{% tab name="Global (ConfigMap)" %}}
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  pod-autoscaler-class: "kpa.autoscaling.knative.dev"
```
{{< /tab >}}
{{% tab name="Global (Operator)" %}}
```yaml
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      pod-autoscaler-class: "kpa.autoscaling.knative.dev"
```
{{< /tab >}}
{{< /tabs >}}

## Global versus per-revision settings
Contributor

This should probably go above the class, since it affects the choices made there as well.

Member

I think this fits here as well -- it provides more detail, but if users are happy with the first choice they found, it should work.

Contributor Author

I think this is why we need to be able to reuse / include content - so we can have a blurb about something in more than one place in the docs but maintain it from one file.


Autoscaling in Knative can be configured using either global or per-revision settings.

1. If no per-revision autoscaling settings are specified, the global settings will be used.
1. If per-revision settings are specified, these will override the global settings when both types of settings exist.
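
The precedence rules above can be sketched with a pair of manifests. In this hypothetical setup, the global concurrency target is 100, while the `helloworld-go` revision template overrides it with a target of 50. The specific values are illustrative assumptions, not recommendations.

```yaml
# Global setting: applies to every revision that does not set its own target.
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "100"
---
# Per-revision setting: overrides the global target for this Service only.
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```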

### Global settings

Global settings for autoscaling are configured using the `config-autoscaler` ConfigMap. If you installed Knative Serving using the Operator, you can instead set the same global settings under `spec.config.autoscaler` in the `KnativeServing` custom resource (CR).

#### Example of the default autoscaling ConfigMap

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "100"
  container-concurrency-target-percentage: "0.7"
  enable-scale-to-zero: "true"
  max-scale-up-rate: "1000"
  max-scale-down-rate: "2"
  panic-window-percentage: "10"
  panic-threshold-percentage: "200"
  scale-to-zero-grace-period: "30s"
  scale-to-zero-pod-retention-period: "0s"
  stable-window: "60s"
  target-burst-capacity: "200"
  requests-per-second-target-default: "200"
```
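
For Operator-based installations, the same keys would sit under `spec.config.autoscaler` of the `KnativeServing` CR rather than in a ConfigMap manifest. The fragment below is a sketch that carries over only two keys from the ConfigMap above, with values copied from that example purely for illustration.

```yaml
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      # Same keys as the ConfigMap data section; values are illustrative.
      stable-window: "60s"
      enable-scale-to-zero: "true"
```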

### Per-revision settings

Per-revision settings for autoscaling are configured by adding _annotations_ to a revision.
Member

Are the annotation keys the same ones as are present in the ConfigMap?

Contributor Author

Nope, they're different afaik, it's explained in the section for each key.


#### Example

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "70"
```

**IMPORTANT:** If you are creating revisions by using a service or configuration, you must set the annotations in the _revision template_ so that any modifications are applied to each revision as it is created.
Setting annotations in the top-level metadata of a single revision does not propagate the changes to other revisions and does not change the autoscaling configuration for your application.
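
To make the distinction concrete, the sketch below marks where an autoscaling annotation takes effect and where it does not. The note above speaks about the top-level metadata of a revision; this example assumes the same applies to the top-level metadata of a Service, and the `target` value is illustrative.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
  # Assumption: autoscaling annotations placed here, in the top-level
  # metadata, are not applied to the revisions created by this Service.
spec:
  template:
    metadata:
      annotations:
        # Revision template: applied to every revision created from it.
        autoscaling.knative.dev/target: "70"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```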
74 changes: 74 additions & 0 deletions docs/serving/autoscaling/autoscaling-metrics.md
@@ -0,0 +1,74 @@
---
title: "Metrics"
linkTitle: "Metrics"
weight: 03
type: "docs"
---

The metric configuration defines which metric type is watched by the Autoscaler.

## Setting metrics per revision

For [per-revision](./autoscaling-concepts.md) configuration, the metric type is set using the `autoscaling.knative.dev/metric` annotation.
The possible metric types that can be configured per revision depend on the type of Autoscaler implementation you are using:

* The default KPA Autoscaler supports the `concurrency` and `rps` metrics.
* The HPA Autoscaler supports the `concurrency`, `rps` and `cpu` metrics.
Member

Can you describe what these options mean?

Member

In particular, how concurrency and RPS differ.

Feel free to defer with a <!-- TODO: ... --> if you don't have bandwidth right now.

Contributor Author

Added a TODO note :) I think I had pretty much the same note previously and removed it because I knew I wouldn't get to it 😂

Member

A TODO lets someone else pick up the item if you don't get back to it. 😁


<!-- TODO: Add details about different metrics types, how concurrency and rps differ. Explain cpu. -->

For more information about KPA and HPA, see the documentation on [Supported Autoscaler types](./autoscaling-concepts.md).

* **Per-revision annotation key:** `autoscaling.knative.dev/metric`
* **Possible values:** `"concurrency"`, `"rps"` or `"cpu"`, depending on your Autoscaler type. The `cpu` metric is only supported on revisions with the HPA class.
* **Default:** `"concurrency"`

{{< tabs name="Examples of configuring metric types per revision" default="Per-revision concurrency configuration" >}}
{{% tab name="Per-revision concurrency configuration" %}}
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "concurrency"
```
{{< /tab >}}
{{% tab name="Per-revision rps configuration" %}}
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "rps"
```
{{< /tab >}}
{{% tab name="Per-revision cpu configuration" %}}
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "cpu"
```
{{< /tab >}}
{{< /tabs >}}
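
Because the `cpu` metric is only supported on revisions with the HPA class, a revision that uses it would typically set both annotations together. The following is a minimal sketch that combines the `class` annotation from the Autoscaling concepts page with the `metric` annotation described here. It assumes the HPA extension is already installed on the cluster.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        # Select the HPA implementation, which the cpu metric requires.
        autoscaling.knative.dev/class: "hpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "cpu"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```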

## Next steps

* Configure [concurrency targets](./concurrency.md) for applications
* Configure [requests per second targets](./rps-target.md) for replicas of an application
81 changes: 81 additions & 0 deletions docs/serving/autoscaling/autoscaling-targets.md
@@ -0,0 +1,81 @@
---
title: "Targets"
linkTitle: "Targets"
weight: 04
type: "docs"
---

Configuring a target provides the Autoscaler with a value that it tries to maintain for the configured metric for a revision.
See the [metrics](./autoscaling-metrics.md) documentation for more information about configurable metric types.

The `target` annotation, used to configure per-revision targets, is _metric agnostic_. This means the target is simply an integer value, which can be applied for any metric type.

## Configuring targets

* **Global settings key:** `container-concurrency-target-default` for setting a concurrency target, and `requests-per-second-target-default` for setting a requests-per-second (RPS) target. For more information, see the documentation on [metrics](./autoscaling-metrics.md).
* **Per-revision annotation key:** `autoscaling.knative.dev/target`
* **Possible values:** An integer (metric agnostic).
* **Default:** `"100"` for `container-concurrency-target-default`, and `"200"` for `requests-per-second-target-default`. There is no default value set for the `target` annotation.

{{< tabs name="Configuring targets" default="Target annotation - Per-revision" >}}
{{% tab name="Target annotation - Per-revision" %}}
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/target: "50"
```
{{< /tab >}}
{{% tab name="Concurrency target - Global (ConfigMap)" %}}
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  container-concurrency-target-default: "200"
```
{{< /tab >}}
{{% tab name="Concurrency target - Global (Operator)" %}}
```yaml
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      container-concurrency-target-default: "200"
```
{{< /tab >}}
{{% tab name="Requests per second (RPS) target - Global (ConfigMap)" %}}
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-autoscaler
  namespace: knative-serving
data:
  requests-per-second-target-default: "150"
```
{{< /tab >}}
{{% tab name="Requests per second (RPS) target - Global (Operator)" %}}
```yaml
apiVersion: operator.knative.dev/v1alpha1
kind: KnativeServing
metadata:
  name: knative-serving
spec:
  config:
    autoscaler:
      requests-per-second-target-default: "150"
```
{{< /tab >}}
{{< /tabs >}}
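
Because the `target` annotation is metric agnostic, its meaning depends on the metric configured for the revision. As a sketch, the revision below pairs the `rps` metric with a target of `150`, so the Autoscaler would aim for roughly 150 requests per second per replica. The value is an illustrative assumption.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: helloworld-go
  namespace: default
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "rps"
        # Interpreted as requests per second because the metric is "rps".
        autoscaling.knative.dev/target: "150"
    spec:
      containers:
        - image: gcr.io/knative-samples/helloworld-go
```

Without the `metric` annotation, the same `target` value would be read against the default `concurrency` metric instead.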