Merged
28 commits
61bea4f
Initial commit.
greg-szrama Apr 16, 2024
c9a920a
Updated throughput threshold details.
greg-szrama Apr 18, 2024
d00b579
Clarified language on throughput limits and replenishment of the shar…
greg-szrama Apr 18, 2024
6693b4b
Updated introductory paragraph.
greg-szrama Apr 18, 2024
988df3b
Update modules/reference/pages/tunable-properties.adoc
greg-szrama Apr 19, 2024
53262a2
Update modules/manage/pages/cluster-maintenance/manage-throughput.adoc
Feediver1 Apr 24, 2024
8aa422c
Updated per reviewer feedback.
greg-szrama Apr 24, 2024
44bdb8f
Update modules/reference/pages/tunable-properties.adoc
greg-szrama Apr 25, 2024
96084b5
Update modules/reference/pages/tunable-properties.adoc
greg-szrama Apr 25, 2024
fe2f26c
Small language update
greg-szrama Apr 25, 2024
6ec99bd
Slight restructuring per reviewer feedback.
greg-szrama Apr 25, 2024
9c4aada
Update modules/reference/pages/tunable-properties.adoc
greg-szrama Apr 25, 2024
68f9b77
Adding link.
greg-szrama Apr 25, 2024
fe91d58
Update modules/reference/pages/tunable-properties.adoc
greg-szrama Apr 26, 2024
05292dd
Update modules/manage/pages/cluster-maintenance/manage-throughput.adoc
greg-szrama Apr 26, 2024
959a899
Update modules/reference/pages/internal-metrics-reference.adoc
greg-szrama Apr 26, 2024
069a478
Resolving reviewer feedback.
greg-szrama Apr 26, 2024
014c081
Resolving reviewer feedback.
greg-szrama Apr 26, 2024
23c8b3f
Resolving reviewer feedback.
greg-szrama Apr 26, 2024
65d5efa
Resolving reviewer feedback.
greg-szrama Apr 26, 2024
3c37f48
Resolving reviewer feedback.
greg-szrama Apr 26, 2024
18ee204
Updated descriptions of `kafka_throughput_control` property
greg-szrama Apr 29, 2024
beada0f
Added missing tick mark and updated wording for `kafka_throughput_con…
greg-szrama Apr 29, 2024
b68d26a
Apply suggestions from code review
Deflaimun Apr 30, 2024
2e01e0d
Update modules/reference/pages/cluster-properties.adoc
Deflaimun Apr 30, 2024
29ff111
Update modules/manage/pages/cluster-maintenance/manage-throughput.adoc
Feediver1 Apr 30, 2024
644f1c0
Update modules/manage/pages/cluster-maintenance/manage-throughput.adoc
Feediver1 Apr 30, 2024
670715a
Update modules/manage/pages/cluster-maintenance/manage-throughput.adoc
Feediver1 Apr 30, 2024
29 changes: 11 additions & 18 deletions modules/manage/pages/cluster-maintenance/manage-throughput.adoc
@@ -2,17 +2,17 @@
:description: Manage the throughput of Kafka traffic with configurable properties.
:page-categories: Management, Networking

Manage the throughput of Kafka traffic at the cluster level, with configurable properties that limit and protect the use of disk and network resources for individual brokers and for an entire cluster. Set broker-wide throughput limits for Kafka API traffic.
Redpanda supports throttling ingress and egress throughput independently, configurable at both the broker and client levels, to prevent clients from making unbounded use of a broker's network and disk resources. Broker-wide limits apply to all clients connected to the broker and restrict the broker's total traffic. Client limits apply to a set of clients identified by their `client_id`, and help prevent one set of clients from starving other clients of the same broker.

== Broker-wide throughput limits
== Throughput throttling enforcement

The network bandwidth and disk utilization of brokers may be overloaded by clients that produce or consume throughput without limits. To prevent resource overloading caused by unconstrained throughput and to configure back pressure, Redpanda provides runtime-configurable properties that limit and balance throughput of Kafka API traffic.
Throughput limits are enforced by applying backpressure to clients. When a connection is in breach of the throughput limit, the throttler advises the client about the delay (throttle time) that would bring the rate back to the allowed level. Redpanda starts by adding a `throttle_time_ms` field to responses. If that isn't honored, delays are inserted on the connection's next read operation. The throttling delay may not exceed the limit set by xref:reference:tunable-properties.adoc#max_kafka_throttle_delay_ms[`max_kafka_throttle_delay_ms`].
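As a rough model of this advisory mechanism (illustrative only; the function name and formula are assumptions, not Redpanda internals), the advertised throttle time can be derived from the observed rate and the configured limit:

```python
# Hypothetical sketch of deriving a throttle delay when a connection
# exceeds its allowed rate. Not Redpanda's actual implementation.

def throttle_time_ms(bytes_sent: int, window_ms: int, limit_bps: int,
                     max_delay_ms: int) -> int:
    """Delay that would bring the observed rate back down to limit_bps."""
    observed_bps = bytes_sent * 1000 / window_ms
    if observed_bps <= limit_bps:
        return 0
    # Time the same bytes *should* have taken at the limit,
    # minus the time that actually elapsed.
    ideal_ms = bytes_sent * 1000 / limit_bps
    delay = ideal_ms - window_ms
    # Capped, as max_kafka_throttle_delay_ms caps Redpanda's delay.
    return min(int(delay), max_delay_ms)
```

A delay computed this way is first advertised to the client in `throttle_time_ms`, and only enforced on the connection if the client ignores it.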

To manage the volume of traffic going through a broker, Redpanda implements throughput quotas on the ingress and egress sides of every broker. The throughput quota accounts for all Kafka API traffic going in or out of a broker, with the value of quota representing the allowed rate of data passing through in one direction. When a connection is in breach of the quota, the throttler advises the client about the delay (throttle time) that would bring the rate back to the allowed level, and it implements that delay before handling Kafka API requests. To control the quotas, Redpanda provides configurable rate limits for total ingress and egress traffic through a broker.
== Broker-wide throughput limits

With Redpanda's xref:get-started:architecture.adoc#thread-per-core-model[thread-per-core model], the Kafka API traffic to and from a client connection is processed by a single core (shard). In order to manage throughput quotas efficiently, broker quotas are distributed between shards, and each per-shard quota is in turn shared by all connections served by the shard. Splitting broker quota optimally between shards is done behind the scenes by the quota balancer component.
Broker-wide throughput limits account for all Kafka API traffic going into or out of the broker. The limit values represent the allowed rate of data, in bytes per second, passing through in each direction. Redpanda also lets administrators exclude clients from throughput throttling and fine-tune which Kafka request types are subject to throttling limits.

To distribute the broker throughput quota, the balancer periodically monitors the throughput rate of a broker's shards, and it distributes more quota to the shards that can make better use of it than the others. Each shard has a minimum throughput quota value, which is configurable both as a percentage of the default quota and as an absolute rate limit.
=== Broker-wide throughput limit properties

The properties for broker-wide throughput quota balancing are configured at the cluster level, for all brokers in a cluster:

@@ -25,27 +25,20 @@ The properties for broker-wide throughput quota balancing are configured at the
| xref:reference:cluster-properties.adoc#kafka_throughput_limit_node_out_bps[kafka_throughput_limit_node_out_bps]
| A broker's total throughput limit for egress Kafka traffic.

| xref:reference:cluster-properties.adoc#kafka_quota_balancer_node_period_ms[kafka_quota_balancer_node_period_ms]
| The period at which the quota balancer runs to balance throughput quota between a broker's shards.

| xref:reference:cluster-properties.adoc#kafka_quota_balancer_min_shard_throughput_ratio[kafka_quota_balancer_min_shard_throughput_ratio]
| The lowest value of the throughput quota a shard can get in the process of quota balancing, expressed as a ratio of the default shard quota. If set as `0`, there is no minimum, and if set as `1`, no quota can be taken away by the balancer.
| xref:reference:cluster-properties.adoc#kafka_throughput_control[kafka_throughput_control]
| A list of clients that are exempt from broker-wide throughput limits.

| xref:reference:cluster-properties.adoc#kafka_quota_balancer_min_shard_throughput_bps[kafka_quota_balancer_min_shard_throughput_bps]
| The lowest value of the throughput quota a shard can get in the process of quota balancing, in bytes per second. If set as `0`, there is no minimum.
| xref:reference:cluster-properties.adoc#kafka_throughput_controlled_api_keys[kafka_throughput_controlled_api_keys]
| List of Kafka request types subject to broker-wide throughput limits; defaults to `produce` and `fetch`.

| xref:reference:tunable-properties.adoc#max_kafka_throttle_delay_ms[max_kafka_throttle_delay_ms]
| The maximum delay inserted in the data path of Kafka API requests to throttle them down. Set this lower than the Kafka client timeout so that the inserted delay alone cannot cause a client timeout.

| xref:reference:cluster-properties.adoc#kafka_quota_balancer_window_ms[kafka_quota_balancer_window_ms]
| The time window the balancer uses to average the current throughput measurement.
|===

[NOTE]
====
* By default, both `kafka_throughput_limit_node_in_bps` and `kafka_throughput_limit_node_out_bps` are disabled, no throughput limits are applied. You must manually set them to enable quota balancing with throughput limits.
* `kafka_quota_balancer_min_shard_throughput_bps` doesn't override the `kafka_throughput_limit_node_in_bps` and `kafka_throughput_limit_node_out_bps` limit settings. Consequently, the value of
`kafka_throughput_limit_node_in_bps` or `kafka_throughput_limit_node_out_bps` can result in lesser throughput than `kafka_quota_balancer_min_shard_throughput_bps`.
* By default, both `kafka_throughput_limit_node_in_bps` and `kafka_throughput_limit_node_out_bps` are disabled, and no throughput limits are applied. You must manually set them to enable throughput throttling.
====
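For example, broker-wide limits can be set at runtime with `rpk` (the byte-rate values below are placeholders; choose limits appropriate to your network and disk capacity):

```shell
# Cap each broker at ~100 MiB/s ingress and ~200 MiB/s egress.
# Values are illustrative; tune them for your hardware.
rpk cluster config set kafka_throughput_limit_node_in_bps 104857600
rpk cluster config set kafka_throughput_limit_node_out_bps 209715200

# Verify the new values.
rpk cluster config get kafka_throughput_limit_node_in_bps
```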

== Client throughput limits
114 changes: 32 additions & 82 deletions modules/reference/pages/cluster-properties.adoc
@@ -899,127 +899,76 @@ Maximum latency threshold for Kafka queue depth control depth tracking.

---

=== kafka_quota_balancer_node_period_ms

The period at which the intra-node throughput quota balancer runs.

It may take longer for the balancer to complete a single balancing step than the period this property specifies, so the actual period may be more than configured here.

If `0`, the balancer is disabled and all throughput quotas are immutable.

*Units*: milliseconds

*Default*: 750

*Range*: [0, ]
=== kafka_rpc_server_tcp_recv_buf

*Restart required*: no
Size of the Kafka server TCP receive buffer. If `null`, the property is disabled.

*Related topics*:
*Units*: bytes

* xref:manage:cluster-maintenance/manage-throughput.adoc#node-wide-throughput-limits[Node-wide throughput limits]
*Default*: null

*Supported versions*: Redpanda v23.1 or later
*Range*: [32 KiB, ...], aligned to 4096 bytes

---

=== kafka_quota_balancer_min_shard_throughput_ratio

The minimum value of the throughput quota a shard can get in the process of quota balancing, expressed as a ratio of default shard quota. While the value applies equally to ingress and egress traffic, the default shard quota can be different for ingress and egress and therefore result in different minimum throughput bytes-per-second (bps) values.

Both `kafka_quota_balancer_min_shard_throughput_ratio` and <<kafka_quota_balancer_min_shard_throughput_bps,kafka_quota_balancer_min_shard_throughput_bps>> can be specified at the same time. In this case, the balancer will not decrease the effective shard quota below the largest bps value of each of these two properties.

If set to `0.0`, the minimum is disabled. If set to `1.0`, then the balancer won't be able to rebalance quota without violating this ratio, consequently precluding the balancer from adjusting shards' quotas.

*Type*: double

*Units*: ratio of default shard quota

*Default*: 0.01

*Range*: [0.0, 1.0]
=== kafka_rpc_server_tcp_send_buf

*Restart required*: no
Size of the Kafka server TCP transmit buffer. If `null`, the property is disabled.

*Related topics*:
*Units*: bytes

* xref:manage:cluster-maintenance/manage-throughput.adoc#node-wide-throughput-limits[Node-wide throughput limits]
*Default*: null

*Supported versions*: Redpanda v23.1 or later
*Range*: [32 KiB, ...], aligned to 4096 bytes

---

=== kafka_quota_balancer_min_shard_throughput_bps
[[kafka_throughput_control]]
=== kafka_throughput_control

The minimum value of the throughput quota a shard can get in the process of quota balancing, expressed in bytes per second. The value applies equally to ingress and egress traffic.
List of throughput control groups that define exclusions from node-wide throughput limits. Clients excluded from node-wide throughput limits are still potentially subject to client-specific throughput limits.

kafka_quota_balancer_min_shard_throughput_bps doesn't override the limit settings, <<kafka_throughput_limit_node_in_bps,kafka_throughput_limit_node_in_bps>> and <<kafka_throughput_limit_node_out_bps,kafka_throughput_limit_node_out_bps>>. Consequently, the value of
`kafka_throughput_limit_node_in_bps` or `kafka_throughput_limit_node_out_bps` can result in lesser throughput than kafka_quota_balancer_min_shard_throughput_bps.
Each throughput control group consists of:

Both <<kafka_quota_balancer_min_shard_throughput_ratio,kafka_quota_balancer_min_shard_throughput_ratio>> and kafka_quota_balancer_min_shard_throughput_bps can be specified at the same time. In this case, the balancer will not decrease the effective shard quota below the largest bps value of each of these two properties.
* `name` (optional) - any unique group name
* `client_id` - regex to match client_id

If set to `0`, no minimum is enforced.
Example values:

*Units*: bytes per second
* `[{'name': 'first_group','client_id': 'client1'}, {'client_id': 'consumer-\d+'}]`
* `[{'name': 'catch all'}]`
* `[{'name': 'missing_id', 'client_id': '+empty'}]`

A connection is assigned the first matching group and is then excluded from throughput control. A `name` is not required, but can help you categorize the exclusions. Specifying `+empty` for the `client_id` matches clients that opt not to send a `client_id`. You can also omit the `client_id` and specify only a `name`, as shown. In that case, all clients match the rule, and Redpanda excludes them all from node-wide throughput control.
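The first-match semantics described above can be sketched as follows (a hypothetical model: the group format mirrors `kafka_throughput_control`, but the matching function and the use of full-string regex matching are assumptions):

```python
import re
from typing import Optional

# Hypothetical model of throughput control group matching; the group
# format mirrors kafka_throughput_control, but the matching logic here
# is illustrative, not Redpanda's implementation.
def matching_group(groups, client_id: Optional[str]):
    """Return the first group that matches client_id, or None."""
    for group in groups:
        pattern = group.get("client_id")
        if pattern is None:
            return group                      # no client_id: matches everyone
        if pattern == "+empty":
            if client_id is None:             # client sent no client_id
                return group
        elif client_id is not None and re.fullmatch(pattern, client_id):
            return group
    return None

groups = [
    {"name": "first_group", "client_id": "client1"},
    {"client_id": r"consumer-\d+"},
]
```

A client with ID `consumer-42` would fall into the second (unnamed) group; a client matching no group remains subject to node-wide limits.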

*Default*: 256
*Type*: list of control groups of the format `{'name' : 'group name', 'client_id' : 'regex pattern'}`

*Range*: [0, ...]
*Default*: `[]`

*Restart required*: no

*Related topics*:

* xref:manage:cluster-maintenance/manage-throughput.adoc#node-wide-throughput-limits[Node-wide throughput limits]

*Supported versions*: Redpanda v23.1 or later
* xref:manage:cluster-maintenance/manage-throughput.adoc[Manage throughput]

---

=== kafka_quota_balancer_window_ms
[[kafka_throughput_controlled_api_keys]]
=== kafka_throughput_controlled_api_keys

Time window used to average the current throughput measurement for the quota balancer.
List of Kafka request types subject to broker-wide throughput limits.

*Units*: milliseconds
*Type*: list<string>

*Default*: 5000

*Range*: [1, ...]
*Default*: `["produce", "fetch"]`

*Restart required*: no

*Related topics*:

* xref:manage:cluster-maintenance/manage-throughput.adoc#node-wide-throughput-limits[Node-wide throughput limits]

*Supported versions*: Redpanda v23.1 or later

---

=== kafka_rpc_server_tcp_recv_buf

Size of the Kafka server TCP receive buffer. If `null`, the property is disabled.

*Units*: bytes

*Default*: null

*Range*: [32 KiB, ...], aligned to 4096 bytes

---

=== kafka_rpc_server_tcp_send_buf

Size of the Kafka server TCP transmit buffer. If `null`, the property is disabled.

*Units*: bytes

*Default*: null

*Range*: [32 KiB, ...], aligned to 4096 bytes

---

[[kafka_throughput_limit_node_in_bps]]
=== kafka_throughput_limit_node_in_bps

The maximum rate of all ingress Kafka API traffic for a node. Includes all Kafka API traffic (requests, responses, headers, fetched data, produced data, etc.).
@@ -1042,6 +991,7 @@ If `null`, the property is disabled, and traffic is not limited.

---

[[kafka_throughput_limit_node_out_bps]]
=== kafka_throughput_limit_node_out_bps

The maximum rate of all egress Kafka traffic for a node. Includes all Kafka API traffic (requests, responses, headers, fetched data, produced data, etc.).
40 changes: 40 additions & 0 deletions modules/reference/pages/internal-metrics-reference.adoc
@@ -71,6 +71,46 @@ Can indicate latency caused by disk operations.

---

=== vectorized_kafka_quotas_balancer_runs

Number of times the throughput quota balancer has executed.

*Type*: counter

---

=== vectorized_kafka_quotas_quota_effective

Current effective quota for the quota balancer, in bytes per second.

*Type*: counter

---

=== vectorized_kafka_quotas_throttle_time

Histogram of throttle times, in seconds.

*Type*: histogram

---

=== vectorized_kafka_quotas_traffic_intake

Total amount of Kafka traffic (in bytes) received from clients that was considered by the throttler.

*Type*: counter

---

=== vectorized_kafka_quotas_traffic_egress

Total amount of Kafka traffic (in bytes) published to clients that was considered by the throttler.

*Type*: counter

---

=== vectorized_kafka_rpc_active_connections

Number of currently active Kafka RPC connections, or clients.
37 changes: 37 additions & 0 deletions modules/reference/pages/tunable-properties.adoc
@@ -1373,6 +1373,42 @@ Maximum size of the user-space receive buffer. If `null`, this limit is not applied.

---

[[kafka_throughput_throttling_v2]]
=== kafka_throughput_throttling_v2

Enables an updated algorithm for enforcing node throughput limits based on a shared token bucket, introduced with Redpanda v23.3.8. Set this property to `false` if you need to use the quota balancing algorithm from Redpanda v23.3.7 and older. This property defaults to `true` for all new or upgraded Redpanda clusters.

*Type*: boolean

*Default*: true

*Restart required*: no

WARNING: Disabling this property is not recommended. It causes your Redpanda cluster to use an outdated throughput throttling mechanism. Only set this to `false` when advised to do so by Redpanda support.
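A shared token bucket of the kind this algorithm uses can be modeled as below (an illustrative sketch; class and field names are assumptions, not Redpanda internals):

```python
# Illustrative token-bucket model of node-wide throughput enforcement.
# Names and structure are assumptions for the sketch, not Redpanda's code.
class TokenBucket:
    def __init__(self, rate_bps: float, burst_bytes: float, now: float = 0.0):
        self.rate = rate_bps            # replenish rate, bytes per second
        self.capacity = burst_bytes     # maximum tokens the bucket can hold
        self.tokens = burst_bytes       # start full
        self.last = now

    def try_consume(self, nbytes: float, now: float) -> bool:
        # Replenish tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False                    # caller should throttle the client
```

A request that cannot consume enough tokens is throttled rather than rejected, which is how backpressure surfaces to clients as `throttle_time_ms`.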

---

[[kafka_throughput_replenish_threshold]]
=== kafka_throughput_replenish_threshold

Threshold for refilling the token bucket as part of enforcing throughput limits. This only applies when xref:kafka_throughput_throttling_v2[] is `true`.

This threshold is evaluated with each request for data. When the number of tokens to replenish exceeds the threshold, the tokens are added to the token bucket. This avoids an atomic update of the token count on every request. The threshold's range is automatically clamped to the corresponding throughput limit for ingress and egress.
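The batching effect of the threshold can be illustrated with a toy model (hypothetical names; the real replenishment path is a Redpanda internal):

```python
# Illustrative model of threshold-batched replenishment: pending tokens
# accumulate locally and are flushed to the shared bucket only once they
# exceed the threshold, reducing contended (atomic) updates.
class BatchedReplenisher:
    def __init__(self, threshold: int):
        self.threshold = threshold
        self.pending = 0
        self.bucket = 0        # stands in for the shared atomic counter
        self.flushes = 0       # how often the shared state was touched

    def replenish(self, tokens: int) -> None:
        self.pending += tokens
        if self.pending > self.threshold:
            self.bucket += self.pending   # single "atomic" update
            self.pending = 0
            self.flushes += 1
```

A higher threshold means fewer updates to the shared counter at the cost of slightly coarser accounting.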

*Type*: signed 64-bit integer

*Default*: 1

*Range*: For ingress, [1, xref:reference:cluster-properties.adoc#kafka_throughput_limit_node_in_bps[`kafka_throughput_limit_node_in_bps`]]. For egress, [1, xref:reference:cluster-properties.adoc#kafka_throughput_limit_node_out_bps[`kafka_throughput_limit_node_out_bps`]]

*Restart required*: no

*Related topics*:

* xref:manage:cluster-maintenance/manage-throughput.adoc[Manage throughput]

---

=== legacy_group_offset_retention_enabled

With group offset retention enabled by default starting in Redpanda version 23.1, this flag enables group offset retention for deployments of Redpanda upgraded from earlier versions.
@@ -1401,6 +1437,7 @@ For Redpanda versions *earlier than 23.1*:

---

[[max_kafka_throttle_delay_ms]]
=== max_kafka_throttle_delay_ms

The maximum delay inserted in the data path of Kafka API requests to throttle them down. Set this lower than the Kafka client timeout so that the inserted delay alone cannot cause a client timeout.
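For example, if your clients use a `request.timeout.ms` of 30000, a cap below that value keeps throttling from triggering timeouts on its own (the value here is illustrative):

```shell
# Keep the maximum throttle delay below the clients' request timeout
# (e.g., clients configured with request.timeout.ms=30000).
rpk cluster config set max_kafka_throttle_delay_ms 25000
```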