docs(clients): updates the tuning content for producers and consumers (#9260)

Signed-off-by: prmellor <pmellor@redhat.com>
PaulRMellor committed Feb 22, 2024
1 parent 4ea83ce commit 6975ee3
Showing 4 changed files with 120 additions and 55 deletions.
23 changes: 20 additions & 3 deletions documentation/assemblies/security/assembly-securing-access.adoc
@@ -7,15 +7,32 @@

[role="_abstract"]
Secure your Kafka cluster by managing the access a client has to Kafka brokers.
Specify configuration options to secure Kafka brokers and clients.

A secure connection between Kafka brokers and clients can encompass the following:

* Encryption for data exchange
* Authentication to prove identity
* Authorization to allow or decline actions executed by users

The authentication and authorization mechanisms specified for a client must match those specified for the Kafka brokers.
In Strimzi, securing a connection involves configuring listeners and user accounts:

Listener configuration:: Use the `Kafka` resource to configure listeners for client connections to Kafka brokers.
Listeners define how clients authenticate, such as using mTLS, SCRAM-SHA-512, OAuth 2.0, or custom authentication methods.
To enhance security, configure TLS encryption to secure communication between Kafka brokers and clients.
You can further secure TLS-based communication by specifying the supported TLS versions and cipher suites in the Kafka broker configuration.
+
For an added layer of protection, use the `Kafka` resource to specify authorization methods for the Kafka cluster, such as simple, OAuth 2.0, OPA, or custom authorization.

User accounts:: Set up user accounts and credentials with `KafkaUser` resources in Strimzi.
Users represent your clients and determine how they authenticate with the Kafka cluster and what actions they are authorized to perform.
The authentication and authorization mechanisms specified in the user configuration must match the Kafka configuration.
Additionally, define Access Control Lists (ACLs) to control user access to specific topics and actions for more fine-grained authorization.
To further enhance security, specify user quotas to limit client access to Kafka brokers based on byte rates or CPU utilization.
+
You can also add producer or consumer configuration to your clients if you wish to limit the TLS versions and cipher suites they use.
The client configuration must use only protocols and cipher suites that are enabled on the broker.
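For example, a client that restricts itself to TLSv1.3 with a single cipher suite might set the standard Kafka client properties shown in this sketch; the values are illustrative and must be enabled on the brokers:

[source,env]
----
# Illustrative client settings to restrict the TLS version and cipher suite.
# Use only protocols and cipher suites that are enabled on the brokers.
ssl.enabled.protocols=TLSv1.3
ssl.cipher.suites=TLS_AES_256_GCM_SHA384
----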

NOTE: If you are using OAuth 2.0 to manage client access, user authentication and authorization credentials are managed through the authorization server.

Strimzi operators automate the configuration process and create the certificates required for authentication.
The Cluster Operator automatically sets up TLS certificates for data encryption and authentication within your cluster.

86 changes: 64 additions & 22 deletions documentation/modules/managing/con-consumer-config-properties.adoc
@@ -6,11 +6,25 @@
= Kafka consumer configuration tuning

[role="_abstract"]
Use a basic consumer configuration with optional properties that are tailored to specific use cases.

Use configuration properties to optimize the performance of Kafka consumers.
When tuning your consumers, your primary concern is ensuring that they cope efficiently with the amount of data ingested.
As with producer tuning, be prepared to make incremental changes until the consumers operate as expected.

When tuning a consumer, consider the following aspects carefully, as they significantly impact its performance and behavior:

Scaling:: Consumer groups enable parallel processing of messages by distributing the load across multiple consumers, enhancing scalability and throughput.
The number of topic partitions determines the maximum level of parallelism that you can achieve, as one partition can only be assigned to one consumer in a consumer group.
Message ordering::
If absolute ordering within a topic is important, use a single-partition topic.
A consumer observes messages in a single partition in the same order that they were committed to the broker, which means that Kafka only provides ordering guarantees for messages in a single partition.
It is also possible to maintain message ordering for events specific to individual entities, such as users.
If a new entity is created, you can create a new topic dedicated to that entity.
You can use a unique ID, like a user ID, as the message key and route all messages with the same key to a single partition within the topic.
Offset reset policy:: Setting the appropriate offset reset policy ensures that the consumer consumes messages from the desired starting point.
The default value is `latest`, which starts at the end of the partition and can mean that some messages are missed, depending on the consumer's behavior and the state of the partition.
Setting `auto.offset.reset` to `earliest` ensures that, when connecting with a new `group.id`, all messages are retrieved from the beginning of the log (see the sketch after this list).
Securing access:: Implement security measures for authentication, encryption, and authorization by setting up user accounts to xref:assembly-securing-access-{context}[manage secure access to Kafka].
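For example, to retrieve all messages from the beginning of the log when a consumer connects with a new `group.id`, you might set the reset policy as in this minimal sketch:

[source,env]
----
# Start from the beginning of the partition when no committed offset exists for the group
auto.offset.reset=earliest
----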

== Basic consumer configuration

Connection and deserializer properties are required for every consumer.
@@ -55,6 +69,7 @@
but it does mean that there are consumers on standby should one stop functioning
If you can meet throughput goals with fewer consumers, you save on resources.

Consumers within the same consumer group send offset commits and heartbeats to the same broker.
The consumer sends heartbeats to the Kafka broker to indicate its activity within the consumer group.
So the greater the number of consumers in the group, the higher the request load on the broker.

[source,env]
@@ -65,6 +80,30 @@
group.id=my-group-id <1>
----
<1> Add a consumer to a consumer group using a group id.

== Choosing the right partition assignment strategy

Select an appropriate partition assignment strategy, which determines how Kafka topic partitions are distributed among consumer instances in a group.

Partition assignment strategies are implemented by the following classes:

* `org.apache.kafka.clients.consumer.RangeAssignor`
* `org.apache.kafka.clients.consumer.RoundRobinAssignor`
* `org.apache.kafka.clients.consumer.StickyAssignor`
* `org.apache.kafka.clients.consumer.CooperativeStickyAssignor`

Specify a class using the `partition.assignment.strategy` consumer configuration property.
The *range* assignment strategy assigns a range of partitions to each consumer, and is useful when you want to process related data together.

Alternatively, opt for a *round robin* assignment strategy for equal partition distribution among consumers, which is ideal for high-throughput scenarios requiring parallel processing.

For more stable partition assignments, consider the *sticky* and *cooperative sticky* strategies.
Sticky strategies aim to maintain assigned partitions during rebalances, when possible.
If a consumer was previously assigned certain partitions, the sticky strategies prioritize retaining those same partitions with the same consumer after a rebalance, while only revoking and reassigning the partitions that are actually moved to another consumer.
Leaving partition assignments in place reduces the overhead on partition movements.
The cooperative sticky strategy also supports cooperative rebalances, enabling uninterrupted consumption from partitions that are not reassigned.

If none of the available strategies suit your data, you can create a custom strategy tailored to your specific requirements.
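For example, to opt into cooperative sticky assignment, you might set the assignor class as in this sketch:

[source,env]
----
# Use cooperative sticky assignment to minimize partition movement during rebalances
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
----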

== Message ordering guarantees

Kafka brokers receive fetch requests from consumers that ask the broker to send messages from a list of topics, partitions and offset positions.
@@ -193,17 +232,22 @@

== Recovering from failure to avoid data loss

Use the `session.timeout.ms` and `heartbeat.interval.ms` properties to configure the time taken to check and recover from consumer failure within a consumer group.
In the event of failures within a consumer group, Kafka provides a rebalance protocol designed for effective detection and recovery.
To minimize the potential impact of these failures, one key strategy is to adjust the `max.poll.records` property to balance efficient processing with system stability.
This property determines the maximum number of records a consumer can fetch in a single poll.
Fine-tuning `max.poll.records` helps to maintain a controlled consumption rate, preventing the consumer from overwhelming itself or the Kafka broker.

The `session.timeout.ms` property specifies the maximum amount of time in milliseconds a consumer within a consumer group can be out of contact with a broker before being considered inactive and a _rebalancing_ is triggered between the active consumers in the group.
When the group rebalances, the partitions are reassigned to the members of the group.
Additionally, Kafka offers advanced configuration properties like `session.timeout.ms` and `heartbeat.interval.ms`.
These settings are typically reserved for more specialized use cases and may not require adjustment in standard scenarios.

The `heartbeat.interval.ms` property specifies the interval in milliseconds between _heartbeat_ checks to the consumer group coordinator to indicate that the consumer is active and connected.
The heartbeat interval must be lower, usually by a third, than the session timeout interval.

If you set the `session.timeout.ms` property lower, failing consumers are detected earlier, and rebalancing can take place quicker.
However, take care not to set the timeout so low that the broker fails to receive a heartbeat in time and triggers an unnecessary rebalance.
The `session.timeout.ms` property specifies the maximum amount of time a consumer can go without sending a heartbeat to the Kafka broker to indicate it is active within the consumer group.
If a consumer fails to send a heartbeat within the session timeout, it is considered inactive.
A consumer marked as inactive triggers a rebalancing of the partitions for the topic.
Setting the `session.timeout.ms` property too low can cause active consumers to be falsely marked as failed, while setting it too high can delay recovery from genuine failures.

The `heartbeat.interval.ms` property determines how frequently a consumer sends heartbeats to the Kafka broker.
A shorter interval between consecutive heartbeats allows for quicker detection of consumer failures.
The heartbeat interval must be lower than the session timeout, and is typically set to a third of the session timeout value.
Decreasing the heartbeat interval reduces the chance of accidental rebalancing, but more frequent heartbeats increase the overhead on broker resources.
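The following sketch shows illustrative values that keep the heartbeat interval at roughly a third of the session timeout and cap the records returned per poll; the defaults are usually sufficient:

[source,env]
----
# Illustrative values only; adjust them to suit your workload.
# Mark the consumer inactive after 30 seconds without a heartbeat.
session.timeout.ms=30000
# Send heartbeats at roughly a third of the session timeout.
heartbeat.interval.ms=10000
# Cap the number of records returned in a single poll.
max.poll.records=500
----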

== Managing offset policy
@@ -239,26 +283,24 @@
In this case, you can lower `max.partition.fetch.bytes` or increase `session.tim

== Minimizing the impact of rebalances

The rebalancing of a partition between active consumers in a group is the time it takes for:
The rebalancing of a partition between active consumers in a group is the time it takes for the following to take place:

* Consumers to commit their offsets
* The new consumer group to be formed
* The group leader to assign partitions to group members
* The consumers in the group to receive their assignments and start fetching

Clearly, the process increases the downtime of a service, particularly when it happens repeatedly during a rolling restart of a consumer group cluster.
The rebalancing process can increase the downtime of a service, particularly if it happens repeatedly during a rolling restart of a consumer group cluster.

In this situation, you can use the concept of _static membership_ to reduce the number of rebalances.
Rebalancing assigns topic partitions evenly among consumer group members.
In this situation, you can introduce _static membership_ by assigning a unique identifier (`group.instance.id`) to each consumer instance within the group.
Static membership uses persistence so that a consumer instance is recognized during a restart after a session timeout.

The consumer group coordinator can identify a new consumer instance using a unique id that is specified using the `group.instance.id` property.
During a restart, the consumer is assigned a new member id, but as a static member it continues with the same instance id,
and the same assignment of topic partitions is made.

If the consumer application does not make a call to poll at least every `max.poll.interval.ms` milliseconds, the consumer is considered to be failed, causing a rebalance.
If the application cannot process all the records returned from poll in time, you can avoid a rebalance by using the `max.poll.interval.ms` property to specify the interval in milliseconds between polls for new messages from a consumer.
Or you can use the `max.poll.records` property to set a maximum limit on the number of records returned from the consumer buffer, allowing your application to process fewer records within the `max.poll.interval.ms` limit.
Consequently, the consumer maintains its assignment of topic partitions, reducing unnecessary rebalancing when it rejoins the group after a failure or restart.

Additionally, adjusting the `max.poll.interval.ms` configuration can prevent rebalances caused by prolonged processing tasks, allowing you to specify the maximum interval between polls for new messages.
Use the `max.poll.records` property to cap the number of records returned from the consumer buffer during each poll.
Reducing the number of records allows the consumer to process fewer messages more efficiently.
In cases where lengthy message processing is unavoidable, consider offloading such tasks to a pool of worker threads.
This parallel processing approach prevents delays and potential rebalances caused by overwhelming the consumer with a large volume of records.

[source,shell,subs="+quotes"]
----
