Skip to content

Kafka Partitions and Groups πŸ§™πŸ½β€β™‚οΈ

Lyes Sefiane edited this page Jan 18, 2023 · 1 revision

Table Of Contents

Partitions

Each topic in Kafka can contain multiple partitions. A topic can have 1-n partitions. The number of the partitions are set up during the topic creation. The maximum number of partitions per cluster and per topic varies by the specific version of Kafka.

Partitions allow Kafka to scale by parallelizing ingestion, storage and consumption of messages. It provides horizontal scalability. However, creating too many partitions may result in increase memory usage and file handles.

Each partition has separate physical log files which will rollover as they reach reconfigured sizes. A given massage in Kafka is stored in only 1 partition. Each partition is assigned a broker process, known as its leader broker. In order to write to a specific partition, the message needs to be sent to its corresponding leader. The leader takes care of updating its log file as well as replicating that partition to other copies. The leader will also send data to the subscribers of the partition.

With multiple partitions for a topic, consumers can share workloads through consumer groups. Partitions ca also be replicated for fault tolerance purposes.

Each published message gets stored in only 1 partition. If the partition is replicated, each replicated copy will also get an instance of this message.

Message ordering is guaranteed only within a partition.

The partition for a message is determined by its message key. Kafka uses a hashing function to allocate a partition based on the message key. Messages with the same key will always end up in the same partition (same message key = same partition).

Creating Topics with Partitions

  • Create a Topic with 03 partitions
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$         
>      ./kafka-topics.sh \
>             --zookeeper zookeeper:2181 \
>             --create \
>             --topic kafka.learning.orders \
>             --partitions 3 \
>             --replication-factor 1
WARNING: Due to limitations in metric names, topics with a period ('.') or underscore ('_') could collide. To avoid issues it is best to use either, but not both.
Created topic kafka.learning.orders.
  • Check topic partitioning
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$         
>        ./kafka-topics.sh \
>             --zookeeper zookeeper:2181 \
>             --topic kafka.learning.orders \
>             --describe
Topic: kafka.learning.orders    PartitionCount: 3       ReplicationFactor: 1    Configs:
        Topic: kafka.learning.orders    Partition: 0    Leader: 1001    Replicas: 1001  Isr: 1001
        Topic: kafka.learning.orders    Partition: 1    Leader: 1001    Replicas: 1001  Isr: 1001
        Topic: kafka.learning.orders    Partition: 2    Leader: 1001    Replicas: 1001  Isr: 1001

Publishing Messages To Topics with Keys

  • Publishing with Keys
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$         
>       ./kafka-console-producer.sh \
>             --bootstrap-server localhost:29092 \
>             --property "parse.key=true" \
>             --property "key.separator=:" \
>             --topic kafka.learning.orders
>1001:"Mouse,23.05"
>1002:"Keyboard,10.00"
  • Navigate to data directory
I have no name!@f871c6f4b068:/bitnami/kafka/data$ ls -la
total 40
drwxrwxr-x 1 root root 4096 Jun 29 17:26 .
drwxrwxr-x 1 root root 4096 Apr 20  2021 ..
-rw-r--r-- 1 1001 root    0 Jun 29 16:40 .lock
-rw-r--r-- 1 1001 root    0 Jun 29 16:40 cleaner-offset-checkpoint
drwxr-xr-x 2 1001 root 4096 Jun 29 17:16 kafka.learning.orders-0
drwxr-xr-x 2 1001 root 4096 Jun 29 17:16 kafka.learning.orders-1
drwxr-xr-x 2 1001 root 4096 Jun 29 17:16 kafka.learning.orders-2
-rw-r--r-- 1 1001 root    4 Jun 29 17:26 log-start-offset-checkpoint
-rw-r--r-- 1 1001 root   91 Jun 29 16:41 meta.properties
-rw-r--r-- 1 1001 root   82 Jun 29 17:26 recovery-point-offset-checkpoint
-rw-r--r-- 1 1001 root   82 Jun 29 17:26 replication-offset-checkpoint
  • Kafka.learning.order* directories
I have no name!@f871c6f4b068:/bitnami/kafka/data$ ls kafka.learning.orders*
kafka.learning.orders-0:
00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex  leader-epoch-checkpoint

kafka.learning.orders-1:
00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex  leader-epoch-checkpoint

kafka.learning.orders-2:
00000000000000000000.index  00000000000000000000.log  00000000000000000000.timeindex  leader-epoch-checkpoint
  • Inspect Partitions
I have no name!@f871c6f4b068:/bitnami/kafka/data$ cat kafka.learning.orders-0/00000000000000000000.log
I☻i���☺��~4�☺��~4���������������☺1001β†’"Mouse,23.05"☺L☻��-j☺��~��☺��~����������������☺1002 "Keyboard,10.00"
# Empty for now
I have no name!@f871c6f4b068:/bitnami/kafka/data$ cat kafka.learning.orders-1/00000000000000000000.log 
# Empty for now
I have no name!@f871c6f4b068:/bitnami/kafka/data$ cat kafka.learning.orders-2/00000000000000000000.log

Consumer Groups

A consumer group is a group of consumers that share a topic workload. A topic may be generating thousands of messages in a short amount of time. It may not be possible for one single consumer process to keep up with processing these messages. For scalability, multiple processes can be started and the messages ca be distributed among them for load balancing. A consumer group is a logical group of consumers that Kafka uses for such load distribution.

Each message will be sent to only one consumer within a consumer group. That consumer is then responsible for processing the message and acknowledge back to Kafka.

Consumers split workload among themselves using partitions. Kafka keep tracks of the active number of consumers for a given topic, it then distributes the messages between these consumers. Kafka only considers the number of partitions for distribution, not the number of messages expected in each partition. It is expected that the Number of partitions are equal or higher than the number of consumers in a group.

We can create multiple consumer groups, each with a different set of consumers. Each group will get a full copy of all the messages, but each message will be sent only to one consumer within each consumer group.

When new consumers come up or existing consumers go down, Kafka takes care of re-balancing the load by reassigning partitions among live consumers.

Source [1]

Consumer Offset Management

Consumer Offset us a number to track message consumption by each consumer and partition. As each message is received by Kafka, it allocates a message ID to the message. Kafka then maintains the message ID offset on a by consumer and by partition basis to track consumption.

Kafka brokers keep track of both what is sent to the consumer and what is acknowledged by the consumer by using two offset values:

  • Current offset : last message sent to a given consumer
  • Committed offset : last message acknowledged by consumer

By default, Kafka consumers auto acknowledge on receipt, but this can be changed by the consumer.

When Kafka brokers don't receive acknowledgement within a set timeout, they will resend the message to the consumer (in case failure/timeout). This ensures at least one delivery of each message to a consumer group.

A message can be delivered multiple times if acknowledgement does not happen within a timeout.

When a consumer group starts up, it has the option of requesting messages either from the start, only the latest or from given offset (start/new/from-offset).

Source [1]

Consuming Partitioned Data

  • Consume messages using a consumer group
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$         
>        ./kafka-console-consumer.sh \
>             --bootstrap-server localhost:29092 \
>             --topic kafka.learning.orders \
>             --group test-consumer-group \
>             --property print.key=true \
>             --property key.separator=" = " \
>             --from-beginning
1001 = "Mouse,23.05"
1002 = "Keyboard,10.00"
^CProcessed a total of 2 messages
  • Check current status of offsets
I have no name!@f871c6f4b068:/opt/bitnami/kafka/bin$         
>        ./kafka-consumer-groups.sh \
>             --bootstrap-server localhost:29092 \
>             --describe \
>             --all-groups

GROUP               TOPIC                 PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                                         HOST            CLIENT-ID
test-consumer-group kafka.learning.orders 0          2               2               0               consumer-test-consumer-group-1-cd6d8963-d850-471f-bfd1-2798599f4369 /172.18.0.3     consumer-test-consumer-group-1
test-consumer-group kafka.learning.orders 1          0               0               0               consumer-test-consumer-group-1-cd6d8963-d850-471f-bfd1-2798599f4369 /172.18.0.3     consumer-test-consumer-group-1
test-consumer-group kafka.learning.orders 2          0               0               0               consumer-test-consumer-group-1-cd6d8963-d850-471f-bfd1-2798599f4369 /172.18.0.3     consumer-test-consumer-group-1
Clone this wiki locally