<div style="text-align: center; line-height: 0; padding-top: 2px;">
  <img src="https://www.quantiaconsulting.com/logos/quantia_logo_orizz.png" alt="Quantia Consulting" style="width: 600px; height: 250px">
</div>

# Multi Partition Writing

By definition, a topic is a category or feed name to which records are published.

For each topic, the Kafka cluster maintains a partitioned log. Partitions allow the parallelization of a topic by splitting the data across multiple brokers.

Each partition can be placed on a separate machine, which allows multiple consumers to read from a topic in parallel.

The number of partitions is set when the topic is created (it can be increased later, but never decreased).
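A toy in-memory model may help make the idea of a partitioned log concrete. Each partition is an independent append-only list, and offsets are numbered per partition. This means ordering holds only within a partition, never across the whole topic. `ToyPartitionedTopic` is a hypothetical teaching class: no Kafka client or broker is involved, and replication and retention are ignored.

```python
class ToyPartitionedTopic:
    """Illustrative in-memory model of a topic: one append-only log per partition."""

    def __init__(self, num_partitions: int):
        if num_partitions < 1:
            raise ValueError("a topic needs at least one partition")
        self.logs = [[] for _ in range(num_partitions)]

    def append(self, partition: int, record) -> int:
        """Append a record to one partition's log and return its offset."""
        log = self.logs[partition]
        log.append(record)
        return len(log) - 1  # offsets count positions within this partition only

    def read(self, partition: int, from_offset: int = 0):
        """Read one partition sequentially, starting at a given offset."""
        return self.logs[partition][from_offset:]


toy = ToyPartitionedTopic(num_partitions=2)
print(toy.append(0, "a"))  # 0
print(toy.append(1, "b"))  # 0  (partition 1 has its own offset sequence)
print(toy.append(0, "c"))  # 1
print(toy.read(0))         # ['a', 'c']
```

Two consumers could each read one of the two lists independently, which is the parallelism that partitions enable.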

## Initialization

Let's initialize a producer and test writing to a topic with more than one partition.

```python
%load_ext autotime
```

```python
from confluent_kafka import Producer
import json
import qcutils
import time

servers = qcutils.read_config_value("kafka.server") + ":" + str(qcutils.read_config_value("kafka.port"))

topic = ''

assert len(topic) > 0, "In order to avoid conflicts during write operations, please name the topic as <surname>-topic"

# qcutils.create_kafka_topic lets you specify the number of partitions (default: 1)
qcutils.create_kafka_topic(topic, partitions=2)

producerconf = {
    'bootstrap.servers': servers,
}

p = Producer(producerconf)
```

The `delivery_report(...)` function shows the partition where the message has been written.

This information is not available when `produce(...)` is called; it only becomes available as the result of the send operation, when the client invokes the delivery callback.

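```python
def delivery_report(err, msg):
    """Invoked from poll()/flush() once per message, after the write succeeds or fails."""
    if err is not None:
        print("Failed to deliver message: {}".format(err))
    else:
        print("Produced record to topic {} partition [{}] @ offset {}"
              .format(msg.topic(), msg.partition(), msg.offset()))

n = 0
try:
    # Produce one record per second; interrupt the kernel to stop the loop.
    while True:
        record_key = str(n)
        record_value = json.dumps({'count': n})
        print("Producing record: {}\t{}".format(record_key, record_value))
        p.produce(topic, key=record_key, value=record_value, on_delivery=delivery_report)
        n = n + 1
        p.poll(0)  # serve delivery callbacks for messages already acknowledged
        time.sleep(1)
except KeyboardInterrupt:
    pass
finally:
    # Wait up to 10 seconds for outstanding messages to be delivered.
    p.flush(10)

print("{} messages were produced to topic {}!".format(n, topic))
```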
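To see how the records spread over the two partitions, a variant of the `delivery_report` callback can tally successful deliveries per partition instead of printing each one. `make_partition_counter` is a hypothetical helper. It only relies on the `(err, msg)` signature of the delivery callback and on `msg.partition()`, both already used above.

```python
from collections import Counter


def make_partition_counter():
    """Build a delivery callback that counts successful deliveries per partition.

    Returns (callback, counts); counts is a Counter keyed by partition number.
    """
    counts = Counter()

    def on_delivery(err, msg):
        if err is not None:
            print("Failed to deliver message: {}".format(err))
            return
        counts[msg.partition()] += 1  # partition is known only once delivery is reported

    return on_delivery, counts


# Usage with the producer loop above (not executed here):
#   callback, per_partition = make_partition_counter()
#   p.produce(topic, key=record_key, value=record_value, on_delivery=callback)
#   ...
#   p.flush(10)
#   print(per_partition)  # how many records landed on each partition
```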


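Why are messages sent to different partitions?

Because the producer passes each record's key to a partitioner, a function that chooses the partition from the key value. The description below is for the Java client's default partitioner. `confluent_kafka` is built on librdkafka, whose default partitioner (`consistent_random`) hashes the key with CRC32 instead, but the principle is the same: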

```
DefaultPartitioner is a Partitioner that uses a 32-bit murmur2 hash to compute the partition for a record (with the key defined) or chooses a partition in a round-robin fashion (per the available partitions of the topic).
```

[src](https://jaceklaskowski.gitbooks.io/apache-kafka/kafka-producer-internals-DefaultPartitioner.html) and [src-code](https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java)
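To make the hashing concrete, here is a Python sketch of the Java client's path for records that have a key. It applies murmur2 to the key bytes (transcribed from the Java client's `Utils.murmur2`) and then computes `toPositive(hash) % numPartitions`. For comparison, it also computes CRC32 of the key modulo the partition count, which is how librdkafka's `consistent_random` partitioner places keyed records. That part assumes librdkafka's CRC32 matches Python's `zlib.crc32`, so check its predictions against the partitions printed by `delivery_report`. The function names are illustrative, and records without a key are not covered: neither client hashes those.

```python
import zlib

MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit murmur2 as in the Java client; returned as an unsigned 32-bit int."""
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & MASK32

    # Mix four bytes at a time, read little-endian like the Java code.
    for i in range(length // 4):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Mix the last 1-3 bytes (mirrors the fall-through switch in Java).
    tail = length & ~3
    rest = length % 4
    if rest == 3:
        h ^= data[tail + 2] << 16
    if rest >= 2:
        h ^= data[tail + 1] << 8
    if rest >= 1:
        h ^= data[tail]
        h = (h * m) & MASK32

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def java_default_partition(key: bytes, num_partitions: int) -> int:
    """Keyed record, Java client: toPositive(murmur2(key)) % numPartitions."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


def librdkafka_consistent_partition(key: bytes, num_partitions: int) -> int:
    """Keyed record, librdkafka 'consistent' scheme (assumed): CRC32(key) % partitions."""
    return zlib.crc32(key) % num_partitions


# The notebook uses keys "0", "1", ...; confluent_kafka sends str keys as UTF-8.
for n in range(6):
    key = str(n).encode("utf-8")
    print(n, java_default_partition(key, 2), librdkafka_consistent_partition(key, 2))
```

Both schemes are deterministic, so records with the same key always land on the same partition as long as the partition count does not change. Setting `'partitioner': 'murmur2_random'` in `producerconf` makes `confluent_kafka` use the Java-compatible murmur2 scheme, in which case the first column is the one to compare against.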

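We can override this behavior by specifying the partition explicitly at production time, through the `partition` argument of `produce(...)`.

...but the best practice is to write a custom partitioner. This is an advanced topic, and the Python client does not support custom partitioner functions yet: through the `partitioner` configuration property it only lets you choose among librdkafka's built-in partitioners.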
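The Python client cannot take a partitioner callback. One way to approximate a custom partitioner is to compute the partition yourself and pass it through the `partition` argument of `produce(...)`. The sketch below wraps a producer to do that. `PartitioningProducer` and `even_odd_partitioner` are hypothetical names, not part of `confluent_kafka`, and the partition count is passed in by hand, so it must match the topic.

```python
from typing import Callable


class PartitioningProducer:
    """Hypothetical wrapper that emulates a custom partitioner.

    It picks the partition itself and passes it to produce(), so the client's
    built-in partitioner is bypassed for every record sent through it.
    """

    def __init__(self, producer, partitioner: Callable[[str, int], int], num_partitions: int):
        self.producer = producer
        self.partitioner = partitioner
        self.num_partitions = num_partitions

    def produce(self, topic: str, key: str, value: str, **kwargs) -> int:
        partition = self.partitioner(key, self.num_partitions)
        if not 0 <= partition < self.num_partitions:
            raise ValueError("partitioner returned invalid partition {}".format(partition))
        self.producer.produce(topic, key=key, value=value, partition=partition, **kwargs)
        return partition


def even_odd_partitioner(key: str, num_partitions: int) -> int:
    """Example rule: with 2 partitions, even counts go to 0 and odd counts to 1."""
    return int(key) % num_partitions


# Usage with the notebook's producer `p` (not executed here):
#   pp = PartitioningProducer(p, even_odd_partitioner, num_partitions=2)
#   pp.produce(topic, key=str(n), value=json.dumps({'count': n}), on_delivery=delivery_report)
```

Keeping the rule in a plain function means it can be unit-tested without a broker. The trade-off is that every producer in the application must go through the wrapper for the rule to be applied consistently.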

##### ![Quantia Tiny Logo](https://www.quantiaconsulting.com/logos/quantia_logo_tiny.png) 2020 Quantia Consulting, srl. All rights reserved.