# Python Kafka

The `kafka-python` package (imported as `kafka`) is a neat way to talk to a running Kafka cluster from Python. It can even administer the cluster. The cell below deletes every existing topic to start from a clean slate:

In [1]:
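from kafka import KafkaAdminClient

# With no arguments, the client connects to a broker on localhost (port 9092)
admin_client = KafkaAdminClient()
# Destructive: deletes every topic the cluster reports
admin_client.delete_topics(admin_client.list_topics())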

DeleteTopicsResponse_v3(throttle_time_ms=0, topic_error_codes=[(topic='main_topic', error_code=0)])
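Deleting everything returned by `list_topics()` is handy on a throwaway local cluster but blunt elsewhere: depending on the cluster, the list may also contain Kafka's internal topics such as `__consumer_offsets`. A minimal sketch of a guard, assuming the convention that internal topic names start with a double underscore. `user_topics` is a hypothetical helper, not part of kafka-python:

```python
def user_topics(topic_names):
    """Return topic names that do not look like Kafka internal topics.

    Assumption: internal topics use a leading double underscore,
    e.g. "__consumer_offsets" or "__transaction_state".
    """
    return [name for name in topic_names if not name.startswith("__")]


# Hypothetical usage against a live cluster (not executed here):
# admin_client.delete_topics(user_topics(admin_client.list_topics()))

print(user_topics(["main_topic", "__consumer_offsets"]))  # → ['main_topic']
```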

# Writing Data to Kafka

If you write data to a topic that does not exist yet, the topic is created automatically (as long as the broker setting `auto.create.topics.enable` is on, which is the default):

In [2]:
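from kafka import KafkaProducer

print(admin_client.list_topics())
MAIN_TOPIC = "main_topic"
producer = KafkaProducer()
for i in range(10):
    # send() is asynchronous: it queues the record and returns a future
    producer.send(MAIN_TOPIC, bytes(str(i), encoding="utf-8"))
# Block until all queued records have actually been delivered
producer.flush()
print(admin_client.list_topics())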

[]
['main_topic']
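Instead of encoding every value by hand, `KafkaProducer` also accepts a `value_serializer` callable, which it applies to each value passed to `send`. A minimal sketch of a serializer producing the same bytes as the loop above (`int_to_utf8` is a hypothetical name chosen for illustration):

```python
def int_to_utf8(value: int) -> bytes:
    """Serialize an integer as its decimal string, UTF-8 encoded."""
    return str(value).encode("utf-8")


# With a running broker, the producer could then take plain integers
# (not executed here):
# producer = KafkaProducer(value_serializer=int_to_utf8)
# producer.send(MAIN_TOPIC, 7)

print(int_to_utf8(7))  # → b'7'
```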


# Reading Data from Kafka

Reading is done with a `KafkaConsumer`:

In [3]:
from kafka import KafkaConsumer

consumer = KafkaConsumer()

# Topics, Partitions, and Offsets

Messages in Kafka are organised into topics:

In [19]:
consumer.topics()

{'main_topic'}

A topic can have several partitions, each with its own beginning and end offsets:

In [5]:
from kafka import TopicPartition

partitions = [
    TopicPartition(MAIN_TOPIC, partition)
    for partition in consumer.partitions_for_topic(MAIN_TOPIC)
]
print(consumer.beginning_offsets(partitions))
print(consumer.end_offsets(partitions))

{TopicPartition(topic='main_topic', partition=0): 0}
{TopicPartition(topic='main_topic', partition=0): 10}
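The end offset is the offset the next message written to the partition will receive, so end minus beginning gives the number of offsets the partition currently spans. That equals the message count only if there are no gaps (compaction, for instance, can leave some). A small sketch using plain `(topic, partition)` tuples and the values printed above; since `TopicPartition` is a namedtuple, the same function should also work directly on the dicts returned by `beginning_offsets` and `end_offsets`. `record_counts` is a hypothetical helper:

```python
def record_counts(beginning, end):
    """Number of offsets between beginning and end offsets, per partition.

    This equals the message count only when the partition has no gaps
    (for example, no compacted-away records).
    """
    return {tp: end[tp] - beginning[tp] for tp in beginning}


beginning = {("main_topic", 0): 0}
end = {("main_topic", 0): 10}
print(record_counts(beginning, end))  # → {('main_topic', 0): 10}
```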


# Reading from a given offset of a partition

Before reading anything from Kafka, the consumer has to be attached to partitions. Here it is explicitly assigned the topic's partitions:

In [6]:
consumer.assign(partitions)

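The consumer's position is the offset of the next record it will read. Because this consumer has no committed offset and `auto_offset_reset` defaults to `'latest'`, it starts at the end of the partition (offset 10). `seek` moves it to any other offset, here back to the beginning: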

In [11]:
print(consumer.position(partitions[0]))
consumer.seek(partitions[0], 0)
print(consumer.position(partitions[0]))

10
0
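To make `position` and `seek` concrete, here is a toy in-memory model of a consumer reading one partition. It is a hypothetical class for illustration, not kafka-python's implementation: the position is the offset of the next record to return, and `seek` simply overwrites it.

```python
class ToyPartitionReader:
    """In-memory stand-in for a consumer reading one partition.

    Hypothetical class for illustration only; offsets start at 0
    and have no gaps.
    """

    def __init__(self, log, start_at_end=True):
        self.log = list(log)
        # Mimics auto_offset_reset='latest': start after the last record.
        self._position = len(self.log) if start_at_end else 0

    def position(self):
        # Offset of the next record that would be returned.
        return self._position

    def seek(self, offset):
        self._position = offset

    def read_one(self):
        record = self.log[self._position]
        self._position += 1
        return record


reader = ToyPartitionReader([str(i).encode("utf-8") for i in range(10)])
print(reader.position())  # → 10
reader.seek(0)
print(reader.position())  # → 0
print(reader.read_one(), reader.position())  # → b'0' 1
```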


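Records are fetched with `poll`, which returns a dict mapping each partition to a list of records. `max_records` caps how many records a single call returns, so batches can be as small as one record: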

In [12]:
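# poll() returns {TopicPartition: [ConsumerRecord, ...]} and may return
# an empty dict if nothing arrives within timeout_ms, so don't index blindly
for _ in range(10):
    records = consumer.poll(timeout_ms=1000, max_records=1)
    for record in records.get(partitions[0], []):
        print(record.value)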
print(consumer.position(partitions[0]))

b'0'
b'1'
b'2'
b'3'
b'4'
b'5'
b'6'
b'7'
b'8'
b'9'
10
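With `max_records=1`, each `poll` hands back a single record; larger values yield larger batches, holding at most `max_records` records and possibly fewer. A toy sketch of consuming a partition in batches of a chosen size from a given offset. `read_in_batches` is hypothetical and ignores timeouts and network fetches:

```python
def read_in_batches(log, start, max_records):
    """Yield consecutive slices of `log` beginning at offset `start`.

    Each batch holds at most `max_records` records; the last one may be
    shorter. Once exhausted, the implied position is len(log).
    """
    if max_records < 1:
        raise ValueError("max_records must be at least 1")
    position = start
    while position < len(log):
        batch = log[position:position + max_records]
        position += len(batch)
        yield batch


log = [str(i).encode("utf-8") for i in range(10)]
print([len(batch) for batch in read_in_batches(log, 0, 3)])  # → [3, 3, 3, 1]
```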
