# Python Kafka

There is a neat Python package, `kafka-python`, for communicating with a running Kafka cluster. It can even administer the cluster. Install it with pip:

```shell
pip install kafka-python
```


```python
# Connection settings for a Confluent Cloud cluster (SASL/PLAIN over TLS).
# Replace the placeholders with your own API key and secret; keep real credentials out of shared notebooks.
params = {
    "bootstrap_servers": "pkc-ewzgj.europe-west4.gcp.confluent.cloud:9092",
    "security_protocol": "SASL_SSL",
    "sasl_mechanism": "PLAIN",
    "sasl_plain_username": "<API_KEY>",
    "sasl_plain_password": "<API_SECRET>",
}
```

```python
from kafka import KafkaAdminClient
from kafka.admin import NewTopic

MAIN_TOPIC = "main_topic"
admin_client = KafkaAdminClient(**params)
# KafkaAdminClient performs cluster-level operations such as listing, creating and deleting topics.
# WARNING: this deletes EVERY topic the credentials can see, to start from a clean cluster.
admin_client.delete_topics(admin_client.list_topics())
# Topic deletion is asynchronous, so re-creating a just-deleted topic may briefly fail.
# The replication factor must be one the cluster accepts (3 here);
# this example does not use partitioning, so the topic gets a single partition.
admin_client.create_topics([
    NewTopic(
        name=MAIN_TOPIC,
        num_partitions=1,
        replication_factor=3
    )
])
```

```
CreateTopicsResponse_v3(throttle_time_ms=0, topic_errors=[(topic=u'main_topic', error_code=0, error_message=None)])
```

# Writing Data to Kafka

Mind that, unless a `value_serializer` is configured, `KafkaProducer.send` accepts only `bytes` values, not strings!
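`KafkaProducer` also accepts a `value_serializer` callable that converts each value to bytes before sending, and `KafkaConsumer` accepts a matching `value_deserializer`. The pair below is a minimal sketch assuming JSON as the wire format. `serialize_json` and `deserialize_json` are illustrative names, not library functions. The last lines show why `bytes(i)` is the wrong conversion in Python 3.

```python
import json


def serialize_json(value):
    """Illustrative value_serializer: Python object -> UTF-8 encoded JSON bytes."""
    return json.dumps(value).encode("utf-8")


def deserialize_json(raw):
    """Illustrative value_deserializer: UTF-8 encoded JSON bytes -> Python object."""
    return json.loads(raw.decode("utf-8"))


record = {"id": 7, "text": "hello"}
payload = serialize_json(record)
assert isinstance(payload, bytes)
assert deserialize_json(payload) == record  # the round trip restores the original object

# Pitfall: in Python 3, bytes(n) allocates n zero bytes instead of encoding the number.
print(bytes(3))         # b'\x00\x00\x00'
print(str(3).encode())  # b'3'

# Hypothetical wiring (needs a reachable cluster, so it is left commented out):
# producer = KafkaProducer(value_serializer=serialize_json, **params)
# consumer = KafkaConsumer(value_deserializer=deserialize_json, **params)
```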

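```python
from kafka import KafkaProducer

print(admin_client.list_topics())
producer = KafkaProducer(**params)
for i in range(10):
    # encode explicitly: in Python 3, bytes(i) would produce i zero bytes, not the digit
    producer.send(MAIN_TOPIC, str(i).encode())
# send() is asynchronous; flush() blocks until all buffered records are delivered
producer.flush()
print(admin_client.list_topics())
```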

```
[u'main_topic']
[u'main_topic']
```


# Reading Data from Kafka

Reading is done with a `KafkaConsumer`:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(**params)
```

# Topics, Partitions, and Offsets

Messages in Kafka are organised into topics:

```python
consumer.topics()
```

```
{u'main_topic'}
```

A topic can have several partitions, each with its own beginning and end offsets:

```python
from kafka import TopicPartition

partitions = [
    TopicPartition(MAIN_TOPIC, partition)
    for partition in consumer.partitions_for_topic(MAIN_TOPIC)
]
print(consumer.beginning_offsets(partitions))
print(consumer.end_offsets(partitions))
```

```
{TopicPartition(topic=u'main_topic', partition=0): 0}
{TopicPartition(topic=u'main_topic', partition=0): 10}
```
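Because offsets within a partition grow by one per record, the end offset minus the beginning offset is an upper bound on how many records a partition currently holds (log compaction or transaction markers can leave gaps, making the real count smaller). A minimal sketch, using a stand-in `namedtuple` with the same fields as kafka-python's `TopicPartition` so it runs without a cluster, and plugging in the offsets printed above:

```python
from collections import namedtuple

# Stand-in with the same fields as kafka.TopicPartition, so the sketch runs without a cluster.
PartitionKey = namedtuple("PartitionKey", ["topic", "partition"])


def retained_upper_bound(beginning_offsets, end_offsets):
    """Per partition, end offset minus beginning offset: an upper bound on retained records."""
    return {tp: end_offsets[tp] - beginning_offsets[tp] for tp in end_offsets}


p0 = PartitionKey("main_topic", 0)
# Values copied from the beginning_offsets / end_offsets output above.
bounds = retained_upper_bound({p0: 0}, {p0: 10})
print(bounds[p0])            # 10
print(sum(bounds.values()))  # 10 records at most across the whole topic
```

With a live consumer, `retained_upper_bound(consumer.beginning_offsets(partitions), consumer.end_offsets(partitions))` would take the two dicts directly.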


# Reading from a given offset of a partition

Before reading, assign the consumer explicitly to the topic partitions it should read from:

```python
consumer.assign(partitions)
```

One can read from any offset of the partition: `seek` moves the consumer there, and `position` reports the offset of the next record to be read:

```python
print(consumer.position(partitions[0]))
consumer.seek(partitions[0], 0)
print(consumer.position(partitions[0]))
```

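```
10
0
```

The first `position` call printed 10 because this consumer belongs to no consumer group and so has no committed offset; it therefore starts from `auto_offset_reset`, which defaults to `'latest'` (the end offset). `seek` then moved it back to offset 0.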
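That start-position choice can be modeled as a small pure function. `starting_offset` below is a hypothetical illustration, not a kafka-python API, and it ignores details such as a committed offset that has fallen out of range:

```python
def starting_offset(committed, beginning, end, auto_offset_reset="latest"):
    """Hypothetical model of where a consumer starts reading a partition.

    committed: last committed offset of the consumer group, or None if there is none.
    beginning / end: the partition's beginning and end offsets.
    """
    if committed is not None:
        # a committed offset takes precedence (out-of-range handling is ignored here)
        return committed
    if auto_offset_reset == "earliest":
        return beginning
    if auto_offset_reset == "latest":
        return end
    raise ValueError("no committed offset and no usable reset policy: %r" % auto_offset_reset)


# The situation in the cell above: no group, hence no committed offset.
print(starting_offset(None, beginning=0, end=10))                                # 10
print(starting_offset(None, beginning=0, end=10, auto_offset_reset="earliest"))  # 0
print(starting_offset(4, beginning=0, end=10))                                   # 4
```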


`poll` reads records in batches, and `max_records` caps the size of each batch (here, one record per call):

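```python
end_offset = consumer.end_offsets(partitions)[partitions[0]]
while consumer.position(partitions[0]) < end_offset:
    # poll returns {TopicPartition: [ConsumerRecord, ...]}, which may be empty on timeout,
    # so look the partition up defensively instead of assuming a record arrived
    batch = consumer.poll(timeout_ms=1000, max_records=1)
    for record in batch.get(partitions[0], []):
        print(record.value.decode())
print(consumer.position(partitions[0]))
```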

```
0
1
2
3
4
5
6
7
8
9
10
```
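Putting `seek`, `poll` and `position` together, a helper can read an arbitrary offset range `[start, stop)` in batches of a chosen size. `read_range` below is a hypothetical helper, not part of kafka-python; it only calls `seek`, `poll(timeout_ms=..., max_records=...)` and `position`, which `KafkaConsumer` provides. To stay runnable without a cluster, it is exercised against a small in-memory `FakeConsumer` (also hypothetical) that mimics just those three methods.

```python
from collections import namedtuple

Record = namedtuple("Record", ["offset", "value"])  # stand-in for kafka-python's ConsumerRecord


def read_range(consumer, tp, start, stop, batch_size=100, timeout_ms=1000):
    """Hypothetical helper: return the values at offsets [start, stop) of partition tp."""
    consumer.seek(tp, start)
    values = []
    while consumer.position(tp) < stop:
        batch = consumer.poll(timeout_ms=timeout_ms, max_records=batch_size)
        records = batch.get(tp, [])
        if not records:
            break  # nothing arrived before the timeout (or end of data): stop instead of spinning
        for record in records:
            if record.offset >= stop:
                return values  # the last batch may overshoot stop; drop the extra records
            values.append(record.value)
    return values


class FakeConsumer:
    """In-memory stand-in exposing only seek/position/poll, for demonstration."""

    def __init__(self, tp, values):
        self._tp = tp
        self._log = [Record(offset, value) for offset, value in enumerate(values)]
        self._position = len(self._log)  # start at the end, like auto_offset_reset='latest'

    def seek(self, tp, offset):
        self._position = offset

    def position(self, tp):
        return self._position

    def poll(self, timeout_ms=0, max_records=None):
        limit = len(self._log) if max_records is None else max_records
        batch = self._log[self._position:self._position + limit]
        self._position += len(batch)
        return {self._tp: batch} if batch else {}


fake_tp = ("main_topic", 0)  # any hashable key works for the fake
fake = FakeConsumer(fake_tp, [str(i).encode() for i in range(10)])
print(read_range(fake, fake_tp, start=3, stop=7, batch_size=2))  # [b'3', b'4', b'5', b'6']
print(fake.position(fake_tp))                                    # 7
```

On a live cluster the call would look like `read_range(consumer, partitions[0], 0, 10)`. Because the helper stops at the first empty poll, a longer `timeout_ms` or a small retry count may be needed in practice.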
