# Kafka Consumer
---


## Preparation

The first step is to install the required library, in our case `kafka-python`.

In [None]:
!pip install kafka-python

Declare the variables used throughout the notebook:

In [None]:
username = "pujo"
server_ip = "34.87.150.250"
bootstrap_servers = f"{server_ip}:9092,{server_ip}:9093,{server_ip}:9094"
schema_registry_url = f"http://{server_ip}:8081"

---

## Subscribe to a topic


In [None]:
from kafka import KafkaConsumer

topic_name = f"{username}-topic1"

consumer = KafkaConsumer(
    bootstrap_servers = bootstrap_servers,
    max_poll_records = 10
)

List the topics available on the cluster:

In [None]:
consumer.topics()

Then subscribe to the topic:

In [None]:
consumer.subscribe(topics=[topic_name])
consumer.subscription()

Now start reading. The loop blocks and waits for new records until you interrupt the kernel:

In [None]:
for message in consumer:
    print ("%d:%d: k=%s v=%s" % (message.partition,
                                 message.offset,
                                 message.key,
                                 message.value))
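Because the `for` loop above never ends on its own, it can help to stop after a fixed number of records. The following is a minimal sketch using `itertools.islice`; `read_messages` and the `Record` stand-in are hypothetical names for illustration, and the function works with anything iterable that yields objects with `partition`, `offset`, `key` and `value` (a `kafka-python` `KafkaConsumer` is iterable).

```python
from collections import namedtuple
from itertools import islice


def read_messages(consumer, limit):
    """Return up to `limit` formatted records from an iterable consumer."""
    lines = []
    # islice stops pulling from the consumer once `limit` records were read
    for message in islice(consumer, limit):
        lines.append("%d:%d: k=%s v=%s" % (message.partition,
                                          message.offset,
                                          message.key,
                                          message.value))
    return lines


# Hypothetical stand-in records so the sketch runs without a broker
Record = namedtuple("Record", "partition offset key value")
fake_stream = iter([Record(0, 0, None, b"a"),
                    Record(0, 1, None, b"b"),
                    Record(1, 0, None, b"c")])
for line in read_messages(fake_stream, 2):
    print(line)
```

With a real consumer you would call `read_messages(consumer, 10)`. If fewer than `limit` records arrive, it still blocks; `kafka-python` also accepts `consumer_timeout_ms`, which ends iteration after the consumer has been idle for that many milliseconds.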

Join a consumer group to get dynamic partition assignment and offset commits:

In [None]:
group_id = f"{username}-group1"
client_id = f"{username}-client1"

consumer = KafkaConsumer(
    client_id = client_id,
    group_id = group_id,
    bootstrap_servers = bootstrap_servers,
    max_poll_records = 10
)

consumer.subscribe(topics=[topic_name])
consumer.subscription()

for message in consumer:
    print ("%d:%d: k=%s v=%s" % (message.partition,
                                 message.offset,
                                 message.key,
                                 message.value))
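Within one group, each partition is read by exactly one member. The sketch below is a simplified model of range assignment, one of Kafka's built-in strategies, showing how a topic's partitions are split among group members and why members beyond the partition count sit idle. `range_assign` is a hypothetical helper; the real assignor runs per topic and orders members by their generated member ID.

```python
def range_assign(partitions, consumers):
    """Simplified model of range assignment for a single topic.

    Partitions are split into contiguous ranges; the first
    len(partitions) % len(consumers) members get one extra partition.
    """
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment = {}
    start = 0
    for i, member in enumerate(consumers):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


# Three partitions shared by two members, then by four members
print(range_assign([0, 1, 2], ["client-a", "client-b"]))
print(range_assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
```

In the second call one member receives no partitions, so adding consumers beyond the number of partitions does not increase throughput.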

---

## Configuration

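All `KafkaConsumer` options are described in the kafka-python API reference: https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html. The properties below use Kafka's dotted configuration names; in `kafka-python` they are passed as keyword arguments with underscores (for example `auto_offset_reset`).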


**auto.offset.reset**

This property controls the behavior of the consumer when it starts reading a partition for which it doesn’t have a committed offset, or when its committed offset is invalid (usually because the consumer was down for so long that the record with that offset has already aged out of the broker). The default is “latest,” which means that, lacking a valid offset, the consumer starts reading from the newest records (records written after the consumer started running). The alternative is “earliest,” which means that, lacking a valid offset, the consumer reads all the data in the partition, starting from the very beginning.
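The rule can be written as a small function. This is a simplified model, not client code: `starting_offset` is a hypothetical name, `log_start` is the oldest offset still retained and `log_end` is the offset the next record will get. Any other policy (such as `none`) makes the sketch raise an error instead of picking a position.

```python
def starting_offset(committed, log_start, log_end, reset="latest"):
    """Simplified model of where a consumer starts reading a partition.

    committed: the group's committed offset, or None if there is none.
    log_start / log_end: oldest retained offset and next offset to be written.
    """
    # A committed offset inside the retained range is used as-is
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise auto.offset.reset decides
    if reset == "earliest":
        return log_start
    if reset == "latest":
        return log_end
    raise ValueError("no valid offset and reset policy %r" % reset)


print(starting_offset(None, 100, 250))              # new group, default policy
print(starting_offset(None, 100, 250, "earliest"))  # new group, read from the start
print(starting_offset(40, 100, 250, "earliest"))    # committed offset aged out
print(starting_offset(180, 100, 250))               # valid committed offset wins
```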

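For example, the consumer below joins a different group (`group_id + "_2"`). As long as that group has no committed offsets yet, it reads the topic from the beginning: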


In [None]:
group_id = f"{username}-group1"
client_id = f"{username}-client1"

consumer = KafkaConsumer(
    client_id = client_id,
    group_id = group_id + "_2",
    bootstrap_servers = bootstrap_servers,
    max_poll_records = 10,
    auto_offset_reset = "earliest"
)

consumer.subscribe(topics=[topic_name])
consumer.subscription()

for message in consumer:
    print ("%d:%d: k=%s v=%s" % (message.partition,
                                 message.offset,
                                 message.key,
                                 message.value))

**enable.auto.commit**

This parameter controls whether the consumer commits offsets automatically; it defaults to true. Set it to false (`enable_auto_commit=False` in `kafka-python`) if you prefer to control when offsets are committed, for example only after a message has been processed. This is necessary to minimize duplicates and avoid missing data.
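With auto-commit disabled, a common pattern is to process a whole batch and commit afterwards. Below is a sketch assuming a `kafka-python` `KafkaConsumer` created with a `group_id` and `enable_auto_commit=False`: its `poll()` returns a dict mapping each partition to a list of records, and `commit()` synchronously commits the consumed positions. `process_then_commit` and `FakeConsumer` are hypothetical names; the fake only lets the sketch run without a broker.

```python
def process_then_commit(consumer, handle, timeout_ms=1000, max_records=10):
    """Poll one batch, process every record, then commit.

    If `handle` raises, commit() is never reached, so after a crash the
    batch is read again from the last committed offset (at-least-once).
    """
    batch = consumer.poll(timeout_ms=timeout_ms, max_records=max_records)
    processed = 0
    for records in batch.values():
        for record in records:
            handle(record)
            processed += 1
    if processed:
        consumer.commit()  # synchronous commit of the consumed positions
    return processed


class FakeConsumer:
    """Hypothetical in-memory stand-in exposing poll() and commit()."""

    def __init__(self, batch):
        self.batch = batch
        self.commits = 0

    def poll(self, timeout_ms=0, max_records=None):
        batch, self.batch = self.batch, {}
        return batch

    def commit(self):
        self.commits += 1


fake = FakeConsumer({("topic1", 0): ["r0", "r1"], ("topic1", 1): ["r2"]})
seen = []
print(process_then_commit(fake, seen.append), seen, fake.commits)
```

Committing once per batch keeps the number of commit requests low; the trade-off is that a crash replays up to one full batch.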

**client.id**

This can be any string and is used by the brokers to identify requests sent from the client. It is used in logging and metrics, and for quotas.

**group.id**

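This can be any string. It names the consumer group the consumer belongs to: consumers that share a `group.id` divide the subscribed partitions among themselves and share the group's committed offsets. (Static group membership is a separate feature, configured with `group.instance.id`.)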

---

## Deserializer

### JSON deserializer

In [None]:
from kafka import KafkaConsumer
import json

group_id = f"{username}-group1-json"
client_id = f"{username}-client1"

consumer = KafkaConsumer(
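    client_id = client_id,
    group_id = group_id,
    bootstrap_servers = bootstrap_servers,
    # Decode the bytes as UTF-8 JSON; pass None through (e.g. messages sent without a key)
    value_deserializer = lambda v: json.loads(v.decode('utf-8')) if v is not None else None,
    key_deserializer = lambda v: json.loads(v.decode('utf-8')) if v is not None else None,
    max_poll_records = 10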
)

consumer.subscribe(topics=[f"{topic_name}_json"])
consumer.subscription()

for message in consumer:
    print ("%d:%d: k=%s v=%s" % (message.partition,
                                 message.offset,
                                 message.key,
                                 message.value))
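With the lambdas above, a single malformed message raises inside the consumer and stops the loop. A more tolerant deserializer is sketched below; `tolerant_json` and its `"_invalid"` marker are hypothetical conventions, not part of `kafka-python`. It can be passed as `value_deserializer=tolerant_json` (and likewise for keys).

```python
import json


def tolerant_json(raw):
    """Deserialize JSON bytes without letting one bad message stop the consumer.

    Returns None when the key/value is None, and {"_invalid": raw} when the
    bytes are not valid UTF-8 JSON.
    """
    if raw is None:
        return None
    try:
        return json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {"_invalid": raw}


print(tolerant_json(b'{"name": "alice"}'))
print(tolerant_json(b"not json"))
```

Keeping the raw bytes lets the loop log the bad record (or forward it to a separate topic) and carry on.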


---

## If possible, use the confluent-kafka-python library ([or the client for your language](https://docs.confluent.io/platform/current/clients/index.html))

Install the libraries ([documentation](https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html)):

In [None]:
!pip install fastavro
!pip install pyrsistent
!pip install jsonschema
!pip install protobuf
!pip install requests
!pip install pycodestyle
!pip install "avro-python3==1.9.2"
!pip install confluent-kafka==1.7.0

Initialize a consumer.

All configuration properties are listed at https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md

In [None]:
from confluent_kafka import Consumer

group_id = f"{username}-group2"
client_id = f"{username}-client2"


consumer = Consumer({
    'bootstrap.servers': bootstrap_servers,
    'group.id': group_id,
    'client.id': client_id,
    'auto.offset.reset': 'earliest'
})

topic_name = f"{username}-topic2"

consumer.subscribe([topic_name])

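try:
    while True:
        msg = consumer.poll(1.0)

        if msg is None:
            continue
        if msg.error():
            print("Consumer error: {}".format(msg.error()))
            continue

        print('Received message: {}'.format(msg.value().decode('utf-8')))
except KeyboardInterrupt:
    # Interrupt the kernel to stop the loop
    pass
finally:
    # Leave the group cleanly (and commit final offsets when auto-commit is on)
    consumer.close()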
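The loop above only ends when the kernel is interrupted. A sketch of a bounded variant: it collects up to `max_messages` values, gives up after several empty polls in a row, and always closes the consumer. It assumes the `confluent_kafka.Consumer` interface used above (`poll(timeout)` returning `None` or a message with `error()` and `value()`, plus `close()`); `consume_values` and the fake classes are hypothetical and exist only so the sketch runs without a broker.

```python
def consume_values(consumer, max_messages, timeout=1.0, max_idle_polls=5):
    """Collect up to max_messages decoded values, then close the consumer."""
    values = []
    idle = 0
    try:
        while len(values) < max_messages and idle < max_idle_polls:
            msg = consumer.poll(timeout)
            if msg is None:
                idle += 1  # nothing arrived within `timeout`
                continue
            idle = 0
            if msg.error():
                print("Consumer error: {}".format(msg.error()))
                continue
            values.append(msg.value().decode("utf-8"))
    finally:
        consumer.close()  # runs even if decoding raises
    return values


class FakeMessage:
    """Hypothetical stand-in for a confluent_kafka Message."""

    def __init__(self, value=None, err=None):
        self._value, self._err = value, err

    def value(self):
        return self._value

    def error(self):
        return self._err


class FakeConfluentConsumer:
    """Hypothetical stand-in exposing poll() and close()."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.closed = False

    def poll(self, timeout=None):
        return self.messages.pop(0) if self.messages else None

    def close(self):
        self.closed = True


fake = FakeConfluentConsumer([FakeMessage(b"hello"), None,
                              FakeMessage(err="broker down"),
                              FakeMessage(b"world")])
result = consume_values(fake, max_messages=10)
print(result, fake.closed)
```

With the real consumer from the cell above you would call `consume_values(consumer, max_messages=100)`.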

### Deserializer

#### JSON

In [None]:
from confluent_kafka import DeserializingConsumer
from confluent_kafka.schema_registry.json_schema import JSONDeserializer
from confluent_kafka.serialization import StringDeserializer

topic_name_json = f"{topic_name}_json"

schema_str = """
{
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "User",
    "description": "A Confluent Kafka Python User",
    "type": "object",
    "properties": {
    "name": {
        "description": "User's name",
        "type": "string"
    },
    "favorite_number": {
        "description": "User's favorite number",
        "type": "number",
        "exclusiveMinimum": 0
    },
    "favorite_color": {
        "description": "User's favorite color",
        "type": "string"
    }
    },
    "required": [ "name", "favorite_number", "favorite_color" ]
}
"""

json_deserializer = JSONDeserializer(schema_str)
string_deserializer = StringDeserializer('utf_8')

consumer_conf = {'bootstrap.servers': bootstrap_servers,
                 'key.deserializer': string_deserializer,
                 'value.deserializer': json_deserializer,
                 'group.id': group_id + '-json',
                 'auto.offset.reset': "earliest"}

consumer = DeserializingConsumer(consumer_conf)
consumer.subscribe([topic_name_json])

while True:
    try:
        # SIGINT can't be handled when polling, limit timeout to 1 second.
        msg = consumer.poll(1.0)
        if msg is None:
            continue

        user = msg.value()
        if user is not None:
            print(user)
    except KeyboardInterrupt:
        break

consumer.close()
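By default `JSONDeserializer` returns a plain dict. It also accepts a `from_dict` callable, invoked as `from_dict(obj, ctx)`, to turn that dict into your own type. A sketch assuming a `User` dataclass that mirrors `schema_str` (both `User` and `dict_to_user` are hypothetical names); you would wire it in with `JSONDeserializer(schema_str, from_dict=dict_to_user)`.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    favorite_number: float
    favorite_color: str


def dict_to_user(obj, ctx):
    """Convert the dict produced by JSONDeserializer into a User.

    ctx is the SerializationContext passed by the deserializer; unused here.
    """
    if obj is None:
        return None
    return User(name=obj["name"],
                favorite_number=obj["favorite_number"],
                favorite_color=obj["favorite_color"])


print(dict_to_user({"name": "alice", "favorite_number": 7, "favorite_color": "blue"}, None))
```

Typed objects make missing fields fail loudly at deserialization time instead of deep inside the processing code.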

#### Avro

In [None]:
from confluent_kafka.avro import AvroConsumer
from confluent_kafka.avro.serializer import SerializerError

topic_name_avro = f"{username}-topic2_avro"
group_id = f"{username}-group2-avro"
client_id = f"{username}-client2-avro"

consumer = AvroConsumer({
    'bootstrap.servers': bootstrap_servers,
    'group.id': group_id,
    'client.id': client_id,
    'auto.offset.reset': 'earliest',
    'schema.registry.url': schema_registry_url
})

consumer.subscribe([topic_name_avro])

try:
    while True:
        try:
            msg = consumer.poll(1)
        except SerializerError as e:
            # msg is not assigned when poll() raises, so report only the error
            print("Message deserialization failed: {}".format(e))
            break

        if msg is None:
            continue

        if msg.error():
            print("AvroConsumer error: {}".format(msg.error()))
            continue

        print(msg.value())
except KeyboardInterrupt:
    pass
finally:
    consumer.close()
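The Avro schema is not stored in each message. Messages produced through Schema Registry start with a magic byte (`0`) and a 4-byte big-endian schema ID; the consumer uses that ID to fetch the schema from `schema.registry.url`. The sketch below reads that prefix, which can help when deserialization fails; `split_confluent_payload` is a hypothetical helper, not part of `confluent_kafka`.

```python
import struct


def split_confluent_payload(raw):
    """Split a Schema Registry-framed message into (schema_id, payload)."""
    if len(raw) < 5:
        raise ValueError("message too short for Schema Registry framing")
    # ">bI": big-endian, 1-byte magic + 4-byte unsigned schema ID, no padding
    magic, schema_id = struct.unpack(">bI", raw[:5])
    if magic != 0:
        raise ValueError("unknown magic byte %d" % magic)
    return schema_id, raw[5:]


print(split_confluent_payload(b"\x00\x00\x00\x00\x2aavro-body"))
```

A message that fails this check was most likely written without Schema Registry (for example by the plain `kafka-python` producer), which is a common cause of `SerializerError`.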

---

## Partition

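Each consumer group tracks its own offsets: a group's committed offset for a partition is independent of every other group's. The two cells below use different groups (`-a` and `-b`), so each reads the whole topic from the beginning, regardless of what the other has already consumed.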


In [None]:
from confluent_kafka import Consumer

group_id = f"{username}-group2-a"
client_id = f"{username}-client2-a"


consumer = Consumer({
    'bootstrap.servers': bootstrap_servers,
    'group.id': group_id,
    'client.id': client_id,
    'auto.offset.reset': 'earliest'
})

topic_name = f"{username}-topic2"

consumer.subscribe([topic_name])

try:
    while True:
        msg = consumer.poll(1.0)

        if msg is None:
            continue
        if msg.error():
            print("Consumer error: {}".format(msg.error()))
            continue

        print('Received message: {}'.format(msg.value().decode('utf-8')))
except KeyboardInterrupt:
    # Interrupt the kernel to stop the loop
    pass
finally:
    consumer.close()

In [None]:
from confluent_kafka import Consumer

group_id = f"{username}-group2-b"
client_id = f"{username}-client2-b"


consumer = Consumer({
    'bootstrap.servers': bootstrap_servers,
    'group.id': group_id,
    'client.id': client_id,
    'auto.offset.reset': 'earliest'
})

topic_name = f"{username}-topic2"

consumer.subscribe([topic_name])

try:
    while True:
        msg = consumer.poll(1.0)

        if msg is None:
            continue
        if msg.error():
            print("Consumer error: {}".format(msg.error()))
            continue

        print('Received message: {}'.format(msg.value().decode('utf-8')))
except KeyboardInterrupt:
    # Interrupt the kernel to stop the loop
    pass
finally:
    consumer.close()
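A toy model (not a Kafka API) of what the two cells above demonstrate: committed offsets are stored per (group, topic, partition), so group `-a` reading ahead has no effect on where group `-b` starts. `GroupOffsets` is a hypothetical class, list indices stand in for offsets, and a missing offset is treated like `auto.offset.reset=earliest`.

```python
class GroupOffsets:
    """Toy model: one committed position per (group, topic, partition)."""

    def __init__(self):
        self.committed = {}

    def read(self, group, topic, partition, log, max_records=10):
        key = (group, topic, partition)
        start = self.committed.get(key, 0)  # no offset yet: start at the beginning
        batch = log[start:start + max_records]
        self.committed[key] = start + len(batch)  # commit the next offset to read
        return batch


log = ["m0", "m1", "m2", "m3"]
offsets = GroupOffsets()
print(offsets.read("group2-a", "topic2", 0, log, max_records=3))
print(offsets.read("group2-a", "topic2", 0, log, max_records=3))
print(offsets.read("group2-b", "topic2", 0, log, max_records=3))
```

As in Kafka, the committed value is the offset of the next record to consume, not of the last one processed.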