<div style="text-align: center; line-height: 0; padding-top: 2px;">
  <img src="https://www.quantiaconsulting.com/logos/quantia_logo_orizz.png" alt="Quantia Consulting" style="width: 600px; height: 250px">
</div>

# Kafka Producer

**Technical Accomplishments:**
- Start working with Kafka
- Introduce the Python `Producer`
- Produce data to a Kafka topic
- Introduce the encoding problem

Can we produce messages through a programmatic interface?

Yes! Let's use **Python**.

Let's have a look at the `Producer` documentation of the [confluent_kafka](https://docs.confluent.io/current/clients/confluent-kafka-python/#producer) library.

According to the documentation, the method to call in order to send a new message is [`produce()`](https://docs.confluent.io/current/clients/confluent-kafka-python/#confluent_kafka.Producer.produce).

In particular, `produce()` accepts a parameter named `value`, which must be a `str` or `bytes` object.

Let's now create a new `Producer` and play with it.

To see what the Python `Producer` sends, we will read the messages with a console consumer.

## Python Producer

Let's start by importing the libraries and creating some useful variables.

In [None]:
%load_ext autotime

In [None]:
from confluent_kafka import Producer, KafkaError
import json
import qcutils

servers=qcutils.read_config_value("kafka.server") + ":" + str(qcutils.read_config_value("kafka.port"))

topic = ''

assert len(topic) > 0, "In order to avoid conflicts during write operation, please name the topic as <surname>-topic"

qcutils.create_kafka_topic(topic)

In [None]:
producerconf = {
    'bootstrap.servers': servers,
}

p = Producer(producerconf)

## Open Console Consumer

* Open a new Terminal
* From the home folder, go to the `resources/kafka_2.12-2.4.1` folder -> `cd resources/kafka_2.12-2.4.1`
* Run console consumer on **your topic**, e.g., **MR**-topic -> `./bin/kafka-console-consumer.sh --bootstrap-server cp-cp-kafka.cp:9092 --topic <topic>`

## Start the production of messages

### Text Messages

Let's try to produce some text ....

In [None]:
import time

record_key = None
record_value = "some text"

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

... and even some strange text

In [None]:
record_key = None
record_value = "some strange string...§§§Ᾱ▼☞😅"

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

### Numbers (int)

It looks like it works, even with strange strings and characters... but how can we handle **numbers**?

In [None]:
record_key = None
record_value = 10

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

Oops! Remember that `produce()` accepts only `str` or `bytes` values.

...so, let's try with a string (using the `str()` function)...

In [None]:
record_key = None
record_value = str(10)

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

#### Discussion

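Good, it works, but let's look at the size of what we are sending. Keep in mind that `sys.getsizeof` reports the in-memory size of the Python object, so treat it as a rough indication rather than the exact number of bytes sent.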

In [None]:
import sys

print("String: " + str(sys.getsizeof(str(10))) + " bytes")

print('"Plain" int: ' + str(sys.getsizeof(10)) + " bytes")

In [None]:
import sys

print("Larger String: " + str(sys.getsizeof(str(1000000000))) + " bytes")

print('Larger "Plain" int: ' + str(sys.getsizeof(1000000000)) + " bytes")

In [None]:
import sys

print("Empty String: " + str(sys.getsizeof(str())) + " bytes")

...but, as we discussed before when talking about latency vs. throughput vs. size, the **larger the message**, the **lower the throughput**.

A single string is not a problem, but larger messages at scale are a **bad idea**...
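To make the size argument more concrete, the sketch below compares the in-memory size of the Python objects with the number of bytes of an encoded payload. It relies on `str.encode` and `int.to_bytes`, both introduced in the sections below. The UTF-8 codec and the 4-byte width are assumptions chosen for illustration, not something the library mandates.

```python
import sys

number = 1_000_000_000

# In-memory size of the Python objects (interpreter overhead included).
object_size_str = sys.getsizeof(str(number))
object_size_int = sys.getsizeof(number)

# Size of the payload that would be placed in a message.
text_payload = str(number).encode("utf-8")            # one byte per digit
binary_payload = number.to_bytes(4, byteorder="big")  # fixed 4-byte width (assumption)

print("str object:", object_size_str, "bytes in memory")
print("int object:", object_size_int, "bytes in memory")
print("text payload:", len(text_payload), "bytes")
print("binary payload:", len(binary_payload), "bytes")
```

For a number this large, the textual payload needs one byte per digit, while the fixed-width binary payload stays at 4 bytes.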

### Bytes

...so, let's go with Bytes!

We can use the `int.to_bytes(...)` method (see [doc](https://docs.python.org/3/library/stdtypes.html#int.to_bytes)) to encode an int value.
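Before sending anything, it may help to see what `to_bytes` returns on its own. The following is a minimal sketch using only the standard library; the 2-byte length and the big-endian order match the cell below, and the value `70000` is just an arbitrary number that does not fit in 2 bytes.

```python
value = 10

# Pack the int into 2 bytes, most significant byte first ("big" endian).
encoded = value.to_bytes(2, byteorder="big")
print(encoded)  # the two raw bytes 0x00 and 0x0A

# The receiver needs the same length and byte order to recover the number.
decoded = int.from_bytes(encoded, byteorder="big")
print(decoded)

# A value that does not fit in the requested length raises OverflowError.
try:
    (70000).to_bytes(2, byteorder="big")
except OverflowError as exc:
    print("too large for 2 bytes:", exc)
```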

In [None]:
record_key = None
record_value = (10).to_bytes(2, byteorder='big')

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

We got an empty line ... why?

Maybe the problem is related to numbers...let's try with some text!

[Encoding](https://docs.python.org/3/library/stdtypes.html#str.encode) a string is the way to transform it into bytes!

We can use one of the existing [codecs](https://docs.python.org/3/library/codecs.html#standard-encodings).

Let's try to use [ASCII](https://en.wikipedia.org/wiki/ASCII) encoding.

![](img/ascii.png)
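As a quick illustration of the table above, this small sketch prints the numeric code that ASCII assigns to each character of a short string. It uses only built-in functions and does not touch Kafka.

```python
text = ":-)"
encoded = text.encode("ascii")

# Iterating over a bytes object yields the integer value of each byte.
codes = list(encoded)
print(codes)

# Every ASCII code fits in 7 bits, so each character takes exactly one byte.
assert all(code < 128 for code in codes)
print(len(text), "characters ->", len(encoded), "bytes")
```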

In [None]:
record_key = None
record_value = ":-)".encode("ascii")

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

It works! Now, as before, let's try again with some strange text... 

In [None]:
record_key = None
record_value = ":-) 😅".encode("ascii")

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

Oops! Not all characters can be encoded using ASCII.
... We can manage errors ([doc](https://docs.python.org/3/library/stdtypes.html#str.encode)) by *raising an exception*, *replacing* the offending characters, or *ignoring* them.

In [None]:
record_key = None
record_value = ":-) 😅".encode("ascii",errors='replace')

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

The characters that can't be encoded are replaced by `?`.
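For comparison, here is a minimal sketch of the three error-handling modes side by side, applied to the same string without producing anything to Kafka.

```python
text = ":-) 😅"

# errors="strict" is the default: an unencodable character raises an exception.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# errors="replace" substitutes each unencodable character with "?".
replaced = text.encode("ascii", errors="replace")
print("replace:", replaced)

# errors="ignore" silently drops unencodable characters.
ignored = text.encode("ascii", errors="ignore")
print("ignore:", ignored)
```

Both `replace` and `ignore` lose information: the consumer has no way to recover the original character.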

... Alternatively, we can change the codec.

Let's try with [UTF-8](https://en.wikipedia.org/wiki/UTF-8)

In [None]:
record_key = None
record_value = ":-) 😅".encode("utf-8")

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

It works!!
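UTF-8 can represent these characters because it is a variable-width encoding. The sketch below, with an arbitrary set of sample characters, shows how many bytes each one takes.

```python
samples = ["a", "§", "▼", "😅"]

# UTF-8 uses between 1 and 4 bytes per character.
sizes = {char: len(char.encode("utf-8")) for char in samples}
for char, size in sizes.items():
    print(repr(char), "->", size, "byte(s)")

# Pure ASCII text produces identical bytes under both codecs.
same_bytes = ":-)".encode("utf-8") == ":-)".encode("ascii")
print("ASCII-compatible:", same_bytes)
```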

### Back to numbers (int)
And what about numbers?

We can treat them as strings!

In [None]:
record_key = None
record_value = "10".encode("utf-8")

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

### Discussion

...but let's take another look at the in-memory size of the different data types.

In [None]:
import sys

print("String: " + str(sys.getsizeof(str(10))) + " bytes")

print('"Plain" int: ' + str(sys.getsizeof(10)) + " bytes")

print("Encoded String (ASCII): " + str(sys.getsizeof("10".encode("ascii"))) + " bytes")

print("Encoded String (UTF-8): " + str(sys.getsizeof("10".encode("utf-8"))) + " bytes")

print("Encoded int: " + str(sys.getsizeof((10).to_bytes(2, byteorder='big'))) + " bytes")

The string is too big.

A "plain" int can't be sent.

The encoded versions of the string and of the int have the same size, so why would we need to send a string instead of an int?

What if the problem is the `Consumer`?
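One plausible explanation, assuming the console consumer renders message values as text: the same bytes mean different things depending on how the reader decodes them. The sketch below decodes both payloads in both ways, using only the standard library.

```python
int_payload = (10).to_bytes(2, byteorder="big")
text_payload = "10".encode("utf-8")

# Read as text, the int payload is a NUL character followed by a newline,
# which a text-oriented consumer may show as an empty-looking line.
int_as_text = int_payload.decode("utf-8")
print(repr(int_as_text))

# Read as a big-endian int, the same payload gives back the number.
int_as_int = int.from_bytes(int_payload, byteorder="big")
print(int_as_int)

# The reverse mismatch is just as misleading: the text "10" read as an int.
text_as_int = int.from_bytes(text_payload, byteorder="big")
print(text_as_int)
```

Producer and consumer therefore have to agree on the encoding of each message.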

## Create a [Python Consumer](simple-consumer.ipynb)

Let's write a new **Consumer** using the Python library and send the encoded int and the encoded string again.

In [None]:
record_key = None
record_value = "some strange string...§§§Ᾱ▼☞😅".encode("UTF-8")

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))

In [None]:
record_key = None
record_value = (10).to_bytes(2, byteorder='big')

print("Producing record: {}\t{}".format(record_key, record_value))

p.produce(topic, key=record_key, value=record_value)

print("one message was produced to topic {}!".format(topic))
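The cells above pick an encoding by hand for each value. As a rough sketch of how that choice could be centralized, the hypothetical helper `to_payload` below (not part of `confluent_kafka`) converts a value to bytes based on its type. The UTF-8 codec and the 8-byte signed big-endian layout are assumptions made for this example.

```python
def to_payload(value):
    """Hypothetical helper: turn a str, int, or bytes value into bytes."""
    if isinstance(value, bytes):
        return value  # already encoded
    if isinstance(value, str):
        return value.encode("utf-8")  # text as UTF-8 (assumption)
    if isinstance(value, int) and not isinstance(value, bool):
        # Fixed 8-byte, big-endian, signed layout (assumption).
        return value.to_bytes(8, byteorder="big", signed=True)
    raise TypeError("unsupported value type: {}".format(type(value).__name__))


for value in ["some text", 10, b"\x00\n"]:
    print(repr(value), "->", to_payload(value))
```

The result could then be passed as the `value` argument of `p.produce(...)`, provided the consumer applies the matching decoding.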

___

### End notes

If you want to learn more, see [https://docs.python.org/3/howto/unicode.html](https://docs.python.org/3/howto/unicode.html).

##### ![Quantia Tiny Logo](https://www.quantiaconsulting.com/logos/quantia_logo_tiny.png) 2020 Quantia Consulting, srl. All rights reserved.