<div style="text-align: center; line-height: 0; padding-top: 2px;">
  <img src="https://www.quantiaconsulting.com/logos/quantia_logo_orizz.png" alt="Quantia Consulting" style="width: 600px; height: 250px">
</div>

# Synchronous and Asynchronous Writes

The producer sends messages with the `produce(...)` method.

The `produce(...)` method is asynchronous: it enqueues the message in an internal queue and immediately returns control to the caller.

An internal thread then sends the queued messages to the broker, in order.

A write can be handled in two ways:
* asynchronous: the [poll([timeout])](https://docs.confluent.io/current/clients/confluent-kafka-python/index.html#confluent_kafka.Producer.poll) function serves the delivery callbacks of the messages whose results are available, propagating the results of the produce operations to the callback function without blocking the production flow (when called with a small or zero timeout).
* synchronous: the [flush([timeout])](https://docs.confluent.io/current/clients/confluent-kafka-python/index.html#confluent_kafka.Producer.flush) function waits until all the messages in the Producer queue have been delivered before returning control.
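To make the queue/thread/callback mechanics concrete, here is a minimal toy model, assuming nothing about librdkafka's real internals. `ToyProducer` is a hypothetical class (it is *not* `confluent_kafka.Producer`): `produce()` only enqueues, a background thread "sends" in FIFO order, `poll()` serves the ready delivery callbacks in the caller's thread, and `flush()` keeps polling until nothing is pending or the timeout expires.

```python
import queue
import threading
import time


class ToyProducer:
    """Hypothetical, simplified model of an asynchronous producer.

    This is NOT confluent_kafka: it only mimics the produce/poll/flush contract
    described above, with an internal queue and a background sender thread.
    """

    def __init__(self):
        self._outgoing = queue.Queue()  # messages waiting to be "sent"
        self._reports = queue.Queue()   # delivery reports waiting to be served
        self._pending = 0               # messages whose report was not served yet
        self._lock = threading.Lock()
        self.log = []                   # what the fake broker received, in order
        threading.Thread(target=self._sender, daemon=True).start()

    def __len__(self):
        with self._lock:
            return self._pending

    def produce(self, value, on_delivery=None):
        # Enqueue and return immediately: the caller never waits for the broker.
        with self._lock:
            self._pending += 1
        self._outgoing.put((value, on_delivery))

    def _sender(self):
        # Background thread: "deliver" messages in FIFO order, then queue a report.
        while True:
            value, callback = self._outgoing.get()
            time.sleep(0.001)           # pretend network round trip
            self.log.append(value)
            self._reports.put((callback, value))

    def poll(self, timeout=0.0):
        # Serve every report that is ready, waiting at most `timeout` seconds
        # for the first one. Callbacks run in the caller's thread.
        served = 0
        while True:
            try:
                if served == 0 and timeout > 0:
                    callback, value = self._reports.get(timeout=timeout)
                else:
                    callback, value = self._reports.get_nowait()
            except queue.Empty:
                return served           # number of callbacks served
            if callback is not None:
                callback(None, value)   # err=None: the toy never fails
            with self._lock:
                self._pending -= 1
            served += 1

    def flush(self, timeout=10.0):
        # Call poll() until nothing is pending or `timeout` elapses;
        # return the number of messages still outstanding.
        deadline = time.monotonic() + timeout
        while len(self) > 0:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            self.poll(min(0.1, remaining))
        return len(self)


delivered = []
toy = ToyProducer()
for n in range(5):
    toy.produce(n, on_delivery=lambda err, value: delivered.append(value))
print(delivered)      # [] : produce() only queued the messages
print(toy.flush(5.0)) # 0  : everything was delivered within the timeout
print(delivered)      # [0, 1, 2, 3, 4]
```

The design choice worth noticing is that callbacks never fire on their own: they run only inside `poll()` (or `flush()`, which calls `poll()`), which is why the real producer needs one of the two to be called.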

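Both `poll([timeout])` and `flush([timeout])` accept a float parameter named `timeout`.

The timeout is the maximum time, in seconds, to block while waiting for events.

Both functions block for at most `timeout` seconds. When the timeout expires they simply return: `poll()` returns the number of events (callbacks) it served, while `flush()` returns the number of messages still in the queue. The timeout does not make the pending messages fail; they stay in the queue, and their delivery reports are served by a later `poll()` or `flush()`.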
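Since an expired timeout is not reported as an error, code that needs a delivery guarantee has to check the value returned by `flush()`. A minimal sketch, assuming a `confluent_kafka.Producer` (or any object whose `flush(timeout)` returns the number of queued messages); `flush_or_raise` and `StubProducer` are hypothetical names, and the stub only exists so the sketch runs without a broker.

```python
def flush_or_raise(producer, timeout=10.0):
    """Flush, then fail loudly if messages are still queued after `timeout` seconds.

    `producer` is assumed to be a confluent_kafka.Producer, or any object whose
    flush(timeout) returns the number of messages still in the queue.
    """
    remaining = producer.flush(timeout)
    if remaining > 0:
        raise RuntimeError(
            "{} message(s) still in the queue after {} s".format(remaining, timeout))
    return remaining


class StubProducer:
    """Hypothetical stand-in so the sketch runs without a Kafka broker."""

    def __init__(self, still_queued):
        self.still_queued = still_queued

    def flush(self, timeout=None):
        return self.still_queued


print(flush_or_raise(StubProducer(0)))   # 0
try:
    flush_or_raise(StubProducer(3), timeout=1.0)
except RuntimeError as e:
    print(e)                             # 3 message(s) still in the queue after 1.0 s
```

In the notebook, the same helper could wrap the real producer, e.g. `flush_or_raise(p, 10)`.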

## Initialization

Let's initialize a producer and test `poll([timeout])` and `flush([timeout])`.

In [None]:
from confluent_kafka import Producer, KafkaError
import json
import qcutils
import time

servers=qcutils.read_config_value("kafka.server") + ":" + str(qcutils.read_config_value("kafka.port"))

topic = ''

assert len(topic) > 0, "To avoid conflicts during write operations, please name the topic <surname>-topic"

qcutils.create_kafka_topic(topic)

producerconf = {
    'bootstrap.servers': servers
}


p = Producer(producerconf)

## Synchronous Production

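In the previous notebooks, we used synchronous production via the `flush([timeout])` method.

Synchronous writes typically limit throughput, because each message has to wait for a full round trip to the broker before the next one is sent: throughput cannot exceed roughly one message per round trip time. In some cases, however, this per-message guarantee is really useful.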
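The round-trip argument can be checked with a toy simulation, assuming a fixed, hypothetical round trip time (`RTT_S`) and a fake `fake_round_trip` function in place of the broker. It is a simplified model: the real client's batching depends on producer settings (for example `linger.ms`) and on partitioning, so treat the numbers as illustrative only.

```python
import time

RTT_S = 0.01  # hypothetical broker round trip time (10 ms)


def fake_round_trip(batch):
    """Pretend to send `batch` to the broker and wait for the acknowledgement."""
    time.sleep(RTT_S)
    return len(batch)


def send_one_by_one(messages):
    # Synchronous style: wait for each acknowledgement before sending the next.
    start = time.perf_counter()
    for m in messages:
        fake_round_trip([m])
    return time.perf_counter() - start


def send_batched(messages):
    # Queued style: one request carries the whole batch, so one round trip.
    start = time.perf_counter()
    fake_round_trip(list(messages))
    return time.perf_counter() - start


msgs = list(range(15))
sync_s = send_one_by_one(msgs)
batch_s = send_batched(msgs)
print("one by one: {:.3f} s, batched: {:.3f} s".format(sync_s, batch_s))
# Upper bound for one-at-a-time sending: 1 / RTT messages per second.
print("max synchronous throughput ~ {:.0f} msg/s".format(1 / RTT_S))  # 100 msg/s
```

With 15 messages, the one-by-one version pays 15 round trips while the batched version pays one.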

In [None]:
%load_ext autotime
    
for n in range(15):
    record_key = "qc-key"
    record_value = json.dumps({'count': n})
    print("Producing record: {}\t{}".format(record_key, record_value))
    p.produce(topic, key=record_key, value=record_value)
    # Block until this message is delivered (or 10 s elapse): synchronous write
    p.flush(10)

print("{} messages were produced to topic {}!".format(n + 1, topic))


In our example, the `flush([timeout])` function of the Producer is called at each iteration.

`flush([timeout])` is a blocking operation: if you call it too often, throughput drops.

Let's call it only once, after the loop.

In [None]:
%load_ext autotime
    
for n in range(15):
    record_key = "qc-key"
    record_value = json.dumps({'count': n})
    print("Producing record: {}\t{}".format(record_key, record_value))
    p.produce(topic, key=record_key, value=record_value)

# Block only once, after all the messages have been queued
p.flush(10)

print("{} messages were produced to topic {}!".format(n + 1, topic))


What's the main difference?

Look at the **execution time**! It is typically about an order of magnitude faster than before...

If you wait for message delivery at each iteration, throughput falls.

... But how can we check the delivery reports?

## Check the production results (Callback Function)

To be notified of delivery success or failure, define a callback function and pass it to `produce(...)` through the `on_delivery` parameter.

Let's first try to check the production results.

In [None]:
def delivery_report(err, msg):
    if err is not None:
        print("Failed to deliver message: {}".format(err))
    else:
        print("Produced record to topic {} partition [{}] @ offset {}"
              .format(msg.topic(), msg.partition(), msg.offset()))

for n in range(15):
    record_key = "qc-key"
    record_value = json.dumps({'count': n})
    print("Producing record: {}\t{}".format(record_key, record_value))
    p.produce(topic, key=record_key, value=record_value, on_delivery=delivery_report)

# Wait up to 1 s for delivery; the delivery callbacks are served inside flush()
p.flush(1)

print("{} messages were produced to topic {}!".format(n + 1, topic))
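
Printing is fine for a demo, but you may want to check the outcome programmatically after `flush()`. A minimal sketch, assuming the same `(err, msg)` callback signature used by `delivery_report` above; `DeliveryTracker` is a hypothetical helper, and a bound method works as `on_delivery` because any callable is accepted.

```python
class DeliveryTracker:
    """Hypothetical helper: a delivery callback that records outcomes.

    Pass `tracker.callback` as `on_delivery=`; it receives (err, msg) like
    the `delivery_report` function above.
    """

    def __init__(self):
        self.delivered = []  # (topic, partition, offset) of successful messages
        self.failed = []     # errors of failed messages

    def callback(self, err, msg):
        if err is not None:
            self.failed.append(err)
        else:
            self.delivered.append((msg.topic(), msg.partition(), msg.offset()))


# Usage in the notebook (needs the broker, so shown as comments):
# tracker = DeliveryTracker()
# p.produce(topic, key=record_key, value=record_value, on_delivery=tracker.callback)
# p.flush(10)
# print(len(tracker.delivered), "delivered,", len(tracker.failed), "failed")
```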


... And what if we need the delivery report at each iteration, with comparable throughput?

We need to produce asynchronously.

## Asynchronous Production

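During asynchronous writes, we call the `poll([timeout])` function after each `produce(...)`.

The `poll([timeout])` function waits up to `timeout` seconds for events and serves the delivery callbacks of the messages whose results are available. With `poll(0)` it does not wait at all: it only serves the callbacks that are already ready.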

In [None]:
def delivery_report(err, msg):
    if err is not None:
        print("Failed to deliver message: {}".format(err))
    else:
        print("Produced record to topic {} partition [{}] @ offset {}"
              .format(msg.topic(), msg.partition(), msg.offset()))

for n in range(15):
    record_value = json.dumps({'count': n})
    print("Producing record: {}".format(record_value))
    p.produce(topic, value=record_value, on_delivery=delivery_report)
    # Serve the delivery callbacks that are already available, without blocking
    p.poll(0)

print("{} messages were produced to topic {}!".format(n + 1, topic))


Now the delivery reports are interleaved with the production "log", with a comparable execution time (and therefore comparable production throughput).

## Final Note and Good Practice


### Poll and Flush together
At the end of all the writing operations, even if we produce asynchronously, it is good practice to call `flush()`.

`flush()` should be called before shutting down the producer to ensure all outstanding/queued/in-flight messages are delivered.
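The two practices can be combined in one loop: `poll(0)` after each `produce(...)`, and a single `flush()` at the end. A minimal sketch, assuming an object that behaves like `confluent_kafka.Producer` (`produce(topic, value=..., on_delivery=...)`, `poll(timeout)`, `flush(timeout)`); `produce_all` and `RecordingStub` are hypothetical names, the stub only records calls so the sketch runs without Kafka.

```python
def produce_all(producer, topic, values, on_delivery=None, flush_timeout=10.0):
    """Produce every value asynchronously, then flush once at the end.

    Returns the number of messages still queued after the final flush.
    """
    for value in values:
        producer.produce(topic, value=value, on_delivery=on_delivery)
        producer.poll(0)                  # serve ready callbacks without blocking
    return producer.flush(flush_timeout)  # wait for the outstanding messages


class RecordingStub:
    """Hypothetical stand-in that records calls instead of talking to Kafka."""

    def __init__(self):
        self.calls = []

    def produce(self, topic, value=None, on_delivery=None):
        self.calls.append(("produce", topic, value))

    def poll(self, timeout):
        self.calls.append(("poll", timeout))
        return 0

    def flush(self, timeout):
        self.calls.append(("flush", timeout))
        return 0


# With the notebook's producer (needs the broker):
# produce_all(p, topic, (json.dumps({'count': n}) for n in range(15)),
#             on_delivery=delivery_report)
```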

### Key Recall
The message value carries the payload, while the key is optional: if you don't specify a key, the message is written with a `null` key.

Specifying a key is good practice: the message key can be useful during data preparation and data analysis. Moreover, with the default partitioner, messages with the same key are written to the same partition (as long as the number of partitions does not change).


##### ![Quantia Tiny Logo](https://www.quantiaconsulting.com/logos/quantia_logo_tiny.png) 2020 Quantia Consulting, srl. All rights reserved.