<div style="text-align: center; line-height: 0; padding-top: 2px;">
  <img src="https://www.quantiaconsulting.com/logos/quantia_logo_orizz.png" alt="Quantia Consulting" style="width: 600px; height: 250px">
</div>

# Simple Python Kafka Avro Producer 

**Technical Accomplishments:**
- Start working with Avro schemas in Kafka
- Introduce the class `AvroProducer`
- Produce data for a Kafka Avro topic

## Getting Started

Let's start by importing the libraries and defining a few useful variables.

In [None]:
%load_ext autotime

In [None]:
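from confluent_kafka import avro
from confluent_kafka.admin import AdminClient, NewTopic
from confluent_kafka.avro import AvroProducer
import time
import qcutils

# Broker address in host:port form, read from the course configuration
servers = qcutils.read_config_value("kafka.server") + ":" + str(qcutils.read_config_value("kafka.port"))

# URL of the shared Schema Registry
sr_url = qcutils.read_config_value("kafka.schema_registry.url")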

topic = 'test-topic'

admin_conf = {'bootstrap.servers': servers}
a = AdminClient(admin_conf)
fs = a.create_topics([NewTopic(topic, num_partitions=1, replication_factor=1)])

for t, f in fs.items():
    try:
        f.result()  # The result itself is None
        print("Topic {} created".format(t))
    except Exception as e:
        print("Failed to create topic {}: {}".format(t, e))
        
producerconf = {
        'bootstrap.servers': servers,
        'schema.registry.url': sr_url
    }

**Note**: to avoid conflicts with other participants when writing, name your topic `<surname>-topic`.

## Apache Avro

![](https://www.quantiaconsulting.com/img/Avro_Logo.svg.png)

Avro is a binary encoding format that uses a schema to specify the structure of the data being encoded.

Avro's encoding consists only of values concatenated together; nothing in the byte sequence identifies the fields or their datatypes. A reader therefore needs the schema to decode the data.

```json
{
  "namespace": "kafka.exercise.avro",
  "type": "record",
  "name": "Observation",
  "fields": [
    {"name": "id", "type": "long", "doc": "The observation id"},
    {"name": "value", "type": "double", "doc": "The actual measurement from the sensor"},
    {"name": "measurement", "type": "string", "doc": "The measurement type, e.g., temperature"},
    {"name": "timestamp", "type": "long", "doc": "The measurement timestamp"}
  ]
}
```
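
To make the "values concatenated together" point concrete, here is a minimal sketch of how an `Observation` record could be encoded by hand, following the binary encoding rules of the Avro specification (zigzag variable-length integers for `long`, 8-byte little-endian IEEE 754 for `double`, length-prefixed UTF-8 for `string`). The helper names (`encode_long`, `encode_observation`, and so on) and the sample values are illustrative assumptions; in practice the Avro library does this work for you.

```python
import struct


def encode_long(n: int) -> bytes:
    """Avro long: zigzag encoding, then a variable-length base-128 integer."""
    z = (n << 1) ^ (n >> 63)  # zigzag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_double(x: float) -> bytes:
    """Avro double: 8 bytes, IEEE 754, little-endian."""
    return struct.pack("<d", x)


def encode_string(s: str) -> bytes:
    """Avro string: the byte length as a long, then the UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


def encode_observation(obs: dict) -> bytes:
    """Concatenate the values in schema order; no field names or type tags are written."""
    return (
        encode_long(obs["id"])
        + encode_double(obs["value"])
        + encode_string(obs["measurement"])
        + encode_long(obs["timestamp"])
    )


# Hypothetical sample record, for illustration only
payload = encode_observation(
    {"id": 1, "value": 21.5, "measurement": "temperature", "timestamp": 1600000000}
)
print(len(payload), payload.hex())
```

The field name `measurement` never appears in the payload, only the value `temperature` does. This is why a reader cannot decode the bytes without knowing the writer's schema.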

## Schema Registry

![](https://www.quantiaconsulting.com/img/schema-registry.png)

Schema Registry provides centralized management of schemas:
* Stores a versioned history of all schemas
* Provides a RESTful interface for storing and retrieving Avro schemas
* Checks schemas and throws an exception if data does not conform to the schema
* Allows evolution of schemas according to the configured compatibility setting
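
As an illustration of the RESTful interface, the sketch below builds (without sending) the HTTP request that registers a schema under a subject. It assumes the default subject naming, where the schemas of a topic are stored under `<topic>-key` and `<topic>-value`. The URL `http://localhost:8081` is a placeholder, and the helpers `subject_for` and `build_register_request` are hypothetical names introduced for this example.

```python
import json
import urllib.request

SR_CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def subject_for(topic: str, is_key: bool = False) -> str:
    """Default subject name: '<topic>-key' or '<topic>-value'."""
    return f"{topic}-{'key' if is_key else 'value'}"


def build_register_request(sr_url: str, subject: str, schema: dict) -> urllib.request.Request:
    """Build a POST /subjects/<subject>/versions request that registers a schema."""
    # The registry expects the Avro schema as a JSON *string* inside a JSON object
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{sr_url.rstrip('/')}/subjects/{subject}/versions",
        data=body,
        headers={"Content-Type": SR_CONTENT_TYPE},
        method="POST",
    )


schema = {
    "namespace": "example.avro",
    "type": "record",
    "name": "PersonValue",
    "fields": [{"name": "age", "type": "int"}],
}

# Placeholder URL: replace it with your Schema Registry address
req = build_register_request("http://localhost:8081", subject_for("test-topic"), schema)
print(req.get_method(), req.full_url)

# With a registry running, the request could be sent with:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))  # the response carries the id assigned to the schema
```

You rarely need to issue these calls yourself: the Avro producer registers the schemas it uses on your behalf.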

## Your first AvroProducer

### Define an Avro schema for the key

> **NOTE**: **we** are also sharing the Schema Registry, as in any real-world organization. The `namespace` attribute is meant to isolate names in separate spaces. **Add your initials to the `namespace`**, e.g., `example.avro.`**`MR`** if you are **M**ario **R**ossi.

In [None]:
key_schema_str = """
{
  "namespace": "example.avro",
  "type": "record",
  "name": "PersonKey",
  "fields": [
    {
      "name": "name",
      "type": "string"
    }
  ]
}
"""

key_schema = avro.loads(key_schema_str)

### Define an Avro schema for the value

> **NOTE**: **we** are also sharing the Schema Registry, as in any real-world organization. The `namespace` attribute is meant to isolate names in separate spaces. **Add your initials to the `namespace`**, e.g., `example.avro.`**`MR`** if you are **M**ario **R**ossi.

In [None]:
value_schema_str = """
{
  "namespace": "example.avro",
  "type": "record",
  "name": "PersonValue",
  "fields": [
    {
      "name": "age",
      "type": "int"
    }
  ]
}
"""

value_schema = avro.loads(value_schema_str)

### Create an AvroProducer

In [None]:
ap = AvroProducer(producerconf, default_key_schema=key_schema, default_value_schema=value_schema)

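### Use the AvroProducer to send an Avro message

In [None]:
key = {"name": "Abe"}
value = {"age": 22}
# produce() is asynchronous: it only enqueues the message
ap.produce(topic=topic, value=value, key=key)
# flush() blocks until all queued messages have been delivered
ap.flush()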

**Let's now consume this message with a [simple Avro consumer](simple-avro-consumer.ipynb)**

## Discussion

* How does the AvroConsumer know about the schema of the key and the schema of the value?

# Let's evolve the schema

We want the value schema to also carry a `haircolor` field.

## Define a new schema for the value

> **NOTE**: **we** are also sharing the Schema Registry, as in any real-world organization. The `namespace` attribute is meant to isolate names in separate spaces. **Add your initials to the `namespace`**, e.g., `example.avro.`**`MR`** if you are **M**ario **R**ossi.

In [None]:
new_value_schema_str = """
{
  "namespace": "example.avro",
  "type": "record",
  "name": "PersonValue",
  "fields": [
    {
      "name": "age",
      "type": "int"
    },
    {
      "name": "haircolor",
      "type": "string",
      "default": "black"
    }
  ]
}
"""

new_value_schema = avro.loads(new_value_schema_str)

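***NOTE***: the `default` attribute on the new field is **strictly** necessary to guarantee **backward compatibility**: a consumer using the new schema can still read records written with the old one, because the missing `haircolor` is filled in with the default.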
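
To see why the default matters, here is a simplified sketch of what a reader does when it resolves a record written with the old schema against the new one. It is an illustration in plain Python, not the actual Avro or Schema Registry implementation; the function names `added_fields_without_default` and `read_with_new_schema` are hypothetical.

```python
def added_fields_without_default(old_schema: dict, new_schema: dict) -> list:
    """Return the fields added in new_schema that lack a default (these break backward compatibility)."""
    old_names = {f["name"] for f in old_schema["fields"]}
    return [
        f["name"]
        for f in new_schema["fields"]
        if f["name"] not in old_names and "default" not in f
    ]


def read_with_new_schema(old_record: dict, new_schema: dict) -> dict:
    """Simplified resolution: take the written value if present, else the reader's default."""
    resolved = {}
    for field in new_schema["fields"]:
        name = field["name"]
        if name in old_record:
            resolved[name] = old_record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field '{name}' is missing and has no default")
    return resolved


old = {"type": "record", "name": "PersonValue",
       "fields": [{"name": "age", "type": "int"}]}
new = {"type": "record", "name": "PersonValue",
       "fields": [{"name": "age", "type": "int"},
                  {"name": "haircolor", "type": "string", "default": "black"}]}
# Same evolution, but without the default on the new field
bad = {"type": "record", "name": "PersonValue",
       "fields": [{"name": "age", "type": "int"},
                  {"name": "haircolor", "type": "string"}]}

print(read_with_new_schema({"age": 22}, new))
print(added_fields_without_default(old, new))
print(added_fields_without_default(old, bad))
```

Without the default, the old record `{"age": 22}` could not be read with the new schema. A registry configured for backward compatibility rejects such a schema for the same reason.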

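## Use the same AvroProducer to send an Avro message with the new schema

In [None]:
key = {"name": "Abe"}
value = {"age": 2, "haircolor": "Red"}
# Pass the new value schema explicitly to override the producer's default
ap.produce(topic=topic, value=value, key=key, key_schema=key_schema, value_schema=new_value_schema)
# flush() blocks until all queued messages have been delivered
ap.flush()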

##### ![Quantia Tiny Logo](https://www.quantiaconsulting.com/logos/quantia_logo_tiny.png) 2020 Quantia Consulting, srl. All rights reserved.