# Kafka client tutorial
Kafka lets you create **topics**, then **produce** and **consume** messages in those topics.

To do so, we need to install the kafka-python library, which lets us connect to our Kafka brokers.

In [1]:
!pip install kafka-python

Collecting kafka-python
  Using cached kafka_python-2.0.2-py2.py3-none-any.whl (246 kB)
Installing collected packages: kafka-python
Successfully installed kafka-python-2.0.2


## Configuration

Here are some examples of configuration settings you can override.

| Configuration                     | Description                                                                                                                              | Example Values                                           |
| --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- |
| **Topic Name**                    | The name of the topic.                                                                                                                   | `my_topic`                                               |
| **Number of Partitions**          | `num_partitions`: The number of partitions in the topic. Partitions allow for parallel processing of messages.                           | `3`, `5`, `10`                                           |
| **Replication Factor**            | `replication_factor`: The number of replicas for each partition. Replicas provide fault tolerance.                                       | `2`, `3`, `4`                                            |
| **Retention Policy**              | `retention.ms` or `retention.bytes`: The retention limit for the topic, by message age or by size per partition.                         | `86400000` (milliseconds), `100000000` (bytes)           |
| **Cleanup Policy**                | `cleanup.policy`: The policy to use for log retention. Options include "delete" or "compact" for compacted topics.                       | `delete`, `compact`                                      |
| **Segment Size**                  | `segment.bytes`: The maximum size of a log segment file for the topic.                                                                   | `1073741824` (1 GB), `536870912` (512 MB)                |
| **Segment Index Size**            | `segment.index.bytes`: The maximum size of an index file for the topic.                                                                  | `1048576` (1 MB), `524288` (512 KB)                      |
| **Min In-Sync Replicas**          | `min.insync.replicas`: The minimum number of in-sync replicas required for a producer to consider a write complete.                      | `1`, `2`                                                 |
| **Compression Type**              | `compression.type`: The compression type for the topic, such as "gzip", "snappy", or "lz4".                                              | `gzip`, `snappy`, `lz4`                                  |
| **Message Format Version**        | `message.format.version`: The version of the message format for the topic.                                                               | `0.8.2`, `1.0`, `2.8`                                    |
| **Unclean Leader Election**       | `unclean.leader.election.enable`: Whether unclean leader election is allowed.                                                            | `true`, `false`                                          |
| **Auto Creation**                 | `auto.create.topics.enable`: Whether topics are automatically created on the server when a client makes a request with an unknown topic. | `true`, `false`                                          |
| **Key Serializer**                | `key.serializer`: The serializer for message keys.                                                                                       | `org.apache.kafka.common.serialization.StringSerializer` |
| **Value Serializer**              | `value.serializer`: The serializer for message values.                                                                                   | `org.apache.kafka.common.serialization.StringSerializer` |

In [3]:
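# Kafka configuration used by the cells below

kafka_config = {
    'bootstrap_servers': 'kafka:19092',
    # Java class names, kept for reference only: kafka-python expects
    # Python callables for key_serializer / value_serializer
    'key_serializer': 'org.apache.kafka.common.serialization.StringSerializer',
    'value_serializer': 'org.apache.kafka.common.serialization.StringSerializer',
    'topic': 'topic-001',
    'num_partitions': 1,
    'replication_factor': 1,
    'auto_offset_reset': 'earliest',
    # Broker-side setting, shown for reference only
    'auto.create.topics.enable': False
}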
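
The serializer entries in the table are Java class names, whereas kafka-python takes plain Python callables for `key_serializer` and `value_serializer`. Below is a minimal sketch of what such callables could look like, assuming JSON values and string keys. The helper names (`serialize_key`, `serialize_value`, `deserialize_value`, `build_producer`) are illustrative and not part of the library.

```python
import json


def serialize_key(key):
    """Encode a key as UTF-8 bytes; keys are optional, so None passes through."""
    return None if key is None else str(key).encode('utf-8')


def serialize_value(value):
    """Encode any JSON-compatible object as UTF-8 bytes."""
    return json.dumps(value).encode('utf-8')


def deserialize_value(raw):
    """Inverse of serialize_value, usable as a consumer value_deserializer."""
    return json.loads(raw.decode('utf-8'))


def build_producer(bootstrap_servers):
    """Hypothetical helper: a producer that applies the callables above."""
    # Imported lazily so the serializers can be tried without kafka-python
    from kafka import KafkaProducer
    return KafkaProducer(bootstrap_servers=bootstrap_servers,
                         key_serializer=serialize_key,
                         value_serializer=serialize_value)
```

With a producer built this way, `producer.send('topic-001', key='sensor-1', value={'temperature': 20.32})` would accept Python objects directly instead of pre-encoded bytes.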


## Create a topic
Creating a topic isn't necessary if the ***auto.create.topics.enable*** setting is True on the broker: you can write to the topic directly and it will be created automatically.

In [3]:
from kafka.admin import KafkaAdminClient, NewTopic

# Set up the Kafka admin client with the Kafka broker address
admin_client = KafkaAdminClient(bootstrap_servers=kafka_config['bootstrap_servers'])

name = input('Topic name: ')

# Create a NewTopic instance with the desired topic configuration
new_topic = NewTopic(name               = name, # kafka_config['topic'],
                     num_partitions     = kafka_config['num_partitions'],
                     replication_factor = kafka_config['replication_factor'])

# Create the topic
admin_client.create_topics(new_topics=[new_topic])
print('Created !')

# Close the admin client
admin_client.close()


Topic name:  test


TopicAlreadyExistsError: [Error 36] TopicAlreadyExistsError: Request 'CreateTopicsRequest_v3(create_topic_requests=[(topic='test', num_partitions=1, replication_factor=1, replica_assignment=[], configs=[])], timeout=30000, validate_only=False)' failed with response 'CreateTopicsResponse_v3(throttle_time_ms=0, topic_errors=[(topic='test', error_code=36, error_message="Topic 'test' already exists.")])'.
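
The traceback above is what happens when the topic already exists. One way to make the cell safe to re-run is to catch `TopicAlreadyExistsError`, as in the sketch below. It also shows how topic-level settings from the configuration table might be passed through `topic_configs`, which expects string values. The function names and default arguments are assumptions made for illustration.

```python
def build_topic_configs(retention_ms=None, cleanup_policy=None):
    """Return topic-level overrides as a string-to-string mapping."""
    configs = {}
    if retention_ms is not None:
        configs['retention.ms'] = str(retention_ms)
    if cleanup_policy is not None:
        configs['cleanup.policy'] = cleanup_policy
    return configs


def create_topic_if_missing(bootstrap_servers, name, num_partitions=1,
                            replication_factor=1, topic_configs=None):
    """Hypothetical helper: return True if created, False if it already existed."""
    # Imported lazily so build_topic_configs works without kafka-python
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError

    admin_client = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        new_topic = NewTopic(name=name,
                             num_partitions=num_partitions,
                             replication_factor=replication_factor,
                             topic_configs=topic_configs or {})
        admin_client.create_topics(new_topics=[new_topic])
        return True
    except TopicAlreadyExistsError:
        return False
    finally:
        # Close the client whether or not the creation succeeded
        admin_client.close()
```

For example, `create_topic_if_missing('kafka:19092', 'test', topic_configs=build_topic_configs(retention_ms=86400000))` would return `False` instead of raising when `test` is already there.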

## Produce messages to a topic

In [None]:
from kafka import KafkaProducer
from kafka.errors import KafkaError

producer = KafkaProducer(bootstrap_servers=[kafka_config['bootstrap_servers']])

topic = input('topic: ')
message = input('message: ').encode()

# Asynchronous by default
future = producer.send(topic, message)

# Block for 'synchronous' sends
try:
    record_metadata = future.get(timeout=10)
except KafkaError as e:
    # Decide what to do if produce request failed...
    print(e)
else:
    # Successful result returns assigned partition and offset
    print('Message sent successfully.')
    print(f'partition: {record_metadata.partition}')
    print(f'offset: {record_metadata.offset}')
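
Since `send` is asynchronous, blocking on `future.get` is optional. A possible alternative is to attach callbacks to the returned future with `add_callback` and `add_errback`, as sketched here. The callback and helper names are illustrative.

```python
def on_send_success(record_metadata):
    """Called with the record metadata once the broker acknowledges the write."""
    line = (f'delivered to {record_metadata.topic}'
            f'[{record_metadata.partition}] at offset {record_metadata.offset}')
    print(line)
    return line


def on_send_error(exc):
    """Called with the exception if the send ultimately fails."""
    print(f'delivery failed: {exc!r}')


def send_with_callbacks(producer, topic, payload):
    """Hypothetical helper: send without blocking and report the outcome later."""
    future = producer.send(topic, payload)
    future.add_callback(on_send_success)
    future.add_errback(on_send_error)
    return future
```

When sending this way, calling `producer.flush()` before the program exits gives pending messages a chance to be delivered.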

## Consume messages from a topic
Choose a topic to listen to; the cell prints the messages it contains.
Set **auto_offset_reset** to **earliest** to also see previous messages, or to **latest** to see only messages arriving in real time.
>Stop the kernel to quit

In [None]:
from kafka import KafkaConsumer

topic = input('topic: ')

# Consume from the position set by auto_offset_reset and auto-commit offsets
consumer = KafkaConsumer(topic,
                         bootstrap_servers  = [kafka_config['bootstrap_servers']],
                         auto_offset_reset  = kafka_config['auto_offset_reset'],
                         enable_auto_commit = True)
for message in consumer:
    # message value and key are raw bytes -- decode if necessary!
    # e.g., for unicode: `message.value.decode('utf-8')`
    print("top:%s, part:%d, off:%d: key=%s value=%s" % (message.topic, message.partition,
                                                        message.offset, message.key,
                                                        message.value),
          end='\r')

topic:  meteo


top:meteo, part:0, off:4529: key=None value=b'{"timestamp": "2024-01-16 13:35:20.138322", "temperature": 20.32, "humidity": 40.3, "pressure": 1004.34, "sensor_id": 1, "location": {"lon": 3.18298, "lat": 42.471203}}'

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



top:meteo, part:0, off:32103: key=None value=b'{"timestamp": "2024-01-16 17:25:41.428047", "temperature": 27.07, "humidity": 59.65, "pressure": 1009.64, "sensor_id": 2, "location": {"lon": 3.184122, "lat": 42.473493}}'
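
The values printed above are raw JSON bytes, and the loop runs until the kernel is stopped. As a rough sketch, the records could be decoded into dictionaries and the loop bounded to a fixed number of messages, which also keeps the notebook output small. The helper names and the 5-second `consumer_timeout_ms` value are assumptions for illustration.

```python
import json
from itertools import islice


def decode_record(message):
    """Turn a consumed record into a plain dict with a decoded JSON value."""
    return {
        'topic': message.topic,
        'partition': message.partition,
        'offset': message.offset,
        'value': json.loads(message.value.decode('utf-8')),
    }


def consume_some(consumer, max_messages=10):
    """Read at most max_messages records from any iterator of records."""
    return [decode_record(m) for m in islice(consumer, max_messages)]


def build_consumer(topic, bootstrap_servers, auto_offset_reset='earliest'):
    """Hypothetical helper: a consumer that stops iterating after 5 s of silence."""
    # Imported lazily so the helpers above work without kafka-python
    from kafka import KafkaConsumer
    return KafkaConsumer(topic,
                         bootstrap_servers=bootstrap_servers,
                         auto_offset_reset=auto_offset_reset,
                         consumer_timeout_ms=5000)
```

A call such as `consume_some(build_consumer('meteo', 'kafka:19092'), max_messages=5)` would then return up to five decoded readings and stop on its own.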

## List topics

In [3]:
from kafka.admin import KafkaAdminClient

# Set up the Kafka admin client with the Kafka broker address
admin_client = KafkaAdminClient(bootstrap_servers=kafka_config['bootstrap_servers'])

# List all the topic names in the cluster
topic_names = admin_client.list_topics()

# Print the list of topic names
for topic_name in topic_names:
    print(topic_name)

# Close the admin client
admin_client.close()


meteo
docker-connect-status
__consumer_offsets
docker-connect-offsets
docker-connect-configs
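
The listing mixes application topics with topics Kafka maintains for itself, such as `__consumer_offsets`. A small filter can hide the latter. This sketch assumes that internal topics are the ones whose names start with a double underscore, which is a naming convention rather than something the listing guarantees. The function name is illustrative.

```python
def visible_topics(topic_names, internal_prefix='__'):
    """Return the topic names that do not start with the internal prefix, sorted."""
    return sorted(name for name in topic_names
                  if not name.startswith(internal_prefix))


# Names taken from the listing above
listed = ['meteo', 'docker-connect-status', '__consumer_offsets',
          'docker-connect-offsets', 'docker-connect-configs']
for name in visible_topics(listed):
    print(name)
```

In the notebook, the same function could be applied to the result of `admin_client.list_topics()`.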


## Delete a topic

In [4]:
from kafka.admin import KafkaAdminClient
from kafka.errors import KafkaError

# Set up the Kafka admin client with the Kafka broker address
admin_client = KafkaAdminClient(bootstrap_servers=kafka_config['bootstrap_servers'])

# Define the name of the topic you want to delete
topic_name = input('Topic name: ')

# Attempt to delete the topic
try:
    admin_client.delete_topics(topics=[topic_name])
    print('Topic deleted.')
except KafkaError as e:
    # Reported by Kafka, e.g. when the topic does not exist
    print(f"Kafka could not delete topic '{topic_name}': {e}")
except Exception as e:
    # Handle other exceptions if necessary
    print(f"An error occurred while deleting the topic: {e}")

# Close the admin client
admin_client.close()


Topic name:  meteo


Topic deleted.
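
A mistyped topic name makes the deletion request fail. One option is to check the name against `list_topics()` before asking for the deletion, as in this sketch. The function name is illustrative, and the check is best-effort only: deletion happens asynchronously on the broker side, so a topic may still be listed for a short while after a successful call.

```python
def delete_topic_if_present(admin_client, topic_name):
    """Delete topic_name if the cluster lists it; return True if a request was sent."""
    if topic_name not in admin_client.list_topics():
        print(f"Topic '{topic_name}' not found, nothing to delete.")
        return False
    admin_client.delete_topics(topics=[topic_name])
    print(f"Deletion requested for topic '{topic_name}'.")
    return True
```

The function takes an already constructed `KafkaAdminClient`, so the caller stays responsible for closing it.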
