# Kafka Consumer Example - Saving Messages

For an introduction to [Kafka](https://kafka.apache.org/), you may want to read some of the main [concepts](https://kafka.apache.org/documentation/#intro_concepts_and_terms).  An **event** records the fact that "something happened". An event has a key, value, timestamp, and optional metadata headers. **Producers** are those client applications that publish (write) events to Kafka, and **consumers** are those that subscribe to (read and process) these events.

This is an example of how to write a very simple Kafka [consumer](https://kafka-python.readthedocs.io/en/master/apidoc/KafkaConsumer.html) using the kafka-python library.  This example connects to a Kafka broker over the `SASL_SSL` security protocol using the `PLAIN` SASL mechanism. It receives simple JSON-formatted strings on a Kafka **topic** and saves them to the filesystem.  This is a single consumer, but it could be part of a **consumer group**, a set of consumers that cooperate to consume data from one or more topics.
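
To make the event fields concrete, here is a minimal sketch of what a received event might look like and how its JSON value can be decoded. The `Event` tuple, the `decode_value` helper, and the sample payload are hypothetical stand-ins for illustration; with kafka-python, the consumer yields record objects that expose similar attributes (`key`, `value`, `timestamp`, `headers`).

```python
import json
from typing import NamedTuple, Optional

class Event(NamedTuple):
    """Hypothetical stand-in for a consumed record, for illustration only."""
    key: Optional[bytes]
    value: bytes
    timestamp: int  # milliseconds since the epoch
    headers: list   # list of (str, bytes) pairs

def decode_value(event):
    """Decode the raw bytes of an event value into a Python object."""
    return json.loads(event.value.decode('utf-8'))

# Made-up sample event; no broker is needed to run this.
sample = Event(key=b'sensor-1',
               value=b'{"temperature": 21.5}',
               timestamp=1700000000000,
               headers=[('source', b'notebook')])
payload = decode_value(sample)
print(payload['temperature'])
```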

For a sample producer, refer to the notebook [1_kafka_producer.ipynb](./1_kafka_producer.ipynb) to send some sample events.

For further reading, visit the documentation for [Kafka](https://kafka.apache.org/documentation/) and for [kafka-python](https://kafka-python.readthedocs.io/).

## Dependencies

- [kafka-python](https://pypi.org/project/kafka-python/): a Python client for the Apache Kafka distributed stream processing system. kafka-python is designed to function much like the official Java client, with a sprinkling of Pythonic interfaces (e.g., consumer iterators).

*Note:  In general, you want to manage your dependencies in a `requirements.txt` file. For easy demonstration, we've installed the library inline here.*


In [None]:
!pip install kafka-python

## Connection Information

Generally, much of your connection information (servers, username, password) will be injected as environment variables.  This keeps credentials out of the notebook, so they are not accidentally committed to source control.

#### Expected Environment Variables
- `KAFKA_BOOTSTRAP_SERVER`: location of the Kafka bootstrap server, e.g. 'my-kafka-bootstrap.namespace.svc.cluster.local:9092'.
- `KAFKA_USERNAME`: SASL username or client ID.
- `KAFKA_PASSWORD`: SASL password or client secret.
- `KAFKA_TOPIC`: name of the Kafka topic to consume messages from. Defaults to `notebook-test` when unset.


In [None]:
import os

# location of the Kafka Bootstrap Server loaded from the environment variable.
# e.g. 'my-kafka-bootstrap.namespace.svc.cluster.local:9092'
KAFKA_BOOTSTRAP_SERVER = os.environ.get('KAFKA_BOOTSTRAP_SERVER')

# SASL username or client ID
KAFKA_USERNAME = os.environ.get('KAFKA_USERNAME')

# SASL password or client secret
KAFKA_PASSWORD = os.environ.get('KAFKA_PASSWORD')

# Name of the topic this consumer listens to for events (the producer sends to the same topic).
# Falls back to 'notebook-test' when the environment variable is unset.
KAFKA_TOPIC = os.environ.get('KAFKA_TOPIC') or 'notebook-test'

# Kafka consumer group to which this consumer belongs
KAFKA_CONSUMER_GROUP = 'notebook-consumer-save-files'
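
`os.environ.get` returns `None` when a variable is missing, which would only surface later as a connection error. One possible way to fail early is sketched below. `load_settings` is a hypothetical helper, not part of kafka-python, and it accepts the environment as a mapping so that it can be tried without real credentials.

```python
import os

REQUIRED = ('KAFKA_BOOTSTRAP_SERVER', 'KAFKA_USERNAME', 'KAFKA_PASSWORD')

def load_settings(env=None):
    """Return the Kafka settings as a dict, raising if a required one is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # Same fallback as the cell above: default topic when KAFKA_TOPIC is unset.
    settings['KAFKA_TOPIC'] = env.get('KAFKA_TOPIC') or 'notebook-test'
    return settings

# Example with an in-memory mapping (placeholder values, not real credentials).
example = load_settings({
    'KAFKA_BOOTSTRAP_SERVER': 'localhost:9092',
    'KAFKA_USERNAME': 'user',
    'KAFKA_PASSWORD': 'secret',
})
print(example['KAFKA_TOPIC'])
```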


## Creating the Consumer

This function creates a consumer that connects to the Kafka server set by the variable `KAFKA_BOOTSTRAP_SERVER` and listens to the topic set by the variable `KAFKA_TOPIC`.  The consumer receives messages and saves each one to a file in the `messages` directory.  The consumer will run until the kernel is stopped.

In [None]:
from kafka import KafkaConsumer
import json
from pathlib import Path

def create_consumer_save_files():
    consumer = KafkaConsumer(KAFKA_TOPIC,
                             group_id=KAFKA_CONSUMER_GROUP,
                             bootstrap_servers=[KAFKA_BOOTSTRAP_SERVER],
                             security_protocol='SASL_SSL',
                             sasl_mechanism='PLAIN',
                             sasl_plain_username=KAFKA_USERNAME,
                             sasl_plain_password=KAFKA_PASSWORD,
                             auto_offset_reset='earliest',
                             api_version_auto_timeout_ms=30000,
                             request_timeout_ms=450000)

    print(f'Subscribed to "{KAFKA_BOOTSTRAP_SERVER}" consuming topic "{KAFKA_TOPIC}"...')
    Path("messages").mkdir(parents=True, exist_ok=True)

    try:
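        for record in consumer:
            # record.timestamp is in milliseconds since the epoch; records that
            # share a timestamp will overwrite each other's file.
            timestamp = record.timestamp
            filename = f'messages/{timestamp}.json'
            # record.value is raw bytes because no value_deserializer is set.
            msg = record.value.decode('utf-8')
            topic = record.topic
            # Parse only to validate the payload; raises ValueError on malformed JSON.
            json.loads(msg)
            print(('Received the following message on the '
                   f'topic "{topic}": {msg}'))
            with open(filename, 'w') as outfile:
                print(f'Writing msg to "{filename}"')
                outfile.write(msg)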

    finally:
        print("Closing consumer...")
        consumer.close()
    print("Kafka consumer stopped.")
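
Because the filename in the function above is derived from the timestamp alone, two records with the same millisecond timestamp would be written to the same file. The sketch below shows one alternative, assuming the record exposes `topic`, `partition`, and `offset` attributes as kafka-python's consumer records do; that combination identifies a single record. `record_filename` and `FakeRecord` are hypothetical names used only for illustration.

```python
from collections import namedtuple
from pathlib import Path

def record_filename(record, directory='messages'):
    """Build a per-record path from topic, partition, and offset."""
    name = f'{record.topic}-{record.partition}-{record.offset}.json'
    return Path(directory) / name

# Hypothetical stand-in for a consumed record, so the sketch runs without a broker.
FakeRecord = namedtuple('FakeRecord', 'topic partition offset timestamp value')

first = FakeRecord('notebook-test', 0, 41, 1700000000000, b'{"a": 1}')
second = FakeRecord('notebook-test', 0, 42, 1700000000000, b'{"a": 2}')

# Same timestamp, different offsets, so the two paths differ.
print(record_filename(first))
print(record_filename(second))
```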



## Start Listening

Start the consumer.  It listens for events and saves their values to the filesystem.  **Stop the kernel to quit.**

In [None]:
try:
    create_consumer_save_files()
except KeyboardInterrupt:
    print('Stopped')
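
The loop above only ends when the kernel is interrupted. For short experiments it can be handy to stop after a fixed number of messages. The sketch below works on any iterable of records, so it is shown with plain objects instead of a live consumer; `take_messages` is a hypothetical helper. kafka-python's `KafkaConsumer` also accepts a `consumer_timeout_ms` argument, which ends iteration when no message arrives within the given time.

```python
from itertools import islice
from types import SimpleNamespace

def take_messages(records, max_messages):
    """Collect the decoded values of at most max_messages records."""
    return [record.value.decode('utf-8')
            for record in islice(records, max_messages)]

# Plain objects stand in for consumed records so the sketch runs without a broker.
fake_records = [SimpleNamespace(value=f'{{"n": {i}}}'.encode('utf-8'))
                for i in range(5)]
first_two = take_messages(fake_records, 2)
print(first_two)
```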
