# Kafka Producer
The companion notebook `spark-data-federation.ipynb` in this repository uses data pulled from a Kafka topic.
This notebook reads the example data file `fraud.csv` and pushes the data rows onto Kafka so that
the federation example code will run.

Using a Jupyter notebook to push data onto Kafka in this way is useful for testing and demonstrations.

### Managing Python dependencies with pip

The following cells show how `pip` can be called from inside a notebook to customize your
Python packages.
They are commented out because I generated customized image builds that Open Data Hub can use.
If you are running non-customized images, you can uncomment and run these cells.

In [1]:
# %pip uninstall -y mlflow

In [2]:
# %pip install kafka-python pandas==1.0.3

In [3]:
from time import sleep
from json import dumps
from kafka import KafkaProducer

### Connecting to Kafka
The following cell sets up a Kafka connection and configures it to serialize each data row as UTF-8 encoded JSON.

In [4]:
producer = KafkaProducer(bootstrap_servers=['odh-message-bus-kafka-brokers:9092'],
                         value_serializer=lambda x: dumps(x).encode('utf-8'))
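
To see what the `value_serializer` above does to a single row, the same expression can be run on its own, without a broker. This is a minimal sketch: the `serialize` helper name and the sample row (with made-up `id` and `amount` fields) are illustrative assumptions, not taken from `fraud.csv`.

```python
from json import dumps, loads


def serialize(value):
    # Same expression as the value_serializer lambda: dict -> JSON text -> UTF-8 bytes.
    return dumps(value).encode('utf-8')


# Hypothetical row; the real column names come from fraud.csv.
row = {"id": 1, "amount": 12.5}

payload = serialize(row)                    # bytes, the form handed to Kafka
restored = loads(payload.decode('utf-8'))   # what a consumer would do to read it back

print(payload)    # b'{"id": 1, "amount": 12.5}'
print(restored)   # {'id': 1, 'amount': 12.5}
```

A consumer of the topic needs the mirror-image step (decode as UTF-8, then parse JSON) to recover each row as a dictionary.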

### Pushing data onto Kafka
The following cells load `fraud.csv` into a Pandas dataframe,
and push the resulting rows onto our Kafka topic.
The last cell loops over all the rows and may run for a long time.
If you wish to halt it, you can interrupt the kernel from the Jupyter menu bar above.

In [5]:
import pandas as pd
data = pd.read_csv("fraud.csv")

In [6]:
for row in data.to_dict(orient='records'):
    producer.send('demotopic', row)
    sleep(0.1)

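
Because the send loop above can run for a long time, it may be convenient to cap the number of rows and to flush the producer when the loop stops, including when the kernel is interrupted. The following is a rough sketch under stated assumptions: `send_rows` and `RecordingProducer` are hypothetical names introduced here, the sample rows are made up, and the helper relies only on the producer exposing `send(topic, value)` and `flush()`, which `KafkaProducer` from `kafka-python` does.

```python
from time import sleep


def send_rows(producer, topic, rows, delay=0.1, limit=None):
    """Send up to `limit` rows to `topic`, pausing `delay` seconds between sends.

    Returns the number of rows handed to the producer.
    """
    sent = 0
    try:
        for row in rows:
            if limit is not None and sent >= limit:
                break
            producer.send(topic, row)
            sent += 1
            sleep(delay)
    except KeyboardInterrupt:
        # Interrupting the kernel lands here; stop quietly instead of raising.
        pass
    finally:
        # Ask the producer to deliver anything still buffered before returning.
        producer.flush()
    return sent


class RecordingProducer:
    """Hypothetical stand-in for KafkaProducer, used only to try the helper offline."""

    def __init__(self):
        self.messages = []
        self.flushed = False

    def send(self, topic, value):
        self.messages.append((topic, value))

    def flush(self):
        self.flushed = True


# Made-up rows; in the notebook these would come from data.to_dict(orient='records').
rows = [{"id": 1, "amount": 1.5}, {"id": 2, "amount": 2.0}, {"id": 3, "amount": 9.9}]

demo = RecordingProducer()
count = send_rows(demo, "demotopic", rows, delay=0, limit=2)
print(count, demo.flushed)
```

With the real producer, the call would look like `send_rows(producer, 'demotopic', data.to_dict(orient='records'), limit=100)`, where the limit of 100 is an arbitrary choice for a short demonstration.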