# 3.3 Sending a PM2.5 data stream to the Kafka broker (Step-2)

This Jupyter Notebook is the producer application in our Kafka workflow. It reads the CSV file generated in the previous step and sends each data record as a message to the Kafka broker.

In [None]:
## Import Libraries
import pandas as pd # to handle tabular data
import json # to handle data operations in json format
from confluent_kafka import Producer # Kafka producer library to enable streaming functionalities
import socket # to get network properties for kafka communication

In [None]:
def acked(err, msg):

    '''
    Delivery callback for the Kafka producer. It is called once per message
    and prints either the delivery error or the delivered message.
    '''
    
    if err is not None:
        print("Failed to deliver message: %s: %s" % (str(msg.value()), str(err)))
    else:
        print("Message produced: %s" % (str(msg.value())))
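
The callback above prints the raw message value, which arrives as bytes. As a minimal sketch of a more informative variant, the functions below (hypothetical names `format_delivery_report` and `acked_verbose`, not part of the original notebook) decode the payload and, on success, report the topic, partition, and offset taken from the delivered message.

```python
def format_delivery_report(err, msg):
    """Build a one-line summary from the same arguments that acked receives."""
    payload = msg.value()
    # Message values are delivered as bytes; decode them for readable output.
    text = payload.decode("utf-8") if isinstance(payload, bytes) else str(payload)
    if err is not None:
        return f"Failed to deliver message: {text}: {err}"
    return (
        f"Message produced to {msg.topic()} "
        f"[partition {msg.partition()}] at offset {msg.offset()}: {text}"
    )


def acked_verbose(err, msg):
    """Drop-in alternative to acked that prints the summary above."""
    print(format_delivery_report(err, msg))
```

Passing `callback=acked_verbose` to `producer.produce` would use it in place of `acked`.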

The next cells do the following:

1. Initialise variables: the topic name and the CSV file path.
2. Topic: every Kafka message is published to a topic. The topic name is arbitrary, but the consumer must subscribe to the same name.
3. conf: configure the producer with the Kafka broker's address (host:port).

In [None]:
topic = "pm25_stream"
p_key = "../data/sample_multilocation.csv"
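
Later in this notebook `p_key` is also passed as the message key. Kafka clients typically pick a partition by hashing the key, so identical keys end up in the same partition. The sketch below only illustrates that idea: `illustrative_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in hash that is not guaranteed to match the partitioner the client library actually uses.

```python
import zlib


def illustrative_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition index by hashing it (illustration only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


key = "../data/sample_multilocation.csv"

# Hashing the same key repeatedly always yields the same partition index.
partitions = {illustrative_partition(key, 3) for _ in range(5)}
print(partitions)
```

Because every record in this notebook shares one key, all messages would go to a single partition. Using a per-record key, such as the `boxId`, would spread records across partitions while keeping each box's records in order.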

In [None]:
### START: AVOID MAKING CHANGES ###

### This section defines how to connect to the Kafka server, whose configuration
### is defined in the Docker file. Changing this section, including the port number
### or other variables, can break the Kafka connection so that no data can be streamed.

conf = {'bootstrap.servers': "kafka:9093", 'client.id': socket.gethostname()}
producer = Producer(conf)

### END: AVOID MAKING CHANGES ###
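
Creating the `Producer` does not by itself prove that the broker is reachable. One way to check, sketched below, is to request cluster metadata with `list_topics` and look for the topic name. The helper name `topic_exists` and the 5-second timeout are assumptions made for illustration.

```python
def topic_exists(producer, topic, timeout=5.0):
    """Return True if the broker's metadata already lists the given topic.

    list_topics raises an exception if the broker cannot be reached
    within the timeout, so a normal return also confirms connectivity.
    """
    metadata = producer.list_topics(timeout=timeout)
    return topic in metadata.topics
```

A result of `False` is not necessarily an error: depending on the broker configuration, the topic may only be created when the first message is produced.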

In [None]:
# Read CSV using Pandas
df = pd.read_csv(p_key)

In [None]:
# Init For Loop for number of records in CSV file
for i in range(df.shape[0]): ## df.shape returns dimension of the dataframe (rows, columns)

    result = {} ## Init Dict
    result[df.loc[i,'value']] = [df.loc[i,'lat'], df.loc[i,'lon'], str(df.loc[i,'day']), df.loc[i,'boxId']]

    '''
    Format of result JSON:

    {
        'pm25_value_1': [lat, lon, day, boxId]
    }
    '''

    # Serialise the record to a JSON string. Kafka transmits message values as bytes, and JSON is a common text encoding for them
    result = json.dumps(result)
    
    ## The key is optional. Kafka uses it to choose the partition, so messages with the same key
    ## land in the same partition. Here every message uses the same key (the CSV file path).
    ## Keys are mostly used for scalability.

    producer.produce(topic, key=p_key, value=result, callback=acked) ## Message is queued in the producer's local buffer at this point

    ## Send the buffered message to the broker and wait for the delivery acknowledgement
    producer.flush()
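
Calling `flush()` after every message makes the loop wait for each acknowledgement before sending the next record. A common alternative, sketched here under the assumption that the records are already plain dictionaries, is to call `poll(0)` inside the loop so delivery callbacks still run, and to flush once at the end. The function name `send_records` is hypothetical.

```python
import json


def send_records(producer, topic, key, records, on_delivery):
    """Queue every record, then flush once at the end.

    Returns the number of messages still undelivered after the flush.
    """
    for record in records:
        # produce() only queues the message in the producer's local buffer.
        producer.produce(topic, key=key, value=json.dumps(record), callback=on_delivery)
        # poll(0) runs delivery callbacks for messages acknowledged so far.
        producer.poll(0)
    # Block until all queued messages are delivered or have failed.
    return producer.flush()
```

For a small CSV file the per-message flush is fine and easier to follow. For larger streams, `produce()` can still raise `BufferError` if the local queue fills up, so production code usually handles that case as well.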

# KAFKA CONSUMER

At this point you should have successfully streamed the downloaded data to the Kafka broker, where it is held in the `pm25_stream` topic, ready for a consumer to read.
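
One detail matters when a consumer decodes these messages: JSON object keys are always strings, so the PM2.5 value used as the dictionary key comes back as text and has to be converted to a number again. The round trip below uses made-up sample values purely for illustration.

```python
import json

# Hypothetical record in the producer's format: {pm25_value: [lat, lon, day, boxId]}
record = {12.5: [51.96, 7.63, "2021-06-01", "box-1"]}

payload = json.dumps(record)   # what the producer sends
decoded = json.loads(payload)  # what a consumer gets after decoding

# The single key is now the string "12.5", not the float 12.5.
pm25_text, (lat, lon, day, box_id) = next(iter(decoded.items()))
pm25 = float(pm25_text)

print(payload)
print(pm25, lat, lon, day, box_id)
```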

#### END STEP - 2