## Sending data to a Kafka server

This notebook uses the [Python client for the Apache Kafka distributed stream processing system](http://kafka-python.readthedocs.io/en/master/index.html) to send messages to a Kafka server. 

* Sensor data is available from https://uv.ulb.ac.be/pluginfile.php/923479/course/section/165902/data.conv.txt.gz
* Sensor location is available from https://uv.ulb.ac.be/pluginfile.php/923479/course/section/165902/mote_locs.txt

In this example, Kafka is used to send messages containing the temperature data of sensors 1 and 24, from 28/02/2017 to 07/03/2017.
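The streaming loop below works with relative day indices 0 to 7 rather than calendar dates. The following minimal sketch maps those indices to dates with Python's `datetime`. It assumes that day 0 is 28/02/2017, as the streaming section states. `relative_day_to_date` is a hypothetical helper, not part of the notebook.

```python
from datetime import date, timedelta

# Assumption: relative day 0 corresponds to 2017-02-28.
START_DATE = date(2017, 2, 28)

def relative_day_to_date(day: int) -> date:
    """Return the calendar date for a relative day index (0 = START_DATE)."""
    return START_DATE + timedelta(days=day)

# The streaming loop sends relative days 0..7, i.e. eight days in total.
streamed_days = [relative_day_to_date(d) for d in range(8)]
print(streamed_days[0], streamed_days[-1])  # 2017-02-28 2017-03-07
```

Because 2017 is not a leap year, the eighth day falls on 07/03.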

You need to have Kafka and Zookeeper servers running to execute this notebook. If you use the Docker course container, or work on the course cluster, these servers should already be running. Otherwise, you can start them on your machine with

```
nohup $KAFKA_PATH/bin/zookeeper-server-start.sh $KAFKA_PATH/config/zookeeper.properties  > $HOME/zookeeper.log 2>&1 &
nohup $KAFKA_PATH/bin/kafka-server-start.sh $KAFKA_PATH/config/server.properties > $HOME/kafka.log 2>&1 &
```

where `KAFKA_PATH` points to the folder containing Kafka. See https://kafka.apache.org/quickstart for how to install Kafka on your machine. 
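Before creating the producer, it can help to check that something is listening on the broker address. The sketch below only opens a TCP connection with the standard `socket` module. It does not confirm that the listener is actually Kafka. The helper name `broker_reachable` is hypothetical, and `localhost:9092` is the address the notebook uses.

```python
import socket

def broker_reachable(host: str = "localhost", port: int = 9092, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unknown
        return False

if __name__ == "__main__":
    print("Kafka broker reachable:", broker_reachable())
```

If this prints `False`, start Zookeeper and Kafka with the commands above and check `$HOME/kafka.log`.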


### General imports

```python
from kafka import KafkaProducer
import time
import numpy as np
import pandas as pd
```

### Load measurements, sort by Date/Time, add relative number of seconds since beginning

```python
DATA_LOCATION = "../data"
FILE = "data.conv.txt"
data_file = "{}/{}".format(DATA_LOCATION, FILE)

# Takes about one minute to load
data = pd.read_csv(data_file, header=None, sep=" ")
data.columns = ["Date", "Hour", "Sensor", "Value", "Voltage"]
data = data.sort_values(['Date', 'Hour']).reset_index(drop=True)
```
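Sorting on the raw `Date` and `Hour` strings gives chronological order only if they are zero-padded, ISO-style strings. The sketch below illustrates the load-and-sort step on made-up rows. It assumes a space-separated `Date Hour Sensor Value Voltage` layout with sensor labels such as `1-0`. The notebook does not show a sample line, so this layout is an assumption and the values are invented.

```python
import io
import pandas as pd

# Made-up rows in the assumed layout "Date Hour Sensor Value Voltage".
sample = io.StringIO(
    "2017-03-01 00:00:10.5 1-0 19.9 2.68\n"
    "2017-02-28 23:59:58.1 24-0 20.1 2.65\n"
    "2017-02-28 00:00:01.0 1-0 18.7 2.70\n"
)
df = pd.read_csv(sample, header=None, sep=" ")
df.columns = ["Date", "Hour", "Sensor", "Value", "Voltage"]
# Zero-padded ISO strings sort lexicographically in chronological order.
df = df.sort_values(["Date", "Hour"]).reset_index(drop=True)
print(df[["Date", "Hour", "Sensor"]])
```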

```python
data['datetime'] = pd.to_datetime(data.Date + ' ' + data.Hour)
data['relative_datetime'] = data['datetime'] - data['datetime'].iloc[0]
data['seconds'] = data['relative_datetime'].dt.total_seconds()
```
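The `seconds` column is the offset of each measurement from the first (earliest) timestamp. Dividing it by 86400 gives the relative day that the streaming loop selects on. Here is a small sketch of that arithmetic on hypothetical timestamps. The `day` column is added only for illustration.

```python
import pandas as pd

# Hypothetical, already sorted timestamps; only the arithmetic matters here.
df = pd.DataFrame({"datetime": pd.to_datetime([
    "2017-02-28 00:00:00", "2017-02-28 00:00:30", "2017-03-01 00:00:00"])})
# Offset of each row from the first timestamp, in seconds.
df["seconds"] = (df["datetime"] - df["datetime"].iloc[0]).dt.total_seconds()
# Floor division by 86400 seconds gives the relative day index.
df["day"] = (df["seconds"] // 86400).astype(int)
print(df)
```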

```python
sensorId_type = data.Sensor.str.split("-", expand=True)
sensorId_type.columns = ['SensorId', 'Type']
data['SensorId'] = sensorId_type['SensorId'].astype(int)
data['Type'] = sensorId_type['Type'].astype(int)
```
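The `Sensor` field combines a sensor id and a measurement type separated by `-`. `str.split(..., expand=True)` turns it into two columns. The sketch below uses hypothetical labels. Only the convention that type `0` is temperature comes from the selection step further down.

```python
import pandas as pd

# Hypothetical sensor labels in the "<SensorId>-<Type>" form.
sensor = pd.Series(["1-0", "24-0", "1-1"])
parts = sensor.str.split("-", expand=True)  # one column per part
parts.columns = ["SensorId", "Type"]
parts = parts.astype(int)                   # strings -> integers
print(parts)
```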


```python
# Drop features not needed for the simulation
data = data.drop(['datetime', 'relative_datetime', 'Sensor', 'Date', 'Hour', 'Voltage'], axis=1)
```

### Select temperature data from sensors 1 and 24

```python
temp = data[((data.SensorId == 1) | (data.SensorId == 24)) & (data.Type == 0)]
temp = temp.reset_index(drop=True).drop(['Type'], axis=1)
```
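The combined boolean mask can also be written with `Series.isin`, which scales better when more sensors are added. This sketch runs on a toy frame with the columns present at this point (`Value`, `seconds`, `SensorId`, `Type`). All values in it are made up.

```python
import pandas as pd

# Toy frame with the notebook's columns at this point; values are made up.
df = pd.DataFrame({
    "Value": [19.9, 20.1, 45.0, 18.7],
    "seconds": [0.0, 10.0, 20.0, 30.0],
    "SensorId": [1, 24, 1, 7],
    "Type": [0, 0, 1, 0],
})
# Same rows as ((SensorId == 1) | (SensorId == 24)) & (Type == 0).
selected = df[df["SensorId"].isin([1, 24]) & (df["Type"] == 0)]
selected = selected.reset_index(drop=True).drop(columns=["Type"])
print(selected)
```

After this step the remaining columns, in order, are `Value`, `seconds`, `SensorId`.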

### Create Kafka producer

```python
producer = KafkaProducer(bootstrap_servers='localhost:9092')
```
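The notebook encodes each row manually with `str(...).encode()`. An alternative is to let the producer encode values through kafka-python's `value_serializer` argument. The sketch below assumes JSON as the wire format, which is a swapped-in choice and not what this notebook sends. The kafka-python calls appear only in comments, so the function can be tried without a broker. `encode_measurement` is a hypothetical name.

```python
import json

def encode_measurement(values) -> bytes:
    """Encode one row (e.g. [Value, seconds, SensorId]) as UTF-8 JSON bytes."""
    # float() turns NumPy scalars into plain Python floats that json can serialize.
    return json.dumps([float(v) for v in values]).encode("utf-8")

# Hypothetical usage with kafka-python:
# producer = KafkaProducer(bootstrap_servers='localhost:9092',
#                          value_serializer=encode_measurement)
# producer.send('SGD', row.tolist())
print(encode_measurement([19.9, 0.0, 1]))  # b'[19.9, 0.0, 1.0]'
```

A receiver would then decode messages with `json.loads` instead of parsing a Python list literal.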

### Stream data

We simulate the data stream by sending, every 15 seconds, all the measurements collected during one day. This speeds up the simulation: the 8 days from 28/02/2017 to 07/03/2017 are streamed in 8 × 15 = 120 seconds.
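The timing logic has three parts. It waits until the clock reaches a multiple of the interval, selects one day of data per slot, and sleeps for whatever remains of the slot. The sketch below isolates that arithmetic as pure functions so it can be checked without Kafka. The helper names are hypothetical. The clamp to zero guards against a batch that takes longer than the interval, because `time.sleep` rejects negative values.

```python
SECONDS_PER_DAY = 86400

def day_window(day: int) -> tuple:
    """Half-open [start, end) range of relative seconds covered by a relative day."""
    return day * SECONDS_PER_DAY, (day + 1) * SECONDS_PER_DAY

def seconds_to_next_boundary(now: float, interval: float) -> float:
    """Seconds to wait until `now` reaches the next multiple of `interval`."""
    return interval - now % interval

def remaining_sleep(interval: float, elapsed: float) -> float:
    """Sleep that keeps each batch on an `interval`-second slot, never negative."""
    return max(0.0, interval - elapsed)

interval, n_days = 15, 8
print("simulated duration:", interval * n_days, "s")  # simulated duration: 120 s
print(day_window(1))                                  # (86400, 172800)
print(seconds_to_next_boundary(1000.0, interval))     # 5.0
```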


```python
interval = 15  # seconds between two simulated days

# For synchronization with the receiver (for the sake of the simulation),
# start at a number of seconds that is a multiple of 'interval'
current_time = time.time()
time_to_wait = interval - current_time % interval
time.sleep(time_to_wait)

# Send messages to Kafka on the topic 'SGD', one relative day at a time,
# starting at relative day 0 (2017-02-28)
for day in range(0, 8):

    time_start = time.time()

    # Select sensor measurements for the corresponding relative day
    data_current_day = temp[(temp.seconds >= day*86400) & (temp.seconds < (day+1)*86400)]
    data_current_day = data_current_day.dropna()
    # For all measurements of that day
    for i in range(len(data_current_day)):
        # Get data as plain Python numbers: [Value, seconds, SensorId]
        current_data = data_current_day.iloc[i].tolist()
        # Transform list to string
        message = str(current_data)
        # Send (asynchronous: the producer batches messages in the background)
        producer.send('SGD', message.encode())

    time_to_send = time.time() - time_start
    print("Time to send "+str(len(data_current_day))+" measurements (day "+str(day)+" ) : "+str(time_to_send))

    # Wait for the rest of the interval; sleep() rejects negative values
    time.sleep(max(0, interval - time_to_send))

# Make sure all buffered messages are actually delivered
producer.flush()
```

```
Time to send 7332 measurements (day 0 ) : 8.872701168060303
Time to send 3880 measurements (day 1 ) : 9.067212343215942
Time to send 3894 measurements (day 2 ) : 9.510358810424805
Time to send 3610 measurements (day 3 ) : 8.673032760620117
Time to send 3527 measurements (day 4 ) : 8.320244073867798
Time to send 3397 measurements (day 5 ) : 7.125953197479248
Time to send 4315 measurements (day 6 ) : 7.819364070892334
Time to send 4503 measurements (day 7 ) : 12.99349594116211
```
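On the receiving side, each message value is the byte string of a Python list in the column order `Value`, `seconds`, `SensorId`. The sketch below turns such a payload back into numbers with `ast.literal_eval`, which evaluates literals only and is safer than `eval`. With kafka-python the payload would come from the `.value` attribute of records returned by a `KafkaConsumer` subscribed to `'SGD'`. That consumer code is not shown here, and `decode_measurement` is a hypothetical helper. The sketch assumes the payload holds plain numbers. Under NumPy 2, `str()` of a list of NumPy scalars produces `np.float64(...)` text, which `literal_eval` rejects.

```python
import ast

def decode_measurement(payload: bytes) -> list:
    """Turn a payload produced by str(list).encode() back into a Python list."""
    # literal_eval accepts only Python literals (numbers, lists, ...), never code.
    return ast.literal_eval(payload.decode())

value, seconds, sensor_id = decode_measurement(b"[19.9, 3600.0, 1.0]")
print(value, seconds, int(sensor_id))  # 19.9 3600.0 1
```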
