# DSC 650 - Assignment 11 # 
## Simulating Kafka Producer Streams. A companion file consumes the Kafka stream. ##
**By: Kurt Stoneburner**

Some Supplemental Resources
Installing Kafka in Windows
- https://www.geeksforgeeks.org/how-to-install-and-run-apache-kafka-on-windows/

Built using Apache Kafka 2.8.1 on Windows

In [3]:
import json
import uuid
import os
import pandas as pd

from pathlib import Path
import kafka
from kafka import KafkaProducer, KafkaAdminClient
from kafka.admin.new_topic import NewTopic
from kafka.errors import TopicAlreadyExistsError

import pyarrow as pa
#from pyarrow.json import read_json
import pyarrow.parquet as pq

#//*** Get Working Directory
current_dir = Path(os.getcwd()).absolute()

#//*** Go up three folders (parents[2] is the great-grandparent directory)
kafka_dir = current_dir.parents[2].joinpath("kafka")
tmp_dir = kafka_dir.joinpath("tmp")
log_dir = kafka_dir.joinpath("logs")

**Kafka runs as a separate local server. This section stops any running Kafka and Zookeeper processes, then removes any existing topics by deleting their files from disk. This is a suboptimal solution: `KafkaAdminClient.delete_topics()` kept crashing and corrupting Kafka. That is likely a bug (or user error), so I worked around the issue by deleting the files manually.**

**This method is suboptimal because topics are meant to be both streaming AND persistent. However, I infer that the purpose of the assignment is to simulate streaming data and display that data at the requisite intervals.**

**Once data is deleted, fresh instances of Zookeeper and Kafka are launched.**
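The manual cleanup in the next cell walks `tmp_dir` bottom-up, deleting files and then empty folders while keeping `tmp_dir` itself. A minimal sketch of the same cleanup using `shutil.rmtree` is shown below; `clear_directory` is a hypothetical helper, not part of the notebook. As in the notebook, it assumes Kafka and Zookeeper are already stopped, because a running broker keeps its log files open.

```python
import shutil
import tempfile
from pathlib import Path


def clear_directory(directory) -> None:
    """Delete everything inside `directory` but keep the directory itself (hypothetical helper)."""
    directory = Path(directory)
    if not directory.exists():
        return  # nothing to clear
    for child in directory.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child)  # removes the folder and everything below it
        else:
            child.unlink()  # plain file or symlink


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real Kafka install.
    with tempfile.TemporaryDirectory() as scratch:
        tmp_dir = Path(scratch) / "tmp"
        (tmp_dir / "kafka-logs" / "topic-0").mkdir(parents=True)
        (tmp_dir / "kafka-logs" / "topic-0" / "00000.log").write_text("data")
        (tmp_dir / "zookeeper.txt").write_text("state")
        clear_directory(tmp_dir)
        print(list(tmp_dir.iterdir()))  # → []
```

Keeping `tmp_dir` in place (instead of calling `shutil.rmtree(tmp_dir)`) matches the notebook's behavior, so the start scripts still find the folder they expect.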

In [14]:
import time
#//*** Get Working Directory
current_dir = Path(os.getcwd()).absolute()

#//*** Go up three folders
kafka_dir = current_dir.parents[2].joinpath("kafka")
tmp_dir = kafka_dir.joinpath("tmp")
log_dir = kafka_dir.joinpath("logs")

#//*** Build script paths with joinpath() to avoid invalid "\s" escape sequences in string literals
res = os.system(str(kafka_dir.joinpath("stop_kafka-server-stop.bat")))
print("Stop Running Kafka Server: ", res)


res = os.system(str(kafka_dir.joinpath("stop_zookeeper-server-stop.bat")))
print("Stop Running Zookeeper Server: ", res)
print("Waiting For everything to close gracefully.....")
time.sleep(5)
#//*** Delete the Kafka and Zookeeper data files stored under tmp_dir.
#//*** We can't kill the partitions programmatically with the Windows version of Kafka
"""
for root, dirs, files in os.walk(log_dir, topdown=False):
    #//*** Delete all the log files
    for file in files:
        os.remove(Path(root).joinpath(file))
"""

#//*** Walk bottom-up so each folder is already empty when os.rmdir() reaches it
for root, dirs, files in os.walk(tmp_dir, topdown=False):
    #//*** Delete all the data files
    for file in files:
        os.remove(Path(root).joinpath(file))

    for folder in dirs:
        os.rmdir(Path(root).joinpath(folder))

time.sleep(1)
print("Starting Zookeeper: ")
os.system(f"start {kafka_dir.joinpath('start_Kafka_zookeeper.bat')}")


print("Starting Kafka Server: ")
os.system(f"start {kafka_dir.joinpath('start_Kafka_server.bat')}")



Stop Running Kafka Server:  0
Stop Running Zookeeper Server:  0
Waiting For everything to close gracefully.....
Starting Zookeeper: 
Starting Kafka Server: 


0

**Build a dictionary with one entry per topic ("accelerations" and "locations"). Within each topic, every time index (parsed from the `t=` folder name) maps to a list of file paths to the stored .parquet files, so the data dictionary collects all times and file locations in one place.**
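As a point of comparison, here is a minimal sketch of the same index built with `pathlib` instead of `os.walk`. It assumes the layout shown in the cell output (`<bdd>/<topic>/t=NNN.N/<files>`); `parse_time` and `build_index` are hypothetical helper names. Sorting by parsed time makes the insertion order of each inner dictionary chronological, rather than depending on the order in which the filesystem returns folders.

```python
import tempfile
from pathlib import Path


def parse_time(folder_name: str) -> float:
    """Convert a folder name such as 't=004.5' into the float 4.5."""
    return float(folder_name.replace("t=", ""))


def build_index(bdd_dir, topics=("accelerations", "locations")):
    """Map topic -> {time (float) -> sorted list of file paths} (hypothetical helper)."""
    bdd_dir = Path(bdd_dir)
    data = {topic: {} for topic in topics}
    for topic in topics:
        # Each t=... sub-folder of the topic directory holds one time step.
        time_dirs = [p for p in (bdd_dir / topic).glob("t=*") if p.is_dir()]
        for time_dir in sorted(time_dirs, key=lambda p: parse_time(p.name)):
            files = sorted(p for p in time_dir.iterdir() if p.is_file())
            data[topic][parse_time(time_dir.name)] = files
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        for t in ("t=010.6", "t=000.0", "t=004.5"):
            (root / "accelerations" / t).mkdir(parents=True)
            (root / "accelerations" / t / "part-0.parquet").write_text("")
        print(list(build_index(root)["accelerations"]))  # → [0.0, 4.5, 10.6]
```

Because the keys are inserted in ascending order, iterating over `data['accelerations']` replays the time steps chronologically.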


In [4]:
#//*** Get Working Directory
current_dir = Path(os.getcwd()).absolute()

#//*** Go up three folders
project_dir = current_dir.parents[2]

#//*** BDD Data Path
project_dir = project_dir.joinpath("dsc650/data/processed/bdd")

accel_dir = project_dir.joinpath("accelerations")
location_dir = project_dir.joinpath("locations")
print("Accel Dir: ",os.listdir(accel_dir))
print("Location Dir: ",os.listdir(location_dir))

#//*** Build a list of times, to simulate packet transmission

#//*** Parse the dir; each directory represents a time. Convert the string to a float
#//*** This feels very pythonic. I ended up not using this. But it's still cool. I'm keeping it as a reference
times = [float(name.replace("t=","")) for name in os.listdir(location_dir)]

data = {
    "accelerations" : {},
    "locations" : {},
}

for root, dirs, files in os.walk(project_dir, topdown=False):

    #//*** Load each Parquet FilePath dictionary
    for file in files:
        key = ""
        if str(accel_dir) in root:
            key = "accelerations"

        if str(location_dir) in root:
            key = "locations"

        #//*** Skip any file that sits outside the two topic folders
        if key == "":
            continue
        
        #//*** Convert the t= folder to a float time. This syncs the folder keys with the times
        #//*** Path(root).name is portable, unlike splitting on "\\"
        time_index = float(Path(root).name.replace("t=",""))
        
        #//*** Build Time_index Keys as needed
        if time_index not in data[key]:
            data[key][time_index] = []
        
        data[key][time_index].append(Path(root).joinpath(file))
        

print("Parsed time Values in Seconds:",data["accelerations"].keys())
        

Accel Dir:  ['t=000.0', 't=004.5', 't=007.8', 't=010.6', 't=014.9', 't=017.9', 't=021.3', 't=026.1', 't=030.4', 't=033.7', 't=037.7', 't=041.5', 't=045.4', 't=049.5', 't=052.5', 't=056.4', 't=060.1', 't=063.8', 't=066.7', 't=070.9', 't=073.9', 't=077.1', 't=081.4', 't=085.1', 't=088.3', 't=091.7', 't=094.7', 't=098.8', 't=102.5', 't=106.0', 't=109.9', 't=113.2', 't=117.2', 't=121.4']
Location Dir:  ['t=000.0', 't=004.5', 't=007.8', 't=010.6', 't=014.9', 't=017.9', 't=021.3', 't=026.1', 't=030.4', 't=033.7', 't=037.7', 't=041.5', 't=045.4', 't=049.5', 't=052.5', 't=056.4', 't=060.1', 't=063.8', 't=066.7', 't=070.9', 't=073.9', 't=077.1', 't=081.4', 't=085.1', 't=088.3', 't=091.7', 't=094.7', 't=098.8', 't=102.5', 't=106.0', 't=109.9', 't=113.2', 't=117.2', 't=121.4']
Parsed time Values in Seconds: dict_keys([0.0, 4.5, 7.8, 10.6, 14.9, 17.9, 21.3, 26.1, 30.4, 33.7, 37.7, 41.5, 45.4, 49.5, 52.5, 56.4, 60.1, 63.8, 66.7, 70.9, 73.9, 77.1, 81.4, 85.1, 88.3, 91.7, 94.7, 98.8, 102.5, 106.0, 10

### Configuration Parameters 



In [5]:
config = dict(
    bootstrap_servers=['127.0.0.1:9092'],
    first_name='ition',
    last_name='Admin'
)

config['client_id'] = '{}{}'.format(
    config['last_name'], 
    config['first_name']
)
config['topic_prefix'] = '{}{}'.format(
    config['last_name'], 
    config['first_name']
)

config

{'bootstrap_servers': ['127.0.0.1:9092'],
 'first_name': 'ition',
 'last_name': 'Admin',
 'client_id': 'Adminition',
 'topic_prefix': 'Adminition'}

### Create Topic Utility Function

The `create_kafka_topic` function creates a Kafka topic based on your configuration settings. For instance, if your first name is *John* and your last name is *Doe*, `create_kafka_topic('locations')` creates a topic named `DoeJohn-locations`. The function does not create the topic if it already exists.
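The naming rule can be checked without a running broker by pulling it out into pure functions. This is a sketch that mirrors the notebook's format strings; `full_topic_name` and `make_config` are hypothetical helpers, not part of kafka-python.

```python
def full_topic_name(topic_name, config):
    """Return '<topic_prefix>-<topic_name>', e.g. 'DoeJohn-locations'."""
    return "{}-{}".format(config["topic_prefix"], topic_name)


def make_config(first_name, last_name, bootstrap_servers=("127.0.0.1:9092",)):
    """Build a config dict whose client_id and topic_prefix are last name + first name."""
    prefix = "{}{}".format(last_name, first_name)
    return {
        "bootstrap_servers": list(bootstrap_servers),
        "first_name": first_name,
        "last_name": last_name,
        "client_id": prefix,
        "topic_prefix": prefix,
    }


if __name__ == "__main__":
    print(full_topic_name("locations", make_config("John", "Doe")))  # → DoeJohn-locations
    print(full_topic_name("accelerations", make_config("ition", "Admin")))  # → Adminition-accelerations
```

With the notebook's own settings (`first_name='ition'`, `last_name='Admin'`), this produces the `Adminition-...` topic names seen in the output.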

In [6]:
def create_kafka_topic(topic_name, config=config, num_partitions=1, replication_factor=1):
    bootstrap_servers = config['bootstrap_servers']
    client_id = config['client_id']
    topic_prefix = config['topic_prefix']
    name = '{}-{}'.format(topic_prefix, topic_name)
    
    admin_client = KafkaAdminClient(
        bootstrap_servers=bootstrap_servers, 
        client_id=client_id
    )
    
    topic = NewTopic(
        name=name,
        num_partitions=num_partitions,
        replication_factor=replication_factor
    )

    topic_list = [topic]
    try:
        admin_client.create_topics(new_topics=topic_list)
        print('Created topic "{}"'.format(name))
    except TopicAlreadyExistsError:
        print('Topic "{}" already exists'.format(name))
    finally:
        #//*** Release the admin client's network connections
        admin_client.close()
    
create_kafka_topic('locations')
create_kafka_topic('accelerations')

Topic "Adminition-locations" already exists
Topic "Adminition-accelerations" already exists


## Running the code to delete all topics completely kills Kafka 2.8.1 under Windows. It may be fixed in version 3.0; however, version 3.0 won't run under Windows due to an audit issue. I suspect it's related to the vulnerabilities in Log4j. ##

In [8]:
"""
admin_client = KafkaAdminClient(
        bootstrap_servers=config['bootstrap_servers'], 
        client_id=config['client_id']
    )
#//*** List Topics
print(admin_client.list_topics())
admin_client.delete_topics(admin_client.list_topics())

"""
print("Skip this...It Breaks Kafka")

Skip this...It Breaks Kafka


### Kafka Producer

The following code creates a `KafkaProducer` object which you can use to send Python objects that are serialized as JSON.

**Note:** This producer serializes Python objects as JSON, so every object sent must be JSON serializable. For example, Python `datetime` values are not JSON serializable and must be converted to a string (e.g. ISO 8601) or a numeric value (e.g. a Unix timestamp) before being sent.
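One way to handle the `datetime` case is a `value_serializer` that passes a `default=` fallback to `json.dumps`. The sketch below is an alternative to the lambda used in the next cell; `to_jsonable` and `serialize` are hypothetical names. It also shows a side effect relevant to the replay loop: `DataFrame.to_json()` already returns a JSON *string*, and a string passed through a JSON serializer is encoded a second time, so the consumer has to decode it twice.

```python
import json
from datetime import datetime, timezone


def to_jsonable(obj):
    """Fallback called by json.dumps for types it cannot serialize natively."""
    if isinstance(obj, datetime):
        return obj.isoformat()  # ISO 8601 string
    raise TypeError("Object of type {} is not JSON serializable".format(type(obj).__name__))


def serialize(value):
    """Candidate value_serializer: Python object -> UTF-8 JSON bytes."""
    return json.dumps(value, default=to_jsonable).encode("utf-8")


if __name__ == "__main__":
    stamp = datetime(2021, 12, 1, 12, 30, tzinfo=timezone.utc)
    print(serialize({"t": 4.5, "sent_at": stamp}))
    # → b'{"t": 4.5, "sent_at": "2021-12-01T12:30:00+00:00"}'

    # An already-serialized JSON string is wrapped in quotes and escaped again,
    # so reading it back needs json.loads(json.loads(raw_bytes)).
    already_json = '{"x": 1}'
    print(json.loads(json.loads(serialize(already_json))))  # → {'x': 1}
```

It could be plugged in as `KafkaProducer(..., value_serializer=serialize)`; unsupported types still fail loudly with a `TypeError` instead of being silently dropped.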

In [7]:
producer = KafkaProducer(
  bootstrap_servers=config['bootstrap_servers'],
  value_serializer=lambda x: json.dumps(x).encode('utf-8')
)

### Send Data Function

The `send_data` function sends a Python object to a Kafka topic. It adds the `topic_prefix` to the topic name, so `send_data('locations', data)` sends a JSON-serialized message to `DoeJohn-locations`. The function also registers callbacks that report whether the message was sent or an error occurred.
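The key logic inside `send_data` can be isolated and tested on its own. This sketch reproduces it as a hypothetical `message_key` helper: use the caller's key if one is given, otherwise a random UUID4 hex string, encoded to bytes. Kafka producers generally hash the key to pick a partition. With the single-partition topics created above, every message lands in partition 0 regardless of key, which matches the output.

```python
import uuid


def message_key(msg_key=None):
    """Return the message key as bytes: the caller's key, else a random UUID4 hex string."""
    key = msg_key if msg_key is not None else uuid.uuid4().hex
    return key.encode("utf-8")


if __name__ == "__main__":
    print(message_key("t=004.5"))  # → b't=004.5'
    print(len(message_key()))      # → 32 (uuid4().hex is 32 hex characters)
```

Passing a stable key, such as the time step, would keep related messages together if the topics were later given more than one partition. Random keys spread them out instead.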

In [8]:
def on_send_success(record_metadata):
    print('Message sent:\n    Topic: "{}"\n    Partition: {}\n    Offset: {}'.format(
        record_metadata.topic,
        record_metadata.partition,
        record_metadata.offset
    ))
    
def on_send_error(excp):
    #//*** print() has no exc_info argument (that belongs to logging), so print the exception itself
    print('I am an errback:', repr(excp))
    # handle exception

def send_data(topic, data, config=config, producer=producer, msg_key=None):
    topic_prefix = config['topic_prefix']
    topic_name = '{}-{}'.format(topic_prefix, topic)
    
    if msg_key is not None:
        key = msg_key
    else:
        key = uuid.uuid4().hex
    
    producer.send(
        topic_name, 
        value=data,
        key=key.encode('utf-8')
    ).add_callback(on_send_success).add_errback(on_send_error)

In [11]:
"""
#//*** Reference on sending topic data
example_data = dict(
    key1='value1',
    key2='value2'
)

send_data('locations', example_data)
"""
print()




**Update the topics at the indicated intervals. The times are managed with an iterator.**

**An endless `while` loop compares the elapsed time with the current element (time). Once the elapsed time reaches that element, the loop goes through the data dictionary keys "locations" and "accelerations" and publishes each parquet file in turn to the Kafka producer. It then fetches the next time element.**

**The loop exits when there is no next time (`next()` raises `StopIteration`).**

**The loop sleeps 100 ms between cycles to avoid overtaxing the CPU.**
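The timing logic of the loop can be separated from Kafka so it can be tested with a fake clock. This is a sketch under two assumptions: the time steps arrive in ascending order, and `send_step` stands in for the body that publishes each topic's files. `replay` is a hypothetical name. It uses `time.monotonic`, which, unlike `time.time`, is not affected by system clock adjustments, and `next(iterator, None)` in place of a `StopIteration` handler.

```python
import time


def replay(times, send_step, clock=time.monotonic, sleep=time.sleep, poll=0.1):
    """Call send_step(t) once each time step t is due, measured from the start.

    `times` must be in ascending order. Returns the steps in the order they were sent.
    """
    sent = []
    pending = iter(times)
    start = clock()
    step = next(pending, None)  # None means every step has been sent
    while step is not None:
        if clock() - start >= step:
            send_step(step)
            sent.append(step)
            step = next(pending, None)
        else:
            sleep(poll)  # wait before checking the clock again
    return sent


if __name__ == "__main__":
    replay([0.0, 0.2, 0.5], lambda t: print("Sending Values at Time:", t))
```

Injecting `clock` and `sleep` means a test can advance a fake clock instantly instead of waiting about two minutes of real time for the full dataset.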

In [10]:
import time

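#//*** Create an iterator of the times, sorted so they replay in chronological order
#//*** (os.walk does not guarantee the order in which folders are visited)
times = iter(sorted(data['accelerations'].keys()))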


start_time = time.time()
element = next(times)
while True:
    
    #//*** Get the elapsed time
    elapsed_time = time.time()-start_time
    
    #//*** Check if it's time to perform an action
    if element <= elapsed_time:
        try:
            print()
            print("=======================================")
            print("=======================================")
            print("Sending Values at Time:", element)
            print("=======================================")
            print("=======================================")
                
            for topic in ['locations','accelerations']:

                print("Sending: ",topic)
                for filepath in data[topic][element]:
                    #//*** to_json() returns a str, which the producer's value_serializer JSON-encodes again
                    send_data(topic, pd.read_parquet(filepath).to_json())
            
            #//*** Get Next Element
            element = next(times)

        except(StopIteration):
            break

    #//*** Sleep for 100ms so we don't crush the CPU while waiting
    time.sleep(.1)

#//*** Block until every buffered message has been delivered
producer.flush()
        



Sending Values at Time: 0.0
Sending:  locations
Sending:  accelerations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 0
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 1
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 0
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 1

Sending Values at Time: 4.5
Sending:  locations
Sending:  accelerations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 2
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 3
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 2
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 3

Sending Values at Time: 7.8
Sending:  locations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 4
Sending:  accelerations
Message sent:
    Topic: "Adminition-locations"
[... output truncated ...]


Sending Values at Time: 45.4
Sending:  locations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 36
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 37
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 38
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 39
Sending:  accelerations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 40
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 33
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 34
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 35
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 36
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 37

Sending Values at Time: 49.5
Sending:  locations
Message sent:
    Topic: "Adminition-locations"
[... output truncated ...]

Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 79
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 80
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 81

Sending Values at Time: 60.1
Sending:  locations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 83
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 84
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 85
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 86
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 87
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 88
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 89
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 90
Message sent:
    Topic: "Adminition-locations"
[... output truncated ...]

Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 133
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 134
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 135
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 136
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 137
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 138
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 139
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 140
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 141
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 142
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 143
Message sent:
    Topic: "Adminition-accelerations"
[... output truncated ...]

Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 182
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 183
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 184
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 185
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 186
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 187
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 188
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 189
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 190
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 191
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 192
Message sent:
    Topic: "Adminition-accelerations"
[... output truncated ...]

    Offset: 236
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 237
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 238
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 239
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 240
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 241
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 242
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 243
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 244
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 245
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 246
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 247
Message sent:
[... output truncated ...]

Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 295
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 296
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 297
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 298
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 299
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 300
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 301
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 302
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 303
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 304
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 305
Message sent:
    Topic: "Adminition-accelerations"
[... output truncated ...]

Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 358
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 359
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 360
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 361
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 362

Sending Values at Time: 102.5
Sending:  locations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 354
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 355
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 356
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 357
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 358
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 359
Message sent:
[... output truncated ...]


Sending Values at Time: 121.4
Sending:  locations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 395
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 396
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 397
Sending:  accelerations
Message sent:
    Topic: "Adminition-locations"
    Partition: 0
    Offset: 398
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 405
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 406
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 407
Message sent:
    Topic: "Adminition-accelerations"
    Partition: 0
    Offset: 408


In [11]:

#//*** Close the producer before shutting the servers down
producer.close()

res = os.system(str(kafka_dir.joinpath("stop_zookeeper-server-stop.bat")))
print("Stop Running Zookeeper Server: ", res)


res = os.system(str(kafka_dir.joinpath("stop_kafka-server-stop.bat")))
print("Stop Running Kafka Server: ", res)

time.sleep(3)

print("Stop the Servers")



Stop Running Zookeeper Server:  0
Stop Running Kafka Server:  0
Stop the Servers
