In [None]:
%%scala
%%spark --start
SparkSession.builder

<link rel="stylesheet" href="https://doc.splicemachine.com/zeppelin/css/zepstyles.css" />

# Setting Up Kafka

In this notebook, we create a Kafka topic and then start a Kafka producer. The producer reads data from a CSV file that contains one message per line.

Each message contains comma-separated values that map to the tables we created in the *Setting Up the Database* notebook:

<table class="splicezepOddEven">
    <col />
    <col />
    <thead>
        <tr>
            <th>Message Type</th>
            <th>Description</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Items</td>
            <td>Have <code>ID, Serial Number, CreatedTime,</code> and <code>UPCcode</code> fields.</td>
        </tr>
        <tr>
            <td>ItemFlow Events</td>
            <td><p>These occur at high frequency: thousands per second. They are ingested into the database.</p>
                <p>An event occurs when an <code>Item</code> moves from a <code>Warehouse</code>, arrives at a <code>Store</code>, and is seen at a door, in a dressing room, or at a point-of-sale terminal.</p>
                <p class="noteNote">There are 600,000 ItemFlow events in this demo.</p>
            </td>
        </tr>
   </tbody>
</table>

There are multiple warehouses and stores; all of the location coordinates are in a geographic region east of London.
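
As a rough illustration of the comma-separated message format described above, the following sketch parses one `Items` line into a case class. The `Item` class, the `parseItem` helper, the field order, the field types, and the sample line are all assumptions made for illustration; the actual layout is whatever the tables in the *Setting Up the Database* notebook define.

```scala
import scala.util.Try

// Hypothetical shape of an Items message. The field names follow the table
// above; the order and the types are assumptions for illustration only.
final case class Item(id: Long, serialNumber: String, createdTime: String, upcCode: String)

// Parse one comma-separated line into an Item.
// Returns None when the field count is wrong or the ID is not numeric.
def parseItem(line: String): Option[Item] =
  line.split(",", -1).map(_.trim) match {
    case Array(id, serial, created, upc) =>
      Try(id.toLong).toOption.map(n => Item(n, serial, created, upc))
    case _ => None
  }

// Made-up sample line, not taken from the demo data files
val sample = "42,SN-0001,2017-06-01 12:00:00,012345678905"
println(parseItem(sample))
```

Returning an `Option` keeps malformed lines from throwing, which matters when a file holds one message per line and a single bad line should not stop the run.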

## Setting Up Kafka Variables

Before proceeding, make sure that the Kafka server is already running.

We'll first assign values to the variables that we'll use when creating our Kafka topic and producer:

* topic name
* ZooKeeper URL
* Broker URL
* Data file name (assigned later, in the producer cell)

Replace the ZooKeeper and Broker URL values in the next cell before running it:

In [None]:
%%scala 
z.put("topicname", "iotdemo")
z.put("zookeeper", "zookeeper-0-node.{FRAMEWORKNAME}.mesos:2182")
z.put("brokers", "kafka-0-node.{FRAMEWORKNAME}.mesos:9092")


## Create the Kafka Topic

To create the Kafka topic, we:

1. Specify parameters for the queue, including session timeout, connection timeout, number of partitions, and replication factor.
2. Create the ZooKeeper client.
3. Invoke `AdminUtils` to create the topic.

If you've previously run this code and the topic already exists, you'll see an error message; otherwise, the topic has been successfully created.

In [None]:
%%scala 
import java.util.Properties
import kafka.admin.AdminUtils
import kafka.utils.ZkUtils

// Properties for the ZooKeeper client
val sessionTimeoutMs = 10000
val connectionTimeoutMs = 10000

// Properties for the Kafka topic
val topicName = z.get("topicname").toString
val numPartitions = 10
val replicationFactor = 1

// Create a ZooKeeper client (the final argument disables ZooKeeper security)
val zkUtils = ZkUtils.apply(z.get("zookeeper").toString, sessionTimeoutMs, connectionTimeoutMs,
    false)

// Create the topic with default topic-level configuration
val topicConfig = new Properties
AdminUtils.createTopic(zkUtils, topicName, numPartitions, replicationFactor, topicConfig)


## Create Kafka Producer and Add Data to Queue

Make sure that the `filename` value in the next cell is set to the name of the data file containing the messages that you want *produced*. For example, the cell is currently set to read and enqueue the `ItemFlow` values.

In [None]:
%%scala 
import java.net.URL
import java.nio.charset.Charset
import java.util.Properties
import org.apache.commons.io.IOUtils
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}


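// Connection settings saved earlier with z.put
val brokers = z.get("brokers").toString
val topic = z.get("topicname").toString

// Pacing: after every messagesPerSec sends, sleep for pauseBetweenMessages milliseconds
val messagesPerSec = 1000
val pauseBetweenMessages = 500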

//val filename = "https://s3.amazonaws.com/splice-demo/iot/itemflow_small.csv"
val filename = "https://s3.amazonaws.com/splice-demo/iot/itemflow_200k.csv"
//val filename = "https://s3.amazonaws.com/splice-demo/iot/itemflow_600k.csv"
     
// Producer configuration
val props = new Properties
props.put("bootstrap.servers", brokers)
props.put("acks", "all")
props.put("retries", new Integer(0))
props.put("batch.size", new Integer(16384))
props.put("linger.ms", new Integer(1))
props.put("buffer.memory", new Integer(33554432))
props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer")

// Create the Kafka producer
val producer = new KafkaProducer[String, String](props)

// Read the data file and split it into lines
val s3fileData = sc.parallelize(
    IOUtils.toString(new URL(filename), Charset.forName("utf8")).split("\n"))
    

// Send each line to the topic, pausing after every messagesPerSec messages
var i = 0
s3fileData.collect().foreach { line =>
    val message = new ProducerRecord[String, String](topic, null, line)
    producer.send(message)
    i += 1
    if (i >= messagesPerSec) {
        i = 0
        Thread.sleep(pauseBetweenMessages)
    }
}

// Block until all buffered messages are sent, then release the producer
producer.close()

println("DONE")
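
The pacing logic in the producer cell above can be isolated into a small helper, sketched here without any Kafka dependency. The `sendThrottled` function and its `send` parameter are hypothetical names introduced for illustration; `send` stands in for `producer.send`. Note that with a batch of 1000 messages and a 500 ms pause, the loop allows up to roughly 2,000 messages per second (ignoring send time), so `messagesPerSec` acts as a batch size rather than a strict per-second rate.

```scala
import scala.collection.mutable.ArrayBuffer

// Send lines in batches of `batchSize`, sleeping `pauseMs` after each full batch.
// `send` is a hypothetical stand-in for producer.send.
// Returns the number of lines sent.
def sendThrottled(lines: Iterator[String], batchSize: Int, pauseMs: Long)(send: String => Unit): Int = {
  var sent = 0
  lines.grouped(batchSize).foreach { batch =>
    batch.foreach { line =>
      send(line)
      sent += 1
    }
    // Pause only after a full batch, mirroring the counter reset in the producer cell
    if (batch.size == batchSize) Thread.sleep(pauseMs)
  }
  sent
}

// Usage: collect the "sent" lines in a buffer instead of a Kafka topic
val received = ArrayBuffer.empty[String]
val total = sendThrottled(Iterator("a", "b", "c", "d", "e"), batchSize = 2, pauseMs = 10L) { line =>
  received += line
  ()
}
println(s"sent $total lines")
```

Passing the send action as a function makes the pacing easy to check on its own before pointing it at a real producer.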

## Where to Go Next

Now we're ready to [Stream the data into Splice Machine using Spark Streaming](./6.4%20Using%20Spark%20Streaming.ipynb).