# Sensor Data Generator
This notebook serves as a sensor data simulator.
It generates a stream of random sensor readings for a given number of sensors.

The data is produced to a configurable Kafka topic.

## Configuration

In [ ]:
// Kafka
val kafkaBootstrapServer = "172.17.0.2:9092"
val targetTopic = "iot-data"

// File system
val workDir = "/tmp/streaming-with-spark"

// Generator
val sensorCount = 100000
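`kafkaBootstrapServer` is passed to the `kafka.bootstrap.servers` option, which takes a comma-separated list of `host:port` pairs. A typo there usually surfaces only after the query has started, so an early sanity check can save a round trip. The block below is a minimal plain-Scala sketch. `parseBootstrapServers` is a hypothetical helper and is not part of Spark or the Kafka client.

```scala
// Hypothetical helper (not part of the original notebook): splits a
// `kafka.bootstrap.servers` value into (host, port) pairs and rejects
// malformed entries before any streaming query is started.
def parseBootstrapServers(servers: String): Seq[(String, Int)] =
  servers.split(",").toList.map(_.trim).filter(_.nonEmpty).map { entry =>
    val idx = entry.lastIndexOf(':')
    require(idx > 0 && idx < entry.length - 1, s"expected host:port, got '$entry'")
    val host = entry.substring(0, idx)
    val port = entry.substring(idx + 1).toInt // throws on a non-numeric port
    require(port > 0 && port <= 65535, s"port out of range in '$entry'")
    (host, port)
  }

val brokers = parseBootstrapServers("172.17.0.2:9092")
println(brokers) // List((172.17.0.2,9092))
```

In the notebook, `parseBootstrapServers(kafkaBootstrapServer)` could be called right after the configuration cell. It throws on entries with a missing or non-numeric port.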

## Schema
We need a schema definition for the sensor data that we are going to generate.

In [ ]:
case class SensorData(sensorId: Int, timestamp: Long, value: Double)
object SensorData {
  import scala.util.Random
  // One random reading: sensorId in [0, maxId), timestamp in epoch milliseconds, value in [0.0, 1.0)
  def randomGen(maxId: Int): SensorData =
    SensorData(Random.nextInt(maxId), System.currentTimeMillis, Random.nextDouble())
}
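`randomGen` draws from the global `scala.util.Random` and reads the wall clock, so no two runs produce the same readings. When testing downstream logic, it can help to inject both. The following is a minimal sketch that assumes a hypothetical `randomGenWith` variant. `SensorData` is redeclared so the block compiles on its own.

```scala
import scala.util.Random

case class SensorData(sensorId: Int, timestamp: Long, value: Double)

// Hypothetical variant of SensorData.randomGen that receives its Random
// instance and clock as parameters, so runs can be reproduced in tests.
def randomGenWith(rnd: Random, maxId: Int, now: () => Long): SensorData =
  SensorData(rnd.nextInt(maxId), now(), rnd.nextDouble())

// Draws n readings from a generator seeded with `seed`, using a fixed clock.
def sample(seed: Long, n: Int): Seq[SensorData] = {
  val rnd = new Random(seed)
  (1 to n).map(_ => randomGenWith(rnd, maxId = 100000, now = () => 1000L))
}

val run1 = sample(42L, 5)
val run2 = sample(42L, 5)
println(run1 == run2) // true: same seed, same readings
```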

In [ ]:
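// Matches the schema of the rate source: `timestamp` (TimestampType) and `value` (LongType)
case class Rate(timestamp: java.sql.Timestamp, value: Long)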

## Base Stream
We use the built-in `rate` source as the base stream for our data generator.

In [ ]:
// The rate source's throughput option is `rowsPerSecond`; unknown options are silently ignored
val baseStream = sparkSession.readStream.format("rate").option("rowsPerSecond", 100).load()
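The `rate` source produces rows with two columns, `timestamp` (a `java.sql.Timestamp`) and `value` (a `Long` counter that increases with every row), at the pace set by `rowsPerSecond`. The sketch below is a simplified plain-Scala model of that output, meant only to show what the option controls. `RateRow` and `simulateRate` are hypothetical names. The evenly spaced timestamps are an assumption of the model, not a guarantee of Spark's implementation.

```scala
import java.sql.Timestamp

// Mirrors the two columns of the rate source.
case class RateRow(timestamp: Timestamp, value: Long)

// Hypothetical, simplified model: for each elapsed second, emit
// `rowsPerSecond` rows with timestamps spread evenly across that second,
// numbered by a running counter that starts at 0.
def simulateRate(rowsPerSecond: Int, seconds: Int, startMillis: Long): Seq[RateRow] =
  for {
    sec <- 0 until seconds
    i   <- 0 until rowsPerSecond
  } yield {
    val offsetMillis = sec * 1000L + i * 1000L / rowsPerSecond
    RateRow(new Timestamp(startMillis + offsetMillis), sec.toLong * rowsPerSecond + i)
  }

val rows = simulateRate(rowsPerSecond = 100, seconds = 2, startMillis = 0L)
println(rows.size)       // 200
println(rows.last.value) // 199
```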

In [ ]:
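import sparkSession.implicits._ // encoders for as[Rate] and for the SensorData result

// Each rate tick is discarded and replaced by one random sensor reading
val sensorValues = baseStream.as[Rate].map(_ => SensorData.randomGen(sensorCount))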
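The `map` above ignores the content of each rate row and emits one fresh random reading in its place, so the generator's throughput equals the base stream's row rate. Here is the same transformation on an ordinary Scala collection. A hypothetical `Tick` type stands in for the rate rows, and the `Random` instance is injected.

```scala
import scala.util.Random

case class SensorData(sensorId: Int, timestamp: Long, value: Double)

// Hypothetical stand-in for a rate-source row; only its existence matters here.
final case class Tick(value: Long)

// Plain-collection analogue of `baseStream.as[Rate].map(_ => ...)`:
// every tick is replaced by exactly one random reading.
def ticksToReadings(ticks: Seq[Tick], maxId: Int, rnd: Random): Seq[SensorData] =
  ticks.map(_ => SensorData(rnd.nextInt(maxId), System.currentTimeMillis, rnd.nextDouble()))

val ticks    = (0L until 100L).map(Tick(_))
val readings = ticksToReadings(ticks, maxId = 100000, rnd = new Random(7))
println(readings.size) // 100
```

If the base stream delivers 100 rows per second spread uniformly over `sensorCount = 100000` ids, any single sensor reports about 100 / 100000 = 0.001 times per second. That is roughly once every 1000 seconds (about 17 minutes). Raising the row rate or lowering `sensorCount` makes each sensor's series denser.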

In [ ]:
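// Compiles only if the spark-sql-kafka-0-10 connector is on the classpath; format("kafka") below resolves the sink by name
import org.apache.spark.sql.kafka010._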

In [ ]:
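import org.apache.spark.sql.functions.{struct, to_json}

// The Kafka sink requires a string or binary `value` column,
// so each SensorData record is serialized to a JSON string first.
val query = sensorValues
  .select(to_json(struct("*")).as("value"))
  .writeStream
  .format("kafka")
  .queryName("kafkaWriter")
  .outputMode("append")
  .option("kafka.bootstrap.servers", kafkaBootstrapServer) // comma-separated list of host:port
  .option("topic", targetTopic)
  .option("checkpointLocation", workDir + "/generator-checkpoint")
  .option("failOnDataLoss", "false") // use this option when testing
  .start()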
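The Kafka sink takes each message payload from a `value` column of type string or binary (and, optionally, the message key from a `key` column), so a `SensorData` row must be serialized before it reaches the topic. One reasonable wire format is a flat JSON object per reading. The sketch below uses a hypothetical `toJson` helper in plain Scala. Spark's built-in `to_json`, applied to a struct of the same three fields, yields an object of the same shape.

```scala
case class SensorData(sensorId: Int, timestamp: Long, value: Double)

// Hypothetical hand-rolled encoder: renders one reading as a compact JSON
// object. All fields are numeric, and Double.toString output is valid JSON
// for finite values, so no escaping is needed.
def toJson(r: SensorData): String =
  s"""{"sensorId":${r.sensorId},"timestamp":${r.timestamp},"value":${r.value}}"""

println(toJson(SensorData(42, 1700000000000L, 0.5)))
// {"sensorId":42,"timestamp":1700000000000,"value":0.5}
```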

