### Add Spark Streaming Kafka to Jupyter

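We have a set of stock portfolios. This notebook reads trades from Kafka and keeps each portfolio's price and value up to date in Cassandra.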


In [None]:
import kafka.serializer.StringDecoder

In [None]:
import org.apache.spark._
import org.apache.spark.storage._
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
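// Needed for sc.cassandraTable and RDD.saveToCassandra used below
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._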
import com.datastax.spark.connector.writer.WriteConf
import scala.util.Try

### Create case classes for the trades and portfolios
- Named, typed fields are easier to work with than raw delimited strings

In [None]:
case class Trade (
    stock_symbol: String,
    exchange: String,
    trade_timestamp: String,
    price: Float,
    quantity: Int
  )

In [None]:
case class Portfolio (
    name: String,
    stock_symbol: String,
    quantity: Int,
    price: Option[Float],
    value: Option[Float]
  )

### Roll up the portfolios every 5 seconds

In [None]:
// The batch interval sets how long we collect data before analyzing it as one batch
val BatchInterval = Seconds(5)

Create a new `StreamingContext` from the existing SparkContext (`sc`) and the batch interval:

In [None]:
val ssc = new StreamingContext(sc, BatchInterval)

Create a Kafka stream.

### Ensure that every cluster node defines the `kafka` host (the code below connects to `kafka:9092`). Get the IP address from someone knowledgeable.

In [None]:
 val directKafkaStream = KafkaUtils.createDirectStream[
     String, String, StringDecoder, StringDecoder ](
     ssc, Map("metadata.broker.list" ->"kafka:9092"), Set("Trades"))

Split each incoming string on `|` and turn it into an instance of `Trade`.

In [None]:
val trades = directKafkaStream
  .map { case (tid, data) =>
    // Split on '|'; unparseable numbers fall back to 0
    data.split('|') match {
      case Array(ss, ex, dt, p, q) =>
        Trade(ss, ex, dt, Try(p.toFloat).getOrElse(0F), Try(q.toInt).getOrElse(0))
    }
  }
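
The pattern match in the cell above has no fallback case, so a message without exactly five `|`-separated fields raises a `MatchError`. The sketch below shows the same parsing logic on plain strings, outside Spark, with a fallback added. The helper name `parseTrade` and the sample messages are made up for illustration; they are not part of the notebook or of any library.

```scala
import scala.util.Try

// Same shape as the Trade case class defined earlier in the notebook
case class Trade(
    stock_symbol: String,
    exchange: String,
    trade_timestamp: String,
    price: Float,
    quantity: Int)

// Hypothetical helper: returns None instead of throwing when the field count is wrong
def parseTrade(line: String): Option[Trade] =
  line.split('|') match {
    case Array(ss, ex, dt, p, q) =>
      Some(Trade(ss, ex, dt, Try(p.toFloat).getOrElse(0F), Try(q.toInt).getOrElse(0)))
    case _ => None
  }

// Made-up sample messages, for illustration only
val good = parseTrade("ABC|NYSE|2016-01-01 10:00:00|12.5|100")
val badPrice = parseTrade("ABC|NYSE|2016-01-01 10:00:00|n/a|100") // price falls back to 0
val tooShort = parseTrade("ABC|NYSE")                             // None
```

If dropping malformed messages is acceptable, a helper like this could be applied to the stream with `flatMap` instead of `map`. Whether that is the right policy depends on how the trades feed is expected to behave.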

Create an RDD for the `stock.portfolios` table, keyed by stock symbol.

In [None]:
val portfolios = sc.cassandraTable[Portfolio]("stock","portfolios").keyBy[String]("stock_symbol")

For each batch,
- get the *newest* trade for each symbol. Use a reduce for this. Timestamps are compared as strings, so their format must sort chronologically
- join it to portfolios
- update the price and value for each portfolio item
- save it

In [None]:
trades.foreachRDD ( tradesRDD => tradesRDD.map( t => (t.stock_symbol, (t.trade_timestamp,t.price)))
                                         .reduceByKey( (l,r) => if (r._1 > l._1) r else l)
                                         .join(portfolios)
                                         .map{case (stock_symbol,((tt,price), port))
                                                    => port.copy(price = Some(price),
                                                                 value = Some(port.quantity * price))
                                             }
                                         .saveToCassandra("stock","portfolios")
                  )
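
The roll-up can also be tried on plain Scala collections, which may help when checking the logic without a running cluster. This is only a sketch: `groupBy` plus `reduce` stands in for `reduceByKey`, a `Map` lookup stands in for the RDD `join`, and the portfolio names, symbols, and prices are invented sample data.

```scala
// Same shape as the notebook's Portfolio case class
case class Portfolio(
    name: String,
    stock_symbol: String,
    quantity: Int,
    price: Option[Float],
    value: Option[Float])

// (symbol, (timestamp, price)) pairs, as produced by the first map in the cell above
val batch = Seq(
  ("ABC", ("2016-01-01 10:00:00", 10.0f)),
  ("ABC", ("2016-01-01 10:00:05", 11.0f)),
  ("XYZ", ("2016-01-01 10:00:02", 20.0f)))

// Hypothetical holdings; "QQQ" has no trade in this batch
val holdings = Seq(
  Portfolio("Alice", "ABC", 10, None, None),
  Portfolio("Bob", "XYZ", 5, None, None),
  Portfolio("Bob", "QQQ", 1, None, None))

// Newest trade per symbol, using the same string comparison as the notebook
val newest: Map[String, (String, Float)] =
  batch.groupBy(_._1).map { case (symbol, rows) =>
    symbol -> rows.map(_._2).reduce((l, r) => if (r._1 > l._1) r else l)
  }

// Inner join on symbol, then fill in price and value
val updated = holdings.flatMap { port =>
  newest.get(port.stock_symbol).map { case (_, price) =>
    port.copy(price = Some(price), value = Some(port.quantity * price))
  }.toList
}
```

As with the RDD `join`, holdings whose symbol has no trade in the batch are left out of `updated`, so they would not be rewritten in that batch.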

In [None]:
ssc.start

In [None]:
%%cql select * from stock.portfolios limit 50

Put this in the terminal below:

```
 watch "echo \"select * from stock.portfolios where name = 'Ehtel Murakami' ;\" | cqlsh node0"
```

In [None]:
%%html <iframe src="/terminals/1" width="1000" height="400"></iframe>

### Stop the stream

In [None]:
ssc.stop()