This notebook reads prediction requests from a MessageHub (Kafka) topic and makes predictions.<br>
In a real-world application, these requests could be put on the topic by a web application that a user is interacting with.<br>
<br>
This notebook prints the predictions to the console and also publishes them to a second MessageHub topic (the request topic name with a `_responses` suffix).<br>
From there they can be read back, by the web application in a real deployment or by **STEP 08 (B) - Consume Prediction Responses** here, to make recommendations to the user.
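On the reading side, each response the notebook publishes is a single line of the form `userId, movieId, prediction`, with `-1` used when the model cannot score the pair. The following is a minimal sketch of how a client might turn those lines into recommendations. `ResponseReaderSketch`, `Prediction`, `parseResponse` and `topMovies` are hypothetical names, and ranking by predicted rating is an assumption about what "make recommendations" means.

```scala
import scala.util.Try

object ResponseReaderSketch {
  // Hypothetical client-side view of one response line: "userId, movieId, prediction"
  case class Prediction(userId: Int, movieId: Int, rating: Double)

  // Parse one response line; malformed lines yield None instead of throwing
  def parseResponse(line: String): Option[Prediction] =
    line.split(",").map(_.trim) match {
      case Array(u, m, r) => Try(Prediction(u.toInt, m.toInt, r.toDouble)).toOption
      case _              => None
    }

  // Return the ids of the n highest-rated movies for one user,
  // skipping the -1 "no prediction" marker written by the notebook
  def topMovies(lines: Seq[String], userId: Int, n: Int): Seq[Int] =
    lines.flatMap(parseResponse)
      .filter(p => p.userId == userId && p.rating != -1.0)
      .sortBy(p => -p.rating)
      .take(n)
      .map(_.movieId)
}
```

For example, given the lines `"1, 500, 3.7"`, `"1, 42, 4.2"` and `"1, 7, -1"`, `topMovies(lines, 1, 2)` ranks movie 42 ahead of movie 500 and drops movie 7.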

In [1]:
%Addjar http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.9.0.0/kafka-clients-0.9.0.0.jar
%Addjar http://central.maven.org/maven2/org/apache/kafka/kafka_2.10/0.9.0.0/kafka_2.10-0.9.0.0.jar
%Addjar http://central.maven.org/maven2/org/apache/kafka/kafka-log4j-appender/0.9.0.0/kafka-log4j-appender-0.9.0.0.jar
%Addjar https://github.com/ibm-messaging/message-hub-samples/raw/master/java/message-hub-login-library/messagehub.login-1.0.0.jar
%Addjar https://github.com/ibm-messaging/iot-messgehub-spark-samples/releases/download/v0.1/streaming-kafka.jar

Starting download from http://central.maven.org/maven2/org/apache/kafka/kafka-clients/0.9.0.0/kafka-clients-0.9.0.0.jar
Finished download of kafka-clients-0.9.0.0.jar
Starting download from http://central.maven.org/maven2/org/apache/kafka/kafka_2.10/0.9.0.0/kafka_2.10-0.9.0.0.jar
Finished download of kafka_2.10-0.9.0.0.jar
Starting download from http://central.maven.org/maven2/org/apache/kafka/kafka-log4j-appender/0.9.0.0/kafka-log4j-appender-0.9.0.0.jar
Finished download of kafka-log4j-appender-0.9.0.0.jar
Starting download from https://github.com/ibm-messaging/message-hub-samples/raw/master/java/message-hub-login-library/messagehub.login-1.0.0.jar
Finished download of messagehub.login-1.0.0.jar
Starting download from https://github.com/ibm-messaging/iot-messgehub-spark-samples/releases/download/v0.1/streaming-kafka.jar
Finished download of streaming-kafka.jar


**IMPORTANT:** Restart your kernel after running the above cell for the first time.

Read the MessageHub properties that were saved by the previous step.

In [8]:
import java.util.Properties
import java.io.FileInputStream

val prop = new Properties()
val propsFile = new FileInputStream("messagehub.properties")
try prop.load(propsFile) finally propsFile.close()

val bootstrap_servers     = prop.getProperty("bootstrap_servers")
val sasl_username         = prop.getProperty("sasl_username")
val sasl_password         = prop.getProperty("sasl_password")
val messagehub_topic_name = prop.getProperty("messagehub_topic_name")
val api_key               = prop.getProperty("api_key")
val kafka_rest_url        = prop.getProperty("kafka_rest_url")

// set to true to print the loaded values (including the password) for debugging
val debug = false
if (debug) {
    println (bootstrap_servers)
    println (sasl_username)
    println (sasl_password)
    println (messagehub_topic_name)
    println (api_key)
    println (kafka_rest_url)
}
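`Properties.getProperty` returns `null` for a missing key. A typo in `messagehub.properties` therefore only shows up later, as a confusing connection or authentication error. The following is a minimal sketch of checking all required keys up front. `PropsCheck` and `requireProperties` are hypothetical names, and the key list simply mirrors the keys read in the cell above.

```scala
import java.util.Properties

object PropsCheck {
  // The keys read from messagehub.properties in the cell above
  val requiredKeys = Seq(
    "bootstrap_servers", "sasl_username", "sasl_password",
    "messagehub_topic_name", "api_key", "kafka_rest_url")

  // Returns Right(all values, trimmed) or Left(the keys that are missing or blank)
  def requireProperties(prop: Properties, keys: Seq[String]): Either[Seq[String], Map[String, String]] = {
    val values  = keys.map(k => k -> Option(prop.getProperty(k)).map(_.trim).filter(_.nonEmpty))
    val missing = values.collect { case (k, None) => k }
    if (missing.nonEmpty) Left(missing)
    else Right(values.collect { case (k, Some(v)) => k -> v }.toMap)
  }
}
```

In the notebook, you could call `PropsCheck.requireProperties(prop, PropsCheck.requiredKeys)` right after `prop.load(...)` and stop with a clear message on a `Left`, rather than letting a `null` reach the Kafka configuration.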

Load the saved model, then create the Kafka configuration used by Spark Streaming and by the producer that sends the prediction responses.

In [9]:
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.streaming.Duration
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext
import com.ibm.cds.spark.samples.config.MessageHubConfig
import com.ibm.cds.spark.samples.dstream.KafkaStreaming.KafkaStreamingContextAdapter
import org.apache.kafka.common.serialization.Deserializer
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.kafka.common.serialization.StringSerializer
import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord
import java.util.UUID
import java.util.Properties
import scala.util.Try

import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating

// load the saved model
val model = MatrixFactorizationModel.load(sc, "./recommender_model/")

// test the model
println( "Prediction for user=1, movie=500 is " + model.predict(1, 500) ) 

val kafkaProps = new MessageHubConfig

kafkaProps.setConfig("bootstrap.servers",   bootstrap_servers)
kafkaProps.setConfig("kafka.user.name",     sasl_username)
kafkaProps.setConfig("kafka.user.password", sasl_password)
kafkaProps.setConfig("kafka.topic",         messagehub_topic_name)
kafkaProps.setConfig("api_key",             api_key)
kafkaProps.setConfig("kafka_rest_url",      kafka_rest_url)
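kafkaProps.setConfig("auto.offset.reset",   "earliest") // Kafka 0.9 consumer value; "smallest" was the 0.8 consumer's name for it
kafkaProps.setConfig("group.id",            UUID.randomUUID().toString()) // a new group each run, so "earliest" replays all retained requests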

// the topic for responses
val messagehub_response_topic_name = messagehub_topic_name + "_responses" 

kafkaProps.createConfiguration()

val properties = new Properties()
kafkaProps.toImmutableMap.foreach {
    case (key, value) => properties.setProperty (key, value)
}
properties.setProperty(
    "value.serializer", 
    "org.apache.kafka.common.serialization.StringSerializer"
)

// create a producer for sending responses
val kafkaProducer = new KafkaProducer[String, String]( properties )

Prediction for user=1, movie=500 is 3.715163746095887
default location of ssl Trust store is: /usr/local/src/spark160master/ibm-java-x86_64-80/jre/lib/security/cacerts
com/ibm/cds/spark/samples/config/jaas.conf
Registering JaasConfiguration: /gpfs/fs01/user/s30f-65857bea3b733e-39ca506ba762/notebook/tmp/0fa6cpzwWnrLoYQI/jaas.conf


null
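In MLlib, a `MatrixFactorizationModel` keeps one latent factor vector per user and one per product. `predict(user, product)` returns the dot product of the two vectors. It throws if either id has no vector, for example an id that was absent from the training data. The plain-Scala sketch below mimics that computation so it can run without Spark. `FactorModelSketch` is a hypothetical name, and the rank-3 factor values are made up for illustration, not taken from the saved model.

```scala
object FactorModelSketch {
  // id -> latent factor vector (all vectors share the same rank)
  type Factors = Map[Int, Array[Double]]

  // Predicted rating = dot product of the user's and the movie's factor vectors.
  // Map.apply throws NoSuchElementException for an id with no factors,
  // much as MLlib's predict fails for ids it was not trained on.
  def predict(userFactors: Factors, movieFactors: Factors)(userId: Int, movieId: Int): Double = {
    val u = userFactors(userId)
    val m = movieFactors(movieId)
    u.zip(m).map { case (a, b) => a * b }.sum
  }
}
```

With user 1 mapped to `Array(1.0, 0.5, 2.0)` and movie 500 to `Array(2.0, 2.0, 0.5)`, the prediction is 2.0 + 1.0 + 1.0 = 4.0. Asking for user 2 throws, which is why the streaming cell wraps `model.predict` in a `Try`.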

Use Spark Streaming to retrieve the prediction requests and make predictions.<br>
**IMPORTANT:** The following cell runs until the kernel is stopped; you will need to stop (restart) the notebook kernel to exit it.<br>
After running the code below:

 1. Go back to the previous notebook and send some messages to MessageHub: **STEP 08 (A) - Produce Prediction Requests**
 2. You should see each request and its prediction printed to the console below.
 3. Then go to the previous notebook and consume the responses: **STEP 08 (B) - Consume Prediction Responses**
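Each request is a `"userId,movieId"` string. The streaming cell keeps only messages that contain a comma, splits them, asks the model for a prediction (falling back to `-1` when the model cannot score the pair), and produces the response line `userId, movieId, prediction`. The following is a minimal sketch of that per-message logic with the model replaced by a plain function, so it runs without Spark or Kafka. `RequestHandlerSketch` and `handleRequest` are hypothetical names. The sketch also trims whitespace around the ids and returns `None` for messages that do not contain two integer ids, so malformed input is skipped rather than failing the batch.

```scala
import scala.util.Try

object RequestHandlerSketch {
  // Turn one raw request such as "1,500" into the response line that is printed and published.
  // `predict` stands in for model.predict and may throw for unknown ids.
  def handleRequest(raw: String, predict: (Int, Int) => Double): Option[String] = {
    val fields = raw.split(",").map(_.trim)
    if (fields.length < 2) None
    else {
      val ids = Try((fields(0).toInt, fields(1).toInt)).toOption
      ids.map { case (userId, movieId) =>
        // same fallback as the notebook: -1 when the model cannot score the pair
        val prediction = Try(predict(userId, movieId)).getOrElse(-1.0)
        s"$userId, $movieId, $prediction"
      }
    }
  }
}
```

With a stub predictor that returns 3.5 for user 1 and movie 500 and throws otherwise, `"1,500"` becomes `Some("1, 500, 3.5")`, `"2,500"` becomes `Some("2, 500, -1.0")`, and `"hello"` becomes `None`.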

In [12]:
val ssc = new StreamingContext( sc, Seconds(2) )

val stream = ssc.createKafkaStream[String, String, StringDeserializer, StringDeserializer](
                     kafkaProps,
                     List(kafkaProps.getConfig("kafka.topic"))
                     )

// Wrap model.predict in a Try: predict throws if the user or movie
// was not part of the training data
def predict(userId: Int, movieId: Int): Try[Double] = {
    Try(model.predict(userId, movieId))
}

// keep only "userId,movieId" messages and split them into fields
val moviesToRate = stream.
                    filter(_._2.contains(",")).
                    map(_._2.split(","))

moviesToRate.foreachRDD( rdd => {
    // collect() brings each (small) batch of requests back to the driver,
    // where the model and the Kafka producer live
    for (item <- rdd.collect()) {
        val userId = item(0).trim.toInt
        val movieId = item(1).trim.toInt
        // -1.0 signals that the model could not score this pair
        val prediction = predict(userId, movieId).getOrElse(-1.0)
        val response = s"$userId, $movieId, $prediction"

        // print the prediction responses to the console
        println(response)

        // send the prediction responses to MessageHub
        kafkaProducer.send(new ProducerRecord[String, String](messagehub_response_topic_name, response))
    }
})

ssc.start()
ssc.awaitTermination() // you will need to restart the notebook kernel to quit

default location of ssl Trust store is: /usr/local/src/spark160master/ibm-java-x86_64-80/jre/lib/security/cacerts
