#Twitter + Watson Tone Analyzer sample Notebook Part 1: Loading the data
In this Notebook, we show how to load the custom library generated as part of the Twitter + Watson Tone Analyzer streaming application. The code can be found here: https://github.com/ibm-cds-labs/spark.samples/tree/master/streaming-twitter.
The following code uses a pre-built jar that has been posted on the GitHub project, but you can replace it with your own URL if needed.

In [1]:
%AddJar https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.1.jar -f

Starting download from https://github.com/ibm-cds-labs/spark.samples/raw/master/dist/streaming-twitter-assembly-1.1.jar
Finished download of streaming-twitter-assembly-1.1.jar


##Set up the Twitter and Watson credentials
Please refer to the tutorial for details on how to find the Twitter and Watson credentials, then replace the placeholders in the code below with your own values.

In [2]:
val demo = com.ibm.cds.spark.samples.StreamingTwitter

demo.setConfig("twitter4j.oauth.consumerKey","XXXXXXXXXXXXXXXXXX")
demo.setConfig("twitter4j.oauth.consumerSecret","XXXXXXXXXXXXXXXXXX")
demo.setConfig("twitter4j.oauth.accessToken","XXXXXXXXXXXXXXXXXX")
demo.setConfig("twitter4j.oauth.accessTokenSecret","XXXXXXXXXXXXXXXXXX")
demo.setConfig("watson.tone.url","https://gateway.watsonplatform.net/tone-analyzer-experimental/api")
demo.setConfig("watson.tone.password","XXXXXXXXXXXXXXXXXX")
demo.setConfig("watson.tone.username","XXXXXXXXXXXXXXXXXX")
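If you prefer not to paste secrets into the Notebook, the same keys could be read from environment variables. The following is a minimal sketch, not part of the sample project: the `TwitterToneConfig` object and the environment variable names are hypothetical, and it only assumes that `setConfig` takes a key and a value as in the cell above.

```scala
object TwitterToneConfig {
  // Hypothetical environment variable names (not defined by the sample project),
  // mapped to the configuration keys used in the cell above.
  val envToConfigKey: Seq[(String, String)] = Seq(
    "TWITTER_CONSUMER_KEY"        -> "twitter4j.oauth.consumerKey",
    "TWITTER_CONSUMER_SECRET"     -> "twitter4j.oauth.consumerSecret",
    "TWITTER_ACCESS_TOKEN"        -> "twitter4j.oauth.accessToken",
    "TWITTER_ACCESS_TOKEN_SECRET" -> "twitter4j.oauth.accessTokenSecret",
    "WATSON_TONE_URL"             -> "watson.tone.url",
    "WATSON_TONE_USERNAME"        -> "watson.tone.username",
    "WATSON_TONE_PASSWORD"        -> "watson.tone.password"
  )

  // Passes every variable found in `env` to `setConfig` and returns the names
  // of the variables that were missing, so the caller can report them.
  def applyFromEnv(env: Map[String, String],
                   setConfig: (String, String) => Unit): Seq[String] = {
    val (present, missing) =
      envToConfigKey.partition { case (envName, _) => env.contains(envName) }
    present.foreach { case (envName, key) => setConfig(key, env(envName)) }
    missing.map(_._1)
  }
}
```

In the Notebook this might be called as `val missing = TwitterToneConfig.applyFromEnv(sys.env, (k, v) => demo.setConfig(k, v))`, after which `missing` lists any variables that still need to be set.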

##Start the Spark Stream to collect live tweets
Start a new Twitter Stream that collects live tweets and enriches them with Sentiment Analysis scores. The stream runs for the duration specified in the second argument of the **startTwitterStreaming** method.
Note: if no duration is specified, the stream runs until the **stopTwitterStreaming** method is called.

In [3]:
import org.apache.spark.streaming._
demo.startTwitterStreaming(sc, Seconds(100))

Twitter stream started
Tweets are collected real-time and analyzed
To stop the streaming and start interacting with the data use: StreamingTwitter.stopTwitterStreaming
Stopping Twitter stream. Please wait this may take a while
Twitter stream stopped
You can now create a sqlContext and DataFrame with 1247 Tweets created. Sample usage: 
val (sqlContext, df) = com.ibm.cds.spark.samples.StreamingTwitter.createTwitterDataFrames(sc)
df.printSchema
sqlContext.sql("select author, text from tweets").show


##Create a SQLContext and a dataframe with all the tweets
Note: this method will register a SparkSQL table called tweets

In [4]:
val (sqlContext, df) = demo.createTwitterDataFrames(sc)

A new table named tweets with 1247 records has been correctly created and can be accessed through the SQLContext variable
Here's the schema for tweets
root
 |-- author: string (nullable = true)
 |-- date: string (nullable = true)
 |-- lang: string (nullable = true)
 |-- text: string (nullable = true)
 |-- lat: double (nullable = true)
 |-- long: double (nullable = true)
 |-- Cheerfulness: double (nullable = true)
 |-- Negative: double (nullable = true)
 |-- Anger: double (nullable = true)
 |-- Analytical: double (nullable = true)
 |-- Confident: double (nullable = true)
 |-- Tentative: double (nullable = true)
 |-- Openness: double (nullable = true)
 |-- Agreeableness: double (nullable = true)
 |-- Conscientiousness: double (nullable = true)
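
The nine tone columns in this schema hold one score per tweet. As an illustration of how they might be used together, here is a small sketch that picks the highest-scoring tone from a map of column name to score. The `ToneScores` object and `dominantTone` helper are hypothetical and not part of the sample library; only the column names come from the schema above.

```scala
object ToneScores {
  // Tone columns, in the order they appear in the schema printed above.
  val toneColumns: Seq[String] = Seq(
    "Cheerfulness", "Negative", "Anger",
    "Analytical", "Confident", "Tentative",
    "Openness", "Agreeableness", "Conscientiousness"
  )

  // Returns the tone with the highest score, or None when no known tone
  // is present or every score is 0.
  def dominantTone(scores: Map[String, Double]): Option[(String, Double)] = {
    val known = toneColumns.flatMap(name => scores.get(name).map(score => (name, score)))
    if (known.isEmpty) None
    else {
      val best = known.maxBy(_._2)
      if (best._2 > 0.0) Some(best) else None
    }
  }
}
```

For a Spark `Row`, the input map could be built with something like `ToneScores.toneColumns.map(c => c -> row.getAs[Double](c)).toMap`, assuming the columns are non-null doubles as the schema suggests.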



##Execute a SparkSQL query that contains all the data

In [5]:
val fullSet = sqlContext.sql("select * from tweets limit 100000")  // Select all columns, capped at 100000 rows
fullSet.show

author            date                 lang text                 lat long Cheerfulness Negative Anger Analytical Confident Tentative Openness          Agreeableness     Conscientiousness
Lizzy Johnson     Sun Sep 27 20:18:... en   @CaylorxShacks an... 0.0 0.0  0.0          100.0    100.0 0.0        0.0       0.0       0.0               1.0               0.0              
26631stwc         Sun Sep 27 20:18:... en   Get Weather Updat... 0.0 0.0  0.0          0.0      0.0   0.0        0.0       0.0       74.0              45.0              96.0             
Ayndrei?          Sun Sep 27 20:18:... en   RT @drycilagan: H... 0.0 0.0  0.0          0.0      0.0   0.0        0.0       0.0       97.0              0.0               68.0             
C.                Sun Sep 27 20:18:... en   RT @denisSDJEM: #... 0.0 0.0  0.0          0.0      0.0   0.0        0.0       100.0     84.0              86.0              6.0              
Jason Brinker     Sun Sep 27 20:18:... en   RT @FirstBaptistJ... 

##Persist the dataset into a parquet file on the Object Storage service
The parquet file will be reloaded in the IPython Part 2 Notebook.

In [6]:
fullSet.saveAsParquetFile("swift://twitter2.spark/tweetsFull5.parquet")

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.


##SparkSQL query example on the data.
Select all the tweets that have an Anger score greater than 70%

In [7]:
val angerSet = sqlContext.sql("select author, text, Anger from tweets where Anger > 70")
angerSet.show

author             text                 Anger
Lizzy Johnson      @CaylorxShacks an... 100.0
Pond               RT @CraziestSex: ... 100.0
Mychal Elliott     RT @TheTweetOfGod... 100.0
Kat Willey         Girls gotta look ... 100.0
Big Chan Trill OG? Hoes don't have r... 100.0
FIFA15 Messi Trick @M4DE_DARKf Luis ... 100.0
Courtney Perkins   Why does Lauren t... 100.0
Miami Celebs       We are great writ... 100.0
InfoblazeCentral   @BillClinton blam... 100.0
Dave               Argument from ign... 100.0
Mispooky           @loghainfucker SH... 100.0
Nourelmalah        RT @stillawinner_... 100.0
yella              RT @PoloMylogo_: ... 100.0
cheyenne           RT @LRHASBOYFRIEN... 100.0
anette             dont you hate it ... 100.0
Savage Emily       RT @esterluvzpll:... 100.0
Nicole Williams    The REAL men and ... 100.0
Lexi Babyy         RT @Lowkey: peopl... 100.0
la malinche        @luvhairyguys1 sc... 100.0
? MIGO ?           RT @Lowkey: peopl... 100.0
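
The same query pattern applies to any of the tone columns. The following sketch builds the SQL string for a given tone and threshold; the `ToneQueries` object is a hypothetical helper, not part of the sample library, and it assumes the table is registered as `tweets` with the column names shown in the schema earlier.

```scala
object ToneQueries {
  // Tone columns taken from the schema of the tweets table.
  val toneColumns: Set[String] = Set(
    "Cheerfulness", "Negative", "Anger",
    "Analytical", "Confident", "Tentative",
    "Openness", "Agreeableness", "Conscientiousness"
  )

  // Builds a query selecting author, text and the given tone for all tweets
  // whose score for that tone is strictly above the threshold.
  // Rejects names that are not tone columns to avoid malformed SQL.
  def aboveThreshold(tone: String, threshold: Double): String = {
    require(toneColumns.contains(tone), s"Unknown tone column: $tone")
    s"select author, text, $tone from tweets where $tone > $threshold"
  }
}
```

In the Notebook, `sqlContext.sql(ToneQueries.aboveThreshold("Anger", 70)).show` should then produce the same kind of result as the cell above, and swapping in `"Tentative"` or another column gives the corresponding view.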
