## Streaming Twitter Sentiment Prediction
Goal: Predict the sentiment of tweets. The first step is to train a Naive Bayes sentiment model on a labeled Twitter dataset from Kaggle.

### Data Download and Exploration

In [0]:
%%bash
rm -f twitter1.6m.zip
wget -nv http://idsdl.csom.umn.edu/c/share/msba6330/twitter1.6m.zip
yes | unzip twitter1.6m.zip

Archive:  twitter1.6m.zip
  inflating: training.1600000.processed.noemoticon.csv  
2023-03-07 23:02:28 URL:http://idsdl.csom.umn.edu/c/share/msba6330/twitter1.6m.zip [84855679/84855679] -> "twitter1.6m.zip" [1]


In [0]:
%%bash
head training.1600000.processed.noemoticon.csv

"0","1467810369","Mon Apr 06 22:19:45 PDT 2009","NO_QUERY","_TheSpecialOne_","@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D"
"0","1467810672","Mon Apr 06 22:19:49 PDT 2009","NO_QUERY","scotthamilton","is upset that he can't update his Facebook by texting it... and might cry as a result  School today also. Blah!"
"0","1467810917","Mon Apr 06 22:19:53 PDT 2009","NO_QUERY","mattycus","@Kenichan I dived many times for the ball. Managed to save 50%  The rest go out of bounds"
"0","1467811184","Mon Apr 06 22:19:57 PDT 2009","NO_QUERY","ElleCTF","my whole body feels itchy and like its on fire "
"0","1467811193","Mon Apr 06 22:19:57 PDT 2009","NO_QUERY","Karoli","@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there. "
"0","1467811372","Mon Apr 06 22:20:00 PDT 2009","NO_QUERY","joy_wolf","@Kwesidei not the whole crew "
"0","1467811592","Mon Apr 06 22:20:03 PDT 2009","NO

In [0]:
schema = "target integer,id long,date string,flag string,user string,text string"
data = spark.read.csv("file:/databricks/driver/training.1600000.processed.noemoticon.csv", schema =schema)

In [0]:
data.limit(10).display()

target,id,date,flag,user,text
0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D"
0,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!
0,1467810917,Mon Apr 06 22:19:53 PDT 2009,NO_QUERY,mattycus,@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds
0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire
0,1467811193,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,Karoli,"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there."
0,1467811372,Mon Apr 06 22:20:00 PDT 2009,NO_QUERY,joy_wolf,@Kwesidei not the whole crew
0,1467811592,Mon Apr 06 22:20:03 PDT 2009,NO_QUERY,mybirch,Need a hug
0,1467811594,Mon Apr 06 22:20:03 PDT 2009,NO_QUERY,coZZ,"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?"
0,1467811795,Mon Apr 06 22:20:05 PDT 2009,NO_QUERY,2Hood4Hollywood,@Tatiana_K nope they didn't have it
0,1467812025,Mon Apr 06 22:20:09 PDT 2009,NO_QUERY,mimismo,@twittera que me muera ?


In [0]:
data.printSchema()

root
 |-- target: integer (nullable = true)
 |-- id: long (nullable = true)
 |-- date: string (nullable = true)
 |-- flag: string (nullable = true)
 |-- user: string (nullable = true)
 |-- text: string (nullable = true)



### Data Cleaning and Target Variable Transformation
- transform the `date` column into a timestamp column `time`
- transform the `target` variable into a binary `label` column
- drop the irrelevant columns (`id`, `flag`, `user`)

In [0]:
from pyspark.ml.feature import *
from pyspark.ml import Pipeline
from pyspark.sql.functions import col, to_timestamp,substring

data_clean = data.drop('id','flag','user').withColumn('time', to_timestamp(substring(col('date'),5,24), 'MMM dd HH:mm:ss zzz yyyy')).drop('date').cache()

st_cast = SQLTransformer(statement = 'select cast(target as double) as target_double, * from __THIS__')
binarizer = Binarizer(threshold=2.0, inputCol='target_double', outputCol='label')
target_pipeline = Pipeline(stages=[st_cast, binarizer])
data_train = target_pipeline.fit(data_clean).transform(data_clean).drop('target_double','target')

data_train.limit(10).display()

text,time,label
"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D",2009-04-07T05:19:45.000+0000,0.0
is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!,2009-04-07T05:19:49.000+0000,0.0
@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds,2009-04-07T05:19:53.000+0000,0.0
my whole body feels itchy and like its on fire,2009-04-07T05:19:57.000+0000,0.0
"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there.",2009-04-07T05:19:57.000+0000,0.0
@Kwesidei not the whole crew,2009-04-07T05:20:00.000+0000,0.0
Need a hug,2009-04-07T05:20:03.000+0000,0.0
"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?",2009-04-07T05:20:03.000+0000,0.0
@Tatiana_K nope they didn't have it,2009-04-07T05:20:05.000+0000,0.0
@twittera que me muera ?,2009-04-07T05:20:09.000+0000,0.0
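
As a rough illustration of what the cell above does to each row, the sketch below reproduces two of its steps in plain Python rather than Spark: trimming the weekday with the 1-based `substring(date, 5, 24)`, and the `Binarizer` rule that maps values strictly above the threshold to 1.0. The helper names `strip_weekday` and `binarize` are hypothetical and not part of the notebook, and the positive class is assumed to be coded as 4 (only 0 appears in the rows shown here).

```python
def strip_weekday(date_str: str) -> str:
    """Mimic Spark's substring(col, 5, 24): start at the 5th character (1-based), keep 24."""
    start = 5 - 1  # convert the 1-based Spark position to a 0-based Python index
    return date_str[start:start + 24]


def binarize(value: float, threshold: float = 2.0) -> float:
    """Mimic Binarizer: values strictly greater than the threshold become 1.0, others 0.0."""
    return 1.0 if float(value) > threshold else 0.0


raw_date = "Mon Apr 06 22:19:45 PDT 2009"
print(strip_weekday(raw_date))   # the part that to_timestamp parses
print(binarize(0), binarize(4))  # labels for target 0 and (assumed) target 4
```

With a threshold of 2.0, any target at or below 2 lands in class 0.0, which is why the cast to double followed by `Binarizer` is enough to produce a two-class label.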


### Define and fit a ML pipeline containing data preprocessing and model training

In [0]:
from pyspark.ml.classification import NaiveBayes
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
st = SQLTransformer(statement = "select *, regexp_replace(regexp_replace(regexp_replace(lower(text), 'http\\\S+',''),'@\\\w+',''),'#','') as text_cleaned from __THIS__   ")
tokenizer = RegexTokenizer(inputCol = "text_cleaned", outputCol ="words", pattern = "\\W+")
eng_stopwords = StopWordsRemover.loadDefaultStopWords(language="english")

swr = StopWordsRemover(inputCol = "words", outputCol = "words_filtered", stopWords= eng_stopwords)
cv = CountVectorizer(inputCol = "words_filtered", outputCol = "features", minDF = 20)
nb= NaiveBayes(smoothing = 1.0, modelType ="multinomial")
pipeline = Pipeline(stages=[st, tokenizer, swr, cv, nb])
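
The first three stages of this pipeline (cleaning, tokenizing, stop-word removal) can be approximated in plain Python to see what they do to a single tweet. This is only a sketch: `STOPWORDS` is a tiny hypothetical list rather than Spark's default English list, and Python's `\W` and `\w` handle Unicode somewhat differently from the Java regex engine Spark uses.

```python
import re
from typing import List

# Hypothetical, tiny stop-word list for illustration only.
STOPWORDS = {"a", "the", "not", "i", "is", "that", "to", "of", "it", "you", "do", "s"}


def clean_text(text: str) -> str:
    """Lowercase, then strip URLs, @mentions, and '#' characters."""
    text = text.lower()
    text = re.sub(r"http\S+", "", text)
    text = re.sub(r"@\w+", "", text)
    return text.replace("#", "")


def tokenize(text: str) -> List[str]:
    """Split on runs of non-word characters and drop empty tokens."""
    return [tok for tok in re.split(r"\W+", text) if tok]


def remove_stopwords(tokens: List[str]) -> List[str]:
    """Drop every token that appears in the stop-word list."""
    return [tok for tok in tokens if tok not in STOPWORDS]


tweet = "@Kwesidei not the whole crew"
cleaned = clean_text(tweet)
words = tokenize(cleaned)
print(repr(cleaned))
print(words)
print(remove_stopwords(words))
```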

In [0]:
pipeline.fit(data_train).transform(data_train).limit(10).display() # preview: fit the full pipeline (st, tokenizer, swr, cv, nb) and show 10 scored rows

text,time,label,text_cleaned,words,words_filtered,features,rawPrediction,probability,prediction
"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D",2009-04-07T05:19:45.000+0000,0.0,"- awww, that's a bummer. you shoulda got david carr of third day to do it. ;d","List(awww, that, s, a, bummer, you, shoulda, got, david, carr, of, third, day, to, do, it, d)","List(awww, bummer, shoulda, got, david, carr, third, day, d)","Map(vectorType -> sparse, length -> 22148, indices -> List(2, 11, 72, 349, 737, 1074, 1787, 3378, 9542), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-73.10315816766634, -75.76010317369364))","Map(vectorType -> dense, length -> 2, values -> List(0.9344377541357022, 0.06556224586429775))",0.0
is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!,2009-04-07T05:19:49.000+0000,0.0,is upset that he can't update his facebook by texting it... and might cry as a result school today also. blah!,"List(is, upset, that, he, can, t, update, his, facebook, by, texting, it, and, might, cry, as, a, result, school, today, also, blah)","List(upset, update, facebook, texting, might, cry, result, school, today, also, blah)","Map(vectorType -> sparse, length -> 22148, indices -> List(7, 70, 174, 197, 425, 429, 440, 682, 1018, 1918, 2240), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-84.59444310092665, -89.98160775323247))","Map(vectorType -> dense, length -> 2, values -> List(0.9954459081643332, 0.004554091835666661))",0.0
@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds,2009-04-07T05:19:53.000+0000,0.0,i dived many times for the ball. managed to save 50% the rest go out of bounds,"List(i, dived, many, times, for, the, ball, managed, to, save, 50, the, rest, go, out, of, bounds)","List(dived, many, times, ball, managed, save, 50, rest, go, bounds)","Map(vectorType -> sparse, length -> 22148, indices -> List(5, 216, 256, 370, 800, 983, 1170, 1578), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-63.04531360734018, -63.7797837673144))","Map(vectorType -> dense, length -> 2, values -> List(0.6757854508682208, 0.32421454913177916))",0.0
my whole body feels itchy and like its on fire,2009-04-07T05:19:57.000+0000,0.0,my whole body feels itchy and like its on fire,"List(my, whole, body, feels, itchy, and, like, its, on, fire)","List(whole, body, feels, itchy, like, fire)","Map(vectorType -> sparse, length -> 22148, indices -> List(4, 331, 381, 705, 1043, 2815), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-47.009948334394494, -50.54773661405896))","Map(vectorType -> dense, length -> 2, values -> List(0.9717440469195154, 0.02825595308048461))",0.0
"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there.",2009-04-07T05:19:57.000+0000,0.0,"no, it's not behaving at all. i'm mad. why am i here? because i can't see you all over there.","List(no, it, s, not, behaving, at, all, i, m, mad, why, am, i, here, because, i, can, t, see, you, all, over, there)","List(behaving, m, mad, see)","Map(vectorType -> sparse, length -> 22148, indices -> List(0, 21, 493, 10212), values -> List(1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-30.130604678083436, -31.07548249143339))","Map(vectorType -> dense, length -> 2, values -> List(0.7200838991455994, 0.27991610085440055))",0.0
@Kwesidei not the whole crew,2009-04-07T05:20:00.000+0000,0.0,not the whole crew,"List(not, the, whole, crew)","List(whole, crew)","Map(vectorType -> sparse, length -> 22148, indices -> List(331, 2083), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-17.98391091046203, -17.91489039377489))","Map(vectorType -> dense, length -> 2, values -> List(0.4827517176108538, 0.5172482823891462))",1.0
Need a hug,2009-04-07T05:20:03.000+0000,0.0,need a hug,"List(need, a, hug)","List(need, hug)","Map(vectorType -> sparse, length -> 22148, indices -> List(35, 815), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.601725469945839, -15.435554208423282))","Map(vectorType -> dense, length -> 2, values -> List(0.6971638872858877, 0.3028361127141123))",0.0
"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?",2009-04-07T05:20:03.000+0000,0.0,"hey long time no see! yes.. rains a bit ,only a bit lol , i'm fine thanks , how's you ?","List(hey, long, time, no, see, yes, rains, a, bit, only, a, bit, lol, i, m, fine, thanks, how, s, you)","List(hey, long, time, see, yes, rains, bit, bit, lol, m, fine, thanks)","Map(vectorType -> sparse, length -> 22148, indices -> List(0, 12, 13, 21, 31, 76, 78, 88, 162, 423, 2559), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-80.42879077895951, -76.07687148586122))","Map(vectorType -> dense, length -> 2, values -> List(0.012718227357656941, 0.9872817726423432))",1.0
@Tatiana_K nope they didn't have it,2009-04-07T05:20:05.000+0000,0.0,nope they didn't have it,"List(nope, they, didn, t, have, it)","List(nope, didn)","Map(vectorType -> sparse, length -> 22148, indices -> List(69, 691), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.795096699692145, -16.118580610190293))","Map(vectorType -> dense, length -> 2, values -> List(0.7897607544738096, 0.21023924552619036))",0.0
@twittera que me muera ?,2009-04-07T05:20:09.000+0000,0.0,que me muera ?,"List(que, me, muera)","List(que, muera)","Map(vectorType -> sparse, length -> 22148, indices -> List(2372), values -> List(1.0))","Map(vectorType -> dense, length -> 2, values -> List(-10.504200888860087, -10.589274685072796))","Map(vectorType -> dense, length -> 2, values -> List(0.5212556307070653, 0.4787443692929346))",0.0
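
The `features` column above is a sparse count vector whose length (22148) is the vocabulary size kept by `CountVectorizer`. The plain-Python sketch below illustrates the idea with a toy corpus and `min_df=2` instead of the notebook's `minDF = 20`. The function names and the toy documents are hypothetical, and the alphabetical tie-break is an assumption made only to keep the sketch deterministic.

```python
from collections import Counter
from typing import Dict, List


def build_vocabulary(docs: List[List[str]], min_df: int) -> Dict[str, int]:
    """Keep terms found in at least min_df documents, indexed by descending corpus frequency."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    term_freq = Counter(term for doc in docs for term in doc)
    kept = [term for term in term_freq if doc_freq[term] >= min_df]
    # Ties are broken alphabetically here only to make the sketch deterministic.
    kept.sort(key=lambda term: (-term_freq[term], term))
    return {term: index for index, term in enumerate(kept)}


def vectorize(tokens: List[str], vocab: Dict[str, int]) -> Dict[int, float]:
    """Return a sparse {index: count} mapping; out-of-vocabulary tokens are dropped."""
    counts = Counter(vocab[tok] for tok in tokens if tok in vocab)
    return {index: float(counts[index]) for index in sorted(counts)}


toy_docs = [["hey", "bit", "bit", "lol"], ["bit", "lol"], ["hey", "rains"]]
vocab = build_vocabulary(toy_docs, min_df=2)
print(vocab)
print(vectorize(toy_docs[0], vocab))
```

A repeated token produces a count above 1.0, which matches the single 2.0 entry in the row whose filtered words contain `bit` twice.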


### Train the pipeline
- save the resulting model as `pipelineModel`
- use the model to transform the training data `data_train`
- display sample results

In [0]:
pipelineModel = pipeline.fit(data_train)
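
For intuition about what fitting the final `NaiveBayes` stage estimates, here is a minimal textbook sketch of multinomial Naive Bayes with additive smoothing in plain Python. It is not Spark's implementation: the toy documents and the function names are hypothetical, and details such as how Spark smooths the class priors may differ.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def train_multinomial_nb(docs: List[Tuple[List[str], int]], smoothing: float = 1.0):
    """Estimate log priors and smoothed log likelihoods from (tokens, label) pairs."""
    vocab = sorted({tok for tokens, _ in docs for tok in tokens})
    class_docs = Counter(label for _, label in docs)
    term_counts: Dict[int, Counter] = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        # Denominator: total term count in class c plus smoothing mass for every vocabulary word.
        total = sum(term_counts[c].values()) + smoothing * len(vocab)
        log_likelihood[c] = {
            w: math.log((term_counts[c][w] + smoothing) / total) for w in vocab
        }
    return log_prior, log_likelihood


def predict(tokens: List[str], log_prior, log_likelihood) -> int:
    """Pick the class with the highest log joint score; unseen words are ignored."""
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][w] for w in tokens if w in log_likelihood[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus: label 1 = positive, label 0 = negative.
toy_docs = [
    (["good", "happy"], 1),
    (["happy", "love"], 1),
    (["sad", "bad"], 0),
    (["bad", "cry"], 0),
]
log_prior, log_likelihood = train_multinomial_nb(toy_docs, smoothing=1.0)
print(predict(["happy", "good"], log_prior, log_likelihood))
print(predict(["bad"], log_prior, log_likelihood))
```

With `smoothing = 1.0`, a word never seen in a class still gets a small non-zero probability, so one unfamiliar word cannot rule a class out entirely.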

In [0]:
results = pipelineModel.transform(data_train)
results.limit(10).display()

text,time,label,text_cleaned,words,words_filtered,features,rawPrediction,probability,prediction
"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D",2009-04-07T05:19:45.000+0000,0.0,"- awww, that's a bummer. you shoulda got david carr of third day to do it. ;d","List(awww, that, s, a, bummer, you, shoulda, got, david, carr, of, third, day, to, do, it, d)","List(awww, bummer, shoulda, got, david, carr, third, day, d)","Map(vectorType -> sparse, length -> 22148, indices -> List(2, 11, 72, 349, 737, 1074, 1788, 3380, 9557), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-73.10315816766634, -75.76010317369364))","Map(vectorType -> dense, length -> 2, values -> List(0.9344377541357022, 0.06556224586429775))",0.0
is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!,2009-04-07T05:19:49.000+0000,0.0,is upset that he can't update his facebook by texting it... and might cry as a result school today also. blah!,"List(is, upset, that, he, can, t, update, his, facebook, by, texting, it, and, might, cry, as, a, result, school, today, also, blah)","List(upset, update, facebook, texting, might, cry, result, school, today, also, blah)","Map(vectorType -> sparse, length -> 22148, indices -> List(7, 70, 174, 197, 425, 429, 440, 682, 1019, 1917, 2240), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-84.59444310092665, -89.98160775323247))","Map(vectorType -> dense, length -> 2, values -> List(0.9954459081643332, 0.004554091835666661))",0.0
@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds,2009-04-07T05:19:53.000+0000,0.0,i dived many times for the ball. managed to save 50% the rest go out of bounds,"List(i, dived, many, times, for, the, ball, managed, to, save, 50, the, rest, go, out, of, bounds)","List(dived, many, times, ball, managed, save, 50, rest, go, bounds)","Map(vectorType -> sparse, length -> 22148, indices -> List(5, 216, 256, 370, 800, 982, 1171, 1578), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-63.04531360734018, -63.7797837673144))","Map(vectorType -> dense, length -> 2, values -> List(0.6757854508682208, 0.32421454913177916))",0.0
my whole body feels itchy and like its on fire,2009-04-07T05:19:57.000+0000,0.0,my whole body feels itchy and like its on fire,"List(my, whole, body, feels, itchy, and, like, its, on, fire)","List(whole, body, feels, itchy, like, fire)","Map(vectorType -> sparse, length -> 22148, indices -> List(4, 331, 381, 705, 1043, 2815), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-47.009948334394494, -50.54773661405896))","Map(vectorType -> dense, length -> 2, values -> List(0.9717440469195154, 0.02825595308048461))",0.0
"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there.",2009-04-07T05:19:57.000+0000,0.0,"no, it's not behaving at all. i'm mad. why am i here? because i can't see you all over there.","List(no, it, s, not, behaving, at, all, i, m, mad, why, am, i, here, because, i, can, t, see, you, all, over, there)","List(behaving, m, mad, see)","Map(vectorType -> sparse, length -> 22148, indices -> List(0, 21, 493, 10187), values -> List(1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-30.130604678083436, -31.07548249143339))","Map(vectorType -> dense, length -> 2, values -> List(0.7200838991455994, 0.27991610085440055))",0.0
@Kwesidei not the whole crew,2009-04-07T05:20:00.000+0000,0.0,not the whole crew,"List(not, the, whole, crew)","List(whole, crew)","Map(vectorType -> sparse, length -> 22148, indices -> List(331, 2084), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-17.98391091046203, -17.91489039377489))","Map(vectorType -> dense, length -> 2, values -> List(0.4827517176108538, 0.5172482823891462))",1.0
Need a hug,2009-04-07T05:20:03.000+0000,0.0,need a hug,"List(need, a, hug)","List(need, hug)","Map(vectorType -> sparse, length -> 22148, indices -> List(35, 815), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.601725469945839, -15.435554208423282))","Map(vectorType -> dense, length -> 2, values -> List(0.6971638872858877, 0.3028361127141123))",0.0
"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?",2009-04-07T05:20:03.000+0000,0.0,"hey long time no see! yes.. rains a bit ,only a bit lol , i'm fine thanks , how's you ?","List(hey, long, time, no, see, yes, rains, a, bit, only, a, bit, lol, i, m, fine, thanks, how, s, you)","List(hey, long, time, see, yes, rains, bit, bit, lol, m, fine, thanks)","Map(vectorType -> sparse, length -> 22148, indices -> List(0, 12, 13, 21, 31, 76, 78, 88, 162, 423, 2559), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-80.42879077895951, -76.07687148586122))","Map(vectorType -> dense, length -> 2, values -> List(0.012718227357656941, 0.9872817726423432))",1.0
@Tatiana_K nope they didn't have it,2009-04-07T05:20:05.000+0000,0.0,nope they didn't have it,"List(nope, they, didn, t, have, it)","List(nope, didn)","Map(vectorType -> sparse, length -> 22148, indices -> List(69, 691), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.795096699692145, -16.118580610190293))","Map(vectorType -> dense, length -> 2, values -> List(0.7897607544738096, 0.21023924552619036))",0.0
@twittera que me muera ?,2009-04-07T05:20:09.000+0000,0.0,que me muera ?,"List(que, me, muera)","List(que, muera)","Map(vectorType -> sparse, length -> 22148, indices -> List(2372), values -> List(1.0))","Map(vectorType -> dense, length -> 2, values -> List(-10.504200888860087, -10.589274685072796))","Map(vectorType -> dense, length -> 2, values -> List(0.5212556307070653, 0.4787443692929346))",0.0
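
The `rawPrediction` and `probability` columns above appear to be related by a simple normalization: the raw values behave like log-scale class scores, and exponentiating and normalizing them gives the probabilities. The sketch below checks this on the first displayed row. The function name `raw_to_probability` is hypothetical.

```python
import math
from typing import List


def raw_to_probability(raw: List[float]) -> List[float]:
    """Normalize log-scale class scores into probabilities (log-sum-exp trick)."""
    top = max(raw)  # subtract the maximum so exp() does not underflow
    exps = [math.exp(score - top) for score in raw]
    total = sum(exps)
    return [e / total for e in exps]


# rawPrediction values copied from the first displayed row
raw_first_row = [-73.10315816766634, -75.76010317369364]
probs = raw_to_probability(raw_first_row)
print(probs)  # compare with the probability column of the same row
```

Subtracting the maximum before exponentiating matters here because the raw scores are large negative numbers for long tweets, and `exp` of such values would otherwise round toward zero.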


### Obtain the accuracy of the model

In [0]:
e = MulticlassClassificationEvaluator(metricName='accuracy')
e.evaluate(results) # accuracy on the training data (results = predictions for data_train)

Out[17]: 0.77388125
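
Because `results` holds predictions for the same rows the pipeline was fitted on, the figure above is a training accuracy and is likely optimistic compared with accuracy on unseen tweets. The metric itself is just the share of rows where `prediction` equals `label`. The sketch below recomputes it in plain Python for the ten rows displayed earlier, with values copied from that output. The `accuracy` helper is hypothetical.

```python
from typing import List


def accuracy(labels: List[float], predictions: List[float]) -> float:
    """Fraction of rows where the predicted class equals the true label."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, y_hat in zip(labels, predictions) if y == y_hat)
    return correct / len(labels)


# label and prediction columns of the ten displayed rows
sample_labels = [0.0] * 10
sample_predictions = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
print(accuracy(sample_labels, sample_predictions))
```

Ten rows are far too few to say anything about the model. The point is only to show what the evaluator counts.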

### Save the model

In [0]:
pipelineModel.write().overwrite().save('/FileStore/twitter_nbpipelne/') # saved to DBFS; overwrite() lets this cell be re-run without a "path already exists" error

In [0]:
%fs ls /FileStore/twitter_nbpipelne/

path,name,size,modificationTime
dbfs:/FileStore/twitter_nbpipelne/metadata/,metadata/,0,0
dbfs:/FileStore/twitter_nbpipelne/stages/,stages/,0,0


### Test the model

In [0]:
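from pyspark.ml import PipelineModel
loaded_model = PipelineModel.load('/FileStore/twitter_nbpipelne/')  # reload the saved pipeline from DBFS

In [0]:
data_sample = data_clean.limit(10)
scored_tweets = loaded_model.transform(data_sample)  # score with the reloaded model to verify the save/load round trip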
scored_tweets.display()

target,text,time,text_cleaned,words,words_filtered,features,rawPrediction,probability,prediction
0,"@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D",2009-04-07T05:19:45.000+0000,"- awww, that's a bummer. you shoulda got david carr of third day to do it. ;d","List(awww, that, s, a, bummer, you, shoulda, got, david, carr, of, third, day, to, do, it, d)","List(awww, bummer, shoulda, got, david, carr, third, day, d)","Map(vectorType -> sparse, length -> 22148, indices -> List(2, 11, 72, 349, 737, 1074, 1788, 3380, 9557), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-73.10315816766634, -75.76010317369364))","Map(vectorType -> dense, length -> 2, values -> List(0.9344377541357022, 0.06556224586429775))",0.0
0,is upset that he can't update his Facebook by texting it... and might cry as a result School today also. Blah!,2009-04-07T05:19:49.000+0000,is upset that he can't update his facebook by texting it... and might cry as a result school today also. blah!,"List(is, upset, that, he, can, t, update, his, facebook, by, texting, it, and, might, cry, as, a, result, school, today, also, blah)","List(upset, update, facebook, texting, might, cry, result, school, today, also, blah)","Map(vectorType -> sparse, length -> 22148, indices -> List(7, 70, 174, 197, 425, 429, 440, 682, 1019, 1917, 2240), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-84.59444310092665, -89.98160775323247))","Map(vectorType -> dense, length -> 2, values -> List(0.9954459081643332, 0.004554091835666661))",0.0
0,@Kenichan I dived many times for the ball. Managed to save 50% The rest go out of bounds,2009-04-07T05:19:53.000+0000,i dived many times for the ball. managed to save 50% the rest go out of bounds,"List(i, dived, many, times, for, the, ball, managed, to, save, 50, the, rest, go, out, of, bounds)","List(dived, many, times, ball, managed, save, 50, rest, go, bounds)","Map(vectorType -> sparse, length -> 22148, indices -> List(5, 216, 256, 370, 800, 982, 1171, 1578), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-63.04531360734018, -63.7797837673144))","Map(vectorType -> dense, length -> 2, values -> List(0.6757854508682208, 0.32421454913177916))",0.0
0,my whole body feels itchy and like its on fire,2009-04-07T05:19:57.000+0000,my whole body feels itchy and like its on fire,"List(my, whole, body, feels, itchy, and, like, its, on, fire)","List(whole, body, feels, itchy, like, fire)","Map(vectorType -> sparse, length -> 22148, indices -> List(4, 331, 381, 705, 1043, 2815), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-47.009948334394494, -50.54773661405896))","Map(vectorType -> dense, length -> 2, values -> List(0.9717440469195154, 0.02825595308048461))",0.0
0,"@nationwideclass no, it's not behaving at all. i'm mad. why am i here? because I can't see you all over there.",2009-04-07T05:19:57.000+0000,"no, it's not behaving at all. i'm mad. why am i here? because i can't see you all over there.","List(no, it, s, not, behaving, at, all, i, m, mad, why, am, i, here, because, i, can, t, see, you, all, over, there)","List(behaving, m, mad, see)","Map(vectorType -> sparse, length -> 22148, indices -> List(0, 21, 493, 10187), values -> List(1.0, 1.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-30.130604678083436, -31.07548249143339))","Map(vectorType -> dense, length -> 2, values -> List(0.7200838991455994, 0.27991610085440055))",0.0
0,@Kwesidei not the whole crew,2009-04-07T05:20:00.000+0000,not the whole crew,"List(not, the, whole, crew)","List(whole, crew)","Map(vectorType -> sparse, length -> 22148, indices -> List(331, 2084), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-17.98391091046203, -17.91489039377489))","Map(vectorType -> dense, length -> 2, values -> List(0.4827517176108538, 0.5172482823891462))",1.0
0,Need a hug,2009-04-07T05:20:03.000+0000,need a hug,"List(need, a, hug)","List(need, hug)","Map(vectorType -> sparse, length -> 22148, indices -> List(35, 815), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.601725469945839, -15.435554208423282))","Map(vectorType -> dense, length -> 2, values -> List(0.6971638872858877, 0.3028361127141123))",0.0
0,"@LOLTrish hey long time no see! Yes.. Rains a bit ,only a bit LOL , I'm fine thanks , how's you ?",2009-04-07T05:20:03.000+0000,"hey long time no see! yes.. rains a bit ,only a bit lol , i'm fine thanks , how's you ?","List(hey, long, time, no, see, yes, rains, a, bit, only, a, bit, lol, i, m, fine, thanks, how, s, you)","List(hey, long, time, see, yes, rains, bit, bit, lol, m, fine, thanks)","Map(vectorType -> sparse, length -> 22148, indices -> List(0, 12, 13, 21, 31, 76, 78, 88, 162, 423, 2559), values -> List(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0, 1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-80.42879077895951, -76.07687148586122))","Map(vectorType -> dense, length -> 2, values -> List(0.012718227357656941, 0.9872817726423432))",1.0
0,@Tatiana_K nope they didn't have it,2009-04-07T05:20:05.000+0000,nope they didn't have it,"List(nope, they, didn, t, have, it)","List(nope, didn)","Map(vectorType -> sparse, length -> 22148, indices -> List(69, 691), values -> List(1.0, 1.0))","Map(vectorType -> dense, length -> 2, values -> List(-14.795096699692145, -16.118580610190293))","Map(vectorType -> dense, length -> 2, values -> List(0.7897607544738096, 0.21023924552619036))",0.0
0,@twittera que me muera ?,2009-04-07T05:20:09.000+0000,que me muera ?,"List(que, me, muera)","List(que, muera)","Map(vectorType -> sparse, length -> 22148, indices -> List(2372), values -> List(1.0))","Map(vectorType -> dense, length -> 2, values -> List(-10.504200888860087, -10.589274685072796))","Map(vectorType -> dense, length -> 2, values -> List(0.5212556307070653, 0.4787443692929346))",0.0
