# Develop a Machine Learning Model to Predict Whether an Overall Satisfaction Comment in Survey Results Is Positive or Negative


## Access Survey Results sourced from the Catalog connection
Read in the survey results as a Spark DataFrame named `survey`.

In [1]:
# The code was removed by DSX for sharing.

Row(SURVEY_DATE=u'4/05/2016', SURVEY_ID=u'1826587323', DEPARTMENT=u'Customer Care', OVERALL_SATISFACTION_COMMENT=u'Monica was very professional and I got right through to her and hopefully everything will turn out. Okay with the receipt that I requested.', OVERALL_SATISFACTION=7, LEVEL_OF_EFFORT=6, LIKELY_TO_REPURCHASE=7, LIKELY_TO_RECOMMEND=7, AGENT_EMPATHY=7, AGENT_MEETS_NEEDS=7, AGENT_KNOWLEDGE=7, AGENT_SATISFACTION=7)

## Rename columns for readability

In [2]:
survey = (survey.withColumnRenamed('SURVEY_DATE', 'Survey_Date')
          .withColumnRenamed('SURVEY_ID', 'Survey_ID')
          .withColumnRenamed('DEPARTMENT', 'Department')
          .withColumnRenamed('OVERALL_SATISFACTION_COMMENT', 'Overall_Satisfaction_Comment')
          .withColumnRenamed('OVERALL_SATISFACTION', 'Overall_Satisfaction')
          .withColumnRenamed('LEVEL_OF_EFFORT', 'Level_of_Effort')
          .withColumnRenamed('LIKELY_TO_REPURCHASE', 'Likely_to_Repurchase')
          .withColumnRenamed('LIKELY_TO_RECOMMEND', 'Likely_to_Recommend')
          .withColumnRenamed('AGENT_EMPATHY', 'Agent_Empathy')
          .withColumnRenamed('AGENT_MEETS_NEEDS', 'Agent_Meets_Needs')
          .withColumnRenamed('AGENT_KNOWLEDGE', 'Agent_Knowledge')
          .withColumnRenamed('AGENT_SATISFACTION', 'Agent_Satisfaction'))
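The renames above follow a regular pattern, so the mapping can also be generated instead of written out by hand. A minimal plain-Python sketch, assuming the only exceptions are the connector words `OF` and `TO` (kept lower-case) and the acronym `ID` (kept upper-case); `readable_name` and `rename_map` are hypothetical names, not part of the original notebook.

```python
# Hypothetical helper: derive the readable column names used in this notebook
# from the upper-case source names instead of listing every rename by hand.
LOWERCASE_PARTS = {"OF", "TO"}  # connector words, e.g. Level_of_Effort
UPPERCASE_PARTS = {"ID"}        # acronyms, e.g. Survey_ID


def readable_name(column):
    """Convert a name such as 'LEVEL_OF_EFFORT' to 'Level_of_Effort'."""
    parts = []
    for part in column.split("_"):
        if part in LOWERCASE_PARTS:
            parts.append(part.lower())
        elif part in UPPERCASE_PARTS:
            parts.append(part)
        else:
            parts.append(part.capitalize())
    return "_".join(parts)


SOURCE_COLUMNS = [
    "SURVEY_DATE", "SURVEY_ID", "DEPARTMENT", "OVERALL_SATISFACTION_COMMENT",
    "OVERALL_SATISFACTION", "LEVEL_OF_EFFORT", "LIKELY_TO_REPURCHASE",
    "LIKELY_TO_RECOMMEND", "AGENT_EMPATHY", "AGENT_MEETS_NEEDS",
    "AGENT_KNOWLEDGE", "AGENT_SATISFACTION",
]

rename_map = {column: readable_name(column) for column in SOURCE_COLUMNS}

for old, new in rename_map.items():
    print(old, "->", new)
```

With the Spark DataFrame, such a map could be applied in a loop (`for old, new in rename_map.items(): survey = survey.withColumnRenamed(old, new)`) or in a single call with `survey.toDF(*[rename_map[c] for c in survey.columns])`.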

## Display the survey dataset

In [3]:
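import pandas as pd
# Show full comment text; None means "no limit" (pandas < 1.0 needs -1 instead, which newer pandas rejects)
pd.set_option('display.max_colwidth', None)
# Convert only the first five rows to pandas rather than the whole DataFrame
survey.limit(5).toPandas()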

| | Survey_Date | Survey_ID | Department | Overall_Satisfaction_Comment | Overall_Satisfaction | Level_of_Effort | Likely_to_Repurchase | Likely_to_Recommend | Agent_Empathy | Agent_Meets_Needs | Agent_Knowledge | Agent_Satisfaction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4/05/2016 | 1826587323 | Customer Care | Monica was very professional and I got right through to her and hopefully everything will turn out. Okay with the receipt that I requested. | 7 | 6 | 7 | 7 | 7 | 7 | 7 | 7 |
| 1 | 4/15/16 | 1832620041 | Repair Services | Very helpful. | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 2 | 4/16/16 | 1832852478 | Repair Services | But the gentleman I talked to was very very informative. He was extremely helpful and he gave me more than just one option. I mean, I wasn't just locked into Acme's in itself. He gave me a couple other options of which I might take advantage of and the fact that he went far and beyond as far as I'm concerned, you know I buy appreciate the fact that that you have a person like that working for you and that's about it thank you very much. | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| 3 | 4/05/2016 | 1826718680 | Repair Services | The agent responded very quickly and it was thorough and seemed to cover everything I needed to have covered all this is my first contact and we'll see how it should be resolved in the bank. | 7 | 7 | 7 | 7 | 6 | 7 | 7 | 7 |
| 4 | 4/16/16 | 1832852772 | Repair Services | The agent that took care of me was very professional and she was quick and got everything taken care of in a very professional quick manner and I appreciate that. | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |


## Show the schema of the data including data types

In [4]:
survey.printSchema()

root
 |-- Survey_Date: string (nullable = true)
 |-- Survey_ID: string (nullable = true)
 |-- Department: string (nullable = true)
 |-- Overall_Satisfaction_Comment: string (nullable = true)
 |-- Overall_Satisfaction: integer (nullable = true)
 |-- Level_of_Effort: integer (nullable = true)
 |-- Likely_to_Repurchase: integer (nullable = true)
 |-- Likely_to_Recommend: integer (nullable = true)
 |-- Agent_Empathy: integer (nullable = true)
 |-- Agent_Meets_Needs: integer (nullable = true)
 |-- Agent_Knowledge: integer (nullable = true)
 |-- Agent_Satisfaction: integer (nullable = true)



### Dataset Overview - number of rows and columns

In [5]:
print('There are {} observations in the survey dataset.'.format(survey.count()))
print('There are {} variables in the dataset.'.format(len(survey.columns)))



There are 90643 observations in the survey dataset.
There are 12 variables in the dataset.


## Define a user-defined function (UDF) to determine a Satisfaction Rating
### Positive if Overall Satisfaction > 6
### Negative if Overall Satisfaction <= 6

In [6]:
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
# Positive if Overall_Satisfaction > 6, otherwise Negative; a distinct name avoids shadowing pyspark's udf()
rating_udf = udf(lambda overall_satisfaction: 'Positive' if overall_satisfaction > 6 else 'Negative', StringType())

## Create a new column for the Satisfaction Rating
### using the user-defined function

In [7]:
survey = survey.withColumn('Satisfaction_Rating', rating_udf(survey['Overall_Satisfaction']))
survey.select(survey['Overall_Satisfaction'], survey['Satisfaction_Rating']).limit(10).toPandas()

| | Overall_Satisfaction | Satisfaction_Rating |
|---|---|---|
| 0 | 7 | Positive |
| 1 | 7 | Positive |
| 2 | 7 | Positive |
| 3 | 6 | Negative |
| 4 | 7 | Positive |
| 5 | 7 | Positive |
| 6 | 1 | Negative |
| 7 | 7 | Positive |
| 8 | 4 | Negative |
| 9 | 7 | Positive |
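The UDF is a single threshold rule. Because the schema marks `Overall_Satisfaction` as nullable, a plain-Python sketch of the same rule that also passes a missing score through may be useful; under Python 3 the original lambda would raise a `TypeError` when comparing `None > 6`. The names `satisfaction_rating`, `satisfaction_label`, and the `cutoff` parameter are illustrative, not part of the notebook.

```python
def satisfaction_rating(overall_satisfaction, cutoff=6):
    """'Positive' if the score is above the cutoff, else 'Negative'.

    A missing score (None) is returned as None instead of being
    compared with an integer.
    """
    if overall_satisfaction is None:
        return None
    return "Positive" if overall_satisfaction > cutoff else "Negative"


def satisfaction_label(overall_satisfaction, cutoff=6):
    """Numeric form of the same rule (1.0 or 0.0), as used for the ML label."""
    if overall_satisfaction is None:
        return None
    return 1.0 if overall_satisfaction > cutoff else 0.0


# Overall_Satisfaction values from the ten rows shown above
scores = [7, 7, 7, 6, 7, 7, 1, 7, 4, 7]
ratings = [satisfaction_rating(s) for s in scores]
labels = [satisfaction_label(s) for s in scores]
print(ratings)
print(labels)
```

Spark's built-in column functions can express the same rule without a Python UDF, for example `when(col('Overall_Satisfaction') > 6, 'Positive').otherwise('Negative')` from `pyspark.sql.functions`, which keeps the computation out of the Python worker processes.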


## Exploratory Data Analysis

The **Brunel** Visualization Language is a concise language for defining interactive data visualizations from tabular data. It is suited to both data scientists and more advanced business users. The Brunel system interprets the language and renders the visualization with an existing lower-level visualization technology of the user's choice, such as RAVE or D3, which are typically used by application engineers.

In [8]:
import brunel
survey_pandas = survey.toPandas()
%brunel data('survey_pandas') stack bar x(Department) y(#count) color(Satisfaction_Rating) label(#count) tooltip(#all) :: width=2200, height=800

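(Interactive Brunel chart: stacked bar chart of the number of surveys per Department, colored by Satisfaction_Rating and labeled with counts.)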

### Interactive query with Spark SQL

In [9]:
# Spark SQL also allows you to query the DataFrame with standard SQL
survey.createOrReplaceTempView("survey")
sql = """
SELECT s.Survey_Date, s.Survey_ID, s.Department, s.Overall_Satisfaction_Comment, s.Overall_Satisfaction
FROM survey s
WHERE s.Overall_Satisfaction > 6
"""
spark.sql(sql).limit(5).toPandas()

| | Survey_Date | Survey_ID | Department | Overall_Satisfaction_Comment | Overall_Satisfaction |
|---|---|---|---|---|---|
| 0 | 8/18/16 | 1910396274 | Customer Care | The representative when do that I spoke with was informative gave me all the information I needed. The only improvement that I could recommend not on her behalf is the LENGTH of time it takes to get us request a receipt sent to me. I was told it's taking sent to take 10 days and it was it seems a bit lengthy. So if there's any way that could be improved that's the only thing I could recommend him thank you. | 7 |
| 1 | 8/18/16 | 1910457115 | Customer Care | Another customer service rep was really quick and answer my questions and I would you know recommended speaking the same customer service represent having a problem again the end up being no problem everything straightened out and she was quick about answered my questions. | 7 |
| 2 | 8/18/16 | 1910528400 | Customer Care | Was very courteous very helpful. | 7 |
| 3 | 8/18/16 | 1910623225 | Repair Services | I'm very satisfied. I had a lot of help from the rep even though I couldn't find all of my people work she went above and beyond to get it done and just hope resolve our issue. So I'm very appreciative. Thank you very much. | 7 |
| 4 | 8/18/16 | 1910626225 | Sales | Jason was on the phone with me for about 15 minutes trying to find out some products so I do appreciate his help. | 7 |
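Because the query is plain SQL, the same pattern can be tried on any SQL engine. A minimal sketch using Python's built-in `sqlite3` module as a stand-in for the `survey` temp view, loaded only with the ten `Overall_Satisfaction` values shown earlier (the table layout is an assumption for illustration). The `CASE` expression recomputes the Satisfaction Rating in SQL, and the grouped count is a SQL analogue of the Brunel chart without the Department breakdown; `CASE` and `GROUP BY` are standard constructs that Spark SQL also supports.

```python
import sqlite3

# In-memory database standing in for the Spark temp view "survey"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE survey (Overall_Satisfaction INTEGER)")
# Overall_Satisfaction values from the ten sample rows shown earlier
conn.executemany(
    "INSERT INTO survey VALUES (?)",
    [(s,) for s in [7, 7, 7, 6, 7, 7, 1, 7, 4, 7]],
)

rating_sql = """
SELECT CASE WHEN s.Overall_Satisfaction > 6 THEN 'Positive' ELSE 'Negative' END
         AS Satisfaction_Rating,
       COUNT(*) AS n
FROM survey s
GROUP BY Satisfaction_Rating
ORDER BY Satisfaction_Rating
"""
counts = conn.execute(rating_sql).fetchall()

positive_sql = "SELECT COUNT(*) FROM survey s WHERE s.Overall_Satisfaction > 6"
(positive_count,) = conn.execute(positive_sql).fetchone()

print(counts)          # [('Negative', 3), ('Positive', 7)]
print(positive_count)  # 7
conn.close()
```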


## Create a label for machine learning using another user-defined function
### label = 0 if Negative
### label = 1 if Positive

In [10]:
from pyspark.sql.types import FloatType
from pyspark.sql.functions import udf
# label = 1.0 if Overall_Satisfaction > 6, else 0.0; a distinct name avoids shadowing pyspark's udf()
label_udf = udf(lambda overall_satisfaction: 1.0 if overall_satisfaction > 6 else 0.0, FloatType())

In [11]:
survey = survey.withColumn('label', label_udf(survey['Overall_Satisfaction']))
survey.select(survey['Overall_Satisfaction'], survey['Satisfaction_Rating'], survey['label']).limit(10).toPandas()

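| | Overall_Satisfaction | Satisfaction_Rating | label |
|---|---|---|---|
| 0 | 7 | Positive | 1 |
| 1 | 7 | Positive | 1 |
| 2 | 7 | Positive | 1 |
| 3 | 7 | Positive | 1 |
| 4 | 7 | Positive | 1 |
| 5 | 7 | Positive | 1 |
| 6 | 7 | Positive | 1 |
| 7 | 7 | Positive | 1 |
| 8 | 7 | Positive | 1 |
| 9 | 7 | Positive | 1 |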


## Tokenize the Overall Satisfaction Comment

In [12]:
from pyspark.ml.feature import Tokenizer, RegexTokenizer
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

tokenizer = Tokenizer(inputCol="Overall_Satisfaction_Comment", outputCol="words")

countTokens = udf(lambda words: len(words), IntegerType())

tokenized = tokenizer.transform(survey)
(tokenized.select("Overall_Satisfaction_Comment", "words")
    .withColumn("#tokens", countTokens(col("words"))).toPandas().head())

,Overall_Satisfaction_Comment,words,#tokens
0,Monica was very professional and I got right through to her and hopefully everything will turn out. Okay with the receipt that I requested.,"[monica, was, very, professional, and, i, got, right, through, to, her, and, hopefully, everything, will, turn, out., okay, with, the, receipt, that, i, requested.]",24
1,Very helpful.,"[very, helpful.]",2
2,"But the gentleman I talked to was very very informative. He was extremely helpful and he gave me more than just one option. I mean, I wasn't just locked into Acme's in itself. He gave me a couple other options of which I might take advantage of and the fact that he went far and beyond as far as I'm concerned, you know I buy appreciate the fact that that you have a person like that working for you and that's about it thank you very much.","[but, the, gentleman, i, talked, to, was, very, very, informative., he, was, extremely, helpful, and, he, gave, me, more, than, just, one, option., i, mean,, i, wasn't, just, locked, into, acme's, in, itself., he, gave, me, a, couple, other, options, of, which, i, might, take, advantage, of, and, the, fact, that, he, went, far, and, beyond, as, far, as, i'm, concerned,, you, know, i, buy, appreciate, the, fact, that, that, you, have, a, person, like, that, working, for, you, and, that's, about, it, thank, you, very, much.]",87
3,The agent responded very quickly and it was thorough and seemed to cover everything I needed to have covered all this is my first contact and we'll see how it should be resolved in the bank.,"[the, agent, responded, very, quickly, and, it, was, thorough, and, seemed, to, cover, everything, i, needed, to, have, covered, all, this, is, my, first, contact, and, we'll, see, how, it, should, be, resolved, in, the, bank.]",36
4,The agent that took care of me was very professional and she was quick and got everything taken care of in a very professional quick manner and I appreciate that.,"[the, agent, that, took, care, of, me, was, very, professional, and, she, was, quick, and, got, everything, taken, care, of, in, a, very, professional, quick, manner, and, i, appreciate, that.]",30
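Spark's `Tokenizer` lower-cases the text and splits on whitespace, so punctuation stays attached to words (`out.`, `requested.` in row 0), and `out.` and `out` end up as different features. A plain-Python sketch comparing that behaviour with splitting on runs of non-word characters, which is roughly what the `RegexTokenizer` imported in this cell would do with `pattern="\\W"`; the helper names are hypothetical and the sketch only approximates Spark's tokenizers.

```python
import re

comment = ("Monica was very professional and I got right through to her and "
           "hopefully everything will turn out. Okay with the receipt that I requested.")


def whitespace_tokens(text):
    """Roughly what Tokenizer does: lower-case, then split on whitespace."""
    return text.lower().split()


def word_tokens(text):
    """Roughly RegexTokenizer(pattern="\\W"): split on runs of non-word characters."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


ws = whitespace_tokens(comment)
rx = word_tokens(comment)
print(len(ws), ws[-1])  # 24 requested.
print(len(rx), rx[-1])  # 24 requested
print(word_tokens("I wasn't just locked into Acme's"))
```

The regex version also splits contractions (`wasn't` becomes `wasn` and `t`); fragments such as `wasn`, `t`, and `s` appear in the default English stop-word list, so `StopWordsRemover` would drop them.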


## Remove common words

In [13]:
from pyspark.ml.feature import StopWordsRemover

remover = StopWordsRemover(inputCol="words", outputCol="filtered").setCaseSensitive(False)
removed = remover.transform(tokenized)
removed.select("Overall_Satisfaction_Comment", "words", "filtered" ).toPandas().head()

,Overall_Satisfaction_Comment,words,filtered
0,"Yes my. Survey for Ashley, I would like to leave a comment that she was very helpful as to what I called about your agents are very helpful at least the last two that I spoke with the first one was Jordan and actually was able to let me know who I spoke with before her first name because I wasn't sure I didn't push the pound key after I left the a survey for Jordan, but she was awesome and she went above and beyond to help me and I just wanted to let you know that and fact they were both very nice very very helpful. Thank you.","[yes, my., survey, for, ashley,, i, would, like, to, leave, a, comment, that, she, was, very, helpful, as, to, what, i, called, about, your, agents, are, very, helpful, at, least, the, last, two, that, i, spoke, with, the, first, one, was, jordan, and, actually, was, able, to, let, me, know, who, i, spoke, with, before, her, first, name, because, i, wasn't, sure, i, didn't, push, the, pound, key, after, i, left, the, a, survey, for, jordan,, but, she, was, awesome, and, she, went, above, and, beyond, to, help, me, and, i, just, wanted, to, let, you, know, that, and, fact, ...]","[yes, my., survey, ashley,, would, like, leave, comment, helpful, called, agents, helpful, least, last, two, spoke, first, one, jordan, actually, able, let, know, spoke, first, name, wasn't, sure, didn't, push, pound, key, left, survey, jordan,, awesome, went, beyond, help, wanted, let, know, fact, nice, helpful., thank, you.]"
1,"No one has exhibited good behaviour or customer service and our situation. No one has been following through saying they would call us back and furthermore, when we have purchased an item that was never delivered we should be able to get our money back much quicker than waiting for a check to clear they should be following through calling the bank and seeing if the funds were there and giving us our money and we never received our product and we have had nothing but hassles and hard times trying to communicate with our local store ever since this started on the fourth of July.","[no, one, has, exhibited, good, behaviour, or, customer, service, and, our, situation., no, one, has, been, following, through, saying, they, would, call, us, back, and, furthermore,, when, we, have, purchased, an, item, that, was, never, delivered, we, should, be, able, to, get, our, money, back, much, quicker, than, waiting, for, a, check, to, clear, they, should, be, following, through, calling, the, bank, and, seeing, if, the, funds, were, there, and, giving, us, our, money, and, we, never, received, our, product, and, we, have, had, nothing, but, hassles, and, hard, times, trying, to, communicate, with, our, local, store, ever, since, this, ...]","[one, exhibited, good, behaviour, customer, service, situation., one, following, saying, would, call, us, back, furthermore,, purchased, item, never, delivered, able, get, money, back, much, quicker, waiting, check, clear, following, calling, bank, seeing, funds, giving, us, money, never, received, product, nothing, hassles, hard, times, trying, communicate, local, store, ever, since, started, fourth, july.]"
2,"Answered all questions completely and re, iterated information several times.","[answered, all, questions, completely, and, re,, iterated, information, several, times.]","[answered, questions, completely, re,, iterated, information, several, times.]"
3,"Satisfied with Acme's, I'm not satisfied with my Whirlpool for door refrigerator and ice maker ice maker has continually miss functions a malfunction with stopped making ice the been goes 80 and I get it repaired. I have no ice other than that refrigerator is fine. Thank you.","[satisfied, with, acme's,, i'm, not, satisfied, with, my, whirlpool, for, door, refrigerator, and, ice, maker, ice, maker, has, continually, miss, functions, a, malfunction, with, stopped, making, ice, the, been, goes, 80, and, i, get, it, repaired., i, have, no, ice, other, than, that, refrigerator, is, fine., thank, you.]","[satisfied, acme's,, i'm, satisfied, whirlpool, door, refrigerator, ice, maker, ice, maker, continually, miss, functions, malfunction, stopped, making, ice, goes, 80, get, repaired., ice, refrigerator, fine., thank, you.]"
4,"Your representative did well my difficulty is with the lack of tools that you provide to her in order to be able to locate my information she was able to find my address based on my phone number, but couldn't find any information about the mower which I have had and to Acme's for service before. I hope y'all have a wonderful day.","[your, representative, did, well, my, difficulty, is, with, the, lack, of, tools, that, you, provide, to, her, in, order, to, be, able, to, locate, my, information, she, was, able, to, find, my, address, based, on, my, phone, number,, but, couldn't, find, any, information, about, the, mower, which, i, have, had, and, to, acme's, for, service, before., i, hope, y'all, have, a, wonderful, day.]","[representative, well, difficulty, lack, tools, provide, order, able, locate, information, able, find, address, based, phone, number,, couldn't, find, information, mower, acme's, service, before., hope, y'all, wonderful, day.]"
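The default English list also removes negations such as `not`, `no`, and `nor`, which can invert the meaning of a satisfaction comment: in row 3 above, `i'm not satisfied` becomes `i'm satisfied`. A plain-Python sketch of case-insensitive stop-word removal that keeps negations; `STOP_WORDS_SUBSET` holds only a few entries from the default list, chosen for this example, and `remove_stop_words` is a hypothetical helper.

```python
# A few entries from Spark's default English stop-word list (not the full list)
STOP_WORDS_SUBSET = {"i", "my", "with", "and", "the", "no", "nor", "not", "very"}
NEGATIONS = {"no", "nor", "not"}


def remove_stop_words(tokens, stop_words):
    """Case-insensitive removal, like StopWordsRemover with caseSensitive=False."""
    stop = {word.lower() for word in stop_words}
    return [token for token in tokens if token.lower() not in stop]


# Start of row 3's token list in the table above
tokens = ["satisfied", "with", "acme's,", "i'm", "not", "satisfied", "with", "my", "whirlpool"]

default_filtered = remove_stop_words(tokens, STOP_WORDS_SUBSET)
negation_aware = remove_stop_words(tokens, STOP_WORDS_SUBSET - NEGATIONS)
print(default_filtered)  # the negation is lost
print(negation_aware)    # 'not' is kept
```

In Spark, a custom list could be built from `StopWordsRemover.loadDefaultStopWords("english")` and passed through the `stopWords` parameter, e.g. `StopWordsRemover(inputCol="words", outputCol="filtered", stopWords=[w for w in StopWordsRemover.loadDefaultStopWords("english") if w not in ("no", "nor", "not")])`. Whether this improves the model in this notebook has not been measured.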


## Show the list of stop words removed

In [14]:
from __future__ import print_function
# Print the default English stop words as one comma-separated line
print(", ".join(remover.getStopWords()))

i, me, my, myself, we, our, ours, ourselves, you, your, yours, yourself, yourselves, he, him, his, himself, she, her, hers, herself, it, its, itself, they, them, their, theirs, themselves, what, which, who, whom, this, that, these, those, am, is, are, was, were, be, been, being, have, has, had, having, do, does, did, doing, a, an, the, and, but, if, or, because, as, until, while, of, at, by, for, with, about, against, between, into, through, during, before, after, above, below, to, from, up, down, in, out, on, off, over, under, again, further, then, once, here, there, when, where, why, how, all, any, both, each, few, more, most, other, some, such, no, nor, not, only, own, same, so, than, too, very, s, t, can, will, just, don, should, now, d, ll, m, o, re, ve, y, ain, aren, couldn, didn, doesn, hadn, hasn, haven, isn, ma, mightn, mustn, needn, shan, shouldn, wasn, weren, won, wouldn

## Hash the words into term-frequency vectors and down-weight words that occur frequently across all comments (TF-IDF)

In [15]:
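from pyspark.ml.feature import HashingTF, IDF

# Map each filtered word to one of numFeatures buckets and count occurrences (term frequency).
# With only 100 buckets, many distinct words share a bucket (hash collisions).
hashingTF = HashingTF(inputCol="filtered", outputCol="rawFeatures", numFeatures=100)
featurizedData = hashingTF.transform(removed)

# Learn one IDF weight per bucket from the corpus, then scale each term frequency by it
idf = IDF(inputCol="rawFeatures", outputCol="features")
idfModel = idf.fit(featurizedData)
rescaledData = idfModel.transform(featurizedData)

rescaledData.select("Satisfaction_Rating", "rawFeatures", "features").toPandas().head()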

,Satisfaction_Rating,rawFeatures,features
0,Positive,"(0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 1.0, 4.0, 0.0, 1.0, 3.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 3.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 2.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0, 1.0, 3.0, 2.0, 2.0, 0.0, 0.0, 5.0, 0.0, 3.0, 2.0, 0.0, 1.0, 1.0)","(0.0, 0.0, 0.0, 0.0, 2.05176857893, 0.0, 0.0, 0.0, 0.0, 0.0, 1.87070564646, 2.53063896361, 1.63569417794, 3.60435481084, 0.0, 0.951906019996, 4.05441724409, 0.0, 1.72248421407, 3.89045277669, 1.13136791235, 0.0, 0.731561546642, 0.0, 1.27878708472, 1.22373792295, 0.0, 1.4183548558, 1.92505763746, 0.0, 3.38938804444, 1.99522889383, 1.97284618573, 1.89371317679, 0.0, 2.06008116422, 0.0, 3.86203254174, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.1775606525, 0.0, 1.5295242046, 2.93352011617, 1.3448882735, 0.0, 1.7660352084, 0.0, 0.0, 1.07741409272, 0.0, 0.0, 0.0, 1.37807616022, 0.0, 2.29716055744, 1.46752557173, 3.69766059473, 0.0, 0.0, 0.0, 3.70640989097, 1.90188249545, 0.0, 2.92312106586, 0.0, 0.0, 0.0, 1.97094398971, 0.0, 1.60420033967, 0.0, 0.0, 5.81194294247, 0.0, 3.07407646571, 0.0, 2.49818894532, 0.0, 3.10676910344, 0.0, 0.0, 1.42580989731, 2.83810149066, 4.24346758183, 2.98771766897, 2.7830302998, 0.0, 0.0, 7.95802204945, 0.0, 4.44203480335, 3.3365414468, 0.0, 1.38184726751, 1.91385156374)"
1,Positive,"(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)","(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.70996333608, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.22373792295, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.94569237147, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.355121305, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.51170759786, 0.0, 0.0, 0.0, 1.46156053293, 0.0, 1.47150610597, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.70929262479, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)"
2,Positive,"(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)","(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.42553449108, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.951906019996, 0.0, 0.0, 0.0, 0.0, 0.0, 1.46747770996, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.1775606525, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.87033512713, 0.0, 0.0, 0.0, 1.94006915873, 0.0, 0.0, 1.14858027872, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.26793363502, 0.0, 0.0, 0.0, 1.86076494899, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.70929262479, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.42580989731, 0.0, 0.0, 0.0, 0.0, 0.0, 1.87060039282, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)"
3,Positive,"(0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)","(0.0, 0.0, 0.0, 0.0, 1.02588428947, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.731561546642, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.68562038554, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.96344977249, 0.0, 0.0, 0.0, 0.0, 0.0, 1.42580989731, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)"
4,Positive,"(0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)","(0.0, 0.0, 2.31350010064, 1.63569417794, 0.0, 1.80732502409, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.94522638834, 0.0, 0.0, 0.0, 1.82372883528, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.78663164787, 0.0, 0.0, 1.28734418058, 0.0, 0.0, 0.0, 1.52322944805, 0.0, 0.0, 1.1775606525, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.94006915873, 0.0, 0.0, 0.0, 0.0, 0.0, 1.69645330113, 1.10131882965, 1.51170759786, 0.0, 0.0, 0.0, 0.0, 0.0, 1.47150610597, 3.72152989798, 0.0, 0.0, 1.60420033967, 0.0, 0.0, 0.0, 0.0, 1.53703823285, 3.92689954497, 0.0, 2.21531449361, 0.0, 0.0, 0.0, 1.42580989731, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.59160440989, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)"
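Each `features` entry is the `rawFeatures` count multiplied by that bucket's IDF weight (in bucket 4, for example, a count of 2.0 becomes 2.05176857893 and a count of 1.0 becomes 1.02588428947). A plain-Python sketch of both steps on a toy corpus of three token lists. Spark 2.x's `HashingTF` hashes with MurmurHash3; this sketch uses `zlib.crc32` only as a deterministic stand-in, so its bucket indices will not match Spark's. The IDF weight follows the formula given in Spark's documentation, `log((m + 1) / (df(t) + 1))`, where `m` is the number of documents and `df(t)` is the number of documents with a non-zero count in bucket `t`; a bucket present in every document gets weight 0.

```python
import math
import zlib

NUM_FEATURES = 100  # same bucket count as the HashingTF cell


def bucket(token, num_features=NUM_FEATURES):
    # Stand-in hash (Spark uses MurmurHash3); the modulo maps it to a bucket index
    return zlib.crc32(token.encode("utf-8")) % num_features


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Count tokens per bucket; distinct tokens may collide in one bucket."""
    vector = [0.0] * num_features
    for token in tokens:
        vector[bucket(token, num_features)] += 1.0
    return vector


def fit_idf(tf_vectors):
    """IDF weight per bucket: log((m + 1) / (df + 1))."""
    m = len(tf_vectors)
    num_features = len(tf_vectors[0])
    doc_freq = [sum(1 for v in tf_vectors if v[j] > 0) for j in range(num_features)]
    return [math.log((m + 1.0) / (df + 1.0)) for df in doc_freq]


def apply_idf(tf_vector, weights):
    """Scale each term frequency by its bucket's IDF weight."""
    return [tf * w for tf, w in zip(tf_vector, weights)]


# Toy corpus (hypothetical token lists)
docs = [
    ["agent", "helpful"],
    ["agent", "helpful", "helpful"],
    ["agent", "delivery", "late", "late"],
]
tf_vectors = [hashing_tf(d) for d in docs]
idf_weights = fit_idf(tf_vectors)
tfidf_vectors = [apply_idf(v, idf_weights) for v in tf_vectors]

agent = bucket("agent")
# "agent" occurs in every document, so its bucket's weight is log(4 / 4) = 0
print(tf_vectors[0][agent], idf_weights[agent], tfidf_vectors[0][agent])
```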


## Use the Logistic Regression algorithm to predict whether a comment is positive or negative

In [16]:
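from pyspark.ml.classification import LogisticRegression
# featuresCol defaults to "features" (the IDF output); regParam sets the regularization strength;
# threshold is the probability cut-off above which label 1 (Positive) is predicted
lr = LogisticRegression(labelCol="label", maxIter=10, regParam=0.3, threshold=0.5)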
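Logistic regression turns a weighted sum of the features into a probability with the sigmoid function, and `threshold` decides which class is predicted. A plain-Python sketch of that prediction step, assuming made-up weights (Spark learns the real coefficients during `fit`; with `elasticNetParam` at its default of 0, `regParam=0.3` acts as an L2 penalty). The returned pair `[P(label 0), P(label 1)]` mirrors the layout of the `probability` column in the predictions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, weights, intercept, threshold=0.5):
    """Return ([P(0), P(1)], prediction) for one feature vector."""
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    p_positive = sigmoid(margin)
    prediction = 1.0 if p_positive > threshold else 0.0
    return [1.0 - p_positive, p_positive], prediction


# Hypothetical weights and features, for illustration only
weights = [0.4, -1.2, 0.1]
intercept = -0.3
probability, prediction = predict([1.0, 0.0, 2.0], weights, intercept)
print(probability, prediction)  # margin is 0.3, so P(1) is about 0.574 and the prediction is 1.0

# The same input with a stricter threshold is predicted Negative
print(predict([1.0, 0.0, 2.0], weights, intercept, threshold=0.6)[1])  # 0.0
```

Raising `threshold` trades positive recall for catching more negatives; for instance, the sampled test prediction with `Overall_Satisfaction` 2 has P(1) of about 0.559, so it would flip to 0 at a threshold of 0.6.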

## Define machine learning pipeline

In [17]:
from pyspark.ml import Pipeline
# Chain the text-processing stages and the classifier into a single estimator
stages = [tokenizer, remover, hashingTF, idf, lr]
pipeline = Pipeline(stages=stages)
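`Pipeline.fit` runs the stages in order: transformers such as `Tokenizer` only transform the data, while estimators such as `IDF` and `LogisticRegression` are first fit on the data that reaches them and replaced by their fitted models. The resulting `PipelineModel` only transforms, so the test set is later scored with IDF weights and coefficients learned from the training split alone. A toy plain-Python sketch of that control flow; every class name here is hypothetical, and the vocabulary estimator merely stands in for `IDF`.

```python
class Tokenize:
    """Stateless transformer (plays the role of Tokenizer)."""

    def transform(self, docs):
        return [doc.lower().split() for doc in docs]


class VocabularyModel:
    """Fitted model: counts only the words seen during fit."""

    def __init__(self, vocab):
        self.vocab = vocab

    def transform(self, token_lists):
        return [[tokens.count(word) for word in self.vocab] for tokens in token_lists]


class VocabularyEstimator:
    """Estimator (plays the role of IDF): learns state from the data it is fit on."""

    def fit(self, token_lists):
        return VocabularyModel(sorted({t for tokens in token_lists for t in tokens}))


class ToyPipeline:
    def __init__(self, stages):
        self.stages = stages

    def fit(self, data):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(data)  # estimators see the training data only
            data = stage.transform(data)
            fitted.append(stage)
        return ToyPipelineModel(fitted)


class ToyPipelineModel:
    def __init__(self, stages):
        self.stages = stages

    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data


model = ToyPipeline([Tokenize(), VocabularyEstimator()]).fit(["Very helpful", "Not helpful"])
print(model.stages[1].vocab)               # ['helpful', 'not', 'very']
print(model.transform(["Helpful agent"]))  # [[1, 0, 0]]; 'agent' was never seen in training
```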

In [18]:
print("Tokenizer:")
print(tokenizer.explainParams())
print("*************************")
print("Remover:")
print(remover.explainParams())
print("*************************")
print("HashingTF:")
print(hashingTF.explainParams())
print("*************************")
print("IDF:")
print(idf.explainParams())
print("*************************")
print("LogisticRegression:")
print(lr.explainParams())
print("*************************")
print("Pipeline:")
print(pipeline.explainParams())

Tokenizer:
inputCol: input column name. (current: Overall_Satisfaction_Comment)
outputCol: output column name. (default: Tokenizer_44b4b3b72559e17157ae__output, current: words)
*************************
Remover:
caseSensitive: whether to do a case sensitive comparison over the stop words (default: False, current: False)
inputCol: input column name. (current: words)
outputCol: output column name. (default: StopWordsRemover_4656ab27e10481792109__output, current: filtered)
stopWords: The words to be filtered out (default: [u'i', u'me', u'my', u'myself', u'we', u'our', u'ours', u'ourselves', u'you', u'your', u'yours', u'yourself', u'yourselves', u'he', u'him', u'his', u'himself', u'she', u'her', u'hers', u'herself', u'it', u'its', u'itself', u'they', u'them', u'their', u'theirs', u'themselves', u'what', u'which', u'who', u'whom', u'this', u'that', u'these', u'those', u'am', u'is', u'are', u'was', u'were', u'be', u'been', u'being', u'have', u'has', u'had', u'having', u'do', u'does', u'did',

## Split the survey dataset into training and test datasets

In [19]:
train, test = survey.randomSplit([90.0, 10.0], seed=1)
print('The number of records in the training data set is {}.'.format(train.count()))
print('The number of rows labeled Positive in the training data set is {}.'.format(train.filter(train['label'] == 1).count()))
print('The number of rows labeled Negative in the training data set is {}.'.format(train.filter(train['label'] == 0).count()))
print('The number of records in the test data set is {}.'.format(test.count()))
print('The number of rows labeled Positive in the test data set is {}.'.format(test.filter(test['label'] == 1).count()))
print('The number of rows labeled Negative in the test data set is {}.'.format(test.filter(test['label'] == 0).count()))

The number of records in the training data set is 81520.
The number of rows labeled Positive in the training data set is 58042.
The number of rows labeled Negative in the training data set is 23478.
The number of records in the test data set is 9123.
The number of rows labeled Positive in the test data set is 6515.
The number of rows labeled Negative in the test data set is 2608.
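The printed counts also show how the random split behaved. A quick check of that arithmetic using only the numbers above: the training set holds about 90% of the 90,643 rows, and roughly 71% of the rows in each split are labeled Positive.

```python
# Counts printed by the cell above (and the total from the dataset overview)
total = 90643
train_total, train_pos, train_neg = 81520, 58042, 23478
test_total, test_pos, test_neg = 9123, 6515, 2608

# Sanity checks: the parts add up to the whole
assert train_total + test_total == total
assert train_pos + train_neg == train_total
assert test_pos + test_neg == test_total

train_share = train_total / float(total)         # share of rows in the training set
train_pos_rate = train_pos / float(train_total)  # share labeled Positive (train)
test_pos_rate = test_pos / float(test_total)     # share labeled Positive (test)
print(round(train_share, 3), round(train_pos_rate, 3), round(test_pos_rate, 3))
```

Because the classes are imbalanced, always predicting Positive would already be right about 71% of the time on the test set, a useful baseline for the results that follow. If an exactly stratified split were needed, `DataFrame.sampleBy` could sample a fraction per label value.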


## Train the model using the training dataset

In [20]:
model = pipeline.fit(train)

## Make predictions using the test dataset

In [21]:
predictions = model.transform(test)

In [22]:
predictions.select("Overall_Satisfaction", "label", "prediction", "probability").sample(False,0.0007,15).toPandas().head()

| | Overall_Satisfaction | label | prediction | probability |
|---|---|---|---|---|
| 0 | 7 | 1 | 1 | [0.319421818704, 0.680578181296] |
| 1 | 5 | 0 | 0 | [0.792822521581, 0.207177478419] |
| 2 | 7 | 1 | 1 | [0.190910072764, 0.809089927236] |
| 3 | 7 | 1 | 1 | [0.165103265742, 0.834896734258] |
| 4 | 2 | 0 | 1 | [0.441119121659, 0.558880878341] |


## Show true positive examples

In [23]:
truePositives = (predictions.select("Overall_Satisfaction_Comment", "Overall_Satisfaction", "label", "prediction", "probability")
    .filter(predictions['label']==1)
    .filter(predictions['prediction']==1))
print('Number of true positives = {} out of {} surveys.'.format(truePositives.count(), test.count()))
truePositives.toPandas().head()

Number of true positives = 6343 out of 9123 surveys.


| | Overall_Satisfaction_Comment | Overall_Satisfaction | label | prediction | probability |
|---|---|---|---|---|---|
| 0 | Yes, the agent was very friendly and helpful and was able to answer all my questions without difficulty. | 7 | 1 | 1 | [0.227998549999, 0.772001450001] |
| 1 | I have no improvement. She said, yes, the customer service agent was professional and courteous and patient very good experience. Thank you. | 7 | 1 | 1 | [0.181663389527, 0.818336610473] |
| 2 | Hi, my name is Janet rather perky and not have an order number and I get bits to you just to be able to locate the customer service rep that I spoke with my order numbers 326500369 the lady that I spoke with said, she was located in Charlotte, North Carolina area. Her name was her name was Maurice and that's it. But you are the day. I just wanted to let you know that this young lady was above and beyond the call of duty as far as helping me with my issue. So my order that I had I just spoke with her maybe 45 minutes ago and she and she was just super good and it gives. If you could I would appreciate if she could be acknowledged for doing such a good job and I will always used Acme's to shop at online when I shop online because of her I mean, I was so impressed she had great customer service skills and she was just a pleasure to speak with please stick to it that should get some kind of recognition for her service that she gave me again her name was and may you are the a and she was doing she are late somewhere close to Charlotte and North Carolina I greatly. | 7 | 1 | 1 | [0.267712102306, 0.732287897694] |
| 3 | The agent I dealt with on line over the phone was amazing. He was patient he walked me through everything and you need everybody to file it is customer service plan. | 7 | 1 | 1 | [0.294587786908, 0.705412213092] |
| 4 | I thought we had bought a patio table that some responses said was dangerous because it implode is I was did not yet but I wanted to make a Acme's aware and we had very good conversation over the telephone with the person at Acme's and we are hoping our cable does not explode with thank you very much. This is my cable said Tallahassee Florida by. | 7 | 1 | 1 | [0.21266966485, 0.78733033515] |


## Show false positive examples

In [24]:
falsePositives = (predictions.select("Overall_Satisfaction_Comment", "Overall_Satisfaction", "label", "prediction", "probability")
    .filter(predictions['label']==0)
    .filter(predictions['prediction']==1))
print('Number of false positives = {} out of {} surveys.'.format(falsePositives.count(), test.count()))
falsePositives.toPandas().head()

Number of false positives = 2207 out of 9123 surveys.


| | Overall_Satisfaction_Comment | Overall_Satisfaction | label | prediction | probability |
|---|---|---|---|---|---|
| 0 | Your agent was very thorough. | 6 | 0 | 1 | [0.236934535732, 0.763065464268] |
| 1 | The Acme's protection plus with or fine. If these service company tech that seems to be dropping the ball and not getting anything dollars and giving me conflicting stories on on getting this issue resolved as far as parts and the service on the washer machine. Thank you. | 1 | 0 | 1 | [0.339031462052, 0.660968537948] |
| 2 | The main the main problem. I have is having to negotiate some phone tree everything else. I was really satisfied. Thank you. | 6 | 0 | 1 | [0.246837696195, 0.753162303805] |
| 3 | Yes, unclear to spend on my name several difficult for your let the parties know, what was going on with the merchandise like I said I had to call you back I have to call you should be what was going on there could have been family emergency some type of phone. She was going on not knowing that a washer and dryer is coming between five and seven that's dug their. | 1 | 0 | 1 | [0.459426620914, 0.540573379086] |
| 4 | Your agent that helped me and when I called the Acme's direct line was great did a lot of beginning to figuring out where my order was why it wasn't being delivered and reassured me that they were making sure the store tomorrow so really appreciate all his efforts on my negative feedback is solely on this third party delivery. Company that has just three times in a row just been awful to work with and I am at the point where I'm just not going to order from Acme's anymore. So I wish it was different but if you can't deliver the product. I mean a reflection on you guys. But you know your your agents have been very helpful on and I do appreciate all of her hard work. Thank you. | 1 | 0 | 1 | [0.242141091889, 0.757858908111] |


## Show true negative examples

In [25]:
trueNegatives = (predictions.select("Overall_Satisfaction_Comment", "Overall_Satisfaction", "label", "prediction", "probability")
    .filter(predictions['label']==0)
    .filter(predictions['prediction']==0))
print('Number of true negatives = {} out of {} surveys.'.format(trueNegatives.count(), test.count()))
trueNegatives.toPandas().head()

Number of true negatives = 401 out of 9123 surveys.


Unnamed: 0,Overall_Satisfaction_Comment,Overall_Satisfaction,label,prediction,probability
0,The woman who handled the phone call was very professional had all of her numbers material information altogether one place. She executed the call and the order very quickly. Unfortunately again. It was matter that you know that they need to be calling the call center rather than actually find the product at your brick and mortar store was a little discouraging and makes me wonder why I even said to put in your store in the first place. She was ordered online from the beginning or called your call center actually because online was charging me for out for shipping when there wasn't supposed to be so like I say in the end good results with me a good and resolved but the past to get it here was more complicated and it's been.,6,0,0,"[0.549440127311, 0.450559872689]"
1,If you promise someone delivery on a certain day at a certain time and you don't show up and you don't give a call why you're not showing up and you to leave the order from the routing. Don't even give us a call and an hour after the two hour window of six to 8 PM 9 PM we give a call and find the item wasn't even put on a truck and we got no call. I think that's very poor service.,1,0,0,"[0.547269028928, 0.452730971072]"
2,"That Acme's store in certified has the worst customer service service and this is my third major purchase from Acme's and and all of them and have been just a disaster the system recent one, which was just a storm door and I screen patio door. It's the installers. I they gave me first of all I never did receive a phone call the first week from the installer and love the Acme's department store said, someone would call me I purchased it on Monday. They said someone call me on the Thursday, the person didn't call I had to call Acme's back on Saturday to get an installer to call me to schedule the installation that I've already paid for I inside paid for the the the items on the 19th of September and it's today is the third of October and I still haven't received the installation. The installer said, he was going to come. He said he had things going on he did not come when he said he forgot to pick up and that the screen door. He never called me back to reschedule I I was okay with with scheduling a week out from the the week of my purchase and he still didn't show up nor did he call so I had to call the store back last week and and and he was supposed to call me this weekend to schedule and she never did call for ended up calling Acme's store today and she got in contact today. The representative got in contact with the installer and now I am scheduled for this Thursday the sixth of October so I've been waiting for almost three weeks to get a storm door installed and I am five minutes away from the store very very disappointing. I mean, I I experienced the same with the other two items a water softener and she had that I purchased but it was.",2,0,0,"[0.952085658297, 0.0479143417028]"
3,"My opinion is that if you can see that someone is having a hard time with your appliance and that their maintenance cost has exceeded the amount of the item original cost. I think that you guys should have a better way to just refund your customers either a with the new Appliance. They sent me know, they do which model that same appliance for just refund that customer entirely this process is very arduous. It was a real big pain in the ***** and I would really like to be done with it and get over with the rest of my life. The fact is that I need to take off from work every time a service ticket for service technician has to come out and this has been a real big problem for me to get this fixed. I don't have all the way to or you know multiple times a week or a month to just take off to get my dishwasher fixed some people have the luxury, but I do not if so I would just like this process to be fixed and to be over with so I can get on with the rest of your life.",2,0,0,"[0.596916424912, 0.403083575088]"
4,"Acme's in a situation like this with refrigerators freezers that sort of thing should provide some kind of emergency care. And other words get get the thing fixed as soon as I mean, I mean quick this has been almost were working on Roland of the second week without refrigeration they could provide possibly a online lending program to have a refrigerator they could drop off here. So we could use it temporarily that would be that would be a good thing the number to thing. I think is that they should they should respond to refrigeration problems air conditioning problems that sort of things right away. The drier that I could wait wash machine that can wait, but refrigeration air conditioning something that affects the peoples lives you need to jump on that and get taken care of and and not a lot of stuff or like we've been doing.",1,0,0,"[0.529696837587, 0.470303162413]"


## Show false negative examples

In [26]:
falseNegatives = (predictions.select("Overall_Satisfaction_Comment", "Overall_Satisfaction", "label", "prediction", "probability")
    .filter(predictions['label']==1)
    .filter(predictions['prediction']==0))
print('Number of false negatives = {} out of {} surveys.'.format(falseNegatives.count(), test.count()))
falseNegatives.toPandas().head()

Number of false negatives = 172 out of 9123 surveys.


Unnamed: 0,Overall_Satisfaction_Comment,Overall_Satisfaction,label,prediction,probability
0,"My name is Mark Stein as I mentioned in my earlier comment the service at your warranty service Center was stellar. However, at the store level the service was something just slightly better than abysmal. It took three Associates before I was even offered the number for the warranty service Center and and initially I was told by your employee that her name was Becky that I would have to contact the manufacturer about my issue even though I purchased a warranty extended warranty and she said she had no idea called what they would say. That is just not it's still a customer experience particular customer with except and up so on the part of your staff for a warranty.",7,1,0,"[0.533488490962, 0.466511509038]"
1,"Today's comment that I wanted to leave is actually in regards to the whole experience that I had purchased the extended warranty plan for my washer my dryer and my refrigerator are actually walked into the the Acme's Delaware store number 0622 on on Saturday trying to purchase it and the person I talked to at the counter actually told me I had to go to customer service went to customer service and the gentleman who actually exact customer service had to get page because he wasn't of this post when he got there. He was extremely rude and although I was told that I could get the extended warranty plan. He was telling me I could not get it because of the fact that I purchased these the claim to got a different location and he told me I had to have the credit card purchased it with a long with the receipt and I had to purchase it that specific store. So course I was today. However, I did call that store back just to get a call permission and I actually was talking to a different gentleman who works in the appliances department. His name is Tom gave him the whole situation. He was extremely helpful told me I could come into the store and purchased the plan. So I did so then he told me that I had to call the customer service number for a claim started except reach order to register the extended warranty for all of these devices. I was extremely satisfied with Tom helped because of the fact that I was able to get what I wanted what he was saying matched up to exactly what the sales associate sold me. The appliances said and it all worked out very well and totally helpful, especially since I got that very bad experience couple days prior now when I was actually on the phone to register everything we but he was the woman I spoke to was extremely patient was calm and very helpful with everything.",7,1,0,"[0.766724744733, 0.233275255267]"
2,"Well, he had to put me on hold on the telephone to call a different Acme's number to get more information and he kept coming back on the line every 30 seconds to let me know that he still needed to keep me you know on hold and told he could get a hold of somebody that she needed and I thought I was very courteous. I didn't like with everything and he forgot about me. So thank you very much bye.",7,1,0,"[0.53279695015, 0.46720304985]"
3,"The things I like best or your when we were putting in the new kitchen your cat program and how knowledgeable the gentleman was who helped us with that and the things I like the least I'm because gotten for every and I've done. They patrol and remote controls and I feel that if the troubles for 20 years and sometimes my because go to a series middle to military ID card is recognized and sometimes it is and of put in a lot of service and I shouldn't have any problems at all getting my military discount fact home Depot does not recognize my because going on so you read ID at all. So I don't shop there anymore. I shop always at Acme's because most of the time they do recognize my military ID, but you have a paper that shows I D's from the various service branches and cause got auxiliary is not on there and it should be we do have a few member is too long to the auxiliary who didn't do anything, but most of us have specific job. Personally has taken the exact same training and qualified for a crew coxen, which is supposed, but kept and and a number of other things to I for sure she qualify for any of those kinds of things.",7,1,0,"[0.591700947139, 0.408299052861]"
4,"Yes, my name is Phillip and Condor. I am very satisfied with the Acme's customer service that I have experience, but the know across store. They always professional kind and very very well aware a product knowledge. I am dissatisfied with your menu like trying to I have also stated they need to put in the notes that they never ever come to my residence again, they're very rude unprofessional. I did not want to have any type of service dealings with them, but all to repair the appliances that I was I should purchase from Acme's. I do not what Norman electronics should be able to do any of my repairs ever again. There when compassionate the Acme's customer service agents District manager's in store staff. Not the most exceptional people and Billy was told to give them any type of Acme's products that is why I travel from one side a child to meet the other side of town to urge in the store in Norcross Georgia when I looked at each point Georgia. I just want to let it be known that that store. I had gave me nothing but professional as them they have resolved my issues with their professional customer service, but I do not want normally electronics to ever come to my home for any repairs for anything that I purchased from Acme's.",7,1,0,"[0.854294932404, 0.145705067596]"
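The false-positive, true-negative and false-negative counts above fix the whole confusion matrix for the 9,123 test surveys. The plain-Python sketch below infers the true-positive count by subtraction and then computes the usual summary metrics. The subtraction assumes that `label` and `prediction` only take the values 0 and 1, so every test row lands in exactly one cell. The true-positive count is therefore derived, not read from an earlier cell.

```python
# Counts printed by the cells above (test set of 9,123 surveys)
total = 9123
false_positives = 2207
true_negatives = 401
false_negatives = 172

# Assumption: every test row falls in exactly one confusion-matrix cell,
# so the true positives are whatever is left over.
true_positives = total - false_positives - true_negatives - false_negatives

accuracy = (true_positives + true_negatives) / float(total)
precision = true_positives / float(true_positives + false_positives)
recall = true_positives / float(true_positives + false_negatives)        # true positive rate
specificity = true_negatives / float(true_negatives + false_positives)  # true negative rate

print('True positives = {}'.format(true_positives))
print('Accuracy = {:.3f}, precision = {:.3f}, recall = {:.3f}, specificity = {:.3f}'.format(
    accuracy, precision, recall, specificity))
```

Under that assumption the model finds about 97% of the positive comments but correctly flags only about 15% of the negative ones. That pattern is consistent with the large number of false positives listed above.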


## Evaluate the model performance by calculating the area under the ROC curve

In [27]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator
evaluator = BinaryClassificationEvaluator().setLabelCol("label").setMetricName("areaUnderROC")
print('Area under the ROC curve = {}.'.format(evaluator.evaluate(predictions)))

Area under the ROC curve = 0.759485042775.
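An area under the ROC curve of about 0.76 can be read as a ranking probability. It is the chance that a randomly chosen positive survey is scored higher than a randomly chosen negative one, with ties counting one half. The plain-Python sketch below computes that pairwise statistic on a few made-up scores (hypothetical data, not from the survey). `pairwise_auc` is a hypothetical helper. `BinaryClassificationEvaluator` instead integrates the ROC curve built from the second element of the `rawPrediction` column. For the same scores the two quantities coincide, apart from any approximation Spark may apply when it bins scores.

```python
def pairwise_auc(scores, labels):
    """Hypothetical helper: fraction of (positive, negative) pairs ranked correctly, ties = 1/2."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError('need at least one positive and one negative example')
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical scores (e.g. positive-class probabilities) and true labels
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1, 1, 0, 1, 0]
print('AUC = {:.3f}'.format(pairwise_auc(scores, labels)))  # 5 of 6 pairs ordered correctly
```

Suppose the classifier is a logistic regression. Its positive-class probability is then a monotonic function of `rawPrediction`, so ranking by either one gives the same AUC.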


## Make a prediction on a fabricated positive comment

In [28]:
newPositiveComment = """
I had a great experience in the store.
The sales person was very helpful.
I was in and out quickly.
"""

newPositiveCommentFeatures = (spark.createDataFrame([(0, newPositiveComment)],
    ['id', 'Overall_Satisfaction_Comment']))
newPositiveCommentFeatures.toPandas().head()

Unnamed: 0,id,Overall_Satisfaction_Comment
0,0,\nI had a great experience in the store.\nThe sales person was very helpful.\nI was in and out quickly.\n
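The comment is written as a triple-quoted string, so it starts and ends with a newline. Spark's `Tokenizer` lowercases the text and splits it on single whitespace characters. The leading newline therefore becomes an empty token, and punctuation stays attached to words (`store.`, `helpful.`), as the `words` column in the next output shows. The sketch below approximates that behaviour in plain Python. It also mimics Java's `String.split`, which keeps a leading empty string but drops trailing ones. `spark_like_tokenize` is a hypothetical helper, not a Spark API.

```python
import re


def spark_like_tokenize(text):
    """Hypothetical helper approximating Tokenizer: lowercase, split on each whitespace char."""
    tokens = re.split(r'\s', text.lower())
    # Java's String.split drops trailing empty strings but keeps a leading one
    while tokens and tokens[-1] == '':
        tokens.pop()
    return tokens


comment = """
I had a great experience in the store.
The sales person was very helpful.
I was in and out quickly.
"""
print(spark_like_tokenize(comment))
# Stripping the text first avoids the empty leading token
print(spark_like_tokenize(comment.strip()))
```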


In [29]:
model.transform(newPositiveCommentFeatures).toPandas().head()

Unnamed: 0,id,Overall_Satisfaction_Comment,words,filtered,rawFeatures,features,rawPrediction,probability,prediction
0,0,\nI had a great experience in the store.\nThe sales person was very helpful.\nI was in and out quickly.\n,"[, i, had, a, great, experience, in, the, store., the, sales, person, was, very, helpful., i, was, in, and, out, quickly.]","[, great, experience, store., sales, person, helpful., quickly.]","(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)","(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.72533364637, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.89297016959, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.97508495222, 0.0, 0.0, 0.0, 1.83529610296, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.46828119722, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.97076224247, 0.0, 0.0, 0.0, 0.0, 1.45326875754, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.42451509601, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)","[-1.02582840742, 1.02582840742]","[0.263893653317, 0.736106346683]",1
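The `rawFeatures` and `features` columns are 100-element vectors. The pipeline definition is not shown in this section, but the column names suggest it uses `HashingTF` followed by `IDF`. Assuming so, each filtered token is hashed into one of 100 buckets and counted. Each bucket's count is then multiplied by an inverse-document-frequency weight learned from the training data. The plain-Python sketch below illustrates that idea on a toy corpus. It uses `zlib.crc32` as a stand-in hash, so its bucket indices will not match Spark's, which uses its own MurmurHash3-based hashing. It applies the IDF formula given in the Spark MLlib documentation, log((m + 1) / (df + 1)), where m is the number of documents. `hashing_tf`, `fit_idf` and `tf_idf` are hypothetical helpers.

```python
import math
import zlib

NUM_FEATURES = 100  # the rawFeatures/features vectors above have 100 entries


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Hypothetical helper: count tokens per hash bucket (crc32 stands in for Spark's hash)."""
    counts = [0.0] * num_features
    for token in tokens:
        bucket = (zlib.crc32(token.encode('utf-8')) & 0xffffffff) % num_features
        counts[bucket] += 1.0
    return counts


def fit_idf(tf_vectors):
    """Hypothetical helper: per-bucket weight log((m + 1) / (df + 1)), m = number of documents."""
    m = len(tf_vectors)
    weights = []
    for j in range(len(tf_vectors[0])):
        df = sum(1 for vec in tf_vectors if vec[j] > 0)  # documents that hit bucket j
        weights.append(math.log((m + 1.0) / (df + 1.0)))
    return weights


def tf_idf(tf_vector, weights):
    """Hypothetical helper: scale each bucket count by its IDF weight."""
    return [tf * w for tf, w in zip(tf_vector, weights)]


# Hypothetical, already stop-word-filtered token lists
corpus = [
    ['great', 'experience', 'store.'],
    ['rude', 'experience', 'never', 'again.'],
    ['helpful.', 'quickly.'],
]
tf_vectors = [hashing_tf(doc) for doc in corpus]
idf_weights = fit_idf(tf_vectors)

# Show only the non-zero buckets of the first document's TF-IDF vector
first = tf_idf(tf_vectors[0], idf_weights)
print({j: round(v, 4) for j, v in enumerate(first) if v != 0.0})
```

The `features` vectors in this section fit a per-bucket weight. The value 1.72533364637, for example, appears in both the positive and the negative comment, at a position where each raw count is 1. With only 100 buckets, different words can also share a bucket and therefore a weight. A larger `numFeatures` might help the model tell such words apart.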


## Make a prediction on a fabricated negative comment

In [30]:
newNegativeComment = """
I'm never shopping there again.
I had the worse experience of my life.
My order was messed up. They didn't get one item right.
I had to wait in line for almost an hour and to even speak to someone about resolving the problem.
The person I spoke to was very rude and not helpful at all.
I'm never going back again.
"""

newNegativeCommentFeatures = (spark.createDataFrame([(0, newNegativeComment)],
    ['id', 'Overall_Satisfaction_Comment']))
newNegativeCommentFeatures.toPandas().head()

Unnamed: 0,id,Overall_Satisfaction_Comment
0,0,\nI'm never shopping there again.\nI had the worse experience of my life.\nMy order was messed up. They didn't get one item right.\nI had to wait in line for almost an hour and to even speak to someone about resolving the problem.\nThe person I spoke to was very rude and not helpful at all.\nI'm never going back again.\n


In [31]:
model.transform(newNegativeCommentFeatures).toPandas().head()

Unnamed: 0,id,Overall_Satisfaction_Comment,words,filtered,rawFeatures,features,rawPrediction,probability,prediction
0,0,\nI'm never shopping there again.\nI had the worse experience of my life.\nMy order was messed up. They didn't get one item right.\nI had to wait in line for almost an hour and to even speak to someone about resolving the problem.\nThe person I spoke to was very rude and not helpful at all.\nI'm never going back again.\n,"[, i'm, never, shopping, there, again., i, had, the, worse, experience, of, my, life., my, order, was, messed, up., they, didn't, get, one, item, right., i, had, to, wait, in, line, for, almost, an, hour, and, to, even, speak, to, someone, about, resolving, the, problem., the, person, i, spoke, to, was, very, rude, and, not, helpful, at, all., i'm, never, going, back, again.]","[, i'm, never, shopping, again., worse, experience, life., order, messed, up., didn't, get, one, item, right., wait, line, almost, hour, even, speak, someone, resolving, problem., person, spoke, rude, helpful, all., i'm, never, going, back, again.]","(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 2.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0)","(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.04299944794, 1.42788536934, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.70535651443, 0.0, 1.72533364637, 0.0, 0.0, 0.0, 0.0, 0.0, 1.27735101227, 0.0, 3.49862052381, 1.41760590372, 0.0, 0.0, 1.1282924167, 0.0, 1.97393628852, 0.0, 0.0, 4.11100694186, 0.0, 2.57469069311, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.17433417197, 1.97508495222, 0.0, 1.46482530947, 1.34643503587, 1.83529610296, 1.76459292776, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.37785374008, 0.0, 1.14862742725, 0.0, 1.23361502414, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 1.97076224247, 0.0, 0.0, 0.0, 0.0, 1.45326875754, 1.70880434442, 1.5376888178, 1.96243433986, 0.0, 2.21346197258, 0.0, 0.0, 0.0, 0.0, 0.0, 2.82249412644, 0.0, 0.0, 0.0, 0.0, 1.59163869963, 0.0, 0.0, 0.0, 1.96830048635, 0.0, 0.0)","[0.290410096677, -0.290410096677]","[0.572096528422, 0.427903471578]",0
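The `probability` column follows directly from `rawPrediction`. Assume the classifier is Spark's binary `LogisticRegression`; the symmetric `rawPrediction` pairs fit that, though the model definition is not shown here. Then `rawPrediction` is `[-margin, margin]`, and the positive-class probability is the logistic sigmoid of the margin. The sketch below reproduces the two probabilities printed above from their margins. `sigmoid` is a hypothetical helper, and the 0.5 cut-off assumes the default threshold.

```python
import math


def sigmoid(margin):
    """Hypothetical helper: logistic function mapping a margin to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


# Margins taken from the rawPrediction columns above (second element = positive class)
positive_margin = 1.02582840742
negative_margin = -0.290410096677

for name, margin in [('fabricated positive', positive_margin),
                     ('fabricated negative', negative_margin)]:
    p = sigmoid(margin)
    # probability = [P(label 0), P(label 1)]; assumes the default 0.5 threshold
    print('{}: probability = [{:.6f}, {:.6f}], prediction = {}'.format(
        name, 1.0 - p, p, 1 if p > 0.5 else 0))
```

The fabricated negative comment is classified correctly. However, with a positive-class probability of about 0.43, it sits fairly close to the decision boundary.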


# Use Watson Tone Analyzer to analyze Overall Satisfaction Comments

## Set up configuration for the Tone Analyzer service

In [32]:
#!pip install --upgrade watson-developer-cloud
import watson_developer_cloud
from watson_developer_cloud import ToneAnalyzerV3

In [33]:
# The code was removed by DSX for sharing.

In [34]:
# Watson Tone Analyzer
#TONE_ANALYZER_USERNAME = 'your user name'
#TONE_ANALYZER_PASSWORD = 'your password'
tone_analyzer = ToneAnalyzerV3(version='2016-05-19',
                               username=TONE_ANALYZER_USERNAME,
                               password=TONE_ANALYZER_PASSWORD)

## Randomly pick an Overall Satisfaction Comment from the survey dataset to analyze

In [35]:
from random import randint
# randint includes both endpoints, so the upper bound is the last valid row index
randomNumber = randint(0, survey.count() - 1)
#print('Survey number to be used if nothing entered: {}'.format(randomNumber))
#surveyNumber = int(raw_input("Enter survey number (integer): ") or randomNumber)
surveyNumber = randomNumber

In [36]:
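# toJSON() yields strings like '{"Overall_Satisfaction_Comment":"..."}'; [32:-1] drops the
# 32-character key prefix and the closing brace, leaving the JSON-quoted comment text
surveyTone = survey.select(survey["Overall_Satisfaction_Comment"]).toJSON().collect()[surveyNumber][32:-1]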
print(surveyTone)

"It was he was very helpful. I called for instrument issue with my order, but I have to the meeting I can thank you for working with me to help and he was talking Riley was sensitive. So it was wonderful. Thank you very much."

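The `[32:-1]` slice works because `toJSON()` returns each row as a string of the form `{"Overall_Satisfaction_Comment":"..."}`. The first 32 characters are the opening brace and the key, and the last character is the closing brace. What remains is still JSON-encoded, which is why the printed comment keeps its surrounding quotation marks and why a leading `"` shows up in the sentence-level tone results. A sturdier option is to parse the JSON instead of slicing it. The plain-Python sketch below shows both approaches on a made-up row string. Within Spark, reading the field from a collected `Row` (for example `survey.select('Overall_Satisfaction_Comment').collect()[surveyNumber][0]`) would avoid JSON altogether.

```python
import json

# Hypothetical row, in the shape produced by survey.select(...).toJSON().collect()
row_json = '{"Overall_Satisfaction_Comment":"He said \\"thanks\\" and fixed it.\\nGreat job."}'

sliced = row_json[32:-1]  # what the notebook cell does
parsed = json.loads(row_json)['Overall_Satisfaction_Comment']  # decodes quotes and escapes

print(sliced)  # still JSON-encoded: wrapped in quotes, with \" and \n escapes
print(parsed)  # plain text with a real quote and newline
```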

## Show Tone Analyzer results

In [37]:
import json
tone = tone_analyzer.tone(surveyTone, tones='emotion', content_type='text/plain')

print(surveyTone)
print('')

# Print every emotion tone score (Anger, Disgust, Fear, Joy, Sadness) as a percentage
for t in tone['document_tone']['tone_categories'][0]['tones']:
    print('{}: {:.2f}%'.format(t['tone_name'], t['score'] * 100))

"It was he was very helpful. I called for instrument issue with my order, but I have to the meeting I can thank you for working with me to help and he was talking Riley was sensitive. So it was wonderful. Thank you very much."

Anger: 3.55%
Disgust: 0.16%
Fear: 0.33%
Joy: 77.54%
Sadness: 6.71%
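The Tone Analyzer response is a nested dictionary, which the next cell prints in full. A small sketch for pulling out the strongest emotion follows. It runs on a trimmed copy of the scores from that printout, stored under the hypothetical name `sample_response` so it does not overwrite the notebook's `tone`. `dominant_tone` is a hypothetical helper, not part of the `watson_developer_cloud` SDK.

```python
# Trimmed copy of the emotion scores from the Tone Analyzer response printed by the next cell
sample_response = {
    'document_tone': {
        'tone_categories': [{
            'category_id': 'emotion_tone',
            'tones': [
                {'tone_id': 'anger', 'tone_name': 'Anger', 'score': 0.035546},
                {'tone_id': 'disgust', 'tone_name': 'Disgust', 'score': 0.001564},
                {'tone_id': 'fear', 'tone_name': 'Fear', 'score': 0.003344},
                {'tone_id': 'joy', 'tone_name': 'Joy', 'score': 0.775446},
                {'tone_id': 'sadness', 'tone_name': 'Sadness', 'score': 0.067128},
            ],
        }],
    },
}


def dominant_tone(response, category_id='emotion_tone'):
    """Hypothetical helper: (tone_name, score) of the highest-scoring tone in one category."""
    for category in response['document_tone']['tone_categories']:
        if category['category_id'] == category_id:
            best = max(category['tones'], key=lambda t: t['score'])
            return best['tone_name'], best['score']
    return None  # category not present in the response


name, score = dominant_tone(sample_response)
print('Dominant emotion: {} ({:.2f}%)'.format(name, score * 100))
```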


In [38]:
print(json.dumps(tone, indent=2))

{
  "document_tone": {
    "tone_categories": [
      {
        "category_id": "emotion_tone", 
        "tones": [
          {
            "tone_name": "Anger", 
            "score": 0.035546, 
            "tone_id": "anger"
          }, 
          {
            "tone_name": "Disgust", 
            "score": 0.001564, 
            "tone_id": "disgust"
          }, 
          {
            "tone_name": "Fear", 
            "score": 0.003344, 
            "tone_id": "fear"
          }, 
          {
            "tone_name": "Joy", 
            "score": 0.775446, 
            "tone_id": "joy"
          }, 
          {
            "tone_name": "Sadness", 
            "score": 0.067128, 
            "tone_id": "sadness"
          }
        ], 
        "category_name": "Emotion Tone"
      }
    ]
  }, 
  "sentences_tone": [
    {
      "input_to": 28, 
      "text": "\"It was he was very helpful.", 
      "sentence_id": 0, 
      "tone_categories": [
        {
          "category_id": "emotio