Sentiment Analysis
===

![](images/pos_neg_sentiment.png)

By the End of This Session You Will:
---
- Be able to describe applications of sentiment analysis
- Be able to utilize different labels for sentiment analysis
- Be able to list the elements in a sentiment analysis problem
- Be able to build a baseline model for sentiment analysis (Naive Bayes)

***
<br>
<br>

Application of Sentiment Analysis
===

There are many real-world applications for classifying whether a document expresses positive or negative sentiment.

Summary 
---

1. __Classify whether product reviews are positive or negative__
   - This allows us to assign a rating to various aspects of a product
   - E.g. __customer service__ will receive a high rating if most of 
     the comments related to customer service are positive 
   - Such as __"I DO like honest technical support people"__
   
   <img src="images/product.png" width="500px">
   
   <br>
   
2. __Twitter sentiment analysis to predict poll results or the stock market__

   - If people feel more negatively towards a subject, its poll numbers tend to be lower. Visit 
     [Gallup Polls](http://www.gallup.com/)
     
     <img src="images/twitter.png" width="500px">
     
   - Certain sentiments, such as calmness, predicted stock market prices during the financial crisis
   
     <img src="images/stock.png" height="200px">

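The first application can be sketched in a few lines. This is a minimal illustration, assuming each comment has already been tagged with an aspect and a positive or negative label (in practice a classifier would supply these). The function name `aspect_ratings` and the sample comments are hypothetical. The rating here is simply the fraction of positive comments per aspect.

```python
from collections import defaultdict

def aspect_ratings(labeled_comments):
    """Return the fraction of positive comments for each aspect.

    labeled_comments: iterable of (aspect, sentiment) pairs, where
    sentiment is 'pos' or 'neg' (an assumed labeling scheme).
    """
    counts = defaultdict(lambda: [0, 0])  # aspect -> [positive, total]
    for aspect, sentiment in labeled_comments:
        counts[aspect][1] += 1
        if sentiment == 'pos':
            counts[aspect][0] += 1
    return {aspect: pos / total for aspect, (pos, total) in counts.items()}

# Hypothetical, hand-labeled comments
comments = [
    ('customer service', 'pos'),
    ('customer service', 'pos'),
    ('customer service', 'neg'),
    ('battery', 'neg'),
]
ratings = aspect_ratings(comments)
print(ratings)
```

With these sample comments, customer service gets a higher rating than battery because most of its comments are positive.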

     
  


***
<br>
<br>

Knowledge Check Questions
---

1) List 2 other applications of sentiment analysis. State the business value in each case.

In [None]:
# How do people feel about a movie? Is it good or bad?
# How do people feel about a brand? Do you like United Airlines?

<details><summary>
Click here for solution 1.
</summary>
`
1. Sentiment of movie reviews to predict box office ahead of time

2. Sentiment towards political candidates on social media to predict election results
`
</details>

***
<br>
<br>

Different Sentiment Labels
===

Here we show a variety of sentiment labels beyond just positive and negative.

The examples given below are by no means exhaustive.

Summary
---

1. ___Emotion_ is a brief, organically synchronized response, usually to the evaluation of a major event__
  - Angry
  - Sad
  - Joyful
  - Fearful
  - Ashamed
  
  <br>

2. ___Mood_ is a longer-term, low-intensity subjective feeling__
   - Cheerful
   - Gloomy
   - Irritable
   - Depressed
   - Buoyant

   <br>
 
3. ___Interpersonal stances_ are attitudes towards another person in a specific interaction__
   - Friendly
   - Flirtatious
   - Distant
   - Warm
   - Supportive
   - Contemptuous
   
   <br>
   
4. ___Attitudes_ are enduring beliefs towards certain objects or persons__
   - Liking
   - Loving
   - Valuing
   - Desiring
   
   <br>
   
5. ___Personality traits_ are stable personality dispositions and behavior tendencies__
   - Nervous
   - Anxious
   - Reckless
   - Morose
   - Hostile

***
<br>
<br>

Knowledge Check Questions
---

1) What category of labels should I use if I am developing an app for detecting short-term, real-time sentiments?

In [None]:
#Emotions

<details><summary>
Click here for solution 1.
</summary>
`
Emotions
`
</details>

2) What category of labels should Google consider if Google Chat is to add a feature indicating whether the person talking to you likes you or not?

In [None]:
#Interpersonal Stances

<details><summary>
Click here for solution 2.
</summary>
`
Interpersonal stances
`
</details>

3) What category of labels should I use for detecting the general atmosphere of the stock market?

In [None]:
#Mood 

<details><summary>
Click here for solution 3.
</summary>
`
Mood
`
</details>

***
<br>
<br>

Elements of Sentiment Analysis
===

Below are the __3 possible elements__ of a sentiment analysis problem.

One can tackle one or more elements in a sentiment analysis problem.


Summary
---

1. __Holder of sentiment__

   - For example: __John (Holder)__ is angry at Joe.
   
   <br>

2. __Target of sentiment__

   - For example: John is angry at __Joe (Target)__.

   <br>
   
3. __Type of sentiment__

   - As listed above: __Emotion__, __Mood__, __Interpersonal stances__ ...

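As a rough illustration, the three elements can be stored together in one record. The class name `SentimentAnnotation` and its field names are hypothetical, chosen only for this sketch; the extra `label` field holds the specific label within the chosen type.

```python
from dataclasses import dataclass

@dataclass
class SentimentAnnotation:
    holder: str          # who feels the sentiment
    target: str          # who or what the sentiment is directed at
    sentiment_type: str  # e.g. 'emotion', 'mood', 'interpersonal stance'
    label: str           # the specific label within that type

# "John is angry at Joe."
annotation = SentimentAnnotation(
    holder='John', target='Joe', sentiment_type='emotion', label='angry')
print(annotation)
```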
***
<br>
<br>

Knowledge Check Questions
---

1) List the 3 elements in a sentiment analysis problem.

In [None]:
#Holder of sentiment
#Target of sentiment
#Type of sentiment

<details><summary>
Click here for solution 1.
</summary>
`
1. Holder of sentiment
2. Target of sentiment
3. Type of sentiment
`
</details>

***
<br>
<br>

Baseline Model for Sentiment Analysis
===

Here we will walk through the basic steps of getting a baseline model for sentiment analysis.

Summary
---

1. __Tokenization__
   
   - Depending on the source, different tokenizers might be used to replace special characters (covered previously)
   - Accounting for __negation__ is especially important for sentiment analysis
   - __For example:__
     
     `I didn't eat the pie => I didn't NOT_eat NOT_the NOT_pie`
   
   <br>
   
2. __Boolean featurization__

   - In __text classification__, we have covered the __Bag of Words__ representation
   - In __sentiment analysis__, we care more about whether a word occurred than about how many times it occurred
   - __Boolean featurization__ indicates if a particular word occurs in the document
   - __For example:__
   
     <br> 
     
     - __2 documents:__
     
     ```
     1. this is ridiculous, abosolutly ridiculous. Ridiculous
     2. I love this. Simply love this.
     ```
     
     ![](images/boolean_feature.png)

   <br>

3. __Naive Bayes__

   - The baseline machine learning model for sentiment analysis is usually Naive Bayes
   - Discriminative models (MaxEnt and SVM) will be covered later

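The negation handling in step 1 can be sketched as follows. This is one common heuristic, not necessarily the exact tokenizer used in this course: prefix `NOT_` to every word after a negation until the next punctuation mark. The negation word list, the punctuation set, and the function name `mark_negation` are assumptions made for this example.

```python
import re

NEGATIONS = {'not', 'no', 'never'}           # assumed, non-exhaustive list
CLAUSE_END = {'.', ',', ';', ':', '!', '?'}  # assumed clause boundaries

def mark_negation(text):
    """Prefix NOT_ to each word between a negation and the next punctuation."""
    # Split into words (keeping apostrophes) and single punctuation marks
    tokens = re.findall(r"[\w']+|[.,;:!?]", text)
    marked, negating = [], False
    for token in tokens:
        if token in CLAUSE_END:
            negating = False          # punctuation ends the negated span
            marked.append(token)
        elif negating:
            marked.append('NOT_' + token)
        else:
            marked.append(token)
            if token.lower() in NEGATIONS or token.lower().endswith("n't"):
                negating = True       # start marking from the next word
    return marked

print(mark_negation("I didn't eat the pie, but I liked the cake."))
```

Words after the comma are left unmarked, so `liked` keeps its positive reading while `eat`, `the` and `pie` become separate negated features.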
***
<br>
<br>

Exercises
---

1) You are given the following documents. Write a function that will tokenize the documents and featurize the tokens.

   You can expect output like the following: a list of the vocabulary and a list of lists of Boolean features (one list per document). The vocabulary order may differ if it is built from a set.
   
   ```
   (['abosolutly', 'love', 'this', 'is', 'i', 'ridiculous', 'simply'],
   [[1, 0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 0, 1]])
   ```

In [50]:
documents = ['This is ridiculous, abosolutly ridiculous. Ridiculous',
             'I love this. Simply love this.']

import re

def featurize(documents):
    # Tokenize: lowercase, strip periods and commas, split on whitespace
    tokenized_docs = []
    for doc in documents:
        tokens = re.sub(r'[.,]', '', doc.lower()).split()
        tokenized_docs.append(set(tokens))

    # Sorted vocabulary gives a reproducible column order
    # (a plain set-based order may differ from run to run)
    vocab = sorted(set().union(*tokenized_docs))

    # One Boolean vector per document: 1 if the word occurs, else 0
    features = [[1 if word in doc_tokens else 0 for word in vocab]
                for doc_tokens in tokenized_docs]
    return vocab, features

In [51]:
featurize(documents)

(['abosolutly', 'i', 'is', 'love', 'ridiculous', 'simply', 'this'],
 [[1, 0, 1, 0, 1, 0, 1], [0, 1, 0, 1, 0, 1, 1]])


<details><summary>
Click here for solution 1.
</summary>
```
import re

def featurize(documents):
    vocab = set()
    tokenized_docs = []
    for doc in documents:
        # Replace periods and commas with spaces, then lowercase and trim
        clean_doc = re.sub(r'[.,]', ' ', doc).lower().strip()
        # Split on runs of whitespace
        doc_token_lst = re.split(r'\s+', clean_doc)
        tokenized_docs.append(set(doc_token_lst))
        vocab.update(doc_token_lst)
    
    # A set has no fixed order, so the column order may vary between runs
    vocab_lst = list(vocab)
    results = []
    for tokenized_doc_set in tokenized_docs:
        # 1 if the vocabulary word occurs in the document, else 0
        boolean_vector = [1 if word in tokenized_doc_set else 0 for word in vocab_lst]
        results.append(boolean_vector)
    
    return vocab_lst, results
```
</details>

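To connect Boolean features with step 3 of the baseline, here is a minimal sketch of one common variant: multinomial Naive Bayes in which each word counts at most once per document, with add-one smoothing. The function names `train_nb` and `predict_nb` and the toy training data are hypothetical; a real baseline would be trained on a labeled corpus.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train Naive Bayes on documents given as sets of tokens."""
    vocab = set().union(*docs)
    log_prior, log_likelihood = {}, {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        log_prior[c] = math.log(len(class_docs) / len(docs))
        # Each word counts at most once per document (Boolean feature)
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        # Add-one (Laplace) smoothing over the vocabulary
        log_likelihood[c] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return vocab, log_prior, log_likelihood

def predict_nb(model, doc):
    """Return the class with the highest log posterior score."""
    vocab, log_prior, log_likelihood = model
    scores = {
        # Words never seen in training are ignored
        c: log_prior[c] + sum(log_likelihood[c][w] for w in doc if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical toy training data: each document is a set of tokens
train_docs = [
    {'i', 'love', 'this'},
    {'simply', 'great'},
    {'this', 'is', 'ridiculous'},
    {'absolutely', 'awful'},
]
train_labels = ['pos', 'pos', 'neg', 'neg']

model = train_nb(train_docs, train_labels)
print(predict_nb(model, {'love', 'this', 'movie'}))
print(predict_nb(model, {'ridiculous', 'plot'}))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.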
***
<br>
<br>

Summary
===

__Applications of sentiment analysis__:
- Classify the sentiment of product reviews to gather opinions about a product
- Classify the sentiment of tweets to make predictions about the outcome of polls


__Sentiment labels__:
- __Emotions__, e.g. Angry, Happy
- __Mood__, e.g. Cheerful, Gloomy
- __Interpersonal stance__, e.g. Friendly, Hostile
- __Attitude__, e.g. Loving, Liking
- __Personality__, e.g. Nervous, Anxious

__Elements of sentiment analysis__:
- Holder of sentiment
- Target of sentiment
- Type of sentiment

__Baseline model for sentiment analysis__:
- Tokenization accounting for negation
- Boolean featurization
- Naive Bayes


<br>
<br> 
<br>

----