# (5C) Sentiment analysis

Sentiment analysis is widely used to measure how positive or negative a sentence is. It has many corporate uses (companies want to know how their products are being reviewed, for instance) as well as political uses (candidates want to know how they're being talked about on Twitter, say). But we can also imagine literary uses: how are characters being described? Does sentiment change over the course of a novel? Here are some tools for doing sentiment analysis.

#### Install 
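For this notebook, you'll need to install vaderSentiment, along with textblob, which is used below both for its own sentiment scores and to split text into sentences:

    pip install vaderSentiment textblob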

In [1]:
# do some imports
import os
import nltk
import pandas as pd
pd.set_option('display.max_colwidth', 0)

## How can I get the sentiment of sentences?

### (1) textblob

In [2]:
from textblob import TextBlob

def sentiment_analysis_textblob(string):
    # first make a blob
    blob = TextBlob(string)

    # make output list
    output_list = []
    
    # for each sentence
    sent_num=0
    for sent in blob.sentences:
        sent_num+=1
        
        # make an empty results dictionary
        result_dict={}
        result_dict['_sent_num'] = sent_num
        result_dict['_sent'] = str(sent)
        
        result_dict['polarity'] = sent.sentiment.polarity
        result_dict['subjectivity'] = sent.sentiment.subjectivity
        
        output_list.append(result_dict)
    
    return output_list

In [3]:
juliet="""
But soft, what light through yonder window breaks?
It is the east, and Juliet is the sun.
Arise, fair sun, and kill the envious moon.
Who is already sick and pale with grief
That thou, her maid, art far more fair than she.
"""

In [4]:
pd.DataFrame(sentiment_analysis_textblob(juliet))

| | _sent | _sent_num | polarity | subjectivity |
|---|---|---|---|---|
| 0 | \nBut soft, what light through yonder window breaks? | 1 | 0.25 | 0.525 |
| 1 | It is the east, and Juliet is the sun. | 2 | 0.0 | 0.0 |
| 2 | Arise, fair sun, and kill the envious moon. | 3 | 0.7 | 0.9 |
| 3 | Who is already sick and pale with grief\nThat thou, her maid, art far more fair than she. | 4 | -0.070714 | 0.60619 |


### (2) polyglot

[Polyglot](https://polyglot.readthedocs.io/) can also do sentiment analysis, and in multiple languages.

To install:

    conda install -c conda-forge pyicu
    pip install pycld2
    pip install morfessor
    pip install polyglot
    polyglot download LANG:en   # for english
    polyglot download LANG:es   # for spanish (optional)
    polyglot download LANG:xx   # where xx is the two-letter language code
   
See [the website](https://polyglot.readthedocs.io/) for more details.

In [None]:
def sentiment_analysis_polyglot(string):
    # let's try this...
    try:
        # to use polyglot, import its "Text" object:
        from polyglot.text import Text
    except ImportError:
        print('Polyglot not installed! To do so, follow the instructions above.')
        return
    # from here on we can assume that polyglot is imported
    
    # wrap that Text object around any string
    pg_text = Text(string)

    # make an output list
    output_list = []
    
    # loop over sentences
    sent_num = 0
    for sent in pg_text.sentences:
        sent_num+=1

        # make an empty results dictionary
        result_dict={}
        result_dict['_sent_num'] = sent_num
        result_dict['_sent'] = str(sent)
        
        # make a new text (maybe this sentence is in a different language?)
        sent2 = Text(str(sent))
        
        result_dict['polarity'] = sent2.polarity

        # add to output
        output_list.append(result_dict)
        
    return output_list

In [None]:
pd.DataFrame(sentiment_analysis_polyglot("""
I have a horrible dog named Spot who is honestly the worst creature on the entire planet.
I have a dog named Spot who is honestly NOT the worst creature on the entire planet.
Tengo un perro horrible que honestamente es la peor criatura en todo el planeta.
I have an OK dog named Spot who is honestly a pretty good creature.
Tengo un perro horrible que honestamente es la peor criatura en todo el planeta.
"""))

### (3) VADER (recommended)

From the [source for vaderSentiment](https://github.com/cjhutto/vaderSentiment):

<blockquote>VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the MIT License.</blockquote>

For more information, see this [post on sentiment analysis](https://medium.com/analytics-vidhya/simplifying-social-media-sentiment-analysis-using-vader-in-python-f9e6ec6fc52f).


In [5]:
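from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

# build the analyser once: creating it re-reads the lexicon files,
# which is slow if repeated for every sentence of a novel
analyser = SentimentIntensityAnalyzer()

def get_vader_scores(string):
    # str() lets us pass either a plain string or a TextBlob sentence
    score = analyser.polarity_scores(str(string))
    return score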

In [6]:
# Test it out
get_vader_scores("I have a horrible dog named Spot who is honestly the worst creature on the entire planet.")

{'neg': 0.336, 'neu': 0.531, 'pos': 0.133, 'compound': -0.6808}

In [7]:
# Can we trick it?
get_vader_scores("I have a NOT SO horrible dog named Spot who is honestly NOT the worst creature on the entire planet.")

{'neg': 0.0, 'neu': 0.587, 'pos': 0.413, 'compound': 0.8899}
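The `compound` value is usually the one to read first: it is a normalized score between -1 (most negative) and +1 (most positive). The vaderSentiment README suggests treating scores of 0.05 or higher as positive, -0.05 or lower as negative, and anything in between as neutral. Here is a minimal sketch of that convention. The function name `label_compound` is our own and not part of the library, and the cutoff is a convention you may want to tune for literary text.

```python
def label_compound(compound, threshold=0.05):
    """Turn a VADER compound score into a coarse label.

    The 0.05 cutoff follows the convention suggested in the vaderSentiment
    README; label_compound itself is a hypothetical helper, not a library call.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# the two compound scores from the cells above
print(label_compound(-0.6808))  # negative
print(label_compound(0.8899))   # positive
```

Labels like these are handy for counting sentences per category, but they throw away the intensity information, so keep the raw `compound` column around as well.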

In [8]:
# Make a function for sentiment analysis with VADER

def sentiment_analysis_vader(string):
    blob = TextBlob(string)

    # make output list
    output_list = []
    
    # for each sentence
    sent_num=0
    for sent in blob.sentences:
        # add 1 to number of sentence
        sent_num+=1
        
        # print sent_num if it's divisible by 100
        if not sent_num%100: print(sent_num, len(blob.sentences))
        
        # make an empty results dictionary
        result_dict={}
        
        # store sent num to it
        result_dict['_sent_num'] = sent_num
        
        # store the sentence to it
        result_dict['_sent'] = str(sent)
        
        # get the score dictionary
        vader_scores = get_vader_scores(sent)
        
        # loop over the scores dictionary
        for key,value in vader_scores.items():
            # assign the score to the result dictionary
            result_dict[key]=value
            
        # add result dictionary to output_list
        output_list.append(result_dict)
    
    # return output list
    return output_list

In [9]:
pd.DataFrame(sentiment_analysis_vader(juliet))

| | _sent | _sent_num | compound | neg | neu | pos |
|---|---|---|---|---|---|---|
| 0 | \nBut soft, what light through yonder window breaks? | 1 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | It is the east, and Juliet is the sun. | 2 | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | Arise, fair sun, and kill the envious moon. | 3 | -0.6705 | 0.482 | 0.355 | 0.163 |
| 3 | Who is already sick and pale with grief\nThat thou, her maid, art far more fair than she. | 4 | -0.6003 | 0.27 | 0.623 | 0.108 |


In [None]:
# Let's run it on Harry Potter
with open('../corpora/harry_potter/texts/Sorcerers Stone.txt') as file:
    txt=file.read()

In [None]:
df_sentiment_harrypotter = pd.DataFrame(sentiment_analysis_vader(txt))
df_sentiment_harrypotter

In [None]:
df_sentiment_harrypotter.sort_values('compound',ascending=False)

In [None]:
# Show preponderance of negativity
df_sentiment_harrypotter.plot(x='_sent_num', y='neg',figsize=(24,6))

In [None]:
# What's going on in the negative region?
df_sentiment_harrypotter.query('1000 < _sent_num < 1400').sort_values('compound')
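Sentence-level scores are noisy, so a plot of every single sentence can be hard to read. One common way to see how sentiment changes over the course of a novel is to smooth the scores with a moving average. The sketch below does this in plain Python on a list of result dictionaries shaped like the ones the sentiment functions above return. The name `rolling_mean` and the window size are illustrative assumptions, not part of any library.

```python
def rolling_mean(values, window=50):
    """Average each value with the (up to) `window - 1` values before it."""
    smoothed = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        # drop the value that just fell out of the window
        if i >= window:
            running_total -= values[i - window]
        # early positions have fewer than `window` values behind them
        count = min(i + 1, window)
        smoothed.append(running_total / count)
    return smoothed


# toy result dictionaries (the scores here are made up for illustration)
toy_results = [
    {'_sent_num': 1, 'compound': 0.5},
    {'_sent_num': 2, 'compound': -0.5},
    {'_sent_num': 3, 'compound': 1.0},
    {'_sent_num': 4, 'compound': 0.0},
]
compound_scores = [result['compound'] for result in toy_results]
print(rolling_mean(compound_scores, window=2))  # [0.5, 0.0, 0.25, 0.5]
```

If the scores are already in a dataframe such as `df_sentiment_harrypotter`, pandas offers the same idea through `df_sentiment_harrypotter['compound'].rolling(window=50, min_periods=1).mean()`, and the smoothed column can be plotted just like `neg` above.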

## For sentiment analysis research team

* Research sentiment analysis in all of Tropic of Orange:
    * Run the VADER sentiment analyzer on all of Tropic of Orange (see below)
    * Merge that dataframe with the Tropic of Orange metadata
    * Save the merged dataframe to Excel
    * Open it in Tableau and generate a few graphs. Which narrators have the most positive/negative sentiment? Who has the highest emotional range?


* (Advanced) Calculate the sentiment only for sentences in which places are mentioned (expand on Tuesday's work).
    * Merge this dataframe with the Tropic of Orange metadata
    * Merge that dataframe (in Tableau or in pandas) with the lat/long data from Tuesday
    * Map the emotionality of places
    
    
* (Advanced) Calculate the sentiment only for sentences in which people are mentioned
    * Merge with Tropic of Orange metadata, save
    * Plot in Tableau
    * Who are the most positively/negatively *mentioned* people?
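Two steps come up in all three tasks: keeping only the sentences that mention something, and summarizing scores per group (per narrator, place, or person). Here is a rough sketch of both in plain Python, working on the list of result dictionaries the sentiment functions return. The helper names, the `narrator` key, and the toy sentences and scores are all made up for illustration; in practice the group labels would come from the merged metadata.

```python
import re
from collections import defaultdict


def filter_by_mention(results, names):
    """Keep only result dictionaries whose sentence mentions one of `names`."""
    # match whole words only, ignoring case, so 'Ann' does not match 'Annual'
    pattern = re.compile(
        r'\b(' + '|'.join(re.escape(name) for name in names) + r')\b',
        flags=re.IGNORECASE,
    )
    return [result for result in results if pattern.search(result['_sent'])]


def summarize_by_group(results, group_key, score_key='compound'):
    """Return {group: {'mean': ..., 'range': ...}} for one score column."""
    scores_by_group = defaultdict(list)
    for result in results:
        scores_by_group[result[group_key]].append(result[score_key])

    summary = {}
    for group, scores in scores_by_group.items():
        summary[group] = {
            'mean': sum(scores) / len(scores),
            # "emotional range": distance between highest and lowest score
            'range': max(scores) - min(scores),
        }
    return summary


# toy data: hypothetical narrators, sentences, and scores
toy_results = [
    {'_sent': 'Springfield was lovely.', 'narrator': 'narrator_a', 'compound': 0.5},
    {'_sent': 'I hated the rain.', 'narrator': 'narrator_a', 'compound': -0.5},
    {'_sent': 'Back in springfield, things went badly.', 'narrator': 'narrator_b', 'compound': -0.25},
]

mentions = filter_by_mention(toy_results, ['Springfield'])
print(len(mentions))  # 2
print(summarize_by_group(toy_results, 'narrator'))
```

Simple name matching like this will miss pronouns and nicknames, so treat the counts as a lower bound. Once the results are in a dataframe, `groupby` followed by `agg` gives the same kind of per-group summary.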

In [None]:
## @TODO: Get the sentiment for every sentence in the entire Tropic of Orange text
#

# Load the dataframe for Tropic of Orange
df_tropic = pd.read_excel('../corpora/tropic_of_orange/metadata.xls')

# make an empty list for all results in the book
all_results = []

# set a variable to the text folder
text_folder = '../corpora/tropic_of_orange/texts'


# loop over the filename column in df_tropic...      

    # print filename
    

    # get full path
    
    
    # open text
    
        
    # call one of the sentiment analysis functions to get back a list of dictionaries
    
    
    # for each result dictionary in that list
    
        # add the filename to the result dictionary
        
        # append the result dictionary to all_results
        

# make a data frame from all of the results
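The outline above follows a pattern worth having as a reusable function: loop over filenames, read each text, score it, and tag every result with the file it came from so the results can later be merged with the metadata. The sketch below is one possible shape for it, not the only way to fill in the outline. The name `collect_results` and its parameters are hypothetical, and `analyze` stands for any of the sentiment functions defined earlier in this notebook (for example `sentiment_analysis_vader`).

```python
import os


def collect_results(filenames, text_folder, analyze):
    """Run `analyze` on each text file and tag every result with its filename."""
    all_results = []
    for filename in filenames:
        # build the full path to this text
        path = os.path.join(text_folder, filename)

        # read the whole text into memory
        with open(path, encoding='utf-8') as file:
            text = file.read()

        # `analyze` returns a list of dictionaries, one per sentence
        for result in analyze(text):
            # remember which file this sentence came from
            result['filename'] = filename
            all_results.append(result)
    return all_results
```

Passing the returned list to `pd.DataFrame` gives one row per sentence. The added `filename` key is what makes a later merge with the metadata possible, assuming the metadata identifies each text by the same filename.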
