### Sentiment Analysis: Gettysburg Address Example
This is an example of sentiment analysis using the Gettysburg Address as the text.  
Although it is one of the most noteworthy presidential speeches in American history,  
the text is very short, which makes it good material for testing.

In [50]:
import warnings
from pathlib import Path
# VADER needs its lexicon; download it once with nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.tokenize import PunktSentenceTokenizer
import pandas as pd

In [51]:
warnings.filterwarnings('ignore')

In [52]:
data_source = Path('H:/My Documents/Work/pythondata/')
data_file = data_source / 'getty.txt'

# read the whole file; the with-block closes it afterwards
with open(data_file, encoding='latin-1') as f:
    raw = f.read()

In [53]:
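# prep text: lowercase it, then split it into sentences
# (lowercasing drops capitalization, which VADER uses as an emphasis cue;
#  the Gettysburg Address has no ALL-CAPS emphasis, so this likely changes little here)
raw = raw.lower()
sent_tokenizer = PunktSentenceTokenizer()
sentences = sent_tokenizer.tokenize(raw)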

In [54]:
# display the text as tokenized sentences
sentences

['four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.',
 'now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure.',
 'we are met on a great battle-field of that war.',
 'we have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live.',
 'it is altogether fitting and proper that we should do this.',
 'but, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground.',
 'the brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract.',
 'the world will little note, nor long remember what we say here, but it can never forget what they did here.',
 'it is for us the living, rather, to be dedicated here to the unfinished work w

In [55]:
# apply the sentiment analyzer and save the results as a list of dictionaries
sia = SentimentIntensityAnalyzer()
scores = [sia.polarity_scores(sent) for sent in sentences]
scores

[{'compound': 0.8126, 'neg': 0.0, 'neu': 0.756, 'pos': 0.244},
 {'compound': 0.7761, 'neg': 0.116, 'neu': 0.566, 'pos': 0.318},
 {'compound': 0.0516, 'neg': 0.26, 'neu': 0.467, 'pos': 0.273},
 {'compound': 0.0, 'neg': 0.0, 'neu': 1.0, 'pos': 0.0},
 {'compound': 0.0, 'neg': 0.0, 'neu': 1.0, 'pos': 0.0},
 {'compound': 0.0, 'neg': 0.0, 'neu': 1.0, 'pos': 0.0},
 {'compound': -0.7506, 'neg': 0.325, 'neu': 0.563, 'pos': 0.113},
 {'compound': 0.2498, 'neg': 0.0, 'neu': 0.909, 'pos': 0.091},
 {'compound': 0.4549, 'neg': 0.075, 'neu': 0.752, 'pos': 0.173},
 {'compound': 0.9566, 'neg': 0.1, 'neu': 0.629, 'pos': 0.272}]

#### Note
I used the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon, which produces  
positive, neutral and negative scores plus a compound sentiment score. The compound score is  
the sum of the valence scores of the words in a sentence, adjusted by VADER's rules and normalized to the range -1 to +1.  
The other three scores are the proportions of the text that fall into each category, so they sum to 1 (up to rounding).

Although the lexicon was tuned for social media text, it can also give reasonable results on other texts, as this example suggests.
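To make the normalization concrete, here is a minimal sketch of the formula used in the vaderSentiment source: the raw valence sum `x` is mapped to `x / sqrt(x*x + alpha)` with `alpha = 15` and clipped to [-1, 1]. The function name `normalize_valence` is ours, not the library's API, and the raw sums below are illustrative inputs rather than values from this text.

```python
import math


def normalize_valence(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1], as VADER's compound score does.

    Sketch of the formula in the vaderSentiment source; alpha approximates
    the largest raw sum expected in practice.
    """
    norm = score / math.sqrt(score * score + alpha)
    # clip guards against floating-point overshoot
    return max(-1.0, min(1.0, norm))


# illustrative raw sums and the compound value each would produce
for raw_sum in (-4.0, 0.0, 1.0, 4.0, 20.0):
    print(raw_sum, round(normalize_valence(raw_sum), 4))
```

Because the denominator grows with the score, large sums approach but never reach +1 or -1, which is why even the strongly positive final sentence scores 0.9566 rather than 1.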

In [56]:
# create pandas DataFrames of the original sentences and the sentiment scores
sentiment = pd.DataFrame(scores)
text = pd.DataFrame(sentences)

In [57]:
# combine the sentence and sentiment dataframes into a single dataframe
data = pd.concat([text, sentiment], axis=1)

In [58]:
# display the dataframe
data

| | 0 | compound | neg | neu | pos |
|---|---|---|---|---|---|
| 0 | four score and seven years ago our fathers bro... | 0.8126 | 0.0 | 0.756 | 0.244 |
| 1 | now we are engaged in a great civil war, testi... | 0.7761 | 0.116 | 0.566 | 0.318 |
| 2 | we are met on a great battle-field of that war. | 0.0516 | 0.26 | 0.467 | 0.273 |
| 3 | we have come to dedicate a portion of that fie... | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | it is altogether fitting and proper that we sh... | 0.0 | 0.0 | 1.0 | 0.0 |
| 5 | but, in a larger sense, we can not dedicate --... | 0.0 | 0.0 | 1.0 | 0.0 |
| 6 | the brave men, living and dead, who struggled ... | -0.7506 | 0.325 | 0.563 | 0.113 |
| 7 | the world will little note, nor long remember ... | 0.2498 | 0.0 | 0.909 | 0.091 |
| 8 | it is for us the living, rather, to be dedicat... | 0.4549 | 0.075 | 0.752 | 0.173 |
| 9 | it is rather for us to be here dedicated to th... | 0.9566 | 0.1 | 0.629 | 0.272 |


In [59]:
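# reorder the columns (0 is the default label of the sentence column), then rename them;
# .loc replaces the .ix indexer, which was removed in pandas 1.0
data = data.loc[:, [0, 'compound', 'pos', 'neu', 'neg']]
data.columns = ['Sentences', 'Compound', 'Positive', 'Neutral', 'Negative']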
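Naming the columns when the frames are built avoids relying on the integer label 0 that `pd.DataFrame(sentences)` assigns. The sketch below is one possible alternative, not the method used above; it uses two sentences and their scores copied from the earlier output as stand-in data.

```python
import pandas as pd

# stand-in data: two sentences and their VADER scores from the output above
sentences = [
    "we are met on a great battle-field of that war.",
    "it is altogether fitting and proper that we should do this.",
]
scores = [
    {"neg": 0.26, "neu": 0.467, "pos": 0.273, "compound": 0.0516},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]

# VADER key -> display name; the dict order also fixes the column order
column_names = {"compound": "Compound", "pos": "Positive", "neu": "Neutral", "neg": "Negative"}

# select and rename the score columns in one step
data = pd.DataFrame(scores)[list(column_names)].rename(columns=column_names)
# put the sentences first, under an explicit name
data.insert(0, "Sentences", sentences)
print(data)
```

This skips the `concat` step entirely, since `insert` attaches the sentences to the score frame by position.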

In [60]:
# display the cleaned-up dataframe with the sentences and their sentiment polarity scores
data.head()

| | Sentences | Compound | Positive | Neutral | Negative |
|---|---|---|---|---|---|
| 0 | four score and seven years ago our fathers bro... | 0.8126 | 0.244 | 0.756 | 0.0 |
| 1 | now we are engaged in a great civil war, testi... | 0.7761 | 0.318 | 0.566 | 0.116 |
| 2 | we are met on a great battle-field of that war. | 0.0516 | 0.273 | 0.467 | 0.26 |
| 3 | we have come to dedicate a portion of that fie... | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | it is altogether fitting and proper that we sh... | 0.0 | 0.0 | 1.0 | 0.0 |


In [None]:
# write the dataframe to a csv file in the same folder as the source text
data.to_csv(data_source / 'getty_sentiment.csv')
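By default `to_csv` writes the DataFrame index as an unnamed first column, so reading the file back with plain `pd.read_csv` turns it into an extra column called `Unnamed: 0`. A minimal sketch of the round trip, using an in-memory buffer and one stand-in row instead of the file on the H: drive:

```python
import io

import pandas as pd

# a tiny stand-in for the saved file, using one row from the table above
data = pd.DataFrame(
    {
        "Sentences": ["we are met on a great battle-field of that war."],
        "Compound": [0.0516],
        "Positive": [0.273],
        "Neutral": [0.467],
        "Negative": [0.26],
    }
)

buffer = io.StringIO()
data.to_csv(buffer)  # index written as an unnamed first column

# without index_col, the saved index comes back as an 'Unnamed: 0' column
buffer.seek(0)
naive = pd.read_csv(buffer)
print(naive.columns.tolist())

# index_col=0 restores it as the index instead
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored.columns.tolist())
```

Passing `index=False` to `to_csv` is the other option if the row numbers are not needed in the file.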

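#### Summary
The text has 10 sentences, some of them fairly long, and overall it conveys a mostly positive  
outlook during one of the most difficult periods of American history. The first two sentences and the final  
sentence are positive and uplifting. Notably, the 7th sentence (index 6) has the most negative sentiment  
(compound -0.7506, negative 0.325). That score likely comes from words such as "dead", "struggled" and "poor"  
rather than from a mention of the battlefield; the sentence that does mention the battle-field (index 2) scores close to neutral (compound 0.0516).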
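The summary can be checked against the numbers. The sketch below buckets each compound score with the cut-offs suggested in the vaderSentiment README (>= 0.05 positive, <= -0.05 negative, otherwise neutral) and finds the sentence with the lowest compound score and the one with the highest negative proportion. The scores are copied from the output above, and `label` is a hypothetical helper, not part of NLTK.

```python
# compound and neg values copied from the scores output above (list index = sentence index)
compound = [0.8126, 0.7761, 0.0516, 0.0, 0.0, 0.0, -0.7506, 0.2498, 0.4549, 0.9566]
neg = [0.0, 0.116, 0.26, 0.0, 0.0, 0.0, 0.325, 0.0, 0.075, 0.1]


def label(score: float, threshold: float = 0.05) -> str:
    """Bucket a compound score using the +/-0.05 cut-offs from the vaderSentiment README."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


labels = [label(c) for c in compound]
counts = {name: labels.count(name) for name in ("positive", "neutral", "negative")}

# index of the lowest compound score and of the largest negative proportion
most_negative = min(range(len(compound)), key=compound.__getitem__)
highest_neg = max(range(len(neg)), key=neg.__getitem__)

print(counts)
print(most_negative, highest_neg)
```

With these cut-offs the split is 6 positive, 3 neutral and 1 negative sentence, and index 6 is both the most negative by compound score and the one with the largest negative proportion. Sentence 2 (0.0516) only just clears the positive threshold.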

#### References
VADER source code and documentation: https://github.com/cjhutto/vaderSentiment  
Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.