---
layout: page
title: Introduction to NLP Sentiment Analysis with Python
description: This lesson uses Python to measure the sentiment an author expresses in a text.
---

## Source

Zoë Wilkinson Saldaña, "Sentiment Analysis for Exploratory Data Analysis," Programming Historian 7 (2018), https://doi.org/10.46430/phen0079.



## Reflection

This Programming Historian lesson covers sentiment analysis of text and shows how to use the Natural Language Toolkit (NLTK) to score text corpora for sentiment. Following the lesson's example, I looked at email messages from the Enron scandal of the early 2000s and examined whether those messages were positive or negative in sentiment. I also ran the analysis on a message that Prof. Saxton sent earlier today.

It was cool to see how easy it is to use something so clearly technologically advanced: with good instructions and quick downloads, the process was fairly seamless and simple. I did notice some limitations of the toolkit, chiefly that it cannot understand the context of human interaction. For example, the phrase “I am having trouble understanding your logic here” shows much more frustration and contempt in a corporate email than it would in, say, a professor’s feedback on a student’s homework. To the program, though, the words are the same, so the phrase gets the same score.
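A minimal sketch of why identical wording gets an identical score, assuming a bare-bones lexicon scorer. VADER also works from a word lexicon, though it adds rules for negation, intensifiers, capitalization, and punctuation; the tiny lexicon and its valence values below are hypothetical and are not VADER's.

```python
# A toy, hypothetical lexicon: these words and valence numbers are made up
# for illustration and are NOT VADER's actual values.
TOY_LEXICON = {"trouble": -1.5, "frustrated": -2.0, "great": 3.0, "thanks": 1.5}


def toy_score(text: str) -> float:
    """Sum the valence of every known word; who wrote it never matters."""
    words = (w.strip('.,!?"“”').lower() for w in text.split())
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)


phrase = "I am having trouble understanding your logic here"
corporate_email = phrase  # imagined as written by an exasperated colleague
professor_note = phrase   # imagined as gentle homework feedback

# Same words in, same number out: sender and situation never enter the score
print(toy_score(corporate_email), toy_score(professor_note))  # -1.5 -1.5
```

The scorer only ever sees the string, so any difference in tone between the two settings is invisible to it.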

I can see some very pertinent uses for this kind of program, from understanding social media trends to governments using it to control their populations, which is something I’m thinking about a lot in the wake of the Iran protests. A government could use sentiment analysis to police how people talk about it and its policies online, which could be a huge threat to freedom of speech and expression around the world.

As for personal use, I am interested in running this program over my own text messages with friends and family to see how they interact with me, and who interacts with me positively or negatively. I don’t know what I would do with this information, but it would be interesting to know, and it would be a good way to apply this program to something personally relevant.
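A sketch of how that per-person comparison could work, assuming the messages have already been exported as (sender, text) pairs. That export format, the helper name `average_sentiment_by_sender`, and the contacts below are all hypothetical. The scoring function is passed in, so in the notebook it could be `lambda t: sid.polarity_scores(t)['compound']`; here a stand-in scorer keeps the example self-contained.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, Tuple


def average_sentiment_by_sender(
    messages: Iterable[Tuple[str, str]],
    score_fn: Callable[[str], float],
) -> Dict[str, float]:
    """Average a per-message sentiment score for each sender.

    messages: (sender, text) pairs (a hypothetical export format).
    score_fn: any function mapping text to a number, e.g. VADER's compound score.
    """
    by_sender = defaultdict(list)
    for sender, text in messages:
        by_sender[sender].append(score_fn(text))
    return {sender: mean(scores) for sender, scores in by_sender.items()}


# Stand-in scorer for illustration only; swap in VADER's compound score in practice
def fake_score(text: str) -> float:
    return 1.0 if "thanks" in text.lower() else -1.0


texts = [("Alex", "Thanks for dinner!"), ("Alex", "Ugh, running late"), ("Sam", "Thanks!")]
print(average_sentiment_by_sender(texts, fake_score))  # {'Alex': 0.0, 'Sam': 1.0}
```

Averaging per sender is only one choice; a median or the share of negative messages would be less sensitive to a single very emotional text.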


## Code


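```python
import ssl

import nltk

# Work around SSL certificate errors when downloading NLTK data (seen on some
# macOS Python installs). This turns off HTTPS certificate verification for the
# rest of the session, so only use it if the plain download fails.
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    # Python versions without this function do not verify certificates by default
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# Download the VADER sentiment lexicon and the Punkt sentence tokenizer models
nltk.download('vader_lexicon')
nltk.download('punkt')
```

```
[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/mason/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
[nltk_data] Downloading package punkt to /Users/mason/nltk_data...
[nltk_data]   Package punkt is already up-to-date!

True
```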

```python
# Import VADER's sentiment intensity analyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer
```

```python
# Initialize VADER
sid = SentimentIntensityAnalyzer()

# Text to analyze (Message 1)
message_text = '''Like you, I am getting very frustrated with this process. I am genuinely trying to be as reasonable as possible. I am not trying to "hold up" the deal at the last minute. I'm afraid that I am being asked to take a fairly large leap of faith after this company (I don't mean the two of you -- I mean Enron) has screwed me and the people who work for me.'''
print(message_text)

# Compute polarity scores
scores = sid.polarity_scores(message_text)

# Print the scores as key-value pairs
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')
```

```
Like you, I am getting very frustrated with this process. I am genuinely trying to be as reasonable as possible. I am not trying to "hold up" the deal at the last minute. I'm afraid that I am being asked to take a fairly large leap of faith after this company (I don't mean the two of you -- I mean Enron) has screwed me and the people who work for me.
compound: -0.3804, neg: 0.093, neu: 0.836, pos: 0.071, 
```
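The `compound` value is VADER's single summary score. As I understand VADER's implementation (the original `vaderSentiment` package and NLTK's port), it adds up the valences of the words it recognizes, after its rule adjustments, and squashes that sum into the range -1 to 1 with `x / sqrt(x*x + alpha)`, where `alpha` defaults to 15. The `neg`, `neu`, and `pos` values are different: they are the proportions of the text falling in each category, which is why they add up to about 1 (0.093 + 0.836 + 0.071 = 1.0 above). A sketch of the normalization step only:

```python
import math


def normalize(score: float, alpha: float = 15) -> float:
    """Squash an unbounded valence sum into [-1, 1], as VADER does for 'compound'."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp to [-1, 1] as a guard, mirroring the reference implementation
    return max(-1.0, min(1.0, norm_score))


# Small raw sums move the score a lot; large sums creep toward +/-1
for raw in (-4.0, 0.0, 1.0, 10.0):
    print(raw, round(normalize(raw), 4))
```

Because the curve flattens out, a long email with many emotional words tends to end up near the extremes even when the individual words are only mildly positive or negative.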

```python
# Message 2

# Text to analyze
message_text = '''Looks great.  I think we should have a least 1 or 2 real time traders in Calgary.'''
print(message_text)

# Compute polarity scores
scores = sid.polarity_scores(message_text)

# Print the scores as key-value pairs
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')
```

```
Looks great.  I think we should have a least 1 or 2 real time traders in Calgary.
compound: 0.6249, neg: 0.0, neu: 0.745, pos: 0.255, 
```
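Every cell repeats the same printing loop. A small helper (the name `format_scores` is my own, hypothetical) that builds the same line as a string keeps the cells shorter and makes the formatting easy to check against the output above:

```python
def format_scores(scores: dict) -> str:
    """Return the same 'key: value, ' line the notebook cells print, sorted by key."""
    return ''.join('{0}: {1}, '.format(key, scores[key]) for key in sorted(scores))


# Scores copied from the Message 2 output above
message_2 = {'neg': 0.0, 'neu': 0.745, 'pos': 0.255, 'compound': 0.6249}
print(format_scores(message_2))  # same line as the Message 2 output
```

In the notebook, each loop could then become `print(format_scores(sid.polarity_scores(message_text)))`.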

```python
# Message 3

# Text to analyze
message_text = '''I think we are making great progress on the systems side.  I would like to
set a deadline of November 10th to have a plan on all North American projects
(I'm ok if fundementals groups are excluded) that is signed off on by
commercial, Sally's world, and Beth's world.  When I say signed off I mean
that I want signitures on a piece of paper that everyone is onside with the
plan for each project.  If you don't agree don't sign. If certain projects
(ie. the gas plan) are not done yet then lay out a timeframe that the plan
will be complete.  I want much more in the way of specifics about objectives
and timeframe. Thanks for everyone's hard work on this'''
print(message_text)

# Compute polarity scores
scores = sid.polarity_scores(message_text)

# Print the scores as key-value pairs
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')
```

```
I think we are making great progress on the systems side.  I would like to
set a deadline of November 10th to have a plan on all North American projects
(I'm ok if fundementals groups are excluded) that is signed off on by
commercial, Sally's world, and Beth's world.  When I say signed off I mean
that I want signitures on a piece of paper that everyone is onside with the
plan for each project.  If you don't agree don't sign. If certain projects
(ie. the gas plan) are not done yet then lay out a timeframe that the plan
will be complete.  I want much more in the way of specifics about objectives
and timeframe. Thanks for everyone's hard work on this
compound: 0.8951, neg: 0.042, neu: 0.821, pos: 0.136, 
```

```python
# Message 4 (Prof. Saxton's message)

# Text to analyze
message_text = '''I've heard from a few of you that you are having difficulty with the Programming Historian Lessons. I don't want you to stress out about this assignment. Do your best. You will have opportunity to fix or redo anything before your final grade. In the meantime, here are two things that may be of some help'''
print(message_text)

# Compute polarity scores
scores = sid.polarity_scores(message_text)

# Print the scores as key-value pairs
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')
```

```
I've heard from a few of you that you are having difficulty with the Programming Historian Lessons. I don't want you to stress out about this assignment. Do your best. You will have opportunity to fix or redo anything before your final grade. In the meantime, here are two things that may be of some help
compound: 0.646, neg: 0.1, neu: 0.749, pos: 0.151, 
```
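With four messages scored, it helps to compare them side by side. This sketch ranks the compound scores copied from the outputs above and attaches a coarse label using the ±0.05 cutoffs that VADER's README suggests as typical thresholds; those cutoffs are a convention rather than part of the score itself, and the helper name `label` is hypothetical.

```python
def label(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to a coarse label (positive / neutral / negative)."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores copied from the outputs of Messages 1-4 above
compound_scores = {
    'Message 1': -0.3804,
    'Message 2': 0.6249,
    'Message 3': 0.8951,
    'Message 4': 0.646,
}

# Rank from most positive to most negative
for name, value in sorted(compound_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {value:+.4f} ({label(value)})')
```

Sorting makes it easy to spot that Message 3, a fairly demanding request for signed-off plans, still ranks as the most positive of the four.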

```python
# Message 5: the whole email as one block of text

# Text to analyze
message_text = '''It seems to me we are in the middle of no man's land with respect to the  following:  Opec production speculation, Mid east crisis and renewed  tensions, US elections and what looks like a slowing economy (?), and no real weather anywhere in the world. I think it would be most prudent to play  the markets from a very flat price position and try to day trade more aggressively. I have no intentions of outguessing Mr. Greenspan, the US. electorate, the Opec ministers and their new important roles, The Israeli and Palestinian leaders, and somewhat importantly, Mother Nature.  Given that, and that we cannot afford to lose any more money, and that Var seems to be a problem, let's be as flat as possible. I'm ok with spread risk  (not front to backs, but commodity spreads). The morning meetings are not inspiring, and I don't have a real feel for  everyone's passion with respect to the markets.  As such, I'd like to ask  John N. to run the morning meetings on Mon. and Wed.  Thanks. Jeff'''
print(message_text)

# Compute polarity scores
scores = sid.polarity_scores(message_text)

# Print the scores as key-value pairs
for key in sorted(scores):
    print('{0}: {1}, '.format(key, scores[key]), end='')
```

```
It seems to me we are in the middle of no man's land with respect to the  following:  Opec production speculation, Mid east crisis and renewed  tensions, US elections and what looks like a slowing economy (?), and no real weather anywhere in the world. I think it would be most prudent to play  the markets from a very flat price position and try to day trade more aggressively. I have no intentions of outguessing Mr. Greenspan, the US. electorate, the Opec ministers and their new important roles, The Israeli and Palestinian leaders, and somewhat importantly, Mother Nature.  Given that, and that we cannot afford to lose any more money, and that Var seems to be a problem, let's be as flat as possible. I'm ok with spread risk  (not front to backs, but commodity spreads). The morning meetings are not inspiring, and I don't have a real feel for  everyone's passion with respect to the markets.  As such, I'd like to ask  John N. to run the morning meetings on Mon. and Wed.  Thanks. Jeff
compoun
```
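Scoring a whole email as one string yields a single number that can hide mixed feelings. The next cell scores each sentence separately; once there is a list of per-sentence compound scores, a short sketch like this one (the helper `summarize_sentences` is hypothetical) can report the average and pick out the most negative sentence. The pairs below are copied, some abbreviated with "...", from the sentence-level output further down.

```python
from statistics import mean
from typing import List, Tuple


def summarize_sentences(scored: List[Tuple[str, float]]) -> Tuple[float, str]:
    """Return the mean compound score and the most negative sentence.

    scored: (sentence, compound) pairs, e.g. built in the notebook with
            [(s, sid.polarity_scores(s)['compound']) for s in sentences]
    """
    average = mean(score for _, score in scored)
    most_negative, _ = min(scored, key=lambda pair: pair[1])
    return average, most_negative


# (sentence, compound) pairs from the Message 5 sentence-level output
scored = [
    ("It seems to me we are in the middle of no man's land ...", -0.5267),
    ("), and no real weather anywhere in the world.", -0.296),
    ("I think it would be most prudent to play the markets ...", 0.0183),
    ("... and somewhat importantly, Mother Nature.", 0.4228),
]

average, worst = summarize_sentences(scored)
print(round(average, 4), worst)
```

Sentence-level scores are only as good as the sentence boundaries: the output below shows Punkt splitting after "(?" and after "the US.", so some "sentences" are fragments.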

```python
# Split the email into sentences and score each one separately
from nltk.tokenize import sent_tokenize

# Message 5 again
message_text = '''It seems to me we are in the middle of no man's land with respect to the  following:  Opec production speculation, Mid east crisis and renewed  tensions, US elections and what looks like a slowing economy (?), and no real weather anywhere in the world. I think it would be most prudent to play  the markets from a very flat price position and try to day trade more aggressively. I have no intentions of outguessing Mr. Greenspan, the US. electorate, the Opec ministers and their new important roles, The Israeli and Palestinian leaders, and somewhat importantly, Mother Nature.  Given that, and that we cannot afford to lose any more money, and that Var seems to be a problem, let's be as flat as possible. I'm ok with spread risk  (not front to backs, but commodity spreads). The morning meetings are not inspiring, and I don't have a real feel for  everyone's passion with respect to the markets.  As such, I'd like to ask  John N. to run the morning meetings on Mon. and Wed.  Thanks. Jeff'''

# Tokenize into a list of sentences with NLTK's pretrained English Punkt model.
# (Recent NLTK releases load this model from the 'punkt_tab' package; if you
# get a LookupError, run nltk.download('punkt_tab').)
sentences = sent_tokenize(message_text)

# Print each sentence followed by its scores
for sentence in sentences:
    print(sentence)
    scores = sid.polarity_scores(sentence)
    for key in sorted(scores):
        print('{0}: {1}, '.format(key, scores[key]), end='')
    print()
```



It seems to me we are in the middle of no man's land with respect to the  following:  Opec production speculation, Mid east crisis and renewed  tensions, US elections and what looks like a slowing economy (?
compound: -0.5267, neg: 0.197, neu: 0.68, pos: 0.123, 
), and no real weather anywhere in the world.
compound: -0.296, neg: 0.216, neu: 0.784, pos: 0.0, 
I think it would be most prudent to play  the markets from a very flat price position and try to day trade more aggressively.
compound: 0.0183, neg: 0.103, neu: 0.792, pos: 0.105, 
I have no intentions of outguessing Mr. Greenspan, the US.
compound: -0.296, neg: 0.216, neu: 0.784, pos: 0.0, 
electorate, the Opec ministers and their new important roles, The Israeli and Palestinian leaders, and somewhat importantly, Mother Nature.
compound: 0.4228, neg: 0.0, neu: 0.817, pos: 0.183, 
Given that, and that we cannot afford to lose any more money, and that Var seems to be a problem, let's be as flat as possible.
compound: -0.1134, neg: 