# Using sentiment analysis for fun ~~and profit~~

Is a piece of text positive or negative?  Working that out automatically is called sentiment analysis.

## Performing sentiment analysis with `py-processors`

In [2]:
from processors import *
# We'll be using the server in several examples
# NOTE: you can stop it manually with API.stop_server()
API = ProcessorsAPI(port=8886, keep_alive=True)

Using default


INFO - Starting processors-server (java -Xmx3G -cp /Users/gus/anaconda3/envs/bored/lib/python3.5/site-packages/processors/processors-server.jar NLPServer --port 8886 --host localhost) ...



Waiting for server...
[===                                                         ]

Connection with processors-server established (http://localhost:8886)


In [3]:
# CoreNLP's sentiment scores range from 1 (very negative) to 5 (very positive)
API.sentiment.corenlp.score_text("I'm so happy!")

[4]

# Pulling movie reviews from Rotten Tomatoes using `snappytomato`

Documentation: http://snappytomato.readthedocs.io/en/latest/

In [4]:
from snappytomato import *

# path to api key
api_key_file = "../rt.key"
api_key = load_api_key(api_key_file)
snappy = RT(api_key)

# Retrieve movie data by its title

In [5]:
movie = snappy.movies.movie_by_title('big trouble in little china')

In [6]:
summary = """
title:\t\t\t"{}"
audience score:\t\t{}
critic consensus:\t{}
""".format(movie.title, movie.audience_score, movie.critics_consensus or "unknown")
print(summary)


title:			"Big Trouble in Little China"
audience score:		83
critic consensus:	unknown



In [7]:
reviews = movie.reviews
print("{} reviews for \"{}\"".format(len(reviews), movie.title))
for r in reviews:
    summary = """
    critic: {}
    quote: {}
    publication: {}
    source: {}
    freshness: {}
    original score: {}""".format(
        r.critic,
        r.quote,
        r.publication,
        r.source,
        r.freshness,
        # not every review has an original score
        getattr(r, "original_score", None),
    )
    print(summary)

4 reviews for "Big Trouble in Little China"

    critic: 
    quote: 
    publication: Variety
    source: http://www.variety.com/review/VE1117789259.html?categoryid=31&cs=1
    freshness: none
    original score: None

    critic: 
    quote: 
    publication: Time Out
    source: http://www.timeout.com/film/reviews/67813/big_trouble_in_little_china.html
    freshness: none
    original score: None

    critic: Walter Goodman
    quote: An upscale send-up.
    publication: New York Times
    source: http://movies.nytimes.com/movie/review?res=9A0DE6D7113BF931A35754C0A960948260&partner=Rotten Tomatoes
    freshness: fresh
    original score: 3/5

    critic: Roger Ebert
    quote: Special effects don't mean much unless we care about the characters who are surrounded by them, and in this movie the characters often seem to exist only to fill up the foregrounds.
    publication: Chicago Sun-Times
    source: http://www.rogerebert.com/reviews/big-trouble-in-little-china-1986
    freshness: 

# The review text

Rotten Tomatoes indexes only a single quote from each review, but it often provides a link to the original review.

In [8]:
from bs4 import BeautifulSoup
import requests

In [9]:
# hmmm...not all the reviews are positive...
# I guess not everyone is enlightened.
# Well, let's look at a review we know
# to be "fresh"
r = reviews[-2]
print("Critic: {}".format(r.critic))
print("Publication: {}".format(r.publication))
print("Freshness: {}".format(r.freshness))
# not every review has a link
url = r.links.get("review", None)
article_text = ""
if url:
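    # time out rather than hang on an unresponsive site
    response = requests.get(url, timeout=10)
    html = response.text
    soup = BeautifulSoup(html, "html.parser")
    # keep only paragraph text, one paragraph per line
    article_text = "\n".join(p.text for p in soup.find_all("p"))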

Critic: Walter Goodman
Publication: New York Times
Freshness: fresh
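
Picking the review with `reviews[-2]` only works because we already looked at the list.  A minimal sketch of selecting "fresh" reviews by attribute instead is shown here.  The `SimpleNamespace` objects are stand-ins for the `snappytomato` review objects (an assumption made so the example runs without an API key), and `fresh_reviews` is a hypothetical helper, not part of the library.

```python
from types import SimpleNamespace

# Stand-ins for snappytomato review objects (hypothetical); values are copied
# from the output above. The fourth review is left out because its freshness
# is cut off in that output.
sample_reviews = [
    SimpleNamespace(critic="", publication="Variety", freshness="none"),
    SimpleNamespace(critic="", publication="Time Out", freshness="none"),
    SimpleNamespace(critic="Walter Goodman", publication="New York Times", freshness="fresh"),
]

def fresh_reviews(reviews):
    """Return only the reviews whose freshness attribute is "fresh"."""
    return [r for r in reviews if getattr(r, "freshness", None) == "fresh"]

for r in fresh_reviews(sample_reviews):
    print("{} ({})".format(r.critic, r.publication))
```

With real data, the same helper could presumably be called on `movie.reviews` directly.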


In [10]:
article_text

"Want both newspaper deliveryand free, unlimited digital access?\nWant unlimited access to NYTimes.com and our apps?\n\nIF, as is not unlikely, you should lose track of what is going on in ''Big Trouble in Little China'' and think you have wandered into a festival of ''Raiders of the Lost Ark,'' ''Romancing the Stone,'' ''Star Wars,'' ''The Karate Kid,'' ''Flash Gordon'' and a throng of facsimiles, don't be concerned. What matters is the stunts and the spirit, and this latest set of exotic exploits of an indomitable hero (Kurt Russell) and a spunky heroine (Kim Cattrall) gives good value.\n\n\n\nThis ''mystical action-adventure-comedy-kung fu-monster-ghost sto ry,'' which opens today at the RKO Warner Twin and other theaters, gets going with a ferocious battle between opposing bands of Chinese gunmen, knife-slingers, sword-jugglers and high kickers and ends the same way, only more so, with blades and bodies flashing, hissing and sizzling and with sensational feats of levitation and cla

In [11]:
sentiment_scores = API.sentiment.corenlp.score_text(article_text)
print(sentiment_scores)

[2, 2, 0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 3, 2, 1, 2, 1, 1, 0, 2, 3, 1]

# Problem?

We know our review is supposed to be "fresh", but we're getting a lot of low scores.  What's going on here? 
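
To put a number on "a lot of low scores", here is a small sketch that tallies the scores printed above and takes their mean.  It assumes the list holds one score per scored unit of text (presumably one per sentence); the list is copied by hand so the example runs without the server.

```python
from collections import Counter
from statistics import mean

# Scores copied from the output of the previous cell
sentiment_scores = [2, 2, 0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    3, 1, 3, 2, 1, 2, 1, 1, 0, 2, 3, 1]

# How often does each score occur?
distribution = Counter(sentiment_scores)
for score in sorted(distribution):
    print("score {}: {} time(s)".format(score, distribution[score]))

print("mean score: {:.2f}".format(mean(sentiment_scores)))
```

More than half of the scores are 1, which is why the review as a whole looks negative.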

## Challenge 1:

- Why does the sentiment seem negative when the review is positive?
- How can we select the relevant text for sentiment analysis of a review?
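
One possible starting point for the second question: the scraped text begins with subscription prompts that are not part of the review.  The sketch below drops short paragraphs before scoring.  `select_review_paragraphs` and its `min_words` threshold are hypothetical and chosen only for illustration; the sample paragraphs are taken from the `article_text` output above.

```python
def select_review_paragraphs(paragraphs, min_words=15):
    """Keep paragraphs long enough to plausibly be review prose.

    min_words is an arbitrary threshold (an assumption); tune it per site.
    """
    kept = []
    for p in paragraphs:
        text = " ".join(p.split())  # collapse runs of whitespace
        if len(text.split()) >= min_words:
            kept.append(text)
    return kept

paragraphs = [
    "Want both newspaper deliveryand free, unlimited digital access?",
    "Want unlimited access to NYTimes.com and our apps?",
    "What matters is the stunts and the spirit, and this latest set of "
    "exotic exploits of an indomitable hero (Kurt Russell) and a spunky "
    "heroine (Kim Cattrall) gives good value.",
]

review_text = "\n".join(select_review_paragraphs(paragraphs))
print(review_text)
```

A length filter is crude: it will also discard short but meaningful sentences, so treat it as a baseline rather than an answer to the challenge.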


In [12]:
API.sentiment.corenlp.score_text("I'm so happy!")

[4]

## Challenge 2:

- Emoticons are common in informal writing (social media, etc.), but the CoreNLP sentiment analysis system is not emoticon-aware:

In [13]:
API.sentiment.corenlp.score_text(":)")

[2]

- Build a sentiment analysis system for emoticons
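
As a starting point, here is a minimal lexicon-based sketch.  Everything in it is an assumption: the emoticon list, the scores assigned to each one, and the 0 (very negative) to 4 (very positive) scale.  The comment earlier in this notebook describes a 1-to-5 scale, but the review output contains 0, so the sketch uses 0 to 4; shift the values if your scale differs.

```python
import re

# Hypothetical emoticon lexicon on an assumed 0-4 scale
EMOTICON_SCORES = {
    ":D": 4,
    ":)": 3,
    ":-)": 3,
    ";)": 3,
    ":|": 2,
    ":(": 1,
    ":-(": 1,
    ":'(": 0,
}

# Try longer emoticons first in case one is a prefix of another
_EMOTICON_PATTERN = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICON_SCORES, key=len, reverse=True))
)

def score_emoticons(text):
    """Return one score per emoticon found in text, in order of appearance."""
    return [EMOTICON_SCORES[m] for m in _EMOTICON_PATTERN.findall(text)]

print(score_emoticons(":)"))
print(score_emoticons("great movie :D but the ending :("))
```

A fuller system might combine these scores with the CoreNLP score for the surrounding text, or learn the emoticon scores from labeled data instead of assigning them by hand.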