# Sentiment analysis
<img src="./screencast.gif"/>

In this sample, we will build a sentiment annotator for the [Movie Review](http://www.cs.cornell.edu/people/pabo/movie-review-data/) dataset from Cornell.

In [3]:
import json
import tarfile

with tarfile.open('data.json.tgz') as tar:
    file = tar.extractfile('data.json')
    data = file.read().decode('utf8')

In [4]:
data[:500]

'{"0": {"text": "in my review of \\" the spy who shagged me , \\" i postulated an unbreakable law of film physics : every time a sequel is as good as or better than the previous film in the series , it is followed by a third movie that is a bore . \\nthe cause is probably complacency ; a studio sighs with relief when part 2 lives up to expectations and figures part 3 is a sure thing . \\n \\" scream 3 \\" provides the latest proof of this rule . \\nin los angeles production has begun on \\" stab 3 : retu'

In [13]:
from textwrap import wrap


class Record(object):
    def __init__(self, id, text, cornell, vader=None, my=None):
        self.id = id
        self.text = text
        self.cornell = cornell
        self.vader = vader
        self.my = my
        
    def _repr_pretty_(self, printer, cycle):
        printer.text('id=%r' % self.id)
        printer.break_()
        printer.text('cornel=%r' % self.cornell)
        printer.break_()
        printer.text('vader=%r' % self.vader)
        printer.break_()
        printer.text('my=%r' % self.my)
        printer.break_()
        for line in wrap(self.text, 100):
            printer.text(line)
            printer.break_()


def parse(data):
    data = json.loads(data)
    for id in data:
        item = data[id]
        yield Record(
            id=id,
            text=item['text'],
            cornell=item['sent'],
        )
        
        
records = list(parse(data))

In [14]:
records[0]

id='1842'
cornel='pos'
vader=None
my=None
for this review and more , visit clear illusions ( www . clearillusions . com )  the majority of
scary movies signal the fact that a character is about to meet their demise with cheesy music , worn
out dialogue such as " i'll be right back , " or simply with the overall tone of the scene .  how
about a classic john denver song as a death signal for a change ?  that's the kind of bursting
originality that allows " final destination " to invade the viewer's mind , even days after seeing
it , making one pause before ever entering a dark room , taking a shower , or even going to sleep .
the unique and horrifying thriller is the best thing to happen to the slasher genre since 1996's "
scream . "   " final destination , " directed by james wong and penned by jeffrey riddick , glen
morgan , and james wong , the latter two being writers for the t . v .  series " the x-files , " is
a movie with wonderful ideas , and executes them effortlessly .  it's ra

In [15]:
len(records)

2000
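
Before annotating, it can help to check how the Cornell labels are distributed. The following is a minimal sketch, assuming the same JSON shape as `data.json` (a mapping from id to an object with `text` and `sent` keys); the sample reviews and the `label_counts` helper are hypothetical and not part of the original notebook.

```python
import json
from collections import Counter

# Hypothetical sample in the same shape as data.json; not real dataset rows.
raw = json.dumps({
    "0": {"text": "a hypothetical glowing review", "sent": "pos"},
    "1": {"text": "a hypothetical scathing review", "sent": "neg"},
    "2": {"text": "another hypothetical positive review", "sent": "pos"},
})


def label_counts(raw):
    """Count how many reviews carry each Cornell label."""
    data = json.loads(raw)
    return Counter(item['sent'] for item in data.values())


counts = label_counts(raw)
print(counts['pos'], counts['neg'])
```

With the notebook's own objects, the same count could be taken as `Counter(record.cornell for record in records)`.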

In [16]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/alexkuk/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [17]:
from tqdm.notebook import tqdm as log_progress  # tqdm_notebook is deprecated in recent tqdm releases

from nltk.sentiment.vader import SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()


for record in log_progress(records):
    score = vader.polarity_scores(record.text)
    # {'compound': 0.6156, 'neg': 0.074, 'pos': 0.085, 'neu': 0.842}
    record.vader = score['compound']
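
The compound score is a float in the range -1 to 1, while the Cornell labels are `'pos'` or `'neg'`. To compare the two, one option is to threshold the score. This is a minimal sketch, assuming a cut-off of 0 (the same sign test the display code in this notebook uses); `compound_to_label`, `agreement` and the sample pairs are hypothetical names and values introduced for illustration.

```python
def compound_to_label(compound, threshold=0.0):
    """Map a VADER compound score in [-1, 1] to a Cornell-style label."""
    return 'pos' if compound >= threshold else 'neg'


def agreement(pairs, threshold=0.0):
    """Fraction of (gold_label, compound) pairs where both labels match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(
        1 for gold, compound in pairs
        if compound_to_label(compound, threshold) == gold
    )
    return hits / len(pairs)


# Hypothetical (gold, compound) pairs, not taken from the dataset.
sample = [('pos', 0.9988), ('neg', -0.75), ('neg', 0.42), ('pos', 0.10)]
print(agreement(sample))  # 3 of 4 pairs agree -> 0.75
```

On the real data this could be called as `agreement((r.cornell, r.vader) for r in records)`. The records where the two labels disagree are the ones most worth reviewing by hand.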

In [18]:
records[1]

id='952'
cornel='pos'
vader=0.9988
my=None
most sequels don't do what they're supposed to do like " toy story 2 " does .  far too many of them
end up re-hashing the original and adding very little .  is it any wonder that most sequels fail to
live up to their predecessors ?  thankfully , " toy story 2 " is a wonderful exception .  i can't
remember the last time i saw a sequel as consistently fun and inventive as this one .  it's yet
another sign that pixar , the acclaimed animation studio behind " toy story " and " a bug's life , "
is still at the top of its game .  woody and buzz are back in the sequel to the 1995 hit " toy story
, " and things have changed a bit since the last one ended .  woody ( voiced by tom hanks ) is
preparing to leave with his owner andy for cowboy camp .  in his absence , he has assigned buzz
lightyear ( tim allen ) , now comfortable in his role as a toy , to take charge .  unfortunately ,
woody's trip is ruined when he is accidentally put into the family gara

We will write a simple display formatter that colors each sentiment value (red for negative, green for positive) so the output is easier to scan.

In [19]:
from IPython.display import display, HTML


RED = 'red'
GREEN = 'green'


def format_color(value, color):
    return '<span style="color:{color};">{value}</span>'.format(
        color=color,
        value=value
    )


def display_record(record):
    value = record.cornell
    if value == 'neg':
        color = RED
    elif value == 'pos':
        color = GREEN
    else:
        raise ValueError(value)
    display(HTML('cornell: ' + format_color(value, color)))
    
    value = record.vader
    color = RED if value < 0 else GREEN
    display(HTML('vader: ' + format_color(value, color)))

    value = record.my
    if value is not None:
        color = RED if value < 0 else GREEN
        display(HTML('my: ' + format_color(value, color)))
    
    print(record.text)

    
display_record(records[0])

for this review and more , visit clear illusions ( www . clearillusions . com ) 
the majority of scary movies signal the fact that a character is about to meet their demise with cheesy music , worn out dialogue such as " i'll be right back , " or simply with the overall tone of the scene . 
how about a classic john denver song as a death signal for a change ? 
that's the kind of bursting originality that allows " final destination " to invade the viewer's mind , even days after seeing it , making one pause before ever entering a dark room , taking a shower , or even going to sleep . 
the unique and horrifying thriller is the best thing to happen to the slasher genre since 1996's " scream . " 
 " final destination , " directed by james wong and penned by jeffrey riddick , glen morgan , and james wong , the latter two being writers for the t . v . 
series " the x-files , " is a movie with wonderful ideas , and executes them effortlessly . 
it's rare a film of this nature can grab an audi

## Assemble our annotator
Now we can assemble our annotator using `ipyannotate`. For this task, we show the user the model-evaluated sentiment and let them override it with the `+1`, `0` and `-1` buttons. Each click stores the chosen value in the `my` field of the current task's record.

In [15]:
from ipyannotate.buttons import ValueButton as Button, NextButton, BackButton
from ipyannotate.toolbar import Toolbar
from ipyannotate.tasks import Task, Tasks
from ipyannotate.canvas import OutputCanvas
from ipyannotate.annotation import Annotation


def callback(button):
    annotation.tasks.current.output.my = button.value


tasks = Tasks(Task(_) for _ in records[:100])

pos = Button(1, shortcut='1', color='green')
neu = Button(0, shortcut='2', color='gray')
neg = Button(-1, shortcut='3', color='red')

for button in [pos, neu, neg]:
    button.on_click(callback)

buttons = [pos, neu, neg, BackButton(shortcut='j'), NextButton(shortcut='k')]
toolbar = Toolbar(buttons)

canvas = OutputCanvas(display=display_record)

annotation = Annotation(toolbar, tasks, canvas=canvas)
annotation

# annotation.tasks
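
Because the callback writes the chosen value to `record.my`, the manual labels can be summarized directly from the records once some tasks are annotated. The sketch below is one possible way to do that, not part of `ipyannotate`: the stand-in `Record` class, the `sign` and `summarize` helpers, and the sample values are all hypothetical.

```python
from collections import Counter


class Record(object):
    # Minimal stand-in for the notebook's Record class (only the fields used here).
    def __init__(self, id, vader=None, my=None):
        self.id = id
        self.vader = vader
        self.my = my


def sign(value):
    """Return -1, 0 or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def summarize(records):
    """Count manual labels and list ids where the label differs from VADER's sign."""
    labeled = [r for r in records if r.my is not None]
    counts = Counter(r.my for r in labeled)
    # Note: a neutral label (0) counts as an override unless the compound score is exactly 0.
    overrides = [r.id for r in labeled if sign(r.vader) != r.my]
    return counts, overrides


# Hypothetical annotated records, not taken from the dataset.
sample = [
    Record('a', vader=0.9, my=1),
    Record('b', vader=0.4, my=-1),
    Record('c', vader=-0.2, my=None),  # not annotated yet
    Record('d', vader=-0.6, my=-1),
]
counts, overrides = summarize(sample)
print(counts[1], counts[-1], overrides)
```

In the notebook itself, `summarize(records[:100])` would cover the records that were wrapped into tasks.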

In [16]:
annotation.tasks[:10]

[Task(output=Record(id='285', text=' " the blair witch project " was perhaps one of a kind , ..., value=1),
 Task(output=Record(id='1243', text='this is the last carry on film with its almost intact re..., value=0),
 Task(output=Record(id='551', text="you've got mail works alot better than it deserves to . \..., value=1),
 Task(output=Record(id='915', text='a slight romantic comedy with a feminist bent , but one w..., value=0),
 Task(output=Record(id='1870', text='scream 2 has a titillating little scene that lays down t..., value=1),
 Task(output=Record(id='510', text='quiz show , an almost perfectly accurate true story , is ..., value=0),
 Task(output=Record(id='109', text='i must admit i\'m going to be a bit biased in my review o..., value=None),
 Task(output=Record(id='1375', text="ahh yes . \nthe teenage romance . \nan attractive young ..., value=None),
 Task(output=Record(id='541', text="let me start off by saying that leading up to the release..., value=None),
 Task(output=Record