# Sentiment analysis
<img src="./screencast.gif"/>

In this sample, we will build a sentiment annotator for the [Movie Review](http://www.cs.cornell.edu/people/pabo/movie-review-data/) dataset from Cornell.

In [1]:
import json
import tarfile

with tarfile.open('data.json.tgz') as tar:
    file = tar.extractfile('data.json')
    data = file.read().decode('utf8')

In [2]:
data[:500]

'{"0": {"text": "in my review of \\" the spy who shagged me , \\" i postulated an unbreakable law of film physics : every time a sequel is as good as or better than the previous film in the series , it is followed by a third movie that is a bore . \\nthe cause is probably complacency ; a studio sighs with relief when part 2 lives up to expectations and figures part 3 is a sure thing . \\n \\" scream 3 \\" provides the latest proof of this rule . \\nin los angeles production has begun on \\" stab 3 : retu'

In [3]:
from textwrap import wrap


class Record(object):
    def __init__(self, id, text, cornell, vader=None, my=None):
        self.id = id
        self.text = text
        self.cornell = cornell
        self.vader = vader
        self.my = my
        
    def _repr_pretty_(self, printer, cycle):
        printer.text('id=%r' % self.id)
        printer.break_()
        printer.text('cornel=%r' % self.cornell)
        printer.break_()
        printer.text('vader=%r' % self.vader)
        printer.break_()
        printer.text('my=%r' % self.my)
        printer.break_()
        for line in wrap(self.text):
            printer.text(line)
            printer.break_()


def parse(data):
    data = json.loads(data)
    for id in data:
        item = data[id]
        yield Record(
            id=id,
            text=item['text'],
            cornell=item['sent'],
        )
        
        
records = list(parse(data))

In [4]:
records[0]

id='242'
cornel='neg'
vader=None
my=None
arye cross and courteney cox star as a pair of bostonians who meet in
a bar , go to the movies , fall in love , move in together , etc .
said , she said , or if you don't watch love & war on television , you
might think this is the most inventive film to come along in ages .
however , if you've seen any of these , than you have seen most of
this film .  this of course doesn't mean its bad .  some of it is
amusing , but overall , i just had to ask what's the point ?  arye
cross is the stereotypical single male who falls in love .  kevin
pollack is the stereotypical female-fearing best friend who make a lot
of rather sexist and vulgar jokes , most if which weren't very funny .
couteney cox is the stereotypical career-minded woman who falls in
love .  julie brown is the stereotypical bizarre best friend of said
woman .   ( notice the frequent use of the word stereotypical .  this
film uses a lot of formula , the plot is basically known from the
ope

In [5]:
len(records)

2000
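Before scoring anything, it can help to check how the reviews are split between the two Cornell labels. The sketch below uses a hypothetical three-review sample in the same shape as `data.json` (an id mapped to `text` and `sent`); the helper name `count_labels` is an assumption, not part of the notebook.

```python
import json
from collections import Counter

# Hypothetical sample in the same shape as data.json:
# {id: {"text": ..., "sent": "pos" | "neg"}}
sample = json.dumps({
    "0": {"text": "a dull , lifeless sequel .", "sent": "neg"},
    "1": {"text": "a witty , charming film .", "sent": "pos"},
    "2": {"text": "a waste of two hours .", "sent": "neg"},
})


def count_labels(raw):
    """Count how many reviews carry each Cornell label."""
    items = json.loads(raw)
    return Counter(item["sent"] for item in items.values())


label_counts = count_labels(sample)
print(label_counts)
```

On the parsed records, the same count would be `Counter(record.cornell for record in records)`.
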

In [6]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/alexkuk/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [7]:
from tqdm.notebook import tqdm as log_progress

from nltk.sentiment.vader import SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()


for record in log_progress(records):
    score = vader.polarity_scores(record.text)
    # polarity_scores returns a dict such as
    # {'compound': 0.6156, 'neg': 0.074, 'pos': 0.085, 'neu': 0.842};
    # we keep only the compound score
    record.vader = score['compound']

In [8]:
records[1]

id='937'
cornel='pos'
vader=0.9996
my=None
it stands as a moment one will not soon forget : a giant , green ogre
flips through the pages of a cliche fairy tale , narrating it with
every bit of dull inspiration that the story holds .  this leads one
to believe that this serves as the prologue to shrek , dreamworks'
second computer animated feature , but in a pricelessly hilarious bit
of cinema , a page of this tale serves as that ogre's toilet paper .
from this opening moment , one can infer shrek's defying of all
expectations regarding it as a standard , disney-esque fairy tale .
although rampant moments of hilarity dot shrek , the true charm of the
film lies in the bold elements of friendship , courage , and
acceptance , excelled by outstanding direction , stunning actor voice
work , and most importantly , a witty screenplay with more going on
than meets the eye .  while shrek features an abundance of humor
related directly toward adults , positive friendship values aimed at
younger c
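The compound score is a number between -1 and 1, while the Cornell label is either `pos` or `neg`, so comparing them needs a cut-off. The sketch below assumes a cut-off at zero (negative scores count as `neg`) and measures how often the two agree. The names `vader_label` and `agreement` and the toy pairs are hypothetical, not taken from the dataset.

```python
def vader_label(compound):
    """Map a compound score to a label; negative scores count as 'neg'."""
    return 'neg' if compound < 0 else 'pos'


def agreement(pairs):
    """Fraction of (cornell_label, compound_score) pairs whose labels match."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    matches = sum(
        1 for cornell, compound in pairs
        if vader_label(compound) == cornell
    )
    return matches / len(pairs)


# Hypothetical (label, score) pairs for illustration only
toy = [('pos', 0.9996), ('neg', -0.81), ('neg', 0.42), ('pos', 0.1)]
rate = agreement(toy)
print(rate)
```

On the notebook's data this could be called as `agreement((r.cornell, r.vader) for r in records)`.
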

We will write a simple display formatter that colors each label and score, so the output is easier to read.

In [9]:
from IPython.display import display, HTML


RED = 'red'
GREEN = 'green'


def format_color(value, color):
    return '<span style="color:{color};">{value}</span>'.format(
        color=color,
        value=value
    )


def display_record(record):
    value = record.cornell
    if value == 'neg':
        color = RED
    elif value == 'pos':
        color = GREEN
    else:
        raise ValueError(value)
    display(HTML('cornell: ' + format_color(value, color)))
    
    value = record.vader
    color = RED if value < 0 else GREEN
    display(HTML('vader: ' + format_color(value, color)))

    value = record.my
    if value is not None:
        color = RED if value < 0 else GREEN
        display(HTML('my: ' + format_color(value, color)))
    
    print(record.text)

    
display_record(records[0])

arye cross and courteney cox star as a pair of bostonians who meet in a bar , go to the movies , fall in love , move in together , etc . 
well , if you haven't seen when harry met sally or he said , she said , or if you don't watch love & war on television , you might think this is the most inventive film to come along in ages . 
however , if you've seen any of these , than you have seen most of this film . 
this of course doesn't mean its bad . 
some of it is amusing , but overall , i just had to ask what's the point ? 
arye cross is the stereotypical single male who falls in love . 
kevin pollack is the stereotypical female-fearing best friend who make a lot of rather sexist and vulgar jokes , most if which weren't very funny . 
couteney cox is the stereotypical career-minded woman who falls in love . 
julie brown is the stereotypical bizarre best friend of said woman . 
 ( notice the frequent use of the word stereotypical . 
this film uses a lot of formula , the plot is basically kn
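`display_record` compares `record.vader` with zero directly, which fails in Python 3 if a record has not been scored yet. A possible variant, sketched here under the assumption that missing and exactly neutral values should be shown in gray, moves the color choice into one helper. `pick_color` and `GRAY` are hypothetical names; note that the original formatter shows a score of `0` in green instead.

```python
RED = 'red'
GREEN = 'green'
GRAY = 'gray'  # assumption: used for missing or neutral values


def pick_color(value):
    """Choose a display color for a label ('pos'/'neg') or a numeric score."""
    if value is None:
        return GRAY
    if isinstance(value, str):
        if value == 'pos':
            return GREEN
        if value == 'neg':
            return RED
        raise ValueError(value)
    if value == 0:
        return GRAY
    return RED if value < 0 else GREEN


def format_color(value, color):
    """Wrap a value in a colored span, as in the formatter above."""
    return '<span style="color:{color};">{value}</span>'.format(
        color=color,
        value=value
    )


html = format_color(-0.25, pick_color(-0.25))
print(html)
```
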

## Assemble our annotator
Now we can assemble the annotator using `ipyannotate`. For each review, we show the Cornell label and the VADER score, and let the user record their own judgment with the `+1`, `0` and `-1` buttons. Each click stores the button's value in the `my` field of the current record.

In [10]:
from ipyannotate.buttons import ValueButton as Button, NextButton, BackButton
from ipyannotate.toolbar import Toolbar
from ipyannotate.tasks import Task, Tasks
from ipyannotate.canvas import OutputCanvas
from ipyannotate.annotation import Annotation


def callback(button):
    annotation.tasks.current.output.my = button.value


tasks = Tasks(Task(_) for _ in records[:100])

pos = Button(1, shortcut='1', color='green')
neu = Button(0, shortcut='2', color='gray')
neg = Button(-1, shortcut='3', color='red')

for button in [pos, neu, neg]:
    button.on_click(callback)

buttons = [pos, neu, neg, BackButton(shortcut='j'), NextButton(shortcut='k')]
toolbar = Toolbar(buttons)

canvas = OutputCanvas(display=display_record)

annotation = Annotation(toolbar, tasks, canvas=canvas)
annotation

# annotation.tasks

In [12]:
annotation.tasks[:10]

[Task(output=<__main__.Record object at 0x10d085eb8>, value=1),
 Task(output=<__main__.Record object at 0x10d085ef0>, value=0),
 Task(output=<__main__.Record object at 0x10d085f28>, value=-1),
 Task(output=<__main__.Record object at 0x10d085f60>, value=1),
 Task(output=<__main__.Record object at 0x10d085f98>, value=0),
 Task(output=<__main__.Record object at 0x10d085fd0>, value=-1),
 Task(output=<__main__.Record object at 0x10d087048>, value=1),
 Task(output=<__main__.Record object at 0x10d087080>, value=0),
 Task(output=<__main__.Record object at 0x10d0870b8>, value=-1),
 Task(output=<__main__.Record object at 0x10d0870f0>, value=1)]
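
Because the callback writes each choice into the record's `my` field, the manual labels can be collected from the records themselves once annotation is done. The sketch below is one way to do that, using a `namedtuple` stand-in for `Record` and hypothetical ids and scores; `export_annotations` is an assumed helper name, not part of `ipyannotate`.

```python
import json
from collections import namedtuple

# Stand-in for the Record class above, reduced to the fields used here
Record = namedtuple('Record', ['id', 'cornell', 'vader', 'my'])

# Hypothetical records; `my` is None where no button was pressed
annotated = [
    Record('a', 'neg', -0.31, -1),
    Record('b', 'pos', 0.87, 1),
    Record('c', 'neg', 0.12, None),
]


def export_annotations(records):
    """Serialize the manually labeled records to a JSON string keyed by id."""
    labeled = {
        record.id: {
            'cornell': record.cornell,
            'vader': record.vader,
            'my': record.my,
        }
        for record in records
        if record.my is not None  # skip records that were never annotated
    }
    return json.dumps(labeled, indent=2, sort_keys=True)


exported = export_annotations(annotated)
print(exported)
```

In the notebook, the equivalent call would be `export_annotations(records[:100])`, since those are the records wrapped in tasks.
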