# Sentiment analysis
<img src="./screencast.gif"/>

In this sample, we will build a sentiment annotator for the [Movie Review](http://www.cs.cornell.edu/people/pabo/movie-review-data/) dataset from Cornell.

In [1]:
import json
import tarfile

with tarfile.open('data.json.tgz') as tar:
    file = tar.extractfile('data.json')
    data = file.read().decode('utf8')

In [2]:
data[:500]

'{"0": {"text": "in my review of \\" the spy who shagged me , \\" i postulated an unbreakable law of film physics : every time a sequel is as good as or better than the previous film in the series , it is followed by a third movie that is a bore . \\nthe cause is probably complacency ; a studio sighs with relief when part 2 lives up to expectations and figures part 3 is a sure thing . \\n \\" scream 3 \\" provides the latest proof of this rule . \\nin los angeles production has begun on \\" stab 3 : retu'

In [3]:
class Record(object):
    def __init__(self, id, text, cornell, vader=None, my=None):
        self.id = id
        self.text = text
        self.cornell = cornell
        self.vader = vader
        self.my = my
        
    def __repr__(self):
        return 'Record(id={self.id!r}, text={self.text!r}, cornell={self.cornell!r}, vader={self.vader!r}, my={self.my!r})'.format(self=self)
        

def parse(data):
    data = json.loads(data)
    for id in data:
        item = data[id]
        yield Record(
            id=id,
            text=item['text'],
            cornell=item['sent'],
        )
        
        
records = list(parse(data))

In [4]:
records[0]

Record(id='285', text=' " the blair witch project " was perhaps one of a kind , a unique film that played completely on its own merit , managing to scare even the most experienced horror fans out of their senses . \nits success made a sequel inevitable , but this is not the sequel , i suspect , anyone much wanted . \nafter the release of " the blair witch project " , tourists have practically invaded the small town of burkettsville , in order to get a glimpse of the blair witch . \nlocals have turned this mass hysteria into a great business opportunity , selling twig-sculptures , stones and dirt like those in the movie , and the exasperated local sheriff patrols the woods with a bullhorn , shouting , " get out of these woods and go home ! \nthere is no goddamned blair witch ! " . \njeff ( ) is one of those people , who has used the sudden popularity of the small town to his advantage . \nafter he got released from the mental institution , he created a mobile business that attracts thou

In [5]:
len(records)

2000
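
The `display_record` function defined later raises `ValueError` for any Cornell label other than `'pos'` or `'neg'`, so it can be worth checking the label set right after loading. Below is a minimal sketch. `label_distribution` is our own hypothetical helper, and the demo uses stand-in objects with only a `cornell` attribute. In the notebook you would call `label_distribution(records)` instead.

```python
from collections import Counter
from types import SimpleNamespace


def label_distribution(records, allowed=("pos", "neg")):
    """Count Cornell labels and fail fast on anything unexpected."""
    counts = Counter(record.cornell for record in records)
    unexpected = set(counts) - set(allowed)
    if unexpected:
        raise ValueError("unexpected labels: {}".format(sorted(unexpected)))
    return counts


# Stand-in records carrying only the attribute this helper reads.
demo = [
    SimpleNamespace(cornell="pos"),
    SimpleNamespace(cornell="neg"),
    SimpleNamespace(cornell="pos"),
]
print(label_distribution(demo))  # Counter({'pos': 2, 'neg': 1})
```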

In [6]:
import nltk
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/alexkuk/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [7]:
from tqdm.notebook import tqdm as log_progress

from nltk.sentiment.vader import SentimentIntensityAnalyzer
vader = SentimentIntensityAnalyzer()


for record in log_progress(records):
    score = vader.polarity_scores(record.text)
    # {'compound': 0.6156, 'neg': 0.074, 'pos': 0.085, 'neu': 0.842}
    record.vader = score['compound']
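
The Cornell label is binary, while the VADER `compound` score is a continuous value in [-1, 1]. One rough way to see how often the two agree is to treat a negative compound score as `'neg'` and anything else as `'pos'`, which is the same threshold `display_record` uses for coloring. This is a minimal sketch: `vader_label` and `vader_agreement` are our own hypothetical helpers, not part of NLTK. In the notebook you would call `vader_agreement(records)`.

```python
from types import SimpleNamespace


def vader_label(compound):
    """Map a VADER compound score to a Cornell-style label (threshold at 0)."""
    return "neg" if compound < 0 else "pos"


def vader_agreement(records):
    """Fraction of scored records whose thresholded VADER label matches Cornell."""
    scored = [r for r in records if r.vader is not None]
    if not scored:
        return 0.0
    hits = sum(vader_label(r.vader) == r.cornell for r in scored)
    return hits / len(scored)


demo = [
    SimpleNamespace(cornell="pos", vader=0.6156),
    SimpleNamespace(cornell="neg", vader=0.9),
    SimpleNamespace(cornell="neg", vader=-0.4),
    SimpleNamespace(cornell="pos", vader=None),  # not scored yet, so skipped
]
print(vader_agreement(demo))  # 0.6666666666666666
```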

In [8]:
records[1]

Record(id='1243', text='this is the last carry on film with its almost intact regular cast and it is the swansong of hattie jacques and sid james . \ndick turpin ( sid james ) and his gang which includes harriett/harry ( barbara windsor ) and tom " doc " scholl ( peter butterworth ) , terrorise the countryside by staging highway robberies : " stand and deliver ! " \nowing to the increased occurrence of turpin\'s robberies , captain desmond fancey of the bow street runners ( kenneth williams ) and his sidekick sergeant jock strapp ( jack douglas ) visit turpin\'s area of influence to bring him to justice . \nthey are under the express orders of sir roger daley ( bernard bresslaw ) . \ntheir intellect does not count for much and when they increasingly become suspicious of reverend flasher aka dick turpin ( sid james ) , whom they confided in earlier , they still cannot believe that the rector has any part in these robberies . \nhowever , once they catch harriett and put her in jail , and

Next, we write a small display function that color-codes each label (red for negative, green for positive) so the output is easier to read:

In [14]:
from IPython.display import display, HTML


RED = 'red'
GREEN = 'green'


def format_color(value, color):
    return '<span style="color:{color};">{value}</span>'.format(
        color=color,
        value=value
    )


def display_record(record):
    value = record.cornell
    if value == 'neg':
        color = RED
    elif value == 'pos':
        color = GREEN
    else:
        raise ValueError(value)
    display(HTML('cornell: ' + format_color(value, color)))
    
    value = record.vader
    color = RED if value < 0 else GREEN
    display(HTML('vader: ' + format_color(value, color)))

    value = record.my
    if value is not None:
        color = RED if value < 0 else GREEN
        display(HTML('my: ' + format_color(value, color)))
    
    print(record.text)

    
display_record(records[0])

 " the blair witch project " was perhaps one of a kind , a unique film that played completely on its own merit , managing to scare even the most experienced horror fans out of their senses . 
its success made a sequel inevitable , but this is not the sequel , i suspect , anyone much wanted . 
after the release of " the blair witch project " , tourists have practically invaded the small town of burkettsville , in order to get a glimpse of the blair witch . 
locals have turned this mass hysteria into a great business opportunity , selling twig-sculptures , stones and dirt like those in the movie , and the exasperated local sheriff patrols the woods with a bullhorn , shouting , " get out of these woods and go home ! 
there is no goddamned blair witch ! " . 
jeff ( ) is one of those people , who has used the sudden popularity of the small town to his advantage . 
after he got released from the mental institution , he created a mobile business that attracts thousands of customers through th
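
`display_record` applies the same value-to-color rule in three places and can only be checked inside a notebook. If you want to test that rule on its own, one option is to factor it into a pure function that returns the HTML string instead of displaying it. This is a refactoring sketch, and `color_for` and `record_html` are our own hypothetical names. Unlike `display_record`, it skips any field that is still `None` rather than failing on the comparison.

```python
from types import SimpleNamespace

RED = 'red'
GREEN = 'green'


def format_color(value, color):
    return '<span style="color:{color};">{value}</span>'.format(
        color=color,
        value=value
    )


def color_for(value):
    """Map a Cornell label ('pos'/'neg') or a numeric score to a display color."""
    if value == 'neg':
        return RED
    if value == 'pos':
        return GREEN
    if isinstance(value, (int, float)):
        return RED if value < 0 else GREEN
    raise ValueError(value)


def record_html(record):
    """Return the label lines display_record shows, joined into one HTML string."""
    lines = []
    for name in ('cornell', 'vader', 'my'):
        value = getattr(record, name)
        if value is None:
            continue  # e.g. 'my' before the record has been annotated
        lines.append('{}: {}'.format(name, format_color(value, color_for(value))))
    return '<br>'.join(lines)


demo = SimpleNamespace(cornell='pos', vader=-0.25, my=None)
print(record_html(demo))
```

In the notebook, `display(HTML(record_html(records[0])))` would render the same colored labels, followed by `print(records[0].text)` for the review itself.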

## Assemble our annotator
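Now we can assemble the annotator with `ipyannotate`. Each task shows one review together with its Cornell label and VADER score, and lets the user override the model's sentiment with the `1` (positive), `0` (neutral) and `-1` (negative) buttons. The keyboard shortcuts are `1`, `2` and `3` for the value buttons, and `j` and `k` to move back and forward. The chosen value is recorded on the task, and `callback` also copies it into the `my` field of the current task's record.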

In [15]:
from ipyannotate.buttons import ValueButton as Button, NextButton, BackButton
from ipyannotate.toolbar import Toolbar
from ipyannotate.tasks import Task, Tasks
from ipyannotate.canvas import OutputCanvas
from ipyannotate.annotation import Annotation


def callback(button):
    annotation.tasks.current.output.my = button.value


tasks = Tasks(Task(record) for record in records[:100])

pos = Button(1, shortcut='1', color='green')
neu = Button(0, shortcut='2', color='gray')
neg = Button(-1, shortcut='3', color='red')

for button in [pos, neu, neg]:
    button.on_click(callback)

buttons = [pos, neu, neg, BackButton(shortcut='j'), NextButton(shortcut='k')]
toolbar = Toolbar(buttons)

canvas = OutputCanvas(display=display_record)

annotation = Annotation(toolbar, tasks, canvas=canvas)
annotation
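
Once some tasks have been annotated, `callback` has written `1`, `0` or `-1` into the `my` field of each record the user labeled. Below is a minimal sketch for summarizing those judgments against the Cornell labels. It reads only the records themselves, not any `ipyannotate` API. `summarize_annotations` is a hypothetical helper, and treating `0` as "neutral, no Cornell counterpart" is our assumption. In the notebook you would call `summarize_annotations(records[:100])`.

```python
from collections import Counter
from types import SimpleNamespace

# Assumed mapping from the button values above to Cornell-style labels;
# 0 ("neutral") has no Cornell counterpart.
BUTTON_TO_LABEL = {1: "pos", -1: "neg", 0: None}


def summarize_annotations(records):
    """Count records by how the manual label relates to the Cornell label."""
    summary = Counter()
    for record in records:
        if record.my is None:
            summary["unannotated"] += 1
            continue
        label = BUTTON_TO_LABEL[record.my]
        if label is None:
            summary["neutral"] += 1
        elif label == record.cornell:
            summary["agree"] += 1
        else:
            summary["disagree"] += 1
    return summary


demo = [
    SimpleNamespace(cornell="pos", my=1),
    SimpleNamespace(cornell="neg", my=1),
    SimpleNamespace(cornell="neg", my=0),
    SimpleNamespace(cornell="pos", my=None),
]
print(summarize_annotations(demo))
# Counter({'agree': 1, 'disagree': 1, 'neutral': 1, 'unannotated': 1})
```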


In [16]:
annotation.tasks[:10]

[Task(output=Record(id='285', text=' " the blair witch project " was perhaps one of a kind , ..., value=1),
 Task(output=Record(id='1243', text='this is the last carry on film with its almost intact re..., value=0),
 Task(output=Record(id='551', text="you've got mail works alot better than it deserves to . \..., value=1),
 Task(output=Record(id='915', text='a slight romantic comedy with a feminist bent , but one w..., value=0),
 Task(output=Record(id='1870', text='scream 2 has a titillating little scene that lays down t..., value=1),
 Task(output=Record(id='510', text='quiz show , an almost perfectly accurate true story , is ..., value=0),
 Task(output=Record(id='109', text='i must admit i\'m going to be a bit biased in my review o..., value=None),
 Task(output=Record(id='1375', text="ahh yes . \nthe teenage romance . \nan attractive young ..., value=None),
 Task(output=Record(id='541', text="let me start off by saying that leading up to the release..., value=None),
 Task(output=Record