# Annotating News Headlines
<img src="./screencast.gif"/>

We will load a sample set of news headlines, tag parts of speech with `spaCy`, and build an annotator for marking records that need a second check.

In [1]:
import json
from random import seed, sample


# data.json holds a list of records, each with a 'text' field
with open('data.json') as file:
    headlines = [record['text'] for record in json.load(file)]


seed(7)
sample(headlines, 10)

['Daily Report: Narendra Modi, Indian Prime Minister, Conquers Silicon Valley',
 'Tech Recruiting Clashes With Immigration Rules',
 'Start-Up Fervor Shifts to Energy in Silicon Valley',
 'To Survive, Net Start-Ups Slow Their Metabolism',
 'As New Zealand Courts Tech Talent, Isolation Becomes a Draw',
 'Wall Street and Silicon Valley Form an Uneasy Alliance',
 'Investing Early On for Insights, Not Profits',
 'A Determined Outpost of Tiny Technology',
 "As Silicon Valley Cheers Yahoo Chief, Wall Street's Reaction Is Muted",
 'Airbnb and Others Set Terms for Employees to Cash Out']
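
Calling `seed(7)` fixes the state of the global random generator, which is why the sample above is repeatable. As a minimal sketch of an alternative, a local `random.Random` instance gives the same repeatability without touching global state. The short `items` list below is an illustrative stand-in for `headlines`.

```python
from random import Random

# Illustrative stand-in for the full list of headlines.
items = [
    'Tech Recruiting Clashes With Immigration Rules',
    'Start-Up Fervor Shifts to Energy in Silicon Valley',
    'To Survive, Net Start-Ups Slow Their Metabolism',
    'A Determined Outpost of Tiny Technology',
    'Investing Early On for Insights, Not Profits',
]

# Two generators built from the same seed draw the same sample.
first = Random(7).sample(items, 3)
second = Random(7).sample(items, 3)

print(first == second)  # True
```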

We will use the small English model, `en_core_web_sm`, for portability.

In [2]:
# pip install spacy
import spacy


# python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

In [3]:
# nlp.pipe processes the texts as a stream, which is usually faster
# than calling nlp on each headline in turn
docs = list(nlp.pipe(headlines))

In [4]:
doc = docs[0]
list(doc)

[Uber,
 ’s,
 Lesson,
 :,
 Silicon,
 Valley,
 ’s,
 Start,
 -,
 Up,
 Machine,
 Needs,
 Fixing]

In [5]:
word = doc[2]
word, word.idx

(Lesson, 7)
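
Note that `idx` is the character offset of the token within the headline text, not its position in the token list (which is 2 here). A minimal check in plain Python, with no spaCy involved, using the text of the first headline:

```python
# Text of the first headline, copied from the notebook output.
text = 'Uber’s Lesson: Silicon Valley’s Start-Up Machine Needs Fixing'

start = text.index('Lesson')  # character offset, the same quantity as word.idx
stop = start + len('Lesson')  # exclusive end offset

print(start, stop, text[start:stop])  # 7 13 Lesson
```

These start and stop offsets are what a character-level highlighter needs in order to mark a word in the original string.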

In [6]:
word.pos_

'PROPN'

Using `ipymarkup`, we will display the words and their corresponding parts of speech.

In [7]:
from IPython.display import display
# pip install ipymarkup
from ipymarkup import BoxLabelMarkup as Markup, Span


def display_doc(doc):
    # One span per token: start offset, end offset, part-of-speech label
    spans = [
        Span(token.idx, token.idx + len(token), token.pos_)
        for token in doc
    ]
    markup = Markup(doc.text, spans)
    display(markup)


display_doc(docs[0])
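
Labelling every token can get noisy on longer headlines. As a sketch of one possible variation, the helper below builds the same `(start, stop, label)` triples but keeps only selected tags. The names `pos_spans`, `keep`, and `FakeToken` are hypothetical, and the tags on the fake tokens are made up for illustration rather than taken from model output. It assumes only that each token exposes `text`, `idx`, and `pos_`, as spaCy tokens do.

```python
from collections import namedtuple

# Stand-in for a spaCy token so the sketch runs without a model;
# real tokens expose the same three attributes.
FakeToken = namedtuple('FakeToken', ['text', 'idx', 'pos_'])


def pos_spans(tokens, keep=None):
    """Return (start, stop, label) triples, optionally limited to tags in `keep`."""
    return [
        (token.idx, token.idx + len(token.text), token.pos_)
        for token in tokens
        if keep is None or token.pos_ in keep
    ]


# Illustrative tags only, not the output of en_core_web_sm.
tokens = [
    FakeToken('Tech', 0, 'NOUN'),
    FakeToken('Recruiting', 5, 'NOUN'),
    FakeToken('Clashes', 16, 'VERB'),
]

print(pos_spans(tokens, keep={'VERB'}))  # [(16, 23, 'VERB')]
```

With real docs, each triple could be wrapped as `Span(start, stop, label)` before being handed to `Markup`, in the same way `display_doc` does.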

## Assemble our annotator
Now we can assemble our checker using `ipyannotate`. For this task, `Ok` and `Check` options are enough, but `ipyannotate` is flexible enough to support more complex annotators.

In [8]:
from ipyannotate import annotate


annotation = annotate(docs, display=display_doc)
annotation

In [9]:
annotation.tasks[:10]

[Task(output=Uber’s Lesson: Silicon Valley’s Start-Up Machine Needs Fixing, value=None),
 Task(output=Pearl Automation, Founded by Apple Veterans, Shuts Down, value=None),
 Task(output=How Silicon Valley Pushed Coding Into American Classrooms, value=None),
 Task(output=Women in Tech Speak Frankly on Culture of Harassment, value=None),
 Task(output=Silicon Valley Investors Flexed Their Muscles in Uber Fight, value=None),
 Task(output=Uber is a Creature of an Industry Struggling to Grow Up, value=None),
 Task(output=‘The Internet Is Broken’: @ev Is Trying to Salvage It, value=None),
 Task(output=The South Park Commons Fills a Hole in the Tech Landscape, value=None),
 Task(output=The Closing of the Republican Mind, value=None),
 Task(output=Writers From the Right and Left on Trump Jr., the Future of the F.B.I., Health C..., value=None)]
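
Each task pairs an `output` with a `value`, and the value stays `None` until the record is annotated. Once some records are labelled, the ones needing a second check can be pulled out with a simple filter. The sketch below is an assumption-laden illustration: the `Task` namedtuple is a stand-in that mirrors the two fields shown in the repr above, the helper names are hypothetical, the label `'Check'` is an assumed stored value, and the labels assigned to the sample tasks are invented.

```python
from collections import namedtuple

# Stand-in for ipyannotate's task objects, mirroring the two fields
# visible in the repr above: `output` and `value`.
Task = namedtuple('Task', ['output', 'value'])


def flagged(tasks, label='Check'):
    """Return outputs of tasks whose value equals `label` (assumed button value)."""
    return [task.output for task in tasks if task.value == label]


def unlabeled(tasks):
    """Return outputs of tasks that have not been annotated yet."""
    return [task.output for task in tasks if task.value is None]


# Invented labels, for illustration only.
tasks = [
    Task('The Closing of the Republican Mind', 'Ok'),
    Task('Women in Tech Speak Frankly on Culture of Harassment', 'Check'),
    Task('How Silicon Valley Pushed Coding Into American Classrooms', None),
]

print(flagged(tasks))
print(unlabeled(tasks))
```

In the notebook, the same filters could be applied to `annotation.tasks`, after confirming what value the `Check` button actually stores.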