# HUMANNOTATOR
*Example notebook*  
  
Build easy custom annotators for your Jupyter/pandas workflow!

In [1]:
import sys
sys.path.insert(0, '../')
from humannotator import Annotator, task_factory, load_data
import pandas as pd

---
### Load the data
You can pass a `list`, `dict`, `Series` or `DataFrame` object into the Annotator.  
Here we will load a dataframe.

In [2]:
df = pd.read_csv('news.csv', index_col=0)
data = load_data(df, item_cols=['title', 'date'], id_col='news_id')

### Set up some tasks
We can create tasks using the `task_factory`.  
We then instantiate the annotator by passing it our annotation tasks and data.

In [3]:
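# Choices for the first task; the annotations table stores the labels (the dict values)
choices = {
    '0': 'not adverse media',
    '1': 'adverse media',
    '3': 'exclude from dataset',
}
instruct = "Is the topic political?"
task1 = task_factory(choices, 'Adverse media')
# A boolean task with its own instruction; nullable=True lets the answer stay empty
task2 = task_factory('bool', 'Political', instruction=instruct, nullable=True)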

annotator = Annotator([task1, task2], data)

---
### Run the annotator by calling it
The annotator keeps track of your progress.  
Pass the annotator a list of ids if you only want to annotate specific records.  
You can exit the annotator at any time; it will continue where you left off when you run it again.

In [4]:
annotator()
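The resume behaviour described above can be pictured with a small sketch. This is not humannotator's implementation: `pending_ids` and its arguments are hypothetical names, and the sketch only assumes that the annotator skips ids that already have an annotation and optionally restricts itself to a list of ids.

```python
def pending_ids(all_ids, annotated, ids=None):
    """Return the ids that still need an annotation, in their original order.

    Hypothetical helper: `all_ids` are the record ids in the data,
    `annotated` holds the ids that were already annotated, and `ids`
    optionally restricts the run to specific records.
    """
    wanted = None if ids is None else set(ids)
    return [
        i for i in all_ids
        if (wanted is None or i in wanted) and i not in annotated
    ]


all_ids = ["a", "b", "c", "d"]
done = {"a", "c"}

print(pending_ids(all_ids, done))              # ['b', 'd']
print(pending_ids(all_ids, done, ids=["c", "d"]))  # ['d']
```

Running the function again after more ids are added to `done` returns only the remainder, which is the sense in which a run picks up where the previous one stopped.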

We can use the highlighter to highlight specific phrases.  
Pass the phrases as a keyword argument to the annotator call to do so (`phrases` in the cell below).  
Alternatively, we could have passed the same argument when instantiating the annotator.

In [5]:
annotator(phrases=['Breitbart', 'media'], flags=2)
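To make the `phrases` and `flags` arguments more concrete, here is a minimal sketch of phrase highlighting with the standard `re` module. It assumes that `flags` is forwarded to a regular-expression match, which the notebook does not confirm; `highlight` is a hypothetical helper, not part of humannotator. What is certain is that `re.IGNORECASE` has the integer value 2, so `flags=2` would mean case-insensitive matching under that assumption.

```python
import re


def highlight(text, phrases, flags=0):
    """Wrap every occurrence of the given phrases in ** markers (hypothetical helper)."""
    if isinstance(phrases, str):
        phrases = [phrases]
    # Escape each phrase so it is matched literally, then join them into one pattern
    pattern = re.compile("|".join(re.escape(p) for p in phrases), flags)
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)


# re.IGNORECASE is the integer 2, so flags=2 asks for case-insensitive matching
assert int(re.IGNORECASE) == 2

print(highlight("Breitbart and other MEDIA outlets", ["Breitbart", "media"], flags=2))
# **Breitbart** and other **MEDIA** outlets
```

A single string works as well as a list here, mirroring the `phrases='drone'` call later in the notebook.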

---
### Access your annotations
The annotations are stored in a dataframe.

In [6]:
annotator.annotated

| | Adverse media | Political | timestamp |
|---|---|---|---|
| 052632_2015-02-28 | not adverse media | True | 2019-09-05 01:06:57.401941 |
| 071607_2016-12-12 | adverse media | False | 2019-09-05 01:06:59.951821 |
| 141694_2016-02-10 | exclude from dataset | True | 2019-09-05 01:07:02.669965 |
| 137157_2017-02-09 | exclude from dataset | True | 2019-09-05 01:07:11.014806 |
| 034187_2016-09-27 | not adverse media | True | 2019-09-05 01:07:14.744201 |
| 018678_2017-04-23 | not adverse media | True | 2019-09-05 01:07:18.130129 |
| 120386_2016-11-14 | exclude from dataset | True | 2019-09-05 01:07:21.944958 |
| 135236_2016-11-10 | not adverse media | | 2019-09-05 01:07:29.129704 |
| 184514_2017-03-17 | adverse media | False | 2019-09-05 01:07:33.138284 |
| 106098_2017-06-02 | exclude from dataset | | 2019-09-05 01:07:39.384583 |


### Merge your annotations with the data

In [7]:
annotator.merged()

| | DATA | DATA | DATA | ANNOTATIONS | ANNOTATIONS | ANNOTATIONS |
|---|---|---|---|---|---|---|
| | title | date | text | Adverse media | Political | timestamp |
| 052632_2015-02-28 | Rand Paul wins 2015 CPAC straw poll | 2015-02-28 | [Washington (CNN)Sen. Rand Paul won the Conser... | not adverse media | True | 2019-09-05 01:06:57.401941 |
| 071607_2016-12-12 | Can Singing Mice Reveal the Roots of Human Spe... | 2016-12-12 | [One chilly day in February 1877, a British co... | adverse media | False | 2019-09-05 01:06:59.951821 |
| 141694_2016-02-10 | Dollar hits 15-month low against yen after Yel... | 2016-02-10 | The dollar fell to a 15-month low against the... | exclude from dataset | True | 2019-09-05 01:07:02.669965 |
| 137157_2017-02-09 | Trump's Supreme Court pick dispirited by presi... | 2017-02-09 | Donald Trump's Supreme Court nominee, Neil Go... | exclude from dataset | True | 2019-09-05 01:07:11.014806 |
| 034187_2016-09-27 | FULL TEXT: 10 Things Milo Hates About Islam - ... | 2016-09-27 | I’m Milo Yiannopoulos, thank you for coming. T... | not adverse media | True | 2019-09-05 01:07:14.744201 |
| 018678_2017-04-23 | 5 Border Horrors Establishment Media Mostly Ig... | 2017-04-23 | The brutality that comes from the open border ... | not adverse media | True | 2019-09-05 01:07:18.130129 |
| 120386_2016-11-14 | Crew members injured as plane avoids near coll... | 2016-11-14 | A Canadian airliner with 54 passengers on boar... | exclude from dataset | True | 2019-09-05 01:07:21.944958 |
| 135236_2016-11-10 | Bodies Of Missing Married Couple Found On Susp... | 2016-11-10 | [The bodies of two more presumed victims of To... | not adverse media | | 2019-09-05 01:07:29.129704 |
| 184514_2017-03-17 | 350 Square Feet, Two Kids, Two Cats and a Rabb... | 2017-03-17 | Maligned though New York’s rental market may b... | adverse media | False | 2019-09-05 01:07:33.138284 |
| 106098_2017-06-02 | CDC warns about deadly mushrooms amid surge in... | 2017-06-02 | Dangerous wild “death cap” mushrooms in Califo... | exclude from dataset | | 2019-09-05 01:07:39.384583 |
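The two-level column layout of the merged output (a `DATA` group next to an `ANNOTATIONS` group) can be reproduced in plain pandas. The sketch below is one way to build such a table and is not necessarily how `merged()` works internally. The frames are toy stand-ins, and the second row (`example_id`, "A made-up headline") is invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the loaded data and the annotations table (values are illustrative)
data = pd.DataFrame(
    {
        "title": ["Rand Paul wins 2015 CPAC straw poll", "A made-up headline"],
        "date": ["2015-02-28", "2016-01-01"],
    },
    index=["052632_2015-02-28", "example_id"],
)
annotations = pd.DataFrame(
    {"Adverse media": ["not adverse media"], "Political": [True]},
    index=["052632_2015-02-28"],
)

# keys= adds a top column level; join="inner" keeps only ids present in both frames
merged = pd.concat(
    [data, annotations], axis=1, keys=["DATA", "ANNOTATIONS"], join="inner"
)

print(merged.columns.tolist())
print(merged.index.tolist())
```

With an inner join only annotated records remain, which matches the output above, where every row carries an annotation.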


### Save and load your data

In [9]:
annotator.save('annotator.pkl')

In [10]:
annotator2 = Annotator.load('annotator.pkl')

We can access our annotations:

In [11]:
annotator2.annotated

| | Adverse media | Political | timestamp |
|---|---|---|---|
| 052632_2015-02-28 | not adverse media | True | 2019-09-04 00:23:43.430301 |
| 071607_2016-12-12 | exclude from dataset | | 2019-09-04 00:23:48.755016 |
| 141694_2016-02-10 | not adverse media | False | 2019-09-04 00:23:57.141184 |
| 137157_2017-02-09 | not adverse media | True | 2019-09-04 00:24:01.263520 |
| 034187_2016-09-27 | adverse media | False | 2019-09-04 00:24:05.898087 |
| 018678_2017-04-23 | adverse media | False | 2019-09-04 00:24:11.623225 |


But when we try to access the data something unexpected happens:

In [13]:
annotator2.data

NO DATA LOADED
Load the data first by assigning it to the `data` property of the annotator.


By default, humannotator does not store the data when the annotator is pickled.  
After unpickling the annotator, we need to assign the data again before it can be used:

In [14]:
annotator2.data = data
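One common way to get this behaviour in Python is to drop an attribute in `__getstate__` before pickling. The sketch below illustrates the idea with a hypothetical `TinyAnnotator` class; it is not humannotator's actual code, and the placement of `save_data` on the constructor is an assumption made for the example.

```python
import pickle


class TinyAnnotator:
    """Hypothetical stand-in for an annotator; not humannotator's actual class."""

    def __init__(self, data=None, save_data=False):
        self.data = data
        self.save_data = save_data
        self.annotated = {}

    def __getstate__(self):
        # Copy the instance state and drop the data unless asked to keep it
        state = self.__dict__.copy()
        if not self.save_data:
            state["data"] = None
        return state


original = TinyAnnotator(data={"052632_2015-02-28": "Rand Paul wins 2015 CPAC straw poll"})
original.annotated["052632_2015-02-28"] = "not adverse media"

restored = pickle.loads(pickle.dumps(original))
print(restored.annotated)  # the annotations survive the round trip
print(restored.data)       # None: the data was left out of the pickle

# Assign the data again, as in the cell above
restored.data = original.data
```

Leaving the data out keeps the pickle small and avoids storing a second copy of a dataset that already lives elsewhere.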

Now we can continue where we left off:

In [16]:
annotator2(phrases='drone')

If you do want to save the data together with the annotator, set the `save_data` flag to `True`.