## Example notebook
This notebook shows three ways to use the spaCy annotator:

- annotation without a spaCy model
- annotation with a spaCy model
- annotation with a spaCy model and an EntityRuler
    
Enjoy :)

In [1]:
# python -m pip install -e .

In [2]:
import sys
sys.path.append('../')

In [3]:
import pandas as pd
import spacy_annotator as spa

## Import data

In [4]:
df = pd.DataFrame({
    "text": [
        "New york is lovely, Milan is nice, but london is amazing!",
        "Stockholm is too cold. Ingrid Bergman says so."
    ]})

df

Unnamed: 0,text
0,"New york is lovely, Milan is nice, but london ..."
1,Stockholm is too cold. Ingrid Bergman says so.


## Annotation _without_ spaCy model
Basic usage of the spaCy annotator: the user enters the entities for each label manually.

In [5]:
annotator = spa.Annotator(labels=["GPE", "PERSON"])

In [6]:
annotator.instructions


            Instructions

            For each entity type, input must be a DELIMITER separated string. 

            If no entities in text, leave as is and press submit.
            Similarly, if no entities for a particular label, leave as is. 

            Buttons: 

            	 * submit inserts new annotation (or overwrites existing one if one is present). 

            	 * skip moves forward and leaves empty string (or existing annotation if one is present). 

            	 * finish terminates the annotation session.
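
The instructions above describe each label's input as a delimiter-separated string. As a rough sketch of how such a string might be turned into character-offset spans, the snippet below splits the input and locates each entity in the text. The helper name `find_entity_spans`, the comma delimiter, and the `(start, end, label)` output shape are assumptions for illustration, not the library's actual implementation.

```python
import re

def find_entity_spans(text, user_input, label, delimiter=","):
    """Hypothetical helper: map a delimiter-separated entity string to (start, end, label) spans."""
    spans = []
    # Split on the delimiter and drop surrounding whitespace and empty items.
    entities = [e.strip() for e in user_input.split(delimiter) if e.strip()]
    for ent in entities:
        # re.escape keeps punctuation in the entity from being read as regex syntax.
        for match in re.finditer(re.escape(ent), text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)

text = "Stockholm is too cold. Ingrid Bergman says so."
print(find_entity_spans(text, "Stockholm", "GPE"))
print(find_entity_spans(text, "Ingrid Bergman", "PERSON"))
```

An empty input string yields an empty list, which mirrors the "leave as is" case in the instructions.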
            


In [7]:
df_labels = annotator.annotate(df=df, col_text="text")

HTML(value='-1 examples annotated, 3 examples left')

Text(value='', description='GPE', layout=Layout(width='auto'), placeholder='ent one, ent two, ent three')

Text(value='', description='PERSON', layout=Layout(width='auto'), placeholder='ent one, ent two, ent three')

HBox(children=(Button(button_style='success', description='submit', style=ButtonStyle()), Button(button_style=…

Output()

### Inspect output

In [8]:
df_labels

Unnamed: 0,text,annotations
0,"New york is lovely, Milan is nice, but london ...",
1,Stockholm is too cold. Ingrid Bergman says so.,


## Annotation _with_ spaCy model
Use a small, medium, or large spaCy model, or even a transformer model, to label your data.

In [9]:
import spacy

In [10]:
nlp = spacy.load("en_core_web_trf")

In [11]:
annotator = spa.Annotator(labels=["GPE", "PERSON"], model=nlp)

In [12]:
df_labels = annotator.annotate(df=df, col_text="text", shuffle=True)

HTML(value='-1 examples annotated, 3 examples left')

Text(value='', description='GPE', layout=Layout(width='auto'), placeholder='ent one, ent two, ent three')

Text(value='', description='PERSON', layout=Layout(width='auto'), placeholder='ent one, ent two, ent three')

HBox(children=(Button(button_style='success', description='submit', style=ButtonStyle()), Button(button_style=…

Output()

### Inspect output

In [13]:
df_labels

Unnamed: 0,text,annotations
0,"New york is lovely, Milan is nice, but london ...",
1,Stockholm is too cold. Ingrid Bergman says so.,


## Annotation _with_ spaCy model _and_ EntityRuler
Combine a spaCy model with EntityRuler patterns to label entities that even a large model might miss.

In [14]:
patterns = [
    {"label": "GPE", "pattern": "london"}, # this one isn't picked up by "ner"
    {"label": "GPE", "pattern": "Stockholm"},
    {"label": "PERSON", "pattern": "Humphrey Bogart"},
]

In [15]:
ruler = nlp.add_pipe("entity_ruler", config={"phrase_matcher_attr": "LOWER"}, before="ner")
ruler.add_patterns(patterns)
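
Setting `phrase_matcher_attr` to `"LOWER"` makes the ruler compare lowercased token text, so the pattern `london` also covers `London`. The snippet below is a simplified, standard-library stand-in for that idea, not spaCy's implementation: it matches whole phrases case-insensitively with a regular expression. The function name `match_patterns` is hypothetical.

```python
import re

patterns = [
    {"label": "GPE", "pattern": "london"},
    {"label": "GPE", "pattern": "Stockholm"},
    {"label": "PERSON", "pattern": "Humphrey Bogart"},
]

def match_patterns(text, patterns):
    """Hypothetical helper: return (matched_text, label) pairs, ignoring case."""
    found = []
    for p in patterns:
        # \b anchors keep "london" from matching inside a longer word.
        regex = re.compile(r"\b" + re.escape(p["pattern"]) + r"\b", re.IGNORECASE)
        for m in regex.finditer(text):
            found.append((m.start(), m.group(), p["label"]))
    # Sort by position so results follow reading order.
    return [(matched, label) for _, matched, label in sorted(found)]

print(match_patterns("New york is lovely, Milan is nice, but london is amazing!", patterns))
print(match_patterns("Stockholm is too cold. Ingrid Bergman says so.", patterns))
```

Neither sample text contains "Humphrey Bogart", so that pattern yields no match, and "Ingrid Bergman" is not covered by any pattern, so it would have to come from the model's `ner` component.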



In [16]:
annotator = spa.Annotator(labels=["GPE", "PERSON"], model=nlp)

In [21]:
df_labels = annotator.annotate(df=df, col_text="text", shuffle=True)

HTML(value='-1 examples annotated, 3 examples left')

Text(value='', description='GPE', layout=Layout(width='auto'), placeholder='ent one, ent two, ent three')

Text(value='', description='PERSON', layout=Layout(width='auto'), placeholder='ent one, ent two, ent three')

HBox(children=(Button(button_style='success', description='submit', style=ButtonStyle()), Button(button_style=…

Output()

### Inspect output and save the DataFrame of annotations in .spacy format for training a spaCy 3 pipeline

In [22]:
df_labels

Unnamed: 0,text,annotations
0,Stockholm is too cold. Ingrid Bergman says so.,(Stockholm is too cold. Ingrid Bergman says so...
1,"New york is lovely, Milan is nice, but london ...","(New york is lovely, Milan is nice, but london..."


In [25]:
# saves to current working directory with the default name 'annotations.spacy'
annotator.to_spacy(df_labels)



Spacy file saved to: spacy_labels.spacy


<spacy.tokens._serialize.DocBin at 0x1cb038637f0>

In [None]:
# saves to current working directory
annotator.to_spacy(df_labels, "spacy_labels.spacy")

In [None]:
# saves to a specified directory
annotator.to_spacy(df_labels, r"C:\pick_your_directory")  # raw string keeps the backslash literal
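
When saving into a specific directory, building the path with `pathlib` sidesteps backslash-escaping pitfalls and works across operating systems. The sketch below assumes `to_spacy` accepts a full file path as its second argument, as in the cell that passes `"spacy_labels.spacy"`. The helper `build_output_path` and the directory name `annotation_output` are placeholders for illustration.

```python
import tempfile
from pathlib import Path

def build_output_path(directory, filename="spacy_labels.spacy"):
    """Hypothetical helper: create the directory if needed and return the full file path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / filename

# The system temp directory is used here only so the sketch runs anywhere.
out_path = build_output_path(Path(tempfile.gettempdir()) / "annotation_output")
print(out_path)

# Assumed usage with the annotator from the cells above:
# annotator.to_spacy(df_labels, str(out_path))
```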

In [None]:
#fin