# 1. Basics of NER

Named Entity Recognition (NER) is a subtask of information extraction that classifies named entities into pre-defined categories such as persons, organizations, and locations.

spaCy features an extremely fast statistical entity recognition system that assigns labels to contiguous spans of tokens.

The default model identifies a variety of named and numeric entities, including companies, locations, organizations, and products.


In [34]:
# Official documentation
# https://spacy.io/usage/linguistic-features/#named-entities

In [35]:
import spacy
nlp = spacy.load(name='en_core_web_sm')

In [36]:
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

In [37]:
doc.ents

(Apple, U.K., $1 billion)

In [38]:
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_, str(spacy.explain(ent.label_)))

Apple 0 5 ORG Companies, agencies, institutions, etc.
U.K. 27 31 GPE Countries, cities, states
$1 billion 44 54 MONEY Monetary values, including unit
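
The `start_char` and `end_char` values printed above are character offsets into the original string, and the end offset is exclusive. As a minimal sketch in plain Python (no spaCy needed; the triples are simply copied from the output above), slicing the text with those offsets recovers each entity:

```python
text = "Apple is looking at buying U.K. startup for $1 billion"

# (start_char, end_char, label) triples copied from the printed output above
offsets = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]

# Slicing with [start:end] returns exactly the entity text
recovered = [(text[start:end], label) for start, end, label in offsets]
print(recovered)
```

This is handy when entity positions have to be mapped back onto the raw text, for example to highlight them in another tool.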


In [39]:
doc_2 = nlp("San Francisco considers banning sidewalk delivery robots")

In [40]:
for ent in doc_2.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_, str(spacy.explain(ent.label_)))

San Francisco 0 13 GPE Countries, cities, states


# 2. Adding Named Entity to Span

In [43]:
doc_3 = nlp("My_own_company is hiring a new vice president in U.S.")

In [44]:
for ent in doc_3.ents:
    print(ent.text, ent.label_, str(spacy.explain(ent.label_)))

# The model does not recognize My_own_company at this point.
# The following cells show how to add it manually.

U.S. GPE Countries, cities, states


In [45]:
from spacy.tokens import Span

In [46]:
# Get the hash value of the ORG entity label
ORG = doc_3.vocab.strings['ORG']
print(ORG)

383
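
The lookup above works in both directions: a string maps to an integer ID, and that integer maps back to the string. The sketch below illustrates this with a standalone `StringStore` (an assumption made here so that no trained model is needed; in the notebook the store is `doc_3.vocab.strings`):

```python
from spacy.strings import StringStore

store = StringStore()          # standalone store, used here for illustration only
org_id = store.add("ORG")      # add() returns the integer ID for the string
round_trip = store[org_id]     # looking up the integer gives the string back
print(org_id, round_trip)
```

The exact integer may depend on the spaCy version, so it is safer to look the ID up than to hard-code it.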


In [47]:
# Create a Span for the new entity
new_ent = Span(doc_3, 0, 1, label=ORG)
# Token indices 0 to 1 (the end index 1 is excluded), i.e. the first token only

In [48]:
# Add the entity to the existing Doc object
doc_3.ents = list(doc_3.ents) + [new_ent]

In [49]:
for ent in doc_3.ents:
    print(ent.text, ent.label_, str(spacy.explain(ent.label_)))

My_own_company ORG Companies, agencies, institutions, etc.
U.S. GPE Countries, cities, states
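
The same steps can be sketched without a trained model. The example below assumes a spaCy version whose `Span` constructor accepts the label as a plain string (so the hash lookup is skipped), and it builds the `Doc` from a hand-written word list (a shortened version of the sentence above) so that the token indices are known in advance:

```python
import spacy
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")  # blank pipeline: no trained model required

# Pre-tokenized input, so token 0 is guaranteed to be "My_own_company"
words = ["My_own_company", "is", "hiring", "a", "new", "vice", "president"]
doc = Doc(nlp.vocab, words=words)

# Token span [0, 1): the first token only, labelled as an organization
new_ent = Span(doc, 0, 1, label="ORG")

# Append to the existing entities instead of overwriting them
doc.ents = list(doc.ents) + [new_ent]

labelled = [(ent.text, ent.label_) for ent in doc.ents]
print(labelled)
```

Note that assigning to `doc.ents` raises an error if spans overlap, so a new entity must not cover tokens that already belong to another entity.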


# 3. Visualizing Named Entities

In [50]:
from spacy import displacy

In [51]:
displacy.render(docs=doc_3, style='ent', jupyter=True)

In [52]:
# Highlighting just certain entities

displacy.render(docs=doc_3, style='ent', jupyter=True, options={'ents': ['ORG']})
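
Outside a notebook, `displacy.render` can return the markup as a string when `jupyter=False` is passed. The sketch below also sets a custom highlight color through the `colors` option; the hand-built `Doc`, the entity spans, and the color value are assumptions chosen for illustration, not part of the notebook above:

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc, Span

nlp = spacy.blank("en")  # blank pipeline: no trained model required
doc = Doc(nlp.vocab, words=["My_own_company", "is", "hiring", "in", "U.S."])

# Entities are set by hand here, since a blank pipeline predicts none
doc.ents = [Span(doc, 0, 1, label="ORG"), Span(doc, 4, 5, label="GPE")]

# Highlight only ORG entities and give them a custom background color
options = {"ents": ["ORG"], "colors": {"ORG": "#ffd966"}}
html = displacy.render(doc, style="ent", jupyter=False, options=options)
print(html[:80])
```

The returned string can be written to an `.html` file or embedded in a web page.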