#### Named Entity Recognition (NER) Practice

In [7]:
import spacy
from spacy import displacy
nlp = spacy.load("en_core_web_sm")

In [3]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [6]:
text = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(text)
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Apple  |  ORG  |  Companies, agencies, institutions, etc.
U.K.  |  GPE  |  Countries, cities, states
$1 billion  |  MONEY  |  Monetary values, including unit


In [9]:
displacy.render(doc, style="ent", jupyter=True)

In [11]:
# List every entity label the NER component can predict
for label in nlp.get_pipe("ner").labels:
    print(label, " : ", spacy.explain(label))

CARDINAL  :  Numerals that do not fall under another type
DATE  :  Absolute or relative dates or periods
EVENT  :  Named hurricanes, battles, wars, sports events, etc.
FAC  :  Buildings, airports, highways, bridges, etc.
GPE  :  Countries, cities, states
LANGUAGE  :  Any named language
LAW  :  Named documents made into laws.
LOC  :  Non-GPE locations, mountain ranges, bodies of water
MONEY  :  Monetary values, including unit
NORP  :  Nationalities or religious or political groups
ORDINAL  :  "first", "second", etc.
ORG  :  Companies, agencies, institutions, etc.
PERCENT  :  Percentage, including "%"
PERSON  :  People, including fictional
PRODUCT  :  Objects, vehicles, foods, etc. (not services)
QUANTITY  :  Measurements, as of weight or distance
TIME  :  Times smaller than a day
WORK_OF_ART  :  Titles of books, songs, etc.


#### Custom Entities with spaCy

In [14]:
text2 = "X is looking to acquire Tesla for $2 million"
doc2 = nlp(text2)
for ent in doc2.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

# X is not recognized as an entity

Tesla  |  ORG  |  Companies, agencies, institutions, etc.
$2 million  |  MONEY  |  Monetary values, including unit


In [17]:
from spacy.tokens import Span
# Slicing a Doc by token index returns a Span; doc2[0:1] covers only the token "X"
sub_part = doc2[0:1]
type(sub_part)

spacy.tokens.span.Span

In [19]:
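# Wrap token 0 ("X") in a Span labeled ORG
tag = Span(doc2, 0, 1, label="ORG")
# default="unmodified" keeps the entities the model already found
doc2.set_ents([tag], default="unmodified")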
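
The `default` argument of `set_ents` decides what happens to tokens that are not covered by the spans passed in. The sketch below contrasts `"unmodified"` with `"outside"`. It assumes a blank English pipeline (`spacy.blank("en")`) so that it runs without a trained model, and it sets the `Tesla` entity by hand to stand in for a model prediction; the names `demo_doc`, `kept`, and `replaced` are illustrative only.

```python
import spacy
from spacy.tokens import Span

# Assumption: a blank pipeline (tokenizer only) stands in for en_core_web_sm.
blank_nlp = spacy.blank("en")
demo_doc = blank_nlp("X is looking to acquire Tesla for $2 million")

# Simulate a model prediction by marking "Tesla" as ORG by hand.
tesla = next(tok for tok in demo_doc if tok.text == "Tesla")
demo_doc.set_ents([Span(demo_doc, tesla.i, tesla.i + 1, label="ORG")])

# default="unmodified": tokens outside the new span keep their annotation.
demo_doc.set_ents([Span(demo_doc, 0, 1, label="ORG")], default="unmodified")
kept = [(ent.text, ent.label_) for ent in demo_doc.ents]

# default="outside": every other token is reset to "not an entity".
demo_doc.set_ents([Span(demo_doc, 0, 1, label="ORG")], default="outside")
replaced = [(ent.text, ent.label_) for ent in demo_doc.ents]

print(kept)
print(replaced)
```

With `"unmodified"` both `X` and `Tesla` remain entities; with `"outside"` only `X` survives, which is why the cell above passes `default="unmodified"`.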

In [21]:
for ent in doc2.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

# Now X is recognized as ORG

X  |  ORG  |  Companies, agencies, institutions, etc.
Tesla  |  ORG  |  Companies, agencies, institutions, etc.
$2 million  |  MONEY  |  Monetary values, including unit
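
Setting a `Span` by hand only affects the one `Doc` it was applied to. A rule-based alternative is spaCy's built-in `entity_ruler` component, which applies the same pattern to every text that passes through the pipeline. The following is a minimal sketch, not part of the notebook above: it assumes a blank English pipeline so that no trained model is needed, and the names `nlp_rules`, `doc3`, and `rule_ents` are illustrative.

```python
import spacy

# Assumption: a blank pipeline (tokenizer only); no statistical NER is involved.
nlp_rules = spacy.blank("en")

# Add the rule-based entity component and register one exact-match pattern.
ruler = nlp_rules.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "X"}])

doc3 = nlp_rules("X is looking to acquire Tesla for $2 million")
rule_ents = [(ent.text, ent.label_) for ent in doc3.ents]
print(rule_ents)
```

Because the pipeline here is blank, only the pattern match on `X` is reported. With a trained pipeline such as `en_core_web_sm`, the ruler could instead be added with `nlp.add_pipe("entity_ruler", before="ner")` so that its matches are in place before the statistical component runs. Note that the string pattern matches the exact token `X` wherever it appears, so a pattern this short may over-match in real text.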
