In [1]:
import spacy
from spacy import displacy

In [2]:
nlp = spacy.load("en_core_web_sm")

# Example 1 - NER identification

In [3]:
doc = nlp("Musk sold Tesla shares for $3.6 billion")

In [4]:
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tesla  |  ORG  |  Companies, agencies, institutions, etc.
$3.6 billion  |  MONEY  |  Monetary values, including unit


# Visualize entities
Visualizing named entities in a text can speed up development and make it easier to debug your code and training process. That’s why spaCy's visualizers, displaCy and displaCy ENT, are an official part of the core library. If you’re running a Jupyter notebook, displaCy will detect this and return the markup in a format ready to be rendered and exported.

In [5]:
displacy.render(doc, style="ent")
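Outside a notebook, the same markup can be captured as a string. The following is a minimal sketch, assuming a blank English pipeline (`spacy.blank("en")`) so that no trained model is needed; the names `nlp_blank`, `demo_doc`, and `html` are illustrative and the entity is set by hand rather than predicted.

```python
import spacy
from spacy import displacy
from spacy.tokens import Span

# Assumption: a blank pipeline is enough here because the entity is set manually.
nlp_blank = spacy.blank("en")
demo_doc = nlp_blank("Musk sold Tesla shares for $3.6 billion")

# Mark token 2 ("Tesla") as an ORG entity by hand.
demo_doc.set_ents([Span(demo_doc, 2, 3, label="ORG")])

# jupyter=False forces displaCy to return the HTML markup as a string
# instead of rendering it inline.
html = displacy.render(demo_doc, style="ent", jupyter=False)
print(html[:80])
```

The returned string can be written to an `.html` file and opened in a browser.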

# List all the entities
The entity labels supported by the en_core_web_sm model:

In [6]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']
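To see what each label means, `spacy.explain` can be applied to every label in turn. A small sketch, using a hand-picked subset of the labels above (the `labels` list and `descriptions` dict are illustrative names); `spacy.explain` is a glossary lookup, so no model needs to be loaded for it.

```python
import spacy

# A few of the labels listed above; in the notebook you could use
# nlp.pipe_labels['ner'] instead of this hand-written list.
labels = ["ORG", "MONEY", "DATE", "PERSON"]

# Build a label -> description mapping from spaCy's glossary.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(label, "|", description)
```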

# Example 2 - NER identification
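In this example we can see that Bloomberg is categorized differently depending on context: as part of a PERSON in "Michael Bloomberg" and as part of an ORG in "Bloomberg Inc".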

In [7]:
doc = nlp("Michael Bloomberg founded Bloomberg Inc in 1982")
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
Bloomberg Inc | ORG | Companies, agencies, institutions, etc.
1982 | DATE | Absolute or relative dates or periods


In [8]:
doc = nlp("Musk sold Tesla Inc shares for $3.6 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", ent.start_char, "|", ent.end_char)

Tesla Inc  |  ORG  |  10 | 19
$3.6 billion  |  MONEY  |  31 | 43
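`start_char` and `end_char` are character offsets into the original text, with the end offset exclusive, so slicing the text with them gives back the entity string. A minimal sketch, assuming a blank pipeline and entities set by hand from the offsets printed above (`nlp_blank`, `offset_doc`, and `slices` are illustrative names):

```python
import spacy

# Assumption: blank pipeline, entities are created from the character
# offsets shown in the output above rather than predicted by a model.
nlp_blank = spacy.blank("en")
offset_doc = nlp_blank("Musk sold Tesla Inc shares for $3.6 billion")

# char_span builds a Span from character offsets (end is exclusive).
org = offset_doc.char_span(10, 19, label="ORG")
money = offset_doc.char_span(31, 43, label="MONEY")
offset_doc.set_ents([org, money])

# Slicing the raw text with the offsets recovers each entity's text.
slices = [offset_doc.text[ent.start_char:ent.end_char] for ent in offset_doc.ents]
print(slices)
```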


# Setting custom entities
Can you override the model's predictions and add your own logic on top?

In [9]:
# Twitter is not recognized
doc = nlp("Tesla is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
$45 billion  |  MONEY


Set variable s to tokens 2, 3, and 4. Token indices start from 0, and the end index of the slice (5) is exclusive.

In [10]:
s = doc[2:5]
s

going to acquire

The object returned by slicing a Doc is a Span

In [11]:
type(s)

spacy.tokens.span.Span
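When picking indices for a Span, it helps to print each token next to its index. A short sketch, assuming a blank English pipeline, since only the tokenizer is needed here (`nlp_blank`, `token_doc`, and `indexed` are illustrative names):

```python
import spacy

# Assumption: only tokenization is needed, so a blank pipeline suffices.
nlp_blank = spacy.blank("en")
token_doc = nlp_blank("Tesla is going to acquire twitter for $45 billion")

# token.i is the token's index within the Doc.
indexed = [(token.i, token.text) for token in token_doc]
for i, text in indexed:
    print(i, text)

# Slicing follows Python conventions: start inclusive, end exclusive.
middle = token_doc[2:5]
print(middle.text, "|", middle.start, "|", middle.end)
```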

In [12]:
from spacy.tokens import Span

# Span(doc: Doc, start: int, end: int, label): start and end are token indices, end is exclusive
s1 = Span(doc, 0, 1, label="ORG")
s2 = Span(doc, 5, 6, label="ORG")

doc.set_ents([s1, s2], default="unmodified")

In [13]:
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
twitter  |  ORG
$45 billion  |  MONEY
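The `default="unmodified"` argument is what keeps the entities that were already on the doc. A minimal sketch of the difference, assuming a blank pipeline with a hand-set starting entity (`nlp_blank`, `doc_keep`, `doc_reset`, and the helper `with_tesla` are illustrative names): with `default="outside"`, which is the default of `Doc.set_ents`, every token not covered by the new spans is marked as outside any entity.

```python
import spacy
from spacy.tokens import Span

nlp_blank = spacy.blank("en")
text = "Tesla is going to acquire twitter for $45 billion"

def with_tesla(nlp):
    """Hypothetical helper: return a doc where 'Tesla' is already an ORG."""
    d = nlp(text)
    d.set_ents([Span(d, 0, 1, label="ORG")])
    return d

# "unmodified": tokens outside the new span keep their existing annotation.
doc_keep = with_tesla(nlp_blank)
doc_keep.set_ents([Span(doc_keep, 5, 6, label="ORG")], default="unmodified")
kept = [(ent.text, ent.label_) for ent in doc_keep.ents]

# "outside": tokens outside the new span are reset, so 'Tesla' is dropped.
doc_reset = with_tesla(nlp_blank)
doc_reset.set_ents([Span(doc_reset, 5, 6, label="ORG")], default="outside")
reset = [(ent.text, ent.label_) for ent in doc_reset.ents]

print(kept)
print(reset)
```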
