In [1]:
import spacy 
  
nlp = spacy.load('en_core_web_sm') 

In [2]:
sentence = "Apple is looking at buying U.K. startup for $1 billion"
  
doc = nlp(sentence) 
  
for ent in doc.ents: 
    print(ent.text, ent.start_char, ent.end_char, ent.label_) 
    print('\t', sentence[ent.start_char : ent.end_char])  # start_char and end_char are character offsets into the sentence

Apple 0 5 ORG
	 Apple
U.K. 27 31 GPE
	 U.K.
$1 billion 44 54 MONEY
	 $1 billion
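
Because `start_char` and `end_char` index into the original string, they can be used to mark up the text directly. The following is a minimal sketch in plain Python; `annotate` is a hypothetical helper name, and the spans are copied by hand from the output above rather than produced by spaCy here.

```python
def annotate(text, spans):
    """Wrap each (start_char, end_char, label) span as [text](LABEL)."""
    pieces = []
    cursor = 0
    # Process spans left to right; assumes they do not overlap.
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])                # untouched text before the entity
        pieces.append(f"[{text[start:end]}]({label})")   # the entity itself
        cursor = end
    pieces.append(text[cursor:])                         # remainder after the last entity
    return "".join(pieces)


sentence = "Apple is looking at buying U.K. startup for $1 billion"
# Offsets copied from the output above. With spaCy they could be collected as
# [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents].
spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]

print(annotate(sentence, spans))
# [Apple](ORG) is looking at buying [U.K.](GPE) startup for [$1 billion](MONEY)
```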


### Capitalization

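Older versions of spaCy were sensitive to capitalization. Newer versions use deep neural network models, which handle case variation better, although the example below shows that a lowercased "apple" is still not tagged as an organization.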

```
spaCy v2.0's Named Entity Recognition system features a sophisticated word embedding strategy using subword features and "Bloom" embeddings, a deep convolutional neural network with residual connections, and a novel transition-based approach to named entity parsing.
```

See more here: https://spacy.io/universe/project/video-spacys-ner-model

In [3]:
sentence = "apple is looking at buying U.K. startup for $1 billion"

doc = nlp(sentence)

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

U.K. 27 31 GPE
$1 billion 44 54 MONEY
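
One possible workaround when the input has lost its sentence-initial capital is to restore it before calling the model. This is only a rough sketch: `capitalize_first` is a hypothetical helper, not part of spaCy, and it does nothing for lowercased names in the middle of a sentence.

```python
def capitalize_first(sentence: str) -> str:
    """Uppercase the first non-whitespace character, leaving the rest untouched."""
    stripped = sentence.lstrip()
    if not stripped:
        return sentence  # empty or whitespace-only input: nothing to capitalize
    leading = sentence[: len(sentence) - len(stripped)]
    return leading + stripped[0].upper() + stripped[1:]


lowercased = "apple is looking at buying U.K. startup for $1 billion"
restored = capitalize_first(lowercased)
print(restored)
# Apple is looking at buying U.K. startup for $1 billion
```

The restored string is identical to the input of the first example, for which the output above lists `Apple` as `ORG`, so `nlp(restored)` should recover that entity.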


### Try out different domains

In [4]:
sentence = "To celebrate Michael Jordan's 58th birthday today, we take a look back at his 58 point scoring performance vs. the Nets in February of 1987! "

doc = nlp(sentence)

for ent in doc.ents:
    print(ent.text, ent.label_)

Michael Jordan's PERSON
58th ORDINAL
today DATE
58 CARDINAL
February of 1987 DATE


In [5]:
sentence = 'Shang is teaching right now.'

doc = nlp(sentence)

for ent in doc.ents:
    print(ent.text, ent.label_)

Shang PERSON


In [9]:
sentence = 'Most of our incorrect predictions occur for words that can have multiple tags depending on the context. Words like are, the, an occur in Plots, Quotes and so on. This means our model is yet to have a really great language understanding.'

doc = nlp(sentence)

for ent in doc.ents:
    print(ent.text, ent.label_)

Plots GPE
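
To compare domains more systematically than by eye, the predictions can be scored against hand-labeled entities. The sketch below uses exact-match precision and recall in plain Python. The `predicted` list is copied from the Michael Jordan output above, while the `gold` list is an illustrative guess made for this example and does not come from any annotated dataset.

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (text, label) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Copied from the output of the Michael Jordan example above.
predicted = [
    ("Michael Jordan's", "PERSON"),
    ("58th", "ORDINAL"),
    ("today", "DATE"),
    ("58", "CARDINAL"),
    ("February of 1987", "DATE"),
]
# ASSUMPTION: hypothetical gold labels, written by hand for illustration only.
gold = [
    ("Michael Jordan", "PERSON"),
    ("58th", "ORDINAL"),
    ("today", "DATE"),
    ("58", "CARDINAL"),
    ("Nets", "ORG"),
    ("February of 1987", "DATE"),
]

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
# precision=0.80 recall=0.67
```

Exact matching is deliberately strict: `Michael Jordan's` does not match `Michael Jordan`, so the possessive counts as both a false positive and a missed entity.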
