## NER
- Named Entity Recognition
- Objective: extract entities such as person, company, location, and currency from text

### Use cases
- Google search: go beyond plain keyword matching (for example, "Tesla" can be a person name or a company name depending on context)
- Tag creation
- Recommendation: suggest articles based on personal preferences
- Tech support classification: recognize the topic of an issue and route it to the appropriate team

In [1]:
import spacy

In [2]:
nlp = spacy.load("en_core_web_sm")
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [3]:
doc = nlp("Tesla Inc is going to acquire twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_, " | ", spacy.explain(ent.label_))

Tesla Inc  |  ORG  |  Companies, agencies, institutions, etc.
$45 billion  |  MONEY  |  Monetary values, including unit


In [4]:
from spacy import displacy

displacy.render(doc, style = "ent")

In [5]:
nlp.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

In [6]:
doc = nlp("Michael Bloomberg founded Bloomberg in 1982")
for ent in doc.ents:
    print(ent.text, "|", ent.label_, "|", spacy.explain(ent.label_))

Michael Bloomberg | PERSON | People, including fictional
Bloomberg | GPE | Countries, cities, states
1982 | DATE | Absolute or relative dates or periods


Notice that the model is not perfect: here the company Bloomberg is labeled GPE rather than ORG. In such cases we have to customize the entities ourselves.

### Setting custom entities

In [7]:
doc = nlp("Tesla is going to acquire Twitter for $45 billion")
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  PERSON
$45 billion  |  MONEY


In [8]:
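from spacy.tokens import Span

# Token 5 is "Twitter"; the span covers tokens 5 up to (not including) 6
s1 = Span(doc, 5, 6, label="ORG")

# Overwrite only this span and leave the other entities unmodified
doc.set_ents([s1], default="unmodified")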

In [9]:
for ent in doc.ents:
    print(ent.text, " | ", ent.label_)

Tesla  |  ORG
Twitter  |  ORG
$45 billion  |  MONEY


## NER building approaches
1. Hard-coded: create lists of known entity names, such as company names and person names, and match them against the text.
2. Rule-based: match patterns; for example, text in the form (xxx)-xxx-xxxx is treated as a phone number.
3. Machine learning: train a statistical model, such as a CRF or BERT.
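
The first two approaches can be sketched with the standard library alone. This is a minimal illustration, not how spaCy does it: the `GAZETTEER` dictionary, the `PHONE` label, and the `find_entities` helper are all hypothetical names made up for this example, and the regex assumes phone numbers appear exactly in the (xxx)-xxx-xxxx form.

```python
import re

# Approach 1 (hard-coded): a hypothetical hand-maintained lookup of known names.
GAZETTEER = {
    "Tesla": "ORG",
    "Twitter": "ORG",
    "Michael Bloomberg": "PERSON",
}

# Approach 2 (rule-based): assumes phone numbers look exactly like (xxx)-xxx-xxxx.
PHONE_PATTERN = re.compile(r"\(\d{3}\)-\d{3}-\d{4}")


def find_entities(text):
    """Return (start, end, matched_text, label) tuples sorted by position."""
    found = []
    # Exact, whole-word matches against the hard-coded list
    for name, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.end(), m.group(), label))
    # Pattern matches from the phone-number rule
    for m in PHONE_PATTERN.finditer(text):
        found.append((m.start(), m.end(), m.group(), "PHONE"))
    return sorted(found)


sample = "Call Tesla support at (555)-123-4567 or message Twitter."
entities = find_entities(sample)
for start, end, span_text, label in entities:
    print(span_text, "|", label)
```

Note the limitation: a lookup like this always labels "Tesla" as ORG, even when the text refers to the person, which is why context-aware machine learning models are used for the third approach.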