## Creating a sample text for named entity recognition

In [1]:
text = "ABC Inc is going to acquire ZXY inc for $4 billion at 4:00 pm"

## Using spaCy

### Importing the necessary libraries

In [2]:
import spacy
from spacy import displacy

### Loading spaCy's en_core_web_sm model and viewing its pipeline components and the labels associated with 'ner'

In [3]:
spacy_model = spacy.load('en_core_web_sm')

In [4]:
spacy_model.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

In [5]:
spacy_model.pipe_labels['ner']

['CARDINAL',
 'DATE',
 'EVENT',
 'FAC',
 'GPE',
 'LANGUAGE',
 'LAW',
 'LOC',
 'MONEY',
 'NORP',
 'ORDINAL',
 'ORG',
 'PERCENT',
 'PERSON',
 'PRODUCT',
 'QUANTITY',
 'TIME',
 'WORK_OF_ART']

### Passing the text to the model

In [6]:
res = spacy_model(text)

### Viewing the detected entities, the label of each entity, and the description of each label

In [7]:
for entity in res.ents:
    print(entity.text, '|', entity.label_, '|', spacy.explain(entity.label_))

ABC Inc | ORG | Companies, agencies, institutions, etc.
ZXY inc | ORG | Companies, agencies, institutions, etc.
$4 billion | MONEY | Monetary values, including unit
4:00 pm | TIME | Times smaller than a day


### Visualizing the entities in the text with color highlighting

In [8]:
displacy.render(res, style='ent')
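
displaCy can also render entities supplied as a plain dictionary of character offsets instead of a `Doc`, which is handy when the entities come from another tool. The sketch below builds such a dictionary from the entity strings printed earlier. The helper `to_displacy_ents` and the variable names are hypothetical, and the offsets are found by simple string search, which assumes each entity string occurs in the text in the listed order.

```python
# Sample sentence from the first cell of this notebook.
text = "ABC Inc is going to acquire ZXY inc for $4 billion at 4:00 pm"

# (entity text, label) pairs copied from the spaCy output printed above.
detected = [
    ("ABC Inc", "ORG"),
    ("ZXY inc", "ORG"),
    ("$4 billion", "MONEY"),
    ("4:00 pm", "TIME"),
]


def to_displacy_ents(text, pairs):
    """Locate each entity string in the text and return displaCy-style entity dicts."""
    ents = []
    cursor = 0  # search left to right so repeated strings map to successive positions
    for ent_text, label in pairs:
        start = text.find(ent_text, cursor)
        if start == -1:
            continue  # skip strings that do not occur in the remaining text
        end = start + len(ent_text)
        ents.append({"start": start, "end": end, "label": label})
        cursor = end
    return ents


manual_doc = {"text": text, "ents": to_displacy_ents(text, detected), "title": None}
print(manual_doc["ents"])

# Assumption: with spaCy installed, this dictionary can be rendered with
# displacy.render(manual_doc, style="ent", manual=True)
```

The render call is left as a comment so that the snippet runs without spaCy. In a notebook with spaCy available, it should produce the same kind of highlighted view as the cell above.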

## Using bert-base-NER 

In [9]:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

### Initializing the tokenizer and loading the model

In [10]:
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")

model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")

Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [11]:
bert_ner_model = pipeline('ner', model=model, tokenizer=tokenizer)

### Passing the text to the model

In [12]:
res = bert_ner_model(text)

### Viewing the detected entities, the label of each entity, and the description of each label

In [13]:
abbreviations = {
    "B-MISC": "Beginning of a miscellaneous entity right after another miscellaneous entity",
    "I-MISC": "Miscellaneous entity",
    "B-PER": "Beginning of a person’s name right after another person’s name",
    "I-PER": "Person’s name",
    "B-ORG": "Beginning of an organization right after another organization",
    "I-ORG": "Organization",
    "B-LOC": "Beginning of a location right after another location",
    "I-LOC": "Location"
}

In [14]:
for entity in res:
    print(text[entity['start']:entity['end']], '|', entity['entity'], '|', abbreviations.get(entity['entity']))

ABC | B-ORG | Beginning of an organization right after another organization
Inc | I-ORG | Organization
Z | B-ORG | Beginning of an organization right after another organization
X | I-ORG | Organization
Y | I-ORG | Organization
in | I-ORG | Organization
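
The output above is per word piece, so a single organization such as "ZXY" appears as several rows. A minimal sketch of merging consecutive pieces into whole entities follows. The `tokens` list mirrors the labels printed above, with character offsets worked out by hand from the sample sentence and the other pipeline keys omitted. `group_entities` is a hypothetical helper, not part of transformers. The rule it applies (a `B-` tag or a change of entity type starts a new entity, an `I-` tag of the same type extends the current one) is one reasonable reading of the tags, not necessarily the scheme the model was trained with.

```python
text = "ABC Inc is going to acquire ZXY inc for $4 billion at 4:00 pm"

# Token-level predictions shaped like the pipeline output above.
# Labels are copied from the printed results; offsets were computed by hand.
tokens = [
    {"entity": "B-ORG", "start": 0, "end": 3},
    {"entity": "I-ORG", "start": 4, "end": 7},
    {"entity": "B-ORG", "start": 28, "end": 29},
    {"entity": "I-ORG", "start": 29, "end": 30},
    {"entity": "I-ORG", "start": 30, "end": 31},
    {"entity": "I-ORG", "start": 32, "end": 34},
]


def group_entities(tokens):
    """Merge consecutive word pieces of the same entity into one span."""
    groups = []
    for tok in tokens:
        # "B-ORG" -> prefix "B", entity type "ORG"
        prefix, _, ent_type = tok["entity"].partition("-")
        continues = (
            prefix == "I" and bool(groups) and groups[-1]["label"] == ent_type
        )
        if continues:
            # Extend the current entity to cover this word piece.
            groups[-1]["end"] = tok["end"]
        else:
            # A B- tag or a new entity type starts a fresh entity.
            groups.append(
                {"label": ent_type, "start": tok["start"], "end": tok["end"]}
            )
    return groups


for group in group_entities(tokens):
    print(text[group["start"]:group["end"]], "|", group["label"])
```

The second entity comes out as "ZXY in" because the final word piece of "inc" was not tagged in the output above. The transformers `pipeline` also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping inside the pipeline.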
