# **Advanced NER Using BERT**

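This notebook loads a pretrained BERT tokenizer and model alongside spaCy, then uses spaCy's pretrained `en_core_web_sm` pipeline to extract named entities from airline reviews. No model is trained in the cells shown here, and the BERT model is loaded but not called by them.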

**Import Libraries**

In [1]:
import spacy
import torch
from transformers import BertTokenizer, BertModel
import pandas as pd

**Load Dataset**

In [2]:
df = pd.read_csv('Airline_review.csv')

**Initialize BERT Tokenizer And Model**

In [3]:
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

**Initialize SpaCy Model For NER**

In [4]:
# Requires the model package: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

**Define Function To Get Named Entities**

Define a function that runs the spaCy pipeline on a text and returns each named entity as a `(text, label)` tuple.

In [5]:
def get_entities(text):
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]
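
Because `get_entities` returns a plain list of `(text, label)` tuples, its result can be post-processed without spaCy. The sketch below shows one possible way to keep only selected label types. The helper name `filter_by_label` is hypothetical and not part of the original notebook, and the sample tuples are a few of the pairs printed for Review #4 further down.

```python
def filter_by_label(entities, wanted_labels):
    """Keep only (text, label) pairs whose label is in wanted_labels."""
    wanted = set(wanted_labels)
    return [(text, label) for text, label in entities if label in wanted]


# Sample pairs in the same shape that get_entities returns
# (hand-copied for illustration, not produced by running spaCy here).
sample = [
    ("Adria", "ORG"),
    ("Munich", "GPE"),
    ("July 2019", "DATE"),
    ("24 hours", "TIME"),
]

# Keep organisations and places; drop dates and times.
places_and_orgs = filter_by_label(sample, {"ORG", "GPE", "LOC"})
print(places_and_orgs)
```

In the notebook itself, the same helper could be applied to the return value of `get_entities(text)` for any review.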

**Extract And Print Named Entities**

Extract and print the named entities from the first four reviews in the DataFrame.

In [6]:
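# Number reviews from 1 regardless of how the DataFrame is indexed.
for i, text in enumerate(df['Review'].head(4), start=1):
    entities = get_entities(text)
    print(f"Review #{i}:")
    for ent_text, ent_label in entities:
        print(f"Entity: {ent_text}, Label: {ent_label}")
    print("\n")  # leaves two blank lines between reviews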

Review #1:
Entity: Moheli, Label: GPE


Review #2:
Entity: Anjouan, Label: PERSON
Entity: AB Aviation, Label: ORG
Entity: 0900hrs, Label: CARDINAL
Entity: 1300hrs, Label: CARDINAL
Entity: only 30mins, Label: CARDINAL
Entity: Comoros, Label: PERSON


Review #3:
Entity: Anjouan, Label: PERSON
Entity: Comoros, Label: GPE
Entity: 30, Label: CARDINAL


Review #4:
Entity: Adria, Label: ORG
Entity: Munich, Label: GPE
Entity: July 2019, Label: DATE
Entity: 10 days in a row, Label: DATE
Entity: 11 days later, Label: DATE
Entity: 345, Label: CARDINAL
Entity: Frankfurt - Pristina, Label: ORG
Entity: September 2019, Label: DATE
Entity: 24 hours, Label: TIME
Entity: Adria, Label: ORG
Entity: Adria, Label: LOC
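
The output above shows the small model assigning different labels to the same surface string: "Adria" is tagged both ORG and LOC, and "Comoros" both PERSON and GPE. One quick way to surface such inconsistencies is to collect the set of labels seen for each entity text. The following is a minimal sketch, not part of the original notebook: `find_inconsistent_labels` is a hypothetical helper, and the input tuples are a subset copied by hand from the printed output.

```python
from collections import defaultdict


def find_inconsistent_labels(entities):
    """Return {entity_text: sorted labels} for texts seen with more than one label."""
    labels_by_text = defaultdict(set)
    for text, label in entities:
        labels_by_text[text].add(label)
    return {
        text: sorted(labels)
        for text, labels in labels_by_text.items()
        if len(labels) > 1
    }


# (text, label) pairs copied from the printed output above.
observed = [
    ("Anjouan", "PERSON"), ("AB Aviation", "ORG"), ("Comoros", "PERSON"),  # Review #2
    ("Anjouan", "PERSON"), ("Comoros", "GPE"),                             # Review #3
    ("Adria", "ORG"), ("Munich", "GPE"), ("Adria", "ORG"), ("Adria", "LOC"),  # Review #4
]

conflicts = find_inconsistent_labels(observed)
print(conflicts)
```

Consistency does not imply correctness: "Anjouan" is tagged PERSON both times and so is not flagged, although it is an island in the Comoros.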


