
# Named Entity Recognition (NER)

This notebook uses spaCy to perform NER. Key concepts:
1. **Named Entities**: Entities that refer to specific objects or people, e.g., "Barack Obama", "Google", "New York".
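2. **Categories**: Common categories include PERSON, ORGANIZATION, LOCATION, DATE, TIME, and others. spaCy's English models abbreviate some of these, e.g. ORG for organizations and GPE or LOC for places.
3. **NER Models**: NER systems can be rule-based (hand-written patterns and dictionaries) or learned from annotated data with machine learning or deep learning.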
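Point 3 can be made concrete with the simplest rule-based approach: a hand-written dictionary (gazetteer) of known names, matched against the text. The sketch below is plain Python; `GAZETTEER` and `gazetteer_ner` are hypothetical names for illustration, not part of spaCy (spaCy's own rule-based option is its `entity_ruler` pipeline component).

```python
import re

# Hypothetical gazetteer: surface string -> entity label
GAZETTEER = {
    "Barack Obama": "PERSON",
    "Google": "ORG",
    "New York": "GPE",
}


def gazetteer_ner(text: str) -> list[tuple[str, int, int, str]]:
    """Return (entity_text, start_char, end_char, label) for every gazetteer hit.

    Longer names are tried first, so a shorter name cannot claim
    characters already covered by a longer match.
    """
    taken = [False] * len(text)
    found = []
    for name in sorted(GAZETTEER, key=len, reverse=True):
        # \b word boundaries stop "Google" from matching inside "Googleplex"
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            start, end = m.span()
            if not any(taken[start:end]):
                taken[start:end] = [True] * (end - start)
                found.append((m.group(), start, end, GAZETTEER[name]))
    # Report hits in the order they appear in the text
    return sorted(found, key=lambda hit: hit[1])


print(gazetteer_ner("Barack Obama visited Google in New York."))
# [('Barack Obama', 0, 12, 'PERSON'), ('Google', 21, 27, 'ORG'), ('New York', 31, 39, 'GPE')]
```

A gazetteer is precise but cannot recognize names it has never seen, which is the main motivation for the statistical models used in the rest of this notebook.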

## Example: Using spaCy for NER


In [2]:
import spacy

# Load the spaCy model


In [3]:
# The pretrained pipeline must be installed first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

In [4]:
# Example text
text = "Apple is looking at buying U.K. startup for $1 billion. Tim Cook is the CEO of Apple."

# Process the text with spaCy
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)


Apple ORG
U.K. GPE
$1 billion MONEY
Tim Cook PERSON
Apple ORG
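`doc.ents` reports every mention, so the same entity can appear more than once (Apple is listed twice above). A common post-processing step is to group the unique entity texts by label. A minimal plain-Python sketch, assuming the pairs were first collected with `[(ent.text, ent.label_) for ent in doc.ents]`; `group_by_label` is a hypothetical helper, and the pairs are copied from the output above so the snippet runs without spaCy.

```python
from collections import defaultdict


def group_by_label(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group entity texts by label, dropping repeats but keeping first-seen order."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# (text, label) pairs copied from the printed entities above
pairs = [
    ("Apple", "ORG"),
    ("U.K.", "GPE"),
    ("$1 billion", "MONEY"),
    ("Tim Cook", "PERSON"),
    ("Apple", "ORG"),
]
print(group_by_label(pairs))
# {'ORG': ['Apple'], 'GPE': ['U.K.'], 'MONEY': ['$1 billion'], 'PERSON': ['Tim Cook']}
```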


In [5]:
text = "Elon Musk is the CEO of Tesla and SpaceX. He was born in Pretoria, South Africa."

# Process the text with spaCy
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)  # GPE = geopolitical entity: countries, cities, and states


Elon Musk PERSON
Tesla ORG
Pretoria GPE
South Africa GPE
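Instead of memorizing abbreviations such as GPE, you can ask spaCy for a description with `spacy.explain`. A small sketch, assuming spaCy is installed; it needs no trained model because the descriptions come from spaCy's built-in glossary, so their exact wording may differ between versions.

```python
import spacy

# Labels that appeared in the two examples above
for label in ["PERSON", "ORG", "GPE", "MONEY"]:
    # spacy.explain looks the label up in spaCy's glossary
    print(f"{label}: {spacy.explain(label)}")
```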


In [6]:
from spacy.training import Example  # pairs a Doc with its gold-standard annotations

# Custom NER training data



In [59]:
TRAIN_DATA = [
    ("Tesla is an electric car manufacturer.", {"entities": [(0, 5, "COMPANY")]}),
    ("SpaceX is a space exploration company.", {"entities": [(0, 6, "COMPANY")]}),
]
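Each `(start, end, label)` tuple is a pair of character offsets into its sentence, so `"Tesla is an electric car manufacturer."[0:5] == "Tesla"`. Counting offsets by hand is error-prone; the sketch below uses a hypothetical helper, `make_example`, that derives them with `str.find`. It assumes each entity string occurs in its sentence and uses only the first occurrence.

```python
def make_example(text: str, entities: list[tuple[str, str]]) -> tuple[str, dict]:
    """Build one (text, {"entities": [...]}) training tuple from entity strings.

    Hypothetical helper: only the first occurrence of each entity string is used.
    """
    spans = []
    for entity_text, label in entities:
        start = text.find(entity_text)
        if start == -1:
            raise ValueError(f"{entity_text!r} not found in {text!r}")
        spans.append((start, start + len(entity_text), label))
    return text, {"entities": spans}


# Produces the same offsets as the hand-written TRAIN_DATA above
generated = [
    make_example("Tesla is an electric car manufacturer.", [("Tesla", "COMPANY")]),
    make_example("SpaceX is a space exploration company.", [("SpaceX", "COMPANY")]),
]
print(generated)
```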

In [60]:
# Start from a blank English pipeline (tokenizer only, no trained components)
nlp = spacy.blank("en")

In [61]:
# Create a new NER component and add it to the pipeline
ner = nlp.add_pipe("ner")

In [62]:
# Register the new COMPANY label with the NER component
ner.add_label("COMPANY")

1
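The `1` printed above is the return value of `add_label`: in spaCy v3 it returns 1 when the label is new and 0 when the component already has it. A quick check, assuming spaCy v3 is installed (a blank pipeline is enough, no pretrained model needed); `demo_nlp` and `demo_ner` are illustrative names chosen so the notebook's own `nlp` and `ner` are left untouched.

```python
import spacy

demo_nlp = spacy.blank("en")
demo_ner = demo_nlp.add_pipe("ner")

first = demo_ner.add_label("COMPANY")   # label is new, so this returns 1
second = demo_ner.add_label("COMPANY")  # label already present, so this returns 0
print(first, second)
print(demo_ner.labels)
```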

# Training the NER model


In [63]:
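# nlp.initialize() replaces begin_training(), which is deprecated in spaCy v3;
# it sets up the model weights and returns the optimizer
optimizer = nlp.initialize()
for epoch in range(10):  # 10 passes over the training data
    losses = {}
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        # Pair the unannotated Doc with its gold-standard entity offsets
        example = Example.from_dict(doc, annotations)
        # drop=0.5 applies 50% dropout during the update; losses accumulates per component
        nlp.update([example], drop=0.5, sgd=optimizer, losses=losses)
    print(f"Losses at epoch {epoch+1}  : {losses}")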

Losses at epoch 1  : {'ner': 11.439644813537598}
Losses at epoch 2  : {'ner': 10.530448853969574}
Losses at epoch 3  : {'ner': 9.239084005355835}
Losses at epoch 4  : {'ner': 8.073611408472061}
Losses at epoch 5  : {'ner': 7.030029803514481}
Losses at epoch 6  : {'ner': 6.008567675948143}
Losses at epoch 7  : {'ner': 4.7828566282987595}
Losses at epoch 8  : {'ner': 3.771090354770422}
Losses at epoch 9  : {'ner': 3.0688527692109346}
Losses at epoch 10  : {'ner': 3.091960904188454}
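The loss falls steadily but rises slightly at epoch 10 (3.09 vs 3.07); with dropout and only two sentences it will fluctuate. A common safeguard is to remember the best epoch and stop once the loss has not improved for a few epochs. The sketch applies a hypothetical `best_epoch` rule with an assumed `patience` parameter to the losses printed above. In practice this should be computed on a held-out development set rather than the training loss, and the pipeline would be saved at the best epoch (for example with `nlp.to_disk`).

```python
def best_epoch(losses: list[float], patience: int = 2) -> int:
    """Return the 1-based epoch with the lowest loss, stopping early once the
    loss has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best = 0
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no improvement for `patience` epochs: stop training
    return best


# NER losses copied (rounded) from the training output above
ner_losses = [11.44, 10.53, 9.24, 8.07, 7.03, 6.01, 4.78, 3.77, 3.07, 3.09]
print(best_epoch(ner_losses))  # 9
```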


# Test the trained model


In [68]:
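test_text = "Tesla is expanding its factory in Nevada."
doc = nlp(test_text)
# This model only knows the COMPANY label, so "Nevada" cannot be tagged as a location;
# with just two training sentences, even "Tesla" is not guaranteed to be found
for ent in doc.ents:
    print(ent.text, ent.label_)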
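A single test sentence says little about quality. NER is usually scored at the entity level: a predicted span counts only if its start, end, and label all match a gold span exactly. Below is a plain-Python sketch of micro-averaged entity precision, recall, and F1 (spaCy's `nlp.evaluate` reports comparable `ents_p`, `ents_r`, and `ents_f` scores). The gold and predicted spans are hypothetical and use the `(start, end, label)` format of `TRAIN_DATA`.

```python
Span = tuple[int, int, str]  # (start_char, end_char, label)


def entity_prf(gold_docs: list[set[Span]], pred_docs: list[set[Span]]) -> tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span match)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        tp += len(gold & pred)   # predicted spans that match a gold span exactly
        fp += len(pred - gold)   # predicted spans with no exact gold match
        fn += len(gold - pred)   # gold spans the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations vs. model predictions for two sentences
gold = [{(0, 5, "COMPANY")}, {(0, 6, "COMPANY"), (24, 31, "COMPANY")}]
pred = [{(0, 5, "COMPANY")}, {(0, 6, "COMPANY"), (12, 17, "COMPANY")}]
print(entity_prf(gold, pred))
```

Here two of three predictions are correct and two of three gold entities are found, so precision, recall, and F1 all come out at about 0.67.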
