# Named Entity Recognition (NER)

This notebook demonstrates NER with spaCy, using both a pretrained pipeline and a custom-trained model. Key concepts:
1. **Named Entities**: Entities that refer to specific objects or people, e.g., "Barack Obama", "Google", "New York".
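2. **Categories**: Common categories include PERSON, ORGANIZATION, LOCATION, DATE, and TIME. spaCy's English pipelines use labels such as PERSON, ORG (organizations), GPE (countries, cities, states), LOC (non-GPE locations), DATE, TIME, and MONEY.
3. **NER Models**: NER systems can be rule-based (e.g., pattern or dictionary matching), statistical machine-learning models, or deep-learning models.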
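To make the rule-based option concrete, here is a minimal dictionary-lookup (gazetteer) tagger in plain Python. The `GAZETTEER` entries and the `gazetteer_ner` function are hypothetical illustrations, not part of spaCy; the function returns character spans as `(start, end, label)` tuples, the same form used for the custom training data later in this notebook.

```python
import re

# Hypothetical gazetteer: surface string -> entity label (illustrative only)
GAZETTEER = {
    "Barack Obama": "PERSON",
    "Google": "ORGANIZATION",
    "New York": "LOCATION",
}

def gazetteer_ner(text, gazetteer=GAZETTEER):
    """Return sorted (start, end, label) spans for exact gazetteer matches.

    Longer names are tried first, and any match overlapping an earlier
    match is skipped, so a longer entry wins over a shorter one it contains.
    """
    spans = []
    taken = set()  # character positions already covered by a match
    for name in sorted(gazetteer, key=len, reverse=True):
        # \b requires word boundaries, so "Google" does not match "Googleplex".
        # (It would also reject names ending in punctuation, e.g. "U.K.".)
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            positions = set(range(m.start(), m.end()))
            if positions & taken:
                continue
            taken |= positions
            spans.append((m.start(), m.end(), gazetteer[name]))
    return sorted(spans)

text = "Barack Obama visited Google in New York."
for start, end, label in gazetteer_ner(text):
    print(text[start:end], label)
# prints:
# Barack Obama PERSON
# Google ORGANIZATION
# New York LOCATION
```

A lookup table like this is precise for the names it lists but cannot generalize to unseen names, which is the gap statistical models fill.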

## Example: Using spaCy for NER


In [1]:
import spacy


### Load the spaCy model


In [2]:
# The small English pipeline must be installed first:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

In [3]:
# Example text
text = "Apple is looking at buying U.K. startup for $1 billion. Tim Cook is the CEO of Apple."

# Process the text with spaCy
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)


Apple ORG
U.K. GPE
$1 billion MONEY
Tim Cook PERSON
Apple ORG
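A common next step is to collect the unique entities per label (note that "Apple" appears twice above). With a loaded pipeline the pairs would come from `[(ent.text, ent.label_) for ent in doc.ents]`; in this sketch they are copied from the output above so it runs without spaCy, and `group_by_label` is a hypothetical helper.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above
entities = [
    ("Apple", "ORG"),
    ("U.K.", "GPE"),
    ("$1 billion", "MONEY"),
    ("Tim Cook", "PERSON"),
    ("Apple", "ORG"),
]

def group_by_label(pairs):
    """Map each label to its unique entity strings, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:  # skip repeated mentions
            grouped[label].append(text)
    return dict(grouped)

print(group_by_label(entities))
# prints: {'ORG': ['Apple'], 'GPE': ['U.K.'], 'MONEY': ['$1 billion'], 'PERSON': ['Tim Cook']}
```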


In [4]:
text = "Elon Musk is the CEO of Tesla and SpaceX. He was born in Pretoria, South Africa."

# Process the text with spaCy
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)


Elon Musk PERSON
Tesla ORG
Pretoria GPE
South Africa GPE
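In this run the pretrained model did not tag SpaceX. One way to quantify misses like this is entity-level precision, recall, and F1 against a hand-made gold list. The sketch below assumes, hypothetically, that SpaceX should be labeled ORG, and compares entities as exact `(text, label)` pairs. That is a simplification: evaluation tools normally match character offsets, so repeated mentions are counted separately.

```python
def prf(predicted, gold):
    """Entity-level precision, recall and F1 over exact (text, label) matches."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: predicted and correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

predicted = [("Elon Musk", "PERSON"), ("Tesla", "ORG"),
             ("Pretoria", "GPE"), ("South Africa", "GPE")]
# Hypothetical gold annotation: assumes SpaceX should be tagged ORG
gold = predicted + [("SpaceX", "ORG")]

p, r, f = prf(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# prints: precision=1.00 recall=0.80 f1=0.89
```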


In [5]:
from spacy.training import Example


### Custom NER training data



In [6]:
# Each entity is (start_char, end_char, label); end_char is exclusive,
# so text[0:5] == "Tesla"
TRAIN_DATA = [
    ("Tesla is an electric car manufacturer.", {"entities": [(0, 5, "COMPANY")]}),
    ("SpaceX is a space exploration company.", {"entities": [(0, 6, "COMPANY")]}),
]
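Counting character offsets by hand is error-prone. A small hypothetical helper, `build_training_tuple`, can compute them with `str.find`. It assumes each entity string occurs once in its sentence: only the first occurrence is used, so repeated names still need manual offsets.

```python
def build_training_tuple(text, entities):
    """Build a (text, {"entities": [...]}) tuple from (substring, label) pairs."""
    spans = []
    for substring, label in entities:
        start = text.find(substring)  # first occurrence only
        if start == -1:
            raise ValueError(f"{substring!r} not found in {text!r}")
        spans.append((start, start + len(substring), label))
    return (text, {"entities": spans})

print(build_training_tuple("Tesla is an electric car manufacturer.", [("Tesla", "COMPANY")]))
# prints: ('Tesla is an electric car manufacturer.', {'entities': [(0, 5, 'COMPANY')]})
```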

In [7]:
# Load a blank model
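# Start from a blank English pipeline with no trained components
# (this replaces the en_core_web_sm pipeline loaded earlier)
nlp = spacy.blank("en")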

In [8]:
# Create a new NER component and add it to the pipeline
ner = nlp.add_pipe("ner")

In [9]:
# Add new label to the NER component
ner.add_label("COMPANY")

1
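spaCy translates character-offset annotations like those in `TRAIN_DATA` into a token-level BILUO scheme (Begin, In, Last, Unit, Out); the library exposes this as `spacy.training.offsets_to_biluo_tags`. The pure-Python sketch below re-implements the idea under an assumption that differs from spaCy: tokens are split on whitespace rather than by spaCy's tokenizer. `offsets_to_biluo` is a hypothetical name.

```python
def offsets_to_biluo(text, entities):
    """Convert (start, end, label) character spans to BILUO tags.

    Assumes whitespace tokenization and that every span starts and ends
    on a token boundary; tokens outside any span are tagged "O".
    """
    # Character offsets of each whitespace-separated token
    tokens, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        tokens.append((start, start + len(word)))
        pos = start + len(word)

    tags = ["O"] * len(tokens)
    for ent_start, ent_end, label in entities:
        inside = [i for i, (s, e) in enumerate(tokens) if s >= ent_start and e <= ent_end]
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"  # single-token entity
        elif inside:
            tags[inside[0]] = f"B-{label}"  # first token
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"      # middle tokens
            tags[inside[-1]] = f"L-{label}" # last token
    return [text[s:e] for s, e in tokens], tags

print(offsets_to_biluo("Elon Musk runs Tesla", [(0, 9, "PERSON"), (15, 20, "COMPANY")]))
# prints: (['Elon', 'Musk', 'runs', 'Tesla'], ['B-PERSON', 'L-PERSON', 'O', 'U-COMPANY'])
```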

### Training the NER model


In [10]:
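import random

# initialize() replaces the deprecated begin_training() in spaCy v3
optimizer = nlp.initialize()
for epoch in range(10):
    random.shuffle(TRAIN_DATA)  # vary example order between epochs
    losses = {}  # nlp.update() accumulates per-component losses here
    for text, annotations in TRAIN_DATA:
        doc = nlp.make_doc(text)
        example = Example.from_dict(doc, annotations)
        nlp.update([example], drop=0.5, sgd=optimizer, losses=losses)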
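The `drop=0.5` argument sets the dropout rate for each update: hidden units are randomly zeroed with probability 0.5 so the model cannot rely too heavily on any single feature. A common formulation, inverted dropout, is sketched below in plain Python. It illustrates the idea only and is not Thinc's internal implementation; `inverted_dropout` is a hypothetical name.

```python
import random

def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate).

    Scaling keeps each unit's expected value unchanged, so no rescaling is
    needed when dropout is switched off at prediction time.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * scale for v in values]

rng = random.Random(0)  # fixed seed so the mask is reproducible
activations = [1.0] * 10
print(inverted_dropout(activations, 0.5, rng))  # each entry is 0.0 or 2.0
```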

### Test the trained model


In [11]:
text = "Elon Musk is the CEO of Tesla and SpaceX."

# Process the text with the newly trained pipeline
# (with only two training examples, predictions may be unreliable)
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(ent.text, ent.label_)