# NER Introduction

Named Entity Recognition (NER) is a natural language processing technique that identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, and more. NER is crucial for various applications, including information extraction, question answering, and text summarization.

# How NER Works

NER typically involves the following steps:
1. Tokenization: Breaking down the text into individual words or tokens.
2. Feature extraction: Analyzing the properties of tokens, including morphological, syntactic, and semantic features.
3. Entity detection: Identifying potential named entities in the text.
4. Entity classification: Assigning the detected entities to specific categories.
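
The four steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not how a trained model works: the `GAZETTEER` lookup table and the helper names `tokenize`, `extract_features`, and `detect_and_classify` are hypothetical, and a dictionary lookup stands in for the statistical model a real NER system would use.

```python
import re

# Hypothetical gazetteer: a tiny lookup table standing in for a trained model.
GAZETTEER = {
    ("apple", "inc"): "ORG",
    ("steve", "jobs"): "PERSON",
    ("cupertino",): "GPE",
}


def tokenize(text):
    """Step 1: split text into word tokens and standalone punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


def extract_features(token):
    """Step 2: compute simple surface features for one token."""
    return {
        "lower": token.lower(),
        "is_title": token.istitle(),
        "is_digit": token.isdigit(),
    }


def detect_and_classify(tokens):
    """Steps 3 and 4: longest-match gazetteer lookup starting at capitalized tokens."""
    feats = [extract_features(t) for t in tokens]
    max_len = max(len(key) for key in GAZETTEER)
    entities = []
    i = 0
    while i < len(tokens):
        matched = False
        if feats[i]["is_title"]:
            # Try the longest candidate span first, then shorter ones.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                key = tuple(f["lower"] for f in feats[i:i + n])
                if key in GAZETTEER:
                    entities.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                    i += n
                    matched = True
                    break
        if not matched:
            i += 1
    return entities


tokens = tokenize("Apple Inc. was founded by Steve Jobs in Cupertino.")
entities = detect_and_classify(tokens)
print(entities)
```

Detection and classification collapse into a single lookup here because the gazetteer stores the category alongside each name. In a learned model the two steps are usually handled jointly by predicting a tag for each token.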

# Implementation with spaCy

spaCy is a popular Python library for NLP tasks, including NER.
Below is an example of how to perform NER using spaCy.

In [None]:
import spacy

# Load the small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "Apple Inc. was founded by Steve Jobs in Cupertino, California on April 1, 1976."

# Process the text
doc = nlp(text)

# Extract named entities
for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")

# Entity Types

Common entity types, shown here with the label names used by spaCy's English models, include:
1. PERSON: Names of individuals
2. ORG: Organizations, companies, institutions
3. GPE: Geopolitical entities (countries, cities, states)
4. LOC: Non-GPE locations (mountain ranges, bodies of water)
5. DATE: Absolute or relative dates or periods
6. TIME: Times smaller than a day
7. MONEY: Monetary values, including the unit
8. PRODUCT: Products or objects (not services)
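
Once entities carry labels like these, a common next step is to group them by type. The sketch below assumes the entities have already been collected as `(text, label)` pairs, for example from `ent.text` and `ent.label_`. The `sample` list is hand-written for illustration and is not actual model output, and `group_by_label` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_label(entities):
    """Collect entity texts under their label, preserving input order."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    return dict(grouped)


# Hand-written (text, label) pairs for illustration only.
sample = [
    ("Apple Inc.", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("Cupertino", "GPE"),
    ("California", "GPE"),
    ("April 1, 1976", "DATE"),
]

grouped = group_by_label(sample)
for label, texts in grouped.items():
    print(f"{label}: {texts}")
```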

# Customizing NER

For domain-specific applications, you can train custom NER models or extend existing ones.
The example below trains a blank English pipeline from scratch to recognize a single custom entity type with spaCy.

In [None]:
import spacy
from spacy.training import Example

# Create a blank English model
nlp = spacy.blank("en")

# Add the entity recognizer to the pipeline
ner = nlp.add_pipe("ner")

# Add a new label to the entity recognizer
ner.add_label("PRODUCT")

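# Training data: each entity is (start_char, end_char, label),
# where end_char is exclusive and text[start_char:end_char] is the entity text
train_data = [
    ("Apple released the new iPhone 12.", {"entities": [(23, 32, "PRODUCT")]}),
    ("The MacBook Pro is a popular laptop.", {"entities": [(4, 15, "PRODUCT")]})
]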

# Train the model
# nlp.initialize() is the spaCy v3 name for the deprecated nlp.begin_training()
optimizer = nlp.initialize()
for _ in range(20):
    for text, annotations in train_data:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

# Test the custom NER model
test_text = "I love my new iPhone 12 and MacBook Pro."
doc = nlp(test_text)
for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")

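This example demonstrates how to add a custom "PRODUCT" entity type and train a simple NER model to recognize it. With only two training sentences the model is purely illustrative; a usable model needs far more annotated examples.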
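
Character offsets in training annotations are easy to get wrong by hand, and a span that does not line up with token boundaries cannot be used as an entity. A minimal sketch, using only string operations, of computing offsets from the phrase itself and checking what each span covers. The helper names `make_span` and `span_texts` are hypothetical and not part of spaCy.

```python
def make_span(text, phrase, label):
    """Build a (start_char, end_char, label) tuple for the first occurrence of phrase."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return (start, start + len(phrase), label)


def span_texts(text, annotations):
    """Return the substring each annotated span actually covers."""
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


text = "Apple released the new iPhone 12."
annotations = {"entities": [make_span(text, "iPhone 12", "PRODUCT")]}

print(annotations)
print(span_texts(text, annotations))
```

Printing the covered substrings before training makes an off-by-one offset visible immediately, since the printed text will be cut short or shifted.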

# Evaluation

NER models are typically evaluated at the entity level using precision, recall, and F1-score. The CoNLL-2003 dataset is a common benchmark for assessing NER performance.
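
As a sketch of how these metrics are computed, the function below scores predictions under exact-match evaluation, where a predicted entity counts as correct only if its start offset, end offset, and label all match a gold entity. The function name `prf1` and the `gold` and `predicted` spans are made up for illustration.

```python
def prf1(gold, predicted):
    """Entity-level precision, recall, and F1 under exact span-and-label match."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up (start_char, end_char, label) spans for a single document.
gold = [(0, 10, "ORG"), (26, 36, "PERSON"), (40, 49, "GPE")]
predicted = [
    (0, 10, "ORG"),     # correct
    (26, 36, "ORG"),    # right span, wrong label
    (40, 49, "GPE"),    # correct
    (51, 61, "GPE"),    # spurious entity not in gold
]

precision, recall, f1 = prf1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Here two of four predictions are correct (precision 0.5) and two of three gold entities are found (recall 2/3), giving an F1 of 4/7.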

By leveraging pre-trained models and customizing them for specific domains, NER can be a powerful tool for extracting structured information from unstructured text, enabling various downstream NLP applications.