# Text Processing with Named Entity Recognition (NER)

## What is Named Entity Recognition (NER)?

Named Entity Recognition (NER) is a natural language processing technique that locates named entities in unstructured text, such as people, organizations, and places, and classifies each one into a predefined category.

### Common Named Entity Categories:

- **PERSON**: Names of people
  - Examples: "John Smith", "Marie Curie", "Albert Einstein"
- **ORGANIZATION (ORG)**: Companies, institutions, government agencies
  - Examples: "Apple Inc.", "Harvard University", "United Nations"
- **LOCATION (LOC)**: Geographic locations that are not geopolitical entities, such as mountain ranges and bodies of water
  - Examples: "Mount Everest", "Pacific Ocean", "Sahara Desert"
- **GEOPOLITICAL ENTITY (GPE)**: Countries, cities, states
  - Examples: "United States", "California", "London"
- **DATE**: Absolute or relative dates and periods
  - Examples: "January 1st, 2023", "next Monday", "2020s"
- **TIME**: Times shorter than a day
  - Examples: "3:30 PM", "morning", "midnight"
- **MONEY**: Monetary values
  - Examples: "$100", "€50", "ten dollars"
- **PERCENT**: Percentage values, including the "%" sign or the word "percent"
  - Examples: "25%", "fifty percent"
- **FACILITY (FAC)**: Buildings, airports, highways, bridges
  - Examples: "Golden Gate Bridge", "JFK Airport"
- **PRODUCT**: Objects, vehicles, foods, etc.
  - Examples: "iPhone", "Toyota Camry", "Coca-Cola"

### Why is NER Important?

NER is crucial for:
- **Information extraction**: Automatically finding key facts in documents
- **Content categorization**: Organizing text by entities mentioned
- **Search and retrieval**: Improving search results by entity matching
- **Knowledge graphs**: Building structured knowledge from unstructured text
- **Question answering**: Understanding what entities a question refers to
- **Document summarization**: Highlighting important entities in summaries
- **Privacy protection**: Identifying and masking personal information
- **News analysis**: Tracking mentions of people, organizations, and locations
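
As a small illustration of the privacy-protection use case, the sketch below replaces selected entity spans with a `[LABEL]` placeholder. The function name `mask_entities` and the hand-written spans are assumptions made for this example; with spaCy, the spans would typically be built from `doc.ents` using `ent.start_char`, `ent.end_char`, and `ent.label_`.

```python
def mask_entities(text, spans, labels_to_mask=("PERSON",)):
    """Replace each selected span with a [LABEL] placeholder.

    spans: iterable of (start_char, end_char, label) tuples,
    assumed to be non-overlapping.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if label not in labels_to_mask:
            continue  # leave entities of other types untouched
        pieces.append(text[cursor:start])  # text before the entity
        pieces.append(f"[{label}]")        # placeholder for the entity
        cursor = end
    pieces.append(text[cursor:])           # remaining text after the last entity
    return "".join(pieces)


sample = "Marie Curie worked in Paris."
# Hand-written spans for illustration only (not model output).
sample_spans = [(0, 11, "PERSON"), (22, 27, "GPE")]

masked = mask_entities(sample, sample_spans)
print(masked)
```

Passing a different `labels_to_mask` tuple, for example `("PERSON", "GPE")`, masks additional entity types with the same function.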

In [None]:
import spacy
from spacy import displacy
from spacy import tokenizer
import re

In [None]:
nlp = spacy.load("en_core_web_sm")

In [None]:
# Sample text containing several entities to test NER
text = """Apple is looking at buying U.K. startup for $1 billion. 
Google is also interested in acquiring the startup. 
Microsoft has already acquired a similar startup in the U.K. for $500 million. 
Amazon is considering a partnership with the startup to expand its services in Europe."""

In [None]:
spacy_doc = nlp(text)
# Store each entity and its label in a pandas DataFrame
import pandas as pd
df = pd.DataFrame([(ent.text, ent.label_) for ent in spacy_doc.ents], columns=["Entity", "Label"])
print(df)

In [None]:
displacy.render(spacy_doc, style="ent", jupyter=True)
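
Entity/label pairs like the ones collected from `spacy_doc.ents` can also be summarized per label, which is one simple way to track mentions across a text. The sketch below uses hypothetical hand-written pairs rather than actual model output, so that it runs without loading a model; the names `pairs`, `label_counts`, and `by_label` are chosen for this example.

```python
from collections import Counter, defaultdict

# Hypothetical (entity text, label) pairs for illustration only;
# in the notebook these would come from
# [(ent.text, ent.label_) for ent in spacy_doc.ents].
pairs = [
    ("Apple", "ORG"),
    ("U.K.", "GPE"),
    ("$1 billion", "MONEY"),
    ("Google", "ORG"),
    ("U.K.", "GPE"),
]

# How many mentions were found for each label
label_counts = Counter(label for _, label in pairs)

# Which distinct entity strings appear under each label
by_label = defaultdict(set)
for entity_text, label in pairs:
    by_label[label].add(entity_text)

for label, count in label_counts.most_common():
    print(label, count, sorted(by_label[label]))
```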