# Named Entity Recognition

Named Entity Recognition (NER) is an NLP technique that finds "named entities" in text (such as people, places, organizations, dates, and monetary amounts) and classifies them into predefined categories. This turns unstructured text into structured data that supports information extraction, search, and summarization. NER involves two steps: identifying the entity (e.g., "Apple") and then classifying it (e.g., "Organization").

# How NER Works

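Identification: Finds words or phrases that name specific things (e.g., "Barack Obama", "Paris", "Google").

Classification: Assigns each identified entity to a category. Exact label names vary by model and annotation scheme (spaCy's English models, for instance, use PERSON rather than PER). Common categories include: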

**PER**: Person (e.g., Barack Obama)

**ORG**: Organization (e.g., Google)

**LOC**: Location (e.g., Paris)

**DATE/TIME**: Dates, times, durations

**MONEY**: Monetary values

**GPE**: Geopolitical Entity (e.g., France)

Context is Key: Models use context (like surrounding words) to resolve ambiguities, such as distinguishing "Apple" the company from "apple" the fruit.
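
The three ideas above (identification, classification, and use of context) can be illustrated with a deliberately tiny, rule-based sketch. This is not how spaCy or any trained model works internally: the `GAZETTEER` lookup table, the `FRUIT_CUES` word list, and the `toy_ner` function are all hypothetical names invented for this illustration, and a real model learns these decisions from annotated data.

```python
import re

# Hypothetical toy gazetteer: surface form -> label. Real NER models learn
# this from annotated data rather than looking names up in a fixed list.
GAZETTEER = {
    "Barack Obama": "PER",
    "Google": "ORG",
    "Apple": "ORG",
    "Paris": "LOC",
    "France": "GPE",
}

# Hypothetical context cue: words that suggest "apple" means the fruit.
FRUIT_CUES = {"eat", "ate", "juice", "pie", "fruit"}


def toy_ner(text):
    """Return (entity_text, label) pairs found in text, in order of appearance."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    found = []
    for name, label in GAZETTEER.items():
        # Identification: case-sensitive whole-word match of a known name.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            # Context: skip "Apple" when fruit-related words appear in the text.
            if name == "Apple" and words & FRUIT_CUES:
                continue
            # Classification: attach the category stored in the gazetteer.
            found.append((match.start(), name, label))
    # Sort by character offset so results follow the order in the text.
    return [(name, label) for _, name, label in sorted(found)]


print(toy_ner("Barack Obama visited Paris after a meeting at Apple."))
print(toy_ner("Apple pie is popular in France."))
```

In the first sentence "Apple" is kept as an organization. In the second, the cue word "pie" causes it to be skipped, so only "France" is returned. A fixed list like this breaks down quickly on unseen names, which is why practical systems rely on statistical models.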

# Why It's Important (Use Cases)

**Information Extraction**: Pulls key data from documents.

**Search Engines**: Improves search relevance by understanding entities in queries.

**Customer Service**: Analyzes feedback to identify product names, locations, or customer issues.

**Text Summarization & QA**: Helps systems understand text structure for better summaries and answers.

# spaCy

spaCy is an open-source Python library designed for industrial-strength Natural Language Processing (NLP), emphasizing speed, efficiency, and production-readiness. It's widely used for building real-world applications that process and "understand" large volumes of text data.

In [3]:
pip install spacy

Collecting spacy
  Downloading spacy-3.8.11-cp313-cp313-win_amd64.whl.metadata (28 kB)
Collecting spacy-legacy<3.1.0,>=3.0.11 (from spacy)
  Downloading spacy_legacy-3.0.12-py2.py3-none-any.whl.metadata (2.8 kB)
Collecting spacy-loggers<2.0.0,>=1.0.0 (from spacy)
  Downloading spacy_loggers-1.0.5-py3-none-any.whl.metadata (23 kB)
Collecting murmurhash<1.1.0,>=0.28.0 (from spacy)
  Downloading murmurhash-1.0.15-cp313-cp313-win_amd64.whl.metadata (2.3 kB)
Collecting cymem<2.1.0,>=2.0.2 (from spacy)
  Downloading cymem-2.0.13-cp313-cp313-win_amd64.whl.metadata (9.9 kB)
Collecting preshed<3.1.0,>=3.0.2 (from spacy)
  Downloading preshed-3.0.12-cp313-cp313-win_amd64.whl.metadata (2.6 kB)
Collecting thinc<8.4.0,>=8.3.4 (from spacy)
  Downloading thinc-8.3.10-cp313-cp313-win_amd64.whl.metadata (15 kB)
Collecting wasabi<1.2.0,>=0.9.1 (from spacy)
  Downloading wasabi-1.1.3-py3-none-any.whl.metadata (28 kB)
Collecting srsly<3.0.0,>=2.4.3 (from spacy)
  Downloading srsly-2.5.2-cp313-cp313-win_am

In [8]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)

In [9]:
import spacy

In [10]:
nlp = spacy.load('en_core_web_sm')

In [11]:
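# Two sample sentences; the second assignment overwrites the first,
# so only the second sentence is processed in the cells below.
text = "Virat Kohli was born in Delhi and plays cricket for India"
text = "Mary from the HR department said that The Ritz London was a great hotel option to stay in London"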

In [12]:
doc = nlp(text)

In [13]:
doc.ents

(Mary, The Ritz London, London)

In [14]:
for entity in doc.ents:
  print('Entity: ', entity.text)
  print("Type :", entity.label_)


Entity:  Mary
Type : PERSON
Entity:  The Ritz London
Type : ORG
Entity:  London
Type : GPE
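
The labels printed here (PERSON, ORG, GPE) are the model's own names, while the category list earlier in this notebook uses the shorter PER. If downstream code expects the short names, one option is a small mapping step. The sketch below works on the pairs copied from the output above; the `COARSE_LABELS` table and the `to_coarse` helper are hypothetical and not part of spaCy.

```python
# (text, label) pairs copied from the notebook output above.
entities = [("Mary", "PERSON"), ("The Ritz London", "ORG"), ("London", "GPE")]

# Hypothetical mapping from the model's label names to the short names
# used in the category list earlier in this notebook.
COARSE_LABELS = {"PERSON": "PER"}


def to_coarse(pairs):
    """Rename labels found in COARSE_LABELS; leave all other labels unchanged."""
    return [(text, COARSE_LABELS.get(label, label)) for text, label in pairs]


for text, label in to_coarse(entities):
    print(f"{text} -> {label}")
```

With a live `doc`, the input list could be built as `[(ent.text, ent.label_) for ent in doc.ents]`.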


In [15]:
import spacy

# Load the pre-trained English model
nlp = spacy.load("en_core_web_sm")

text = "Apple is looking at buying U.K. startup for $1 billion. Tim Cook announced this in London yesterday."

# Process the text with the model
doc = nlp(text)

# Iterate over the recognized entities and print their text and label
print("Named Entities:")
for ent in doc.ents:
    print(f"- Entity: {ent.text} | Label: {ent.label_} | Explanation: {spacy.explain(ent.label_)}")

Named Entities:
- Entity: Apple | Label: ORG | Explanation: Companies, agencies, institutions, etc.
- Entity: U.K. | Label: GPE | Explanation: Countries, cities, states
- Entity: $1 billion | Label: MONEY | Explanation: Monetary values, including unit
- Entity: Tim Cook | Label: PERSON | Explanation: People, including fictional
- Entity: London | Label: GPE | Explanation: Countries, cities, states
- Entity: yesterday | Label: DATE | Explanation: Absolute or relative dates or periods
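
To connect this output back to the idea of turning unstructured text into structured data, the entities can be grouped by label into a dictionary. The following is a minimal sketch that uses the pairs copied from the output above, so it runs without the model; `group_by_label` is a hypothetical helper name, not a spaCy function.

```python
from collections import defaultdict

# (text, label) pairs copied from the printed output above.
entities = [
    ("Apple", "ORG"),
    ("U.K.", "GPE"),
    ("$1 billion", "MONEY"),
    ("Tim Cook", "PERSON"),
    ("London", "GPE"),
    ("yesterday", "DATE"),
]


def group_by_label(pairs):
    """Collect entity texts under their label, preserving order of appearance."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


record = group_by_label(entities)
for label, texts in record.items():
    print(f"{label}: {texts}")
```

With a live `doc`, the same helper could be called as `group_by_label((ent.text, ent.label_) for ent in doc.ents)`. The resulting dictionary is easy to serialize to JSON or load into a table for further analysis.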
