<a href="https://colab.research.google.com/github/jeyanthan-gj/NLP-AND-LLM/blob/main/Named_Entity_Recognition_(NER)_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Named Entity Recognition (NER)
**Named Entity Recognition (NER)** is a Natural Language Processing (NLP) task that identifies and classifies **named entities** in text into predefined categories such as people, organizations, locations, dates, and quantities.

---

## What does NER do?
NER performs two main tasks:
1. **Entity Detection** – finds relevant entities in text  
2. **Entity Classification** – assigns a label to each entity

### Example
**Sentence:**  
“Sundar Pichai is the CEO of Google and lives in California.”

**Entities Identified:**

| Entity        | Type |
|--------------|------|
| Sundar Pichai | PERSON |
| Google        | ORGANIZATION |
| California    | LOCATION |
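
The two tasks map onto a small data structure: detection yields a character span, and classification attaches a label to it. The sketch below is only an illustration of that idea. The `Entity` class and the `annotate` helper are hypothetical names, and the annotations are written by hand rather than predicted by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical container for one named entity (not part of any library)."""
    start: int   # character offset where the entity begins (detection)
    end: int     # character offset just past the entity (detection)
    label: str   # entity type (classification)


sentence = "Sundar Pichai is the CEO of Google and lives in California."


def annotate(text: str, phrase: str, label: str) -> Entity:
    """Locate `phrase` in `text` and attach `label`; the label is supplied by hand."""
    start = text.index(phrase)
    return Entity(start, start + len(phrase), label)


entities = [
    annotate(sentence, "Sundar Pichai", "PERSON"),
    annotate(sentence, "Google", "ORGANIZATION"),
    annotate(sentence, "California", "LOCATION"),
]

for ent in entities:
    print(sentence[ent.start:ent.end], "->", ent.label)
```

Storing offsets instead of copied strings keeps each entity tied to its position in the original sentence, which matters when the same word appears more than once.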

---
## Approaches to NER

### Rule-Based Approach
- Uses manually defined linguistic rules  
- Easy to understand but hard to scale  

### Machine Learning Approach
- Learns from labeled data  
- Common models: Hidden Markov Models (HMM), Conditional Random Fields (CRF)  

### Deep Learning Approach
- Automatically learns features from data  
- Models: BiLSTM-CRF, Transformer-based models  


# Rule-Based Named Entity Recognition (NER)
Rule-based NER identifies entities using **manually defined patterns** such as regular expressions.  
This approach does not require training data and works well for **simple, fixed patterns**.

## Approach Used
- Regular expressions (`re`)
- Capitalization rules
- Keyword-based matching



In [12]:
import re

text = "Narendra Modi is the Prime Minister of India"

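# Simple rule-based patterns
# Two consecutive capitalized words; this also matches titles such as "Prime Minister"
person_pattern = r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"
# Group the alternatives so that \b applies to every keyword, not only the first and last
location_pattern = r"\b(?:California|India|USA)\b"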

persons = re.findall(person_pattern, text)
locations = re.findall(location_pattern, text)

print("Persons:", persons)
print("Locations:", locations)


Persons: ['Narendra Modi', 'Prime Minister']
Locations: ['India']
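
In the output above, the capitalization rule also returns "Prime Minister", which is a title rather than a person. One way keyword-based matching can help is to filter the candidates against a list of title words. This is a minimal sketch, assuming a hypothetical hand-picked `TITLE_WORDS` set; a real system would need a far larger list.

```python
import re

text = "Narendra Modi is the Prime Minister of India"
person_pattern = r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"

# Hypothetical keyword list of title words (an assumption for this sketch).
TITLE_WORDS = {"Prime", "Minister", "President", "Chief", "Captain"}

# Step 1: the capitalization rule proposes candidates.
candidates = re.findall(person_pattern, text)

# Step 2: the keyword rule drops any candidate that contains a title word.
persons = [c for c in candidates if not TITLE_WORDS & set(c.split())]

print("Candidates:", candidates)  # ['Narendra Modi', 'Prime Minister']
print("Persons:", persons)        # ['Narendra Modi']
```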


## Advantages
- Simple and fast
- No external libraries or models needed

## Limitations
- Not scalable
- Fails for complex or unseen patterns
- Requires manual rule updates
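
To make the second limitation concrete, the sketch below applies the same person pattern to a sentence it was not written for. The sentence is the one used in the NLTK section of this notebook.

```python
import re

person_pattern = r"\b[A-Z][a-z]+ [A-Z][a-z]+\b"
sentence = "MS Dhoni was the captain of Indian Cricket Team"

# "MS" is all uppercase, so "MS Dhoni" does not fit the Capitalized-word shape,
# while "Indian Cricket" does fit it even though it is not a person.
matches = re.findall(person_pattern, sentence)

print(matches)  # ['Indian Cricket']
```

The rule misses the actual person and returns a false positive, so covering this sentence would require another hand-written rule.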


# Machine Learning–Based Named Entity Recognition (NLTK)
This approach uses **classical NLP pipelines**:
- Tokenization
- Part-of-Speech (POS) tagging
- Named Entity Chunking

NLTK provides **pretrained statistical models** for these tasks.




In [10]:
import nltk

nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger')
nltk.download('maxent_ne_chunker')
nltk.download('words')
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping chunkers

True

In [13]:
from nltk import word_tokenize, pos_tag, ne_chunk

sentence = "MS Dhoni was the captain of Indian Cricket Team"

tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)
ner_tree = ne_chunk(pos_tags)

print(ner_tree)


(S
  MS/NNP
  (PERSON Dhoni/NNP)
  was/VBD
  the/DT
  captain/NN
  of/IN
  (GPE Indian/JJ)
  Cricket/NNP
  Team/NN)
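
The printed tree is convenient to read but awkward to use in later code. A flat list of `(text, label)` pairs can be pulled out by walking the top-level nodes. The sketch below rebuilds the tree shown above by hand with `nltk.tree.Tree`, so it runs without the model downloads; `extract_entities` is a hypothetical helper name.

```python
from nltk.tree import Tree

# Hand-built copy of the chunk tree printed above (not a fresh model prediction).
ner_tree = Tree("S", [
    ("MS", "NNP"),
    Tree("PERSON", [("Dhoni", "NNP")]),
    ("was", "VBD"),
    ("the", "DT"),
    ("captain", "NN"),
    ("of", "IN"),
    Tree("GPE", [("Indian", "JJ")]),
    ("Cricket", "NNP"),
    ("Team", "NN"),
])


def extract_entities(tree):
    """Return (entity text, label) pairs from an ne_chunk-style tree."""
    entities = []
    for node in tree:
        # Entity chunks are Tree nodes; ordinary tokens are (word, tag) tuples.
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities


print(extract_entities(ner_tree))  # [('Dhoni', 'PERSON'), ('Indian', 'GPE')]
```

The flat list makes the errors easy to spot: "MS" is left out of the person name, and "Indian Cricket Team" is not recognized as a single entity.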


## Advantages
- No manual rules needed
- Generalizes better than pure rule-based systems to text the rules did not anticipate

## Limitations
- Lower accuracy than deep learning models
- Limited contextual understanding
- Slower for large-scale applications

# Deep Learning–Based Named Entity Recognition (spaCy)
This approach uses **neural networks and word embeddings** to perform NER.
spaCy models are trained on large annotated datasets and understand **context**.

## Model Used
- `en_core_web_sm`
- Small CNN-based pipeline (the transformer-based English pipeline is `en_core_web_trf`)

## How It Works
- Text is processed as a document object
- The model predicts entity spans and labels
- Each entity contains text and a semantic label




In [4]:
!pip install -U spacy
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [5]:
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Apple was founded by Steve Jobs in California in 1976"

doc = nlp(text)

for ent in doc.ents:
    print(ent.text, "->", ent.label_)


Apple -> ORG
Steve Jobs -> PERSON
California -> GPE
1976 -> DATE


## Advantages
- High accuracy
- Context-aware
- Industry-standard NLP library

## Limitations
- Requires more memory than regex rules, since a trained model must be loaded
- Pretrained model may miss domain-specific entities
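
One common way to cover domain-specific entities is to combine the pipeline with spaCy's rule-based `entity_ruler` component. The sketch below uses a blank English pipeline so that it runs without a model download. The single pattern and its `ORG` label are assumptions chosen for illustration, reusing the phrase from the NLTK example.

```python
import spacy

# Blank pipeline keeps the sketch self-contained; the same component can be
# added to a pretrained pipeline such as en_core_web_sm.
nlp = spacy.blank("en")

# Register the rule-based component and give it one illustrative pattern.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Indian Cricket Team"},  # assumed label
])

doc = nlp("MS Dhoni was the captain of Indian Cricket Team")

print([(ent.text, ent.label_) for ent in doc.ents])
# [('Indian Cricket Team', 'ORG')]
```

With a pretrained pipeline, the ruler is typically added with `before="ner"` so that the hand-written patterns are applied before the statistical component runs.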

🚀 Exploring Named Entity Recognition (NER) in NLP

Today, I worked on Named Entity Recognition (NER), a core task in Natural Language Processing (NLP) that helps identify and classify real-world entities such as people, organizations, locations, dates, and more from raw text.

In this mini-project, I implemented and compared three different NER approaches, moving from basics to industry-standard solutions:

🔹 1. Rule-Based NER
• Uses regular expressions and linguistic rules
• Simple and fast
• Best for small, fixed patterns

🔹 2. Machine Learning–Based NER (NLTK)
• Uses tokenization, POS tagging, and chunking
• Pretrained statistical models
• Better than rule-based, but limited context understanding

🔹 3. Deep Learning–Based NER (spaCy)
• Uses neural networks and word embeddings
• Context-aware entity detection
• Widely used in real-world NLP applications

🎯 This progression clearly shows how NER evolves from manual rules to powerful deep learning models.

📹 I’ve shared a short demo video explaining the workflow and outputs step by step.

🔗 Code link:
👉 [Paste your GitHub / Colab link here]

#NLP
#NamedEntityRecognition
#MachineLearning
#DeepLearning
#spaCy
#NLTK
#Python
#AI
#LearningJourney