# 📚 Named Entity Recognition (NER) and Visualization using spaCy

## 🧠 What is Named Entity Recognition?

**Named Entity Recognition (NER)** is a sub-task of information extraction that locates named entities in text and classifies them into pre-defined categories such as:
- **Person (PERSON)**
- **Organization (ORG)**
- **Location (GPE, LOC)**
- **Date, Time (DATE, TIME)**
- **Money, Quantity (MONEY, QUANTITY)**, and others


NER helps in structuring unstructured data for downstream tasks like:
- Text classification
- Question answering
- Knowledge graph construction
- Resume parsing
- Customer feedback analysis

---
- Named Entity Recognition with spaCy converts raw text into structured, labeled data.
- Visualizing the result with `displacy` makes the entities easier to explore and present.

---
## 🔧 Installation and Setup

Make sure `spaCy` is installed and the English model is downloaded.

```python
# Notebook cell syntax; drop the leading "!" when running in a terminal
!pip install spacy
!python -m spacy download en_core_web_sm
```

## 🔍 Use Case: Analyzing Business News

Let’s say you are building a financial news summarizer. NER can help you extract companies, monetary amounts, products, and locations so that the key entities in each story stand out.

### 🧪 Example: Perform and Visualize NER

```python
import spacy
from spacy import displacy

# Load spaCy model
nlp = spacy.load("en_core_web_sm")
```

```python
# Sample text
text = """
Tesla Inc. is planning to acquire a U.S. -based battery manufacturer for $2.3 billion.
The deal is expected to close by December 2025, according to Elon Musk.
The acquisition aims to boost Tesla's electric vehicle production in Germany.
"""

# Process text
doc = nlp(text)

# Print named entities with labels and explanation
print("Named Entities, Phrases, and Labels:\n")
for ent in doc.ents:
    print(f"{ent.text:<30}  |  {ent.label_:<10}  |  {spacy.explain(ent.label_)}")
```

```
Named Entities, Phrases, and Labels:

Tesla Inc.                      |  ORG         |  Companies, agencies, institutions, etc.
U.S.                            |  GPE         |  Countries, cities, states
$2.3 billion                    |  MONEY       |  Monetary values, including unit
December 2025                   |  DATE        |  Absolute or relative dates or periods
Elon Musk                       |  PERSON      |  People, including fictional
Tesla                           |  ORG         |  Companies, agencies, institutions, etc.
Germany                         |  GPE         |  Countries, cities, states
```


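```python
# Visualize entities in a Jupyter notebook
displacy.render(doc, style="ent", jupyter=True)
```

```python
# Show only GPE and ORG entities, with a custom gradient for ORG
colors = {"ORG": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"}
options = {"ents": ["GPE", "ORG"], "colors": colors}
displacy.render(doc, style="ent", jupyter=True, options=options)
```

```python
# Outside a notebook, serve the visualization from a local web server instead:
# displacy.serve(doc, style="ent", options=options)

# Then open the default address in a browser:
# http://127.0.0.1:5000/
```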

## ✅ Tips for Real-World Usage
- Use `doc.ents` to loop through all detected entities.
- Filter entities by `ent.label_` for tasks like:
    - Extracting only `MONEY` for financial apps
    - Extracting `PERSON` and `ORG` for news aggregators
- Train a custom NER model if the built-in categories don’t match your domain.
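
The filtering tip above can be sketched as a small helper. This is only a minimal illustration, not part of spaCy: `filter_entities` is a hypothetical name, and the `Entity` namedtuple is a stand-in for spaCy `Span` objects so that the example runs without a model. The helper relies only on the `.text` and `.label_` attributes, so passing `doc.ents` should behave the same way.

```python
from collections import namedtuple

# Stand-in for spaCy Span objects (an assumption for illustration);
# real entities from doc.ents expose the same .text and .label_ attributes.
Entity = namedtuple("Entity", ["text", "label_"])


def filter_entities(ents, labels):
    """Return the text of every entity whose label is in `labels`."""
    wanted = set(labels)
    return [ent.text for ent in ents if ent.label_ in wanted]


# Entities as reported for the Tesla example above
ents = [
    Entity("Tesla Inc.", "ORG"),
    Entity("U.S.", "GPE"),
    Entity("$2.3 billion", "MONEY"),
    Entity("December 2025", "DATE"),
    Entity("Elon Musk", "PERSON"),
    Entity("Tesla", "ORG"),
    Entity("Germany", "GPE"),
]

money = filter_entities(ents, ["MONEY"])
people_and_orgs = filter_entities(ents, ["PERSON", "ORG"])

print(money)            # ['$2.3 billion']
print(people_and_orgs)  # ['Tesla Inc.', 'Elon Musk', 'Tesla']
```

In the notebook itself, `filter_entities(doc.ents, ["MONEY"])` would be the equivalent call.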

## 🧠 Bonus: Display Entities as a Table

```python
# !pip install pandas
import pandas as pd

# Create dataframe of entities
entity_data = [(ent.text, ent.label_, spacy.explain(ent.label_)) for ent in doc.ents]
df = pd.DataFrame(entity_data, columns=["Entity", "Label", "Description"])
df
```

| | Entity | Label | Description |
|---|---|---|---|
| 0 | Tesla Inc. | ORG | Companies, agencies, institutions, etc. |
| 1 | U.S. | GPE | Countries, cities, states |
| 2 | $2.3 billion | MONEY | Monetary values, including unit |
| 3 | December 2025 | DATE | Absolute or relative dates or periods |
| 4 | Elon Musk | PERSON | People, including fictional |
| 5 | Tesla | ORG | Companies, agencies, institutions, etc. |
| 6 | Germany | GPE | Countries, cities, states |
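
Once the entities are in tabular form, a natural next step is to count how often each label occurs. The sketch below is one possible way to do that with only the standard library; the `entity_rows` list is typed in by hand from the table above as an assumption, whereas in the notebook the pairs would come from `doc.ents`.

```python
from collections import Counter

# (text, label) pairs copied from the table above; in the notebook these
# would be built as [(ent.text, ent.label_) for ent in doc.ents].
entity_rows = [
    ("Tesla Inc.", "ORG"),
    ("U.S.", "GPE"),
    ("$2.3 billion", "MONEY"),
    ("December 2025", "DATE"),
    ("Elon Musk", "PERSON"),
    ("Tesla", "ORG"),
    ("Germany", "GPE"),
]

# Count how many entities were found for each label
label_counts = Counter(label for _, label in entity_rows)

# Print labels from most to least frequent
for label, count in label_counts.most_common():
    print(f"{label:<8} {count}")
```

With the DataFrame built earlier, `df["Label"].value_counts()` should give the same counts.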


# 🎨 Customizing spaCy Named Entity Visualization with displacy for Web Apps

## 📌 Why Customize?

spaCy’s `displacy` visualizer can render NER results as raw HTML. This is very useful when:
- Embedding entity visualization in **web applications**
- Styling the output to match your **brand/theme**
- Exporting it into **HTML reports** or **dashboards**
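
For the web-application case listed above, one possible approach is to ask `displacy.render` for the markup fragment only (`page=False`, its default) and place that fragment inside your own page template. The sketch below uses only the standard library. `embed_fragment`, `PAGE_TEMPLATE`, and the stand-in `fragment` string are hypothetical and only illustrate the idea; real displaCy markup looks different, and in an application the fragment would come from `displacy.render(doc, style="ent", page=False, jupyter=False)`.

```python
from html import escape
from string import Template

# A hypothetical page shell; replace the styling with your own theme.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>$title</title>
  <style>
    body { font-family: $font; margin: 2rem; }
  </style>
</head>
<body>
  <h1>$title</h1>
  $fragment
</body>
</html>
""")


def embed_fragment(fragment, title, font="sans-serif"):
    """Wrap an already-rendered HTML fragment in a themed page."""
    return PAGE_TEMPLATE.substitute(
        title=escape(title),  # escape plain text before placing it in HTML
        font=font,
        fragment=fragment,    # markup produced by the renderer, inserted as-is
    )


# Stand-in for displaCy output (an assumption for illustration)
fragment = '<div class="entities">Apple Inc. <mark>ORG</mark></div>'
page = embed_fragment(fragment, "NER report <draft>")
print(page)
```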

---

## ✅ Step-by-Step Example

### 🧪 Step 1: Load spaCy and Sample Text

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
```


```python
text = """
Apple Inc. is planning to launch the iPhone 16 Pro Max in October 2025.
The event will take place in California, and Tim Cook is expected to announce it.
The phone is priced at $1299.
"""

doc = nlp(text)

# Get the HTML as a string instead of displaying it.
# jupyter=False overrides Jupyter auto-detection, so render() returns the markup
# even when this runs inside a notebook (scripts and IDEs get the string either way).
# page=True wraps the markup in a full HTML page.
html = displacy.render(doc, style="ent", page=True, jupyter=False)

# print(html)
```

```python
# Save the HTML to a file to view in browser
with open("ner_output.html", "w", encoding="utf-8") as f:
    f.write(html)

print("✅ NER visualization saved to ner_output.html")
```

```
✅ NER visualization saved to ner_output.html
```


```python
import spacy
from pathlib import Path
import spacy.displacy as displacy  # same module as `from spacy import displacy`

# Load model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "Google was founded by Larry Page and Sergey Brin in California in 1998."

# Process the doc
doc = nlp(text)

# Create the HTML string (jupyter=False makes render() return it)
html = displacy.render(doc, style="ent", page=True, jupyter=False)

# Check what was returned
print("HTML type:", type(html))  # Should be <class 'str'>, not None

# Save to file
Path("ner_output.html").write_text(html, encoding="utf-8")

print("✅ Named Entity HTML saved to ner_output.html. Open it in browser.")
```

```
HTML type: <class 'str'>
✅ Named Entity HTML saved to ner_output.html. Open it in browser.
```


## ✅ Copy-Paste Ready Python Helper Function

```python
import spacy
from spacy import displacy
from pathlib import Path
import webbrowser

def save_ner_visualization(
    text,
    model="en_core_web_sm",
    filename="ner_output.html",
    custom_colors=None,
    labels=None,
    open_in_browser=True
):
    """
    Render and save NER visualization using spaCy and displacy.

    Parameters:
        text (str): Input text to analyze.
        model (str): spaCy language model to use.
        filename (str): Output HTML filename.
        custom_colors (dict): Optional color overrides per entity label.
        labels (list): List of entity labels to highlight (e.g., ["PERSON", "ORG"]).
        open_in_browser (bool): Whether to auto-open saved HTML in browser.
    """
    nlp = spacy.load(model)
    doc = nlp(text)

    # Only pass the options that were actually provided
    options = {}
    if custom_colors:
        options["colors"] = custom_colors
    if labels:
        options["ents"] = labels

    html = displacy.render(doc, style="ent", page=True, jupyter=False, options=options)

    # Save to file
    Path(filename).write_text(html, encoding="utf-8")
    print(f"✅ NER visualization saved to {filename}")

    # Open in browser
    if open_in_browser:
        webbrowser.open(Path(filename).absolute().as_uri())

# 🧪 Example usage
sample_text = "Google acquired DeepMind in London for $500 million in 2014. Sundar Pichai confirmed it."

custom_colors = {
    "ORG": "#ffd700",
    "GPE": "#87ceeb",
    "MONEY": "#90ee90",
    "PERSON": "#fa8072",
    "DATE": "#ffb6c1"
}

save_ner_visualization(
    text=sample_text,
    filename="custom_ner_output.html",
    custom_colors=custom_colors,
    labels=["ORG", "GPE", "MONEY", "PERSON", "DATE"],
    open_in_browser=True
)
```

```
✅ NER visualization saved to custom_ner_output.html
```
