# Named Entity Recognition (NER) Lab with spaCy

**Objective:** Identify and label named entities in a short piece of news text using spaCy's pre-trained English pipeline, `en_core_web_sm`.

**Tools:** Python, spaCy

## Step 1: Install spaCy and Download Language Model

Run these commands in your terminal. This only needs to be done once per Python environment:
```bash
pip install spacy
python -m spacy download en_core_web_sm
```

## Step 2: Import spaCy and Load the Language Model

In [None]:
import spacy

# Load the English language model
nlp = spacy.load('en_core_web_sm')

print("spaCy model loaded successfully!")
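
If the model has not been downloaded, `spacy.load` fails with an `OSError`. The sketch below shows one way to check for the package first and give a clearer hint. `load_pipeline` and `checked_nlp` are hypothetical names introduced for this example; `spacy.util.is_package` is the spaCy utility that reports whether a pipeline package is installed.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_sm"


def load_pipeline(name: str = MODEL_NAME):
    """Load an installed spaCy pipeline, or fail with a hint on how to get it.

    Hypothetical helper for this lab, not part of spaCy itself.
    """
    if not is_package(name):
        raise RuntimeError(
            f"Pipeline '{name}' is not installed. "
            f"Try: python -m spacy download {name}"
        )
    return spacy.load(name)


if is_package(MODEL_NAME):
    checked_nlp = load_pipeline()
    # The "ner" component is the one that fills doc.ents.
    print(checked_nlp.pipe_names)
else:
    print(f"{MODEL_NAME} is not installed yet.")
```

Printing `pipe_names` is a quick way to confirm that the loaded pipeline actually contains an `ner` component before relying on `doc.ents`.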

## Step 3: Define the Sample News Text

In [None]:
# Sample news text with various entity types
news_text = "Apple Inc. announced that CEO Tim Cook will visit their headquarters in Cupertino, California next Monday. The company reported revenue of $394 billion in 2022. Google and Microsoft are also competing in the artificial intelligence space, with significant investments in recent years."

print("Original Text:")
print(news_text)

## Step 4: Process the Text with spaCy

In [None]:
# Process the text through the NLP pipeline
doc = nlp(news_text)

print("Text processing complete!")
print(f"Total tokens processed: {len(doc)}")
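
`len(doc)` counts tokens, not words, so punctuation marks are included. A minimal sketch of what that means, assuming a blank English pipeline (tokenizer only) as a stand-in for `en_core_web_sm` so that it runs without a downloaded model; `demo_nlp` and `demo_doc` are names chosen for this example:

```python
import spacy

# Assumption: spacy.blank("en") provides only the English tokenizer,
# which is enough to show how a Doc is split into tokens.
demo_nlp = spacy.blank("en")
demo_doc = demo_nlp("Tim Cook will visit Cupertino.")

tokens = [token.text for token in demo_doc]
print(tokens)
print(len(demo_doc))
```

The final period becomes its own token, which is why the token count is usually higher than a naive word count.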

## Step 5: Extract and Display Named Entities

In [None]:
# Iterate over identified entities
print("\nIdentified Entities:")
print("-" * 50)
print(f"{'Entity Text':<30} | {'Label':<15}")
print("-" * 50)

for ent in doc.ents:
    print(f"{ent.text:<30} | {ent.label_:<15}")

print("-" * 50)
print(f"Total entities found: {len(doc.ents)}")
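
Each entity is a span of the original text, so it also carries character offsets (`start_char`, `end_char`) that are useful when exporting results. The sketch below collects them into plain dictionaries. To keep the output deterministic it assumes a blank pipeline with a rule-based `entity_ruler` and hand-written patterns instead of the statistical model used in this lab; the `records` structure is just one possible layout.

```python
import spacy

# Assumption: a blank pipeline plus hand-written EntityRuler patterns stands in
# for en_core_web_sm, whose predictions can vary between model versions.
demo_nlp = spacy.blank("en")
ruler = demo_nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple Inc."},
    {"label": "PERSON", "pattern": "Tim Cook"},
    {"label": "GPE", "pattern": "Cupertino"},
])

text = "Apple Inc. announced that CEO Tim Cook will visit Cupertino."
demo_doc = demo_nlp(text)

# One record per entity, with offsets into the original string.
records = [
    {
        "text": ent.text,
        "label": ent.label_,
        "start": ent.start_char,
        "end": ent.end_char,
    }
    for ent in demo_doc.ents
]

for record in records:
    print(record)
```

Slicing the original string with `text[start:end]` returns the entity text, so the offsets can be stored and used later without keeping the `Doc` object around.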

## Step 6: Understand Entity Labels

In [None]:
# Display explanations for entity labels
print("\nEntity Label Meanings:")
print("-" * 50)

# Get unique labels from our entities
unique_labels = {ent.label_ for ent in doc.ents}

for label in sorted(unique_labels):
    explanation = spacy.explain(label)
    print(f"{label:<10} - {explanation}")
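
`spacy.explain` only knows the labels in spaCy's built-in glossary and returns `None` for anything else, for example a custom label added later. A small sketch of a fallback, where `describe_label` and `MY_CUSTOM_LABEL` are hypothetical names used only for illustration:

```python
import spacy


def describe_label(label: str) -> str:
    """Hypothetical helper: use a placeholder when spaCy has no glossary entry."""
    explanation = spacy.explain(label)
    return explanation if explanation is not None else "(no description available)"


for label in ["ORG", "GPE", "MY_CUSTOM_LABEL"]:
    print(f"{label:<16} - {describe_label(label)}")
```

Without the fallback, the f-string in the cell above would print the literal text `None` for an unknown label.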

## Step 7: Visualize Entities (Optional)

In [None]:
# Use spaCy's built-in visualizer
from spacy import displacy

# Highlight entities inline (jupyter=True renders directly in the notebook output)
displacy.render(doc, style='ent', jupyter=True)
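
Outside a notebook, `displacy.render` can return the markup as a string instead of displaying it. The sketch below saves that string to a file that can be opened in a browser. It again assumes a blank pipeline with hand-written `entity_ruler` patterns so that it runs without the downloaded model, and `entities.html` is an arbitrary file name.

```python
from pathlib import Path

import spacy
from spacy import displacy

# Assumption: rule-based entities stand in for the statistical model's output.
demo_nlp = spacy.blank("en")
ruler = demo_nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Tim Cook"},
    {"label": "GPE", "pattern": "Cupertino"},
])
demo_doc = demo_nlp("Tim Cook will visit Cupertino next week.")

# jupyter=False makes render() return the markup as a string;
# page=True wraps it in a complete HTML document.
html = displacy.render(demo_doc, style="ent", page=True, jupyter=False)

output_path = Path("entities.html")  # arbitrary file name for this example
output_path.write_text(html, encoding="utf-8")
print(f"Wrote {len(html)} characters to {output_path}")
```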

## Step 8: Entity Analysis by Type

In [None]:
# Count entities by type
from collections import Counter

entity_counts = Counter(ent.label_ for ent in doc.ents)

print("\nEntity Count by Type:")
print("-" * 30)
for label, count in entity_counts.most_common():
    print(f"{label:<10}: {count}")
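
Counts show how many entities of each type were found, but it is often useful to see which ones. One possible sketch groups the entity texts by label with a `defaultdict`. As in the other examples, a blank pipeline with hand-written patterns is assumed so that the result is deterministic, and `entities_by_label` is a name chosen for this example.

```python
from collections import defaultdict

import spacy

# Assumption: rule-based entities stand in for the statistical model's output.
demo_nlp = spacy.blank("en")
ruler = demo_nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Tim Cook"},
    {"label": "ORG", "pattern": "Apple"},
    {"label": "ORG", "pattern": "Google"},
    {"label": "ORG", "pattern": "Microsoft"},
])
demo_doc = demo_nlp("Tim Cook said Apple, Google and Microsoft compete in AI.")

# Map each label to the entity texts found for it, in document order.
entities_by_label = defaultdict(list)
for ent in demo_doc.ents:
    entities_by_label[ent.label_].append(ent.text)

for label, texts in entities_by_label.items():
    print(f"{label:<10}: {', '.join(texts)}")
```

The same loop works unchanged on the `doc` produced by `en_core_web_sm` in Step 4.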

## Key Insight

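Pre-trained pipelines such as spaCy's `en_core_web_sm` can extract structured information from unstructured text without any training on your part. Their predictions are statistical, so expect occasional missed or mislabeled entities, especially with a small model.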

**Common Entity Types:**
- **ORG**: Organizations (companies, agencies, institutions)
- **PERSON**: People, including fictional characters
- **GPE**: Geopolitical entities (countries, cities, states)
- **DATE**: Absolute or relative dates or periods
- **MONEY**: Monetary values, including units
- **CARDINAL**: Numerals that do not fall under another type

This illustrates the idea behind **transfer learning** in NLP: a model trained once on a large annotated corpus can be reused on new text, either as-is, as in this lab, or as a starting point for fine-tuning on your own data.