## Named Entity Recognition

Named Entity Recognition (NER) is a natural language processing (NLP) technique that identifies spans of text referring to real-world entities, such as people, places, dates, and events, and classifies each span into a predefined category. Its purpose is to extract structured information from unstructured text automatically, so that applications such as text summarization, question answering, and knowledge graph construction can work with the entities directly.

In [1]:
!pip install spacy



In [2]:
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
     ---------------------------------------- 12.8/12.8 MB 2.7 MB/s eta 0:00:00
[+] Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [13]:
import spacy 
import pandas as pd 

In [4]:
nlp = spacy.load("en_core_web_sm")

In [7]:
content = """

Prime Minister Narendra Modi met with India's World Cup-winning squad, 
led by Rohit Sharma, today at his residence. The team had been stranded in Barbados since Saturday due to Hurricane Beryl, 
but finally arrived in New Delhi today around 6 AM IST.

"""

# Run the pipeline on the text, then print each detected entity with its label
doc = nlp(content)
for ent in doc.ents:
    print(ent.text, ent.label_)

Narendra Modi PERSON
India GPE
World Cup EVENT
Rohit Sharma PERSON
today DATE
Barbados GPE
Saturday DATE
Hurricane Beryl EVENT
New Delhi GPE
today DATE
6 AM TIME
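
The flat list above is easier to scan once the entities are grouped by label. The following is a minimal sketch of that step, not part of the original notebook: the `pairs` list is copied by hand from the printed output so the snippet runs without a spaCy model, and `group_by_label` is a hypothetical helper name. In the notebook, the same list could be built with `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied from the output printed above; in the notebook
# they could be built with [(ent.text, ent.label_) for ent in doc.ents].
pairs = [
    ("Narendra Modi", "PERSON"),
    ("India", "GPE"),
    ("World Cup", "EVENT"),
    ("Rohit Sharma", "PERSON"),
    ("today", "DATE"),
    ("Barbados", "GPE"),
    ("Saturday", "DATE"),
    ("Hurricane Beryl", "EVENT"),
    ("New Delhi", "GPE"),
    ("today", "DATE"),
    ("6 AM", "TIME"),
]


def group_by_label(pairs):
    """Map each label to its distinct entity texts, in first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:  # skip repeated mentions such as "today"
            grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(pairs)
for label, texts in grouped.items():
    print(label, texts)
```

Deduplicating inside the helper is a design choice for readability; if mention counts matter, the `if` check can be dropped so every occurrence is kept.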


In [8]:
from spacy import displacy

In [11]:
# Highlight the recognized entities inline in the notebook
displacy.render(doc, style="ent")

In [15]:
# Collect each entity's text, label, and lemma into a DataFrame
entities = [(ent.text, ent.label_, ent.lemma_) for ent in doc.ents]
df = pd.DataFrame(entities, columns=["text", "label", "lemma"])
df

|   | text | label | lemma |
|---|------|-------|-------|
| 0 | Narendra Modi | PERSON | Narendra Modi |
| 1 | India | GPE | India |
| 2 | World Cup | EVENT | World Cup |
| 3 | Rohit Sharma | PERSON | Rohit Sharma |
| 4 | today | DATE | today |
| 5 | Barbados | GPE | Barbados |
| 6 | Saturday | DATE | Saturday |
| 7 | Hurricane Beryl | EVENT | Hurricane Beryl |
| 8 | New Delhi | GPE | New Delhi |
| 9 | today | DATE | today |
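
Once the entities are in a DataFrame, ordinary pandas operations can summarize or filter them. The sketch below is an illustration under stated assumptions rather than part of the original notebook: the `rows` list is copied from the ten rows shown above so the snippet runs without spaCy, and the names `label_counts` and `people_and_places` are hypothetical.

```python
import pandas as pd

# Rows copied from the DataFrame output shown above (text, label, lemma).
rows = [
    ("Narendra Modi", "PERSON", "Narendra Modi"),
    ("India", "GPE", "India"),
    ("World Cup", "EVENT", "World Cup"),
    ("Rohit Sharma", "PERSON", "Rohit Sharma"),
    ("today", "DATE", "today"),
    ("Barbados", "GPE", "Barbados"),
    ("Saturday", "DATE", "Saturday"),
    ("Hurricane Beryl", "EVENT", "Hurricane Beryl"),
    ("New Delhi", "GPE", "New Delhi"),
    ("today", "DATE", "today"),
]
df = pd.DataFrame(rows, columns=["text", "label", "lemma"])

# Number of entity mentions per label.
label_counts = df["label"].value_counts()
print(label_counts)

# Keep only people and places, dropping repeated mentions of the same text.
people_and_places = (
    df[df["label"].isin(["PERSON", "GPE"])]
    .drop_duplicates(subset="text")
    .reset_index(drop=True)
)
print(people_and_places)
```

In the notebook itself, the hand-copied `rows` would be unnecessary, since the same operations can be applied to the `df` built from `doc.ents`.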
