Reference: https://www.analyticsvidhya.com/blog/2021/06/nlp-application-named-entity-recognition-ner-in-python-with-spacy/

In [1]:
import spacy
from spacy import displacy

NER = spacy.load("en_core_web_sm")

In [2]:
raw_text="The Indian Space Research Organisation or is the national space agency of India, headquartered in Bengaluru. It operates under Department of Space which is directly overseen by the Prime Minister of India while Chairman of ISRO acts as executive of DOS as well."

In [3]:
text1= NER(raw_text)

In [4]:
# Each item in .ents is an entity span, not a single word
for ent in text1.ents:
    print(ent.text, ent.label_)

The Indian Space Research Organisation ORG
India GPE
Bengaluru GPE
under Department of Space ORG
India GPE
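
The flat listing above is easier to scan once the entities are grouped by label. The following is a minimal sketch, not part of the original notebook: `group_by_label` is a hypothetical helper name, and the pairs are copied from the output above so the snippet runs without loading a model.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Group (entity_text, label) pairs into {label: [texts]}.

    Keeps first-seen order and drops repeated texts within a label.
    """
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# Pairs copied from the output above. With the model loaded, the same list
# could be built as [(ent.text, ent.label_) for ent in text1.ents].
pairs = [
    ("The Indian Space Research Organisation", "ORG"),
    ("India", "GPE"),
    ("Bengaluru", "GPE"),
    ("under Department of Space", "ORG"),
    ("India", "GPE"),
]

for label, texts in group_by_label(pairs).items():
    print(label, texts)
```

Grouping this way also makes repeated mentions, such as the two `India` hits, collapse into one entry.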


In [5]:
spacy.explain("ORG")

'Companies, agencies, institutions, etc.'

In [6]:
spacy.explain("GPE")

'Countries, cities, states'

In [7]:
displacy.render(text1,style="ent",jupyter=True)
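
`jupyter=True` only displays the result inside a notebook. With `jupyter=False`, `displacy.render` returns the markup as a string, which can be written to a file. The sketch below uses displaCy's manual mode with character offsets so that it runs without loading a model; the shortened sentence, the `span` helper, and the `entities.html` file name are illustrative assumptions, and the two labels are taken from the output above.

```python
from pathlib import Path

from spacy import displacy

# Shortened stand-in sentence (an assumption, not the notebook's raw_text)
text = "ISRO is the national space agency of India, headquartered in Bengaluru."


def span(surface, label):
    """Build a manual-mode entity dict from the first occurrence of `surface`."""
    start = text.index(surface)
    return {"start": start, "end": start + len(surface), "label": label}


doc_data = {
    "text": text,
    # Entities must be listed in order of appearance
    "ents": [span("India", "GPE"), span("Bengaluru", "GPE")],
    "title": None,
}

# jupyter=False forces a string return value; page=True wraps it in a full HTML page
html = displacy.render(doc_data, style="ent", manual=True, page=True, jupyter=False)
Path("entities.html").write_text(html, encoding="utf-8")
```

For a real `Doc` such as `text1`, the same call without `manual=True` should work: `displacy.render(text1, style="ent", page=True, jupyter=False)`.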

In [8]:
raw_text2 = "The Mars Orbiter Mission (MOM), informally known as Mangalyaan, was launched into Earth orbit on 5 November 2013 by the Indian Space"

In [9]:
text2= NER(raw_text2)

In [10]:
# Each item in .ents is an entity span, not a single word
for ent in text2.ents:
    print(ent.text, ent.label_)

The Mars Orbiter Mission ORG
Mangalyaan PERSON
Earth LOC
5 November 2013 DATE
the Indian Space ORG


In [11]:
spacy.explain("PRODUCT")

'Objects, vehicles, foods, etc. (not services)'

In [12]:
spacy.explain("LOC")

'Non-GPE locations, mountain ranges, bodies of water'

In [13]:
spacy.explain("DATE")

'Absolute or relative dates or periods'

In [14]:
spacy.explain("ORDINAL")

'"first", "second", etc.'

In [15]:
spacy.explain("MONEY")

'Monetary values, including unit'
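
The separate `spacy.explain` calls above can be folded into one loop. This is a small sketch covering only the labels already queried in this notebook; `describe` is a hypothetical helper, and the fallback string is an illustrative choice for terms the glossary does not know (`spacy.explain` returns `None` for those).

```python
import warnings

import spacy

# Labels looked up one by one earlier in the notebook
labels = ["ORG", "GPE", "PRODUCT", "LOC", "DATE", "ORDINAL", "MONEY"]


def describe(label):
    """Return spaCy's glossary text for a label, or a placeholder if unknown."""
    with warnings.catch_warnings():
        # Unknown terms may emit a warning; silence it and fall back instead
        warnings.simplefilter("ignore")
        return spacy.explain(label) or "(no description available)"


descriptions = {label: describe(label) for label in labels}
for label, text in descriptions.items():
    print(f"{label:8} {text}")
```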

In [16]:
displacy.render(text2,style="ent",jupyter=True)

## NER of a News Article

We will scrape the text of a news article and run NER on it.

`requests` downloads the page and Beautiful Soup parses the HTML.

In [17]:
from bs4 import BeautifulSoup
import requests
import re

In [18]:
URL="https://www.zeebiz.com/markets/currency/news-cryptocurrency-news-today-june-12-bitcoin-dogecoin-shiba-inu-and-other-top-coins-prices-and-all-latest-updates-158490"

In [27]:
html_content = requests.get(URL, timeout=10).text  # timeout avoids hanging on a slow server

In [28]:
soup = BeautifulSoup(html_content, "lxml")

In [49]:
body=soup.body.text
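
`soup.body.text` returns every text node under `<body>`, including menus and footers, which can then show up as noise among the entities. One option is to keep only paragraph text. The sketch below assumes the article text lives in `<p>` tags, which is an assumption about the page layout and has not been checked against the URL above. The inline HTML is a made-up stand-in so the snippet runs offline, and it uses the built-in `html.parser` rather than `lxml`.

```python
from bs4 import BeautifulSoup

# Made-up page: navigation, two article paragraphs, and a footer
sample_html = """
<html><body>
  <nav><a href="/">Home</a> <a href="/markets">Markets</a></nav>
  <article>
    <p>First paragraph of the story.</p>
    <p>Second paragraph,   split
       across lines.</p>
  </article>
  <footer>Read in App</footer>
</body></html>
"""

demo_soup = BeautifulSoup(sample_html, "html.parser")

# Keep only text inside <p> tags, skipping the nav and footer
paragraphs = [p.get_text(" ", strip=True) for p in demo_soup.find_all("p")]

# Collapse internal runs of whitespace, then join paragraphs with single spaces
article_text = " ".join(" ".join(p.split()) for p in paragraphs)
print(article_text)
```

On a real page the selector may need to be narrower (for example, paragraphs inside a specific container), depending on how the site marks up its articles.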

In [59]:
# Newlines become spaces; tabs, carriage returns and non-breaking spaces are dropped
body = body.replace('\n', ' ')
body = body.replace('\t', '')
body = body.replace('\r', '')
body = body.replace('\xa0', '')
# Assumed intent: collapse remaining runs of whitespace.
# re.sub returns a new string, so the result must be assigned back.
body = re.sub(r'\s+', ' ', body).strip()

In [60]:
# Preview the first 100 characters of the cleaned text
body[:100]

In [37]:
text3= NER(body)

In [38]:
displacy.render(text3,style="ent",jupyter=True)