### Named Entity Recognition

In [1]:
corpus = "The Taj Mahal is an ivory-white marble mausoleum on the right bank of the river Yamuna in Agra, Uttar Pradesh, India. It was commissioned in 1631 by the fifth Mughal emperor, Shah Jahan (1628-1658) to house the tomb of his beloved wife, Mumtaz Mahal; it also houses the tomb of Shah Jahan himself. The tomb is the centrepiece of a 17-hectare (42-acre) complex, which includes a mosque and a guest house, and is set in formal gardens bounded on three sides by a crenellated wall. Construction of the mausoleum was completed in 1648, but work continued on other phases of the project for another five years. The first ceremony held at the mausoleum was an observance by Shah Jahan, on 6 February 1643, of the 12th anniversary of the death of Mumtaz Mahal. The Taj Mahal complex is believed to have been completed in its entirety in 1653 at a cost estimated at the time to be around ₹5 million, which in 2023 would be approximately ₹35 billion (US$77.8 million). The building complex incorporates the design traditions of Indo-Islamic and Mughal architecture. It employs symmetrical constructions with the usage of various shapes and symbols. While the mausoleum is constructed of white marble inlaid with semi-precious stones, red sandstone was used for other buildings in the complex similar to the Mughal era buildings of the time. The construction project employed more than 20,000 workers and artisans under the guidance of a board of architects led by Ustad Ahmad Lahori, the emperor's court architect. The Taj Mahal was designated as a UNESCO World Heritage Site in 1983 for being 'the jewel of Islamic art in India and one of the universally admired masterpieces of the world's heritage'. It is regarded as one of the best examples of Mughal architecture and a symbol of Indian history. The Taj Mahal is a major tourist attraction and attracts more than five million visitors a year. In 2007, it was declared a winner of the New 7 Wonders of the World initiative."

In [2]:
corpus

"The Taj Mahal is an ivory-white marble mausoleum on the right bank of the river Yamuna in Agra, Uttar Pradesh, India. It was commissioned in 1631 by the fifth Mughal emperor, Shah Jahan (1628-1658) to house the tomb of his beloved wife, Mumtaz Mahal; it also houses the tomb of Shah Jahan himself. The tomb is the centrepiece of a 17-hectare (42-acre) complex, which includes a mosque and a guest house, and is set in formal gardens bounded on three sides by a crenellated wall. Construction of the mausoleum was completed in 1648, but work continued on other phases of the project for another five years. The first ceremony held at the mausoleum was an observance by Shah Jahan, on 6 February 1643, of the 12th anniversary of the death of Mumtaz Mahal. The Taj Mahal complex is believed to have been completed in its entirety in 1653 at a cost estimated at the time to be around ₹5 million, which in 2023 would be approximately ₹35 billion (US$77.8 million). The building complex incorporates the d

### Using NLTK:

In [3]:
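import nltk

# Split the text into word tokens (recent NLTK releases need the 'punkt_tab' tokenizer data for this).
words = nltk.word_tokenize(corpus)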

In [4]:
# Tag each token with its part of speech; ne_chunk expects (word, tag) pairs.
tag_elements = nltk.pos_tag(words)

In [5]:
nltk.download('maxent_ne_chunker_tab')

[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     C:\Users\itzsh\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!


True

In [6]:
nltk.download('words')

[nltk_data] Downloading package words to
[nltk_data]     C:\Users\itzsh\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!


True

In [7]:
# Chunk the tagged tokens into named entities; draw() opens the tree in a separate window.
nltk.ne_chunk(tag_elements).draw()

![image.png](attachment:49dc6c9c-e33a-471e-b510-362972218539.png)
![image.png](attachment:c24845df-1d11-49ca-945c-1e64d1edb43c.png)
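The drawn tree is convenient for inspection, but the entities are often easier to work with as plain data. The following is a minimal sketch, not part of the original notebook: `extract_entities` is a hypothetical helper name, and it assumes the usual shape of an `ne_chunk` result, where ordinary tokens are `(word, tag)` tuples and each entity is a labelled subtree.

```python
def extract_entities(chunked):
    """Return (text, label) pairs for the entity subtrees at the top level of an ne_chunk tree."""
    entities = []
    for node in chunked:
        # Plain tokens are (word, tag) tuples; entity chunks are Tree objects with a label().
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities

# In the notebook, with tag_elements from the earlier cell:
# entities = extract_entities(nltk.ne_chunk(tag_elements))
```

Calling `extract_entities(nltk.ne_chunk(tag_elements))` should give a list of pairs that can be printed or filtered without opening the tree window.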

### Using spaCy:

In [8]:
!pip install spacy
!python -m spacy download en_core_web_sm





### Entity types (labels) used by spaCy's English models for Named Entity Recognition (NER):

- **PERSON**: People, including fictional.
- **NORP**: Nationalities or religious or political groups.
- **FAC**: Buildings, airports, highways, bridges, etc.
- **ORG**: Companies, agencies, institutions, etc.
- **GPE**: Countries, cities, states.
- **LOC**: Non-GPE locations, mountain ranges, bodies of water.
- **PRODUCT**: Objects, vehicles, foods, etc. (not services).
- **EVENT**: Named hurricanes, battles, wars, sports events, etc.
- **WORK_OF_ART**: Titles of books, songs, etc.
- **LAW**: Named documents made into laws.
- **LANGUAGE**: Any named language.
- **DATE**: Absolute or relative dates or periods.
- **TIME**: Times smaller than a day.
- **PERCENT**: Percentage, including "%".
- **MONEY**: Monetary values, including unit.
- **QUANTITY**: Measurements, as of weight or distance.
- **ORDINAL**: "first", "second", etc.
- **CARDINAL**: Numerals that do not fall under another type.


In [9]:
import spacy

nlp = spacy.load("en_core_web_sm")

doc = nlp(corpus)

for ent in doc.ents:
    print(ent.text, ent.label_)


The Taj Mahal ORG
Yamuna NORP
Agra GPE
Uttar Pradesh GPE
India GPE
1631 DATE
fifth ORDINAL
Shah Jahan PERSON
1628-1658 DATE
Mumtaz Mahal PERSON
Shah Jahan PERSON
17 CARDINAL
42-acre QUANTITY
three CARDINAL
1648 DATE
another five years DATE
first ORDINAL
Shah Jahan PERSON
6 February 1643 DATE
the 12th anniversary DATE
Mumtaz Mahal PERSON
Taj Mahal ORG
1653 DATE
around ₹5 million MONEY
2023 DATE
approximately ₹35 billion MONEY
US$77.8 million MONEY
Indo-Islamic and Mughal ORG
more than 20,000 CARDINAL
Ahmad Lahori PERSON
The Taj Mahal ORG
UNESCO World Heritage Site ORG
1983 DATE
Islamic NORP
India GPE
one CARDINAL
Indian NORP
The Taj Mahal ORG
more than five million CARDINAL
2007 DATE
the New 7 Wonders of the World ORG
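
The flat listing repeats some mentions (for example, `Shah Jahan` appears three times). Grouping the results by label can make them easier to scan. This is a sketch only: `group_entities` is a hypothetical helper, and the `sample` list is a handful of pairs copied from the output above rather than the full result.

```python
from collections import defaultdict

def group_entities(pairs):
    """Group (text, label) pairs by label, keeping the first mention of each text."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)

# A few (ent.text, ent.label_) pairs copied from the output above.
sample = [
    ("Agra", "GPE"),
    ("Uttar Pradesh", "GPE"),
    ("India", "GPE"),
    ("Shah Jahan", "PERSON"),
    ("Mumtaz Mahal", "PERSON"),
    ("Shah Jahan", "PERSON"),
]

for label, texts in group_entities(sample).items():
    print(label, texts)
```

In the notebook, the same helper could be fed the real entities with `group_entities((ent.text, ent.label_) for ent in doc.ents)`.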
