### Part-of-Speech Tags

In [9]:
import nltk
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\itzsh\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping taggers\averaged_perceptron_tagger_eng.zip.


True

The averaged perceptron tagger is the default English part-of-speech (POS) tagger in the Natural Language Toolkit (NLTK) for Python, and it is what `nltk.pos_tag` uses. POS tagging assigns a part of speech (noun, verb, adjective, and so on) to each word in a text. The resulting tags feed into natural language processing (NLP) tasks such as information extraction, text-to-speech conversion, and language modeling. The table below lists the Penn Treebank tags that the tagger emits.

| Tag   | Description                                | Example                    |
|-------|--------------------------------------------|----------------------------|
| **CC**  | Coordinating conjunction                  | "and", "but", "or"         |
| **CD**  | Cardinal number                           | "one", "two", "3"          |
| **DT**  | Determiner                                | "the", "a", "an"           |
| **EX**  | Existential there                         | "there" (as in "there is") |
| **FW**  | Foreign word                              | "faux pas"                 |
| **IN**  | Preposition/subordinating conjunction     | "in", "of", "like"         |
| **JJ**  | Adjective                                 | "big", "small"             |
| **JJR** | Adjective, comparative                    | "bigger", "smaller"        |
| **JJS** | Adjective, superlative                    | "biggest", "smallest"      |
| **LS**  | List item marker                          | "1)", "A)"                 |
| **MD**  | Modal                                     | "could", "will", "might"   |
| **NN**  | Noun, singular                            | "desk"                     |
| **NNS** | Noun, plural                              | "desks"                    |
| **NNP** | Proper noun, singular                     | "Harrison"                 |
| **NNPS**| Proper noun, plural                       | "Americans"                |
| **PDT** | Predeterminer                             | "all the kids"             |
| **POS** | Possessive ending                         | "parent's"                 |
| **PRP** | Personal pronoun                          | "I", "he", "she"           |
| **PRP$**| Possessive pronoun                        | "my", "his", "her"         |
| **RB**  | Adverb                                    | "very", "silently"         |
| **RBR** | Adverb, comparative                       | "better"                   |
| **RBS** | Adverb, superlative                       | "best"                     |
| **RP**  | Particle                                  | "give up"                  |
| **TO**  | "to"                                      | "to go to the store"       |
| **UH**  | Interjection                              | "errrrrrrrm"               |
| **VB**  | Verb, base form                           | "take"                     |
| **VBD** | Verb, past tense                          | "took"                     |
| **VBG** | Verb, gerund/present participle           | "taking"                   |
| **VBN** | Verb, past participle                     | "taken"                    |
| **VBP** | Verb, non-3rd person singular present     | "take"                     |
| **VBZ** | Verb, 3rd person singular present         | "takes"                    |
| **WDT** | WH-determiner                             | "which"                    |
| **WP**  | WH-pronoun                                | "who", "what"              |
| **WRB** | WH-adverb                                 | "where", "when"            |
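
To make tagger output easier to read, the table above can be turned into a lookup. The following is a minimal sketch in plain Python, not part of NLTK: `TAG_DESCRIPTIONS` and `describe` are hypothetical names, the dictionary holds only a hand-picked subset of the tags, and the sample pairs come from the table's example column rather than from a tagger run.

```python
# Hand-picked subset of the tags in the table above (hypothetical helper, not an NLTK API).
TAG_DESCRIPTIONS = {
    "DT": "Determiner",
    "JJ": "Adjective",
    "NN": "Noun, singular",
    "NNP": "Proper noun, singular",
    "VBZ": "Verb, 3rd person singular present",
}


def describe(tagged):
    """Return (word, tag, description) triples for a list of (word, tag) pairs."""
    return [
        (word, tag, TAG_DESCRIPTIONS.get(tag, "unknown tag"))
        for word, tag in tagged
    ]


# Sample pairs built from the table's example column, not produced by a tagger.
sample = [("the", "DT"), ("desk", "NN"), ("takes", "VBZ")]

for word, tag, description in describe(sample):
    print(f"{word:<8}{tag:<5}{description}")
```

Tags missing from the dictionary fall back to `"unknown tag"`, so the helper still works when the subset is incomplete.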


#### Examples:

In [19]:
 "Taj Mahal is a beautiful Monument".split()

['Taj', 'Mahal', 'is', 'a', 'beautiful', 'Monument']

In [14]:
print(nltk.pos_tag("Taj Mahal is a beautiful Monument".split()))

[('Taj', 'NNP'), ('Mahal', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('Monument', 'NN')]
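
Because `nltk.pos_tag` returns a list of `(word, tag)` tuples, the result can be processed with ordinary Python. The sketch below copies the output shown above into a literal, so it runs without NLTK, and groups the words by tag. `group_by_tag` is a hypothetical helper name used only for illustration.

```python
from collections import defaultdict

# Output of nltk.pos_tag copied from the example above.
tagged = [('Taj', 'NNP'), ('Mahal', 'NNP'), ('is', 'VBZ'),
          ('a', 'DT'), ('beautiful', 'JJ'), ('Monument', 'NN')]


def group_by_tag(pairs):
    """Map each tag to the list of words that carry it, in original order."""
    groups = defaultdict(list)
    for word, tag in pairs:
        groups[tag].append(word)
    return dict(groups)


groups = group_by_tag(tagged)
proper_nouns = groups.get("NNP", [])
print(proper_nouns)  # ['Taj', 'Mahal']
```

The same pattern works for any tag in the table, for example `groups.get("JJ", [])` to collect adjectives.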


#### Applying to a paragraph:

In [10]:
paragraph = "The Taj Mahal is an ivory-white marble mausoleum on the right bank of the river Yamuna in Agra, Uttar Pradesh, India. It was commissioned in 1631 by the fifth Mughal emperor, Shah Jahan (1628-1658) to house the tomb of his beloved wife, Mumtaz Mahal; it also houses the tomb of Shah Jahan himself. The tomb is the centrepiece of a 17-hectare (42-acre) complex, which includes a mosque and a guest house, and is set in formal gardens bounded on three sides by a crenellated wall. Construction of the mausoleum was completed in 1648, but work continued on other phases of the project for another five years. The first ceremony held at the mausoleum was an observance by Shah Jahan, on 6 February 1643, of the 12th anniversary of the death of Mumtaz Mahal. The Taj Mahal complex is believed to have been completed in its entirety in 1653 at a cost estimated at the time to be around ₹5 million, which in 2023 would be approximately ₹35 billion (US$77.8 million). The building complex incorporates the design traditions of Indo-Islamic and Mughal architecture. It employs symmetrical constructions with the usage of various shapes and symbols. While the mausoleum is constructed of white marble inlaid with semi-precious stones, red sandstone was used for other buildings in the complex similar to the Mughal era buildings of the time. The construction project employed more than 20,000 workers and artisans under the guidance of a board of architects led by Ustad Ahmad Lahori, the emperor's court architect. The Taj Mahal was designated as a UNESCO World Heritage Site in 1983 for being 'the jewel of Islamic art in India and one of the universally admired masterpieces of the world's heritage'. It is regarded as one of the best examples of Mughal architecture and a symbol of Indian history. The Taj Mahal is a major tourist attraction and attracts more than five million visitors a year. In 2007, it was declared a winner of the New 7 Wonders of the World initiative."

In [11]:
paragraph

"The Taj Mahal is an ivory-white marble mausoleum on the right bank of the river Yamuna in Agra, Uttar Pradesh, India. It was commissioned in 1631 by the fifth Mughal emperor, Shah Jahan (1628-1658) to house the tomb of his beloved wife, Mumtaz Mahal; it also houses the tomb of Shah Jahan himself. The tomb is the centrepiece of a 17-hectare (42-acre) complex, which includes a mosque and a guest house, and is set in formal gardens bounded on three sides by a crenellated wall. Construction of the mausoleum was completed in 1648, but work continued on other phases of the project for another five years. The first ceremony held at the mausoleum was an observance by Shah Jahan, on 6 February 1643, of the 12th anniversary of the death of Mumtaz Mahal. The Taj Mahal complex is believed to have been completed in its entirety in 1653 at a cost estimated at the time to be around ₹5 million, which in 2023 would be approximately ₹35 billion (US$77.8 million). The building complex incorporates the d

In [12]:
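from nltk.corpus import stopwords
# Both calls rely on downloadable NLTK data: the 'stopwords' corpus and,
# in recent NLTK releases, the 'punkt_tab' sentence tokenizer models.
sentences = nltk.sent_tokenize(paragraph)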

In [13]:
sentences

['The Taj Mahal is an ivory-white marble mausoleum on the right bank of the river Yamuna in Agra, Uttar Pradesh, India.',
 'It was commissioned in 1631 by the fifth Mughal emperor, Shah Jahan (1628-1658) to house the tomb of his beloved wife, Mumtaz Mahal; it also houses the tomb of Shah Jahan himself.',
 'The tomb is the centrepiece of a 17-hectare (42-acre) complex, which includes a mosque and a guest house, and is set in formal gardens bounded on three sides by a crenellated wall.',
 'Construction of the mausoleum was completed in 1648, but work continued on other phases of the project for another five years.',
 'The first ceremony held at the mausoleum was an observance by Shah Jahan, on 6 February 1643, of the 12th anniversary of the death of Mumtaz Mahal.',
 'The Taj Mahal complex is believed to have been completed in its entirety in 1653 at a cost estimated at the time to be around ₹5 million, which in 2023 would be approximately ₹35 billion (US$77.8 million).',
 'The building c

In [18]:
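# Build the stop-word set once instead of once per token.
stop_words = set(stopwords.words('english'))
for sentence in sentences:
    words = nltk.word_tokenize(sentence)
    # The match is case-sensitive, so capitalized stop words such as 'The' remain.
    words = [word for word in words if word not in stop_words]
    # Tagging after stop-word removal gives the tagger less context per word.
    print(nltk.pos_tag(words))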

[('The', 'DT'), ('Taj', 'NNP'), ('Mahal', 'NNP'), ('ivory-white', 'JJ'), ('marble', 'NN'), ('mausoleum', 'NN'), ('right', 'RB'), ('bank', 'NN'), ('river', 'NN'), ('Yamuna', 'NNP'), ('Agra', 'NNP'), (',', ','), ('Uttar', 'NNP'), ('Pradesh', 'NNP'), (',', ','), ('India', 'NNP'), ('.', '.')]
[('It', 'PRP'), ('commissioned', 'VBD'), ('1631', 'CD'), ('fifth', 'JJ'), ('Mughal', 'NNP'), ('emperor', 'NN'), (',', ','), ('Shah', 'NNP'), ('Jahan', 'NNP'), ('(', '('), ('1628-1658', 'JJ'), (')', ')'), ('house', 'NN'), ('tomb', 'NN'), ('beloved', 'VBD'), ('wife', 'NN'), (',', ','), ('Mumtaz', 'NNP'), ('Mahal', 'NNP'), (';', ':'), ('also', 'RB'), ('houses', 'NNS'), ('tomb', 'VBP'), ('Shah', 'NNP'), ('Jahan', 'NNP'), ('.', '.')]
[('The', 'DT'), ('tomb', 'NN'), ('centrepiece', 'VBD'), ('17-hectare', 'JJ'), ('(', '('), ('42-acre', 'JJ'), (')', ')'), ('complex', 'NN'), (',', ','), ('includes', 'VBZ'), ('mosque', 'JJ'), ('guest', 'NN'), ('house', 'NN'), (',', ','), ('set', 'VBN'), ('formal', 'JJ'), ('gard
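
The loop above removes stop words before tagging, so the tagger never sees words such as "is" or "of". That missing context may explain tags like `('centrepiece', 'VBD')` in the output. One alternative is to tag the full token list first and filter afterwards. The sketch below illustrates that order under stated assumptions: `tag_then_filter` and `replay_tagger` are hypothetical names, the stand-in tagger only replays the tags recorded earlier for the short example sentence so that the code runs without NLTK data, and the two-word stop list is a tiny illustrative subset.

```python
def tag_then_filter(tokens, tagger, stop_words):
    """Tag the full token list first, then drop stop words from the result."""
    tagged = tagger(tokens)  # the tagger sees every token, stop words included
    return [(word, tag) for word, tag in tagged if word.lower() not in stop_words]


# Stand-in tagger (hypothetical): replays the tags that nltk.pos_tag returned
# for the short example sentence earlier, so the sketch runs without NLTK data.
KNOWN_TAGS = {"Taj": "NNP", "Mahal": "NNP", "is": "VBZ", "a": "DT",
              "beautiful": "JJ", "Monument": "NN"}


def replay_tagger(tokens):
    return [(token, KNOWN_TAGS[token]) for token in tokens]


# Tiny illustrative stop-word set; with NLTK, use set(stopwords.words('english')).
demo_stop_words = {"is", "a"}

tokens = "Taj Mahal is a beautiful Monument".split()
result = tag_then_filter(tokens, replay_tagger, demo_stop_words)
print(result)
# [('Taj', 'NNP'), ('Mahal', 'NNP'), ('beautiful', 'JJ'), ('Monument', 'NN')]
```

With NLTK installed, passing `nltk.pos_tag` as `tagger` and `set(stopwords.words('english'))` as `stop_words` applies the same order to real sentences. The `word.lower()` comparison also drops capitalized stop words such as "The", which the case-sensitive loop above keeps.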