# Part-of-Speech (POS) Tagging
Part-of-speech (POS) tagging is a fundamental task in natural language processing (NLP): it assigns a grammatical category (noun, verb, adjective, adverb, and so on) to each word in a sentence. Many downstream NLP tasks rely on it because the tags expose the syntactic role each word plays, which in turn helps determine its meaning in context.

POS tagging helps in:

- Parsing and understanding the grammatical structure of sentences.
- Disambiguating the meanings of words based on their context.
- Extracting features for various NLP tasks such as named entity recognition, sentiment analysis, and machine translation.

### Common Tags
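The tags below are base tags from the Penn Treebank tagset, which is the tagset the NLTK examples in this notebook produce:

- Noun (NN)
- Verb (VB)
- Adjective (JJ)
- Adverb (RB)
- Pronoun (PRP)
- Determiner (DT)
- Coordinating conjunction (CC)
- Preposition or subordinating conjunction (IN)
- Interjection (UH)
- Particle (RP)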
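
The tags listed here are base forms. In the Penn Treebank tagset most of them have finer variants that share a prefix (for example `NNS` for plural nouns or `VBG` for gerunds). The sketch below shows one way to collapse a fine-grained tag back to a coarse category. The helper name `coarse_category` and the two lookup tables are hypothetical and not part of NLTK.

```python
# Hypothetical helper (not an NLTK API): collapse a Penn Treebank tag
# into one of the coarse categories listed in this section.
COARSE_BY_PREFIX = [
    ("NN", "noun"),       # NN, NNS, NNP, NNPS
    ("VB", "verb"),       # VB, VBD, VBG, VBN, VBP, VBZ
    ("JJ", "adjective"),  # JJ, JJR, JJS
    ("RB", "adverb"),     # RB, RBR, RBS
    ("PRP", "pronoun"),   # PRP, PRP$
]
EXACT = {
    "DT": "determiner",
    "CC": "conjunction",
    "IN": "preposition",
    "UH": "interjection",
    "RP": "particle",
}


def coarse_category(tag):
    """Return a coarse category name, or 'other' for tags not covered here."""
    if tag in EXACT:
        return EXACT[tag]
    for prefix, category in COARSE_BY_PREFIX:
        if tag.startswith(prefix):
            return category
    return "other"


for tag in ["NNS", "VBG", "JJ", "RP", "TO"]:
    print(tag, "->", coarse_category(tag))
```

Tags outside this short list, such as `TO` or punctuation, fall through to `"other"`.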

### Using NLTK

In [None]:
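import nltk

# Download the tokenizer, tagger, and stopword data used in this notebook.
# Note: recent NLTK releases may instead require 'punkt_tab' and
# 'averaged_perceptron_tagger_eng'.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('stopwords')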

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
corpus = "NLTK is a leading platform for building Python programs to work with human language data. SpaCy is a powerful library for natural language processing."

In [None]:
words = nltk.word_tokenize(corpus)
pos_tags = nltk.pos_tag(words)

In [None]:
print(pos_tags)

[('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('leading', 'VBG'), ('platform', 'NN'), ('for', 'IN'), ('building', 'VBG'), ('Python', 'NNP'), ('programs', 'NNS'), ('to', 'TO'), ('work', 'VB'), ('with', 'IN'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.'), ('SpaCy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]
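
A tagged sentence is a list of `(word, tag)` pairs, so it is easy to summarise with standard Python containers. As a minimal sketch, the snippet below counts how often each tag occurs and groups the words by tag. To keep it self-contained, `pos_tags` is copied from the output above rather than recomputed with NLTK.

```python
from collections import Counter, defaultdict

# Copied from the NLTK output above so the snippet runs without NLTK.
pos_tags = [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('leading', 'VBG'), ('platform', 'NN'), ('for', 'IN'), ('building', 'VBG'), ('Python', 'NNP'), ('programs', 'NNS'), ('to', 'TO'), ('work', 'VB'), ('with', 'IN'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.'), ('SpaCy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]

# How many times each tag appears.
tag_counts = Counter(tag for _, tag in pos_tags)

# Which words received each tag, in order of appearance.
words_by_tag = defaultdict(list)
for word, tag in pos_tags:
    words_by_tag[tag].append(word)

print("Tag counts:", dict(tag_counts))
print("Singular nouns (NN):", words_by_tag["NN"])
print("Proper nouns (NNP):", words_by_tag["NNP"])
```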


#### POS Tagging (removing Stopwords) using NLTK

In [None]:
from nltk.corpus import stopwords

words = nltk.word_tokenize(corpus)

stop_words = set(stopwords.words('english'))

filtered_corpus = [word for word in words if word.lower() not in stop_words]

print("Filtered corpus:", filtered_corpus)

pos_tags = nltk.pos_tag(filtered_corpus)

print("POS Tags after removing Stopwords: ",pos_tags)

Filtered corpus: ['NLTK', 'leading', 'platform', 'building', 'Python', 'programs', 'work', 'human', 'language', 'data', '.', 'SpaCy', 'powerful', 'library', 'natural', 'language', 'processing', '.']
POS Tags after removing Stopwords:  [('NLTK', 'NNP'), ('leading', 'VBG'), ('platform', 'NN'), ('building', 'NN'), ('Python', 'NNP'), ('programs', 'NNS'), ('work', 'VBP'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.'), ('SpaCy', 'NNP'), ('powerful', 'JJ'), ('library', 'JJ'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]
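
Comparing this output with the tags for the full sentence shows that some tags changed: the tagger relies on neighbouring words, and removing stopwords first takes that context away. The sketch below lists the differences and then shows the alternative order, tagging first and filtering afterwards. Both tag lists are copied from the outputs above; `tag_changes` is a hypothetical helper, and `STOP_WORDS` is only the subset of stopwords that occurs in this sentence, not NLTK's full list.

```python
# Tags for the full sentence and for the pre-filtered sentence,
# copied from the two NLTK outputs above.
full_tags = [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('leading', 'VBG'), ('platform', 'NN'), ('for', 'IN'), ('building', 'VBG'), ('Python', 'NNP'), ('programs', 'NNS'), ('to', 'TO'), ('work', 'VB'), ('with', 'IN'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.'), ('SpaCy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]
filtered_first = [('NLTK', 'NNP'), ('leading', 'VBG'), ('platform', 'NN'), ('building', 'NN'), ('Python', 'NNP'), ('programs', 'NNS'), ('work', 'VBP'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.'), ('SpaCy', 'NNP'), ('powerful', 'JJ'), ('library', 'JJ'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]


def tag_changes(full, filtered):
    """Return (word, tag_in_full_sentence, tag_after_filtering) for each mismatch."""
    changes = []
    remaining = iter(full)  # walk the full sentence once, in order
    for word, new_tag in filtered:
        for full_word, old_tag in remaining:
            if full_word == word:
                if old_tag != new_tag:
                    changes.append((word, old_tag, new_tag))
                break
    return changes


print("Changed tags:", tag_changes(full_tags, filtered_first))

# Assumption: only the stopwords present in this sentence.
STOP_WORDS = {"is", "a", "for", "to", "with"}

# Tag first (full_tags), then drop stopwords: the original tags are preserved.
tag_then_filter = [(w, t) for w, t in full_tags if w.lower() not in STOP_WORDS]
print("Tag, then filter:", tag_then_filter)
```

In this example three words are affected: `building` (VBG to NN), `work` (VB to VBP), and `library` (NN to JJ). When tag quality matters, tagging the complete sentence and filtering afterwards is usually the safer order.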


### Using spaCy

In [None]:
import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

In [None]:
corpus = "NLTK is a leading platform for building Python programs to work with human language data. SpaCy is a powerful library for natural language processing."

In [None]:
doc = nlp(corpus)
pos_tags = [(token.text, token.pos_) for token in doc]

In [None]:
print(pos_tags)

[('NLTK', 'PROPN'), ('is', 'AUX'), ('a', 'DET'), ('leading', 'VERB'), ('platform', 'NOUN'), ('for', 'ADP'), ('building', 'VERB'), ('Python', 'PROPN'), ('programs', 'NOUN'), ('to', 'PART'), ('work', 'VERB'), ('with', 'ADP'), ('human', 'ADJ'), ('language', 'NOUN'), ('data', 'NOUN'), ('.', 'PUNCT'), ('SpaCy', 'PROPN'), ('is', 'AUX'), ('a', 'DET'), ('powerful', 'ADJ'), ('library', 'NOUN'), ('for', 'ADP'), ('natural', 'ADJ'), ('language', 'NOUN'), ('processing', 'NOUN'), ('.', 'PUNCT')]
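
The labels here differ from the NLTK output because `token.pos_` returns coarse Universal POS tags, whereas NLTK's default tagger returns Penn Treebank tags (spaCy exposes its fine-grained tag separately as `token.tag_`). As a rough illustration, the sketch below lines up the two outputs shown in this notebook and records which Universal tag each Penn tag corresponded to. Both lists are copied from the outputs above, and the resulting table covers only this sentence pair, not the full tagsets.

```python
from collections import defaultdict

# Copied from the NLTK and spaCy outputs in this notebook.
nltk_tags = [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('leading', 'VBG'), ('platform', 'NN'), ('for', 'IN'), ('building', 'VBG'), ('Python', 'NNP'), ('programs', 'NNS'), ('to', 'TO'), ('work', 'VB'), ('with', 'IN'), ('human', 'JJ'), ('language', 'NN'), ('data', 'NNS'), ('.', '.'), ('SpaCy', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]
spacy_tags = [('NLTK', 'PROPN'), ('is', 'AUX'), ('a', 'DET'), ('leading', 'VERB'), ('platform', 'NOUN'), ('for', 'ADP'), ('building', 'VERB'), ('Python', 'PROPN'), ('programs', 'NOUN'), ('to', 'PART'), ('work', 'VERB'), ('with', 'ADP'), ('human', 'ADJ'), ('language', 'NOUN'), ('data', 'NOUN'), ('.', 'PUNCT'), ('SpaCy', 'PROPN'), ('is', 'AUX'), ('a', 'DET'), ('powerful', 'ADJ'), ('library', 'NOUN'), ('for', 'ADP'), ('natural', 'ADJ'), ('language', 'NOUN'), ('processing', 'NOUN'), ('.', 'PUNCT')]

# Both tokenizers split this text identically, so the lists align one-to-one.
observed = defaultdict(set)
for (nltk_word, penn), (spacy_word, upos) in zip(nltk_tags, spacy_tags):
    if nltk_word != spacy_word:
        raise ValueError(f"Token mismatch: {nltk_word!r} vs {spacy_word!r}")
    observed[penn].add(upos)

for penn, upos in sorted(observed.items()):
    print(f"{penn:>4} -> {', '.join(sorted(upos))}")
```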


#### POS Tagging (removing Stopwords) using spaCy

In [None]:
corpus = "NLTK is a leading platform for building Python programs to work with human language data. SpaCy is a powerful library for natural language processing."

doc = nlp(corpus)

# Tokens are tagged when nlp(corpus) runs, so filtering afterwards keeps their tags.
filtered_corpus = [token for token in doc if not token.is_stop]
print("Filtered corpus:", filtered_corpus)

print("POS Tags after removing Stopwords: ")
for token in filtered_corpus:
    print(f"{token.text}:{token.pos_}")

Filtered corpus: [NLTK, leading, platform, building, Python, programs, work, human, language, data, ., SpaCy, powerful, library, natural, language, processing, .]
POS Tags after removing Stopwords: 
NLTK:PROPN
leading:VERB
platform:NOUN
building:VERB
Python:PROPN
programs:NOUN
work:VERB
human:ADJ
language:NOUN
data:NOUN
.:PUNCT
SpaCy:PROPN
powerful:ADJ
library:NOUN
natural:ADJ
language:NOUN
processing:NOUN
.:PUNCT
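
Because spaCy tags the whole sentence before the stopwords are filtered out, the tags above are the same ones the full sentence received. Once tags are available, they can also serve as the filter themselves. The sketch below keeps only words whose Universal POS tag marks a content word, one possible way to extract features without a stopword list. The tag list is copied from the full spaCy output earlier in this notebook, and the choice of `CONTENT_POS` is an assumption made for illustration.

```python
# Copied from the full spaCy output earlier in this notebook.
spacy_tags = [('NLTK', 'PROPN'), ('is', 'AUX'), ('a', 'DET'), ('leading', 'VERB'), ('platform', 'NOUN'), ('for', 'ADP'), ('building', 'VERB'), ('Python', 'PROPN'), ('programs', 'NOUN'), ('to', 'PART'), ('work', 'VERB'), ('with', 'ADP'), ('human', 'ADJ'), ('language', 'NOUN'), ('data', 'NOUN'), ('.', 'PUNCT'), ('SpaCy', 'PROPN'), ('is', 'AUX'), ('a', 'DET'), ('powerful', 'ADJ'), ('library', 'NOUN'), ('for', 'ADP'), ('natural', 'ADJ'), ('language', 'NOUN'), ('processing', 'NOUN'), ('.', 'PUNCT')]

# Assumption: treat these Universal POS tags as "content words".
CONTENT_POS = {"NOUN", "PROPN", "VERB", "ADJ", "ADV"}

content_words = [word for word, pos in spacy_tags if pos in CONTENT_POS]
print(content_words)
```

For this sentence the result matches the stopword-filtered list above, minus the punctuation tokens.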
