POS Tagging (Part-of-Speech Tagging)

Definition: POS tagging is the process of assigning a part of speech (noun, verb, adjective, etc.) to each word in a sentence based on its definition and context.

Example: In the sentence "The cat sat on the mat," POS tagging would assign:

"The" → Determiner (DT)
"cat" → Noun (NN)
"sat" → Verb (VBD)
"on" → Preposition (IN)
"the" → Determiner (DT)
"mat" → Noun (NN)

Importance: POS tagging helps reveal sentence structure and disambiguate word meanings, and it often feeds higher-level NLP tasks such as named entity recognition (NER) and chunking.

In [20]:
import nltk

In [21]:
from nltk import word_tokenize

In [22]:
sentence = "The cat sat on the mat"

In [23]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [24]:
tokens = word_tokenize(sentence)

In [25]:
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


True

In [26]:
pos_tags = nltk.pos_tag(tokens)

In [27]:
print(pos_tags)

[('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'), ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
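The tag codes in this output come from the Penn Treebank tag set. As a minimal sketch, the snippet below attaches a plain-English gloss to each pair. The `TAG_GLOSS` dictionary and the `describe` helper are hypothetical names written for this illustration and cover only the four tags seen here; NLTK also provides `nltk.help.upenn_tagset`, which needs an additional data package.

```python
# Hand-written glosses for the tags seen in this example
# (an illustrative subset, not the full Penn Treebank tag set).
TAG_GLOSS = {
    "DT": "determiner",
    "NN": "noun, singular or mass",
    "VBD": "verb, past tense",
    "IN": "preposition or subordinating conjunction",
}

# Same (word, tag) pairs as the printed output above.
pos_tags = [("The", "DT"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]


def describe(tagged):
    """Return one readable line per (word, tag) pair."""
    return [f"{word}: {TAG_GLOSS.get(tag, 'unknown tag')} ({tag})"
            for word, tag in tagged]


for line in describe(pos_tags):
    print(line)
```

Tags missing from the dictionary fall back to "unknown tag", so the helper does not fail on sentences that use other parts of speech.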


NER (Named Entity Recognition)

Definition: NER is a process that identifies and classifies named entities (like people, places, organizations, dates, etc.) in text into predefined categories.

Example: In the sentence "Barack Obama was born in Hawaii in 1961," NER would recognize:

"Barack Obama" as a PERSON,

"Hawaii" as a LOCATION (spaCy's model labels it GPE, a geopolitical entity),

"1961" as a DATE.

Importance: NER extracts key information from text and supports tasks such as document summarization and question answering.

In [28]:
import spacy

# Load the pre-trained model for English

In [29]:
nlp = spacy.load("en_core_web_sm")

In [30]:
sentence = "Barack Obama was born in Hawaii in 1961."

# Process the sentence using spaCy's NLP pipeline

In [31]:
doc = nlp(sentence)

# Perform NER and print the entities

In [32]:
for ent in doc.ents:
    print(ent.text, ent.label_)

Barack Obama PERSON
Hawaii GPE
1961 DATE
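Downstream code usually wants the entities organised by type rather than printed one per line. A minimal sketch of that step follows; the `group_by_label` helper is a hypothetical name, and the pairs are typed in from the output above so the snippet runs without a spaCy model. With the pipeline loaded, the same list could be built as `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied from the printed output above.
entities = [("Barack Obama", "PERSON"), ("Hawaii", "GPE"), ("1961", "DATE")]


def group_by_label(pairs):
    """Collect entity texts under their label, keeping first-seen order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


by_label = group_by_label(entities)
print(by_label)
```

For an unfamiliar label such as GPE, `spacy.explain("GPE")` returns a short description.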


Chunking in NLP (with Code)
Chunking is a way to group POS-tagged words into meaningful phrases, like noun phrases (NP) or verb phrases (VP). It uses patterns (called grammars) to define these chunks.

Example of Chunking:
In the sentence, "The quick brown fox jumps over the lazy dog":

"The quick brown fox" is a noun phrase (NP).
"jumps over" is a verb phrase (VP).
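"the lazy dog" is another noun phrase (NP).

We will use the nltk library to perform chunking. The grammar in the code that follows defines only a noun-phrase rule, so only NP chunks appear in its output.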

Steps to Perform Chunking:
1. Tokenize the sentence.
2. POS tag each token (word).
3. Define a grammar for chunking (e.g., a pattern for noun phrases).
4. Use a chunk parser to extract chunks.

In [33]:
import nltk

In [34]:
from nltk import word_tokenize, pos_tag, RegexpParser

In [35]:
sentence = "The quick brown fox jumps over the lazy dog."

In [36]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [37]:
tokens = word_tokenize(sentence)

In [39]:
pos_tags = pos_tag(tokens)


In [50]:
grammar = "NP: {<DT>?<JJ>*<NN>}"


In [49]:
chunk_parser = RegexpParser(grammar)

In [46]:
chunked_sentence = chunk_parser.parse(pos_tags)

In [47]:
print(chunked_sentence)

(S
  (NP The/DT quick/JJ brown/NN)
  (NP fox/NN)
  jumps/VBZ
  over/IN
  (NP the/DT lazy/JJ dog/NN)
  ./.)
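In this output "brown" was tagged NN, and the pattern allows exactly one <NN>, so the chunk closes after "brown" and "fox" becomes a separate NP. The sketch below imitates tag-pattern matching with Python's `re` module to make that visible. It is an illustrative stand-in, not NLTK's implementation, and `find_chunks` is a hypothetical helper name.

```python
import re


def find_chunks(tagged, tag_pattern):
    """Return the word lists matched by a tag pattern such as '<DT>?<JJ>*<NN>'."""
    # Encode the sentence as one string of tags, e.g. "<DT><JJ><NN>...".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)

    def to_group(match):
        # Wrap each <TAG> in a group so ?, * and + apply to the whole tag,
        # and keep "." from matching across tag boundaries.
        return "(?:<" + match.group(1).replace(".", "[^<>]") + ">)"

    regex = re.sub(r"<([^>]+)>", to_group, tag_pattern)

    chunks = []
    for match in re.finditer(regex, tag_string):
        start = tag_string[:match.start()].count("<")  # index of first token
        length = match.group().count("<")              # tokens in the chunk
        if length:
            chunks.append([word for word, _ in tagged[start:start + length]])
    return chunks


# (word, tag) pairs copied from the chunked output above.
tagged = [("The", "DT"), ("quick", "JJ"), ("brown", "NN"), ("fox", "NN"),
          ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
          ("dog", "NN"), (".", ".")]

one_noun = find_chunks(tagged, "<DT>?<JJ>*<NN>")
many_nouns = find_chunks(tagged, "<DT>?<JJ>*<NN>+")
print(one_noun)
print(many_nouns)
```

The first pattern reproduces the three chunks shown above. The second, which allows one or more nouns, keeps "The quick brown fox" together; the same change to the NLTK grammar string should have the same effect.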


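Explanation (a second example with a verb-phrase rule):
Sentence: "The tall man quickly runs towards the small house." This is a different sentence from the one chunked in the code above, and its grammar adds a VP rule to the NP rule.

Noun Phrases (NP): An optional determiner (DT), any number of adjectives (JJ), and a noun (NN).

Verb Phrases (VP): An optional adverb (RB), a verb (VB.*), and an optional preposition (IN).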

POS Tagging:

"The" → Determiner (DT)

"tall" → Adjective (JJ)

"man" → Noun (NN)

"quickly" → Adverb (RB)

"runs" → Verb (VBZ)

"towards" → Preposition (IN)

"the" → Determiner (DT)

"small" → Adjective (JJ)

"house" → Noun (NN)

Grammar:

NP: {<DT>?<JJ>*<NN>} defines a noun phrase (NP) consisting of an optional determiner, any number of adjectives, and a noun.

VP: {<RB>?<VB.*><IN>?} defines a verb phrase (VP) that starts with an optional adverb (RB), followed by any verb (VB.*), and an optional preposition (IN).
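A minimal sketch of the two-rule grammar in NLTK follows. The tags are typed in by hand from the POS Tagging list in this explanation, so no tagger model is needed; this is an assumption made for the sketch, and a real tagger may assign different tags to some words. The variable names are illustrative.

```python
from nltk import RegexpParser

# Tags entered by hand from the list above (assumed, not produced by a tagger).
tagged = [("The", "DT"), ("tall", "JJ"), ("man", "NN"), ("quickly", "RB"),
          ("runs", "VBZ"), ("towards", "IN"), ("the", "DT"), ("small", "JJ"),
          ("house", "NN"), (".", ".")]

# One rule per line; the rules are applied in order.
grammar = r"""
  NP: {<DT>?<JJ>*<NN>}
  VP: {<RB>?<VB.*><IN>?}
"""

tree = RegexpParser(grammar).parse(tagged)
print(tree)

# Collect (label, words) for each chunk, skipping the root S node.
chunks = [(sub.label(), [word for word, _ in sub.leaves()])
          for sub in tree.subtrees() if sub.label() != "S"]
print(chunks)
```

Because `RegexpParser` works on already-tagged tokens, this part runs without downloading any NLTK data.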

Output Example:

(S
  (NP The/DT tall/JJ man/NN)
  (VP quickly/RB runs/VBZ towards/IN)
  (NP the/DT small/JJ house/NN)
  ./.)

(NP The tall man): The first noun phrase is correctly chunked, grouping the determiner, adjective, and noun.

(VP quickly runs towards): The verb phrase is identified, grouping the adverb, verb, and preposition.

(NP the small house): Another noun phrase is chunked, grouping the determiner, adjective, and noun.

Conclusion:
This example illustrates how chunking works by identifying both noun phrases (NP) and verb phrases (VP) in a sentence. It shows how different parts of speech work together to form meaningful chunks.
