# Lemmatization and Part of Speech (POS) Tagging

Lemmatization is a text normalization technique that maps a word to its base dictionary form (the lemma) using morphological analysis and the word's part of speech (POS). Unlike stemming, which strips suffixes by rule, lemmatization uses the word's role in the sentence to pick the correct base form.

## Key Features
- Uses POS tags (NOUN, VERB, ADJ, ADV) to determine correct lemma
- Produces valid dictionary words (unlike stemming)
- Examples: "running" → "run", "was" → "be", "studies" → "study"
- POS-dependent example: "better" → "good" (as ADJ) or "well" (as ADV)

## Benefits vs. Stemming
- More accurate word reduction
- Maintains semantic meaning
- Better for NLP tasks like text analysis, classification, and translation
- **Slower than stemming, but every result is a real word (stems such as "studi" for "studies" are not)**

## Useful For
- Question answering (Q&A), chatbots, and text summarization

# Part of Speech (POS) in Lemmatization

POS tagging is crucial for accurate lemmatization as the same word can have different lemmas based on its part of speech.

## Common POS Tags
- `NOUN`: Objects, concepts (e.g., dog, happiness)
- `VERB`: Actions, states (e.g., run, be)
- `ADJ`: Descriptive words (e.g., happy, large)
- `ADV`: Modifiers of verbs, adjectives, or other adverbs (e.g., quickly, very)

## Why POS Matters in Lemmatization
Consider the word "better":
- As an ADJ → lemmatizes to "good"
- As an ADV → lemmatizes to "well"

Examples:
```
"He is better" (ADJ) → "He is good"
"He performs better" (ADV) → "He performs well"
```

This context-aware approach makes lemmatization more accurate than stemming, which would treat "better" the same way regardless of its use in the sentence.

In [3]:
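# Download the WordNet data the lemmatizer needs, then define the input words

import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')

# "programming." and "programmer's" carry punctuation and are returned unchanged by the lemmatizer
words = ["running", "runner", "ran", "easily", "fairly", "fairness", "history",
         "programming.", "programmer's", "eating", "eaten", "eat"]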

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\manis\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\manis\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


In [5]:
from nltk.stem import WordNetLemmatizer
lm = WordNetLemmatizer()

# Lemmatize every word as a verb; words with no verb lemma in WordNet come back unchanged
for w in words:
    print(f"Lemma of {w} is {lm.lemmatize(w, pos='v')}")  # 'v' for verb



Lemma of running is run
Lemma of runner is runner
Lemma of ran is run
Lemma of easily is easily
Lemma of fairly is fairly
Lemma of fairness is fairness
Lemma of history is history
Lemma of programming. is programming.
Lemma of programmer's is programmer's
Lemma of eating is eat
Lemma of eaten is eat
Lemma of eat is eat


In [None]:
# POS codes accepted by WordNetLemmatizer.lemmatize()

"""
POS code   Meaning
a          Adjective
s          Adjective satellite
r          Adverb
n          Noun (default)
v          Verb
"""
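
The tag names used earlier (`NOUN`, `VERB`, `ADJ`, `ADV`) are not the codes the lemmatizer takes, so a small translation step is needed. The sketch below is one way to do it; `TAG_TO_WORDNET` and `to_wordnet_pos` are hypothetical helper names, and falling back to `'n'` for unmapped tags is an assumption that mirrors the lemmatizer's default.

```python
# Hypothetical lookup table: coarse POS tag name -> WordNet POS code
TAG_TO_WORDNET = {
    "NOUN": "n",
    "VERB": "v",
    "ADJ": "a",
    "ADV": "r",
}

def to_wordnet_pos(tag, default="n"):
    """Return the WordNet code for a coarse tag, or `default` if the tag is unmapped."""
    return TAG_TO_WORDNET.get(tag.upper(), default)

# PRON has no WordNet counterpart, so it falls back to the default
for tag in ["NOUN", "VERB", "ADJ", "ADV", "PRON"]:
    print(f"{tag:<5} -> {to_wordnet_pos(tag)}")
```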

In [6]:
# Reuses `lm` from the previous cell
# Lemmatize the same words as nouns; verb forms such as "ran" and "eaten" are left as-is
for w in words:
    print(f"Lemma of {w} is {lm.lemmatize(w, pos='n')}")  # 'n' for noun (the default)

Lemma of running is running
Lemma of runner is runner
Lemma of ran is ran
Lemma of easily is easily
Lemma of fairly is fairly
Lemma of fairness is fairness
Lemma of history is history
Lemma of programming. is programming.
Lemma of programmer's is programmer's
Lemma of eating is eating
Lemma of eaten is eaten
Lemma of eat is eat
