### **Lemmatization in NLP**  

#### **What is Lemmatization?**  
Lemmatization is an **NLP technique** that reduces a word to its **base or dictionary form** (its lemma), so the result is always a valid word. Unlike **stemming**, which strips suffixes by rule, lemmatization **looks the word up in a dictionary and takes its part of speech into account**.  

---

### **Example of Lemmatization vs. Stemming**  

| **Word**   | **Stemming (Porter)** | **Lemmatization** |
|------------|----------------------|------------------|
| running    | run                   | run             |
| better     | better                 | good            |
| studies    | studi                  | study           |
| mice       | mice                   | mouse           |
| happiness  | happi                  | happiness       |

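As the table shows, **lemmatization returns real dictionary words** (given the right POS tag: "running" as a verb, "better" as an adjective), whereas **stemming can produce non-words** such as "studi" and "happi".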
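
The stemming column can be reproduced with NLTK's `PorterStemmer`, which needs no corpus download. This is a minimal sketch that assumes NLTK is installed; the word list is taken from the table.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["running", "better", "studies", "mice", "happiness"]

# Map each word to its Porter stem; a stem need not be a dictionary word.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word:<10} -> {stem}")
```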

---

### **How Lemmatization Works**  
Lemmatization relies on:  
✅ A **dictionary** to find correct lemmas.  
✅ **Part of Speech (POS)** to determine the correct form (e.g., "better" → "good" as an adjective).  

If no **POS tag** is provided, a lemmatizer falls back to a default. NLTK's `WordNetLemmatizer`, for example, treats the word as a **noun**.
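
To make the role of the dictionary and the POS tag concrete, here is a toy sketch in plain Python. `LEMMA_TABLE` and `toy_lemmatize` are hypothetical names invented for illustration, and the tiny table stands in for a real lexical database such as WordNet. It is not how NLTK or spaCy are implemented.

```python
# Toy lemma dictionary keyed by (word, POS). Hypothetical data for illustration;
# a real lemmatizer consults a full lexical database such as WordNet.
LEMMA_TABLE = {
    ("running", "v"): "run",
    ("better", "a"): "good",
    ("studies", "n"): "study",
    ("mice", "n"): "mouse",
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Return the lemma for (word, pos), or the word itself if it is not listed."""
    return LEMMA_TABLE.get((word.lower(), pos), word)


print(toy_lemmatize("better", pos="a"))  # good
print(toy_lemmatize("better"))           # better (no noun entry, so unchanged)
print(toy_lemmatize("mice"))             # mouse
```

The same word gives different results depending on the POS key, which is why the noun default leaves "better" untouched.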

---

## **How to Use Lemmatization in Python?**  
### **1. Using WordNetLemmatizer (NLTK)**  
NLTK provides `WordNetLemmatizer`, which accepts an optional **POS tag**; supplying it improves accuracy.  


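- `pos='v'` tells the lemmatizer to treat the word as a **verb**, so "running" becomes "run".  
- `pos='a'` tells the lemmatizer to treat the word as an **adjective**, so "better" becomes "good".  
- If **no POS tag is provided**, the word is treated as a **noun**.

---

### **2. Using spaCy**  
spaCy lemmatizes each token as part of its processing pipeline and exposes the result as `token.lemma_`.
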
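spaCy assigns POS tags itself, so no manual hint is needed. In a sentence it typically:  
✅ Converts **"flying" → "fly"**  
✅ Converts **"better" → "good"** when the word is tagged as an adjective  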

---

## **Stemming vs. Lemmatization**  

| Feature        | **Stemming**      | **Lemmatization** |
|---------------|------------------|------------------|
| **Approach**  | Rule-based (suffix removal) | Dictionary-based (meaningful words) |
| **Speed**     | Faster            | Slower (needs dictionary lookup) |
| **Accuracy**  | Lower (may create non-words) | Higher (produces real words) |
| **Handles Context?** | No  | Yes |
| **Example**   | "studies" → "studi" | "studies" → "study" |

---

### **When to Use Lemmatization vs. Stemming?**  
✅ **Use Lemmatization** when **accuracy matters** (e.g., Chatbots, Text Summarization, Search Engines).  
✅ **Use Stemming** when **speed is more important** (e.g., quick keyword extraction).  


#### **Example: Basic Lemmatization**  

In [2]:
from nltk.stem import WordNetLemmatizer


In [8]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\Aftab\AppData\Roaming\nltk_data...


True

In [10]:
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running"))  # Output: running (default is noun)
print(lemmatizer.lemmatize("running", pos="v"))  # Output: run (verb form)
print(lemmatizer.lemmatize("better", pos="a"))  # Output: good (adjective form)

running
run
good


In [11]:
# No POS is given, so both adverbs are looked up as nouns and returned unchanged.
lemmatizer.lemmatize('sportingly'), lemmatizer.lemmatize('fairly')

('sportingly', 'fairly')
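
As the outputs above show, the result depends on the `pos` argument. In a pipeline the tag usually comes from a POS tagger that emits Penn Treebank tags (such as `VBG` or `JJR`), which then have to be mapped to WordNet's single-letter codes. The helper below is a sketch of one common mapping. `penn_to_wordnet` is a hypothetical name, not an NLTK function.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank POS tag to a WordNet POS code ('n', 'v', 'a', 'r')."""
    if tag.startswith("JJ"):  # JJ, JJR, JJS -> adjective
        return "a"
    if tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ -> verb
        return "v"
    if tag.startswith("RB"):  # RB, RBR, RBS -> adverb
        return "r"
    return "n"                # everything else falls back to noun


print(penn_to_wordnet("VBG"))  # v
print(penn_to_wordnet("JJR"))  # a
print(penn_to_wordnet("NNS"))  # n
```

The returned code can then be passed along, for example as `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.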