# Introduction to Lemmatization and Stemming
## What is Lemmatization?
Lemmatization is the process of reducing words to their base or dictionary form, known as the **lemma**. Unlike simple stemming, lemmatization considers the **context and meaning** of a word.

**Examples:**
- **go, went, gone** → **go**
- **is, am, are, was, were** → **be**
- **running, runs, ran** → **run**
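
The role of context can be sketched with a toy lookup table keyed by both the word and its part of speech. Everything here is an illustrative assumption: the `LEMMAS` table, the tag names `"VERB"` and `"NOUN"`, and the extra word "saw" (added to show that the same surface form can have two lemmas) are not taken from a real lexicon or library.

```python
# Toy lemma table keyed by (word, part of speech).
# The entries and tag names are illustrative assumptions, not a real lexicon.
LEMMAS = {
    ("went", "VERB"): "go",
    ("gone", "VERB"): "go",
    ("is", "VERB"): "be",
    ("am", "VERB"): "be",
    ("are", "VERB"): "be",
    ("was", "VERB"): "be",
    ("were", "VERB"): "be",
    ("running", "VERB"): "run",
    ("runs", "VERB"): "run",
    ("ran", "VERB"): "run",
    ("saw", "VERB"): "see",  # as in "I saw a bird"
    ("saw", "NOUN"): "saw",  # as in "a rusty saw"
}


def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma of a word; fall back to the lowercased word if unknown."""
    key = (word.lower(), pos)
    return LEMMAS.get(key, word.lower())


print(lemmatize("went", "VERB"))  # go
print(lemmatize("saw", "VERB"))   # see
print(lemmatize("saw", "NOUN"))   # saw
```

A plain suffix-stripping stemmer has no way to tell the two uses of "saw" apart, because it never looks at the part of speech.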


## 1. Normalization (Case Folding)
Normalization is a preprocessing step in text processing where words are converted into a standard format. One common type is **case folding**, which means converting all letters to lowercase.

**Example:**
- "HELLO" → "hello"
- "NLP is FUN!" → "nlp is fun!"
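
In Python, case folding can be sketched with the built-in `str.lower()` method. The helper name `case_fold` is just a label chosen for this example.

```python
def case_fold(text: str) -> str:
    """Return the text with every letter converted to lowercase."""
    return text.lower()


print(case_fold("HELLO"))        # hello
print(case_fold("NLP is FUN!"))  # nlp is fun!
```

Punctuation and spacing are left untouched; only letter case changes. For text that is not plain English, Python also offers `str.casefold()`, which applies a more aggressive folding intended for caseless comparison.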

## 2. Morphologically Different Words
A word can appear in several **morphological forms** that all share the same root meaning.

**Examples:**
- **Duck** (singular) → **Ducks** (plural)
- **Come** (present) → **Came** (past)
- **Go** (present) → **Went** (past)

Lemmatization helps unify these words under a common base form.

## 3. Superficially Different but Same Root/Stem
Some words may look different but share the same underlying meaning. Lemmatization helps to bring them to their base form.

**Example:** "Went", "Go", and "Gone" all have the same lemma: **go**.

## 4. Lemmatization Example Sentences
Consider the sentence:
**"He is reading detective stories."**

Lemmatized version: **"He be read detective story."**

Here, "is" → "be", "reading" → "read", and "stories" → "story" (singular form).
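
The sentence above can be reproduced with a minimal sketch that replaces each word by its lemma. The `LEMMA_TABLE` below is a hypothetical mini-lexicon covering only this sentence; a real lemmatizer would rely on a full dictionary and part-of-speech information.

```python
import re

# Hypothetical mini-lexicon: covers only the words in the example sentence.
LEMMA_TABLE = {"is": "be", "reading": "read", "stories": "story"}


def lemmatize_sentence(sentence: str) -> str:
    """Replace each word with its lemma; unknown words and punctuation are kept."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        return LEMMA_TABLE.get(word.lower(), word)

    # Each run of letters is treated as one word.
    return re.sub(r"[A-Za-z]+", replace, sentence)


print(lemmatize_sentence("He is reading detective stories."))
# He be read detective story.
```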

## 5. Two Broad Classes of Morphemes: Stem + Affix
An English word consists of a **stem (root word)** and, optionally, one or more **affixes (prefixes or suffixes)**.

**Examples:**
- "cats" = "cat" (stem) + "s" (affix)
- "running" = "run" (stem) + "ing" (affix), with the final "n" of the stem doubled in spelling

A **morphological parser** is used to split words into **stem + affix**.
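
A very small parser of this kind might look like the sketch below. It is a simplification under stated assumptions: `SUFFIXES` and `KNOWN_STEMS` are tiny hand-picked sets for the two examples above, and the only spelling rule handled is a doubled final consonant (as in "running").

```python
# Illustrative assumptions: a tiny suffix list and stem dictionary.
SUFFIXES = ("ing", "s")
KNOWN_STEMS = {"cat", "run"}


def parse(word: str) -> tuple[str, str]:
    """Split a word into (stem, suffix); return (word, "") if no split is found."""
    for suffix in SUFFIXES:
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Undo consonant doubling, e.g. "runn" -> "run".
        if stem not in KNOWN_STEMS and len(stem) >= 2 and stem[-1] == stem[-2]:
            stem = stem[:-1]
        # Accept the split only if the stem is a known word.
        if stem in KNOWN_STEMS:
            return stem, suffix
    return word, ""


print(parse("cats"))     # ('cat', 's')
print(parse("running"))  # ('run', 'ing')
print(parse("bus"))      # ('bus', '')
```

Checking candidate stems against a dictionary keeps the parser from splitting words such as "bus" into "bu" + "s".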

## 6. Porter Stemmer: Stemming vs. Lemmatization
The **Porter Stemmer Algorithm** is a rule-based approach that removes affixes to get the stem of a word. Unlike lemmatization, stemming does not consider the word’s actual meaning.

**Rewrite Rules in Porter Stemmer:**
- **ATIONAL → ATE** (e.g., "relational" → "relate")
- **ING → ϵ** (remove "ing" if the stem contains a vowel, e.g., "motoring" → "motor")
- **SSES → SS** (e.g., "grasses" → "grass")
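
The three rules listed above can be sketched as plain string rewrites. This is not the full Porter algorithm: the real stemmer applies many more rules in a fixed sequence of steps and attaches extra conditions to them. The function names are hypothetical, and only a, e, i, o, u are treated as vowels here.

```python
VOWELS = set("aeiou")  # simplification: "y" is never treated as a vowel


def contains_vowel(stem: str) -> bool:
    """Return True if the candidate stem has at least one vowel."""
    return any(ch in VOWELS for ch in stem)


def apply_rules(word: str) -> str:
    """Apply at most one of the three rewrite rules to a lowercase word."""
    if word.endswith("ational"):
        # ATIONAL -> ATE
        return word[: -len("ational")] + "ate"
    if word.endswith("ing"):
        # ING -> (empty), only if the remaining stem contains a vowel
        stem = word[: -len("ing")]
        if contains_vowel(stem):
            return stem
    if word.endswith("sses"):
        # SSES -> SS
        return word[: -len("es")]
    return word


print(apply_rules("relational"))  # relate
print(apply_rules("motoring"))    # motor
print(apply_rules("grasses"))     # grass
print(apply_rules("sing"))        # sing
```

The vowel condition is what keeps a word like "sing" intact: stripping "ing" would leave "s", which contains no vowel.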

## 7. Limitations of Porter Stemmer
While effective, **stemming** sometimes over-generalizes or under-generalizes:

**Over-Generalizing:** unrelated words are merged, e.g. "policy" is reduced to "police" (incorrect).

**Under-Generalizing:** related words are left apart, e.g. "European" is not reduced to "Europe", even though it should be.

## Conclusion
- **Stemming** is a fast and simple way to strip affixes from words, but the resulting stems may be incorrect or may not be real words.
- **Lemmatization** is generally more accurate because it considers the context and meaning of words, which makes it more reliable for NLP applications.

By using lemmatization, we can improve text analysis in applications like **search engines, chatbots, and sentiment analysis**.