### What is Lemmatization?

**Lemmatization** is a text preprocessing technique similar to stemming, with one key difference: it reduces a word to its **lemma**, the dictionary (base) form, which is always a **valid word**. Stemming applies algorithmic rules to chop off suffixes and can produce a non-existent word; lemmatization instead uses a dictionary-based lookup to find the correct, grammatically meaningful base form. For example, "goes" becomes "go," and "better" becomes "good" when it is treated as an adjective.
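
To make the contrast concrete, here is a minimal sketch of the two ideas in plain Python. It is a toy only: the suffix list, the lookup table, and the names `toy_stem` and `toy_lemmatize` are invented for illustration and do not reflect the rules or data of any real stemmer or of WordNet.

```python
# Toy illustration only: the suffix rules and the lookup table below are
# made-up stand-ins, not the rules or data used by any real library.

SUFFIXES = ("ing", "es", "ed", "s")  # hypothetical stemming rules, tried in order

# Hypothetical mini-dictionary mapping (word, POS) to its lemma.
LEMMA_TABLE = {
    ("goes", "v"): "go",
    ("better", "a"): "good",
    ("studies", "n"): "study",
}


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, with no check that the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word: str, pos: str) -> str:
    """Look the word up; fall back to the word itself when it is unknown."""
    return LEMMA_TABLE.get((word, pos), word)


for word, pos in [("goes", "v"), ("better", "a"), ("studies", "n")]:
    print(f"{word:<8} stem={toy_stem(word):<7} lemma={toy_lemmatize(word, pos)}")
```

In this sketch the rule-based function turns "studies" into the non-word "studi" and leaves "better" untouched, while the lookup returns "study" and "good".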

### Lemmatization with NLTK

The NLTK library provides the **`WordNetLemmatizer`** class for this purpose. This class uses the WordNet corpus, a large lexical database of English, to find the correct lemma for a word.


#### Using the `WordNetLemmatizer`

The `lemmatize()` method takes two arguments: the **word** itself and its **part-of-speech (POS)** tag. The POS tag is optional and defaults to **'n'** (noun).

In [9]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

In [None]:
import nltk
from nltk.stem import WordNetLemmatizer

# Download the WordNet corpus if you haven't already
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\sai\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [13]:
lemmatizer = WordNetLemmatizer()

# Default behavior: POS is 'n' (noun)
print(f"'eating' (default): {lemmatizer.lemmatize('eating')}")
print(f"'goes' (default):   {lemmatizer.lemmatize('goes')}")

# Specifying the POS tag 'v' (verb)
print(f"'eating' (verb):    {lemmatizer.lemmatize('eating', pos='v')}")
print(f"'goes' (verb):      {lemmatizer.lemmatize('goes', pos='v')}")

'eating' (default): eating
'goes' (default):   go
'eating' (verb):    eat
'goes' (verb):      go


#### The Importance of POS Tags

As the example shows, the **POS tag is crucial** for accurate lemmatization. Without one, the lemmatizer treats every word as a noun and may fail to reduce it. For instance, "eating" looked up as a noun has no shorter noun lemma, so it is returned unchanged. When you specify its POS as a verb, the lemmatizer correctly reduces it to "eat."

The commonly used POS tags are:

  * **'n'**: Noun
  * **'v'**: Verb
  * **'a'**: Adjective
  * **'r'**: Adverb

Let's see this in a more comprehensive example:

In [14]:
print("--- Lemmatizing with default POS ('n') ---")
for word in words:
    print(f"{word:<12} -> {lemmatizer.lemmatize(word)}")

print("\n--- Lemmatizing with POS as verb ('v') ---")
for word in words:
    print(f"{word:<12} -> {lemmatizer.lemmatize(word, pos='v')}")

--- Lemmatizing with default POS ('n') ---
eating       -> eating
eats         -> eats
eaten        -> eaten
writing      -> writing
writes       -> writes
programming  -> programming
programs     -> program
history      -> history
finally      -> finally
finalized    -> finalized

--- Lemmatizing with POS as verb ('v') ---
eating       -> eat
eats         -> eat
eaten        -> eat
writing      -> write
writes       -> write
programming  -> program
programs     -> program
history      -> history
finally      -> finally
finalized    -> finalize
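
In the runs above, a single POS tag was applied to the whole list, which is why "history" and "finally" stay unchanged under `'v'`. In practice each word usually needs its own tag. The sketch below shows one common way to derive it, assuming the tags come from a tagger that uses the Penn Treebank convention; `penn_to_wordnet` is a hypothetical helper name, and the tagged pairs are written by hand rather than produced by a real tagger.

```python
# Assumption: tags follow the Penn Treebank convention (for example the
# output of a POS tagger); penn_to_wordnet is a hypothetical helper name.

def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to one of the WordNet tags 'n', 'v', 'a', 'r'."""
    if tag.startswith("J"):    # JJ, JJR, JJS
        return "a"
    if tag.startswith("V"):    # VB, VBD, VBG, VBN, VBP, VBZ
        return "v"
    if tag.startswith("RB"):   # RB, RBR, RBS
        return "r"
    return "n"                 # nouns and everything else: the lemmatizer's default


# Hand-written (word, tag) pairs standing in for real tagger output.
tagged = [("eating", "VBG"), ("programs", "NNS"), ("finally", "RB"), ("better", "JJR")]

for word, tag in tagged:
    print(f"{word:<10} {tag:<4} -> pos='{penn_to_wordnet(tag)}'")
```

The returned letter can then be passed as the `pos` argument, as in `lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag))`.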


### Lemmatization vs. Stemming

| Feature | **Stemming** | **Lemmatization** |
| :--- | :--- | :--- |
| **Method** | Algorithmic rules | Dictionary-based (e.g., WordNet) |
| **Output** | A **stem** (may not be a valid word) | A **lemma** (a valid word) |
| **Accuracy** | Lower; can produce incorrect results | Higher, provided the correct POS tag is supplied |
| **Speed** | Faster; computationally less expensive | Slower; requires looking up words |
| **Use Case** | Information retrieval, simpler text analysis | Chatbots, summarization, or where grammatical correctness is critical |

### Use Cases and Performance

**Lemmatization** is a vital step in pipelines where preserving the original meaning of a word is paramount, such as **Q&A chatbots** and **text summarization**. Because each word requires a dictionary lookup, it is **slower** than stemming. This trade-off between speed and accuracy is an important consideration when building an NLP application.
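
One way to soften the lookup cost, offered here as a suggestion rather than something the steps above require, is to memoize results, since real text repeats the same tokens many times. The sketch below uses Python's standard `functools.lru_cache`; `slow_lemmatize` is a hypothetical stand-in for an expensive dictionary lookup, and its tiny table is invented for the example.

```python
from functools import lru_cache

lookup_count = 0  # counts how often the slow path actually runs


def slow_lemmatize(word: str) -> str:
    """Hypothetical stand-in for an expensive dictionary lookup."""
    global lookup_count
    lookup_count += 1
    return {"programs": "program", "histories": "history"}.get(word, word)


@lru_cache(maxsize=50_000)
def cached_lemmatize(word: str) -> str:
    """Memoized wrapper: each distinct word is looked up only once."""
    return slow_lemmatize(word)


tokens = ["programs", "history", "programs", "programs", "histories", "history"]
lemmas = [cached_lemmatize(token) for token in tokens]

print(lemmas)
print(f"{len(tokens)} tokens, {lookup_count} lookups")
```

Here six tokens trigger only three lookups, one per distinct word. The same wrapper pattern could be applied around a real lemmatizer call, with the POS tag added as a second cached argument.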