
## Lemmatization

Lemmatization is the process of reducing a word to its base or dictionary form, known as a **lemma**. Unlike stemming, which often chops off suffixes and can result in non-words, lemmatization uses vocabulary and morphological analysis to return a valid word.

Think of it like finding the root of a family of words. For example, the words "running," "ran," and "runs" all have the lemma "run."
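To make the contrast with stemming concrete, here is a minimal toy sketch. The suffix list and the lookup table are hypothetical and written only for this illustration; real stemmers and lemmatizers use far richer rules and dictionaries.

```python
# Toy illustration only: not how a production stemmer or lemmatizer works.
SUFFIXES = ("ing", "ies", "es", "s", "ed")  # hypothetical suffix rules

def naive_stem(word):
    """Strip the first matching suffix without checking that the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Hypothetical mini-dictionary mapping inflected forms to their lemma.
LEMMA_TABLE = {"running": "run", "ran": "run", "runs": "run", "studies": "study"}

def toy_lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMA_TABLE.get(word, word)

for w in ["running", "ran", "runs", "studies"]:
    print(f"{w}: stem={naive_stem(w)}, lemma={toy_lemmatize(w)}")
```

The suffix stripper yields non-words such as "runn" and "stud" and leaves the irregular form "ran" untouched, while the dictionary lookup maps every form to a valid lemma.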

Lemmatization is useful in Natural Language Processing (NLP) tasks where the inflected forms of a word should be treated as the same underlying word. These tasks include:

*   **Text analysis:** Analyzing the frequency of concepts rather than just word forms.
*   **Information retrieval:** Finding documents that contain variations of a search term.
*   **Machine translation:** Mapping inflected forms to a shared lemma so that the same underlying word is looked up and translated consistently.
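The text-analysis case can be sketched with a frequency count. The `LEMMAS` table below is a hypothetical stand-in for a real lemmatizer, and the sentence is made up for the example.

```python
from collections import Counter

# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"runs": "run", "ran": "run", "running": "run", "cats": "cat"}

def lemma_of(token):
    """Return the lemma for a token, or the token itself if it is unknown."""
    return LEMMAS.get(token, token)

tokens = "the cat runs while the cats ran and kept running".split()

surface_counts = Counter(tokens)                       # counts raw word forms
lemma_counts = Counter(lemma_of(t) for t in tokens)    # counts lemmas

print("run as a surface form:", surface_counts["run"])
print("run as a lemma:", lemma_counts["run"])
```

Counting surface forms never sees "run" at all, whereas counting lemmas groups "runs", "ran", and "running" under one entry. The same normalization, applied to both query and documents, is what lets a search for one form match the others.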

#### WordNetLemmatizer

A popular dictionary-based lemmatizer available in the NLTK library is the **WordNetLemmatizer**. It uses the WordNet lexical database to find the lemma of a word. WordNet is a large English lexical database that groups words into sets of synonyms called synsets and provides short definitions and usage examples. The WordNet data is not bundled with NLTK, so it must be downloaded once with `nltk.download('wordnet')` before the lemmatizer can be used.

The WordNetLemmatizer works best when it is given the part-of-speech (POS) tag of the word. Its `lemmatize` method accepts a `pos` argument (`'n'` for noun, `'v'` for verb, `'a'` for adjective, `'r'` for adverb) and treats the word as a noun when none is given. For example, the word "leaves" can be the plural of the noun "leaf" or the third-person singular present of the verb "leave." Providing the correct POS tag helps the lemmatizer distinguish between these cases.
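In practice the POS tag usually comes from a tagger that emits Penn Treebank tags (such as `NNS` or `VBZ`), which then have to be converted to WordNet's single-letter codes. The helper below is a minimal sketch of that conversion. The function name `penn_to_wordnet` and the hand-written `tagged` pairs are assumptions made for this example, not part of NLTK.

```python
def penn_to_wordnet(tag, default="n"):
    """Map a Penn Treebank tag to a WordNet POS letter (hypothetical helper)."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("N"):
        return "n"  # nouns: NN, NNS, NNP, NNPS
    if tag.startswith("RB"):
        return "r"  # adverbs: RB, RBR, RBS
    return default  # anything else falls back to noun

# Hand-written (word, tag) pairs in the shape a tagger would return.
tagged = [("leaves", "NNS"), ("leaves", "VBZ"), ("better", "JJR"), ("quickly", "RB")]

for word, tag in tagged:
    print(f"{word} ({tag}) -> pos='{penn_to_wordnet(tag)}'")
```

The returned letter can be passed as the `pos` argument of `lemmatizer.lemmatize`, so "leaves" tagged `NNS` would be lemmatized as a noun and "leaves" tagged `VBZ` as a verb.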

In [1]:
from nltk.stem import WordNetLemmatizer

In [2]:
lemmatizer = WordNetLemmatizer()

In [8]:
words = ["leaves","running", "quickly", "beautiful", "houses", "better", "sings", "happily", "large", "cats", "went"]

In [9]:
# Lemmatize every word under each WordNet POS tag to compare the results.
pos_labels = {"n": "Nouns", "v": "verbs", "a": "Adjectives", "r": "Adverbs"}

for pos, label in pos_labels.items():
    print(f"As {label}")
    for word in words:
        print(f"{word} ---> {lemmatizer.lemmatize(word, pos=pos)}")

As Nouns
leaves ---> leaf
running ---> running
quickly ---> quickly
beautiful ---> beautiful
houses ---> house
better ---> better
sings ---> sings
happily ---> happily
large ---> large
cats ---> cat
went ---> went
As verbs
leaves ---> leave
running ---> run
quickly ---> quickly
beautiful ---> beautiful
houses ---> house
better ---> better
sings ---> sing
happily ---> happily
large ---> large
cats ---> cat
went ---> go
As Adjectives
leaves ---> leaves
running ---> running
quickly ---> quickly
beautiful ---> beautiful
houses ---> houses
better ---> good
sings ---> sings
happily ---> happily
large ---> large
cats ---> cats
went ---> went
As Adverbs
leaves ---> leaves
running ---> running
quickly ---> quickly
beautiful ---> beautiful
houses ---> houses
better ---> well
sings ---> sings
happily ---> happily
large ---> large
cats ---> cats
went ---> went
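
The output above shows that a word is reduced only when the supplied POS matches one of its readings: "went" becomes "go" only as a verb, and "better" becomes "good" only as an adjective. When no tagger is available, one possible fallback is to try each POS in turn and keep the first lemma that differs from the input. The sketch below assumes a hypothetical helper, `lemmatize_any_pos`, and uses a stub filled with a few results from the output above so that it runs without WordNet.

```python
def lemmatize_any_pos(word, lemmatize, pos_order=("n", "v", "a", "r")):
    """Return the first lemma that differs from the word, trying each POS in order."""
    for pos in pos_order:
        lemma = lemmatize(word, pos)
        if lemma != word:
            return lemma
    return word  # no POS changed the word

# Stand-in for lemmatizer.lemmatize, filled with results from the output above.
KNOWN = {
    ("leaves", "n"): "leaf", ("leaves", "v"): "leave",
    ("went", "v"): "go",
    ("better", "a"): "good", ("better", "r"): "well",
}

def stub_lemmatize(word, pos):
    return KNOWN.get((word, pos), word)

for w in ["leaves", "went", "better", "quickly"]:
    print(f"{w} ---> {lemmatize_any_pos(w, stub_lemmatize)}")
```

With the real lemmatizer, `lemmatizer.lemmatize` could be passed in place of `stub_lemmatize`. The result depends on `pos_order`: with nouns first, "leaves" resolves to "leaf" even in a sentence where it is a verb, so this heuristic is weaker than supplying a tag from a POS tagger.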
