# Lemmatization - Text Preprocessing

**Goal**: This notebook focuses on lemmatization, an advanced text preprocessing technique used to convert words to their base or dictionary form (lemma). Unlike stemming, lemmatization uses a more sophisticated approach that considers the word's meaning.

**Context**: Lemmatization is often preferred over stemming in NLP applications where word context matters, as it reduces words to their meaningful base form. This ensures better results in machine learning models by maintaining linguistic integrity. In this notebook, we walk through lemmatization techniques using popular NLP libraries and compare it with stemming.


## Wordnet Lemmatizer
Lemmatization is similar to stemming, but its output, called the lemma, is a valid dictionary word rather than a truncated stem, which need not be a real word. The lemma carries the same core meaning as the original word.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. It uses the morphy() function of the WordNet CorpusReader class to find a lemma. Let us walk through an example:

In [3]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\bleew\AppData\Roaming\nltk_data...


True

In [1]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [5]:
# Valid values for the pos argument:
#   'n' - noun (the default when pos is omitted)
#   'v' - verb
#   'a' - adjective
#   'r' - adverb
lemmatizer.lemmatize("going", pos='v')

'go'

In [6]:
words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

In [9]:
for word in words:
    print(word+"--->"+lemmatizer.lemmatize(word, pos='v'))

eating--->eat
eats--->eat
eaten--->eat
writing--->write
writes--->write
programming--->program
programs--->program
history--->history
finally--->finally
finalized--->finalize
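
The output above can be read as a dictionary lookup combined with suffix rules: irregular forms are looked up in an exception list, and otherwise suffixes are stripped or replaced until a known word appears. The sketch below is a simplified, dependency-free illustration of that idea. It is not NLTK's implementation, and `VOCAB`, `EXCEPTIONS`, `VERB_RULES`, and `toy_lemmatize` are hypothetical names with toy data.

```python
# Toy stand-ins for the WordNet data (hypothetical, for illustration only).
VOCAB = {"eat", "write", "program", "finalize", "history"}
EXCEPTIONS = {"eaten": "eat", "programming": "program"}

# (suffix, replacement) pairs tried in order, similar in spirit to verb rules.
VERB_RULES = [
    ("ing", ""), ("ing", "e"),
    ("es", "e"), ("es", ""),
    ("ed", "e"), ("ed", ""),
    ("s", ""),
]


def toy_lemmatize(word):
    """Return a lemma for `word`, or `word` unchanged if no rule yields a known word."""
    # 1. Irregular forms come straight from the exception list.
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    # 2. A word that is already a known base form is returned as is.
    if word in VOCAB:
        return word
    # 3. Try each suffix rule and keep the first candidate found in the vocabulary.
    for suffix, replacement in VERB_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCAB:
                return candidate
    # 4. Nothing matched: fall back to the input, as with "finally" above.
    return word


for word in ["eating", "eaten", "writes", "finalized", "finally"]:
    print(word + "--->" + toy_lemmatize(word))
```

The fallback in step 4 mirrors what the notebook shows for "history" and "finally": when no verb form is found, the word comes back unchanged.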


In [10]:
lemmatizer.lemmatize("fairly",pos='v'),lemmatizer.lemmatize("sportingly")


('fairly', 'sportingly')
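
The calls above pass pos by hand, and pos defaults to noun when omitted. In running text the part of speech usually comes from a POS tagger, whose tags then have to be translated into the four letters that lemmatize() accepts. A minimal sketch of such a translation, assuming Penn Treebank style tags (the helper name `penn_to_wordnet` is hypothetical and not part of NLTK):

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to a WordNet POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective (JJ, JJR, JJS)
    if tag.startswith("V"):
        return "v"  # verb (VB, VBD, VBG, ...)
    if tag.startswith("RB"):
        return "r"  # adverb (RB, RBR, RBS)
    return "n"      # everything else falls back to noun, the lemmatizer's default


for tag in ["VBG", "JJ", "RB", "NNS"]:
    print(tag + "--->" + penn_to_wordnet(tag))
```

With tags from a tagger such as nltk.pos_tag (which needs its own data download), each token could then be lemmatized as lemmatizer.lemmatize(word, pos=penn_to_wordnet(tag)).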

Lemmatization is slower than stemming because it relies on dictionary lookups and needs the right part of speech to produce the correct form. In return, it is generally more accurate than stemming.
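
To make the comparison with stemming concrete, the sketch below runs NLTK's PorterStemmer over a few of the words used earlier. It assumes only that the nltk package is installed; the stemmer needs no corpus download. The variable names are illustrative.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# A subset of the words lemmatized earlier in the notebook.
words = ["eating", "eats", "writing", "programs", "history", "finally"]

# Map each word to its stem so the results can be compared with the lemmas above.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(word + "--->" + stem)
```

Stems such as "histori" are not dictionary words, whereas the lemmatizer returned "history" unchanged.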

## Use-case examples
Typical applications include Q&A systems, chatbots, and text summarization.

Lemmatization is a good fit when:
- Accuracy and context matter more than speed. This is common in tasks like sentiment analysis, named entity recognition, or machine translation.

- You need valid dictionary forms of words, especially for irregular words whose lemma looks nothing like the inflected form (e.g., "better" → "good", "was" → "be").

- The language is complex and requires understanding of grammatical rules (e.g., working with highly inflected languages like French, Spanish, or German).
