## Wordnet Lemmatizer
Lemmatization is similar to stemming: both reduce a word to a base form. The output of lemmatization is called a ‘lemma’, which is a root word, whereas stemming produces a root stem. After lemmatization, we get a valid word that means the same thing.

NLTK provides the WordNetLemmatizer class, a thin wrapper around the WordNet corpus. To find a lemma, it calls the morphy() function of the WordNet CorpusReader class. Let us understand it with an example:
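As a rough illustration of what morphy() does, the sketch below follows WordNet's documented approach in simplified form. It first checks an exception list of irregular forms. It then tries the suffix-substitution rules ("rules of detachment") for the given POS and returns the first candidate that exists in the dictionary. The rule table mirrors WordNet's morphy rules. `TOY_VOCAB`, `TOY_EXCEPTIONS` and `toy_morphy` are hypothetical stand-ins for the real WordNet data, and the real morphy can return several candidates. NLTK exposes its own version as `nltk.corpus.wordnet.morphy(form, pos)`.

```python
# Simplified sketch of WordNet's morphy(): check irregular exceptions, then
# try the suffix-substitution "rules of detachment" for the given POS and
# return the first candidate found in the dictionary.
# TOY_VOCAB and TOY_EXCEPTIONS are hypothetical stand-ins for WordNet's data files.
from typing import Optional

MORPHY_RULES = {
    "n": [("s", ""), ("ses", "s"), ("xes", "x"), ("zes", "z"),
          ("ches", "ch"), ("shes", "sh"), ("men", "man"), ("ies", "y")],
    "v": [("s", ""), ("ies", "y"), ("es", "e"), ("es", ""),
          ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
    "r": [],  # adverbs: exception list only, no suffix rules
}

TOY_VOCAB = {
    "n": {"program", "history", "box", "eating"},
    "v": {"go", "eat", "write", "program"},
    "a": {"good", "large"},
    "r": {"finally", "fairly"},
}

TOY_EXCEPTIONS = {
    "v": {"went": "go", "ate": "eat", "eaten": "eat"},
    "a": {"better": "good", "best": "good"},
}


def toy_morphy(form: str, pos: str) -> Optional[str]:
    """Return a base form of `form` for `pos`, or None if nothing matches."""
    exceptions = TOY_EXCEPTIONS.get(pos, {})
    if form in exceptions:
        return exceptions[form]          # irregular form, e.g. "went" -> "go"
    vocab = TOY_VOCAB[pos]
    if form in vocab:
        return form                      # already a base form
    for suffix, replacement in MORPHY_RULES[pos]:
        if form.endswith(suffix):
            candidate = form[: -len(suffix)] + replacement
            if candidate in vocab:       # keep only real dictionary words
                return candidate
    return None


print(toy_morphy("going", "v"))     # go
print(toy_morphy("going", "n"))     # None (lemmatize() would then return "going")
print(toy_morphy("programs", "n"))  # program
print(toy_morphy("better", "a"))    # good
```

When morphy finds nothing, WordNetLemmatizer.lemmatize() falls back to returning the input word unchanged, which is why unknown forms pass straight through in the cells below.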


In [3]:
import nltk
from nltk.stem import WordNetLemmatizer

In [4]:
lemmatizer=WordNetLemmatizer()

In [5]:
words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

In [6]:
for word in words: 
    print(word+"------->"+lemmatizer.lemmatize(word))

eating------->eating
eats------->eats
eaten------->eaten
writing------->writing
writes------->writes
programming------->programming
programs------->program
history------->history
finally------->finally
finalized------->finalized
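Most words above come back unchanged because lemmatize() assumes pos='n' (noun) unless told otherwise. When it finds no noun lemma, it returns the input as-is, and only "programs" has one. The toy sketch below reproduces that lookup-with-fallback behaviour with a hypothetical hand-written lexicon keyed by (form, POS). It is not WordNet and covers only these few words.

```python
# Toy (form, POS) -> lemma lexicon. Hypothetical data, NOT WordNet: it only
# mimics lemmatize()'s behaviour of defaulting to pos="n" and returning the
# input unchanged when no lemma is known for that POS.
TOY_LEXICON = {
    ("eating", "v"): "eat",
    ("eats", "v"): "eat",
    ("eaten", "v"): "eat",
    ("writing", "v"): "write",
    ("writes", "v"): "write",
    ("programs", "n"): "program",
    ("programs", "v"): "program",
}


def toy_lemmatize(word: str, pos: str = "n") -> str:
    """Look up (word, pos); fall back to the word itself."""
    return TOY_LEXICON.get((word, pos), word)


words = ["eating", "eats", "eaten", "writing", "writes", "programs", "history"]
print([toy_lemmatize(w) for w in words])           # noun reading (default)
print([toy_lemmatize(w, pos="v") for w in words])  # verb reading
```

Under the default noun reading only "programs" changes. Passing pos="v" reduces the verb forms, which is what the `pos` argument in the next cells controls.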


In [7]:
lemmatizer.lemmatize('going')

'going'

In [9]:
# POS tags accepted by lemmatize(): n = noun (default), v = verb,
# a = adjective, r = adverb

lemmatizer.lemmatize('going', pos='v')

'go'

In [10]:
lemmatizer.lemmatize('going', pos='n')

'going'
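In the cells above, the POS is passed by hand. In a pipeline, a part-of-speech tagger such as nltk.pos_tag usually supplies it. By default, however, that tagger emits Penn Treebank tags (NN, VBG, JJ, RB, ...) rather than the single letters lemmatize() expects. A small mapping can bridge the two. The helper `treebank_to_wordnet` below is hypothetical, and the tagged sentence is written by hand for illustration, not real tagger output.

```python
# Hypothetical helper: convert a Penn Treebank tag (the tagset nltk.pos_tag
# uses by default) into the POS letter that lemmatize() accepts.
def treebank_to_wordnet(tag: str) -> str:
    """Return 'a', 'v', 'r' or 'n' for a Penn Treebank tag."""
    if tag.startswith("JJ"):
        return "a"  # JJ, JJR, JJS: adjectives
    if tag.startswith("VB"):
        return "v"  # VB, VBD, VBG, VBN, VBP, VBZ: verbs
    if tag.startswith("RB"):
        return "r"  # RB, RBR, RBS: adverbs
    return "n"      # nouns and all other tags fall back to lemmatize()'s default


# Hand-written example tags, not actual tagger output.
tagged = [("She", "PRP"), ("is", "VBZ"), ("going", "VBG"), ("home", "NN"), ("quickly", "RB")]
print([(word, treebank_to_wordnet(tag)) for word, tag in tagged])
# [('She', 'n'), ('is', 'v'), ('going', 'v'), ('home', 'n'), ('quickly', 'r')]
```

Each (word, letter) pair can then be passed as lemmatizer.lemmatize(word, pos=letter). For example, ('going', 'v') gives 'go', as in cell [9].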

In [None]:
lemmatizer.lemmatize('fairly', pos='v')

'fairly'

In [18]:
lemmatizer.lemmatize('strongly', pos='n')

'strongly'

### 1. Stemming
🔹 Definition: Stemming is a rule-based process that removes suffixes from words to reach the root form, often without considering whether the resulting word is valid in the language.

🔹 Method: It applies simple heuristics (e.g., chopping off "ing", "ed", "s", etc.) without understanding the meaning of the word.

🔹 Fast but less accurate: Because it only follows fixed rules, it often produces stems that are not valid dictionary words.

🔹 Example using PorterStemmer (NLTK):

In [20]:
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

words = ["running", "flies", "happily", "better", "studies"]
stems = [stemmer.stem(word) for word in words]

print(stems)  # Output: ['run', 'fli', 'happili', 'better', 'studi']


['run', 'fli', 'happili', 'better', 'studi']
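The non-word stems in the output above come from Porter's suffix rules. Step 1a rewrites a final -ies as -i (flies → fli, studies → studi). Step 1c turns a final y into i when the preceding stem contains a vowel (happily → happili). The sketch below implements only these two rules, as described in Porter's original algorithm. The function names are hypothetical. NLTK's PorterStemmer applies further steps and NLTK-specific extensions, so this is an explanation aid, not a substitute for it.

```python
# Partial sketch of two rules from Porter's original (1980) algorithm.
# NLTK's PorterStemmer runs many more steps and adds its own extensions;
# these two alone are enough to explain "fli", "studi" and "happili".
VOWELS = set("aeiou")


def contains_vowel(stem: str) -> bool:
    """Porter's *v* condition; y counts as a vowel after a consonant (simplified)."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return True
        if ch == "y" and i > 0 and stem[i - 1] not in VOWELS:
            return True
    return False


def step_1a(word: str) -> str:
    """SSES -> SS, IES -> I, SS -> SS, S -> '' (first matching rule wins)."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]          # caresses -> caress, flies -> fli
    if word.endswith("ss"):
        return word               # caress stays caress
    if word.endswith("s"):
        return word[:-1]          # cats -> cat
    return word


def step_1c(word: str) -> str:
    """(*v*) Y -> I: turn a final y into i if the stem before it has a vowel."""
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"    # happily -> happili
    return word


print(step_1a("flies"), step_1a("studies"), step_1c("happily"))  # fli studi happili
```

The rules never consult a dictionary, so nothing stops them from producing strings like "fli". That speed-for-accuracy trade is exactly what lemmatization avoids.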


🔹 Issues with stemming:

"flies" → "fli" (not a dictionary word)

"happily" → "happili" (not a dictionary word)

"studies" → "studi" (not a dictionary word)

### 2. Lemmatization
🔹 Definition: Lemmatization reduces a word to its dictionary form (lemma), taking the word’s part of speech (POS) into account.

🔹 Method: It looks the word up in a lexical database (WordNet, in NLTK's WordNetLemmatizer) to find the correct root form.

🔹 Slower but more accurate: Unlike stemming, it returns a valid dictionary word whenever it finds a lemma. If it finds none, WordNetLemmatizer returns the input unchanged.

🔹 Example using WordNetLemmatizer (NLTK):

In [21]:
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()

words = ["running", "flies", "happily", "better", "studies"]
lemmas = [lemmatizer.lemmatize(word, pos="v") for word in words]  # 'v' for verb

print(lemmas)  # Output: ['run', 'fly', 'happily', 'better', 'study']


['run', 'fly', 'happily', 'better', 'study']


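🔹 Why is lemmatization better?

"flies" → "fly" ✅ (correct word)

"happily" → "happily" ✅ (unchanged: it is an adverb, so WordNet has no verb lemma for it and the input is returned as-is)

"studies" → "study" ✅ (correct word)

"better" → "better" (WordNet also lists "better" as a verb, "to improve", so pos="v" leaves it unchanged; with pos="a" it lemmatizes to "good")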

In [None]:
# 💡 Use stemming when you need fast, simple word reduction.
# 💡 Use lemmatization when accuracy and meaning matter.