## WordNet Lemmatizer

Lemmatization is similar to stemming. Its output is called a 'lemma', which is a root word rather than a root stem (the output of stemming). After lemmatization, we get a valid word that carries the same meaning.

## Root Word
In NLP, "root word" and "root stem" are often used interchangeably, but technically they refer to different concepts in morphology.
The root word is the most basic form of a word that carries meaning.
It is usually a standalone word in the language.
It doesn’t include any prefixes or suffixes.
Think of it as the original word from which others are derived.

Example:
"Unhappiness"
Root word: "happy"
Prefix: "un-"
Suffix: "-ness"
Here, "happy" is a complete word by itself. It is the root word.

## Root Stem
A stem is the part of the word that remains after removing inflectional endings, like plurals or tenses.
It may or may not be a valid word on its own.
Stems are what stemming algorithms such as the Porter stemmer produce.

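Example:
"running", "runs"
Both have the stem "run".
Here, "run" is both a root and a stem.
An irregular form such as "ran" is typically left unchanged by a rule-based stemmer; mapping it to "run" takes a lemmatizer with pos='v'.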

But consider:
"studies", "studied", "studying"
The stem (after stemming) might be: "studi" (not a valid English word!)
The root word: "study" (valid word)
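
The "studi" stem can be reproduced with NLTK's `PorterStemmer`. This is a minimal sketch: the word list comes from the example above, and the variable names are only illustrative.

```python
from nltk.stem import PorterStemmer

# The Porter stemmer is rule-based and needs no corpus download.
stemmer = PorterStemmer()

study_forms = ["studies", "studied", "studying"]

# Map each inflected form to the stem produced by the Porter rules.
stems = {word: stemmer.stem(word) for word in study_forms}

for word, stem in stems.items():
    print(word + " -----> " + stem)
```

All three forms collapse to the same truncated stem, which is useful for matching but is not a dictionary word.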

## Example for Root word and root stem

Word: "nationalization"
Root word: "nation"
Stem (via stemming): "nation" or "national" (depends on algorithm)

Word: "better"
Root word (via lemmatization, with pos='a'): "good"
Stem (via stemming): "better"

In [1]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to /Users/udmdev/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [2]:
from nltk.stem import WordNetLemmatizer

In [3]:
lemmatizer = WordNetLemmatizer()

In [4]:
# POS codes accepted by lemmatize():
#   noun      -> 'n' (default)
#   verb      -> 'v'
#   adjective -> 'a'
#   adverb    -> 'r'
# "going" is a verb here, so we use pos='v'.

# lemmatizer.lemmatize("going", pos='n')
lemmatizer.lemmatize("going", pos='v')
# lemmatizer.lemmatize("going", pos='a')
# lemmatizer.lemmatize("going", pos='r')

'go'
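
When the part of speech is not known in advance, a common approach is to take it from a POS tagger and translate the tag into one of the four codes above. The helper below is a hypothetical sketch (the name `penn_to_wordnet` is not part of NLTK). It assumes Penn Treebank style tags such as `VBG` or `JJ`.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag to a WordNet POS code ('n', 'v', 'a', 'r').

    Hypothetical helper for illustration. Unknown tags fall back to noun,
    which is also the default of WordNetLemmatizer.lemmatize().
    """
    if tag.startswith("JJ"):   # JJ, JJR, JJS -> adjective
        return "a"
    if tag.startswith("VB"):   # VB, VBD, VBG, VBN, VBP, VBZ -> verb
        return "v"
    if tag.startswith("RB"):   # RB, RBR, RBS -> adverb
        return "r"
    return "n"                 # NN, NNS, NNP, ... and anything else


# "going" tagged as a present participle (VBG) maps to the verb code.
print(penn_to_wordnet("VBG"))
```

The result can be passed straight to the lemmatizer, for example `lemmatizer.lemmatize("going", pos=penn_to_wordnet("VBG"))`.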

In [5]:
words=["unhappiness","eating", "eats", "eaten", "writing", "writes", "written", "Programming", "programs", "history", "finally", "finalized", "studies", "studied", "studying" ]

In [6]:
for word in words:
    # print(word+" -----> "+lemmatizer.lemmatize(word))
    print(word+" -----> "+lemmatizer.lemmatize(word, pos='r'))

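# Why is every word unchanged, including "unhappiness"?
# Because pos='r' asks WordNet for an adverb lemma, and none of these words
# has an adverb base form that differs from the word itself.
#
# "unhappiness" also stays the same with any other pos value: it is a valid
# noun in WordNet, and the lemmatizer only undoes inflection. It doesn't
# break the word into "un-" + "happiness" or further down to "happy".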

unhappiness -----> unhappiness
eating -----> eating
eats -----> eats
eaten -----> eaten
writing -----> writing
writes -----> writes
written -----> written
Programming -----> Programming
programs -----> programs
history -----> history
finally -----> finally
finalized -----> finalized
studies -----> studies
studied -----> studied
studying -----> studying


In [7]:
lemmatizer.lemmatize("goes", pos='v')

'go'

In [8]:
# lemmatizer.lemmatize("fairly"),lemmatizer.lemmatize("sportingly")
lemmatizer.lemmatize("fairly", pos='v'),lemmatizer.lemmatize("sportingly", pos='v')

('fairly', 'sportingly')

So, when choosing between stemming and lemmatization, lemmatization is usually the better choice whenever the output needs to be a meaningful, valid word.

NLTK provides the WordNetLemmatizer class, which is a thin wrapper around the WordNet corpus. It uses the morphy() function of the WordNet CorpusReader class to find a lemma; if no lemma is found, the input word is returned unchanged.

## Use cases
Lemmatization is commonly used in Q&A systems, chatbots, and text summarization.