# Lemmatization
Lemmatization is a text normalization technique used in natural language processing (NLP) to reduce words to their base or dictionary form, called the lemma. Stemming strips prefixes or suffixes with heuristic rules and can produce non-words. Lemmatization instead relies on a vocabulary and morphological analysis, often guided by the word's part of speech in context, so that the resulting lemma is a valid word.

Original words: "happier", "happiest", "happy"

Lemmatized form: "happy"
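To make the contrast with stemming concrete, here is a toy sketch in plain Python. The suffix list, the `naive_stem` and `lookup_lemma` helpers, and the tiny `LEMMA_TABLE` are hypothetical and exist only for illustration. A real lemmatizer draws on a full lexicon such as WordNet rather than a hand-written table.

```python
# Toy contrast: suffix stripping (stemming) vs. dictionary lookup (lemmatization).
# SUFFIXES, LEMMA_TABLE, and both helpers are illustrative assumptions.

SUFFIXES = ("est", "er", "s")  # tried in this order


def naive_stem(word: str) -> str:
    """Chop off the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        # keep at least three characters of the stem
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical dictionary mapping inflected forms to their lemma.
LEMMA_TABLE = {"happier": "happy", "happiest": "happy"}


def lookup_lemma(word: str) -> str:
    """Return the dictionary form if the word is listed, else the word itself."""
    return LEMMA_TABLE.get(word, word)


for w in ["happier", "happiest", "happy"]:
    print(f"{w}: stem={naive_stem(w)}, lemma={lookup_lemma(w)}")
```

Both "happier" and "happiest" stem to the non-word "happi", while the lookup returns "happy" for all three forms.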

### Commonly used lemmatizers:

WordNet Lemmatizer: Based on the WordNet lexical database. It maps a word to its lemma by checking WordNet's exception lists for irregular forms and applying suffix-substitution rules, keeping only results that exist in WordNet for the requested part of speech.

spaCy Lemmatizer: spaCy, an open-source NLP library, provides lemmatization as a pipeline component alongside tokenization, part-of-speech tagging, parsing, and other text processing features.

NLTK Lemmatizer: The Natural Language Toolkit (NLTK) for Python exposes lemmatization chiefly through its `WordNetLemmatizer` class, so in practice it is an interface to the WordNet lemmatizer described above.

TextBlob Lemmatizer: TextBlob, a Python library built on NLTK and Pattern, also provides lemmatization capabilities.
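The WordNet approach (exception lists first, then suffix-substitution rules, with every candidate checked against the lexicon for a given part of speech) can be imitated in a few lines of plain Python. This is a sketch under stated assumptions: `LEXICON`, `EXCEPTIONS`, `RULES`, and `morphy_like` are hypothetical miniature stand-ins, not WordNet's data or NLTK's code.

```python
# Toy imitation of WordNet-style lemmatization:
# 1) check an exception list of irregular forms, 2) apply suffix-substitution
# rules, 3) accept a candidate only if it exists in the lexicon for that POS.
# LEXICON, EXCEPTIONS, and RULES are tiny hypothetical tables, not WordNet data.

LEXICON = {
    "n": {"run", "runner", "jump"},
    "v": {"run", "jump"},
    "a": {"good", "happy"},
}

EXCEPTIONS = {
    "v": {"ran": "run"},
    "a": {"better": "good", "happier": "happy", "happiest": "happy"},
}

RULES = {
    "n": [("s", "")],
    "v": [("s", ""), ("ing", ""), ("ing", "e"), ("ed", ""), ("ed", "e")],
    "a": [("er", ""), ("est", ""), ("er", "e"), ("est", "e")],
}


def morphy_like(word: str, pos: str = "n") -> str:
    """Return a lemma for `word` treated as part of speech `pos`.

    Falls back to the word itself when nothing matches; this sketch simply
    returns the first acceptable candidate.
    """
    lexicon = LEXICON.get(pos, set())
    exceptions = EXCEPTIONS.get(pos, {})
    if word in exceptions:
        return exceptions[word]
    if word in lexicon:
        return word
    for suffix, replacement in RULES.get(pos, []):
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in lexicon:
                return candidate
    return word


for word in ["runs", "ran", "jumping", "better"]:
    print(f"{word}: verb={morphy_like(word, 'v')}, "
          f"noun={morphy_like(word, 'n')}, adj={morphy_like(word, 'a')}")
```

The lemma depends on the `pos` argument: "ran" and "jumping" are reduced only when treated as verbs, and "better" only when treated as an adjective.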

### Drawbacks of lemmatization:

Computational Complexity: Lemmatization can be computationally more expensive than stemming, because it requires dictionary lookups and morphological analysis, and often a part-of-speech tagging step beforehand.

Dependency on Language Resources: Lemmatization often relies on language-specific resources such as dictionaries or lexicons, making it less suitable for languages with limited or incomplete linguistic resources.

Ambiguity Resolution: Lemmatization may struggle with word forms that have several possible meanings or parts of speech; choosing the wrong reading produces the wrong lemma.
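Ambiguity is easiest to see with forms whose lemma depends on the part of speech: "saw" is either the noun *saw* or the past tense of *see*, and "leaves" is either the plural of *leaf* or a form of *leave*. The sketch below uses a hypothetical `ANALYSES` table and hypothetical helpers to show that, without a POS tag, a lemmatizer has several candidates and must fall back on a default. The noun default mirrors NLTK's `WordNetLemmatizer.lemmatize`, whose `pos` parameter defaults to `'n'`.

```python
from typing import Dict, Optional, Set

# Hypothetical analyses: surface form -> {WordNet-style POS code: lemma}.
ANALYSES: Dict[str, Dict[str, str]] = {
    "saw": {"n": "saw", "v": "see"},
    "leaves": {"n": "leaf", "v": "leave"},
}


def candidate_lemmas(word: str) -> Set[str]:
    """Every lemma the form could have when no POS tag is available."""
    analyses = ANALYSES.get(word)
    return set(analyses.values()) if analyses else {word}


def lemmatize(word: str, pos: Optional[str] = None, default_pos: str = "n") -> str:
    """Use the given POS if provided; otherwise guess `default_pos`."""
    analyses = ANALYSES.get(word, {})
    return analyses.get(pos or default_pos, word)


print(sorted(candidate_lemmas("saw")))  # ['saw', 'see']
print(lemmatize("saw"))                 # saw  (noun guess)
print(lemmatize("saw", pos="v"))        # see
print(lemmatize("leaves", pos="v"))     # leave
```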

### Advantages of lemmatization over stemming:

Better Accuracy: Lemmatization produces valid words (lemmas) that are present in the language's vocabulary, ensuring better accuracy in word normalization compared to stemming.

Semantic Preservation: Lemmatization preserves the semantic meaning of words by reducing them to their base forms, which helps in maintaining the interpretability of text data.

Contextual Analysis: Lemmatization takes a word's morphology and its part of speech in context into account, which yields more accurate normalization than the purely rule-based suffix stripping of stemming.

### Usage:
Lemmatization is typically used in applications where preserving the meaning of words is crucial, such as machine translation, sentiment analysis, or question answering systems. It is preferred over stemming when accurate normalization matters more than processing speed.
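As a small illustration of why normalization helps downstream tasks, the sketch below counts term frequencies before and after mapping tokens to lemmas. The `LEMMAS` mapping is a hand-written, hypothetical stand-in for a real lemmatizer's output; only `collections.Counter` comes from the standard library.

```python
from collections import Counter

# Hand-written stand-in for a lemmatizer's output (hypothetical mapping).
# Unlisted tokens such as "was" pass through unchanged here; a real
# lemmatizer would map "was" to "be".
LEMMAS = {"runs": "run", "ran": "run", "running": "run",
          "jumps": "jump", "jumping": "jump"}

tokens = "she runs daily and ran yesterday while he was running and jumping".split()

raw_counts = Counter(tokens)                              # counts surface forms
lemma_counts = Counter(LEMMAS.get(t, t) for t in tokens)  # counts lemmas

print(raw_counts["run"], lemma_counts["run"])  # 0 3
```

Without normalization the three forms of *run* are counted as unrelated terms; after lemmatization they contribute to a single feature.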

# WordNet Lemmatizer

```python
import nltk
nltk.download('wordnet')  # fetch the WordNet data used by the lemmatizer
from nltk.stem import WordNetLemmatizer
```

```python
# Inflected forms of verbs and adjectives to compare
words = ["running", "runs", "ran", "runner",
         "better", "best", "good",
         "happier", "happiest", "happy",
         "jumps", "jumping"]

wnl = WordNetLemmatizer()

# pos='v' tells WordNet to treat every word as a verb
for word in words:
    print(word + "--->" + wnl.lemmatize(word, pos='v'))
```

```
running--->run
runs--->run
ran--->run
runner--->runner
better--->better
best--->best
good--->good
happier--->happier
happiest--->happiest
happy--->happy
jumps--->jump
jumping--->jump
```


```python
# Without a pos argument, lemmatize() treats every word as a noun (pos='n')
for word in words:
    print(word + "--->" + wnl.lemmatize(word))
```

```
running--->running
runs--->run
ran--->ran
runner--->runner
better--->better
best--->best
good--->good
happier--->happier
happiest--->happiest
happy--->happy
jumps--->jump
jumping--->jumping
```
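The two WordNet loops differ only in the `pos` argument. `WordNetLemmatizer.lemmatize` defaults to `pos='n'` (noun), so verb forms such as "ran" and "jumping" pass through unchanged, while `pos='v'` leaves comparatives like "happier" untouched because they are adjectives. A common remedy is to tag the text first (for example with `nltk.pos_tag`, which emits Penn Treebank tags) and map each tag to a WordNet part of speech. The helper below sketches that mapping in plain Python. The name `penn_to_wordnet` is hypothetical; the single-letter codes `'n'`, `'v'`, `'a'`, `'r'` are values that `lemmatize` accepts for `pos`.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD', 'JJR') to a WordNet POS code."""
    if tag.startswith("J"):
        return "a"  # adjectives: JJ, JJR, JJS
    if tag.startswith("V"):
        return "v"  # verbs: VB, VBD, VBG, VBN, VBP, VBZ
    if tag.startswith("R"):
        return "r"  # adverbs RB, RBR, RBS (particles tagged RP also land here)
    return "n"      # nouns and all other tags fall back to the noun default


# Hand-written (word, tag) pairs standing in for nltk.pos_tag output.
tagged = [("ran", "VBD"), ("happier", "JJR"), ("runner", "NN"), ("jumping", "VBG")]
for word, tag in tagged:
    print(f"{word} ({tag}) -> pos='{penn_to_wordnet(tag)}'")
```

With NLTK installed, the mapped code can be passed straight to the lemmatizer, e.g. `wnl.lemmatize(word, pos=penn_to_wordnet(tag))` for every `(word, tag)` pair returned by `nltk.pos_tag(tokens)`. Note that `nltk.pos_tag` needs its tagger model downloaded first.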


# spaCy Lemmatizer

```python
import spacy

# Load the small English pipeline (install it first with:
#   python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

words = ["running", "runs", "ran", "runner",
         "better", "best", "good",
         "happier", "happiest", "happy",
         "jumps", "jumping"]

for word in words:
    doc = nlp(word)        # each word is processed alone, with no sentence context
    lemma = doc[0].lemma_  # lemma of the single token
    print(word + " ---> " + lemma)
```

```
running ---> run
runs ---> run
ran ---> run
runner ---> runner
better ---> well
best ---> good
good ---> good
happier ---> happy
happiest ---> happy
happy ---> happy
jumps ---> jump
jumping ---> jump
```
