Lemmatization in NLP refers to reducing words to their base or dictionary form (lemma). Unlike stemming, which strips affixes with heuristic rules and can produce strings that are not real words (e.g., "studies" → "studi"), lemmatization uses a vocabulary and morphological analysis so that the result is a valid dictionary form. The main types of lemmatization methods are:

1. Dictionary-based Lemmatization:

Uses a dictionary or lexical database (e.g., WordNet) to look up the lemma of a word.
Because it relies on stored entries rather than suffix patterns, it handles irregular forms, but it cannot resolve words that are missing from the dictionary.
Example: "better" → "good" (as an adjective), "running" → "run" (as a verb).
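
The lookup idea can be sketched with a plain Python dictionary. The tiny `LEMMA_TABLE` and the `lookup_lemma` helper are hypothetical stand-ins for a real lexical database such as WordNet; the sketch only illustrates that listed forms map to a stored lemma while unknown words pass through unchanged.

```python
# Toy lemma table (illustrative only); a real system would query a
# lexical database such as WordNet instead of a hand-written dict.
LEMMA_TABLE = {
    "better": "good",
    "running": "run",
    "geese": "goose",
    "mice": "mouse",
}


def lookup_lemma(word: str) -> str:
    """Return the stored lemma for `word`, or the word itself if it is not listed."""
    return LEMMA_TABLE.get(word.lower(), word)


for w in ["better", "running", "geese", "tables"]:
    print(w, "->", lookup_lemma(w))
```

Note that "tables" comes back unchanged because it has no entry, which is the main limitation of a pure lookup approach.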

2. Rule-based Lemmatization:

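Applies morphological rules to convert a word to its lemma.
It typically strips or rewrites inflectional suffixes (e.g., plural "-s", past-tense "-ed") according to grammar-based rules.
Example: "cats" → "cat", "walked" → "walk". Irregular forms such as "am" → "be" cannot be derived from suffix rules and need a dictionary or exception list.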
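
A minimal sketch of suffix rules, assuming a small hand-written rule list. `SUFFIX_RULES`, `MIN_STEM_LEN`, and `rule_based_lemma` are hypothetical names chosen for illustration, and the rules cover only a few English plural patterns; they are not a complete morphology.

```python
# Ordered (suffix, replacement) rules; the first matching rule wins.
SUFFIX_RULES = [
    ("ies", "y"),    # studies -> study
    ("sses", "ss"),  # classes -> class
    ("ss", "ss"),    # glass   -> glass (protects words ending in "ss")
    ("s", ""),       # cats    -> cat
]

# Require at least this many characters to remain after stripping a suffix,
# so that short words such as "is" or "bus" are left alone.
MIN_STEM_LEN = 3


def rule_based_lemma(word: str) -> str:
    """Apply the first suffix rule that matches; return the word unchanged otherwise."""
    for suffix, replacement in SUFFIX_RULES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= MIN_STEM_LEN:
            return stem + replacement
    return word


for w in ["cats", "studies", "classes", "glass", "am"]:
    print(w, "->", rule_based_lemma(w))
```

The word "am" is returned unchanged here: no suffix rule can turn it into "be", which is why rule-based systems are usually paired with an exception list for irregular forms.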

3. Contextual Lemmatization:

More advanced: it analyzes the context in which a word appears to determine the correct lemma.
It typically uses a POS tagger or another NLP model to detect the word's part of speech (POS), then selects the lemma that matches that POS.
Example: "saw" is lemmatized to "see" when it is used as a verb ("I saw her") but stays "saw" when it is used as a noun ("a rusty saw").
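
The POS-dependent choice can be sketched as follows. This is a toy illustration: the Penn Treebank tags are supplied by hand rather than produced by a tagger, and `penn_to_wordnet`, `CONTEXT_LEMMAS`, and `contextual_lemma` are hypothetical names. The tag mapping is a common heuristic, not part of any library API.

```python
def penn_to_wordnet(tag: str) -> str:
    """Map a Penn Treebank tag (e.g. 'VBD') to a WordNet-style POS letter."""
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("RB"):
        return "r"  # adverb
    return "n"      # treat everything else as a noun


# Toy lexicon keyed by (word, POS); purely illustrative.
CONTEXT_LEMMAS = {
    ("saw", "v"): "see",
    ("saw", "n"): "saw",
    ("meeting", "v"): "meet",
    ("meeting", "n"): "meeting",
}


def contextual_lemma(word: str, penn_tag: str) -> str:
    """Pick the lemma that matches the word's POS; fall back to the word itself."""
    pos = penn_to_wordnet(penn_tag)
    return CONTEXT_LEMMAS.get((word.lower(), pos), word)


# Hand-tagged tokens for "I saw the saw"; a POS tagger would normally supply the tags.
tagged = [("I", "PRP"), ("saw", "VBD"), ("the", "DT"), ("saw", "NN")]
print([contextual_lemma(w, t) for w, t in tagged])
```

The same surface form "saw" yields two different lemmas depending on its tag, which is the behavior a purely dictionary-based or rule-based method cannot provide without POS information.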

import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

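# Download the NLTK data files
# ('wordnet' and 'omw-1.4' back the lemmatizer; 'punkt' is a tokenizer
# model and is only needed if you tokenize raw text)
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')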

# Initialize the lemmatizer
lemmatizer = WordNetLemmatizer()

# Example words to lemmatize
words = ["running", "better", "cats", "am", "flying", "geese"]

# Lemmatize without POS (default is 'noun')
lemmatized_words_default = [lemmatizer.lemmatize(word) for word in words]

# Lemmatize with an explicit part of speech (POS) for each word.
# WordNet POS constants: wordnet.NOUN ('n'), wordnet.VERB ('v'),
# wordnet.ADJ ('a'), wordnet.ADV ('r')
pos_tags = {
    "running": wordnet.VERB,
    "better": wordnet.ADJ,
    "am": wordnet.VERB,
    "flying": wordnet.VERB,
}
# Words not listed above ("cats", "geese") are treated as nouns
lemmatized_words_with_pos = [
    lemmatizer.lemmatize(word, pos_tags.get(word, wordnet.NOUN))
    for word in words
]

# Print results
print("Original Words:", words)
print("Lemmatized (Default POS):", lemmatized_words_default)
print("Lemmatized (With POS):", lemmatized_words_with_pos)
