In [3]:
import nltk
from nltk.corpus import wordnet
from nltk.stem import WordNetLemmatizer
from nltk import pos_tag, word_tokenize

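# Download required NLTK resources (each call is a no-op if the package is already up to date)
nltk.download('wordnet')                         # lemma dictionary
nltk.download('omw-1.4')                         # Open Multilingual WordNet
nltk.download('averaged_perceptron_tagger')      # POS tagger, older resource name
nltk.download('punkt')                           # tokenizer, older resource name
nltk.download('punkt_tab')                       # tokenizer, name used by newer NLTK releases
nltk.download('averaged_perceptron_tagger_eng')  # POS tagger, name used by newer NLTK releases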

# Initialize WordNet Lemmatizer
lemmatizer = WordNetLemmatizer()

# Map Penn Treebank POS tags (as returned by pos_tag) to WordNet POS constants
def get_wordnet_pos(tag):
    if tag.startswith('J'):
        return wordnet.ADJ
    elif tag.startswith('V'):
        return wordnet.VERB
    elif tag.startswith('N'):
        return wordnet.NOUN
    elif tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN # Default to noun

# Lemmatization function
def lemmatize_text(text):
    tokens = word_tokenize(text)
    pos_tags = pos_tag(tokens)
    lemmatized_words = [
        lemmatizer.lemmatize(word, get_wordnet_pos(tag))
        for word, tag in pos_tags
    ]
    return ' '.join(lemmatized_words)

if __name__ == "__main__":
    text = input("Enter text for lemmatization: ")
    lemmatized_text = lemmatize_text(text)
    print(f"Original Text: {text}")
    print(f"Lemmatized Text: {lemmatized_text}")

[nltk_data] Downloading package wordnet to
[nltk_data]     C:\Users\ganes\AppData\Roaming\nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to
[nltk_data]     C:\Users\ganes\AppData\Roaming\nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\ganes\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\ganes\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\ganes\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\ganes\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_percep

Enter text for lemmatization:  everyone is playing and enjoying the cricket


Original Text: everyone is playing and enjoying the cricket
Lemmatized Text: everyone be play and enjoy the cricket


In [None]:
'''✅ What the Code Does (Summary)
This Python script performs lemmatization using NLTK’s WordNetLemmatizer. It:

Tokenizes the input sentence

Tags each token with its Part-of-Speech (POS)

Maps POS tags to a format suitable for WordNet

Applies lemmatization to each word

Returns the lemmatized text

📚 Dataset / Model Used
WordNet: A lexical database for English that provides synonyms, antonyms, and base forms (lemmas) of words.

POS tagging is done using the averaged_perceptron_tagger from NLTK.

🌱 What is Lemmatization?
Lemmatization is the process of reducing a word to its base or dictionary form (called a lemma), considering the context and part of speech.

Example:

"running" → "run" (when tagged as a verb)

"better" → "good" (when tagged as an adjective)

Compared to stemming:

Lemmatization is usually more accurate but slower, because it looks each word up in a vocabulary (WordNet) and applies morphological rules.

Stemming strips suffixes with simple rules (e.g., PorterStemmer) and can produce non-words.

📦 NLTK Resources Used
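nltk.download('wordnet')                         # WordNet dictionary for lemmas
nltk.download('omw-1.4')                         # Open Multilingual WordNet
nltk.download('averaged_perceptron_tagger')      # POS tagger (older resource name)
nltk.download('averaged_perceptron_tagger_eng')  # POS tagger (name used by newer NLTK releases)
nltk.download('punkt')                           # Tokenizer (older resource name)
nltk.download('punkt_tab')                       # Tokenizer (name used by newer NLTK releases)
Note: which tagger and tokenizer packages are needed depends on the installed NLTK version. Newer releases look up punkt_tab and averaged_perceptron_tagger_eng, so those two downloads should stay; on such installs the older punkt and averaged_perceptron_tagger packages are the redundant ones.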

🧠 Viva Questions & Answers
❓ Q1. What is lemmatization?
A:
Lemmatization reduces words to their base (lemma) form using a vocabulary (WordNet) and POS tags to ensure the transformation respects grammar.

❓ Q2. Why is POS tagging important in lemmatization?
A:
Lemmatization needs to know the correct POS to find the accurate lemma.

"running" as a verb → "run"

"running" as a noun → stays "running"

❓ Q3. How does your code map POS tags?
A:
It uses the get_wordnet_pos() function to convert tags from pos_tag() into the format expected by WordNetLemmatizer.
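
As a small illustration of the mapping described in Q3, the sketch below re-implements the first-letter rule without importing NLTK. It assumes the single-letter codes "a", "v", "n", and "r", which are the values of wordnet.ADJ, wordnet.VERB, wordnet.NOUN, and wordnet.ADV. The names PENN_PREFIX_TO_WORDNET and map_penn_to_wordnet are hypothetical and not part of the original script.

```python
# Single-letter POS codes; assumed to match wordnet.ADJ, wordnet.VERB,
# wordnet.NOUN and wordnet.ADV in NLTK.
ADJ, VERB, NOUN, ADV = "a", "v", "n", "r"

# Hypothetical lookup table: first letter of a Penn Treebank tag -> WordNet POS.
PENN_PREFIX_TO_WORDNET = {"J": ADJ, "V": VERB, "N": NOUN, "R": ADV}


def map_penn_to_wordnet(tag):
    """Return the WordNet POS for a Penn Treebank tag, defaulting to noun."""
    return PENN_PREFIX_TO_WORDNET.get(tag[:1], NOUN)


if __name__ == "__main__":
    for tag in ["JJ", "VBG", "NNS", "RB", "DT"]:
        print(tag, "->", map_penn_to_wordnet(tag))
```

A dictionary lookup behaves the same as the if/elif chain in get_wordnet_pos(); any tag whose first letter is not in the table, such as a determiner (DT), falls back to noun.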

❓ Q4. What would happen if you didn't map POS tags?
A:
The lemmatizer would treat every word as a noun (its default POS), so inflected verbs such as "playing" would typically be left unchanged.

❓ Q5. Example of lemmatization with and without POS tag?
A:

lemmatizer.lemmatize("running")        # running (default POS is noun, so the word is unchanged)
lemmatizer.lemmatize("running", 'v')   # run
❓ Q6. What’s the difference between stemmer.stem() and lemmatizer.lemmatize()?
A:

Stemmer: Rule-based, may produce non-words (e.g., “studies” → “studi”)

Lemmatizer: Uses dictionary, returns real words (e.g., “studies” → “study”)
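
To make the contrast in Q6 concrete without depending on NLTK, here is a toy sketch. naive_stem is a deliberately crude suffix stripper and TOY_LEMMAS is a hand-written three-entry dictionary. Both are hypothetical stand-ins, not PorterStemmer or WordNet, and only illustrate why rule-based chopping can yield non-words while a dictionary lookup returns real ones.

```python
# Toy illustration only: neither function reproduces PorterStemmer or WordNet.

SUFFIXES = ("ing", "es", "s")  # hypothetical rule list, checked in order

# Hand-written stand-in for a lemma dictionary.
TOY_LEMMAS = {"studies": "study", "running": "run", "better": "good"}


def naive_stem(word):
    """Strip the first matching suffix, without checking the result is a word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up in the toy dictionary; fall back to the word itself."""
    return TOY_LEMMAS.get(word, word)


if __name__ == "__main__":
    for word in ["studies", "running", "better"]:
        print(f"{word}: stem={naive_stem(word)!r}, lemma={toy_lemmatize(word)!r}")
```

The stripper has no way to relate "better" to "good", because no suffix rule connects them; only a dictionary entry can.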

❓ Q7. Can lemmatization handle irregular words?
A:
Yes. WordNet ships exception lists for irregular forms, and the lemmatizer consults them for the given POS.
Example: "better" (tagged as an adjective) → "good"

❓ Q8. Why did you choose NLTK over spaCy?
A:
NLTK offers:

Detailed control over lemmatization

Integration with WordNet

Explicit POS mapping

spaCy is faster and more convenient for end-to-end pipelines, but it exposes fewer of the intermediate steps.

❓ Q9. What are limitations of this code?
A:

Doesn't remove stopwords or punctuation

No error handling for blank input

Depends on English language (WordNet)'''
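
The first two limitations listed above could be addressed with a thin wrapper, sketched here under stated assumptions. clean_and_lemmatize, the tiny STOPWORDS set, and the lemmatize_token parameter are all hypothetical. In the script above, lemmatize_token would be a call into WordNetLemmatizer, and a fuller stopword list (for example from nltk.corpus.stopwords) would replace the three-word set used here.

```python
import string

# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "is", "and"}


def clean_and_lemmatize(tokens, lemmatize_token):
    """Drop punctuation and stopwords, then lemmatize what remains.

    Raises ValueError when no tokens are given (blank input).
    """
    if not tokens:
        raise ValueError("No text to lemmatize")
    kept = [
        tok for tok in tokens
        # Skip tokens made up entirely of punctuation characters.
        if not all(ch in string.punctuation for ch in tok)
        and tok.lower() not in STOPWORDS
    ]
    return [lemmatize_token(tok) for tok in kept]


if __name__ == "__main__":
    # str.lower stands in for a real lemmatizer so the sketch runs without NLTK.
    print(clean_and_lemmatize(["The", "match", "is", "over", "."], str.lower))
```

Passing the lemmatizer in as a function keeps the filtering logic testable on its own and leaves the choice of lemmatizer to the caller.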