# 🔹 Step 3: Stemming & Lemmatization

Concept: Converting words to their base or root form.

✅ We’ll learn:

- The difference between stemming and lemmatization
- NLTK’s PorterStemmer vs NLTK’s WordNetLemmatizer

## Downloading the WordNet data used for lemmatization

In [1]:
import nltk
nltk.download('wordnet')

[nltk_data] Downloading package wordnet to
[nltk_data]     c:\Users\jsril\anaconda3\envs\nlp_env\nltk_data...


True

## 🧩 Stemming vs Lemmatization

🎯 Both techniques reduce words to a base or root form, so that words like “studying”, “studies”, and “studied” can all be treated as “study”. But the way they do it is very different 👇

- Stemming strips suffixes using fixed rules. It does not consider meaning, so the result may not be a real word (for example, “studies” becomes “studi”).
- Lemmatization looks the word up in a vocabulary and uses its part of speech to return the dictionary form, or lemma (for example, “studies” becomes “study”).
- As a result, stemming can cut words incorrectly, while lemmatization gives proper dictionary forms.

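| Word     | Stemming Output | Lemmatization Output (correct POS given) |
| -------- | --------------- | ---------------------------------------- |
| studies  | studi           | study                                    |
| running  | run             | run                                      |
| better   | better          | good                                     |
| leaves   | leav            | leaf                                     |
| changing | chang           | change                                   |

The lemmatization column assumes the lemmatizer is told the correct part of speech (verb for “running” and “changing”, adjective for “better”). NLTK’s `WordNetLemmatizer` treats every word as a noun unless a `pos` argument is passed.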
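The contrast in the table can be mimicked with a toy sketch. The suffix rules and the lookup dictionary here are made up for illustration and are far smaller than what `PorterStemmer` or WordNet actually use; the names `toy_stem` and `toy_lemmatize` are hypothetical. The point is only that rule-based cutting never checks whether the result is a real word, while a dictionary lookup always returns one.

```python
# Toy illustration only: these rules and this dictionary are invented for the
# example; they are NOT the Porter algorithm or the WordNet database.
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("es", ""), ("s", "")]

LEMMA_DICT = {
    "studies": "study",
    "running": "run",
    "better": "good",
    "leaves": "leaf",
    "changing": "change",
}


def toy_stem(word):
    """Apply the first matching suffix rule; no check that the result is a word."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


def toy_lemmatize(word):
    """Look the word up in the dictionary; unknown words come back unchanged."""
    return LEMMA_DICT.get(word, word)


for w in ["studies", "running", "better", "leaves", "changing"]:
    print(f"{w:10} stem={toy_stem(w):8} lemma={toy_lemmatize(w)}")
```

Note that the toy stemmer turns “running” into “runn”, whereas the real Porter stemmer has an extra rule that removes the doubled consonant and gives “run”.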


In [11]:
from nltk.stem import PorterStemmer

ps = PorterStemmer()

words = ["studies", "studying", "studied", "changing", "running", "leaves", "beautiful", "easily"]


In [12]:
for w in words:
    print(ps.stem(w))

studi
studi
studi
chang
run
leav
beauti
easili
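Stems such as `easili` (and `studi` for “studying”, once “ing” has been removed) trace back to one rule in Porter's published algorithm, step 1c: a final “y” becomes “i” when the rest of the word contains a vowel. The sketch below implements only that single rule, with a simplified vowel check (a, e, i, o, u only). The function name `porter_step_1c` is made up for illustration, and NLTK's `PorterStemmer` applies many more steps plus some variations of its own.

```python
def porter_step_1c(word):
    """Simplified sketch of Porter step 1c: final 'y' -> 'i' if the stem has a vowel."""
    stem = word[:-1]
    if word.endswith("y") and any(ch in "aeiou" for ch in stem):
        return stem + "i"
    return word


for w in ["study", "easily", "sky"]:
    print(w, "->", porter_step_1c(w))
```

“sky” is left alone because nothing before the “y” is a vowel, which is why this rule does not fire on every word ending in “y”.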


In [13]:
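from nltk.stem import WordNetLemmatizer

wn = WordNetLemmatizer()
# lemmatize() assumes pos="n" (noun) unless told otherwise,
# so verb forms such as "studying" come back unchanged.
for w in words:
    print(wn.lemmatize(w))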

study
studying
studied
changing
running
leaf
beautiful
easily
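Several words above come back unchanged because `lemmatize()` treats them as nouns. Passing a part of speech through the `pos` argument changes that. The sketch below is one possible way to wire this up, assuming each word already carries a Penn Treebank tag (for instance from a tagger such as `nltk.pos_tag`, which is not used in this notebook). The helper names `penn_to_wordnet_pos` and `lemmatize_tagged` are hypothetical.

```python
def penn_to_wordnet_pos(penn_tag):
    """Map a Penn Treebank tag (e.g. 'VBG') to the POS letter WordNet expects."""
    if penn_tag.startswith("J"):
        return "a"  # adjective
    if penn_tag.startswith("V"):
        return "v"  # verb
    if penn_tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun, which is also lemmatize()'s own default


def lemmatize_tagged(tagged_words):
    """Lemmatize (word, penn_tag) pairs, passing the mapped POS to WordNet."""
    # Imported here so the mapping helper above works even without NLTK data.
    from nltk.stem import WordNetLemmatizer

    wn = WordNetLemmatizer()
    return [
        wn.lemmatize(word, pos=penn_to_wordnet_pos(tag))
        for word, tag in tagged_words
    ]
```

With the WordNet data downloaded, a call such as `lemmatize_tagged([("studying", "VBG"), ("better", "JJR")])` should give `study` and `good`, in line with the table earlier. The same effect is available directly with `wn.lemmatize("studying", pos="v")`.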


# Overall code

In [17]:
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Requires the NLTK tokenizer ('punkt') and 'stopwords' data to be downloaded already.
text = "NLP helps computers to understand and generate human language easily. It is a very helpful"

# Step 1: tokenize
words_tokens = word_tokenize(text)

# Step 2: remove stopwords. The check is case-sensitive, so "It" is kept,
# and "." is not a stopword, so it is kept too.
stopping_words = set(stopwords.words("english"))
filtered_words = [word for word in words_tokens if word not in stopping_words]

# Step 3: stemming and lemmatization
ps = PorterStemmer()
wn = WordNetLemmatizer()

print("**Stemming**")
for w in filtered_words:
    print(w, "->", ps.stem(w))

print()

print("**Lemmatization**")
for w in filtered_words:
    print(w, "->", wn.lemmatize(w))

**Stemming**
NLP -> nlp
helps -> help
computers -> comput
understand -> understand
generate -> gener
human -> human
language -> languag
easily -> easili
. -> .
It -> it
helpful -> help

**Lemmatization**
NLP -> NLP
helps -> help
computers -> computer
understand -> understand
generate -> generate
human -> human
language -> language
easily -> easily
. -> .
It -> It
helpful -> helpful
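In the output above, “It” and “.” survive the stopword step: the membership test is case-sensitive and punctuation is not in the stopword list. A possible refinement is sketched below. The helper name `clean_tokens` is hypothetical, and the token list and stopword set are small hand-written stand-ins so the snippet runs without NLTK data.

```python
import string


def clean_tokens(tokens, stop_words):
    """Drop stopwords (compared in lowercase) and punctuation-only tokens."""
    kept = []
    for tok in tokens:
        if tok.lower() in stop_words:
            continue  # stopword, regardless of capitalisation
        if all(ch in string.punctuation for ch in tok):
            continue  # token made up only of punctuation, e.g. "."
        kept.append(tok)
    return kept


# Hand-written stand-ins for word_tokenize(text) and stopwords.words("english").
tokens = ["NLP", "helps", "computers", "to", "understand", "and", "generate",
          "human", "language", "easily", ".", "It", "is", "a", "very", "helpful"]
stop_words = {"to", "and", "it", "is", "a", "very"}

print(clean_tokens(tokens, stop_words))
```

With NLTK, `tokens` would come from `word_tokenize(text)` and `stop_words` from `set(stopwords.words("english"))`; the filtered list can then be passed to the stemmer or lemmatizer as before.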


# ✅ Summary

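- **Stemming = Fast but rough (rule-based suffix cutting; the result may not be a real word).**

- **Lemmatization = Accurate but slower (uses a dictionary and the word’s part of speech).**

- **When accuracy matters, lemmatization is usually preferred; LLM-based pipelines often skip both and rely on subword tokenization instead.**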