# 1. Stemming

Stemming is a text normalization technique used in Natural Language Processing (NLP) that reduces words to a root or base form by stripping affixes. For example, "running" becomes "run" and "fishes" becomes "fish". Collapsing such variants into a single form lets related words match one another, which makes text analysis more effective.

In [1]:
# Sample words used as input for each stemmer below

words = ["running", "runner", "ran", "easily", "fairly", "fairness", "history", "programming.", "programmer's", "eating", "eaten", "eat"]

# 2. Porter Stemmer

The Porter stemmer is one of the most widely used stemmers for English. Features:
- Uses a fixed set of rules to remove common morphological and inflectional endings
- Fast, with consistent behavior on English words
- May produce stems that are not actual words (for example, "easili")
- Works well for information retrieval tasks
- Less aggressive than most other stemmers

In [None]:
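# Some words are stemmed incorrectly, e.g. "easily" -> "easili",
# and "programming." is left unchanged because of the trailing period

from nltk.stem import PorterStemmer
ps = PorterStemmer()
for w in words:
    print(w + "----->", ps.stem(w))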

running-----> run
runner-----> runner
ran-----> ran
easily-----> easili
fairly-----> fairli
fairness-----> fair
history-----> histori
programming.-----> programming.
programmer's-----> programmer'
eating-----> eat
eaten-----> eaten
eat-----> eat
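
The unchanged `programming.` and the dangling apostrophe in `programmer'` above suggest that punctuation attached to a token gets in the way of the suffix rules. One possible remedy is to clean tokens before stemming. The following is a minimal sketch using only the standard library; `normalize_token` is a hypothetical helper, not part of NLTK, and the possessive handling is an assumption about what counts as noise here.

```python
import string

def normalize_token(token: str) -> str:
    """Lowercase a token, strip surrounding punctuation, then drop a trailing possessive."""
    token = token.lower().strip(string.punctuation)
    # Remove a possessive marker left at the end ("programmer's" -> "programmer").
    if token.endswith("'s"):
        token = token[:-2]
    return token

sample = ["programming.", "programmer's", "running"]
cleaned = [normalize_token(t) for t in sample]
print(cleaned)
```

The cleaned tokens can then be passed to `ps.stem` in place of the raw strings.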


# 3. RegexpStemmer

RegexpStemmer is a simple but flexible stemmer. Features:
- Uses regular expressions to strip suffixes
- Customizable with user-defined patterns
- Faster than more complex stemmers
- Less accurate than algorithmic stemmers
- Good for specific pattern-based stemming needs

In [None]:
# RegexpStemmer applies custom suffix rules; words shorter than `min` characters are left unchanged
# Each alternative is anchored with $, so e.g. ing$ removes "ing" only at the end of a word
# Note: whole-word alternatives such as history$ delete the entire word (see the empty result below)
from nltk.stem import RegexpStemmer
rs = RegexpStemmer('ing$|s$|ed$|er$|ly$|ness$|y$|history$|programming$|programmer$|eat$', min=4)
for w in words:
    print(w + "---->" + rs.stem(w))

running---->runn
runner---->runn
ran---->ran
easily---->easi
fairly---->fair
fairness---->fair
history---->
programming.---->programming.
programmer's---->programmer'
eating---->eat
eaten---->eaten
eat---->eat
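
The empty stem for `history` and the shared stem `runn` for both `running` and `runner` follow directly from how suffix stripping by regular expression works: whatever the pattern matches is deleted, with no further checks. The sketch below reproduces that behavior with Python's `re` module. It is an approximation for illustration, not NLTK's source; `regexp_stem` and its parameters are hypothetical names.

```python
import re

def regexp_stem(word: str, pattern: str, min_len: int = 0) -> str:
    """Delete every match of `pattern` from `word`; words shorter than `min_len` pass through."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)

# Same pattern as the cell above: whole-word alternatives erase the word entirely.
original = r"ing$|s$|ed$|er$|ly$|ness$|y$|history$|programming$|programmer$|eat$"
# Suffix-only variant (an assumption about what was intended).
suffix_only = r"ing$|s$|ed$|er$|ly$|ness$|y$"

for w in ["running", "history", "eat"]:
    print(w, "->", repr(regexp_stem(w, original, 4)), "vs", repr(regexp_stem(w, suffix_only, 4)))
```

With the suffix-only pattern, `history` keeps a non-empty stem, while `eat` is still untouched because it is shorter than the minimum length.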


# 4. Snowball Stemmer

The Snowball stemmer is an improved successor to the Porter stemmer; its English variant is also known as Porter2. Features:
- Generally more accurate than the Porter stemmer
- Supports multiple languages
- Better handling of exceptional cases
- Slightly slower than the Porter stemmer
- More aggressive in reducing words to their stems

In [None]:
# Gives better results than Porter here (e.g. "fairly" -> "fair"), although still not perfect
from nltk.stem import SnowballStemmer
s = SnowballStemmer("english")  # the language argument is required
for w in words:
    print(w + "---->" + s.stem(w))

running---->run
runner---->runner
ran---->ran
easily---->easili
fairly---->fair
fairness---->fair
history---->histori
programming.---->programming.
programmer's---->programm
eating---->eat
eaten---->eaten
eat---->eat
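
Because the Snowball stemmer requires a language argument, it can help to check which languages are available before constructing one. The sketch below relies on the `languages` attribute of NLTK's `SnowballStemmer`; `make_stemmer` is a hypothetical convenience wrapper, and the fallback to English is an assumption made for illustration.

```python
from nltk.stem import SnowballStemmer

def make_stemmer(language: str) -> SnowballStemmer:
    """Return a Snowball stemmer for `language`, falling back to English if unsupported."""
    if language not in SnowballStemmer.languages:
        language = "english"
    return SnowballStemmer(language)

print(SnowballStemmer.languages)  # tuple of supported language names
print(make_stemmer("english").stem("running"))
print(make_stemmer("klingon").stem("running"))  # unsupported, so English rules apply
```

Whether a silent fallback is appropriate depends on the application; raising an error for an unknown language may be the safer choice in a real pipeline.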
