## Stemming
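Stemming is the process of reducing a word to its stem by stripping affixes (suffixes and prefixes), so that related forms such as eating and eats map to a common base such as eat. The stem does not have to be a dictionary word; mapping a word to its dictionary form (its lemma) is the related task of lemmatization. Stemming is a common preprocessing step in natural language understanding (NLU) and natural language processing (NLP) because it reduces the number of distinct word forms a model has to handle.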
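To make the idea concrete, here is a deliberately naive suffix-stripping stemmer in plain Python. The suffix list and the `naive_stem` name are hypothetical choices for illustration only; the stemmers covered below use much richer, condition-guarded rules.

```python
# Hypothetical suffix list for illustration only; real stemmers use many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str, min_stem: int = 3) -> str:
    """Strip the longest listed suffix, as long as at least min_stem letters remain."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["eating", "eats", "played", "playing", "studies"]:
    print(w, "->", naive_stem(w))
```

Here studies comes out as studi, which shows that a stem need not be a real word.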

In [None]:
# Three types of stemmer are covered below:
#   1. PorterStemmer
#   2. RegexpStemmer
#   3. SnowballStemmer

In [None]:
## Classification problem: is a product comment a positive or a negative review?
## Reviews contain many forms of the same word, e.g. eating, eat, eaten ---> eat
## and going, gone, goes ---> go; ideally each group should count as one feature
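A small sketch of why this matters for the review classifier: in a bag-of-words representation every surface form becomes a separate feature. The review texts and the `NORMALISE` lookup table below are hypothetical; the table stands in for a stemmer so the effect is easy to see (a real stemmer would not map the irregular form gone to go).

```python
from collections import Counter

# Hypothetical hand-written table standing in for a stemmer.
NORMALISE = {"eating": "eat", "eaten": "eat", "eats": "eat",
             "going": "go", "gone": "go", "goes": "go"}

# Hypothetical toy reviews.
reviews = [
    "loved eating here",
    "we eat here and will go again",
    "food was eaten cold so we are going elsewhere",
]


def bag_of_words(texts, normalise=False):
    """Count lowercase, whitespace-separated tokens across all texts."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        if normalise:
            tokens = [NORMALISE.get(t, t) for t in tokens]
        counts.update(tokens)
    return counts


raw = bag_of_words(reviews)
normalised = bag_of_words(reviews, normalise=True)
print(len(raw), len(normalised))  # 17 14
print(normalised["eat"])          # 3
```

Collapsing eating, eat and eaten into one feature gives the classifier three observations of the same signal instead of three unrelated, rarer features.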



In [5]:
import nltk

## 1. PorterStemmer
Developed by: Martin Porter (1980)

Approach: Uses a set of predefined rules to remove suffixes from words.

Advantages:

Simple and efficient.

Works well for general text processing.

Disadvantages:

Can be too aggressive, sometimes removing too much of the word.

Does not always produce meaningful stems (for example, history becomes histori).
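Porter's rules run in ordered steps, and many rules only fire when the remaining stem satisfies a condition, such as its measure m (the number of vowel-consonant sequences). Below is a minimal, partial sketch in plain Python of only steps 1a and 1b as described in Porter's 1980 paper. It is not NLTK's implementation: steps 1c to 5 are omitted, all function names are hypothetical, and by default NLTK's PorterStemmer also adds a few rules of its own, so final stems can differ.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: not a, e, i, o, u, and not a 'y' that follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m in Porter's [C](VC)^m[V] form: the number of vowel -> consonant transitions."""
    return sum(1 for i in range(1, len(stem))
               if not is_consonant(stem, i - 1) and is_consonant(stem, i))


def contains_vowel(stem: str) -> bool:
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def ends_cvc(stem: str) -> bool:
    """Porter's *o condition: consonant-vowel-consonant, last letter not w, x or y."""
    n = len(stem)
    return (n >= 3 and is_consonant(stem, n - 3) and not is_consonant(stem, n - 2)
            and is_consonant(stem, n - 1) and stem[-1] not in "wxy")


def step1a(word: str) -> str:
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]          # sses -> ss, ies -> i
    if word.endswith("ss"):
        return word               # ss -> ss
    if word.endswith("s"):
        return word[:-1]          # s -> (removed)
    return word


def _tidy(stem: str) -> str:
    """Clean-up applied after step 1b has removed -ed or -ing."""
    if stem.endswith(("at", "bl", "iz")):
        return stem + "e"         # conflat -> conflate
    if (len(stem) >= 2 and stem[-1] == stem[-2]
            and is_consonant(stem, len(stem) - 1) and stem[-1] not in "lsz"):
        return stem[:-1]          # hopp -> hop
    if measure(stem) == 1 and ends_cvc(stem):
        return stem + "e"         # fil -> file
    return stem


def step1b(word: str) -> str:
    if word.endswith("eed"):
        return word[:-1] if measure(word[:-3]) > 0 else word  # agreed -> agree
    for suffix in ("ed", "ing"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            return _tidy(stem) if contains_vowel(stem) else word
    return word


def porter_step1ab(word: str) -> str:
    return step1b(step1a(word))


for w in ["caresses", "ponies", "agreed", "feed", "hopping", "filing", "falling", "sing"]:
    print(w, "->", porter_step1ab(w))
```

Intermediate forms can look odd: eating becomes eate after these two steps, and it is a later step (5a) that removes the final e, which is why PorterStemmer prints eat.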

In [6]:
from nltk.stem import PorterStemmer
stemmer=PorterStemmer()

In [7]:
words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

for word in words: 
    print(word +"------------>"+stemmer.stem(word))

eating------------>eat
eats------------>eat
eaten------------>eaten
writing------------>write
writes------------>write
programming------------>program
programs------------>program
history------------>histori
finally------------>final
finalized------------>final


In [8]:
stemmer.stem('congratulations')

'congratul'

In [10]:
stemmer.stem('pankaj')

'pankaj'

## 2. RegexpStemmer
Approach: Uses regular expressions to define how words should be stemmed.

Advantages:

Highly customizable.

Allows fine-tuned control over stemming behavior.

Disadvantages:

Requires manual rule definition.

Less effective for general cases without careful configuration.

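NLTK's RegexpStemmer class makes it easy to build a regular-expression stemmer. It takes a single regular expression and deletes every substring of the word that matches it; anchoring an alternative with $ (as below) restricts it to suffixes, and anchoring with ^ restricts it to prefixes. The min argument sets a minimum word length: shorter words are returned unchanged. Let us see an example: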

In [11]:
from nltk.stem import RegexpStemmer
stemmer=RegexpStemmer('ing$|s$|e$|able$', min=4)

In [12]:
stemmer.stem('eating')

'eat'

In [13]:
stemmer.stem('stable')

'st'

In [14]:
stemmer.stem('snakes')

'snake'
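The behaviour of the cells above can be approximated with the standard `re` module alone. This is a minimal sketch, assuming RegexpStemmer simply returns short words unchanged and otherwise deletes every match; `regexp_stem` is a hypothetical name, not part of NLTK.

```python
import re


def regexp_stem(word: str, pattern: str, min_len: int = 0) -> str:
    """Leave words shorter than min_len alone; otherwise delete every match of pattern."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)


anchored = "ing$|s$|e$|able$"
print(regexp_stem("eating", anchored, 4))   # eat
print(regexp_stem("stable", anchored, 4))   # st (over-stemming)
print(regexp_stem("snakes", anchored, 4))   # snake
print(regexp_stem("is", anchored, 4))       # is (shorter than min_len)
print(regexp_stem("kingdoms", "ing|s", 4))  # kdom (unanchored: cuts inside the word)
```

The last line shows why the $ anchors matter: without them, ing is also removed from the middle of kingdoms.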

## 3. Snowball Stemmer (Also called Porter2 Stemmer)
The Snowball stemmer's English algorithm is also known as the Porter2 stemmer, because it is a revised version of the Porter stemmer that fixes some of its known issues.
Developed by: Martin Porter (an improvement over PorterStemmer)

Approach: Uses a refined rule set, written in Martin Porter's Snowball string-processing language, and provides stemmers for multiple languages.

Advantages:

More accurate and less aggressive than PorterStemmer.

Can stem words in different languages (English, Spanish, French, etc.).

Disadvantages:

Slightly slower than PorterStemmer.

Can still remove too much from some words.
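One of the fixes is already visible in the first suffix step. Porter's original step 1a removes a final s unconditionally (so gas becomes ga) and turns every -ies into -i (so ties becomes ti). Below is a minimal plain-Python sketch of step 1a as published for Porter2. It is not NLTK's SnowballStemmer code: the function name is hypothetical, and it skips the prelude that marks some y's as consonants as well as the algorithm's list of exceptional words.

```python
VOWELS = "aeiouy"  # Porter2 counts y as a vowel (simplified: no special handling of initial y)


def porter2_step1a(word: str) -> str:
    """Step 1a of the Porter2 (Snowball English) rules, longest suffix first."""
    if word.endswith("sses"):
        return word[:-2]                                   # sses -> ss
    if word.endswith(("ied", "ies")):
        # -> i if preceded by more than one letter, otherwise -> ie
        return word[:-2] if len(word) > 4 else word[:-1]
    if word.endswith(("us", "ss")):
        return word                                        # left unchanged
    if word.endswith("s"):
        # delete s only if a vowel occurs before the letter preceding it
        return word[:-1] if any(c in VOWELS for c in word[:-2]) else word
    return word


for w in ["caresses", "cries", "ties", "gaps", "gas", "this", "kiwis", "bus"]:
    print(w, "->", porter2_step1a(w))
```

Short words such as gas, this and ties keep a sensible form here, which is the kind of over-stemming the revised rules were designed to avoid.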

In [15]:
from nltk.stem import SnowballStemmer
stemmer=SnowballStemmer('english')

In [16]:
words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

for word in words: 
    print(word +"------------>"+stemmer.stem(word))

eating------------>eat
eats------------>eat
eaten------------>eaten
writing------------>write
writes------------>write
programming------------>program
programs------------>program
history------------>histori
finally------------>final
finalized------------>final
