### Stemming

Stemming is a Natural Language Processing (NLP) technique that reduces words to a root or base form, called the stem. It works by stripping suffixes (such as "ing", "ly", "ed", "s") so that related words map to a common stem and can be treated as the same token in downstream analysis.

For example, the words "running", "runs", and "run" all stem to "run". Not every related form collapses, though: the Porter stemmer leaves "runner" unchanged. Stemming is helpful in text processing tasks where variations of a word should be treated as the same for analysis.

#### Example:
Applying the Porter stemmer to a few words gives:

- **"running"** ➔ **"run"**
- **"happily"** ➔ **"happili"**
- **"cars"** ➔ **"car"**

> Note: Stemming is rule-based and does not consult a dictionary, so some stems (such as "happili") are not valid words.
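
To make the idea of suffix stripping concrete, here is a deliberately naive sketch. The function `naive_stem` and its suffix list are hypothetical and exist only for illustration; real stemmers such as Porter apply ordered rule sets with conditions on what remains of the word.

```python
# Hypothetical toy stemmer: strips the first matching suffix from a fixed list.
SUFFIXES = ("ing", "ly", "ed", "s")

def naive_stem(word, min_stem_len=3):
    """Strip one known suffix, keeping at least min_stem_len characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word is left to be a plausible stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word

for w in ["cars", "jumped", "quickly", "eating", "running", "is"]:
    print(w, "--->", naive_stem(w))
```

With these rules "running" becomes "runn" rather than "run", because nothing handles the doubled consonant. Gaps like this are what the more elaborate rule sets in the stemmers below are meant to cover.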


In [1]:
words = ["eating","eats","eats","eaten","writing","writes","programming","programs","history","finally","finalize","Congratualations"]

### PorterStemmer

In [2]:
from nltk.stem import PorterStemmer


In [3]:
stemming = PorterStemmer()

In [4]:
for word in words:
    print(word+"--->"+stemming.stem(word))

eating--->eat
eats--->eat
eats--->eat
eaten--->eaten
writing--->write
writes--->write
programming--->program
programs--->program
history--->histori
finally--->final
finalize--->final
Congratualations--->congratual


In [5]:
stemming.stem("Sitting")

'sit'

### RegexpStemmer
The `RegexpStemmer` in NLTK is a customizable stemmer that removes specific patterns from words based on regular expressions. It’s useful for applying simple stemming rules without the complexity of other stemming algorithms. It takes a single regular expression and removes every substring that matches it; anchors such as `^` and `$` restrict the match to a prefix or suffix. Words shorter than the `min` argument are returned unchanged.
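
The behaviour described above can be approximated with the standard `re` module. This is a sketch rather than NLTK's source: `regexp_stem` and its `min_len` parameter are hypothetical names chosen for illustration.

```python
import re

def regexp_stem(word, pattern, min_len=0):
    """Hypothetical stand-in: delete every match of `pattern`, skipping short words."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)

anchored = r"ing$|s$|e$|able$"
print(regexp_stem("eating", anchored, min_len=4))     # eat
print(regexp_stem("ingeating", anchored, min_len=4))  # ingeat (only the suffix matches)
print(regexp_stem("ingeating", r"ing", min_len=4))    # eat (unanchored: every "ing" goes)
print(regexp_stem("is", anchored, min_len=4))         # is (shorter than min_len)
```

The anchored and unanchored patterns differ only in the `$`, which is why the choice of anchor matters more than the choice of suffix list.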

In [6]:
from nltk.stem import RegexpStemmer

In [9]:
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [10]:
reg_stemmer.stem('eating')

'eat'

In [11]:
reg_stemmer.stem('ingeating')

'ingeat'

In [None]:
# Without the `$` anchor, every occurrence of "ing" is removed, not just the suffix.
reg_stemmer1 = RegexpStemmer('ing', min=4)
reg_stemmer1.stem('ingeating')

'eat'

### Snowball Stemmer
The Snowball stemmer in NLTK supports multiple languages and, for English, usually produces cleaner stems than the original Porter stemmer. The English Snowball stemmer is also known as the Porter2 stemmer because it is a revision of the original Porter algorithm.
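
As a quick illustration of the multi-language support, the sketch below lists the languages NLTK ships Snowball stemmers for and stems a few English words. It assumes NLTK is installed; the word list is only an example.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by the SnowballStemmer constructor.
print(SnowballStemmer.languages)

english = SnowballStemmer("english")
for word in ["fairly", "sportingly", "goes"]:
    print(word, "--->", english.stem(word))
```

Passing a different name from `SnowballStemmer.languages` selects the stemmer for that language.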

In [13]:
from nltk.stem import SnowballStemmer

In [14]:
snowballstemmer = SnowballStemmer('english')

In [15]:
# PorterStemmer
stemming.stem('fairly'), stemming.stem('sportingly')

('fairli', 'sportingli')

In [16]:
# SnowballStemmer
snowballstemmer.stem("fairly"), snowballstemmer.stem('sportingly')

('fair', 'sport')

In [19]:
snowballstemmer.stem("goes")  # limitation: the stem 'goe' is not a valid word

'goe'

In [20]:
stemming.stem('goes')  # PorterStemmer has the same limitation

'goe'