# Stemming

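Stemming is the process of reducing a word to its stem: the base form that remains after affixes (suffixes and prefixes) are stripped. Unlike a lemma, which is the canonical dictionary form of a word, a stem does not have to be a valid word. Stemming is a common preprocessing step in natural language processing (NLP) and natural language understanding (NLU), where it lets different inflections of a word be treated as the same token.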

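Ideally, related forms collapse to a single stem: [eating, eat, eaten] ---> eat, [going, gone, goes] ---> go. Rule-based stemmers only approximate this mapping, as the outputs below show.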
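The idea of suffix stripping can be sketched in a few lines of plain Python. This is a deliberately naive illustration, not the Porter algorithm or anything NLTK provides; the function name naive_stem, the suffix list, and the minimum stem length of 2 are all assumptions made up for this example.

```python
def naive_stem(word, suffixes=("ing", "es", "ed", "s"), min_stem_len=2):
    """Strip the first matching suffix, keeping at least min_stem_len characters.

    Hypothetical helper for illustration only; real stemmers use far richer rules.
    """
    for suffix in suffixes:
        stem_len = len(word) - len(suffix)
        if word.endswith(suffix) and stem_len >= min_stem_len:
            return word[:stem_len]
    return word  # no rule applied, return the word unchanged


for w in ["eating", "eats", "eaten", "going", "goes", "gone"]:
    print(w, "--->", naive_stem(w))
```

Irregular forms such as eaten and gone pass through unchanged, which is one reason real stemmers rely on larger, ordered rule sets and still fall short of the ideal mapping.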

In [5]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

## PorterStemmer

In [8]:
from nltk.stem import PorterStemmer

In [10]:
stemming = PorterStemmer()

In [12]:
for word in words:
    print(word+"---->"+stemming.stem(word))

eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final


In [14]:
stemming.stem('congratulations')

'congratul'

In [16]:
stemming.stem('sitting')

'sit'
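One way to see why these stems are useful, even when they are not real words, is to group the original words by the stem they share. The sketch below assumes NLTK is installed and reuses the word list from above; the variable names porter and groups are chosen for this example only.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

porter = PorterStemmer()
words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

# Map each stem to the original words that reduce to it.
groups = defaultdict(list)
for word in words:
    groups[porter.stem(word)].append(word)

for stem, members in groups.items():
    print(f"{stem}: {members}")
```

Words that land in the same group would be counted as one term by a downstream model. Note that eaten ends up in a group of its own, because the Porter rules do not handle this irregular form.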

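## Regexp Stemmer Class

NLTK provides the RegexpStemmer class for building a stemmer from a regular expression. It takes a single regular expression and removes every part of the word that matches it, so whether prefixes or suffixes are stripped depends on how the pattern is anchored. In the example below each alternative is anchored to the end of the word, so only suffixes are removed. The min argument sets a minimum length: words shorter than min are returned unchanged. Let us see an example.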

In [19]:
from nltk.stem import RegexpStemmer

In [21]:
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [23]:
reg_stemmer.stem('eating')

'eat'

In [25]:
reg_stemmer.stem('cars')

'car'

In [27]:
reg_stemmer.stem('ingeating')

'ingeat'

In [29]:
reg_stemmer.stem('inevitable')

'inevit'
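The behaviour seen above can be approximated with the standard re module alone. The function regexp_stem below is a hypothetical stand-in written for this explanation, not NLTK's source code; it assumes the stemmer simply deletes every match of the pattern and skips words shorter than the minimum length.

```python
import re


def regexp_stem(word, pattern, min_length=0):
    """Rough sketch of a regexp stemmer: delete every match of pattern.

    Words shorter than min_length are returned unchanged.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


suffix_pattern = r"ing$|s$|e$|able$"

for w in ["eating", "cars", "ingeating", "inevitable", "gas"]:
    print(w, "--->", regexp_stem(w, suffix_pattern, min_length=4))

# An unanchored pattern removes matches anywhere in the word,
# which is why the $ anchors matter for suffix stripping.
print(regexp_stem("ingeating", r"ing"))
```

With the anchored pattern, ingeating keeps its leading ing because only the match at the end of the word is removed, while gas is left alone because it is shorter than the minimum length. The unanchored pattern in the last line strips both occurrences.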

## Snowball Stemmer

The Snowball stemmer for English, also known as the Porter2 stemming algorithm, is an improved version of the Porter stemmer that fixes several of its issues. Unlike PorterStemmer, NLTK's SnowballStemmer takes the language name as an argument.

In [33]:
from nltk.stem import SnowballStemmer

In [35]:
sb_stemmer = SnowballStemmer('english')

In [37]:
for word in words:
    print(word + '--->' + sb_stemmer.stem(word))

eating--->eat
eats--->eat
eaten--->eaten
writing--->write
writes--->write
programming--->program
programs--->program
history--->histori
finally--->final
finalized--->final


In [39]:
# stemming is the PorterStemmer instance created earlier
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [43]:
# Snowball strips the -ly ending that Porter turns into -li
sb_stemmer.stem("fairly"), sb_stemmer.stem("sportingly")

('fair', 'sport')

In [45]:
sb_stemmer.stem('going')

'go'

In [47]:
sb_stemmer.stem('goes')

'goe'
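To find where the two stemmers actually disagree, the comparisons above can be wrapped in a small helper. This is a minimal sketch assuming NLTK is installed; the function name compare and the sample word list are made up for this example.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")


def compare(words):
    """Return (word, porter_stem, snowball_stem) for words where the stemmers differ."""
    rows = []
    for w in words:
        p, s = porter.stem(w), snowball.stem(w)
        if p != s:
            rows.append((w, p, s))
    return rows


differences = compare(["fairly", "sportingly", "eating", "history"])
for word, p, s in differences:
    print(f"{word}: porter={p}, snowball={s}")
```

Only the words where the outputs differ are reported, which makes it easier to judge whether switching stemmers matters for a given vocabulary. Both stemmers remain rule-based, so irregular forms still slip through, as the goe result for goes shows; when a dictionary form is required, lemmatization is the usual alternative.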