# STEMMING

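Stemming is the process of reducing a word to its word stem, the base form to which affixes such as prefixes and suffixes attach. A stem is not necessarily a dictionary word; reducing a word to its dictionary form (its lemma) is the related task of lemmatization. Stemming is a common preprocessing step in natural language processing (NLP) and natural language understanding (NLU).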

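Examples (the ideal result; rule-based stemmers do not always reach it for irregular forms such as "gone" or "eaten"):
[going, gone, goes] --> go (word stem)
[eat, eaten, eating] --> eat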
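
To make the idea of suffix stripping concrete, here is a minimal sketch of a toy stemmer. It is not the Porter algorithm or any NLTK class; the function name `naive_stem`, the suffix list, and the `min_stem_len` parameter are all assumptions chosen for illustration.

```python
def naive_stem(word, suffixes=("ing", "es", "s"), min_stem_len=2):
    """Strip the first matching suffix, keeping at least min_stem_len characters.

    A deliberately naive illustration: real stemmers apply many ordered
    rules with conditions on what remains of the word.
    """
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix matched, return the word unchanged


for w in ["going", "goes", "eating", "eats", "gone"]:
    print(w, "-->", naive_stem(w))
# going --> go
# goes --> go
# eating --> eat
# eats --> eat
# gone --> gone
```

Note that the irregular form "gone" is left untouched: plain suffix stripping has no rule that maps it to "go".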

#### Finding the stems of the following words

In [7]:
words = ["eating", "eats", "eaten", "writing", "writes", "programming", "programs", "history", "finally", "finalized"]

### PorterStemmer (Technique)

In [10]:
from nltk.stem import PorterStemmer

In [12]:
# create a PorterStemmer instance
stemming = PorterStemmer()

In [14]:
# apply the stemmer to each word and print the result
for word in words:
    print(word + " -----> " + stemming.stem(word))

eating -----> eat
eats -----> eat
eaten -----> eaten
writing -----> write
writes -----> write
programming -----> program
programs -----> program
history -----> histori
finally -----> final
finalized -----> final


##### Disadvantage of stemming: the stem is not always a valid word, so the form and meaning of the original word may be lost

In [17]:
stemming.stem('congratulations')

'congratul'

In [21]:
stemming.stem('sitting')

'sit'

## RegexpStemmer Class

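NLTK provides a RegexpStemmer class for building simple regular-expression stemmers. It takes a single regular expression and removes every part of the word that matches it, so whether it strips a prefix, a suffix, or text in the middle of the word depends on how the pattern is anchored (for example, $ restricts a match to the end of the word). Words shorter than min characters are returned unchanged.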

In [25]:
from nltk.stem import RegexpStemmer

In [27]:
# strip a trailing 'ing', 's', 'e' or 'able'; words shorter than 4 characters are left as-is
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [29]:
reg_stemmer.stem('eating')

'eat'

In [31]:
reg_stemmer.stem('ingeating')

'ingeat'

In [33]:
# unanchored pattern: removes 'ing' wherever it occurs in the word
reg_stemmer = RegexpStemmer('ing', min=4)

In [35]:
reg_stemmer.stem('ingeating')

'eat'
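
The behaviour shown above can be reproduced with the standard `re` module. The following is a minimal sketch of the idea (delete every match, skip short words), not NLTK's actual source; the class name `SimpleRegexpStemmer` and the parameter name `min_len` are hypothetical.

```python
import re


class SimpleRegexpStemmer:
    """Toy stand-in for a regex stemmer (hypothetical, for illustration).

    Deletes every match of `pattern` from the word, but returns words
    shorter than `min_len` characters unchanged.
    """

    def __init__(self, pattern, min_len=0):
        self._regexp = re.compile(pattern)
        self._min_len = min_len

    def stem(self, word):
        if len(word) < self._min_len:
            return word
        return self._regexp.sub("", word)


suffix_only = SimpleRegexpStemmer(r"ing$|s$|e$|able$", min_len=4)
anywhere = SimpleRegexpStemmer(r"ing", min_len=4)

print(suffix_only.stem("eating"))     # eat
print(suffix_only.stem("ingeating"))  # ingeat  ($ only matches at the end)
print(anywhere.stem("ingeating"))     # eat     (both occurrences removed)
print(suffix_only.stem("is"))         # is      (shorter than min_len)
```

The anchored pattern leaves the leading "ing" alone, while the unanchored one removes it as well, which is why the choice of anchors matters more than the stemmer class itself.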

### Snowball Stemmer

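The Snowball stemmer for English (also known as Porter2) is a revised version of the Porter algorithm. It often produces cleaner stems than PorterStemmer, as the comparison at the end of this section shows, and NLTK's SnowballStemmer also supports several languages besides English.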

In [41]:
from nltk.stem import SnowballStemmer

In [43]:
snowballstemmer = SnowballStemmer('english')

In [45]:
for word in words:
    print(word + " -----> " + snowballstemmer.stem(word))

eating -----> eat
eats -----> eat
eaten -----> eaten
writing -----> write
writes -----> write
programming -----> program
programs -----> program
history -----> histori
finally -----> final
finalized -----> final


### Output comparison of PorterStemmer and Snowball Stemmer

In [51]:
# PorterStemmer
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [53]:
# SnowballStemmer
snowballstemmer.stem("fairly"), snowballstemmer.stem("sportingly")

('fair', 'sport')
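
To spot such differences across a longer word list, the two stemmers can be run side by side. This is a minimal sketch that assumes NLTK is installed; the helper name `stem_differences` and the `sample` list are made up for this example.

```python
from nltk.stem import PorterStemmer, SnowballStemmer


def stem_differences(words):
    """Return (word, porter_stem, snowball_stem) for each word on which
    the two stemmers disagree. Hypothetical helper for illustration."""
    porter = PorterStemmer()
    snowball = SnowballStemmer("english")
    rows = []
    for word in words:
        p, s = porter.stem(word), snowball.stem(word)
        if p != s:
            rows.append((word, p, s))
    return rows


sample = ["eating", "history", "fairly", "sportingly"]
differences = stem_differences(sample)
for word, p, s in differences:
    print(f"{word}: porter={p!r} snowball={s!r}")
```

Only the words where the outputs differ are reported, so words such as "eating" and "history", which both stemmers treat identically in the runs above, are filtered out.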