![Stemming](https://i1.wp.com/s3-eu-west-1.amazonaws.com/leadersandco/wp-content/uploads/2017/05/31224050/Diary-writing-is-an-old-human-art.jpg?fit=800%2C600&ssl=1)

Source: https://www.thisdaylive.com/index.php/2017/05/31/death-of-the-diary/

# Stemming

After tokenizing the text, we may want the root form (stem) of each word rather than its original form for downstream processing or modelling, such as topic classification. The stem is not necessarily a word itself. For example, "reduc" is the stem of "reduce", and "suffici" is the stem of "sufficient".

NLTK provides several stemming algorithms. The Porter Stemmer and the Snowball Stemmer (whose English variant is also known as Porter2) are used for demonstration here because they are among the most popular.
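As a rough sketch of how NLTK's stemmers can be swapped for one another, the snippet below runs three of them on the two example words from above. It assumes `nltk` is installed. The `stemmers` dictionary and its labels are names chosen here for illustration, and the Lancaster Stemmer appears only as one more example of what NLTK provides.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

# Illustrative selection; NLTK ships more stemmers than these three.
stemmers = {
    'porter': PorterStemmer(),
    'snowball': SnowballStemmer('english'),
    'lancaster': LancasterStemmer(),
}

words = ['reduce', 'sufficient']

# Every stemmer exposes the same stem(word) method.
stems = {name: [s.stem(w) for w in words] for name, s in stemmers.items()}

for name, result in stems.items():
    print('%-10s %s' % (name, result))
```

Because the interface is shared, the rest of this walkthrough would work with any of these objects in place of the stemmer being demonstrated.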

In [1]:
# Copied from https://en.wikipedia.org/wiki/Stemming

article = 'In linguistic morphology and information retrieval, stemming is the process of \
reducing inflected (or sometimes derived) words to their word stem, base or root \
form—generally a written word form. The stem need not be identical to the morphological \
root of the word; it is usually sufficient that related words map to the same stem, even \
if this stem is not in itself a valid root.'

### Porter Stemmer

In [2]:
import nltk 
print('NLTK Version: %s' % (nltk.__version__))

porter_stemmer = nltk.stem.PorterStemmer()

NLTK Version: 3.2.5


In [3]:
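# word_tokenize relies on NLTK's Punkt tokenizer data, which must be downloaded beforehand
tokens = nltk.word_tokenize(article)

print('Original Article: %s' % (article))
print()

# Stem each token and print only the tokens that the stemmer changed
for token in tokens:
    stemmed_token = porter_stemmer.stem(token)

    if token != stemmed_token:
        print('Original : %s, New: %s' % (token, stemmed_token))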

Original Article: In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

Original : linguistic, New: linguist
Original : morphology, New: morpholog
Original : information, New: inform
Original : retrieval, New: retriev
Original : stemming, New: stem
Original : reducing, New: reduc
Original : inflected, New: inflect
Original : sometimes, New: sometim
Original : derived, New: deriv
Original : words, New: word
Original : form—generally, New: form—gener
Original : The, New: the
Original : identical, New: ident
Original : morphological, New: morpholog
Original : usually, New: usual
Original : sufficient, New: suffici
Original : related, New: relat
Original : words, New: word
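The output above contains one token that joins `form` and `generally` with an em dash (U+2014): the tokenizer did not separate the two words, so the stemmer received them as a single token. One possible workaround, sketched here under the assumption that such dashes should simply act as separators, is to split tokens on the dash before stemming. The helper name `split_on_dashes` is hypothetical and not part of NLTK.

```python
import re

# Split on the en dash (U+2013) and em dash (U+2014); this choice is an assumption.
DASH_PATTERN = re.compile('[\u2013\u2014]')


def split_on_dashes(tokens):
    """Split tokens on en/em dashes and drop empty pieces."""
    result = []
    for token in tokens:
        result.extend(piece for piece in DASH_PATTERN.split(token) if piece)
    return result


print(split_on_dashes(['root', 'form\u2014generally', 'a']))
```

The returned list could then be passed to `porter_stemmer.stem` token by token in place of `tokens`.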

### Snowball Stemmer

In [5]:
import nltk 
print('NLTK Version: %s' % (nltk.__version__))

snowball_stemmer = nltk.stem.SnowballStemmer('english')

NLTK Version: 3.2.5


In [6]:
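# word_tokenize relies on NLTK's Punkt tokenizer data, which must be downloaded beforehand
tokens = nltk.word_tokenize(article)

print('Original Article: %s' % (article))
print()

# Stem each token and print only the tokens that the stemmer changed
for token in tokens:
    stemmed_token = snowball_stemmer.stem(token)

    if token != stemmed_token:
        print('Original : %s, New: %s' % (token, stemmed_token))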

Original Article: In linguistic morphology and information retrieval, stemming is the process of reducing inflected (or sometimes derived) words to their word stem, base or root form—generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.

Original : In, New: in
Original : linguistic, New: linguist
Original : morphology, New: morpholog
Original : information, New: inform
Original : retrieval, New: retriev
Original : stemming, New: stem
Original : reducing, New: reduc
Original : inflected, New: inflect
Original : sometimes, New: sometim
Original : derived, New: deriv
Original : words, New: word
Original : form—generally, New: form—gener
Original : The, New: the
Original : identical, New: ident
Original : morphological, New: morpholog
Original : usually, New: usual
Original : sufficient, New: suffici
Original : related, New: relat

Except for "In", which the Snowball Stemmer lowercases to "in", the results of the Snowball Stemmer are the same as those of the Porter Stemmer for this article.
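To check this kind of observation on other text, a small comparison helper can list the words on which two stemmers disagree. This is only a sketch: `diff_stems` is a hypothetical helper name, the sample word list is made up for illustration, and the exact differences may vary with the NLTK version.

```python
from nltk.stem import PorterStemmer, SnowballStemmer


def diff_stems(words, first, second):
    """Return (word, first_stem, second_stem) for words the two stemmers stem differently."""
    differences = []
    for word in words:
        a, b = first.stem(word), second.stem(word)
        if a != b:
            differences.append((word, a, b))
    return differences


porter = PorterStemmer()
snowball = SnowballStemmer('english')

# Illustrative sample; replace it with the tokens of any text.
sample = ['In', 'reducing', 'sufficient', 'generously']
for word, a, b in diff_stems(sample, porter, snowball):
    print('Word: %s, Porter: %s, Snowball: %s' % (word, a, b))
```

Passing `tokens` from the cells above instead of `sample` would reproduce the comparison for the article.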

# Conclusion


The Snowball Stemmer supports not only English but also other languages, including German, French and Spanish. For details, see the Snowball website.

Snowball Stemmer: http://snowballstem.org/algorithms/
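As a brief illustration of the multilingual support, the sketch below lists the language names that NLTK's `SnowballStemmer` accepts and stems one German word. It assumes `nltk` is installed; the example word and the variable names are chosen here for illustration, and the set of available languages depends on the NLTK version.

```python
from nltk.stem import SnowballStemmer

# Language names accepted by the SnowballStemmer constructor.
print(SnowballStemmer.languages)

# Create a stemmer for German and stem a single word.
german_stemmer = SnowballStemmer('german')
german_stem = german_stemmer.stem('Autobahnen')
print(german_stem)
```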

Besides the Porter Stemmer and the Snowball Stemmer, readers may also want to look at other stemming tools, such as Hunspell.

Hunspell Stemmer: https://github.com/hunspell/hunspell