# Stemming
NLTK's SnowballStemmer reduces words to their stems, so that inflected variants of a word are consolidated into a single form instead of being treated as distinct entities.

#### Downloads Required

In [1]:
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to /Users/sahithimv/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In this example, we use the SnowballStemmer from NLTK to stem a list of sample words and print each stem alongside its original form. Note that a stem is not always a dictionary word: "easily" becomes "easili", and an irregular form such as "better" is left unchanged.

In [2]:
from nltk.stem import SnowballStemmer

# Sample words to stem
words_to_stem = ["running", "easily", "jumps", "quickly", "happily", "better"]
# Create an English stemmer and apply it to each word
stemmer = SnowballStemmer("english")
stemmed_words = [stemmer.stem(word) for word in words_to_stem]
# Print each original word next to its stem
for original, stemmed in zip(words_to_stem, stemmed_words):
    print(f"Original: {original} | Stemmed: {stemmed}")


Original: running | Stemmed: run
Original: easily | Stemmed: easili
Original: jumps | Stemmed: jump
Original: quickly | Stemmed: quick
Original: happily | Stemmed: happili
Original: better | Stemmed: better
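
To illustrate how stemming consolidates variants, the following sketch groups a list of words by their stem. The helper name `group_by_stem` and the sample word list are assumptions made for illustration; they are not part of the original example.

```python
from collections import defaultdict

from nltk.stem import SnowballStemmer


def group_by_stem(words, language="english"):
    """Group words that share the same Snowball stem (illustrative helper)."""
    stemmer = SnowballStemmer(language)
    groups = defaultdict(list)
    for word in words:
        # Words with the same stem end up in the same list
        groups[stemmer.stem(word)].append(word)
    return dict(groups)


# Hypothetical sample: variants of "run" and "jump"
variants = ["run", "running", "runs", "jump", "jumps", "jumping"]
groups = group_by_stem(variants)
for stem, members in groups.items():
    print(f"{stem}: {members}")
```

Each key in the resulting dictionary is a stem, and each value lists the original words that reduce to it, which is one simple way to count or index variants as a single term.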


To stem words in another language, pass that language's name to the SnowballStemmer constructor.

In [3]:
from nltk.stem import SnowballStemmer

# Sample Spanish words to stem
spanish_words_to_stem = ["corriendo", "rápidamente", "saltos", "felizmente", "mejor"]
# Create a Spanish stemmer and apply it to each word
spanish_stemmer = SnowballStemmer("spanish")
spanish_stemmed_words = [spanish_stemmer.stem(word) for word in spanish_words_to_stem]
# Print each original word next to its stem
for original, stemmed in zip(spanish_words_to_stem, spanish_stemmed_words):
    print(f"Original: {original} | Stemmed: {stemmed}")


Original: corriendo | Stemmed: corr
Original: rápidamente | Stemmed: rapid
Original: saltos | Stemmed: salt
Original: felizmente | Stemmed: feliz
Original: mejor | Stemmed: mejor
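
Before choosing a language, it can help to check which names the constructor accepts. The sketch below reads the `languages` attribute of `SnowballStemmer`; the exact contents of that tuple depend on the installed NLTK version, so no output is shown. The variable names `supported` and `wanted` are illustrative.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by the SnowballStemmer constructor
supported = SnowballStemmer.languages
print(supported)

# Check a language name before building a stemmer for it
wanted = "spanish"
if wanted in supported:
    stemmer = SnowballStemmer(wanted)
    print(stemmer.stem("corriendo"))
else:
    print(f"No Snowball stemmer available for {wanted!r}")
```

Passing a name that is not in this tuple to the constructor raises a `ValueError`, so a membership check like this lets the calling code handle unsupported languages explicitly.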
