#### STEMMING ####
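Stemming is the process of reducing a word to its stem, a base form obtained by stripping suffixes according to fixed rules. For example, the words "running" and "runs" can both be reduced to "run". Irregular forms such as "ran" are usually left unchanged, because a stemmer applies suffix rules and does not look words up in a dictionary. Stemming is useful in text analysis because it groups related word forms together, which reduces the size of the vocabulary.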

In [1]:
words = ["eating", "eats", "ate", "eaten", "eat", "eater", "eaters", "eating", "writing", "writes", "wrote", "written", "write", "writer", "writers", "programming", "programs", "programmed", "programmer", "programmers", "programming"]

In [2]:
from nltk.stem import PorterStemmer

In [3]:
stemming = PorterStemmer()

The disadvantage of stemming is that it can produce stems that are not real words. For example, a purely rule-based stemmer may reduce "running" to "runn" instead of "run", and the Porter stemmer reduces "congratulations" to "congratul". Irregular forms such as "ate" or "wrote" are also not mapped to "eat" or "write". This can make the stems harder to interpret and can leave related words in separate groups.

In [4]:
for word in words:
    print(word + " ---> "+ stemming.stem(word))

eating ---> eat
eats ---> eat
ate ---> ate
eaten ---> eaten
eat ---> eat
eater ---> eater
eaters ---> eater
eating ---> eat
writing ---> write
writes ---> write
wrote ---> wrote
written ---> written
write ---> write
writer ---> writer
writers ---> writer
programming ---> program
programs ---> program
programmed ---> program
programmer ---> programm
programmers ---> programm
programming ---> program
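To make the grouping effect concrete, here is a minimal sketch that inverts the stems into buckets. To stay dependency-free it uses a lookup table copied from part of the output above; `porter_stems` and `group_by_stem` are hypothetical names introduced for illustration only.

```python
from collections import defaultdict

# Stems copied from the PorterStemmer output above (a subset of the word list).
porter_stems = {
    "eating": "eat", "eats": "eat", "ate": "ate", "eaten": "eaten",
    "eat": "eat", "eater": "eater", "eaters": "eater",
    "programming": "program", "programs": "program", "programmed": "program",
    "programmer": "programm", "programmers": "programm",
}

def group_by_stem(words, stem):
    """Map each stem to the distinct words that reduce to it, in first-seen order."""
    groups = defaultdict(list)
    for word in words:
        key = stem(word)
        if word not in groups[key]:
            groups[key].append(word)
    return dict(groups)

# Iterating over the dict yields its keys, i.e. the original words.
groups = group_by_stem(porter_stems, porter_stems.get)
for root, members in groups.items():
    print(f"{root}: {members}")
```

The regular forms collapse into shared buckets, while "ate" and "eaten" each stay in a bucket of their own. With NLTK available, `stemming.stem` could be passed as the `stem` argument in place of the lookup table.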


In [5]:
# some words that don't give a good stem
stemming.stem("congratulations")

'congratul'

#### RegexpStemmer class ####
NLTK provides the RegexpStemmer class for regular-expression-based stemming. It lets us define our own stemming rules as a pattern, and any part of a word that matches the pattern is removed. The min parameter sets the minimum word length for stemming, so shorter words are returned unchanged. This gives us more control over the stemming process and allows us to write rules that are specific to our data.

In [6]:
from nltk.stem import RegexpStemmer

reg_stemmer = RegexpStemmer('ing$|s$|es$|ed$', min=4)
reg_stemmer

<RegexpStemmer: 'ing$|s$|es$|ed$'>

In [7]:
reg_stemmer.stem("running")

'runn'
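The result 'runn' follows directly from how suffix stripping by regular expression works. The sketch below approximates that behaviour with the standard `re` module only; `regex_stem` and `min_length` are hypothetical names, and this is an illustration of the idea rather than NLTK's actual implementation.

```python
import re

def regex_stem(word, pattern=r"ing$|s$|es$|ed$", min_length=4):
    """Delete whatever `pattern` matches, unless the word is shorter than `min_length`."""
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)

for w in ["running", "boxes", "is", "sing"]:
    print(w, "--->", regex_stem(w))
```

Because the pattern only deletes the suffix, the doubled consonant in "running" survives, and a short word such as "sing" loses almost everything. The `$` anchors keep the match at the end of the word; without them, every occurrence of the pattern would be removed.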

#### SNOWBALL STEMMER ####
The Snowball stemmer, also known as Porter2, is a revised version of the Porter algorithm. NLTK's SnowballStemmer takes the language name as its argument.


In [8]:
from nltk.stem import SnowballStemmer

snowball_stemmer = SnowballStemmer("english")

In [9]:
for word in words:
    print(word + " ---> "+ snowball_stemmer.stem(word))

eating ---> eat
eats ---> eat
ate ---> ate
eaten ---> eaten
eat ---> eat
eater ---> eater
eaters ---> eater
eating ---> eat
writing ---> write
writes ---> write
wrote ---> wrote
written ---> written
write ---> write
writer ---> writer
writers ---> writer
programming ---> program
programs ---> program
programmed ---> program
programmer ---> programm
programmers ---> programm
programming ---> program


#### COMPARISON BETWEEN PORTER STEMMER AND SNOWBALL STEMMER ####

In [11]:
stemming.stem("fairly"), stemming.stem("supportingly")

('fairli', 'supportingli')

In [12]:
snowball_stemmer.stem("fairly"), snowball_stemmer.stem("supportingly")

('fair', 'support')
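On these two adverbs the Porter stemmer leaves an "li" ending behind, while the Snowball stemmer strips the suffix completely. A small helper can list the words on which two stemmers disagree. The sketch below is dependency-free and uses lookup tables copied from the outputs above; `porter`, `snowball`, and `disagreements` are hypothetical names introduced for illustration.

```python
# Stems copied from the outputs above; "programmer" is included as a word
# on which both stemmers agree.
porter = {"fairly": "fairli", "supportingly": "supportingli", "programmer": "programm"}
snowball = {"fairly": "fair", "supportingly": "support", "programmer": "programm"}

def disagreements(words, stem_a, stem_b):
    """Return (word, stem_a(word), stem_b(word)) for words the two stemmers treat differently."""
    return [(w, stem_a(w), stem_b(w)) for w in words if stem_a(w) != stem_b(w)]

diffs = disagreements(porter, porter.get, snowball.get)
for word, a, b in diffs:
    print(f"{word}: porter={a!r} snowball={b!r}")
```

With NLTK available, `stemming.stem` and `snowball_stemmer.stem` could be passed in place of the two lookup tables to run the same check over a larger word list.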