## Stemming
In Natural Language Processing (NLP), stemming reduces a word to its stem by stripping affixes, most commonly suffixes such as "ing", "ly", "es", and "s". The stem does not have to be a dictionary word; it only has to be the same for related forms of a word. Mapping inflected forms to a common stem helps in text mining, information retrieval, and search, because variants of a word can then be matched and counted together.

In [3]:
## PorterStemmer
from nltk.stem import PorterStemmer

In [27]:
words=['eats','eating','eates','goes','going','gone','finally','takeing','moving','moves','history','completion','morning']

In [28]:
pstemmer=PorterStemmer()

In [29]:
## root_word=[pstemmer.stem(word) for word in words]

In [30]:
for word in words:
    print(word+'------>'+pstemmer.stem(word))

eats------>eat
eating------>eat
eates------>eat
goes------>goe
going------>go
gone------>gone
finally------>final
takeing------>take
moving------>move
moves------>move
history------>histori
completion------>complet
morning------>morn
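
A minimal sketch of why this normalization helps with retrieval: grouping surface forms by their Porter stem puts variants of the same word into one bucket. The names `sample_words` and `groups` are illustrative and not part of the original notebook.

```python
from collections import defaultdict
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
sample_words = ['eats', 'eating', 'eates', 'moving', 'moves']

# Map each stem to the surface forms that reduce to it.
groups = defaultdict(list)
for word in sample_words:
    groups[stemmer.stem(word)].append(word)

for stem, forms in groups.items():
    print(stem + '------>' + ', '.join(forms))
```

A search index built on the stems would treat every form in a bucket as the same term.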


In [44]:
## Regex-based Stemmer
## regexp (str or regexp) – The regular expression that should be used to identify morphological affixes.
from nltk.stem import RegexpStemmer

In [45]:
st = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [46]:
for word in words:
    print(word+'------>'+st.stem(word))

eats------>eat
eating------>eat
eates------>eate
goes------>goe
going------>go
gone------>gon
finally------>finally
takeing------>take
moving------>mov
moves------>move
history------>history
completion------>completion
morning------>morn


In [48]:
## With a trailing $ the pattern is anchored to the end of the word, so only a matching suffix is removed.
## Without $ every match is removed, including matches at the start or in the middle of the word.
## An affix is a morpheme (the smallest grammatical unit in a language) that is attached to a word stem to form a new word or word form.
st = RegexpStemmer('ing|s|e|able', min=4)
new_word=['meaningful','suspicious','willingness','unableness']
for word in new_word:
    print(word+'------>'+st.stem(word))


meaningful------>manful
suspicious------>upiciou
willingness------>willn
unableness------>unn
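
The effect of the `$` anchor can be reproduced with the standard `re` module alone. The sketch below assumes that the stemmer amounts to a regular-expression substitution plus a minimum-length check; `strip_affixes` is a hypothetical helper written for illustration, not NLTK's implementation.

```python
import re

def strip_affixes(word, pattern, min_len=4):
    """Hypothetical helper: delete every match of `pattern` from `word`.

    Words shorter than `min_len` are returned unchanged.
    """
    if len(word) < min_len:
        return word
    return re.sub(pattern, '', word)

anchored = 'ing$|s$|e$|able$'   # matches only at the end of the word
unanchored = 'ing|s|e|able'     # matches anywhere in the word

for word in ['meaningful', 'suspicious', 'willingness', 'unableness']:
    print(word, strip_affixes(word, anchored), strip_affixes(word, unanchored))
```

With the anchored pattern at most one trailing suffix is removed, while the unanchored pattern also deletes letters inside the word, which is why the unanchored outputs above are barely recognizable.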


In [35]:
## Snowball Stemmer
from nltk.stem import SnowballStemmer

In [41]:
snowst=SnowballStemmer(language='english')

In [42]:
for word in words:
    print(word+'------>'+snowst.stem(word))

eats------>eat
eating------>eat
eates------>eat
goes------>goe
going------>go
gone------>gone
finally------>final
takeing------>take
moving------>move
moves------>move
history------>histori
completion------>complet
morning------>morn
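
To check where the two stemmers actually disagree, their outputs can be compared word by word. This is a small sketch; `differing_stems` is a hypothetical helper, not part of NLTK. For the word list used in this notebook the printed Porter and Snowball outputs are identical, so the helper should return an empty list for it; other inputs may produce differences.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer(language='english')

def differing_stems(candidates):
    """Hypothetical helper: list (word, porter_stem, snowball_stem)
    for every word on which the two stemmers disagree."""
    rows = []
    for word in candidates:
        p, s = porter.stem(word), snowball.stem(word)
        if p != s:
            rows.append((word, p, s))
    return rows

words = ['eats', 'eating', 'eates', 'goes', 'going', 'gone', 'finally',
         'takeing', 'moving', 'moves', 'history', 'completion', 'morning']
print(differing_stems(words))
```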
