# Stemming


Stemming is the process of reducing a word to its root or base form, called the stem, usually by stripping suffixes (and, in some algorithms, prefixes). The goal is to map different forms of the same word to a single item so they can be analyzed together; for example, "eating" and "eats" both become "eat". Stemming is widely used in natural language processing (NLP) and in information retrieval systems such as search engines, where it makes term matching more consistent and reduces the number of distinct words that have to be processed.
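To make the idea concrete, here is a minimal sketch of a toy suffix-stripping stemmer in plain Python. The suffix list and the helper names `naive_stem` and `group_by_stem` are hypothetical and chosen only for illustration; real stemmers such as Porter or Snowball apply many more rules, in a carefully chosen order.

```python
from collections import defaultdict

# Hypothetical suffix list for illustration only; real stemmers use far
# more detailed, ordered rules.
SUFFIXES = ("ing", "es", "s", "ed")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Group word forms that share the same (naive) stem."""
    groups = defaultdict(list)
    for w in words:
        groups[naive_stem(w)].append(w)
    return dict(groups)


# Each stem is printed with the original forms that were mapped to it.
for stem, forms in group_by_stem(["eating", "eats", "eat", "programs", "program", "played"]).items():
    print(stem, "<-", forms)
```

Once grouped, "eating", "eats" and "eat" can be counted as one feature, which is exactly what stemming is meant to achieve.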

In [4]:
## Classification problem: decide whether a product review is positive or negative
## Reviews contain many forms of the same word; ideally [eating, eats, eaten] ---> eat and [going, gone, goes] ---> go

words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]

# PorterStemmer

In [5]:
from nltk.stem import PorterStemmer
stemmer = PorterStemmer()

In [7]:
for word in words:
    print(word+"----->"+stemmer.stem(word))

eating----->eat
eats----->eat
eaten----->eaten
writing----->write
writes----->write
programming----->program
programs----->program
history----->histori
finally----->final
finalized----->final
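Outputs such as "history ---> histori" come from Porter's rule-based design. The sketch below implements two rules from Porter's original 1980 algorithm in plain Python: Step 1a (plural suffixes) and Step 1c (a final "y" becomes "i" when the rest of the word contains a vowel). It is only a fragment for illustration: the function names are hypothetical, the remaining steps are omitted, and NLTK's default PorterStemmer mode adds its own extensions, so its output can differ for some words.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: a, e, i, o, u are vowels; 'y' counts as a vowel
    only when it follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def contains_vowel(stem: str) -> bool:
    """Porter's *v* condition: the stem contains at least one vowel."""
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def step_1a(word: str) -> str:
    """Plural handling: SSES -> SS, IES -> I, SS -> SS, S -> ''."""
    if word.endswith("sses") or word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step_1c(word: str) -> str:
    """(*v*) Y -> I: turn a final 'y' into 'i' if the rest contains a vowel."""
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


for w in ["caresses", "ponies", "caress", "cats", "history", "fairly", "sky"]:
    print(w, "->", step_1c(step_1a(w)))
```

Step 1c explains why "history" ends in "i": the stemmer does not try to produce dictionary words, only consistent stems.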


In [8]:
stemmer.stem('Congratulations')  # drawback: the stem is not always a real word

'congratul'

# RegexpStemmer Class

NLTK's RegexpStemmer stems words with a user-supplied regular expression instead of a fixed algorithm. Unlike the Porter or Snowball stemmers, it simply removes every substring that matches the pattern, and it leaves words shorter than the given minimum length unchanged. This makes it easy to define domain-specific suffix rules, but the regular expression has to be written carefully, because anything it matches is deleted.
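The core behavior described above can be sketched with Python's `re` module. `SimpleRegexpStemmer` is a hypothetical stand-in written for this example, not NLTK's class; it assumes, as described above, that every match of the pattern is removed and that words shorter than the minimum length are returned unchanged. Comparing an anchored and an unanchored pattern shows why the `$` matters.

```python
import re


class SimpleRegexpStemmer:
    """Hypothetical re-implementation for illustration: remove every match
    of `pattern`, but leave words shorter than `min_len` untouched."""

    def __init__(self, pattern: str, min_len: int = 0) -> None:
        self._regexp = re.compile(pattern)
        self._min_len = min_len

    def stem(self, word: str) -> str:
        if len(word) < self._min_len:
            return word
        return self._regexp.sub("", word)


anchored = SimpleRegexpStemmer(r"ing$|s$|e$|able$", min_len=4)
unanchored = SimpleRegexpStemmer(r"ing", min_len=4)

print(anchored.stem("eating"))       # eat
print(anchored.stem("ingeating"))    # ingeat (only the final 'ing' matches)
print(unanchored.stem("ingeating"))  # eat (every 'ing' is removed)
print(anchored.stem("was"))          # was (shorter than min_len)
```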

In [None]:
from nltk.stem import RegexpStemmer
reg_stem = RegexpStemmer('ing$|s$|e$|able$', min=4)  # '$' anchors each suffix to the end of the word; words shorter than 4 characters are left unchanged

In [12]:
reg_stem.stem('eating')  # the trailing 'ing' is removed

'eat'

In [13]:
reg_stem.stem('ingeating')  # only the final 'ing' is removed, because of the '$' anchor

'ingeat'

# SnowballStemmer

The Snowball stemmer is a more refined stemming algorithm than the original Porter stemmer. Its English variant is often called the "Porter2" stemmer: it fixes several weaknesses of the original algorithm and produces better stems in a number of edge cases. Both algorithms were developed by Martin Porter. Snowball is also the name of a small string-processing language he designed for writing stemming algorithms, and Snowball stemmers exist for many languages besides English.

In [18]:
from nltk.stem import SnowballStemmer
snowballsstemmer=SnowballStemmer('english')


In [16]:
for word in words:
    print(word+"------->"+snowballsstemmer.stem(word))

eating------->eat
eats------->eat
eaten------->eaten
writing------->write
writes------->write
programming------->program
programs------->program
history------->histori
finally------->final
finalized------->final


In [17]:
stemmer.stem("fairly"),stemmer.stem("sportingly")

('fairli', 'sportingli')

In [19]:
snowballsstemmer.stem("fairly"),snowballsstemmer.stem("sportingly")

('fair', 'sport')
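The different results for "fairly" come from Porter2 rules that the original algorithm lacks. The sketch below approximates two of them in plain Python: Step 1c turns a final "y" into "i" only when it follows a non-vowel that is not the first letter, and a Step 2 rule deletes a final "li" when it lies in the region R1 (everything after the first non-vowel that follows a vowel) and follows one of the letters c, d, e, g, h, k, m, n, r, t. It is a simplified, hypothetical fragment: it treats "y" as a vowel everywhere, skips Porter2's special prefixes and the longest-suffix matching of Step 2, and does not cover "sportingly", which Porter2 handles with a separate Step 1b rule for "ingly".

```python
VOWELS = "aeiouy"
LI_ENDINGS = "cdeghkmnrt"  # letters that may precede a deletable 'li'


def r1_start(word: str) -> int:
    """Index where R1 begins: just after the first non-vowel that follows a vowel."""
    for i in range(1, len(word)):
        if word[i] not in VOWELS and word[i - 1] in VOWELS:
            return i + 1
    return len(word)


def porter2_step_1c(word: str) -> str:
    """Replace a final 'y' with 'i' if it follows a non-vowel that is not the first letter."""
    if len(word) > 2 and word[-1] == "y" and word[-2] not in VOWELS:
        return word[:-1] + "i"
    return word


def porter2_strip_li(word: str) -> str:
    """Delete a final 'li' that lies in R1 and follows a valid li-ending."""
    if (
        word.endswith("li")
        and len(word) - 2 >= r1_start(word)
        and word[-3] in LI_ENDINGS
    ):
        return word[:-2]
    return word


print(porter2_step_1c("fairly"))                    # fairli (where the Porter output stops)
print(porter2_strip_li(porter2_step_1c("fairly")))  # fair
print(porter2_step_1c("cry"), porter2_step_1c("say"))  # cri say
```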

In [20]:
snowballsstemmer.stem('goes')

'goe'

In [22]:
stemmer.stem('goes')  # neither stemmer reduces 'goes' to 'go'

'goe'