# Stemming

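Stemming reduces a word to its stem by chopping off affixes (mostly suffixes).
For example:

"running", "runs" → "run"

"connection", "connected" → "connect"

Irregular forms are out of reach: no stemmer maps "ran" to "run", because stemming only strips affixes.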

⚠️ Stemming may produce non-dictionary words (e.g., “studies” → “studi”).
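To make the idea concrete, here is a deliberately naive suffix stripper. It is a hypothetical toy (`toy_stem` and its `SUFFIXES` list are made up for illustration and are not part of NLTK): it removes the first matching suffix as long as at least three characters remain. Real stemmers such as Porter use ordered rule sets with conditions on the remaining stem, but even this toy shows why stems are often not dictionary words.

```python
# Hypothetical toy stemmer for illustration only; NOT an NLTK algorithm.
# Suffixes are tried in order, so "es" is checked before the shorter "s".
SUFFIXES = ["ing", "ion", "ed", "es", "s"]


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


for w in ["running", "runs", "connection", "connected", "studies"]:
    print(w, "->", toy_stem(w))
# running -> runn
# runs -> run
# connection -> connect
# connected -> connect
# studies -> studi
```

Note that "runn" is left with a doubled consonant; real stemmers add extra rules (such as Porter's undoubling step) to handle cases like this.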

### Main Classes in `nltk.stem`

| Stemmer | Description | Example |
|---|---|---|
| `PorterStemmer` | Most common English stemmer; rule-based. | "flies" → "fli" |
| `LancasterStemmer` | More aggressive than Porter; may over-stem. | "maximum" → "maxim", "running" → "run" |
| `RegexpStemmer` | Custom stemming using regular expressions. | You define what to strip off. |
| `SnowballStemmer` | Like Porter, but supports multiple languages. | "studying" → "studi" |
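A quick way to see the differences is to run all four stemmers on the example words from the table. This is a sketch: the regular expression passed to `RegexpStemmer` is an arbitrary choice for illustration, and exact outputs may vary slightly between NLTK versions.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, RegexpStemmer, SnowballStemmer

# One instance of each stemmer in the table.
# The RegexpStemmer pattern is a hypothetical example: strip a final -ing, -s or -ed
# from words of at least 4 characters.
stemmers = {
    "porter": PorterStemmer(),
    "lancaster": LancasterStemmer(),
    "regexp": RegexpStemmer("ing$|s$|ed$", min=4),
    "snowball": SnowballStemmer("english"),
}

for word in ["flies", "maximum", "running", "studying"]:
    results = {name: stemmer.stem(word) for name, stemmer in stemmers.items()}
    print(word, results)
```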


```python
# Classification problem: is a product review positive or negative?
# Reviews contain many forms of the same word (eating, eats, eaten; going, goes, gone),
# and ideally each group would be reduced to one base form (eat, go).
words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]
```

### PorterStemmer

```python
from nltk.stem import PorterStemmer

stemming = PorterStemmer()

for word in words:
    print(word + "---->" + stemming.stem(word))
```

```
eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final
```
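For the review-classification use case mentioned above, the practical effect of stemming is a smaller vocabulary: several surface forms collapse into one feature. A minimal sketch that counts distinct tokens before and after Porter stemming, reusing the same word list (the expected stems match the output shown above):

```python
from nltk.stem import PorterStemmer

words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

stemmer = PorterStemmer()
stems = [stemmer.stem(word) for word in words]

# Number of distinct forms before and after stemming.
print(len(set(words)), "->", len(set(stems)))  # 10 -> 6
print(sorted(set(stems)))  # ['eat', 'eaten', 'final', 'histori', 'program', 'write']
```

Note that "eaten" still forms its own group, so stemming reduces the vocabulary but does not fully normalize irregular forms.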


```python
stemming.stem("congratulations"), stemming.stem("sitting")
```

```
('congratul', 'sit')
```
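The "sitting" → "sit" result comes from step 1b of Porter's algorithm: -ing or -ed is removed only if the remaining stem contains a vowel, and a final double consonant is then reduced to a single letter, except for l, s and z. The function below is a simplified, hypothetical re-creation of just that part. It treats only a, e, i, o, u as vowels and omits the -eed rule and the rules that add a final "e" (as in "writing" → "write"), so it is not a substitute for `PorterStemmer`.

```python
import re

# Simplified vowel test: Porter also counts "y" as a vowel after a consonant.
HAS_VOWEL = re.compile(r"[aeiou]")


def step1b_sketch(word: str) -> str:
    """Hypothetical, simplified excerpt of Porter's step 1b (-ing/-ed plus undoubling)."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            # The suffix is only removed if the remaining stem contains a vowel.
            if not HAS_VOWEL.search(stem):
                return word
            # Undouble a final double consonant, unless it is l, s or z.
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeioulsz":
                return stem[:-1]
            return stem
    return word


for w in ["sitting", "hopping", "falling", "hissing", "sing"]:
    print(w, "->", step1b_sketch(w))
# sitting -> sit
# hopping -> hop
# falling -> fall
# hissing -> hiss
# sing -> sing
```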

### RegexpStemmer class
NLTK's `RegexpStemmer` class makes it easy to build a stemmer from a regular expression. It takes a single regular expression and removes every part of the word that matches it, so the pattern itself decides what is stripped (for example, `$`-anchored suffixes). The optional `min` argument leaves words shorter than that length unchanged. Let us see an example.

```python
from nltk.stem import RegexpStemmer

# Strip a final "ing", "s", "e" or "able" from words of at least 4 characters.
reg_stemmer = RegexpStemmer("ing$|s$|e$|able$", min=4)

reg_stemmer.stem("eating"), reg_stemmer.stem("ingeating")
```

```
('eat', 'ingeat')
```
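`RegexpStemmer.stem` behaves roughly like the plain-Python function below: words shorter than the minimum length are returned unchanged, and otherwise every match of the pattern is deleted with `re.sub`. That explains the output above: the `$` anchors restrict matches to the end of the word, so the leading "ing" in "ingeating" survives. The function name `regexp_stem` and its `min_len` parameter (NLTK calls it `min`) are hypothetical; this is a sketch of the behaviour, not NLTK's source.

```python
import re


def regexp_stem(word: str, pattern: str, min_len: int = 0) -> str:
    """Sketch of RegexpStemmer-style stemming: delete every regex match."""
    # Words shorter than the minimum length are left alone.
    if len(word) < min_len:
        return word
    # re.sub removes all non-overlapping matches, not just a single affix.
    return re.sub(pattern, "", word)


anchored = "ing$|s$|e$|able$"
print(regexp_stem("eating", anchored, min_len=4))     # eat
print(regexp_stem("ingeating", anchored, min_len=4))  # ingeat
print(regexp_stem("use", anchored, min_len=4))        # use (shorter than 4)

# Without the "$" anchor, "ing" is removed wherever it occurs.
print(regexp_stem("ingeating", "ing", min_len=4))     # eat
```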

### Snowball Stemmer
The English Snowball stemmer implements the Porter2 algorithm, a revised version of the Porter stemmer that fixes some of its known issues. NLTK's `SnowballStemmer` also provides stemmers for several other languages.
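Because Snowball covers several languages, `SnowballStemmer` takes a language name when it is constructed. A short sketch that lists the names the installed NLTK version accepts (the exact tuple depends on the version) and stems one English word:

```python
from nltk.stem import SnowballStemmer

# Class attribute listing the language names the constructor accepts.
print(SnowballStemmer.languages)

english = SnowballStemmer("english")
print(english.stem("studying"))  # studi
```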

```python
from nltk.stem import SnowballStemmer

snowballsstemmer = SnowballStemmer("english")

for word in words:
    print(word + "---->" + snowballsstemmer.stem(word))
```

```
eating---->eat
eats---->eat
eaten---->eaten
writing---->write
writes---->write
programming---->program
programs---->program
history---->histori
finally---->final
finalized---->final
```


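On `words` both stemmers agree, but they differ on adverbs ending in -ly: Porter leaves an "-li" remnant, while Snowball removes the suffix.

```python
stemming.stem("fairly"), stemming.stem("sportingly")
```

```
('fairli', 'sportingli')
```

```python
snowballsstemmer.stem("fairly"), snowballsstemmer.stem("sportingly")
```

```
('fair', 'sport')
```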
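To find such disagreements systematically, one can run both stemmers over a word list and keep only the words where they differ. A minimal sketch, using the notebook's `words` plus the two adverbs above (the expected result follows from the outputs shown earlier):

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

words = ["eating", "eats", "eaten", "writing", "writes", "programming",
         "programs", "history", "finally", "finalized", "fairly", "sportingly"]

# Keep only the words on which the two stemmers disagree.
differences = {
    word: (porter.stem(word), snowball.stem(word))
    for word in words
    if porter.stem(word) != snowball.stem(word)
}
print(differences)
# {'fairly': ('fairli', 'fair'), 'sportingly': ('sportingli', 'sport')}
```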