## Stemming 

Stemming is a fundamental concept in Natural Language Processing (NLP) that involves reducing words to their root or base form. The goal is to strip suffixes (and sometimes prefixes) from a word so that related words are mapped to a common stem.
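To make the idea of suffix stripping concrete, here is a minimal sketch in plain Python. The function `naive_stem`, its suffix list, and the `min_stem_len` parameter are hypothetical names chosen for illustration; this is not the Porter algorithm, only a toy version of the same idea.

```python
# Toy suffix stripper: NOT the Porter algorithm, only an illustration.
def naive_stem(word, suffixes=("ing", "ed", "s"), min_stem_len=3):
    """Strip the first matching suffix if the remaining stem is long enough."""
    for suffix in suffixes:
        # Only strip when at least `min_stem_len` characters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no suffix matched, return the word unchanged

for w in ["eating", "eats", "walked", "bus"]:
    print(w, "->", naive_stem(w))
```

Real stemmers apply many more rules than this, but the principle is the same: related word forms collapse to one shared stem.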


In [16]:
### Classification problem
### Goal: decide whether a product review is positive or negative.
### Reviews contain inflected forms such as eating, eat, eaten || going, gone, goes   ("eat" and "go" are the root words)

words=["eating","eats","eaten","writing","writes","programming","programs","history","finally","finalized"]


## Porter Stemmer

The Porter stemmer is a very popular rule-based stemmer.

Example: running → run, happily → happili

It is fairly aggressive and can produce stems that are not real words, such as "happili" above.

In [12]:
from nltk.stem import PorterStemmer

In [11]:
stemming = PorterStemmer()

In [17]:
for word in words:
    print(word+ " ------> " + stemming.stem(word))

eating ------> eat
eats ------> eat
eaten ------> eaten
writing ------> write
writes ------> write
programming ------> program
programs ------> program
history ------> histori
finally ------> final
finalized ------> final
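
For the classification use case, what matters is that several surface forms collapse into one feature. The sketch below groups the same word list by Porter stem; the names `stemmer` and `groups` are introduced here for illustration only.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

words = ["eating", "eats", "eaten", "writing", "writes",
         "programming", "programs", "history", "finally", "finalized"]

stemmer = PorterStemmer()

# Map each stem to the list of original words that reduce to it.
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

for stem, members in groups.items():
    print(stem, "<-", members)

# Fewer distinct stems than words means a smaller vocabulary for the classifier.
print(len(words), "words ->", len(groups), "stems")
```

With the outputs shown above, the ten words reduce to six distinct stems.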


In [18]:
stemming.stem('congrulations')

'congrul'

In [19]:
stemming.stem('Sitting')

'sit'

***RegexpStemmer Class***

The RegexpStemmer class in NLTK is a very simple and customizable stemmer that removes user-defined suffixes from words using regular expressions.

***When to Use RegexpStemmer?***

When you want lightweight, rule-based stemming.

When you only need to strip a specific set of suffixes (e.g., "-ing", "-ed", "-ly").

Useful for controlled domains where full stemmers like Porter or Snowball might be too aggressive or imprecise.
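
As a small sketch of the suffix list mentioned above, the snippet below strips only "-ing", "-ed", and "-ly". The variable name `suffix_stemmer` and the choice of `min=5` are assumptions made for this example; words shorter than `min` characters are returned unchanged.

```python
from nltk.stem import RegexpStemmer

# Strip only the listed suffixes, and only at the end of the word ($).
# Words shorter than 5 characters are left untouched.
suffix_stemmer = RegexpStemmer('ing$|ed$|ly$', min=5)

for word in ["walking", "jumped", "quickly", "sing", "red"]:
    print(word, "->", suffix_stemmer.stem(word))
```

The `min` threshold protects short words such as "sing", which would otherwise lose its "ing" and become "s".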

In [20]:
from nltk.stem import RegexpStemmer

In [29]:
reg_stemmer= RegexpStemmer('ing$|s$|e$|able$', min=4) 

In [25]:
## Same pattern as above: RegexpStemmer('ing$|s$|e$|able$', min=4)
## The dollar sign anchors each suffix to the end of the word, so "ing" is removed only when the word ends with it.
reg_stemmer.stem('eating')

'eat'

In [27]:
reg_stemmer.stem('ingeating')

'ingeat'

In [31]:
## Without the dollar sign, "ing" is removed wherever it occurs in the word, not only at the end.
reg_stemmer = RegexpStemmer('ing|s$|e$|able$', min=4)
reg_stemmer.stem('ingeating')

'eat'
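
The effect of the `$` anchor can be reproduced with the standard `re` module alone. This is a sketch of the regex behaviour, on the assumption that the stemmer simply substitutes matches with an empty string; the variable names are illustrative.

```python
import re

word = "ingeating"

# Anchored: only an "ing" at the very end of the word is removed.
anchored = re.sub(r"ing$", "", word)

# Unanchored: every "ing" in the word is removed.
unanchored = re.sub(r"ing", "", word)

print(anchored)    # ingeat
print(unanchored)  # eat

# The unanchored form can damage words where "ing" is not a suffix.
print(re.sub(r"ing", "", "kingdom"))  # kdom
```

In practice the anchored pattern is usually the safer choice, since a suffix rule should not touch the middle of a word.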

## Snowball Stemmer  

The Snowball stemmer usually gives cleaner stems than the Porter stemmer, as the comparison at the end of this section shows.

❄️ ***Snowball Stemmer in NLP***

The Snowball Stemmer is an improved version of the Porter Stemmer, developed by the same author, Martin Porter. Its English stemmer is sometimes referred to as "Porter2", and it is part of the Snowball stemming framework, which supports multiple languages.

✅ **Why Use Snowball Over Porter?**

| Feature          | Snowball Stemmer                | Porter Stemmer                              |
| ---------------- | ------------------------------- | ------------------------------------------- |
| Stem quality     | Generally more consistent       | Less consistent (e.g., "fairly" → "fairli") |
| Language support | Multiple languages              | English only                                |
| Code structure   | Cleaner and more maintainable   | Older, less flexible                        |
| Resulting stems  | Often more linguistically valid | Can be over-aggressive                      |
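
The language support row can be checked directly in NLTK. The sketch below lists the supported languages and builds a German stemmer; the word "häuser" and the made-up language name "klingon" are illustrative inputs only, and no particular output is claimed for the German stem.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by the SnowballStemmer constructor.
print(SnowballStemmer.languages)

# A stemmer for a language other than English.
german = SnowballStemmer("german")
print(german.stem("häuser"))

# Asking for an unsupported language raises ValueError.
try:
    SnowballStemmer("klingon")
except ValueError as err:
    print("Unsupported:", err)
```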




In [33]:
from nltk.stem import SnowballStemmer

In [34]:
snow = SnowballStemmer('english')

In [36]:
for word in words:
    print(word+ "-------> " +snow.stem(word))

eating-------> eat
eats-------> eat
eaten-------> eaten
writing-------> write
writes-------> write
programming-------> program
programs-------> program
history-------> histori
finally-------> final
finalized-------> final


In [40]:
## Porter Stemmer 
stemming.stem("fairly"), stemming.stem("sportingly")

('fairli', 'sportingli')

In [38]:
## Snowball Stemmer: gives cleaner stems for these two words
snow.stem("fairly"),snow.stem("sportingly")

('fair', 'sport')
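
To find such differences on a larger word list, the comparison can be wrapped in a small helper. This is a sketch; `stemmer_disagreements` is a hypothetical function name, not part of NLTK.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

def stemmer_disagreements(words):
    """Return (word, porter_stem, snowball_stem) for words where the two differ."""
    rows = []
    for word in words:
        p, s = porter.stem(word), snowball.stem(word)
        if p != s:
            rows.append((word, p, s))
    return rows

for row in stemmer_disagreements(["fairly", "sportingly", "eating", "history"]):
    print(row)
```

Based on the outputs above, only "fairly" and "sportingly" should be reported here, since both stemmers agree on "eating" and "history".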