<a href="https://colab.research.google.com/github/shanojpillai/GenerativeAI_100Days/blob/main/Day_1_Stemming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **What is Stemming?**
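Stemming reduces words to a root form, or “stem,” by stripping affixes, in practice mostly suffixes. This simplifies data for models by collapsing inflected variants of a word into a single token: “eat,” “eating,” and “eats” all stem to “eat.” The stem does not have to be a dictionary word, and irregular forms such as “eaten” are typically left unreduced by rule-based stemmers.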

For example, when classifying product reviews as positive or negative, we encounter many word forms that carry the same meaning. Stemming maps these forms to one token, which shrinks the vocabulary and lets the model treat them as a single feature instead of several unrelated ones.
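To make the vocabulary reduction concrete, here is a minimal sketch that counts distinct tokens before and after stemming. It assumes NLTK is installed and reuses the word list that the notebook stems in the code cells further down; the variable names are illustrative only.

```python
from nltk.stem import PorterStemmer

# Same word list the notebook uses in its Porter Stemmer cell
words = ['going', 'goes', 'go', 'intelligent', 'intelligence',
         'intelligently', 'feet', 'foot', 'cars', 'car']

stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in words]

# Compare vocabulary size before and after stemming
unique_words = set(words)
unique_stems = set(stems)
print(len(unique_words), "distinct words ->", len(unique_stems), "distinct stems")
print(sorted(unique_stems))
```

A bag-of-words model built on the stems would need fewer features than one built on the raw words, at the cost of some stems (such as “intellig”) no longer being real words.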

**Stemming Techniques**
There are several stemming algorithms, each with its own approach:

**Porter Stemmer:** This algorithm is widely used and works well for many words, but it sometimes produces stems that are not real words. For example, “history” becomes “histori” and “congratulations” becomes “congratul.”

**Regex Stemmer:** This approach uses a custom regular expression to remove specific suffixes or prefixes. For example, you can define a rule that strips “ing” or “s” from the ends of words. Its limitation is that it strips only the patterns you specify and knows nothing about word structure, so it can miss other suffixes or cut too much from short words.
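The idea can be sketched with nothing but Python's standard `re` module. The helper `strip_suffix` below is hypothetical and written for illustration; it approximates the behavior of a regex stemmer with a minimum word length and is not NLTK's implementation.

```python
import re

def strip_suffix(word, pattern=r'ing$|s$|e$|able$', min_length=4):
    """Hypothetical helper: remove a matching suffix from words of at least min_length characters."""
    if len(word) < min_length:
        return word  # too short: leave unchanged
    return re.sub(pattern, '', word)

for word in ['eating', 'cars', 'readable', 'bus', 'sing']:
    print(word, '->', repr(strip_suffix(word)))
```

Here “eating,” “cars,” and “readable” reduce to “eat,” “car,” and “read,” and “bus” is protected by the length check. “sing,” however, collapses to “s,” which shows how a purely pattern-based rule can over-strip.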

**Snowball Stemmer:** An improvement over the Porter Stemmer, the Snowball Stemmer handles many words more accurately and supports multiple languages. For example, it stems “fairly” and “sportingly” to “fair” and “sport,” where the Porter Stemmer produces “fairli” and “sportingli.”
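The notebook's code cells cover only the Porter and Regexp stemmers, so here is a short sketch of the Snowball comparison described above. It assumes NLTK is installed; `SnowballStemmer` takes a language name, and `'english'` is the one used here.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer('english')

# Compare both stemmers on the two example words from the text
for word in ['fairly', 'sportingly']:
    print(word, '| Porter:', porter.stem(word), '| Snowball:', snowball.stem(word))

# Languages supported by this NLTK installation
print(SnowballStemmer.languages)
```

Running both stemmers side by side on your own vocabulary is a quick way to decide which one suits a given dataset.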

**Limitations of Stemming**
Stemming can distort the meaning of words, which makes it less suitable for nuanced applications like chatbots or complex text analysis. For instance, the Porter Stemmer turns “goes” into “goe” instead of “go,” and it leaves irregular forms such as “feet” and “foot” as two separate stems.

In [1]:
from nltk.stem import PorterStemmer

In [2]:
stemming = PorterStemmer()

In [3]:
# Stem a mix of regular and irregular forms to see where the Porter rules succeed and fail
for word in ['going','goes','go','intelligent','intelligence','intelligently','feet','foot','cars','car']:
    print(stemming.stem(word))

go
goe
go
intellig
intellig
intellig
feet
foot
car
car


In [4]:
stemming.stem('happiness')

'happi'

In [5]:
from nltk.stem import RegexpStemmer

In [6]:
# Strip a trailing 'ing', 's', 'e', or 'able'; words shorter than 4 characters are left unchanged
reg_stemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [7]:
reg_stemmer.stem('eating')

'eat'