
## Stemming Techniques using NLTK

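**Stemming** is a process in natural language processing (NLP) that reduces words to a base form, known as the stem, usually by stripping suffixes. The stem does not have to be a dictionary word. Stemming helps group together different forms of the same word (e.g., "running" and "runs" both reduce to "run"). Irregular forms such as "ran" are not covered by suffix rules and are typically left unchanged.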

In [57]:
words = ['history', 'historical', 'historically', 'historian', 'histories', 'believe', 'believer', 'believing', 'believed', 'beautiful', 'beauty', 'beautify', 'beautifying', 'run', 'running','eats','eating','eaten','writing','written','programming','programs','finally','finalized']

### PorterStemmer

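The **PorterStemmer** is a widely used stemming algorithm, known for its simplicity and speed. It applies a series of rules to remove common suffixes from English words. As the output below shows, the resulting stems are not always real words ("history" becomes "histori"), and irregular forms such as "eaten" and "written" are left unchanged.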
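
To make the idea of rule-based suffix removal concrete, here is a minimal toy sketch. It is not the Porter algorithm: the rule list `TOY_RULES`, the function `toy_stem`, and the `min_stem_len` parameter are hypothetical names invented for illustration, and the real algorithm uses many more rules and conditions.

```python
# Toy suffix-stripping rules, checked in order; the first matching rule wins.
# This is NOT the Porter algorithm, only an illustration of the rule-based idea.
TOY_RULES = [
    ("ies", "i"),  # histories -> histori
    ("ing", ""),   # eating -> eat
    ("ed", ""),    # believed -> believ
    ("s", ""),     # eats -> eat
]


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix, replacement in TOY_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: len(word) - len(suffix)] + replacement
    return word  # no rule applied


for w in ["histories", "eating", "believed", "eats", "run"]:
    print(w + "-->" + toy_stem(w))
```

A sketch this simple turns "running" into "runn". The PorterStemmer returns "run" because it has extra rules for cases such as doubled consonants.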

In [58]:
# Create a PorterStemmer instance

from nltk.stem import PorterStemmer
stemming = PorterStemmer()

In [59]:
for word in words:
  print(word+"-->"+stemming.stem(word))

history-->histori
historical-->histor
historically-->histor
historian-->historian
histories-->histori
believe-->believ
believer-->believ
believing-->believ
believed-->believ
beautiful-->beauti
beauty-->beauti
beautify-->beautifi
beautifying-->beautifi
run-->run
running-->run
eats-->eat
eating-->eat
eaten-->eaten
writing-->write
written-->written
programming-->program
programs-->program
finally-->final
finalized-->final


In [60]:
stemming.stem('congratulations')

'congratul'

In [61]:
stemming.stem('sitting')

'sit'
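
The grouping effect of stemming can be sketched by collecting words under their Porter stem. The variable names `sample` and `groups` are illustrative and the word list is a subset of the one used above. This is one possible way to group, not a prescribed workflow.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
sample = ["history", "histories", "historical", "historically", "run", "running"]

# Map each stem to the list of original words that reduce to it.
groups = defaultdict(list)
for word in sample:
    groups[stemmer.stem(word)].append(word)

for stem, members in groups.items():
    print(stem, "<--", members)
```

Note that "history" and "historical" end up in different groups ("histori" and "histor"), so stemming does not always merge every related form.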

### RegexpStemmer

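The **RegexpStemmer** is a more flexible stemmer that removes any part of a word matching a regular expression you supply. Anchor each alternative with `$` so that only word endings are removed. The `min` argument leaves words shorter than that length untouched, which is why "run" is returned as is in the output below.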

In [62]:
from nltk.stem import RegexpStemmer

In [63]:
regstemmer = RegexpStemmer('ing$|s$|e$|able$', min=4)

In [64]:
for word in words:
  print(word+"-->"+regstemmer.stem(word))

history-->history
historical-->historical
historically-->historically
historian-->historian
histories-->historie
believe-->believ
believer-->believer
believing-->believ
believed-->believed
beautiful-->beautiful
beauty-->beauty
beautify-->beautify
beautifying-->beautify
run-->run
running-->runn
eats-->eat
eating-->eat
eaten-->eaten
writing-->writ
written-->written
programming-->programm
programs-->program
finally-->finally
finalized-->finalized


In [65]:
regstemmer.stem('ingeating')

'ingeat'
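
The role of the `$` anchor can be illustrated with the standard `re` module. The helper `regexp_stem` below is a hypothetical stand-in written for this example, assuming the stemmer simply deletes every match of the pattern and skips words shorter than the minimum length. That assumption is consistent with the outputs above, but this is not NLTK's code.

```python
import re


def regexp_stem(word, pattern, min_len=0):
    """Delete every match of pattern from word, unless the word is too short."""
    if len(word) < min_len:
        return word
    return re.sub(pattern, "", word)


# Anchored: only the trailing "ing" is removed.
print(regexp_stem("ingeating", "ing$"))  # ingeat

# Unanchored: every "ing" is removed, including the one at the start.
print(regexp_stem("ingeating", "ing"))  # eat

# Words shorter than min_len are returned unchanged.
print(regexp_stem("run", "ing$|s$|e$|able$", min_len=4))  # run
```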

### SnowballStemmer

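The **SnowballStemmer** implements the Snowball family of stemmers. Its English stemmer (often called Porter2) is a revision of the original Porter algorithm, and the class also supports several languages other than English. On the word list used here it produces the same stems as the PorterStemmer, but it handles some suffixes that the original leaves behind, such as the "-ly" in "fairly" and "sportingly".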
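
To see which languages are available, you can inspect the `languages` attribute of the class. The exact contents depend on the installed NLTK version, so no output is shown here.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names accepted by SnowballStemmer(language=...).
supported = SnowballStemmer.languages
print(supported)
```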

In [67]:
from nltk.stem import SnowballStemmer

In [68]:
snowballstem = SnowballStemmer(language='english')

In [69]:
for word in words:
  print(word+"-->"+snowballstem.stem(word))

history-->histori
historical-->histor
historically-->histor
historian-->historian
histories-->histori
believe-->believ
believer-->believ
believing-->believ
believed-->believ
beautiful-->beauti
beauty-->beauti
beautify-->beautifi
beautifying-->beautifi
run-->run
running-->run
eats-->eat
eating-->eat
eaten-->eaten
writing-->write
written-->written
programming-->program
programs-->program
finally-->final
finalized-->final


In [72]:
stemming.stem('fairly'), stemming.stem('sportingly')

('fairli', 'sportingli')

In [71]:
snowballstem.stem('fairly'), snowballstem.stem('sportingly')

('fair', 'sport')

In [73]:
snowballstem.stem('goes')

'goe'
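
The comparison above can be automated by running both stemmers over the same words and keeping only the cases where they disagree. The names `sample` and `differences` are illustrative, and the short word list is taken from the examples in this notebook.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer(language="english")

sample = ["fairly", "sportingly", "running", "history"]

# Keep only the words for which the two stemmers return different stems.
differences = {}
for word in sample:
    p, s = porter.stem(word), snowball.stem(word)
    if p != s:
        differences[word] = (p, s)

for word, (p, s) in differences.items():
    print(f"{word}: porter={p}, snowball={s}")
```

For this sample, only "fairly" and "sportingly" differ, matching the outputs shown above.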