### Stemming

**Stemming** in NLP is the process of reducing words to a base form (the *stem*) by stripping affixes, most often suffixes.

- Example:  
  - "running" → "run"  
  - "happiness" → "happi"  
  - "studies" → "studi"

Stemming groups the different inflected forms of a word under one key, which simplifies text for tasks such as text classification or search indexing. The stem is not guaranteed to be a valid word: "happi" and "studi" above are typical results.
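
To make the grouping idea concrete, here is a minimal sketch that buckets word forms by their stem, the way a search index might. The helper names `crude_stem` and `group_by_stem` are hypothetical, and `crude_stem` is only a stand-in that strips a single suffix; it is not one of the real algorithms covered below.

```python
import re
from collections import defaultdict


def crude_stem(word):
    """Hypothetical stand-in stemmer: strip one common suffix, nothing more."""
    return re.sub(r"(ing|ed|s)$", "", word.lower())


def group_by_stem(tokens, stem=crude_stem):
    """Map each stem to the sorted list of distinct word forms that reduce to it."""
    groups = defaultdict(set)
    for token in tokens:
        groups[stem(token)].add(token)
    return {key: sorted(forms) for key, forms in groups.items()}


tokens = ["play", "plays", "played", "playing", "jump", "jumps", "jumped"]
index = group_by_stem(tokens)

for key, forms in index.items():
    print(key, "->", forms)
# play -> ['play', 'played', 'playing', 'plays']
# jump -> ['jump', 'jumped', 'jumps']
```

Because `group_by_stem` takes the stemmer as a plain callable, any real stemmer's `stem` method could be passed in place of `crude_stem`.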

In [4]:
words = [
    'running', 'jumps', 'easily', 'bigger', 'happiness', 'played',
    'playing', 'flying', 'cars', 'leaves', 'studying', 'running',
    'faster', 'joyful', 'believes', 'swimming', 'organization',
    'relational', 'calculation', 'connections'
]

## Porter Stemmer

In [9]:
from nltk.stem import PorterStemmer

In [11]:
ps = PorterStemmer()

In [23]:
for word in words:
    print(word + ' ==> ', ps.stem(word))

running ==>  run
jumps ==>  jump
easily ==>  easili
bigger ==>  bigger
happiness ==>  happi
played ==>  play
playing ==>  play
flying ==>  fli
cars ==>  car
leaves ==>  leav
studying ==>  studi
running ==>  run
faster ==>  faster
joyful ==>  joy
believes ==>  believ
swimming ==>  swim
organization ==>  organ
relational ==>  relat
calculation ==>  calcul
connections ==>  connect


## Regular Expression Stemmer

In [29]:
from nltk.stem import RegexpStemmer
# Suffix pattern to strip: the common endings 'ing', 'ly', 'ed', 'es', and 's'
stemmer = RegexpStemmer(r'ing$|ly$|ed$|es$|s$')

# Apply the stemmer to each word
stemmed_words = [stemmer.stem(word) for word in words]

# Print each original word next to its stem
for word, stem in zip(words, stemmed_words):
    print(word + ' ==> ', stem)

running ==>  runn
jumps ==>  jump
easily ==>  easi
bigger ==>  bigger
happiness ==>  happines
played ==>  play
playing ==>  play
flying ==>  fly
cars ==>  car
leaves ==>  leav
studying ==>  study
running ==>  runn
faster ==>  faster
joyful ==>  joyful
believes ==>  believ
swimming ==>  swimm
organization ==>  organization
relational ==>  relational
calculation ==>  calculation
connections ==>  connection
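
Results such as "runn" and "happines" follow directly from how a regular expression stemmer works: it deletes whatever the pattern matches and applies no further rules. The sketch below imitates that behaviour with the standard `re` module. The function `regexp_stem` and its `min_length` parameter are illustrative names, not NLTK's API, although NLTK's `RegexpStemmer` does accept a comparable `min` argument that leaves shorter words untouched.

```python
import re

# Same suffix pattern as in the cell above
SUFFIXES = r"ing$|ly$|ed$|es$|s$"


def regexp_stem(word, pattern=SUFFIXES, min_length=0):
    """Delete every match of `pattern`, unless the word is shorter than `min_length`."""
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


for word in ["running", "happiness", "leaves", "is", "sing"]:
    print(word, "==>", regexp_stem(word), "|", regexp_stem(word, min_length=5))
# running ==> runn | runn
# happiness ==> happines | happines
# leaves ==> leav | leav
# is ==> i | is
# sing ==> s | sing
```

A length threshold is a blunt guard, but it shows why short words such as "is" or "sing" need protection from a purely pattern-based stemmer.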


## Snowball Stemmer

In [32]:
from nltk.stem import SnowballStemmer

# Initialize the Snowball stemmer for English
stemmer = SnowballStemmer("english")

# Print original words and their stemmed versions
for word in words:
    print(word + ' ==> ' + stemmer.stem(word))


running ==> run
jumps ==> jump
easily ==> easili
bigger ==> bigger
happiness ==> happi
played ==> play
playing ==> play
flying ==> fli
cars ==> car
leaves ==> leav
studying ==> studi
running ==> run
faster ==> faster
joyful ==> joy
believes ==> believ
swimming ==> swim
organization ==> organ
relational ==> relat
calculation ==> calcul
connections ==> connect
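
For this word list the Snowball output is identical to the Porter output shown earlier, but the two algorithms do not always agree. The sketch below compares them on a few extra words; the list `candidates` is an assumption chosen for illustration, and only the result for "generously" is stated in the comments.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Illustrative words, not taken from the list above
candidates = ["running", "easily", "generously", "fairly"]

for word in candidates:
    p, s = porter.stem(word), snowball.stem(word)
    marker = "" if p == s else "  <-- differs"
    print(f"{word}: porter={p}, snowball={s}{marker}")

# "generously" is one word where the two disagree:
# Porter gives 'gener', Snowball gives 'generous'.

# SnowballStemmer also covers other languages; the supported names are listed here
print(SnowballStemmer.languages)
```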
