### Common Stemming Algorithms in Natural Language Processing

#### Description:
- Stemming algorithms reduce words to a base or root form (the *stem*) by stripping common morphological suffixes, so that variants such as "changing" and "changed" map to the same term. The resulting stem is not necessarily a dictionary word. Popular stemming algorithms include:

- **Porter Stemmer**: A widely used stemming algorithm that effectively removes common English word suffixes.
- Lovins Stemmer
- Dawson Stemmer
- Krovetz Stemmer
- Xerox Stemmer
- N-Gram Stemmer
- Snowball Stemmer (an improved version of the Porter Stemmer)
- Lancaster Stemmer
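The core idea these algorithms share, stripping a known suffix when enough of the word remains, can be illustrated with a toy rule-based stemmer. This is a hypothetical sketch: the four-suffix list and the minimum stem length of 3 are assumptions chosen for illustration. It is not the Porter Stemmer or any other algorithm listed above, all of which use much larger, condition-dependent rule sets.

```python
# Hypothetical, illustrative suffix list ordered longest first; NOT a real stemmer's rule set.
SUFFIXES = ["ing", "ed", "es", "s"]

def strip_suffix(word, min_stem=3):
    """Remove the first matching suffix, provided at least min_stem characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no rule applied

for w in ["changing", "changed", "changes", "change", "is"]:
    print(w, "->", strip_suffix(w))
```

The toy version leaves `change` untouched, while the Porter Stemmer also reduces it to `chang`; covering cases like that is what the additional rules in real stemmers are for.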

## Workflow 
- Tokenize first: stemming and lemmatization operate on individual tokens, not on raw text.
- **Tokenization → Stemming/Lemmatization**.
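A minimal sketch of that order, assuming NLTK is installed. It uses `TreebankWordTokenizer` as a stand-in for `word_tokenize` because it is purely rule-based and needs no downloaded model (`word_tokenize` relies on the Punkt data); the helper name `tokenize_then_stem` is hypothetical.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer

tokenizer = TreebankWordTokenizer()
stemmer = PorterStemmer()

def tokenize_then_stem(text):
    """Step 1: split raw text into tokens. Step 2: stem each token."""
    tokens = tokenizer.tokenize(text)
    return [stemmer.stem(tok) for tok in tokens]

result = tokenize_then_stem("Changes keep changing.")
print(result)
```

Stemming the raw string directly would treat the whole sentence as one "word", which is why tokenization has to come first.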

In [2]:
import nltk
import warnings
warnings.filterwarnings('ignore')

In [None]:
import nltk

# Download required NLTK resources
nltk.download('wordnet')  # Lexical database for English
nltk.download('maxent_ne_chunker')  # NER models for entity recognition
nltk.download('words')  # Corpus of English words
nltk.download('averaged_perceptron_tagger')  # POS tagger

### WordNet 

In [3]:
from nltk.corpus import wordnet

# Each synset is one sense of "happy"; lemma_names() lists the words that express that sense
synonyms = wordnet.synsets("happy")
print([syn.lemma_names() for syn in synonyms])

[['happy'], ['felicitous', 'happy'], ['glad', 'happy'], ['happy', 'well-chosen']]


### Download Required Packages

In [10]:
import nltk
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker')
nltk.download('words')  # Required for NE chunking
nltk.download('averaged_perceptron_tagger')  # Ensure POS tagger is downloaded
nltk.download('punkt')

[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\User\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [12]:
nltk.data.find('chunkers/maxent_ne_chunker')
nltk.data.find('corpora/words')
nltk.data.find('taggers/averaged_perceptron_tagger')
nltk.data.find('tokenizers/punkt')

FileSystemPathPointer('C:\\Users\\User\\AppData\\Roaming\\nltk_data\\tokenizers\\punkt')
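`nltk.data.find` raises `LookupError` when a resource is not installed, so the checks above stop at the first missing item and only the last pointer is displayed. A small sketch that reports every missing resource at once; the helper name `missing_resources` is hypothetical, and the resource paths are the ones checked above.

```python
import nltk

# Resource paths checked in the cell above
REQUIRED = [
    "chunkers/maxent_ne_chunker",
    "corpora/words",
    "taggers/averaged_perceptron_tagger",
    "tokenizers/punkt",
]

def missing_resources(paths):
    """Return the resource paths that nltk.data.find cannot locate."""
    missing = []
    for path in paths:
        try:
            nltk.data.find(path)
        except LookupError:
            missing.append(path)
    return missing

missing = missing_resources(REQUIRED)
print(missing)  # an empty list means everything was found
```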

In [14]:
import nltk
print(nltk.data.path)

['C:\\Users\\User/nltk_data', 'C:\\Users\\User\\AppData\\Local\\Programs\\Python\\Python311\\nltk_data', 'C:\\Users\\User\\AppData\\Local\\Programs\\Python\\Python311\\share\\nltk_data', 'C:\\Users\\User\\AppData\\Local\\Programs\\Python\\Python311\\lib\\nltk_data', 'C:\\Users\\User\\AppData\\Roaming\\nltk_data', 'C:\\nltk_data', 'D:\\nltk_data', 'E:\\nltk_data']
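`nltk.data.path` is an ordinary Python list that NLTK searches in order, so a project can add its own data folder. A minimal sketch, assuming a hypothetical `nltk_data` folder in the current working directory; `nltk.download` accepts a `download_dir` argument for saving resources there.

```python
import os
import nltk

# Hypothetical project-local data folder; adjust to your setup
local_dir = os.path.join(os.getcwd(), "nltk_data")

# Insert at the front so NLTK checks this folder before the default locations
if local_dir not in nltk.data.path:
    nltk.data.path.insert(0, local_dir)

print(nltk.data.path[:2])

# To save a resource into that folder instead of the default location:
# nltk.download("punkt", download_dir=local_dir)
```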


## Stemming a List of Words

In [23]:
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Step 1: Define a list of words
words = ['change', 'changing', 'changes', 'changed']

# Step 2: Initialize PorterStemmer
p = PorterStemmer()

# Step 3: Stem each word in the list
print(f"{'Original Word':<15} | {'Stemmed Word':<15}")
print("-" * 33)
for w in words:
    print(f"{w:<15} | {p.stem(w):<15}")

Original Word   | Stemmed Word   
---------------------------------
change          | chang          
changing        | chang          
changes         | chang          
changed         | chang          
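All four forms collapse to the single stem `chang`, which is the point of stemming: variants can be counted or matched as one term. A short sketch that groups words by their Porter stem; the extra words `runs` and `running` are added for illustration.

```python
from collections import defaultdict
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["change", "changing", "changes", "changed", "runs", "running"]

# Map each stem to the surface forms that collapse onto it
groups = defaultdict(list)
for w in words:
    groups[stemmer.stem(w)].append(w)

for stem, forms in groups.items():
    print(stem, "->", forms)
```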


## Stemming a Sentence

In [22]:
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

# Define a sentence
sentence = "The constant flux of life necessitates embracing change, whether it's adapting to the changes around us or actively changing ourselves to meet new challenges."

# Tokenize the sentence into words
tokens = word_tokenize(sentence)

# Initialize PorterStemmer
stemmer = PorterStemmer()

# Apply stemming to each token and print in a structured format
print(f"{'Original Word':<15} | {'Stemmed Word':<15}")
print("-" * 33)
for token in tokens:
    print(f"{token:<15} | {stemmer.stem(token):<15}")

Original Word   | Stemmed Word   
---------------------------------
The             | the            
constant        | constant       
flux            | flux           
of              | of             
life            | life           
necessitates    | necessit       
embracing       | embrac         
change          | chang          
,               | ,              
whether         | whether        
it              | it             
's              | 's             
adapting        | adapt          
to              | to             
the             | the            
changes         | chang          
around          | around         
us              | us             
or              | or             
actively        | activ          
changing        | chang          
ourselves       | ourselv        
to              | to             
meet            | meet           
new             | new            
challenges      | challeng       
.               | .              
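The table shows two practical points: punctuation and clitic tokens (`,`, `'s`, `.`) pass through the stemmer unchanged, and several surface forms merge into one stem (`change`, `changes`, `changing` → `chang`). The sketch below drops non-alphabetic tokens and compares vocabulary size before and after stemming. It assumes `TreebankWordTokenizer` as a stand-in for `word_tokenize` (no Punkt download needed); on this sentence it should split punctuation and `it's` the same way.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import TreebankWordTokenizer

sentence = ("The constant flux of life necessitates embracing change, whether it's "
            "adapting to the changes around us or actively changing ourselves to meet new challenges.")

tokens = TreebankWordTokenizer().tokenize(sentence)
stemmer = PorterStemmer()

# Keep only purely alphabetic tokens (drops ',', '.', "'s") and lowercase them
words = [t.lower() for t in tokens if t.isalpha()]
stems = [stemmer.stem(w) for w in words]

print("distinct words:", len(set(words)))
print("distinct stems:", len(set(stems)))
```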


In [1]:
from nltk.stem import PorterStemmer, LancasterStemmer, SnowballStemmer
from nltk.tokenize import word_tokenize

# Example sentence
sentence = "running runner runs easily faster"

# Tokenize sentence
words = word_tokenize(sentence)

# 1. Porter Stemmer
porter = PorterStemmer()
porter_stems = [porter.stem(word) for word in words]
print("Porter Stemmer:", porter_stems)

# 2. Lancaster Stemmer
lancaster = LancasterStemmer()
lancaster_stems = [lancaster.stem(word) for word in words]
print("Lancaster Stemmer:", lancaster_stems)

# 3. Snowball Stemmer
snowball = SnowballStemmer("english")
snowball_stems = [snowball.stem(word) for word in words]
print("Snowball Stemmer:", snowball_stems)

Porter Stemmer: ['run', 'runner', 'run', 'easili', 'faster']
Lancaster Stemmer: ['run', 'run', 'run', 'easy', 'fast']
Snowball Stemmer: ['run', 'runner', 'run', 'easili', 'faster']
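On this input, Lancaster is the most aggressive of the three: it merges `runner` with `run` and cuts `easily`/`faster` to `easy`/`fast`, whereas Porter and Snowball agree. A quick way to compare aggressiveness is to count distinct stems per stemmer. This sketch uses `str.split()` instead of `word_tokenize` only so that it runs without the Punkt data; the sentence has no punctuation, so the tokens are the same.

```python
from nltk.stem import LancasterStemmer, PorterStemmer, SnowballStemmer

words = "running runner runs easily faster".split()

stemmers = {
    "Porter": PorterStemmer(),
    "Lancaster": LancasterStemmer(),
    "Snowball": SnowballStemmer("english"),
}

# Fewer distinct stems means more forms were merged, i.e. more aggressive stemming
counts = {}
for name, stemmer in stemmers.items():
    stems = {stemmer.stem(w) for w in words}
    counts[name] = len(stems)
    print(f"{name:<10} {len(stems)} distinct stems: {sorted(stems)}")
```

More aggressive merging helps recall in search but can conflate unrelated words, so the choice depends on the task.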


In [18]:
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer

# Define a sentence
sentence = "The constant flux of life necessitates embracing change, whether it's adapting to the changes around us or actively changing ourselves to meet new challenges."

# Tokenize the sentence into words
tokens = word_tokenize(sentence)

# Initialize PorterStemmer
stemmer = PorterStemmer()

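### Tokenization best approach: `word_tokenize`

`word_tokenize` separates punctuation and clitics from words (`change,` → `change`, `,`; `it's` → `it`, `'s`), so the stemmer receives clean tokens.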

In [17]:
# Apply stemming to each token
print(f"{'Original Word':<15} | Stemmed Word")
print("-" * 30)
for token in tokens:
    print(f"{token:<15} | {stemmer.stem(token)}")

Original Word   | Stemmed Word
------------------------------
The             | the
constant        | constant
flux            | flux
of              | of
life            | life
necessitates    | necessit
embracing       | embrac
change          | chang
,               | ,
whether         | whether
it              | it
's              | 's
adapting        | adapt
to              | to
the             | the
changes         | chang
around          | around
us              | us
or              | or
actively        | activ
changing        | chang
ourselves       | ourselv
to              | to
meet            | meet
new             | new
challenges      | challeng
.               | .


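### Using the `split()` method

`str.split()` splits on whitespace only, so punctuation stays attached to neighboring words (`'States.'`, `'change,'`, `'challenges.'`) and contractions such as `"it's"` remain single tokens.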

In [19]:
sen = "Barack Obama was the 44th President of the United States."
sen.split()

['Barack',
 'Obama',
 'was',
 'the',
 '44th',
 'President',
 'of',
 'the',
 'United',
 'States.']

In [20]:
sentence.split()

['The',
 'constant',
 'flux',
 'of',
 'life',
 'necessitates',
 'embracing',
 'change,',
 'whether',
 "it's",
 'adapting',
 'to',
 'the',
 'changes',
 'around',
 'us',
 'or',
 'actively',
 'changing',
 'ourselves',
 'to',
 'meet',
 'new',
 'challenges.']
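To see exactly where whitespace splitting falls short, the two token lists can be compared directly. A sketch, assuming `TreebankWordTokenizer` as a download-free stand-in for `word_tokenize`:

```python
from nltk.tokenize import TreebankWordTokenizer

sentence = ("The constant flux of life necessitates embracing change, whether it's "
            "adapting to the changes around us or actively changing ourselves to meet new challenges.")

split_tokens = sentence.split()
treebank_tokens = TreebankWordTokenizer().tokenize(sentence)

# Tokens produced by split() that the tokenizer breaks apart further
print([t for t in split_tokens if t not in treebank_tokens])
```

These are precisely the tokens a stemmer would mishandle, since a suffix rule cannot match `change` when it is followed by a comma.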