### Stop Words in NLP

Stop words are very common words such as "the", "is", "in", "on", "and", and "to" that appear in almost every sentence but carry little meaning on their own. In NLP, we often remove them because doing so:
- Cuts noise that does not help convey the main message.
- Reduces vocabulary size, which makes models faster.
- Can improve accuracy for tasks like text classification, keyword extraction, and topic modeling.
- Helps search engines focus on the important words.

However, stop words should not be removed when they influence meaning (e.g., "not good", "not bad") or in tasks like question answering. Overall, stop words are removed to make text cleaner and faster to process, so that more of the remaining tokens carry meaning.
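
As a minimal sketch of the trade-off described above, the snippet below filters tokens against a tiny hand-written stop list. The `STOP_WORDS` set and the `remove_stop_words` helper are illustrative assumptions, not NLTK's list or API. It shows how dropping "not" can make two opposite reviews look identical.

```python
# Hypothetical, hand-picked stop list for illustration only;
# NLTK's English list is much longer.
STOP_WORDS = {"the", "is", "in", "on", "and", "to", "not"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase, split on whitespace, and drop tokens found in stop_words."""
    return [tok for tok in text.lower().split() if tok not in stop_words]

positive = remove_stop_words("The food is good")
negative = remove_stop_words("The food is not good")
print(positive)  # ['food', 'good']
print(negative)  # ['food', 'good']  -> the negation is lost

# Keeping negation words out of the stop list preserves the difference.
keep_negation = STOP_WORDS - {"not"}
kept = remove_stop_words("The food is not good", keep_negation)
print(kept)  # ['food', 'not', 'good']
```

For sentiment-style tasks, one option is to start from a standard stop list and subtract negation words in this way before filtering.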

In [54]:
paragraph="""In today's fast-moving world, people are constantly looking for better ways to stay healthy and productive. 
While many individuals try to follow a balanced diet, the reality is that most of them struggle to maintain consistency. 
The pressure of work, family responsibilities, and daily travel often makes it difficult to focus on proper nutrition. 
Because of this, a large number of people depend on quick snacks and ready-made meals, even though they know these foods are not always the best choices. 
To improve their lifestyle, some people have started exploring options like millet-based foods, natural sweeteners, and homemade snacks. 
These alternatives are becoming popular because they are easy to prepare, gentle on the stomach, and provide long-lasting energy. 
At the same time, technology plays an important role in how people make food decisions. 
The recommendations they see on their phones, the reviews they read online, and the advertisements they watch on social media all influence what they buy. 
With so much information available, learning how to clean text, remove stop words, and analyze data helps researchers understand people's real preferences more accurately.
"""

In [17]:
from nltk.corpus import stopwords 
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Lenovo\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
# List the English stop words that ship with NLTK
stopwords.words('english') 

In [32]:
# Split the paragraph into sentences using sent_tokenize
from nltk.tokenize import sent_tokenize
sentences=sent_tokenize(paragraph)

In [10]:
sentences

["In today's fast-moving world, people are constantly looking for better ways to stay healthy and productive.",
 'While many individuals try to follow a balanced diet, the reality is that most of them struggle to maintain consistency.',
 'The pressure of work, family responsibilities, and daily travel often makes it difficult to focus on proper nutrition.',
 'Because of this, a large number of people depend on quick snacks and ready-made meals, even though they know these foods are not always the best choices.',
 'To improve their lifestyle, some people have started exploring options like millet-based foods, natural sweeteners, and homemade snacks.',
 'These alternatives are becoming popular because they are easy to prepare, gentle on the stomach, and provide long-lasting energy.',
 'At the same time, technology plays an important role in how people make food decisions.',
 'The recommendations they see on their phones, the reviews they read online, and the advertisements they watch on 

In [33]:
# Plan: convert each sentence into words,
# check whether each word is a stop word,
# and apply stemming if it is not a stop word.
from nltk.tokenize import word_tokenize


In [34]:
sentences # with all the stop words

["In today's fast-moving world, people are constantly looking for better ways to stay healthy and productive.",
 'While many individuals try to follow a balanced diet, the reality is that most of them struggle to maintain consistency.',
 'The pressure of work, family responsibilities, and daily travel often makes it difficult to focus on proper nutrition.',
 'Because of this, a large number of people depend on quick snacks and ready-made meals, even though they know these foods are not always the best choices.',
 'To improve their lifestyle, some people have started exploring options like millet-based foods, natural sweeteners, and homemade snacks.',
 'These alternatives are becoming popular because they are easy to prepare, gentle on the stomach, and provide long-lasting energy.',
 'At the same time, technology plays an important role in how people make food decisions.',
 'The recommendations they see on their phones, the reviews they read online, and the advertisements they watch on 

## Porter Stemmer

In [45]:
from nltk.corpus import stopwords 
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer 
stemming=PorterStemmer()
stop_words=set(stopwords.words('english')) # build the set once; membership tests on a set are fast
sentences=sent_tokenize(paragraph)
for i in range(len(sentences)):
   words=word_tokenize(sentences[i])
   # Keep only the words that are not stop words, then stem them.
   # Note: the check is case-sensitive, so capitalized words like "In" or "The" are kept.
   words=[stemming.stem(word) for word in words if word not in stop_words]
   sentences[i]=' '.join(words) # join the stemmed words back into a sentence

In [46]:
sentences # stop words are removed and words are stemmed

["in today 's fast-mov world , peopl constantli look better way stay healthi product .",
 'while mani individu tri follow balanc diet , realiti struggl maintain consist .',
 'the pressur work , famili respons , daili travel often make difficult focu proper nutrit .',
 'becaus , larg number peopl depend quick snack ready-mad meal , even though know food alway best choic .',
 'to improv lifestyl , peopl start explor option like millet-bas food , natur sweeten , homemad snack .',
 'these altern becom popular easi prepar , gentl stomach , provid long-last energi .',
 'at time , technolog play import role peopl make food decis .',
 'the recommend see phone , review read onlin , advertis watch social media influenc buy .',
 "with much inform avail , learn clean text , remov stop word , analyz data help research understand peopl 's real prefer accur ."]
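
The output above still begins with words such as "in", "while", and "the". A likely reason is that the stop-word check compares the original, capitalized token ("In", "The") against a lowercase list, and the stemmer lowercases the token only afterwards. The sketch below illustrates the difference with a small hypothetical stop list rather than NLTK's.

```python
# Hypothetical mini stop list; assumed to be lowercase like NLTK's English list.
STOP_WORDS = {"in", "the", "of", "to", "on"}

tokens = ["The", "pressure", "of", "work"]

# Case-sensitive check: "The" is not in the lowercase set, so it survives.
case_sensitive = [t.lower() for t in tokens if t not in STOP_WORDS]

# Case-insensitive check: lowercase the token before testing membership.
case_insensitive = [t.lower() for t in tokens if t.lower() not in STOP_WORDS]

print(case_sensitive)    # ['the', 'pressure', 'work']
print(case_insensitive)  # ['pressure', 'work']
```

Whether to lowercase before filtering is a design choice: it removes sentence-initial stop words, but it also discards capitalization that some tasks (for example, named-entity recognition) may rely on.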

## Snowball Stemmer

In [47]:
from nltk.corpus import stopwords 
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize
from nltk.stem import SnowballStemmer 
snowball=SnowballStemmer('english') # Snowball (Porter2) stemmer, a later revision of the Porter algorithm
stop_words=set(stopwords.words('english')) # build the set once; membership tests on a set are fast
sentences=sent_tokenize(paragraph)
for i in range(len(sentences)):
   words=word_tokenize(sentences[i])
   # Keep only the words that are not stop words, then stem them.
   # Note: the check is case-sensitive, so capitalized words like "In" or "The" are kept.
   words=[snowball.stem(word) for word in words if word not in stop_words]
   sentences[i]=' '.join(words) # join the stemmed words back into a sentence

In [48]:
sentences

["in today 's fast-mov world , peopl constant look better way stay healthi product .",
 'while mani individu tri follow balanc diet , realiti struggl maintain consist .',
 'the pressur work , famili respons , daili travel often make difficult focus proper nutrit .',
 'becaus , larg number peopl depend quick snack ready-mad meal , even though know food alway best choic .',
 'to improv lifestyl , peopl start explor option like millet-bas food , natur sweeten , homemad snack .',
 'these altern becom popular easi prepar , gentl stomach , provid long-last energi .',
 'at time , technolog play import role peopl make food decis .',
 'the recommend see phone , review read onlin , advertis watch social media influenc buy .',
 "with much inform avail , learn clean text , remov stop word , analyz data help research understand peopl 's real prefer accur ."]
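
The two stemmers agree on most of this paragraph. To see where they differ, the sketch below compares two of the output sentences shown above token by token. The strings are copied from the Porter and Snowball outputs; `diff_tokens` is an illustrative helper name, not part of NLTK.

```python
# Sentences copied from the Porter and Snowball outputs shown above.
porter_out = [
    "in today 's fast-mov world , peopl constantli look better way stay healthi product .",
    "the pressur work , famili respons , daili travel often make difficult focu proper nutrit .",
]
snowball_out = [
    "in today 's fast-mov world , peopl constant look better way stay healthi product .",
    "the pressur work , famili respons , daili travel often make difficult focus proper nutrit .",
]

def diff_tokens(a_sentences, b_sentences):
    """Return (a_token, b_token) pairs where the two outputs disagree."""
    diffs = []
    for a, b in zip(a_sentences, b_sentences):
        for tok_a, tok_b in zip(a.split(), b.split()):
            if tok_a != tok_b:
                diffs.append((tok_a, tok_b))
    return diffs

differences = diff_tokens(porter_out, snowball_out)
print(differences)  # [('constantli', 'constant'), ('focu', 'focus')]
```

In this sample the only differences are these two words; the other seven sentences are identical between the two outputs.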

## WordNet Lemmatizer

In [60]:
from nltk.corpus import stopwords 
from nltk.tokenize import sent_tokenize
from nltk.tokenize import wordpunct_tokenize
from nltk.stem import WordNetLemmatizer
# The lemmatizer needs the WordNet data; run nltk.download('wordnet') once if it is not installed.
lemmatizer=WordNetLemmatizer() # lemmatization instead of stemming, so the results are dictionary words
stop_words=set(stopwords.words('english')) # build the set once; membership tests on a set are fast
sentences=sent_tokenize(paragraph) 
for i in range(len(sentences)):
   words=wordpunct_tokenize(sentences[i])
   # Keep only the words that are not stop words (case-sensitive check), then lemmatize them.
   # pos='v' treats every word as a verb, so plural nouns such as "ways" are left unchanged.
   words=[lemmatizer.lemmatize(word.lower(),pos='v') for word in words if word not in stop_words]
   sentences[i]=' '.join(words) # join the lemmatized words back into a sentence

In [61]:
sentences

["in today ' fast - move world , people constantly look better ways stay healthy productive .",
 'while many individuals try follow balance diet , reality struggle maintain consistency .',
 'the pressure work , family responsibilities , daily travel often make difficult focus proper nutrition .',
 'because , large number people depend quick snack ready - make meals , even though know foods always best choices .',
 'to improve lifestyle , people start explore options like millet - base foods , natural sweeteners , homemade snack .',
 'these alternatives become popular easy prepare , gentle stomach , provide long - last energy .',
 'at time , technology play important role people make food decisions .',
 'the recommendations see phone , review read online , advertisements watch social media influence buy .',
 "with much information available , learn clean text , remove stop word , analyze data help researchers understand people ' real preferences accurately ."]
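
The isolated `'` and `-` tokens in the output above come from `wordpunct_tokenize`, which splits text into runs of word characters and runs of punctuation. A rough standard-library stand-in is sketched below; the regular expression is assumed to mirror the pattern used by NLTK's `WordPunctTokenizer`, and the helper name `wordpunct_like` is illustrative.

```python
import re

def wordpunct_like(text):
    """Split into alphanumeric runs and punctuation runs, dropping whitespace."""
    return re.findall(r"\w+|[^\w\s]+", text)

tokens = wordpunct_like("In today's fast-moving world,")
print(tokens)
# ['In', 'today', "'", 's', 'fast', '-', 'moving', 'world', ',']
```

The lone `s` is then presumably dropped by the stop-word filter, which would explain why the output shows `today '` with nothing after the apostrophe. Using `word_tokenize`, as in the stemming cells, keeps `'s` and hyphenated words together instead.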