# Text Processing by Removing Stop Words


- **Definition**: Commonly used words (e.g., *the, is, and, of*) that carry little lexical meaning.
- **Purpose of removal**: Reduces noise and lowers the dimensionality of the feature space, which often improves text-based models.
- **Language dependence**: Each language (English, Spanish, etc.) has its own stop-word list.
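
As a minimal sketch of the idea, the snippet below filters a sentence against a tiny hand-written stop-word set. The names `SAMPLE_STOP_WORDS` and `remove_stop_words` are made up for this illustration, and the whitespace split is a deliberate simplification; the cells that follow use NLTK's list and tokenizers instead.

```python
# A tiny, hand-picked stop-word set for illustration only (an assumption;
# real lists such as NLTK's are much longer).
SAMPLE_STOP_WORDS = {"the", "is", "and", "of", "a", "to"}


def remove_stop_words(text, stop_words=SAMPLE_STOP_WORDS):
    """Return the lowercased tokens of `text` that are not stop words."""
    tokens = text.lower().split()  # naive whitespace tokenization
    return [token for token in tokens if token not in stop_words]


filtered = remove_stop_words("Freedom is the oxygen of the soul")
print(filtered)  # ['freedom', 'oxygen', 'soul']
```

Lowercasing before the lookup matters because stop-word lists are usually stored in lowercase.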


In [5]:
# Speech of APJ Abdul Kalam on freedom

speech = """Dear friends, I am very happy to be with you all on this special occasion. Today, we celebrate freedom, a value that is deeply cherished in our hearts. Freedom is not just a word; it is a way of life that empowers us to dream, to innovate, and to create a better future for ourselves and our nation.
As we gather here, let us remember the sacrifices made by our forefathers who fought tirelessly for our independence. Their courage and determination have paved the way for us to enjoy the liberties we have today. It is our duty to honor their legacy by upholding the principles of freedom, equality, and justice in our daily lives.
Freedom also comes with responsibilities. It is essential that we use our freedom wisely, respecting the rights of others and contributing positively to society. Let us embrace freedom with a sense of purpose, working together to build a nation that thrives on unity, diversity, and progress.
On this day, let us renew our commitment to the ideals of freedom and strive to make our country a beacon of hope and opportunity for all. Together, we can create a future where every individual has the freedom to pursue their dreams and aspirations.
Freedom is the oxygen of the soul, and as we breathe in this freedom, let us also breathe out love, compassion, and understanding towards one another.
Children are the future of our nation, and it is our responsibility to ensure that they grow up in an environment where freedom is cherished and protected. Let us teach them the value of freedom and the importance of using it for the greater good.
In conclusion, let us celebrate freedom with gratitude and pride. May we always remember that freedom is a precious gift that we must cherish and protect for ourselves and future generations.
Nation is our collective home, and freedom is the foundation upon which we build our dreams. Let us work together to create a future where freedom flourishes and every citizen can live with dignity and respect.
India is a land of diversity, and our freedom is a testament to the strength that comes from embracing our differences. Let us celebrate this diversity and work towards a harmonious society where everyone can enjoy their freedoms without fear or discrimination.
Thank you, and Jai Hind!
"""


In [6]:
# Download the NLTK stop-words corpus
import nltk
nltk.download('stopwords')


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\manis\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [7]:
from nltk.corpus import stopwords  # Import the stopwords corpus
stop_words = stopwords.words('english')  # Get the list of English stop words
stop_words


['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on
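
The default list includes negations such as 'no', 'nor', and 'not' (visible in the output above). For tasks where negation carries meaning, it may be worth keeping those words. The sketch below shows one way to do that with a set difference. It works on a short excerpt of the list so that it runs on its own; `stop_words_excerpt` and `NEGATIONS` are names chosen for this example, not part of NLTK.

```python
# A small excerpt of the English list shown above; in the notebook you would
# start from the full `stop_words` list instead.
stop_words_excerpt = ["a", "about", "is", "it", "no", "nor", "not", "of"]

# Hypothetical choice of words to keep; adjust to the task at hand.
NEGATIONS = {"no", "nor", "not"}

# Converting to a set also makes membership tests fast.
custom_stop_set = set(stop_words_excerpt) - NEGATIONS

tokens = ["it", "is", "not", "a", "question", "of", "luck"]
kept = [token for token in tokens if token not in custom_stop_set]
print(kept)  # ['not', 'question', 'luck']
```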

In [8]:
# Download the Punkt data used for sentence tokenization
import nltk
nltk.download('punkt')
nltk.download('punkt_tab')  # Needed by newer NLTK releases


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\manis\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\manis\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [17]:
from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize(speech) # Tokenize the speech into sentences
sentences


['Dear friends, I am very happy to be with you all on this special occasion.',
 'Today, we celebrate freedom, a value that is deeply cherished in our hearts.',
 'Freedom is not just a word; it is a way of life that empowers us to dream, to innovate, and to create a better future for ourselves and our nation.',
 'As we gather here, let us remember the sacrifices made by our forefathers who fought tirelessly for our independence.',
 'Their courage and determination have paved the way for us to enjoy the liberties we have today.',
 'It is our duty to honor their legacy by upholding the principles of freedom, equality, and justice in our daily lives.',
 'Freedom also comes with responsibilities.',
 'It is essential that we use our freedom wisely, respecting the rights of others and contributing positively to society.',
 'Let us embrace freedom with a sense of purpose, working together to build a nation that thrives on unity, diversity, and progress.',
 'On this day, let us renew our commit
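
To give a feel for what the tokenizers do, here is a rough regex-based approximation. It is not the Punkt algorithm and it will mis-split abbreviations such as "Dr."; the function names `naive_sent_tokenize` and `naive_word_tokenize` are invented for this sketch. It does show why punctuation marks end up as separate tokens in later outputs.

```python
import re


def naive_sent_tokenize(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def naive_word_tokenize(sentence):
    """Return words (allowing internal apostrophes) and single punctuation marks."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)


sample = "Freedom also comes with responsibilities. Thank you, and Jai Hind!"
sample_sentences = naive_sent_tokenize(sample)
print(sample_sentences)
print(naive_word_tokenize(sample_sentences[1]))
# ['Thank', 'you', ',', 'and', 'Jai', 'Hind', '!']
```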

In [25]:
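from nltk.stem import PorterStemmer

ps = PorterStemmer()
stop_set = set(stop_words)  # Build the set once instead of once per word

# Note: this loop overwrites `sentences` in place, so re-running the cell
# stems already-stemmed text, which would explain truncated forms such as
# 'occa' and 'respon' in the output below.
for i in range(len(sentences)):
    words = word_tokenize(sentences[i])
    # Drop stop words (case-insensitively), then stem the remaining tokens
    words = [ps.stem(word) for word in words if word.lower() not in stop_set]
    sentences[i] = ' '.join(words)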
  
  

In [27]:
sentences

['dear friend , happi special occa .',
 'today , celebr freedom , valu deepli cherish heart .',
 'freedom word ; way life empow us dream , innov , creat better futur nation .',
 'gather , let us rememb sacrif made forefath fought tirelessli independ .',
 'courag determin pave way us enjoy liberti today .',
 'duti honor legaci uphold principl freedom , equal , justic daili live .',
 'freedom also come respon .',
 'essenti use freedom wise , respect right contribut posit societi .',
 'let us embrac freedom sen purpo , work togeth build nation thrive uniti , diver , progress .',
 'day , let us renew commit ideal freedom strive make countri beacon hope opportun .',
 'togeth , creat futur everi individu freedom pursu dream aspir .',
 'freedom oxygen soul , breath freedom , let us also breath love , compass , understand toward one anoth .',
 'children futur nation , respon ensur grow environ freedom cherish protect .',
 'let us teach valu freedom import use greater good .',
 'conclu , let us
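
Because the loop above writes its results back into `sentences`, the original text is lost and every later cell works on stemmed input. One possible alternative, sketched below, is to return a new list and leave the input untouched. `preprocess` and `toy_stem` are hypothetical names for this illustration; `toy_stem` is only a stand-in so the snippet runs without NLTK, and in the notebook you would pass `ps.stem` and tokenize with `word_tokenize` instead.

```python
def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def preprocess(sentences, stem, stop_set):
    """Return new, processed sentences; the input list is left untouched."""
    processed = []
    for sentence in sentences:
        words = sentence.split()  # whitespace split keeps the sketch stdlib-only
        kept = [stem(word) for word in words if word.lower() not in stop_set]
        processed.append(" ".join(kept))
    return processed


raw = ["the rights of others", "freedom comes with responsibilities"]
stemmed = preprocess(raw, toy_stem, {"the", "of", "with"})
print(stemmed)  # ['right other', 'freedom come responsibilitie']
print(raw)      # unchanged
```

With this shape, different stemmers can be compared on the same raw sentences rather than on each other's output.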

In [29]:
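# Snowball stemmer
# Note: `sentences` already holds the Porter-stemmed text from the previous
# cell, so this pass stems stems rather than the original words.

from nltk.stem import SnowballStemmer
ss = SnowballStemmer("english")
stop_set = set(stop_words)  # Build the set once instead of once per word
for i in range(len(sentences)):
    words = word_tokenize(sentences[i])
    words = [ss.stem(word) for word in words if word.lower() not in stop_set]
    sentences[i] = ' '.join(words)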
  

In [30]:
sentences

['dear friend , happi special occa .',
 'today , celebr freedom , valu deepli cherish heart .',
 'freedom word ; way life empow us dream , innov , creat better futur nation .',
 'gather , let us rememb sacrif made forefath fought tireless independ .',
 'courag determin pave way us enjoy liberti today .',
 'duti honor legaci uphold principl freedom , equal , justic daili live .',
 'freedom also come respon .',
 'essenti use freedom wise , respect right contribut posit societi .',
 'let us embrac freedom sen purpo , work togeth build nation thrive uniti , diver , progress .',
 'day , let us renew commit ideal freedom strive make countri beacon hope opportun .',
 'togeth , creat futur everi individu freedom pursu dream aspir .',
 'freedom oxygen soul , breath freedom , let us also breath love , compass , understand toward one anoth .',
 'children futur nation , respon ensur grow environ freedom cherish protect .',
 'let us teach valu freedom import use greater good .',
 'conclu , let us c

In [34]:
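# Lemmatization (requires the WordNet corpus: nltk.download('wordnet'))
# Note: `sentences` was already stemmed by the cells above, so mostly only
# tokens that are still valid words change (e.g. 'made' -> 'make').

from nltk.stem import WordNetLemmatizer
lm = WordNetLemmatizer()
stop_set = set(stop_words)  # Build the set once instead of once per word
for i in range(len(sentences)):
    words = word_tokenize(sentences[i])
    # pos='v' treats every token as a verb
    words = [lm.lemmatize(word, pos='v') for word in words if word.lower() not in stop_set]
    sentences[i] = ' '.join(words)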
  

In [35]:
sentences

['dear friend , happi special occa .',
 'today , celebr freedom , valu deepli cherish heart .',
 'freedom word ; way life empow us dream , innov , creat better futur nation .',
 'gather , let us rememb sacrif make forefath fight tireless independ .',
 'courag determin pave way us enjoy liberti today .',
 'duti honor legaci uphold principl freedom , equal , justic daili live .',
 'freedom also come respon .',
 'essenti use freedom wise , respect right contribut posit societi .',
 'let us embrac freedom sen purpo , work togeth build nation thrive uniti , diver , progress .',
 'day , let us renew commit ideal freedom strive make countri beacon hope opportun .',
 'togeth , creat futur everi individu freedom pursu dream aspir .',
 'freedom oxygen soul , breath freedom , let us also breath love , compass , understand toward one anoth .',
 'children futur nation , respon ensur grow environ freedom cherish protect .',
 'let us teach valu freedom import use greater good .',
 'conclu , let us ce
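
The processed sentences still contain punctuation tokens such as ',' and '.'. If the goal is to reduce noise before building features, one option is to drop non-alphabetic tokens and then count what remains. The sketch below does this with two of the processed sentences from the output above; `processed` and `counts` are names chosen for this example.

```python
from collections import Counter

# Two processed sentences copied from the output above.
processed = [
    "today , celebr freedom , valu deepli cherish heart .",
    "freedom also come respon .",
]

# Keep alphabetic tokens only, which drops ',' and '.'.
tokens = [token for sentence in processed
          for token in sentence.split() if token.isalpha()]

counts = Counter(tokens)
print(counts.most_common(1))  # [('freedom', 2)]
```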