#### Stopwords


Stopwords are common words in a language that are often filtered out during natural language processing tasks because they typically don't carry significant meaning on their own. These words include articles (e.g., "a," "an," "the"), conjunctions (e.g., "and," "but," "or"), prepositions (e.g., "in," "on," "at"), and other frequently occurring words (e.g., "is," "are," "to").
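
Here is a minimal sketch in plain Python. It assumes a tiny hand-picked stopword set made only of the example words above; NLTK's real list is much larger. Filtering then reduces to a set-membership test on each lowercased token:

```python
# Tiny stopword set taken from the examples above (an assumption for
# illustration; NLTK's English list is far larger).
STOP_WORDS = {
    "a", "an", "the",      # articles
    "and", "but", "or",    # conjunctions
    "in", "on", "at",      # prepositions
    "is", "are", "to",     # other frequent words
}


def remove_stopwords(tokens, stop_words=STOP_WORDS):
    """Keep tokens whose lowercased form is not in the stopword set."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


tokens = "The cat is on the mat and the dog is in the garden".split()
print(remove_stopwords(tokens))  # ['cat', 'mat', 'dog', 'garden']
```

A `set` gives an average constant-time lookup per token. Lowercasing before the check means "The" is filtered out just like "the".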

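In natural language processing (NLP), stopwords are often removed from text before analysis. Dropping them reduces the number of tokens, and the size of the vocabulary, that downstream algorithms have to process. It also prevents frequency-based methods such as bag-of-words counts from being dominated by words like "the" and "is". What remains are mostly the content words that convey the main meaning of the text. Whether removal also improves accuracy depends on the task. In sentiment analysis, for example, discarding a negation such as "not" can reverse the meaning of a sentence.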
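
The following plain-Python sketch makes the efficiency point concrete. The sentence and the stopword subset are made up for illustration. It counts the tokens and the most frequent words before and after filtering:

```python
from collections import Counter

# Illustrative stopword subset (an assumption; not NLTK's list).
STOP_WORDS = {"a", "an", "the", "and", "in", "is", "to"}

text = "the model reads the text and the model counts the words in the text"
tokens = text.split()
content = [tok for tok in tokens if tok not in STOP_WORDS]

print(len(tokens), len(content))        # 14 7
print(Counter(tokens).most_common(1))   # [('the', 5)]
print(Counter(content).most_common(2))  # [('model', 2), ('text', 2)]
```

Half of the tokens disappear. The most frequent term changes from "the" to a content word.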

NLTK (Natural Language Toolkit) provides a predefined list of stopwords for various languages, which can be used to filter out these common words from text data. This filtering process is often performed as a preprocessing step before tasks like text classification, sentiment analysis, or information retrieval.
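
Below is a rough sketch of stopword removal used as a preprocessing step. It produces term counts that a downstream task such as a text classifier could consume. The regex tokenizer and the stopword subset are assumptions that stand in for `nltk.word_tokenize` and NLTK's list, so the snippet runs without any downloads:

```python
import re
from collections import Counter

# Hypothetical stopword subset for this sketch (NLTK's English list is larger).
STOP_WORDS = {"a", "an", "the", "and", "but", "or", "in", "on", "at", "is", "are", "to", "of"}


def preprocess(text):
    """Lowercase, split into runs of letters/digits/apostrophes, drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]


def bag_of_words(text):
    """Term counts that a downstream classifier could use as features."""
    return Counter(preprocess(text))


review = "The plot is great, and the acting is great too."
features = bag_of_words(review)
print(preprocess(review))  # ['plot', 'great', 'acting', 'great', 'too']
print(features["great"])   # 2
```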

In [1]:
## Speech of Dr. APJ Abdul Kalam
paragraph = """
I envision three paths for India's future. Over our 3000-year history, numerous civilizations have invaded our land, imposing their rule and ideologies. Despite this, we have refrained from imposing our will on others, respecting their freedom.

My first vision is centered around preserving this freedom. Our struggle for independence in 1857 marked the beginning of our journey towards this ideal. It is imperative that we safeguard and nurture this freedom, as it is the cornerstone of our identity and dignity.

Moving forward, I envision India as a developed nation. Despite being one of the top five economies globally, we have hesitated to recognize our potential. It is time to shed this doubt and embrace our status as a developed, self-reliant nation.

Lastly, I believe India must assert itself on the global stage. Only by demonstrating strength—both militarily and economically—will we earn respect from the international community. I draw inspiration from the remarkable individuals I have had the privilege to work with, such as Dr. Vikram Sarabhai, Professor Satish Dhawan, and Dr. Brahm Prakash, who have shaped my perspective on leadership and progress."""

In [2]:
from nltk.stem import PorterStemmer

In [3]:
from nltk.corpus import stopwords

In [4]:
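import nltk
for resource in ("punkt", "punkt_tab", "wordnet", "omw-1.4"):  # tokenizer + WordNet data used later
    nltk.download(resource, quiet=True)
nltk.download("stopwords")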

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\mistr\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [5]:
stopwords.words("english")

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each',
 ...]
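
NLTK's full English list (truncated above) also includes "no", "nor" and "not". When such words matter for the task, ordinary set operations can adjust the list. In the notebook the base set would be `set(stopwords.words("english"))`. The sketch below uses a small stand-in set and hypothetical domain words so that it runs on its own:

```python
# Stand-in for set(stopwords.words("english")); a small subset for illustration only.
base_stop_words = {"i", "me", "the", "a", "is", "and", "no", "nor", "not"}

# Keep negations, which can flip the meaning of a phrase (e.g. "not good").
negations = {"no", "nor", "not"}

# Hypothetical domain-specific words to drop in addition to the base list.
domain_stop_words = {"movie", "film"}

custom_stop_words = (base_stop_words - negations) | domain_stop_words

tokens = "the movie is not good".split()
filtered = [tok for tok in tokens if tok not in custom_stop_words]
print(filtered)  # ['not', 'good']
```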

In [9]:
stemmer = PorterStemmer()

In [10]:
sentence = nltk.sent_tokenize(paragraph)  # a list of sentence strings, despite the singular name
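
`sent_tokenize` does more than split on full stops. The paragraph contains "Dr. Vikram Sarabhai", and the stemmed output further below shows that this text is not broken into a new sentence after "Dr.". The toy splitter below is written in plain Python with a hypothetical fixed abbreviation list. It shows why a naive regex split is not enough. NLTK's Punkt tokenizer relies on a pretrained model rather than a hard-coded list like this one:

```python
import re

# Hypothetical, hand-written abbreviation list (Punkt learns such cases instead).
ABBREVIATIONS = {"dr.", "prof.", "mr.", "mrs."}


def naive_split(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def split_sentences(text):
    """Naive split, then re-join pieces that were cut after a known abbreviation."""
    sentences = []
    for piece in naive_split(text):
        if sentences and sentences[-1].split()[-1].lower() in ABBREVIATIONS:
            sentences[-1] += " " + piece
        else:
            sentences.append(piece)
    return sentences


text = "I admire Dr. Vikram Sarabhai. He shaped my view."
print(naive_split(text))      # ['I admire Dr.', 'Vikram Sarabhai.', 'He shaped my view.']
print(split_sentences(text))  # ['I admire Dr. Vikram Sarabhai.', 'He shaped my view.']
```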

In [12]:
type(sentence)

list

In [15]:
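## Filter out stopwords, then apply Porter stemming
stop_words = set(stopwords.words("english"))  # build the lookup set once, not once per token

for i in range(len(sentence)):
    words = nltk.word_tokenize(sentence[i])
    # The check is case-sensitive, so capitalised stopwords such as "I" or "Over" are kept
    # (the stemmer lowercases them afterwards)
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentence[i] = " ".join(words)  # overwrites sentence[i]; later cells see the stemmed text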
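
Because this loop overwrites `sentence[i]`, every later cell works on stemmed text rather than on the original sentences. The following is a minimal sketch of a non-mutating alternative. It uses a hypothetical helper, `normalize_sentences`, that takes the normalizer, the stopword set and the tokenizer as parameters, so each technique can start from the same raw input. Unlike the cell above, it also lowercases tokens before the stopword check. The regex tokenizer is an assumption that stands in for `nltk.word_tokenize`:

```python
import re


def simple_tokenize(text):
    """Stand-in for nltk.word_tokenize: words and punctuation as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize_sentences(sentences, normalize, stop_words, tokenize=simple_tokenize):
    """Return new normalized strings; the input list is left untouched."""
    result = []
    for sent in sentences:
        tokens = [tok for tok in tokenize(sent) if tok.lower() not in stop_words]
        result.append(" ".join(normalize(tok) for tok in tokens))
    return result


raw = ["It is time to shed this doubt.", "Over our history, we refrained."]
stop = {"it", "is", "to", "this", "over", "our", "we"}

lowered = normalize_sentences(raw, str.lower, stop)
print(lowered)  # ['time shed doubt .', 'history , refrained .']
print(raw[0])   # It is time to shed this doubt.
```

In the notebook you might call `normalize_sentences(nltk.sent_tokenize(paragraph), stemmer.stem, set(stopwords.words("english")), tokenize=nltk.word_tokenize)`, then repeat the call with `snowballstemmer.stem`. Both runs would then see identical input.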

In [16]:
sentence

["i envis three path india 's futur .",
 'over 3000-year histori , numer civil invad land , impos rule ideolog .',
 'despit , refrain impos other , respect freedom .',
 'my first vision center around preserv freedom .',
 'our struggl independ 1857 mark begin journey toward ideal .',
 'it imper safeguard nurtur freedom , cornerston ident digniti .',
 'move forward , i envis india develop nation .',
 'despit one top five economi global , hesit recogn potenti .',
 'it time shed doubt embrac statu develop , self-reli nation .',
 'lastli , i believ india must assert global stage .',
 'onli demonstr strength—both militarili economically—wil earn respect intern commun .',
 'i draw inspir remark individu i privileg work , dr. vikram sarabhai , professor satish dhawan , dr. brahm prakash , shape perspect leadership progress .']

In [18]:
from nltk.stem import SnowballStemmer
snowballstemmer = SnowballStemmer('english')

In [19]:
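## Filter out stopwords, then apply Snowball stemming
stop_words = set(stopwords.words("english"))
# `sentence` now holds the Porter-stemmed strings from the previous cell, not the original
# sentences; to compare the two stemmers on the same input, uncomment the next line first:
# sentence = nltk.sent_tokenize(paragraph)

for i in range(len(sentence)):
    words = nltk.word_tokenize(sentence[i])
    words = [snowballstemmer.stem(word) for word in words if word not in stop_words]
    sentence[i] = " ".join(words)  # join the tokens back into one string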
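
To see where two normalizers actually differ, run both on the same raw tokens and keep only the disagreements. The sketch below uses a hypothetical `disagreements` helper. The two toy functions are not real stemmers. They only stand in for `stemmer.stem` and `snowballstemmer.stem`, which you would pass in the notebook:

```python
def disagreements(tokens, first, second):
    """Return (token, first(token), second(token)) wherever the two normalizers differ."""
    rows = []
    for tok in dict.fromkeys(tokens):  # de-duplicate while keeping order
        a, b = first(tok), second(tok)
        if a != b:
            rows.append((tok, a, b))
    return rows


# Toy stand-ins (not real stemmers): one strips a trailing "ly", one leaves words alone.
def strip_ly(word):
    return word[:-2] if word.endswith("ly") else word


def identity(word):
    return word


tokens = ["lastly", "nation", "lastly", "globally"]
print(disagreements(tokens, strip_ly, identity))
# [('lastly', 'last', 'lastly'), ('globally', 'global', 'globally')]
```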

In [20]:
sentence

["i envis three path india 's futur .",
 'over 3000-year histori , numer civil invad land , impos rule ideolog .',
 'despit , refrain impos other , respect freedom .',
 'my first vision center around preserv freedom .',
 'our struggl independ 1857 mark begin journey toward ideal .',
 'it imper safeguard nurtur freedom , cornerston ident digniti .',
 'move forward , i envis india develop nation .',
 'despit one top five economi global , hesit recogn potenti .',
 'it time shed doubt embrac statu develop , self-reli nation .',
 'lastli , i believ india must assert global stage .',
 'onli demonstr strength—both militarili economically—wil earn respect intern commun .',
 'i draw inspir remark individu i privileg work , dr. vikram sarabhai , professor satish dhawan , dr. brahm prakash , shape perspect leadership progress .']

In [21]:
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [28]:
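stop_words = set(stopwords.words("english"))
# `sentence` still holds the stemmed strings from above; uncomment to lemmatize the original text:
# sentence = nltk.sent_tokenize(paragraph)
for i in range(len(sentence)):
    words = nltk.word_tokenize(sentence[i])
    # pos="v" treats every token as a verb; nouns and adjectives may be returned unchanged
    words = [lemmatizer.lemmatize(word.lower(), pos="v") for word in words if word.lower() not in stop_words]
    sentence[i] = " ".join(words)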
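
Passing `pos="v"` for every token treats nouns and adjectives as verbs. A common pattern is to tag each token first and map the Penn Treebank tags to WordNet's part-of-speech codes. `nltk.pos_tag` can do the tagging, though it needs its own tagger data. Below is a minimal sketch of that mapping. The tags in the example are hand-written for illustration. The codes "a", "v", "r" and "n" are the values of `wordnet.ADJ`, `wordnet.VERB`, `wordnet.ADV` and `wordnet.NOUN`:

```python
def treebank_to_wordnet(tag):
    """Map a Penn Treebank tag (e.g. 'VBD', 'NNS') to a WordNet POS code.

    Falls back to 'n' (noun), which is also WordNetLemmatizer's default pos.
    """
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # nouns and everything else


tagged = [("paths", "NNS"), ("invaded", "VBD"), ("global", "JJ"), ("only", "RB")]
print([(word, treebank_to_wordnet(tag)) for word, tag in tagged])
# [('paths', 'n'), ('invaded', 'v'), ('global', 'a'), ('only', 'r')]
```

In the notebook this would be used as `lemmatizer.lemmatize(word.lower(), pos=treebank_to_wordnet(tag))` for each `(word, tag)` pair returned by `nltk.pos_tag(tokens)`.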

In [23]:
sentence

["envis three path india 's futur .",
 '3000-year histori , numer civil invad land , impos rule ideolog .',
 'despit , refrain impos , respect freedom .',
 'first vision center around preserv freedom .',
 'struggl independ 1857 mark begin journey toward ideal .',
 'imper safeguard nurtur freedom , cornerston ident digniti .',
 'move forward , envis india develop nation .',
 'despit one top five economi global , hesit recogn potenti .',
 'time shed doubt embrac statu develop , self-reli nation .',
 'lastli , believ india must assert global stage .',
 'onli demonstr strength—both militarili economically—wil earn respect intern commun .',
 'draw inspir remark individu privileg work , dr. vikram sarabhai , professor satish dhawan , dr. brahm prakash , shape perspect leadership progress .']