In [1]:
paragraph = """Imagine trying to find a hidden treasure, but the path is cluttered with useless debris—every step slowing you down, every distraction keeping you from reaching your goal. This is exactly what happens when we work with language data in Natural Language Processing. The beauty of the message, the power of the sentiment, the core of the communication—all buried under a pile of irrelevant words that add no meaning, no insight, no direction.

But there is hope! Just as we clear the path to reveal the treasure, so too must we cleanse our data to reveal the true essence of the language. And how do we do that? By removing the noise. By stripping away the words that are so frequent, so common, that they blur the lines of what truly matters. These words, often called stop words, are like static on a radio—always present, always there, but not the melody we seek.

When we remove stop words, we aren’t just cleaning up data; we are sharpening our focus. We are transforming our understanding of language from a chaotic mess to a clear and meaningful structure. In that clarity, we find the patterns that lead to predictions, the insights that drive decisions, and the connections that build knowledge.

So let us embrace the process of stop word removal, not as a mundane task, but as a powerful step toward discovering the heart of communication. Let us approach it with the excitement of uncovering hidden meaning, of distilling pure insight from raw language. Remember, it’s not just about what we remove, but about what we uncover in the process.

Every time we eliminate those unnecessary words, we get closer to understanding the true message, the real story. And in that understanding, we unlock the full potential of language—the most powerful tool we have to connect, to create, and to change the world."""

In [2]:
from nltk.corpus import stopwords

In [3]:
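import nltk
# Later cells also rely on the 'punkt' tokenizer data (sent_tokenize, word_tokenize)
# and the 'wordnet' corpus (WordNetLemmatizer); download them the same way if missing.
nltk.download('stopwords')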

[nltk_data] Downloading package stopwords to C:\Users\Karthick
[nltk_data]     Selvam\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [4]:
stopwords.words('english')

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [5]:
from nltk.stem import PorterStemmer

In [6]:
stemmer=PorterStemmer()

In [7]:
sentences=nltk.sent_tokenize(paragraph)

In [8]:
sentences

['Imagine trying to find a hidden treasure, but the path is cluttered with useless debris—every step slowing you down, every distraction keeping you from reaching your goal.',
 'This is exactly what happens when we work with language data in Natural Language Processing.',
 'The beauty of the message, the power of the sentiment, the core of the communication—all buried under a pile of irrelevant words that add no meaning, no insight, no direction.',
 'But there is hope!',
 'Just as we clear the path to reveal the treasure, so too must we cleanse our data to reveal the true essence of the language.',
 'And how do we do that?',
 'By removing the noise.',
 'By stripping away the words that are so frequent, so common, that they blur the lines of what truly matters.',
 'These words, often called stop words, are like static on a radio—always present, always there, but not the melody we seek.',
 'When we remove stop words, we aren’t just cleaning up data; we are sharpening our focus.',
 'We ar

In [9]:
type(sentences)

list

In [14]:
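## Remove stop words, then apply Porter stemming

# Build the stop word set once instead of once per token.
stop_words=set(stopwords.words('english'))

for i in range(len(sentences)):
    words=nltk.word_tokenize(sentences[i])
    # The membership test is case-sensitive, so capitalized tokens such as 'This' are kept.
    words=[stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i]=' '.join(words)  # join the stemmed tokens back into one string (overwrites the original sentence)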

In [15]:
sentences

['imagin tri find hidden treasur , path clutter useless debris—everi step slow , everi distract keep reach goal .',
 'thi exactli happen work languag data natur languag process .',
 'the beauti messag , power sentiment , core communication—al buri pile irrelev word add mean , insight , direct .',
 'but hope !',
 'just clear path reveal treasur , must cleans data reveal true essenc languag .',
 'and ?',
 'by remov nois .',
 'by strip away word frequent , common , blur line truli matter .',
 'these word , often call stop word , like static radio—alway present , alway , melodi seek .',
 'when remov stop word , ’ clean data ; sharpen focu .',
 'we transform understand languag chaotic mess clear meaning structur .',
 'in clariti , find pattern lead predict , insight drive decis , connect build knowledg .',
 'so let us embrac process stop word remov , mundan task , power step toward discov heart commun .',
 'let us approach excit uncov hidden mean , distil pure insight raw languag .',
 'reme
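The output above still contains 'thi', 'the', 'but' and 'and' because the stop word list is all lowercase while the sentence-initial tokens are capitalized. A minimal sketch of a case-insensitive filter follows. It uses only the standard library: `STOP_WORDS` is a small hand-picked subset and the regex tokenizer is a stand-in, both assumptions made for illustration (the notebook itself uses `stopwords.words('english')` and `nltk.word_tokenize`). Unlike the notebook's tokenizer, this regex also drops punctuation tokens.

```python
import re

# Hypothetical subset of stop words, for illustration only.
# In the notebook this would be set(stopwords.words('english')).
STOP_WORDS = {"the", "this", "is", "but", "there", "and", "how", "do",
              "we", "that", "of", "what", "when", "with", "in"}

def remove_stop_words(sentence, stop_words=STOP_WORDS):
    """Return the tokens of `sentence` that are not stop words.

    Tokens are compared in lowercase, so 'This' and 'this' are both removed.
    """
    # Stand-in tokenizer (assumption): runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [tok for tok in tokens if tok.lower() not in stop_words]

print(remove_stop_words("But there is hope!"))
print(remove_stop_words("This is exactly what happens"))
```

In the notebook's loops, the equivalent change would be to test `word.lower()` against the stop word set rather than `word`.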

In [10]:
from nltk.stem import SnowballStemmer
snowballstemmer=SnowballStemmer('english')

In [11]:
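## Remove stop words, then apply Snowball stemming

# Build the stop word set once instead of once per token.
stop_words=set(stopwords.words('english'))

for i in range(len(sentences)):
    words=nltk.word_tokenize(sentences[i])
    # The membership test is case-sensitive, so capitalized tokens such as 'This' are kept.
    words=[snowballstemmer.stem(word) for word in words if word not in stop_words]
    sentences[i]=' '.join(words)  # join the stemmed tokens back into one string (overwrites the original sentence)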

In [12]:
sentences

['imagin tri find hidden treasur , path clutter useless debris—everi step slow , everi distract keep reach goal .',
 'this exact happen work languag data natur languag process .',
 'the beauti messag , power sentiment , core communication—al buri pile irrelev word add mean , insight , direct .',
 'but hope !',
 'just clear path reveal treasur , must cleans data reveal true essenc languag .',
 'and ?',
 'by remov nois .',
 'by strip away word frequent , common , blur line truli matter .',
 'these word , often call stop word , like static radio—alway present , alway , melodi seek .',
 'when remov stop word , ’ clean data ; sharpen focus .',
 'we transform understand languag chaotic mess clear meaning structur .',
 'in clariti , find pattern lead predict , insight drive decis , connect build knowledg .',
 'so let us embrac process stop word remov , mundan task , power step toward discov heart communic .',
 'let us approach excit uncov hidden mean , distil pure insight raw languag .',
 're
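The two outputs differ in only a few tokens, for example 'exactli' vs 'exact', 'focu' vs 'focus', and 'commun' vs 'communic'. The sketch below puts the two stemmers side by side on those three words so the difference is easier to see. It assumes NLTK is installed; neither stemmer needs any downloaded corpus when constructed this way. The variable names are illustrative and do not come from the notebook.

```python
from nltk.stem import PorterStemmer, SnowballStemmer

porter = PorterStemmer()
snowball = SnowballStemmer("english")

# Words whose stems differ between the two outputs shown in the notebook.
words = ["exactly", "focus", "communication"]

# Map each word to its (Porter stem, Snowball stem) pair.
comparison = {w: (porter.stem(w), snowball.stem(w)) for w in words}

for word, (porter_stem, snowball_stem) in comparison.items():
    print(f"{word:15} porter={porter_stem:10} snowball={snowball_stem}")
```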

In [13]:
from nltk.stem import WordNetLemmatizer
lemmatizer=WordNetLemmatizer()

In [16]:
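## Remove stop words, then apply WordNet lemmatization

# Build the stop word set once instead of once per token.
stop_words=set(stopwords.words('english'))

for i in range(len(sentences)):
    words=nltk.word_tokenize(sentences[i])
    # pos='v' treats every token as a verb, so 'hidden' becomes 'hide'.
    # Tokens are lowercased for the lemmatizer, but the stop word test itself is still case-sensitive.
    words=[lemmatizer.lemmatize(word.lower(),pos='v') for word in words if word not in stop_words]
    sentences[i]=' '.join(words)  # join the lemmatized tokens back into one string (overwrites the original sentence)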

In [17]:
sentences

['imagin tri find hide treasur , path clutter useless debris—everi step slow , everi distract keep reach goal .',
 'exact happen work languag data natur languag process .',
 'beauti messag , power sentiment , core communication—al buri pile irrelev word add mean , insight , direct .',
 'hope !',
 'clear path reveal treasur , must clean data reveal true essenc languag .',
 '?',
 'remov nois .',
 'strip away word frequent , common , blur line truli matter .',
 'word , often call stop word , like static radio—alway present , alway , melodi seek .',
 'remov stop word , ’ clean data ; sharpen focus .',
 'transform understand languag chaotic mess clear mean structur .',
 'clariti , find pattern lead predict , insight drive decis , connect build knowledg .',
 'let us embrac process stop word remov , mundan task , power step toward discov heart communic .',
 'let us approach excit uncov hide mean , distil pure insight raw languag .',
 'rememb , ’ remov , uncov process .',
 'everi time elimin u
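The output above contains stems such as 'imagin' and 'treasur', which a lemmatizer would not produce from the original words. This suggests the loop ran on sentences that an earlier stemming cell had already overwritten. One way to avoid this is to leave the tokenized sentences untouched and return a new list from each pass. The sketch below is a standard-library illustration under stated assumptions: `preprocess`, `raw`, and the small `stop_words` set are hypothetical names, `str.split` stands in for `nltk.word_tokenize`, and `str.lower` / `str.upper` stand in for a stemmer or lemmatizer.

```python
def preprocess(sentences, normalize, stop_words):
    """Return new cleaned sentences; the input list is not modified.

    `normalize` is any callable mapping a token to its normalized form,
    e.g. a stemmer's `stem` method or a wrapper around a lemmatizer.
    """
    cleaned = []
    for sentence in sentences:
        tokens = sentence.split()  # stand-in for nltk.word_tokenize (assumption)
        kept = [normalize(tok) for tok in tokens if tok.lower() not in stop_words]
        cleaned.append(" ".join(kept))
    return cleaned

raw = ["We are sharpening our focus", "By removing the noise"]
stop_words = {"we", "are", "our", "by", "the"}  # hypothetical subset

# Two independent passes over the same untouched input.
lowered = preprocess(raw, str.lower, stop_words)
uppered = preprocess(raw, str.upper, stop_words)

print(lowered)
print(uppered)
print(raw)  # still the original sentences
```

With this shape, each normalizer in the notebook (`stemmer.stem`, `snowballstemmer.stem`, or a lambda around `lemmatizer.lemmatize`) could be passed as `normalize` and compared against the same original sentences.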