## Stopwords

#### In the paragraph below, words like "to", "the", and "of" carry little meaning and are not useful for analysis.
#### We will remove these stopwords using the NLTK library.

In [None]:
paragraph = """ At its core, NLP is a branch of artificial intelligence that focuses on enabling computers to understand,
interpret, and respond to human language in a meaningful way. Whether it's text or speech, 
NLP gives machines the ability to interact with us just like another human might.
Think about how you use Siri, Google Assistant, or Alexa. Ever typed a sentence into Google Translate or spoken a message that was transcribed into text? All of this is made possible through NLP.
So, how does it work?
NLP combines linguistics, machine learning, and statistics. It breaks down language into parts—like tokens, syntax, semantics, and context—so computers can analyze them.
Tools like NLTK, spaCy, and models like GPT (yes, the same tech powering ChatGPT!) are used for tasks such as translation, summarization, and answering questions.
Of course, NLP isn’t perfect. Human language is complex—it’s full of sarcasm, slang, cultural nuances, and ambiguity. But with advances in deep learning and big data, NLP is rapidly improving.
In the coming years, NLP will become even more integrated into our lives—powering better communication tools, smarter assistants, and more intuitive ways of interacting with technology.
To conclude, NLP is not just about teaching machines how to read or talk—it’s about bridging the gap between humans and machines, making technology feel more natural and human-centric.
Thank you!
"""

In [None]:
## Importing the necessary library
from nltk.corpus import stopwords

In [None]:
## Downloading the stopwords dataset
import nltk
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/udmdev/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [None]:
## NLTK provides stopword lists for several languages.
## You can also create your own list of stopwords.
## Note that the list below includes words such as "not". Removing them can be risky
## because they carry the negative meaning of a sentence.
## Let's see which English stopwords are available in NLTK.
stopwords.words('english')

['a',
 'about',
 'above',
 'after',
 'again',
 'against',
 'ain',
 'all',
 'am',
 'an',
 'and',
 'any',
 'are',
 'aren',
 "aren't",
 'as',
 'at',
 'be',
 'because',
 'been',
 'before',
 'being',
 'below',
 'between',
 'both',
 'but',
 'by',
 'can',
 'couldn',
 "couldn't",
 'd',
 'did',
 'didn',
 "didn't",
 'do',
 'does',
 'doesn',
 "doesn't",
 'doing',
 'don',
 "don't",
 'down',
 'during',
 'each',
 'few',
 'for',
 'from',
 'further',
 'had',
 'hadn',
 "hadn't",
 'has',
 'hasn',
 "hasn't",
 'have',
 'haven',
 "haven't",
 'having',
 'he',
 "he'd",
 "he'll",
 'her',
 'here',
 'hers',
 'herself',
 "he's",
 'him',
 'himself',
 'his',
 'how',
 'i',
 "i'd",
 'if',
 "i'll",
 "i'm",
 'in',
 'into',
 'is',
 'isn',
 "isn't",
 'it',
 "it'd",
 "it'll",
 "it's",
 'its',
 'itself',
 "i've",
 'just',
 'll',
 'm',
 'ma',
 'me',
 'mightn',
 "mightn't",
 'more',
 'most',
 'mustn',
 "mustn't",
 'my',
 'myself',
 'needn',
 "needn't",
 'no',
 'nor',
 'not',
 'now',
 'o',
 'of',
 'off',
 'on',
 'once',
 'on
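
The comment above points out that negation words such as "not" sit in the default list even though they carry meaning. A minimal sketch of building a custom list that keeps them could look like the following. The `base_stopwords` set is a small hand-copied subset used only for illustration, and `remove_stopwords` is a hypothetical helper, not part of NLTK.

```python
# Hypothetical subset of the English list above, hard-coded so the sketch runs without NLTK data.
base_stopwords = {"a", "an", "the", "is", "of", "to", "no", "nor", "not"}

# Words that signal negation and are worth keeping for tasks like sentiment analysis.
negations = {"no", "nor", "not"}

# Set difference removes the negations from the stopword list.
custom_stopwords = base_stopwords - negations


def remove_stopwords(text, stop_words):
    """Drop every whitespace-separated token whose lowercase form is a stopword."""
    return [tok for tok in text.split() if tok.lower() not in stop_words]


review = "The plot is not good"
with_default = remove_stopwords(review, base_stopwords)
with_custom = remove_stopwords(review, custom_stopwords)
print(with_default)  # ['plot', 'good']
print(with_custom)   # ['plot', 'not', 'good']
```

With the real corpus, the same idea would be `set(stopwords.words('english')) - negations`, assuming the stopwords data has been downloaded.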

In [None]:
## As mentioned earlier, NLTK also provides stopword lists for other languages.
stopwords.words('german')

['aber',
 'alle',
 'allem',
 'allen',
 'aller',
 'alles',
 'als',
 'also',
 'am',
 'an',
 'ander',
 'andere',
 'anderem',
 'anderen',
 'anderer',
 'anderes',
 'anderm',
 'andern',
 'anderr',
 'anders',
 'auch',
 'auf',
 'aus',
 'bei',
 'bin',
 'bis',
 'bist',
 'da',
 'damit',
 'dann',
 'der',
 'den',
 'des',
 'dem',
 'die',
 'das',
 'dass',
 'daß',
 'derselbe',
 'derselben',
 'denselben',
 'desselben',
 'demselben',
 'dieselbe',
 'dieselben',
 'dasselbe',
 'dazu',
 'dein',
 'deine',
 'deinem',
 'deinen',
 'deiner',
 'deines',
 'denn',
 'derer',
 'dessen',
 'dich',
 'dir',
 'du',
 'dies',
 'diese',
 'diesem',
 'diesen',
 'dieser',
 'dieses',
 'doch',
 'dort',
 'durch',
 'ein',
 'eine',
 'einem',
 'einen',
 'einer',
 'eines',
 'einig',
 'einige',
 'einigem',
 'einigen',
 'einiger',
 'einiges',
 'einmal',
 'er',
 'ihn',
 'ihm',
 'es',
 'etwas',
 'euer',
 'eure',
 'eurem',
 'euren',
 'eurer',
 'eures',
 'für',
 'gegen',
 'gewesen',
 'hab',
 'habe',
 'haben',
 'hat',
 'hatte',
 'hatten',
 '

Now we will apply different kinds of stemming and lemmatization and check the results.

In [None]:
## First, we apply stemming
from nltk.stem import PorterStemmer

In [7]:
stemmer = PorterStemmer()

In [8]:
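## sent_tokenize needs the 'punkt' tokenizer data: nltk.download('punkt') (newer NLTK releases use 'punkt_tab')
sentences = nltk.sent_tokenize(paragraph)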

In [10]:
## Traverse all the sentences, drop the stopwords,
## and apply stemming to the words that remain.
## Note: the stopword check is case-sensitive, so capitalized words such as "At" are not removed.
stop_words = set(stopwords.words('english'))  # build the set once for fast membership checks
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [stemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)  # join the stemmed words back into a sentence


In [None]:
## Now we can see the sentences with stopwords removed and stemming applied
sentences

['at core , nlp branch artifici intellig focus enabl comput understand , interpret , respond human languag meaning way .',
 "whether 's text speech , nlp give machin abil interact us like anoth human might .",
 'think use siri , googl assist , alexa .',
 'ever type sentenc googl translat spoken messag transcrib text ?',
 'all made possibl nlp .',
 'so , work ?',
 'nlp combin linguist , machin learn , statist .',
 'it break languag parts—lik token , syntax , semant , context—so comput analyz .',
 'tool like nltk , spaci , model like gpt ( ye , tech power chatgpt ! )',
 'use task translat , summar , answer question .',
 'of cours , nlp ’ perfect .',
 'human languag complex—it ’ full sarcasm , slang , cultur nuanc , ambigu .',
 'but advanc deep learn big data , nlp rapidli improv .',
 'in come year , nlp becom even integr lives—pow better commun tool , smarter assist , intuit way interact technolog .',
 'to conclud , nlp teach machin read talk—it ’ bridg gap human machin , make technolog 
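
The output above still contains words like "at", "so", and "to" at the start of sentences. The stopword list is lowercase, so a capitalized token such as "At" fails the membership check and gets stemmed instead of removed. A minimal sketch of the difference, using a hypothetical mini stopword set and a hand-written token list in place of the real corpus and tokenizer:

```python
# Hypothetical mini stopword set; the real list comes from stopwords.words('english').
demo_stop_words = {"at", "its", "is", "a", "of"}

tokens = ["At", "its", "core", "NLP", "is", "a", "branch", "of", "AI"]

# Case-sensitive check: "At" is not equal to "at", so it is kept.
case_sensitive = [tok for tok in tokens if tok not in demo_stop_words]

# Lowercase each token before the check so capitalized stopwords are dropped as well.
case_insensitive = [tok for tok in tokens if tok.lower() not in demo_stop_words]

print(case_sensitive)    # ['At', 'core', 'NLP', 'branch', 'AI']
print(case_insensitive)  # ['core', 'NLP', 'branch', 'AI']
```

Whether to lowercase before filtering is a design choice; it removes more stopwords but also discards casing that some tasks may need.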

In [12]:
## Now we will apply snowball stemming
from nltk.stem import SnowballStemmer
snowballStemmer = SnowballStemmer("english")

In [13]:
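## Apply Snowball stemming.
## Note: `sentences` already holds the Porter-stemmed text from the cells above,
## so the Snowball stemmer runs on top of that output rather than on the original paragraph.
stop_words = set(stopwords.words('english'))  # build the set once for fast membership checks
for i in range(len(sentences)):
    words = nltk.word_tokenize(sentences[i])
    words = [snowballStemmer.stem(word) for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)  # join the stemmed words back into a sentence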

In [14]:
sentences

['core , nlp branch artifici intellig focus enabl comput understand , interpret , respond human languag mean way .',
 "whether 's text speech , nlp give machin abil interact us like anoth human might .",
 'think use siri , googl assist , alexa .',
 'ever type sentenc googl translat spoken messag transcrib text ?',
 'made possibl nlp .',
 ', work ?',
 'nlp combin linguist , machin learn , statist .',
 'break languag parts—lik token , syntax , semant , context—so comput analyz .',
 'tool like nltk , spaci , model like gpt ( ye , tech power chatgpt ! )',
 'use task translat , summar , answer question .',
 'cour , nlp ’ perfect .',
 'human languag complex—it ’ full sarcasm , slang , cultur nuanc , ambigu .',
 'advanc deep learn big data , nlp rapid improv .',
 'come year , nlp becom even integr lives—pow better commun tool , smarter assist , intuit way interact technolog .',
 'conclud , nlp teach machin read talk—it ’ bridg gap human machin , make technolog feel natur human-centr .',
 'tha
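
Because each loop overwrites `sentences` in place, every later step works on the output of the previous one. To compare techniques side by side, one option is a helper that returns a new list and leaves the input alone. The sketch below is an assumption about how that could be structured: `clean_sentences` is a hypothetical helper, and `str.upper` and `str.lower` stand in for a real stemmer so the example has no external dependencies.

```python
def clean_sentences(sentences, transform, stop_words):
    """Return a new list of processed sentences; the input list is left unchanged.

    `transform` is any callable that maps one word to its processed form,
    for example the `stem` method of a stemmer object.
    """
    cleaned = []
    for sentence in sentences:
        # Plain whitespace split keeps the sketch dependency-free;
        # nltk.word_tokenize could be used instead.
        words = [transform(w) for w in sentence.split() if w.lower() not in stop_words]
        cleaned.append(" ".join(words))
    return cleaned


# Hypothetical inputs for illustration only.
original = ["NLP is rapidly improving", "It breaks down language into parts"]
demo_stop_words = {"is", "it", "into", "down"}

shouted = clean_sentences(original, str.upper, demo_stop_words)
lowered = clean_sentences(original, str.lower, demo_stop_words)

print(shouted)   # ['NLP RAPIDLY IMPROVING', 'BREAKS LANGUAGE PARTS']
print(lowered)   # ['nlp rapidly improving', 'breaks language parts']
print(original)  # the input list is unchanged
```

Passing a Porter stemmer's `stem` method and then a Snowball stemmer's `stem` method to the same helper would give two independent results computed from the same original text.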

In [None]:
## Now we will apply lemmatization (WordNetLemmatizer needs the 'wordnet' data: nltk.download('wordnet'))
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()

In [16]:
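## Apply lemmatization.
## Note: `sentences` now holds stemmed text, so the lemmatizer sees stems such as "languag"
## rather than the original words.
stop_words = set(stopwords.words('english'))  # build the set once for fast membership checks
for i in range(len(sentences)):
    sentences[i] = sentences[i].lower()  # convert the sentence to lowercase
    words = nltk.word_tokenize(sentences[i])
    # pos='v' tells the lemmatizer to treat every word as a verb
    words = [lemmatizer.lemmatize(word, pos='v') for word in words if word not in stop_words]
    sentences[i] = ' '.join(words)  # join the lemmatized words back into a sentence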

In [17]:
sentences

['core , nlp branch artifici intellig focus enabl comput understand , interpret , respond human languag mean way .',
 "whether 's text speech , nlp give machin abil interact us like anoth human might .",
 'think use siri , googl assist , alexa .',
 'ever type sentenc googl translat speak messag transcrib text ?',
 'make possibl nlp .',
 ', work ?',
 'nlp combin linguist , machin learn , statist .',
 'break languag parts—lik token , syntax , semant , context—so comput analyz .',
 'tool like nltk , spaci , model like gpt ( ye , tech power chatgpt ! )',
 'use task translat , summar , answer question .',
 'cour , nlp ’ perfect .',
 'human languag complex—it ’ full sarcasm , slang , cultur nuanc , ambigu .',
 'advanc deep learn big data , nlp rapid improv .',
 'come year , nlp becom even integr lives—pow better commun tool , smarter assist , intuit way interact technolog .',
 'conclud , nlp teach machin read talk—it ’ bridg gap human machin , make technolog feel natur human-centr .',
 'than