Python offers many ways to remove stop words from strings. You can use one of several natural language processing libraries, such as **NLTK, SpaCy, Gensim, TextBlob**, etc., or, if you need full control over which stop words are removed, you can write your own custom script.

In this notebook, you will see several approaches, depending on the NLP library you're using.

**A) Stop Words with NLTK**

**B) Stop Words with Gensim**

**C) Stop Words with SpaCy**

**D) Custom Script to Remove Stop Words**

## A) Method 1: Using Python's NLTK Library

The NLTK library is one of the oldest and most commonly used Python libraries for Natural Language Processing. NLTK supports stop word removal, and you can find the list of stop words in the corpus module. To remove stop words from a sentence, you can divide your text into words and then remove each word that exists in the list of stop words provided by NLTK.

*Let's look at the code below:*

In [1]:
#import library
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
from nltk.tokenize import word_tokenize

# text data
text = 'This is a sample sentence, showing off the stop words filtration.'

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

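# keep only the tokens that are not English stop words
# (stopwords.words() without a language argument loads the lists for every language)
tokens_without_stopwords = [word for word in text_tokens if word not in stopwords.words('english')]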
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)

[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!

Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sample sentence , showing stop words filtration .


In the above code, we first import the **stopwords** collection from the **nltk.corpus** module. Next, we import the **word_tokenize()** function from the nltk.tokenize module. We then create a variable text, which contains a simple sentence. The sentence is tokenized (divided into words) using word_tokenize(). Next, we iterate through all the words in the text_tokens list and check whether each word exists in the stop words collection. If it doesn't, it is added to the **tokens_without_stopwords** list, which is then printed.

- You can see that the words **is, a, off,** and **the** have been removed from the sentence.
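Note that **This** survives in the output even though **this** is in the stop word list, because the membership test is case-sensitive. A minimal sketch of a case-insensitive variant follows. To run without any downloads, it assumes a small hand-picked subset of stop words (`STOP_WORDS_SUBSET`, an illustrative name) instead of the NLTK corpus, and a pre-tokenized list in place of word_tokenize().

```python
# Illustrative subset only; the real NLTK English list is much longer.
STOP_WORDS_SUBSET = {"this", "is", "a", "off", "the"}

def remove_stopwords_case_insensitive(tokens, stop_words):
    """Drop tokens whose lowercase form is a stop word; other tokens keep their casing."""
    return [tok for tok in tokens if tok.lower() not in stop_words]

tokens = ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off',
          'the', 'stop', 'words', 'filtration', '.']

filtered = remove_stopwords_case_insensitive(tokens, STOP_WORDS_SUBSET)
print(filtered)
# ['sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']
```

Lowercasing only for the comparison keeps the original casing of the words that remain.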

### A-1) Adding or Removing Stop Words in NLTK's Default Stop Word List
You can add stop words to, or remove them from, the existing collection of stop words in NLTK. Before doing so, let's see the list of all the English stop words supported by NLTK.

**A-1-a) Adding Stop Words to Default NLTK Stop Word List**

**A-1-b) Removing Stop Words from Default NLTK Stop Word List**

In [2]:
# list of stop words in English language
print(stopwords.words('english'))

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

### A-1-a) Adding Stop Words to Default NLTK Stop Word List
To add a word to NLTK stop words collection, first create an object from the **stopwords.words('english')** list. Next, use the **append()** method on the list to add any word to the list.

The following script adds the word **sample** to the NLTK stop word collection. Again, we remove the stop words from our text variable to see whether the word **sample** is removed.

In [3]:
all_stopwords = stopwords.words('english')

# added our own stopword "sample" in list of pre-defined stopwords
all_stopwords.append('sample')

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords = [word for word in text_tokens if not word in all_stopwords]
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sentence , showing stop words filtration .


- The output shows that the word **sample** has been removed.

You can also add a list of words to the stop word list using the **extend()** method, as shown below:

In [4]:
# added our own stopwords in list of pre-defined stopwords
sw_list = ['Kunal', 'Kolhe', 'article']
all_stopwords.extend(sw_list)

# text data
text_data = 'This is how we are making our processed content more efficient by removing words that do \
not contribute to any future operations. This article is contributed by Kunal Kolhe.'

# tokenized
text_tokens = word_tokenize(text_data)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords = [word for word in text_tokens if not word in all_stopwords]
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text_data)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'how', 'we', 'are', 'making', 'our', 'processed', 'content', 'more', 'efficient', 'by', 'removing', 'words', 'that', 'do', 'not', 'contribute', 'to', 'any', 'future', 'operations', '.', 'This', 'article', 'is', 'contributed', 'by', 'Kunal', 'Kolhe', '.']
Tokens without stopwords :  ['This', 'making', 'processed', 'content', 'efficient', 'removing', 'words', 'contribute', 'future', 'operations', '.', 'This', 'contributed', '.']

Original text :  This is how we are making our processed content more efficient by removing words that do not contribute to any future operations. This article is contributed by Kunal Kolhe.
Filtered text :  This making processed content efficient removing words contribute future operations . This contributed .


- The script above adds three words, **Kunal**, **Kolhe**, and **article**, to the stop word list, and all three are removed from the output.
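The difference between append() and extend() matters when adding several words at once. The sketch below uses a short illustrative list instead of the NLTK list to show what each method does with a list argument.

```python
stop_words = ['is', 'a', 'the']

# append() adds its argument as ONE element, so a list becomes a nested list.
nested = list(stop_words)
nested.append(['Kunal', 'Kolhe'])

# extend() adds each element of the iterable individually.
flat = list(stop_words)
flat.extend(['Kunal', 'Kolhe'])

print(nested)   # ['is', 'a', 'the', ['Kunal', 'Kolhe']]
print(flat)     # ['is', 'a', 'the', 'Kunal', 'Kolhe']
print('Kunal' in nested, 'Kunal' in flat)   # False True
```

With append(), the membership test for an individual word fails, so the new words would silently not be filtered.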

### A-1-b) Removing Stop Words from Default NLTK Stop Word List
Since stopwords.words('english') is merely a list of items, you can remove items from it as you would from any other list. The simplest way to do so is via the **remove()** method. This is helpful when your application needs a stop word to be kept. For example, you may need to keep the word not in a sentence to know when a statement is being negated.
- The following script removes the stop word **off** from the default list of stop words in NLTK:

In [5]:
all_stopwords = stopwords.words('english')

# removed the stopword "off" from the list of pre-defined stopwords
all_stopwords.remove('off')

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords = [word for word in text_tokens if not word in all_stopwords]
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'sample', 'sentence', ',', 'showing', 'off', 'stop', 'words', 'filtration', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sample sentence , showing off stop words filtration .


- From the output, you can see that the word **off** has not been removed from the input sentence.
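The negation use case mentioned above can be sketched in the same way. The example below is an illustration under stated assumptions: `default_stop_words` is a short hand-written list standing in for the NLTK list, and the sentence is already tokenized.

```python
# Illustrative subset of stop words; like the NLTK English list, it includes 'not'.
default_stop_words = ['is', 'a', 'the', 'this', 'not', 'do']

# Work on a copy so the original list stays untouched.
custom_stop_words = list(default_stop_words)
custom_stop_words.remove('not')   # raises ValueError if 'not' is absent

tokens = ['this', 'is', 'not', 'a', 'good', 'movie']
with_default = [t for t in tokens if t not in default_stop_words]
with_custom = [t for t in tokens if t not in custom_stop_words]

print(with_default)   # ['good', 'movie']
print(with_custom)    # ['not', 'good', 'movie']
```

With the default list, the filtered sentence loses its negation; keeping **not** preserves it.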

## B) Method 2: Using Python's Gensim Library
The Gensim library is another extremely useful library for removing stop words from a string in Python. All you have to do is import the remove_stopwords() function from the **gensim.parsing.preprocessing** module. Next, you pass the sentence from which you want to remove stop words to **remove_stopwords()**, which returns the text string without the stop words.

*Let's look at the code below:*

In [6]:
# import library
from gensim.parsing.preprocessing import remove_stopwords

# text data
text = 'This is a sample sentence, showing off the stop words filtration.'

# applied stop words on text
filtered_sentence = remove_stopwords(text)

print('\nOriginal text : ', text)
print('Filtered text : ', filtered_sentence)


Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sample sentence, showing stop words filtration.


It is important to mention that the output after removing stop words using the NLTK and Gensim libraries can differ. For this sentence both remove the same words, but the Gensim output keeps punctuation attached to the words (sentence,), whereas the NLTK version tokenizes punctuation separately. The stop word lists also differ: for example, Gensim's list contains the word however, while NLTK's does not. This shows that there is no hard and fast rule as to what a stop word is and what it isn't. It all depends upon the task that you are going to perform.
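One way to see how two stop word collections differ is to compare them as sets. The sketch below uses two small hand-copied excerpts (`nltk_excerpt` and `gensim_excerpt` are illustrative names, not library objects); the membership of each word in the real lists should be checked against the installed library versions.

```python
# Hand-copied excerpts for illustration; not the complete library lists.
nltk_excerpt = {'into', 'doing', 'most', 'who', 'and', 'by', 'how', "you're"}
gensim_excerpt = {'into', 'doing', 'most', 'who', 'and', 'by', 'how', 'however', 'twelve'}

only_in_gensim = sorted(gensim_excerpt - nltk_excerpt)   # set difference
only_in_nltk = sorted(nltk_excerpt - gensim_excerpt)
in_both = sorted(nltk_excerpt & gensim_excerpt)          # set intersection

print(only_in_gensim)   # ['however', 'twelve']
print(only_in_nltk)     # ["you're"]
print(in_both)          # ['and', 'by', 'doing', 'how', 'into', 'most', 'who']
```

With both libraries installed, the excerpts could be replaced by set(stopwords.words('english')) and gensim.parsing.preprocessing.STOPWORDS.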

### B-1) Adding and Removing Stop Words in Default Gensim Stop Words List
You can add stop words to, or remove them from, the existing collection of stop words in Gensim. Before doing so, let's see the list of all the English stop words supported by Gensim.

**B-1-a) Adding Stop Words to Default Gensim Stop Words List**

**B-1-b) Removing Stop Words from Default Gensim Stopword List**

In [7]:
import gensim
# list of stop words in English language
all_stopwords = gensim.parsing.preprocessing.STOPWORDS
print(all_stopwords)

frozenset({'into', 'doing', 'too', 'most', 'less', 'call', 'least', 'who', 'twelve', 'part', 'whence', 'what', 'yourselves', 'hence', 'in', 'and', 'amount', 'eleven', 'by', 'how', 'per', 'yourself', 'never', 'many', 'fill', 'couldnt', 'neither', 'formerly', 'hundred', 'however', 'herself', 'anything', 'km', 'name', 'ltd', 'your', 'anywhere', 'anyone', 'others', 'their', 'three', 'such', 'thence', 'them', 'ie', 'unless', 'several', 'whoever', 'behind', 'whatever', 'don', 'still', 'seemed', 'would', 'only', 'does', 'more', 'any', 'thick', 'hasnt', 'made', 'further', 'mine', 'empty', 'inc', 'give', 'also', 'everything', 'hereafter', 'upon', 'back', 'except', 'across', 'whereupon', 'always', 'quite', 'now', 'after', 'nor', 'those', 'either', 'will', 'noone', 'thereafter', 'anyhow', 'himself', 'an', 'myself', 'although', 'these', 'nevertheless', 'that', 'my', 'our', 'somewhere', 'often', 'from', 'interest', 'while', 'same', 'within', 'amongst', 'whole', 'forty', 'various', 'they', 'alone', 

### B-1-a) Adding Stop Words to Default Gensim Stop Words List
To access the list of Gensim stop words, you need to import the frozen set **STOPWORDS** from the **gensim.parsing.preprocessing** module. A frozen set in Python is an immutable set: you cannot add or remove elements. Hence, to add an element, you call the **union()** method on the frozen set and pass it the set of new stop words. union() returns a new set that contains your newly added stop words, as shown below.

*The code below adds stop words in Gensim:*

In [8]:
# import library
from gensim.parsing.preprocessing import STOPWORDS

# added our own stopword "sample" and "filtration" in list of pre-defined stopwords
all_stopwords_gensim = STOPWORDS.union(set(['sample', 'filtration']))

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords = [word for word in text_tokens if not word in all_stopwords_gensim]
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'sentence', ',', 'showing', 'stop', 'words', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sentence , showing stop words .


- From the output above, you can see that the words **sample** and **filtration** have been treated as stop words and consequently have been removed from the input sentence.
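The frozen set behaviour described above can be checked without Gensim. This sketch uses a small stand-in frozenset (`base`, an illustrative name) to show that in-place modification fails while union() returns a new frozenset.

```python
base = frozenset({'is', 'a', 'the'})

# A frozenset has no add() method, so in-place modification fails.
try:
    base.add('sample')
except AttributeError as exc:
    print('Cannot modify in place:', exc)

# union() leaves the original untouched and returns a new frozenset.
extended = base.union({'sample', 'filtration'})

print(sorted(base))              # ['a', 'is', 'the']
print(sorted(extended))          # ['a', 'filtration', 'is', 'sample', 'the']
print(type(extended).__name__)   # frozenset
```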

### B-1-b) Removing Stop Words from Default Gensim Stop Words List
To remove stop words from Gensim's list of stop words, you have to call the **difference()** method on the frozen set object, which contains the list of stop words. You need to pass a set of stop words that you want to remove from the **frozen set** to the difference() method. The difference() method returns a set which contains all the stop words except those passed to the difference() method.

*The following script removes the word **off** from the set of stop words in Gensim:*

In [9]:
# import library
from gensim.parsing.preprocessing import STOPWORDS

# build a new stop word set that no longer contains "off"
sw_list = {"off"}
all_stopwords_gensim = STOPWORDS.difference(sw_list)

# text data
text = 'This is a sample sentence, showing off the stop words filtration.'

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords = [word for word in text_tokens if not word in all_stopwords_gensim]
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'sample', 'sentence', ',', 'showing', 'off', 'stop', 'words', 'filtration', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sample sentence , showing off stop words filtration .


- Since the word **off** has now been removed from the stop word set, you can see that it has not been removed from the input sentence after stop word removal.
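Adding and removing can be combined in one step. The helper below is a hypothetical convenience function (`customize_stop_words` is not part of Gensim) that chains union() and difference() on any base collection; a small stand-in set is used so that the sketch runs without Gensim.

```python
def customize_stop_words(base, add=(), remove=()):
    """Return a new frozenset: base plus the words in `add`, minus the words in `remove`."""
    return frozenset(base).union(add).difference(remove)

# Stand-in for a library stop word set.
base = frozenset({'is', 'a', 'off', 'the'})
custom = customize_stop_words(base, add={'sample'}, remove={'off'})

tokens = ['This', 'is', 'a', 'sample', 'sentence', 'showing', 'off', 'the', 'stop', 'words']
filtered = [t for t in tokens if t not in custom]
print(filtered)   # ['This', 'sentence', 'showing', 'off', 'stop', 'words']
```

Because the helper always returns a new frozenset, the base collection is never modified.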

## C) Method 3: Using the SpaCy Library
The SpaCy library is yet another extremely useful library for natural language processing in Python.

- You need to install the **SpaCy library** along with a **language model** suited to your use case. SpaCy provides several models for different languages; here we install the English language model. Run the following commands (the leading `!` runs a shell command from a notebook cell):

In [10]:
# Spacy installation command
!pip install -U spacy

Collecting spacy
  Downloading spacy-3.0.3-cp37-cp37m-manylinux2014_x86_64.whl (12.7 MB)
Collecting catalogue<2.1.0,>=2.0.1
  Downloading catalogue-2.0.1-py3-none-any.whl (9.6 kB)
Collecting thinc<8.1.0,>=8.0.0
  Downloading thinc-8.0.1-cp37-cp37m-manylinux2014_x86_64.whl (1.1 MB)
Collecting srsly<3.0.0,>=2.4.0
  Downloading srsly-2.4.0-cp37-cp37m-manylinux2014_x86_64.whl (456 kB)
Installing collected packages: catalogue, srsly, thinc, spacy
  Attempting uninstall: catalogue
    Found existing installation: catalogue 1.0.0
    Uninstalling catalogue-1.0.0:
      Successfully uninstalled catalogue-1.0.0
  Attempting uninstall: srsly
    Found existing installation: srsly 1.0.5
    Uninstalling srsly-1.0.5:
      Successfully uninstalled srsly-1.0.5
  Attempting uninstall: thin

In [11]:
# English language model installation command
# (as of spaCy v3.0 the 'en' shortcut is deprecated; use the full package name)
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0-py3-none-any.whl (13.7 MB)
Installing collected packages: en-core-web-sm
  Attempting uninstall: en-core-web-sm
    Found existing installation: en-core-web-sm 2.3.1
    Uninstalling en-core-web-sm-2.3.1:
      Successfully uninstalled en-core-web-sm-2.3.1
Successfully installed en-core-web-sm-3.0.0
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')


In [12]:
# load library
import spacy

# load language model
sp = spacy.load('en_core_web_sm')

# load stop words in Spacy
all_stopwords = sp.Defaults.stop_words

# text data
text = 'This is a sample sentence, showing off the stop words filtration.'

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords= [word for word in text_tokens if not word in all_stopwords]
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sample sentence , showing stop words filtration .


- In the code above, we first load the language model and store it in the sp variable. **sp.Defaults.stop_words** is the set of default stop words for the English language model in SpaCy.

### C-1) Adding and Removing Stop Words in SpaCy Default Stop Word List

**C-1-a) Adding Stop Words to Default SpaCy Stop Words List**

**C-1-b) Removing Stop Words from Default SpaCy Stop Words List**

As with NLTK and Gensim, you can also add or remove stop words from the default stop word list in SpaCy. Before that, let's look at the list of all the existing stop words in SpaCy.

In [13]:
# length of stop words in Spacy
print('Length of stop words in Spacy :', len(all_stopwords))
# stop words in Spacy
print('Stop words in Spacy : ', all_stopwords)

Length of stop words in Spacy : 326
Stop words in Spacy :  {'into', 'doing', 'most', 'too', 'less', 'call', 'least', 'who', 'twelve', 'part', 'whence', "'ve", 'what', 'yourselves', 'hence', 'in', 'and', 'amount', 'eleven', 'by', 'how', 'per', 'yourself', 'never', 'many', 'neither', 'formerly', 'hundred', 'however', 'herself', 'anything', 'name', 'your', 'anywhere', 'anyone', 'others', 'their', 'three', 'such', 'thence', 'them', 'unless', 'several', 'whoever', 'behind', "n't", 'whatever', 'still', 'seemed', 'would', 'does', 'only', 'any', 'more', 'made', 'further', 'mine', 'empty', 'give', 'also', 'n‘t', 'everything', 'hereafter', 'upon', 'back', 'except', 'across', 'whereupon', 'always', 'quite', 'now', 'after', 'nor', "'re", 'those', 'either', 'will', 'noone', 'thereafter', 'anyhow', 'himself', 'an', 'myself', 'although', 'nevertheless', 'these', 'that', 'my', 'our', '‘ll', 'somewhere', 'often', 'from', 'while', 'same', 'within', 'amongst', 'whole', '’d', 'forty', 'various', 'they', '

### C-1-a) Adding Stop Words to Default SpaCy Stop Words List
The SpaCy stop word list is basically a set of strings. You can add a new word to the set like you would add any new item to a **set**.

In [14]:
import spacy
sp = spacy.load('en_core_web_sm')

# load Spacy stop words and add our own stop word "sample"
all_stopwords = sp.Defaults.stop_words
all_stopwords.add("sample")

# text data
text = 'This is a sample sentence, showing off the stop words filtration.'

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords1 = [word for word in text_tokens if not word in all_stopwords]
print('Tokens without stopwords : ', tokens_without_stopwords1)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords1)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This sentence , showing stop words filtration .


- The output shows that the word **sample** has been removed from the input sentence.

You can also add multiple words to the list of stop words in SpaCy, as shown below. The following script adds **Kunal** and **Kolhe** to the list of stop words in SpaCy:

In [15]:
import spacy
sp = spacy.load('en_core_web_sm')

# load Spacy stop words and add our own stop words "Kunal" and "Kolhe"
all_stopwords = sp.Defaults.stop_words
all_stopwords |= {"Kunal", "Kolhe"}

# text data
text_data = 'This is how we are making our processed content more efficient by removing words that do \
not contribute to any future operations. This article is contributed by Kunal Kolhe.'

# tokenized
text_tokens = word_tokenize(text_data)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords1 = [word for word in text_tokens if not word in all_stopwords]
print('Tokens without stopwords : ', tokens_without_stopwords1)

print('\nOriginal text : ', text_data)
filtered_sentence = (" ").join(tokens_without_stopwords1)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'how', 'we', 'are', 'making', 'our', 'processed', 'content', 'more', 'efficient', 'by', 'removing', 'words', 'that', 'do', 'not', 'contribute', 'to', 'any', 'future', 'operations', '.', 'This', 'article', 'is', 'contributed', 'by', 'Kunal', 'Kolhe', '.']
Tokens without stopwords :  ['This', 'making', 'processed', 'content', 'efficient', 'removing', 'words', 'contribute', 'future', 'operations', '.', 'This', 'article', 'contributed', '.']

Original text :  This is how we are making our processed content more efficient by removing words that do not contribute to any future operations. This article is contributed by Kunal Kolhe.
Filtered text :  This making processed content efficient removing words contribute future operations . This article contributed .


- The output shows that the words **Kunal** and **Kolhe** have both been removed from the input sentence.
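Since the stop words are held in an ordinary Python set, the standard in-place set operations apply. The sketch below uses a small stand-in set instead of the SpaCy defaults to show three equivalent ways of adding words.

```python
stop_words = {'is', 'a', 'the'}

stop_words.add('sample')                # one word
stop_words |= {'Kunal', 'Kolhe'}        # several words, in place
stop_words.update(['article', 'the'])   # any iterable; duplicates are ignored

print(sorted(stop_words))
# ['Kolhe', 'Kunal', 'a', 'article', 'is', 'sample', 'the']
```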

### C-1-b) Removing Stop Words from Default SpaCy Stop Words List
To remove a word from the set of stop words in SpaCy, you can pass the word to remove to the **remove** method of the set.

The following script removes the word **is** from the set of stop words in SpaCy:

In [16]:
# import library
import spacy
sp = spacy.load('en_core_web_sm')

# load stop words
all_stopwords = sp.Defaults.stop_words
all_stopwords.remove('is')

# text data
text = 'This is a sample sentence, showing off the stop words filtration.'

# tokenized
text_tokens = word_tokenize(text)
print('\nTokens in text : ', text_tokens)

# applied the tokens in our data
tokens_without_stopwords = [word for word in text_tokens if not word in all_stopwords]
print('Tokens without stopwords : ', tokens_without_stopwords)

print('\nOriginal text : ', text)
filtered_sentence = (" ").join(tokens_without_stopwords)
print('Filtered text : ', filtered_sentence)


Tokens in text :  ['This', 'is', 'a', 'sample', 'sentence', ',', 'showing', 'off', 'the', 'stop', 'words', 'filtration', '.']
Tokens without stopwords :  ['This', 'is', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']

Original text :  This is a sample sentence, showing off the stop words filtration.
Filtered text :  This is sentence , showing stop words filtration .


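- In the output, you can see that the word **is** has not been removed from the input sentence. The word **sample** is still removed because it was added to the same default set earlier in this session.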
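Changes made with add() and remove() modify the shared default set in place, so they persist for the rest of the session. A minimal sketch of the effect, and of copying the set first to keep changes local, is given below. `FakeDefaults` is a hypothetical stand-in class used only so that the example runs without SpaCy.

```python
class FakeDefaults:
    """Hypothetical stand-in for a library object exposing one shared stop word set."""
    stop_words = {'is', 'a', 'the'}

# Mutating the shared set: every later reader sees the change.
shared = FakeDefaults.stop_words
shared.add('sample')
print('sample' in FakeDefaults.stop_words)   # True

# Copying first keeps customizations local.
local = set(FakeDefaults.stop_words)
local.discard('is')        # discard() does not raise if the word is missing
local.discard('missing')   # no error, unlike remove()
print('is' in local, 'is' in FakeDefaults.stop_words)   # False True
```

The same pattern, set(sp.Defaults.stop_words), should give an independent copy of the SpaCy stop words.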

## D) Method 4: Using Custom Script to Remove Stop Words
- If you want full control over stop word removal, you can write your own script to remove stop words from your string.

The first step is to define a list of words that you want treated as stop words. Let's create a list of some of the most commonly used stop words:

In [17]:
# stop word list (user defined)
my_stopwords_list = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 
                'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 
                'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 
                'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 
                'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 
                'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 
                'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 
                'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 
                'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 
                'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 
                'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', 
                "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 
                'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 
                'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", 'won', "won't", 'wouldn', "wouldn't"
               ]

# define a function that will accept a string as a parameter and will return the sentence without the stop words
def remove_mystopwords_fun(sentence):
    tokens = sentence.split(" ")
    tokens_filtered_a = [word for word in tokens if not word in my_stopwords_list]
    return (" ").join(tokens_filtered_a)

# remove stop words from a sample sentence
text = "Today is the 330th day since India implemented a nationwide lockdown to help curb the novel coronavirus pandemic.\
India's tally of COVID-19 cases rose to 1,09,50,201 with 12,881 new infections being reported in a day,while the recoveries \
have surged to1,06,56,845, according to Union Health Ministry data updated on Thursday. The death toll increased to1,56,014"

filtered_text_2 = remove_mystopwords_fun(text)
print('Original text : ', text)
print('\nFiltered text : ', filtered_text_2)

Original text :  Today is the 330th day since India implemented a nationwide lockdown to help curb the novel coronavirus pandemic.India's tally of COVID-19 cases rose to 1,09,50,201 with 12,881 new infections being reported in a day,while the recoveries have surged to1,06,56,845, according to Union Health Ministry data updated on Thursday. The death toll increased to1,56,014

Filtered text :  Today 330th day since India implemented nationwide lockdown help curb novel coronavirus pandemic.India's tally COVID-19 cases rose 1,09,50,201 12,881 new infections reported day,while recoveries surged to1,06,56,845, according Union Health Ministry data updated Thursday. The death toll increased to1,56,014
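In the output above, **The** is kept because the comparison is case-sensitive, and a token such as day,while cannot match any stop word because the text is split on single spaces only. A possible refinement, sketched under assumptions, is shown below: the regular expression and the short `STOP_WORDS` set are illustrative choices, not part of the original script.

```python
import re

# Short illustrative set; the full list from the cell above could be used instead.
STOP_WORDS = frozenset({'the', 'of', 'to', 'in', 'a', 'while'})

# Numbers with thousands separators or suffixes (12,881, 330th), words with inner
# hyphens or apostrophes (COVID-19, India's), or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\d+(?:,\d+)*\w*|[A-Za-z]+(?:['-]\w+)*|[^\w\s]")

def remove_stopwords_regex(sentence, stop_words=STOP_WORDS):
    """Tokenize with a regex and drop stop words, comparing in lowercase."""
    tokens = TOKEN_PATTERN.findall(sentence)
    return [tok for tok in tokens if tok.lower() not in stop_words]

sample = "The tally of COVID-19 cases rose to 12,881 in a day,while the recoveries surged."
print(remove_stopwords_regex(sample))
# ['tally', 'COVID-19', 'cases', 'rose', '12,881', 'day', ',', 'recoveries', 'surged', '.']
```

Using a frozenset instead of a list also makes each membership test a constant-time lookup on average.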


- Thanks for visiting. If you learned something new today, please upvote and feel free to comment.