# Stop Words

Stop words are words that carry little meaning on their own, so they add little value to search queries. They are usually filtered out of queries because matching on them returns a large amount of irrelevant results. Each NLP library or search engine ships its own stop word list, and most entries are very common English words such as 'as', 'the', 'be' and 'are'.
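To make the idea concrete, here is a minimal plain-Python sketch of filtering a search query. The `STOP_WORDS` set and the `filter_query` function are hypothetical and used only for illustration. The set is a small hand-picked sample built around the words mentioned above, not any library's official list.

```python
from typing import List

# Hypothetical, hand-picked stop list for illustration only;
# real libraries ship much larger lists.
STOP_WORDS = {"a", "an", "as", "the", "be", "are", "is", "of", "to", "in"}


def filter_query(query: str) -> List[str]:
    """Split a query on whitespace and drop stop words (case-insensitive)."""
    return [word for word in query.split() if word.lower() not in STOP_WORDS]


print(filter_query("What are the causes of the French Revolution"))
# ['What', 'causes', 'French', 'Revolution']
```

Note that "What" survives only because this toy list omits it. Which words get dropped depends entirely on the list you choose.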

Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text to be processed. spaCy ships a built-in list of several hundred English stop words. The exact count depends on the spaCy version (326 in the version used below).
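As a sketch of the filtering step, the snippet below drops every token whose `is_stop` flag is set. It assumes spaCy is installed and uses `spacy.blank("en")`, which builds a bare English pipeline with the language defaults (including the stop word list) and needs no model download. `remove_stop_words` is a hypothetical helper name.

```python
import spacy

# Bare English pipeline: tokenizer plus language defaults, no trained model.
nlp = spacy.blank("en")


def remove_stop_words(text: str) -> list:
    """Return the text of every token that is not flagged as a stop word."""
    doc = nlp(text)
    return [token.text for token in doc if not token.is_stop]


print(remove_stop_words("The mystery novel is on the shelf"))
# ['mystery', 'novel', 'shelf']
```

Here "The", "is", "on" and "the" are dropped because their lowercase forms appear in spaCy's English stop word list.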

In [1]:
import spacy
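# The 'en' shortcut no longer works in spaCy v3; load the small English model by its full name.
nlp = spacy.load('en_core_web_sm')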

In [2]:
print(nlp.Defaults.stop_words)

{'themselves', 'just', 'down', 'such', 'except', 'each', 'anyhow', 'using', 'a', 'of', 'when', 'whenever', "'m", 'who', 'herself', 'wherever', 'seeming', 'doing', 'do', 'via', 'herein', 'by', 'his', 'might', 'between', '‘re', 'part', 'became', 'through', 'your', 'our', 'for', 'already', 'it', 'too', 'therein', 'done', 'now', 'after', 'everyone', 'others', 'thereupon', 'below', 'whatever', 'n’t', 'further', 'someone', 'amongst', 'amount', 'noone', 'nothing', 'toward', 'if', 'few', 'they', 'whereupon', 'but', '‘ve', 'whose', 'had', 'empty', 'back', 'please', 'their', 'cannot', 'itself', 'whereby', 'be', 'yourselves', 'have', 'then', 'as', 'serious', 'third', 'either', 'more', 'call', 'you', 'becoming', 'own', 'well', 'why', 'anything', 'get', 'and', 'due', 'seem', 'from', 'hence', 'there', 'hereupon', 'or', 'does', 'neither', 're', 'ten', 'never', 'onto', 'not', 'thereby', '‘ll', 'used', 'once', 'only', 'us', 'been', 'thus', 'still', 'twelve', 'in', 'that', 'fifty', 'other', 'sometimes',

In [3]:
len(nlp.Defaults.stop_words)

326

In [5]:
nlp.vocab['is'].is_stop

True

In [6]:
nlp.vocab['mystery'].is_stop

False

In [7]:
# Add a stop word of your own

nlp.Defaults.stop_words.add('btw')

In [10]:
nlp.vocab['btw'].is_stop = True

In [11]:
nlp.vocab['btw'].is_stop

True

In [12]:
len(nlp.Defaults.stop_words)

327
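Adding a stop word above takes two steps: updating `nlp.Defaults.stop_words` (the set that `len()` counts) and setting the `is_stop` flag on the lexeme in `nlp.vocab` (the flag that `token.is_stop` reads). The sketch below wraps both steps in one hypothetical helper, `add_stop_word`, so they cannot drift apart. It assumes spaCy is installed and uses a blank English pipeline to avoid a model download.

```python
import spacy

nlp = spacy.blank("en")


def add_stop_word(nlp, word: str) -> None:
    """Hypothetical helper: register `word` as a stop word in both places."""
    # Language-level set of stop words (what len() counts).
    nlp.Defaults.stop_words.add(word)
    # Flag on the vocabulary entry (what token.is_stop reads).
    nlp.vocab[word].is_stop = True


add_stop_word(nlp, "btw")
print(nlp("btw")[0].is_stop)  # True
```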

In [13]:
# Remove a stop word

nlp.Defaults.stop_words.remove('beyond')

In [14]:
nlp.vocab['beyond'].is_stop = False

In [15]:
nlp.vocab['beyond'].is_stop

False
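Removing a stop word mirrors adding one: take the word out of the set and clear the lexeme flag. One design choice worth noting is that `set.remove` raises `KeyError` when the word is absent, while `set.discard` does not. The hypothetical helper below uses `discard` so it can be called safely more than once. As before, it assumes spaCy is installed and uses a blank English pipeline.

```python
import spacy

nlp = spacy.blank("en")


def remove_stop_word(nlp, word: str) -> None:
    """Hypothetical helper: stop treating `word` as a stop word."""
    # discard() instead of remove(): no KeyError if the word is already absent.
    nlp.Defaults.stop_words.discard(word)
    # Clear the flag on the vocabulary entry as well.
    nlp.vocab[word].is_stop = False


remove_stop_word(nlp, "beyond")
print(nlp.vocab["beyond"].is_stop)          # False
print("beyond" in nlp.Defaults.stop_words)  # False
```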