# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in list of English stop words (305 in the version used here; the exact count varies between spaCy releases).

In [0]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)
# Or, equivalently, import the same set directly from the English language data:
from spacy.lang.en.stop_words import STOP_WORDS
print(STOP_WORDS)

{'since', 'yours', 'into', 'most', 'then', 'cannot', 'being', 'than', 'itself', 'rather', 'twenty', 'few', 'were', 'up', 'yourselves', 'hereafter', 'his', 'much', 'serious', 'nothing', 'thereupon', 'top', 'almost', 'of', "'ll", 'formerly', 'perhaps', 'thereafter', 'it', 'nobody', 'really', 'somewhere', 'toward', 'two', 'them', 'across', 'themselves', 'often', 'only', 'although', 'made', 'quite', 'via', 'we', 'its', 'the', 'give', 'three', 'to', 'between', 'never', 'from', 'what', 'still', 'get', 'not', 'will', 'something', 'because', 'am', 'among', 'either', 'here', 'might', 'beside', 'nowhere', 'around', 'they', 'whenever', 'sixty', 'well', 'about', 'whose', 'i', 'off', 'see', 'become', 'thus', "'d", 'while', '‘d', 'him', 'or', 'would', 'that', 'is', 'someone', 'anywhere', "'s", 'first', 'once', 'towards', 'call', 'go', 'nine', 'due', 'sometimes', 'this', 'four', 'in', 'ten', 'many', 'same', 'least', 'noone', 'seem', 'how', 'nor', 'side', 'another', 'are', 'six', 'so', 'until', 'can',

In [0]:
len(nlp.Defaults.stop_words)

305

## To see if a word is a stop word

In [0]:
nlp.vocab['myself'].is_stop

True

In [0]:
nlp.vocab['mystery'].is_stop

False
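
The same `is_stop` flag is available on every token of a processed text, which is how stop words are usually filtered out in practice. The following is a minimal sketch, not part of the original notebook. It assumes that a blank English pipeline (`spacy.blank("en")`) is sufficient, since the stop-word flags come from the language defaults and not from a trained model; the sample sentence is made up for illustration.

```python
import spacy

# Assumption: a blank English pipeline is enough here, because stop-word
# flags come from the language defaults, not from a trained model.
nlp = spacy.blank("en")

# Made-up sample sentence, all lowercase.
doc = nlp("the detective is not in the mystery")

# Keep only the tokens that are not flagged as stop words.
content_words = [token.text for token in doc if not token.is_stop]
print(content_words)  # ['detective', 'mystery']
```

With `en_core_web_sm` loaded as above, the list comprehension works the same way on any `Doc`.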

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [0]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the is_stop flag on the lexeme. A lexeme is a vocabulary entry that
# represents a word type, not a word token in context, so it has no POS tag,
# context-dependent lemma or other per-token annotations.
nlp.vocab['btw'].is_stop = True

In [21]:
len(nlp.Defaults.stop_words)

306

In [22]:
nlp.vocab['btw'].is_stop

True

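When adding stop words, always use lowercase. The entries in the default set are all lowercase, and recent spaCy versions lowercase a word before looking it up in the set, so a capitalized entry would never match.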
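
If several words need to be added, the two steps above can be wrapped in a small function. This is only a sketch: `add_stop_words` is a hypothetical helper name, not part of spaCy, and `'imho'` is an extra made-up example word. A blank English pipeline is assumed so that the snippet runs without a trained model.

```python
import spacy

# Assumption: blank English pipeline, no trained model needed.
nlp = spacy.blank("en")


def add_stop_words(nlp, words):
    """Hypothetical helper: register each word, lowercased, as a stop word."""
    for word in words:
        word = word.lower()
        # Add the word to the set of default stop words.
        nlp.Defaults.stop_words.add(word)
        # Set the is_stop flag on the corresponding lexeme.
        nlp.vocab[word].is_stop = True


add_stop_words(nlp, ["BTW", "imho"])
print(nlp.vocab["btw"].is_stop, nlp.vocab["imho"].is_stop)
```

Lowercasing inside the helper means callers cannot accidentally register a capitalized entry.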

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [0]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Remove the stop_word tag from the lexeme
nlp.vocab['beyond'].is_stop = False

In [0]:
len(nlp.Defaults.stop_words)

305

In [0]:
nlp.vocab['beyond'].is_stop

False
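
Both `add` and `remove` change the shared default set for the rest of the session. Where that is not wanted, one alternative (an assumption of this sketch, not something the notebook itself does) is to work on a copy of the set and filter tokens against it. The name `custom_stops` and the sample sentence are made up for illustration.

```python
import spacy

# Assumption: blank English pipeline, no trained model needed.
nlp = spacy.blank("en")

# Copy the defaults so that the shared set is left untouched,
# then add 'btw' and drop 'beyond' in the copy only.
custom_stops = (set(nlp.Defaults.stop_words) | {"btw"}) - {"beyond"}

doc = nlp("btw the answer is beyond me")

# Compare the lowercased token text against the custom set.
kept = [token.text for token in doc if token.lower_ not in custom_stops]
print(kept)  # ['answer', 'beyond']
```

Because only the copy is modified, `token.is_stop` and `nlp.Defaults.stop_words` keep their default behavior elsewhere in the program.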