# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before further processing. spaCy ships with a built-in list of 326 English stop words (the exact count may differ between spaCy versions).
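Here is a minimal sketch of that filtering step. The function drops stop words and punctuation from a sentence using each token's `is_stop` and `is_punct` attributes. It assumes a blank English pipeline from `spacy.blank("en")`, which carries the default English stop-word list without requiring the `en_core_web_sm` download. `filter_stop_words` is a hypothetical helper name, not part of spaCy.

```python
import spacy

# Assumption: a blank English pipeline (tokenizer + English defaults, including
# the default stop-word list); no trained model has to be downloaded.
nlp = spacy.blank("en")

def filter_stop_words(text):
    """Hypothetical helper: keep tokens that are neither stop words nor punctuation."""
    doc = nlp(text)
    return [token.text for token in doc if not token.is_stop and not token.is_punct]

content_words = filter_stop_words("The cat sat on the mat.")
print(content_words)
```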

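```python
# Perform standard imports:
import spacy

# Load the small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')
```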

```python
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)
```

```
{'hundred', 'into', 'thereupon', 'became', 'keep', 'whose', 'amongst', 'none', 'eleven', 'they', '‘m', 'back', 'eight', 'please', 'were', 'former', '’m', 'with', 'n’t', 'ours', 'done', 'behind', 'often', 'before', 'some', 'yet', 'now', 'for', "'s", 'also', 'upon', 'still', 'thru', 'once', 'whom', 'sometimes', 'either', 'together', 'by', 'being', 'throughout', 'next', 'hers', 'used', 'well', 'go', 'is', 'this', 'there', 'first', 'mostly', 'among', 'while', 'same', 'thus', 'do', 'few', 'see', 'itself', 'besides', 'whereby', 'than', 'twenty', "'d", 'almost', 'sixty', 'too', 'which', 'after', "n't", 'all', 'formerly', '’ve', 'further', "'m", 'again', 'nobody', 'beyond', 'serious', 'four', 'thence', 'meanwhile', 'bottom', 'already', 'whereafter', '‘ll', 'top', 'become', 'else', 'around', 'hence', 'himself', 'move', 'must', 'that', 'whoever', 'onto', 'six', 'without', 'so', 'above', 'nothing', 'may', 'many', 'thereby', 'yours', 'when', 're', 'an', 'nine', 'ca', 'everyone', 'what', 'in', 'whe ...
```

```python
len(nlp.Defaults.stop_words)
```

```
326
```
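Because the set is unordered, its printed form is hard to scan. One simple option is to sort the entries before displaying them. This sketch assumes a blank English pipeline, which uses the same English default stop-word set as `en_core_web_sm`:

```python
import spacy

nlp = spacy.blank("en")  # assumption: blank English pipeline, same English defaults

# Sort the stop words alphabetically so the listing is stable and easy to scan
sorted_stop_words = sorted(nlp.Defaults.stop_words)
print(len(sorted_stop_words))
print(sorted_stop_words[:10])
```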

## To see if a word is a stop word

```python
nlp.vocab['myself'].is_stop
```

```
True
```

```python
nlp.vocab['mystery'].is_stop
```

```
False
```
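`is_stop` is also available on every token of a processed `Doc`, which is usually where it gets used. Here is a minimal sketch, again assuming a blank English pipeline instead of `en_core_web_sm`:

```python
import spacy

nlp = spacy.blank("en")  # assumption: blank English pipeline

doc = nlp("I solved the mystery myself")
for token in doc:
    # Each token exposes the is_stop flag of its underlying lexeme
    print(f"{token.text:10} {token.is_stop}")

# The stop-word set itself can also be queried directly
print("myself" in nlp.Defaults.stop_words)
```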

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

```python
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Also set the is_stop flag on the lexeme itself
nlp.vocab['btw'].is_stop = True
```

```python
len(nlp.Defaults.stop_words)
```

```
327
```

```python
nlp.vocab['btw'].is_stop
```

```
True
```
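When several words need to be added, the two steps above can be wrapped in a small helper. This is a sketch. `add_custom_stop_words` is a hypothetical name, the blank English pipeline is an assumption, and `"IMO"` is only an illustrative extra word. Note that `stop_words` is defined on the language's `Defaults` class rather than on the `nlp` object. As a result, changes to it are visible to every English pipeline in the same Python process.

```python
import spacy

nlp = spacy.blank("en")  # assumption: blank English pipeline

def add_custom_stop_words(nlp, words):
    """Hypothetical helper: register each word as a stop word."""
    for word in words:
        word = word.lower()                # keep set entries lowercase
        nlp.Defaults.stop_words.add(word)  # update the default stop-word set
        nlp.vocab[word].is_stop = True     # update the lexeme's is_stop flag

add_custom_stop_words(nlp, ["btw", "IMO"])

doc = nlp("btw the movie was great imo")
kept = [token.text for token in doc if not token.is_stop]
print(kept)
```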

<font color=green>When adding stop words, always use lowercase. spaCy's default stop-word check lowercases a word's text before looking it up in the stop-word set, so an uppercase entry in the set will never match.</font>
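A quick check makes the lowercase rule concrete. In this sketch, a blank English pipeline is assumed and `"FYI"` is an arbitrary example word. An uppercase lexeme still matches a lowercase entry, while an uppercase entry added to the set is never matched:

```python
import spacy

nlp = spacy.blank("en")  # assumption: blank English pipeline

# "MYSELF" is flagged because its lowercased form is in the stop-word set
print(nlp.vocab["MYSELF"].is_stop)

# An uppercase entry is never matched: the check compares lowercased text
nlp.Defaults.stop_words.add("FYI")
print(nlp.vocab["FYI"].is_stop)

# Undo the change, since the set is shared by every English pipeline
nlp.Defaults.stop_words.discard("FYI")
```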

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

```python
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme itself
nlp.vocab['beyond'].is_stop = False
```

```python
len(nlp.Defaults.stop_words)
```

```
326
```

```python
nlp.vocab['beyond'].is_stop
```

```
False
```
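`set.remove()` raises a `KeyError` if the word is not in the set, for example when the removal cell is run a second time. A more forgiving helper can use `set.discard()` instead. This is a sketch: the name `drop_stop_words` and the blank English pipeline are assumptions.

```python
import spacy

nlp = spacy.blank("en")  # assumption: blank English pipeline

def drop_stop_words(nlp, words):
    """Hypothetical helper: un-register each word as a stop word."""
    for word in words:
        word = word.lower()
        # discard() is a no-op for missing entries; remove() would raise KeyError
        nlp.Defaults.stop_words.discard(word)
        nlp.vocab[word].is_stop = False

# Safe to call repeatedly, even for words that were never stop words
drop_stop_words(nlp, ["beyond", "beyond", "mystery"])

doc = nlp("beyond the sea")
print([(token.text, token.is_stop) for token in doc])
```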

Great! Now you should be able to access spaCy's default set of stop words, and add or remove stop words as needed.
## Next up: Vocabulary and Matching