# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in list of English stop words (326 in the version used here).

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'same', 'before', 'full', 'back', 'empty', 'due', 'whoever', 'although', 'take', 'i', 'seems', 'herself', 'though', 'all', 'our', 'various', 'whose', 'does', 'nine', 'you', 'twelve', 'nor', 'four', 'us', 'whom', 'sometimes', 'almost', 'were', 'both', 'at', 'also', 'alone', 'fifteen', 'such', "'d", 'can', '‘ve', 'please', 'less', 'too', 'whereupon', 'therefore', 'these', 'any', 'but', 'anyone', 'whence', 'him', 'well', 'either', 'first', '‘s', 'rather', 'else', 'least', 'an', 'to', 'along', 'am', 'beforehand', 'via', 'twenty', 'everything', 'third', 'as', 'is', 'within', 'bottom', 'whole', 'herein', 'made', 'thereafter', 'becomes', 'because', 'they', 'namely', "'re", 'anyhow', 'whatever', 'should', 'beyond', 'ever', 'above', 'top', 'move', '’m', 'your', 'may', 'afterwards', 'ourselves', 'hereafter', 'towards', 'be', 'already', 'seem', 'besides', 'further', 'doing', 'yourselves', 'hers', 'very', 'why', 'few', 'onto', 'hereby', 'hence', 'moreover', 'show', 'give', 'perhaps', 'another', '

In [3]:
# Count the default stop words:
len(nlp.Defaults.stop_words)

326

## To see if a word is a stop word

In [4]:
nlp.vocab['myself'].is_stop

True

In [5]:
nlp.vocab['mystery'].is_stop

False
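
The same `is_stop` flag is available on every token of a processed document, so the filtering mentioned at the top can be sketched as below. This is a minimal illustration, not the only way to do it. It assumes a blank English pipeline (`spacy.blank('en')`, tokenizer only) so that no model download is needed; the name `nlp_blank` and the sample sentence are made up for this example, and the `nlp` object loaded above should behave the same way.

```python
import spacy

# Assumption: a tokenizer-only English pipeline is enough for stop-word checks.
nlp_blank = spacy.blank('en')

# Sample sentence invented for this illustration
doc = nlp_blank('a detective solved the mystery.')

# Keep tokens that are neither stop words nor punctuation
content_words = [token.text for token in doc
                 if not token.is_stop and not token.is_punct]

print(content_words)
```

Here "a" and "the" are dropped as stop words and the final period is dropped as punctuation, leaving only the content words.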

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [6]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the is_stop flag on the lexeme
nlp.vocab['btw'].is_stop = True

In [7]:
# Check the length now
len(nlp.Defaults.stop_words)

327

In [8]:
nlp.vocab['btw'].is_stop

True

<font color=blue>When adding stop words, always use lowercase. By default spaCy compares the lowercased form of a word against the stop word set, so an entry containing uppercase letters would not be matched.</font>
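
If you have several words to add, the two steps above can be wrapped in a small loop. The sketch below uses a hypothetical helper, `add_stop_words`, which is not part of spaCy; it lowercases each word, adds it to the default set, and sets the `is_stop` flag on the lexeme. The pipeline name `nlp_blank` and the extra words `'imo'` and `'fyi'` are assumptions made for this illustration.

```python
import spacy

# Assumption: a tokenizer-only English pipeline, used here for illustration.
nlp_blank = spacy.blank('en')

def add_stop_words(nlp, words):
    """Hypothetical helper (not a spaCy function): register words as stop words."""
    for word in words:
        word = word.lower()                # stop words should be lowercase
        nlp.Defaults.stop_words.add(word)  # add to the default set
        nlp.vocab[word].is_stop = True     # set the flag on the lexeme

add_stop_words(nlp_blank, ['btw', 'IMO', 'fyi'])

print([nlp_blank.vocab[w].is_stop for w in ('btw', 'imo', 'fyi')])
```

Note that this changes the shared English defaults, just like the cells above.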

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [9]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme
nlp.vocab['beyond'].is_stop = False

In [12]:
# Check the length again
len(nlp.Defaults.stop_words)

326

In [13]:
nlp.vocab['beyond'].is_stop

False
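
Adding and removing words as shown above modifies spaCy's shared defaults. If you would rather leave them untouched, one alternative is to keep your own copy of the set and filter against it. The sketch below is only an illustration of that idea: `custom_stops`, `nlp_blank` and the sample sentence are names and data invented for this example, while `STOP_WORDS` is the set spaCy exposes in `spacy.lang.en.stop_words`.

```python
import spacy
from spacy.lang.en.stop_words import STOP_WORDS

# Work on a copy so the shared defaults are not modified:
# add 'btw' and drop 'beyond' for this custom set only.
custom_stops = (set(STOP_WORDS) | {'btw'}) - {'beyond'}

# Assumption: a tokenizer-only English pipeline, used here for illustration.
nlp_blank = spacy.blank('en')
doc = nlp_blank('btw the answer lies beyond the mystery')

# Compare the lowercased token text against the custom set
kept = [token.text for token in doc if token.lower_ not in custom_stops]

print(kept)
```

With this approach `token.is_stop` still reflects the defaults; only the comparison against `custom_stops` uses the customized list.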