# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of the text before it is processed. spaCy ships with a built-in set of English stop words: 326 in the version used here, though the exact count varies between spaCy releases.

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [2]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'yet', 'among', 'amongst', 'least', 'this', 'hereafter', '’d', 'n‘t', 'when', 'see', 'from', 'or', 'ours', 'thereafter', 'onto', 'you', 'more', 'many', 'she', 'too', 'beyond', 'has', 'forty', 'everything', 'somewhere', '‘re', 'nowhere', 'ourselves', 'thereupon', 'still', 'neither', 'where', 'cannot', 'again', 'move', 'anywhere', 'him', 'any', 'it', 'not', 'front', 'few', 'if', 'either', '‘m', 'also', 'her', 'after', 'became', 'otherwise', 'well', 'together', 'yours', 'made', 'less', 'over', 'wherein', 'a', 'toward', 'do', 'on', 'were', 'he', 'however', 'am', 'below', 'how', 'whereupon', 'hereby', '’ll', 'mine', 'sometimes', 'wherever', 'whence', 'seeming', 'n’t', 'towards', 'others', "'s", 'empty', 'something', 'rather', 'please', 'just', 'nevertheless', 'done', 'an', '‘ll', 'quite', 'afterwards', '’s', '’re', 'almost', 'had', 'up', 'last', 'sixty', 'everyone', 'so', 'four', "'re", 'name', 'why', 'himself', 'via', 'between', 'unless', 'throughout', '’m', 'former', 'namely', 'behind', 

In [3]:
len(nlp.Defaults.stop_words)

326
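Because sets are unordered, the printed output above changes from run to run. As a minimal sketch, assuming spaCy is installed, the same English stop words can be imported directly from `spacy.lang.en.stop_words` and sorted for a stable view. The variable name `first_ten` is only illustrative.

```python
# Assumes spaCy is installed; no trained model is needed for this import.
from spacy.lang.en.stop_words import STOP_WORDS

# Sorting the set gives a stable, alphabetical view of the first few entries.
first_ten = sorted(STOP_WORDS)[:10]
print(first_ten)

# The total depends on the installed spaCy version.
print(len(STOP_WORDS))
```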

## To see if a word is a stop word

In [4]:
nlp.vocab['myself'].is_stop

True

In [5]:
nlp.vocab['mystery'].is_stop

False
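The same `is_stop` attribute is available on every token in a processed `Doc`, which is how stop words are usually filtered out of text. The sketch below assumes spaCy is installed and uses `spacy.blank('en')` so that it runs without downloading a model; the `nlp` object loaded earlier should behave the same way. The sample sentence and the name `content_tokens` are illustrative.

```python
import spacy

# A blank English pipeline provides the tokenizer and lexical attributes
# such as is_stop without requiring a trained model.
nlp = spacy.blank('en')

doc = nlp("This is a sentence about the mystery of stop words.")

# Keep only tokens that are neither stop words nor punctuation.
content_tokens = [token.text for token in doc
                  if not token.is_stop and not token.is_punct]
print(content_tokens)
```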

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [6]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the is_stop flag on the lexeme
nlp.vocab['btw'].is_stop = True

In [7]:
len(nlp.Defaults.stop_words)

327

In [None]:
nlp.vocab['btw'].is_stop

True

<font color=green>When adding stop words, always use lowercase. In recent spaCy versions the stop word check compares the lowercase form of a word against the set, so an entry that contains uppercase letters would never match.</font>
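When several words need to be added, the two steps shown above can be wrapped in a small helper that also lowercases each entry. This is only a sketch: `add_stop_words` is a hypothetical helper, not part of spaCy, and `spacy.blank('en')` is used so the example runs without a downloaded model.

```python
import spacy

nlp = spacy.blank('en')

def add_stop_words(nlp, words):
    """Hypothetical helper: register each word as a stop word, in lowercase."""
    for word in words:
        word = word.lower()
        # Add the word to the default set of stop words.
        nlp.Defaults.stop_words.add(word)
        # Set the is_stop flag on the corresponding lexeme.
        nlp.vocab[word].is_stop = True

add_stop_words(nlp, ['btw', 'IMHO'])
print(nlp.vocab['btw'].is_stop, nlp.vocab['imho'].is_stop)
```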

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [8]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme
nlp.vocab['beyond'].is_stop = False

In [9]:
len(nlp.Defaults.stop_words)

326

In [10]:
nlp.vocab['beyond'].is_stop

False
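To confirm that a removal carries through to processed text, the flag can be checked on the tokens of a `Doc`. The sketch below makes a few assumptions: it uses `spacy.blank('en')` so no model download is needed, it uses `set.discard` instead of `remove` so that running the cell twice does not raise a `KeyError`, and the sample phrase and the name `flags` are illustrative.

```python
import spacy

nlp = spacy.blank('en')

# discard() is a no-op if the word has already been removed,
# whereas remove() would raise a KeyError.
nlp.Defaults.stop_words.discard('beyond')
nlp.vocab['beyond'].is_stop = False

doc = nlp("beyond the wall")

# Map each token's text to its is_stop flag.
flags = {token.text: token.is_stop for token in doc}
print(flags)
```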