# Stop Words
Words like "a" and "the" appear so frequently that they don't need to be tagged as thoroughly as nouns, verbs and modifiers. We call these *stop words*, and they can be filtered out of a text before it is processed. spaCy ships with a built-in list of English stop words; the version used here contains 326, and the exact count varies between releases.
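
To make the idea of filtering concrete, here is a minimal sketch in plain Python. It does not use spaCy: the `DEMO_STOP_WORDS` set and the `remove_stop_words` helper are hypothetical names invented for this illustration, and the tiny stop list is hand-picked rather than taken from any library.

```python
# Tiny hand-picked stop list for illustration only (NOT spaCy's list).
DEMO_STOP_WORDS = {"a", "the", "of", "to", "is"}


def remove_stop_words(text, stop_words=DEMO_STOP_WORDS):
    """Return the whitespace-separated words of `text` that are not stop words."""
    # Lowercase each word before the lookup so that "The" matches "the".
    return [word for word in text.split() if word.lower() not in stop_words]


print(remove_stop_words("The author of the mystery is unknown"))
# → ['author', 'mystery', 'unknown']
```

Splitting on whitespace is a simplification; the rest of this section relies on spaCy's tokenizer and its own stop-word list instead.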

In [1]:
# Perform standard imports:
import spacy
nlp = spacy.load('en_core_web_sm')

In [3]:
# Print the set of spaCy's default stop words (remember that sets are unordered):
print(nlp.Defaults.stop_words)

{'among', 'everything', 'herself', 'it', 'wherein', 'they', 'may', 'but', 'thereafter', 'per', "n't", 'meanwhile', 'one', 'six', "'ve", 'thus', 'whether', 'although', 'be', 'whom', 'everywhere', 'few', 'must', 'does', 'two', 'five', 'against', '‘d', 'make', 'whole', 'give', 'several', 'unless', 'the', 'go', 'eleven', 'down', 'when', '’ve', 'often', 'his', 'anyone', 'would', 'further', 'own', 'to', 'how', 'somewhere', 'least', 'sometimes', 'these', 'many', 'say', 'just', '‘m', 'some', 'side', 'across', 'in', 'who', 'you', 'anyhow', 'after', 'her', 'hundred', 'seeming', 'well', 'now', 'show', 'quite', 'twenty', 'whereafter', 'above', 'then', 'whither', 'whereby', 'full', 'below', 'almost', 'someone', 'latter', 'made', 'more', 'whereas', 'any', 'something', '’s', '‘s', 'ten', 'had', 'yourself', 'should', 'indeed', 'thereupon', 'can', 'become', 'yourselves', 'once', 'beyond', 'again', 'amount', 'used', 'or', 'been', 'into', 'could', 'this', 'former', 'using', 'themselves', 'we', 'up', 'dur

In [4]:
len(nlp.Defaults.stop_words)

326

## To see if a word is a stop word

In [5]:
nlp.vocab['myself'].is_stop

True

In [6]:
nlp.vocab['mystery'].is_stop

False
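
The same flag is available on every token of a processed text, which is one way to drop stop words from a `Doc`. The sketch below assumes a blank English pipeline created with `spacy.blank("en")` rather than the `en_core_web_sm` model loaded earlier, on the assumption that the stop-word list comes from the language defaults and needs no trained model. The example sentence is made up.

```python
import spacy

# Assumption: a blank English pipeline is enough for stop-word checks,
# because the stop-word list belongs to the language defaults.
nlp = spacy.blank("en")

doc = nlp("Detectives solved the mystery")

# Keep only the tokens whose is_stop flag is False.
content_words = [token.text for token in doc if not token.is_stop]
print(content_words)
```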

## To add a stop word
There may be times when you wish to add a stop word to the default set. Perhaps you decide that `'btw'` (common shorthand for "by the way") should be considered a stop word.

In [7]:
# Add the word to the set of stop words. Use lowercase!
nlp.Defaults.stop_words.add('btw')

# Set the is_stop flag on the lexeme
nlp.vocab['btw'].is_stop = True

In [8]:
len(nlp.Defaults.stop_words)

327

In [9]:
nlp.vocab['btw'].is_stop

True

<font color=green>When adding stop words, always use lowercase. The **is_stop** check lowercases a word before looking it up in the stop-word set, so an entry that contains uppercase letters would never match.</font>

## To remove a stop word
Alternatively, you may decide that `'beyond'` should not be considered a stop word.

In [10]:
# Remove the word from the set of stop words
nlp.Defaults.stop_words.remove('beyond')

# Clear the is_stop flag on the lexeme
nlp.vocab['beyond'].is_stop = False

In [11]:
len(nlp.Defaults.stop_words)

326

In [12]:
nlp.vocab['beyond'].is_stop

False
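
Adding and removing both take two steps: updating the set and updating the lexeme flag. One possible way to keep the two in sync is a small wrapper. `set_stop_word` below is a hypothetical helper, not part of spaCy, and the sketch again assumes a blank English pipeline so that no trained model is needed.

```python
import spacy


def set_stop_word(nlp, word, is_stop=True):
    """Hypothetical helper: update the stop-word set and the lexeme flag together."""
    word = word.lower()  # stop words are stored in lowercase
    if is_stop:
        nlp.Defaults.stop_words.add(word)
    else:
        # discard() does nothing if the word is already absent
        nlp.Defaults.stop_words.discard(word)
    nlp.vocab[word].is_stop = is_stop


# Assumption: a blank English pipeline instead of en_core_web_sm.
nlp = spacy.blank("en")

set_stop_word(nlp, "btw")
print(nlp.vocab["btw"].is_stop)

set_stop_word(nlp, "btw", is_stop=False)
print(nlp.vocab["btw"].is_stop)
```

The helper uses `discard()` where the cell above uses `remove()`: `remove()` raises a `KeyError` when the word is not in the set, while `discard()` silently does nothing.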