# Spelling Check in Python
This notebook demonstrates Python packages for spell checking and autocorrection. The packages covered are:
 - `contextualSpellCheck`
 - `autocorrect`
 - `textblob`
 
Each of these packages can be installed with pip:
 - `pip install contextualSpellCheck`
 - `pip install autocorrect`
 - `pip install textblob`
     - `textblob` also needs its corpora downloaded: `python -m textblob.download_corpora` (this worked for me on the INL network)

## contextualSpellCheck
This package runs as a component in a spaCy pipeline. It focuses on correcting out-of-vocabulary (non-word) errors using a BERT model, and it uses the surrounding context when choosing a correction. The package appears to be under active development.
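The score output later in this notebook shows that the highest-scoring BERT candidate is not always the suggestion that gets returned: for the second `milion`, `'.'` scores 0.99228 but the suggestion is still `'million'`. One way to get that behavior is to re-rank the context candidates by edit distance to the misspelled word. The sketch below illustrates that idea using only the standard library. It is my assumption about how the ranking works, not the package's actual code, and the names `edit_distance` and `pick_candidate` are hypothetical helpers.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def pick_candidate(misspelled, scored_candidates):
    """Return the candidate word closest to `misspelled`.

    `scored_candidates` is a list of (word, score) pairs sorted by score,
    highest first. min() keeps the first minimum, so ties go to the
    higher-scored candidate.
    """
    best = min(scored_candidates,
               key=lambda pair: edit_distance(misspelled, pair[0]))
    return best[0]


# Candidates for the second "milion", copied from the score output in this notebook.
second_milion_scores = [
    ('.', 0.99228), (';', 0.00648), ('!', 0.00054), ('million', 0.0003),
    ('billion', 0.00014), ('?', 5e-05), ('...', 2e-05), ('|', 2e-05),
    ('%', 1e-05), (',', 1e-05),
]

print(pick_candidate("milion", second_milion_scores))  # prints: million
```

Under this assumption the model's context decides which words are candidates at all, and the spelling of the original token decides which candidate wins.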

There are two ways to include `contextualSpellCheck` in a spaCy pipeline, and both are shown below. The package checks spelling, but the tokens in the resulting `Doc` keep the original text. An additional step is needed if the tokens should reflect the corrections.
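A minimal sketch of one possible additional step: run the pipeline once to get the corrected string from `doc._.outcome_spellCheck`, then run the pipeline again on that string so the tokens come from the corrected text. The helper name `tokens_after_spell_check` is hypothetical, and the fallback to the original text when `outcome_spellCheck` is empty is my assumption about what happens when nothing is corrected.

```python
def tokens_after_spell_check(nlp, text):
    """Spell check `text`, then tokenize the corrected string.

    `nlp` is assumed to be a spaCy pipeline that already contains the
    contextualSpellCheck component (added either way shown in this notebook).
    """
    doc = nlp(text)
    # outcome_spellCheck holds the corrected sentence as a plain string.
    # Fall back to the original text if it is empty (assumption: nothing
    # was corrected in that case).
    corrected_text = doc._.outcome_spellCheck or text
    # Second pass: the new Doc is built from the corrected string,
    # so its tokens reflect the corrections.
    corrected_doc = nlp(corrected_text)
    return [token.text for token in corrected_doc]


# Example usage (requires spaCy, en_core_web_sm, and contextualSpellCheck):
#   import spacy
#   import contextualSpellCheck
#   nlp = spacy.load("en_core_web_sm")
#   contextualSpellCheck.add_to_pipe(nlp)
#   print(tokens_after_spell_check(
#       nlp, "Income was $9.4 milion compared to the prior year of $2.7 milion"))
```

The second pass runs the whole pipeline again, including the spell checker, so it roughly doubles the processing time per text. The helper takes the pipeline as an argument so that it works regardless of how the component was added.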

See also:

https://github.com/R1j1t/contextualSpellCheck

### Load contextualSpellCheck in a spaCy pipeline

In [1]:
import contextualSpellCheck
import spacy
nlp = spacy.load("en_core_web_sm")
print(f"nlp.pipe_names before adding contextualSpellCheck: {nlp.pipe_names}")
contextualSpellCheck.add_to_pipe(nlp)
print(f"\nnlp.pipe_names after  adding contextualSpellCheck: {nlp.pipe_names}")
doc = nlp("Income was $9.4 milion compared to the prior year of $2.7 milion")
print(f"\ncontextualSpellCheck added as extension: {doc._.contextual_spellCheck}")
print(f"\nspell check performed: {doc._.performed_spellCheck}")
print(f"\nsuggestions: {doc._.suggestions_spellCheck}")
print(f"\nspell check output (str): {doc._.outcome_spellCheck}")
print(f"\nspell check score: {doc._.score_spellCheck}")
print("\nTokens from spacy doc:")
for token in doc:
    print(f"    {token.text}")

nlp.pipe_names before adding contextualSpellCheck: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

nlp.pipe_names after  adding contextualSpellCheck: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'contextual spellchecker']

contextualSpellCheck added as extension: True

spell check performed: True

suggestions: {milion: 'million', milion: 'million'}

spell check output (str): Income was $9.4 million compared to the prior year of $2.7 million

spell check score: {milion: [('million', 0.42275), (',', 0.23874), ('mi', 0.13952), ('billion', 0.10526), ('trillion', 0.01076), ('%', 0.00577), ('Mi', 0.00542), ('##M', 0.00512), ('.', 0.00276), ('Million', 0.00268)], milion: [('.', 0.99228), (';', 0.00648), ('!', 0.00054), ('million', 0.0003), ('billion', 0.00014), ('?', 5e-05), ('...', 2e-05), ('|', 2e-05), ('%', 1e-05), (',', 1e-05)]}

Tokens from spacy doc:
    Income
    was
    $
    9.4
    milion
    compared
    to
    the
    prior
    

### Add contextualSpellCheck to spaCy pipeline manually

In [2]:
import spacy
import contextualSpellCheck

nlp = spacy.load("en_core_web_sm")
print(f"nlp.pipe_names before adding contextualSpellCheck: {nlp.pipe_names}")
nlp.add_pipe("contextual spellchecker")
print(f"\nnlp.pipe_names after adding contextualSpellCheck: {nlp.pipe_names}")
doc = nlp("Income was $9.4 milion compared to the prior year of $2.7 milion")
print(f"\ncontextualSpellCheck added as extension: {doc._.contextual_spellCheck}")
print(f"\nspell check performed: {doc._.performed_spellCheck}")
print(f"\nsuggestions: {doc._.suggestions_spellCheck}")
print(f"\nspell check output (str): {doc._.outcome_spellCheck}")
print(f"\nspell check score: {doc._.score_spellCheck}")
print("\nTokens from spacy doc:")
for token in doc:
    print(f"    {token.text}")

nlp.pipe_names before adding contextualSpellCheck: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']

nlp.pipe_names after adding contextualSpellCheck: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'contextual spellchecker']

contextualSpellCheck added as extension: True

spell check performed: True

suggestions: {milion: 'million', milion: 'million'}

spell check output (str): Income was $9.4 million compared to the prior year of $2.7 million

spell check score: {milion: [('million', 0.42275), (',', 0.23874), ('mi', 0.13952), ('billion', 0.10526), ('trillion', 0.01076), ('%', 0.00577), ('Mi', 0.00542), ('##M', 0.00512), ('.', 0.00276), ('Million', 0.00268)], milion: [('.', 0.99228), (';', 0.00648), ('!', 0.00054), ('million', 0.0003), ('billion', 0.00014), ('?', 5e-05), ('...', 2e-05), ('|', 2e-05), ('%', 1e-05), (',', 1e-05)]}

Tokens from spacy doc:
    Income
    was
    $
    9.4
    milion
    compared
    to
    the
    prior
    y

## autocorrect
This package takes raw text and returns the corrected text as a string. The documentation is almost nonexistent, but the package is straightforward to use.

See also:

https://github.com/filyp/autocorrect

In [3]:
import autocorrect
spell = autocorrect.Speller()
text = "I'm not sleapy and tehre is no place I'm giong to."
corrected_text = spell(text)
print(type(corrected_text))
print(corrected_text)

<class 'str'>
I'm not sleepy and there is no place I'm going to.


## textblob
This package can perform a number of NLP tasks including:
- POS tagging
- Noun phrase extraction
- Sentiment analysis
- Tokenization
- Word inflection and lemmatization
- WordNet integration
- WordLists
- Spelling correction
- Word and noun phrase frequencies
- Parsing
- n-grams
- Start and end indices of sentences

The initial text string is converted into a `TextBlob` object, and the corrected result needs to be converted back to a string before it is passed to spaCy.

See also:

https://github.com/sloria/textblob

In [4]:
from textblob import TextBlob
from textblob import Sentence
text = "A sentencee to checkk!"
# using a TextBlob
blob = TextBlob(text)
result = blob.correct()
print(type(result))
print(result)
print(str(result))
# cast to string for use in spaCy
result2 = str(result)
# using a Sentence
sentence = Sentence(text)
print(str(sentence.correct()))

<class 'textblob.blob.TextBlob'>
A sentence to check!
A sentence to check!
A sentence to check!
