# Solution 4 Improved

## Spell Checking using the [SymSpell delete spell checking algorithm](https://github.com/wolfgarbe/SymSpell) (Implemented by myself)

### Extract words from 'en_50k.txt', obtained from the [FrequencyWords](https://github.com/hermitdave/FrequencyWords/tree/master/content/2018/en) project, and a list of common misspellings from [Wikipedia and Roger Mitton](https://norvig.com/ngrams/)

In [1]:
DICT = {}
with open('en_50k.txt', encoding="utf8") as f:
    content = f.readlines()
content = [x.strip() for x in content]
for x in content:
    s,c = x.split()
    DICT[s] = int(c)

COMMON = {}
with open('spell-errors.txt', encoding="utf8") as f:
    content = f.readlines()
content = [x.strip() for x in content]
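for x in content:
    # Each line is assumed to look like 'right: wrong1, wrong2'.
    sd, sl = x.split(':', 1)
    for y in sl.split(','):
        # Strip the space after each comma so lookups match bare words.
        COMMON[y.strip()] = sd.strip()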

MAP1 = {}
MAP2 = {}
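
The loading code above relies on two line formats: `word count` for the frequency list and `right: wrong1, wrong2` for the misspelling list. The following is a minimal sketch of that parsing on in-memory sample lines, assuming those formats hold. The helper names `parse_frequencies` and `parse_misspellings` and the sample counts are made up for illustration and are not part of the original notebook.

```python
def parse_frequencies(lines):
    """Map each word to its count from 'word count' lines."""
    freq = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, count = parts
        freq[word] = int(count)
    return freq


def parse_misspellings(lines):
    """Map each misspelling to its correction from 'right: wrong1, wrong2' lines."""
    common = {}
    for line in lines:
        right, sep, wrongs = line.partition(':')
        if not sep:
            continue  # no colon, so not a valid entry
        for wrong in wrongs.split(','):
            wrong = wrong.strip()
            if wrong:
                common[wrong] = right.strip()
    return common


# Made-up sample lines in the assumed formats.
sample_freq = ["you 1200", "the 900", ""]
sample_errors = ["raining: rainning, raning"]

freq = parse_frequencies(sample_freq)
common = parse_misspellings(sample_errors)
print(freq)
print(common)
```

Stripping each misspelling matters: without it, the keys would keep the space that follows each comma and a lookup for the bare word would miss.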

### Created a function to generate the set of all strings obtained by deleting one character from a word

In [2]:
def delete(word):
    splits     = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes    = [L + R[1:] for L, R in splits if R]
    return set(deletes)
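
As a quick illustration of what this function produces, here is a self-contained copy of `delete` applied to two short words. Repeated letters yield duplicate strings, which the set collapses.

```python
def delete(word):
    # Split the word at every position, then drop the first character
    # of the right-hand part to remove exactly one character.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    return set(deletes)


print(sorted(delete("cat")))   # ['at', 'ca', 'ct']
print(sorted(delete("book")))  # ['bok', 'boo', 'ook']
```

A word of length n yields at most n distinct deletes, which is what keeps the precomputed index far smaller than one that also enumerated insertions and substitutions.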

### Created a function to populate the MAP1 and MAP2 dictionaries, which map each string obtained by deleting one or two characters to the dictionary words it came from

In [3]:
def populate(word):
    set1 = delete(word)
    set2 = set([e2 for e1 in set1 for e2 in delete(e1)])
    for x in set1:
        MAP1.setdefault(x,[]).append(word)
    for x in set2:
        MAP2.setdefault(x,[]).append(word)
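
To make the shape of the two maps concrete, here is a small sketch that builds them for a toy two-word dictionary. The function name `build_maps` and the toy words are illustrative only. It returns local maps instead of filling the global `MAP1` and `MAP2`, but the indexing logic is the same.

```python
from collections import defaultdict


def delete(word):
    # All strings obtained by removing exactly one character.
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def build_maps(words):
    map1 = defaultdict(list)  # one-character deletes -> source words
    map2 = defaultdict(list)  # two-character deletes -> source words
    for word in words:
        one = delete(word)
        two = {d2 for d1 in one for d2 in delete(d1)}
        for key in one:
            map1[key].append(word)
        for key in two:
            map2[key].append(word)
    return map1, map2


map1, map2 = build_maps(["cat", "cart"])
print(map1["cat"])  # ['cart']  ('cart' minus 'r')
print(map1["ca"])   # ['cat']   ('cat' minus 't')
print(map2["ca"])   # ['cart']  ('cart' minus 'r' and 't')
```

Note that a dictionary word is not indexed under its own spelling: `map1["cat"]` lists `cart` but not `cat` itself.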

### Populating the MAP1 and MAP2 dictionaries

In [4]:
for w in DICT.keys():
    populate(w)

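### Creating the function that corrects an input word

#### A known misspelling is replaced by its listed correction, and a dictionary word is returned unchanged. Otherwise we generate the input word and all strings obtainable from it by deleting one character, and collect the dictionary words mapped to them in 'MAP1'. If none are found, we add the strings obtainable by deleting two characters and look everything up in 'MAP2'. In either stage, the candidate with the highest frequency in 'DICT' is returned.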

In [5]:
def correction(inp):
    # Known misspellings and dictionary words are resolved directly.
    if inp in COMMON:
        return COMMON[inp]
    if inp in DICT:
        return inp

    # Stage 1: the input and its one-character deletes, looked up in MAP1.
    s1 = list(delete(inp)) + [inp]
    s2 = []
    for x in s1:
        if x in MAP1:
            s2 += MAP1[x]
    if s2:
        return max(s2, key=DICT.get)

    # Stage 2: add the two-character deletes and look everything up in MAP2.
    s1 = set(s1)
    s2 = set(e2 for e1 in s1 for e2 in delete(e1))
    s3 = list(s1 | s2)
    s4 = []
    for x in s3:
        if x in MAP2:
            s4 += MAP2[x]
    if s4:
        return max(s4, key=DICT.get)

    # Nothing found: return the input unchanged.
    return inp
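
The first lookup stage can be traced on a toy dictionary. The sketch below is a simplified, self-contained version covering only the one-delete stage. The name `correct_distance_one` and the frequency counts are made up for illustration. It shows why matching deletes against deletes catches both a missing letter and a substituted letter.

```python
def delete(word):
    # All strings obtained by removing exactly one character.
    return {word[:i] + word[i + 1:] for i in range(len(word))}


# Toy frequency table with made-up counts.
freq = {"whistle": 50, "wrestle": 20, "castle": 80}

# Index every one-character delete of every dictionary word.
map1 = {}
for word in freq:
    for key in delete(word):
        map1.setdefault(key, []).append(word)


def correct_distance_one(inp):
    if inp in freq:
        return inp
    # Look up the input itself and each of its one-character deletes.
    keys = delete(inp) | {inp}
    candidates = [w for k in keys for w in map1.get(k, [])]
    # Prefer the most frequent candidate; fall back to the input.
    return max(candidates, key=freq.get) if candidates else inp


print(correct_distance_one("wistle"))   # whistle (missing 'h')
print(correct_distance_one("whixtle"))  # whistle ('s' replaced by 'x')
print(correct_distance_one("zzz"))      # zzz (no candidate found)
```

For `whixtle`, the match happens on the shared delete `whitle`: removing `x` from the input and removing `s` from `whistle` give the same string.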

### Testing the correction function (returns correction or the same word if nothing is found)

In [6]:
print("correction('wistle') ->", correction('wistle'))
print("correction('redifulous') ->", correction('redifulous'))
print("correction('explicitly') ->", correction('explicitly'))
print("correction('delinqent') ->", correction('delinqent'))
print("correction('nistages') ->", correction('nistages'))
print("correction('preprocessdsj') ->", correction('preprocessdsj'))
print("correction('dogsjh') ->", correction('dogsjh'))

correction('wistle') -> whistle
correction('redifulous') -> ridiculous
correction('explicitly') -> explicitly
correction('delinqent') -> delinquent
correction('nistages') -> mistakes
correction('preprocessdsj') -> preprocessdsj
correction('dogsjh') -> doings


### Return string with words in the same case as input

In [7]:
import re

def correct_text(text):
    return re.sub('[a-zA-Z]+', correct_match, text)

def correct_match(match):
    word = match.group()
    return case_of(word)(correction(word.lower()))

def case_of(text):
    return (str.upper if text.isupper() else
            str.lower if text.islower() else
            str.title if text.istitle() else str)
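
The case handling works because `case_of` returns a function rather than a string: it inspects the original word and hands back the `str` method that reproduces its casing. A short self-contained illustration:

```python
def case_of(text):
    # Return the str method matching the casing of `text`;
    # plain `str` leaves the corrected word as it is.
    return (str.upper if text.isupper() else
            str.lower if text.islower() else
            str.title if text.istitle() else str)


print(case_of("IN")("in"))              # IN
print(case_of("Errurs")("errors"))      # Errors
print(case_of("whutever")("whatever"))  # whatever
print(case_of("mIxEd")("mixed"))        # mixed
```

Mixed-case input such as `mIxEd` matches none of the three tests, so the correction comes back in lowercase, since `correct_match` lowercases the word before calling `correction`.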

### Testing the correct_text function

In [8]:
correct_text('Spellink Errurs IN somethink. Whutever; unusuel mistakez?')

'Spelling Errors IN something. Whatever; unusual mistakes?'