# Exercise 4.2: Spell Check
Kevin King (Kevin.M.King.24@dartmouth.edu)<br>
Dartmouth College, LING48, Spring 2023

Sample code adapted from Peter Norvig's "How to Write a Spelling Corrector": http://norvig.com/spell-correct.html

My implementation (the `check_spelling` function below):
* Lowercases the input text, so the spell checker is case-insensitive, and extracts every word from it with the regular expression `\w+`. The result is a list of the words in the input.
* Loops through each word in that list.
* Checks whether each word is in `WORDS`, the word-frequency `Counter` built from the corpus file `cim-sentences.txt`, whose words are assumed to be correctly spelled. If a word is missing, it calls `candidates()` to get the known words one edit away (or, failing that, two edits away) and prints them as possible corrections. If no known word is that close, `candidates()` returns the word itself.
* To test the function, I added a prompt that asks the user for a sentence, and I also ran the two sample sentences given in the instructions directly.
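The tokenization step in the first bullet can be checked on its own. The sketch below wraps the same `re.findall(r'\w+', ...)` call in a hypothetical helper, `tokenize`, and shows two side effects worth knowing for Cook Islands Maori text: in Python 3, `\w` is Unicode-aware, so vowels with macrons stay inside a word, while an apostrophe (for instance one typed to mark a glottal stop) is not a word character and is silently dropped.

```python
import re

def tokenize(text):
    """Lowercase `text` and return its word tokens.

    Hypothetical helper using the same regex as words() and check_spelling:
    r'\w+' keeps runs of letters, digits, and underscores and drops the rest.
    """
    return re.findall(r'\w+', text.lower())

# Punctuation such as '!' is discarded and capitals are folded to lowercase.
print(tokenize("Kia orana kotoo mai i Rarotoga!"))
# ['kia', 'orana', 'kotoo', 'mai', 'i', 'rarotoga']

# \w matches Unicode letters, so a macron vowel stays inside the word...
print(tokenize("Māori"))
# ['māori']

# ...but an apostrophe is not a word character, so it is stripped.
print(tokenize("Kua 'aere au"))
# ['kua', 'aere', 'au']
```

If the corpus spells some words with a character that `\w` does not match, user input containing that character will be split or trimmed before it ever reaches the `WORDS` lookup.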

#### Sample code from link

```python
import re
from collections import Counter
```

```python
def words(text): return re.findall(r'\w+', text.lower())

# Word frequencies from the corpus; `with` closes the file after reading.
# (Assumes the corpus file is UTF-8 encoded.)
with open('cim-sentences.txt', encoding='utf-8') as f:
    WORDS = Counter(words(f.read()))

def P(word, N=sum(WORDS.values())): 
    "Probability of `word`."
    return WORDS[word] / N

def correction(word): 
    "Most probable spelling correction for word."
    return max(candidates(word.lower()), key=P)

def candidates(word): 
    "Generate possible spelling corrections for word."
    return (known([word]) or known(edits1(word)) or known(edits2(word)) or [word])

def known(words): 
    "The subset of `words` that appear in the dictionary of WORDS."
    return set(w for w in words if w in WORDS)

def edits1(word):
    "All edits that are one edit away from `word`."
    letters    = 'abcdefghijklmnopqrstuvwxyz'
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R)>1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in letters]
    inserts    = [L + c + R               for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word): 
    "All edits that are two edits away from `word`."
    return (e2 for e1 in edits1(word) for e2 in edits1(e1))
```
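Why `candidates()` tries `edits1` before `edits2` is clearer once you count the strings involved. For a word of n letters, `edits1` builds n deletions, n - 1 transpositions, 26n replacements and 26(n + 1) insertions, so 54n + 25 strings before `set()` removes duplicates. `edits2` repeats this for every one of those strings, so its output grows roughly with the square of that number; that is presumably why it is written as a generator and only consulted when no known word is one edit away. The hypothetical helper `edit_counts` below rebuilds the four lists exactly as the sample `edits1` does and reports their sizes.

```python
LETTERS = 'abcdefghijklmnopqrstuvwxyz'

def edit_counts(word):
    """Count the raw edits that edits1() builds for `word`.

    Hypothetical helper: returns the per-kind list sizes before deduplication,
    plus the number of distinct strings after deduplication.
    """
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in LETTERS]
    inserts    = [L + c + R               for L, R in splits for c in LETTERS]
    counts = {'deletes': len(deletes), 'transposes': len(transposes),
              'replaces': len(replaces), 'inserts': len(inserts)}
    n_unique = len(set(deletes + transposes + replaces + inserts))
    return counts, n_unique

counts, n_unique = edit_counts('kotoo')
print(counts)                # {'deletes': 5, 'transposes': 4, 'replaces': 130, 'inserts': 156}
print(sum(counts.values()))  # 295, i.e. 54 * 5 + 25
print(n_unique)              # fewer than 295: e.g. replacing a letter with itself gives 'kotoo' again
```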



#### Function: `check_spelling`

```python
def check_spelling(input_text):
    "Print each word of `input_text` that is not in WORDS, with candidate corrections."
    # Lowercase the text and extract its words with the same regex as words();
    # naming the result `tokens` avoids shadowing the words() function
    tokens = words(input_text)

    # Check each of the extracted words for possible misspellings
    for word in tokens:
        # A word that is not in the corpus-derived WORDS counter is flagged
        if word not in WORDS:
            # Known words one (else two) edits away, or the word itself
            possible_corrections = candidates(word)
            print("== Possible misspelling ==\n", word + ":", possible_corrections)
```
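Because `check_spelling` only prints, its behavior is hard to test directly. The sketch below is a hypothetical variant, `find_misspellings`, that returns a dictionary mapping each unknown word to its candidates instead. To stay self-contained it re-creates the sample code's edit functions and uses a tiny toy vocabulary (an assumption, built only from words that appear in this notebook's output) in place of `WORDS` from `cim-sentences.txt`. One deliberate difference: the fallback is `{word}` rather than `[word]`, so every branch returns a set.

```python
import re
from collections import Counter

# Hypothetical toy vocabulary standing in for WORDS / cim-sentences.txt.
TOY_WORDS = Counter(re.findall(r'\w+', "kia orana kotou mai i rarotonga"))

LETTERS = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    "All strings one edit away from `word` (same construction as the sample code)."
    splits     = [(word[:i], word[i:])    for i in range(len(word) + 1)]
    deletes    = [L + R[1:]               for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces   = [L + c + R[1:]           for L, R in splits if R for c in LETTERS]
    inserts    = [L + c + R               for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)

def known(words, vocab):
    "The subset of `words` found in `vocab`."
    return {w for w in words if w in vocab}

def candidates(word, vocab):
    "Known words at edit distance 0, else 1, else 2; otherwise the word itself."
    return (known([word], vocab)
            or known(edits1(word), vocab)
            or known((e2 for e1 in edits1(word) for e2 in edits1(e1)), vocab)
            or {word})

def find_misspellings(text, vocab=TOY_WORDS):
    """Hypothetical returning variant of check_spelling.

    Maps each token missing from `vocab` to its candidate corrections.
    """
    tokens = re.findall(r'\w+', text.lower())
    return {w: candidates(w, vocab) for w in tokens if w not in vocab}

print(find_misspellings("Kia orana kotoo mai i Rarotoga!"))
# {'kotoo': {'kotou'}, 'rarotoga': {'rarotonga'}}
```

Returning the findings keeps the spell check separate from how results are displayed, so the same function could feed the prompt cell, the sample sentences, or an automated test.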

#### Prompt user for a sentence

```python
# Prompt user for a sentence in Cook Islands Maori
input_text = input("Please write a sentence in Cook Islands Maori and press ENTER to check the spelling: \n")
check_spelling(input_text)
```

```
Please write a sentence in Cook Islands Maori and press ENTER to check the spelling: 
 Kia orana kotoo mai i Rarotoga!

== Possible misspelling ==
 kotoo: {'kotou'}
== Possible misspelling ==
 rarotoga: {'rarotonga'}
```


#### Sample sentences for testing (given in instructions)

```python
sample1 = "Kia orana kotoo mai i Rarotoga!"
print("Sample Input 1: " + sample1)
check_spelling(sample1)

print("\n")

sample2 = "Kua aere au ki Mauke."
print("Sample Input 2: " + sample2)
check_spelling(sample2)
```

```
Sample Input 1: Kia orana kotoo mai i Rarotoga!
['kia', 'orana', 'kotoo', 'mai', 'i', 'rarotoga']
== Possible misspelling ==
 kotoo: {'kotou'}
== Possible misspelling ==
 rarotoga: {'rarotonga'}


Sample Input 2: Kua aere au ki Mauke.
['kua', 'aere', 'au', 'ki', 'mauke']
== Possible misspelling ==
 aere: {'qaere', 'tere', 'mere', 'rere'}
== Possible misspelling ==
 mauke: {'maquke'}
```
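The candidate collections printed above are Python sets, so their order (for example `{'qaere', 'tere', 'mere', 'rere'}` for *aere*) says nothing about which correction is likeliest. The sample code's `correction()` picks a single winner with `max(..., key=P)`; because `P` divides every count by the same total `N`, ordering by `P` is the same as ordering by raw count in `WORDS`. The sketch below ranks the whole set that way, using a hypothetical `rank_candidates` helper and invented toy counts in place of the real corpus frequencies.

```python
from collections import Counter

def rank_candidates(cands, counts):
    """Order candidate corrections from most to least frequent in `counts`.

    Hypothetical helper; ties are broken alphabetically so the order is
    deterministic. Missing words count as 0, as with any Counter.
    """
    return sorted(cands, key=lambda w: (-counts[w], w))

# Invented counts for illustration only; in the notebook the real
# frequencies would come from WORDS (built from cim-sentences.txt).
toy_counts = Counter({'qaere': 5, 'rere': 3, 'tere': 2, 'mere': 2})

print(rank_candidates({'qaere', 'tere', 'mere', 'rere'}, toy_counts))
# ['qaere', 'rere', 'mere', 'tere']
```

In the notebook itself, `rank_candidates(candidates('aere'), WORDS)` would apply the same ordering to the real corpus counts.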
