# Fuzzy Matching

*Fuzzy matching*, or *approximate matching*, is a name for any process that matches an input query against a set of valid suggestions based on similarity. An example of fuzzy matching that you are all probably familiar with is spell check or auto-correct. In that case the typed word is the query, which is approximately matched against correctly spelled words. There are several different algorithms that can perform fuzzy matching; the one we will be looking at is called an "n-gram index".

## N-grams

An *n-gram* is a substring of length *n* of some larger string. The idea behind the n-gram index is to create a table that maps each substring to the valid whole strings that contain it. When a query is made, it is broken down into n-grams as well, and a list of possible suggestions is generated by looking up those n-grams in the index. Suggestions that share more and longer n-grams with the query are more likely to be the intended word or phrase.

Example: The word 'stop' has the following n-grams: ['stop', 'sto', 'top', 'st', 'to', 'op', 's', 't', 'o', 'p']

A query 'tsop' has several n-grams in common with 'stop': ['op', 's', 't', 'o', 'p']. Its remaining n-grams, ['tsop', 'tso', 'sop', 'ts', 'so'], do not appear in 'stop'.
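
The overlap in the example above can be checked with a few lines of Python. This is only a sketch: `ngrams` mirrors the definition used in the Implementation section, while `shared_ngrams` is a hypothetical helper introduced here for illustration.

```python
def ngrams(word: str) -> list[str]:
    # Every substring of `word`, longest first
    return [word[i:i + n]
            for n in range(len(word), 0, -1)
            for i in range(len(word) - n + 1)]


def shared_ngrams(a: str, b: str) -> list[str]:
    # Hypothetical helper: n-grams of `a` that also occur in `b`,
    # kept in the order they appear in ngrams(a), without repeats
    in_b = set(ngrams(b))
    seen = set()
    shared = []
    for gram in ngrams(a):
        if gram in in_b and gram not in seen:
            seen.add(gram)
            shared.append(gram)
    return shared


print(shared_ngrams('stop', 'tsop'))  # ['op', 's', 't', 'o', 'p']
```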

Aside: This is not the algorithm typically used for spell checking, though it has some similarities. The advantage of using an n-gram index is that it can be used for partial queries and does not require any information about the relative frequency of words or letters. This makes it a decent candidate for auto-completion rather than correction.

## Implementation

In [None]:
# You may need to replace this filename with your own list of words, one word per line
with open('/usr/share/dict/words', 'r') as wordfile:
    words = [word.strip() for word in wordfile.readlines()]

In [7]:
def ngrams(word: str) -> list[str]:
    # Every substring of `word`: longest first, and left to right within each length
    return [word[i:i+n] for n in range(len(word), 0, -1) for i in range(0, len(word) - n + 1)]

In [8]:
ngrams('stop')

['stop', 'sto', 'top', 'st', 'to', 'op', 's', 't', 'o', 'p']

In [None]:
def build_index(words: list[str]) -> dict[str, list]:
    pass

In [None]:
def fuzzy_match(query: str, index: dict[str, list]) -> list[str]:
    pass
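
One possible way to fill in the two functions above is sketched below. It is not the only reasonable design. The scoring rule (each shared n-gram adds its own length to a word's score), the tie-breaking order, and the `limit` parameter are all assumptions made for this example, and the five-word list stands in for the full dictionary.

```python
from collections import defaultdict


def ngrams(word: str) -> list[str]:
    # Every substring of `word`, longest first
    return [word[i:i + n]
            for n in range(len(word), 0, -1)
            for i in range(len(word) - n + 1)]


def build_index(words: list[str]) -> dict[str, list]:
    # Map each n-gram to the words that contain it
    index = defaultdict(list)
    for word in words:
        # set() so a word is listed only once per n-gram
        for gram in set(ngrams(word)):
            index[gram].append(word)
    return dict(index)


def fuzzy_match(query: str, index: dict[str, list], limit: int = 5) -> list[str]:
    # Assumed scoring: every shared n-gram adds its length,
    # so longer shared substrings count for more
    scores = defaultdict(int)
    for gram in set(ngrams(query)):
        for word in index.get(gram, []):
            scores[word] += len(gram)
    # Highest score first; ties go to the shorter word, then alphabetical order
    ranked = sorted(scores, key=lambda w: (-scores[w], len(w), w))
    return ranked[:limit]


index = build_index(['stop', 'spot', 'top', 'step', 'dog'])
print(fuzzy_match('tsop', index))  # ['stop', 'top', 'spot', 'step', 'dog']
print(fuzzy_match('sto', index)[0])  # stop
```

The second call shows the partial-query case: 'sto' is a prefix of 'stop', so it shares the long n-grams 'sto', 'st', and 'to' with it and ranks it first.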