# `spell_checker`

This is a simple `.ipynb` file meant to quickly demonstrate the capabilities of this module.

First of all, we must add the current directory to `sys.path` in order to import the `spell_checker` module:

In [1]:
from os import getcwd
from sys import path as sys_path
sys_path.append(getcwd())
from spell_checker import *

Then, we import `time`, which is used below to measure how long loading the dictionary takes:

In [2]:
from time import time

With that out of the way, we can get down to business:

## Basic testing

In [3]:
a = CharacterTree("abacate","mamão","maniçoba","queijo")
print("Tree created with the words \"abacate\", \"mamão\", \"maniçoba\" and \"queijo\" in it.")
print("Is maniçoba in it? Answer: {}.".format("maniçoba" in a))
print("Is abacate in it?  Answer: {}.".format("abacate" in a))
print("And aba?           Answer: {}.".format("aba" in a))
print("So, let's add aba.")
a.insert("aba")
print("The word aba was added.")
print("Is aba in the tree? Answer: {}.".format("aba" in a))
print("Perfect.")

Tree created with the words "abacate", "mamão", "maniçoba" and "queijo" in it.
Is maniçoba in it? Answer: True.
Is abacate in it?  Answer: True.
And aba?           Answer: False.
So, let's add aba.
The word aba was added.
Is aba in the tree? Answer: True.
Perfect.
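
To make the behaviour above easier to follow, here is a minimal sketch of how a character tree (trie) can support insertion and membership tests. `TrieSketch` is a hypothetical stand-in written for this illustration; it is not the module's `CharacterTree`, whose internals may differ.

```python
class TrieSketch:
    """Hypothetical minimal character tree; NOT the module's CharacterTree."""

    _END = ""  # marker key: a complete word ends at this node

    def __init__(self, *words):
        self._root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        # Walk down the tree, creating one nested dict per character.
        node = self._root
        for char in word:
            node = node.setdefault(char, {})
        node[self._END] = True

    def __contains__(self, word):
        # Follow the characters; a shared prefix alone is not a word.
        node = self._root
        for char in word:
            node = node.get(char)
            if node is None:
                return False
        return self._END in node


tree = TrieSketch("abacate", "mamão", "maniçoba", "queijo")
print("aba" in tree)  # False: "aba" is only a prefix of "abacate"
tree.insert("aba")
print("aba" in tree)  # True
```

The end-of-word marker is what distinguishes a stored word from a mere prefix, which is why `"aba"` is reported as missing until it is inserted explicitly.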


## Spell-checking capabilities

With basic testing done, it's time to move on to something more interesting: the actual spell-checking capability of this module. Although many improvements can still be made, the basic workings of the module should remain as displayed below.

Under the `dictionaries` folder, a PT-BR dictionary can be found. It was downloaded from [@pythonprobr/palavras](https://github.com/pythonprobr/palavras), and different versions were derived from it. The ones used in this section are `palavras.txt` (the whole dictionary) and `palavras_sample.txt`. The sample file was made by randomly picking roughly 20% of the lines of the original file with the following function:

In [4]:
from random import random

def sample_file(path,new_path,percentage=.20):
    """Randomly selects lines from a file.

    Parameters
    ----------
    path : str (path)
        The path of the file to be sampled.
    new_path : str (path)
        The path of the file to be created with the chosen lines.
    percentage : float (default = .20)
        The fraction of lines to keep, between 0 and 1; each line is
        kept independently with this probability.
    """
    with open(path,"r",encoding="utf-8") as file:
        # Keep each line independently with probability `percentage`.
        sample = [line for line in file if random() < percentage]
    with open(new_path,"w",encoding="utf-8") as file:
        file.writelines(sample)
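
Because `sample_file` keeps each line independently, the sample is only approximately 20% of the original. If an exact, reproducible sample size were needed, one possible alternative is sketched below. `sample_lines_exact` and its `seed` parameter are hypothetical names introduced for this illustration and are not part of the module.

```python
from random import Random


def sample_lines_exact(lines, fraction=0.20, seed=None):
    """Return exactly round(len(lines) * fraction) lines, in original order.

    Hypothetical helper for illustration; not part of spell_checker.
    """
    rng = Random(seed)  # a private generator keeps the result reproducible
    count = round(len(lines) * fraction)
    # Pick distinct line indices, then sort them to preserve file order.
    chosen = sorted(rng.sample(range(len(lines)), count))
    return [lines[i] for i in chosen]


words = ["word{}\n".format(i) for i in range(100)]
subset = sample_lines_exact(words, fraction=0.20, seed=42)
print(len(subset))  # 20
```

The trade-off is that all lines must be held in memory to know the total count, whereas the line-by-line approach works in a single pass.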

### Loading the dictionary in a CharacterTree object

On my computer, this process takes roughly 30 seconds; your mileage, however, may vary.

In [5]:
start_time = time()
print("Executing from_csv()...",end=" ")
pt_br = from_csv("./dictionaries/palavras.txt")
final_time = time() - start_time
print("Time elapsed (in seconds): {}".format(final_time))
del(start_time,final_time)

Executing from_csv()... Time elapsed (in seconds): 23.443318843841553
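
The timing pattern above can also be wrapped in a small reusable helper. The sketch below uses `time.perf_counter`, a standard-library clock intended for measuring elapsed durations; `timed` is a hypothetical name, and the workload is a stand-in rather than a call to `from_csv`.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label):
    """Hypothetical helper: time the enclosed block and print the duration."""
    result = {}
    start = perf_counter()
    try:
        yield result
    finally:
        # Record the elapsed time even if the block raises.
        result["seconds"] = perf_counter() - start
        print("{}: {:.2f} s".format(label, result["seconds"]))


# Stand-in workload; in the notebook this would be the from_csv() call.
with timed("Building a set of 100000 strings") as timing:
    words = {str(i) for i in range(100000)}
```

Using a context manager avoids leaving temporary variables such as `start_time` behind, so no explicit `del` is needed afterwards.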


In [6]:
pt_br

<CharacterTree object>
320139 words loaded.
Available Initial Characters: a, ª, à, á, ã, A, Á, â, Â, B, b, C, c, ç, D, d, e, é, E, Ê, É, ê, f, F, g, G, h, H, I, i, í, Í, j, J, K, k, L, l, M, m, n, N, o, º, Ò, ó, ô, O, Ó, õ, Ô, p, P, q, Q, r, R, S, s, T, t, u, U, ú, Ú, V, v, w, W, X, x, y, Y, z, Z, Ø, ø, µ

### Spell-checking dictionary sample

`CharacterTree.check(path)` returns a list containing only the misspelled words (that is, the words not loaded in the dictionary). Note, however, that although `palavras.txt` separates its entries with `\n`, some lines contain spaces, for whatever reason. Since the individual words on such lines are sometimes not in the dictionary by themselves, but only as part of a compound entry, they might be returned even though they are correctly written.

In [8]:
misspellings = pt_br.check("./dictionaries/palavras_sample.txt")
print("{} incorrect words found: {}".format(len(misspellings), misspellings))

0 incorrect words found: []
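
The caveat about entries containing spaces can be illustrated with a small sketch. It assumes a checker that splits each line on whitespace and looks up every token separately; that is an assumption made for illustration, not a description of how `CharacterTree.check` is implemented. `check_lines` and the sample entries are likewise made up.

```python
def check_lines(lines, dictionary):
    """Hypothetical checker: split lines on whitespace, report unknown tokens."""
    unknown = []
    for line in lines:
        for token in line.split():
            if token not in dictionary:
                unknown.append(token)
    return unknown


# Made-up entries: "pé de moleque" stands in for a line containing spaces.
entries = ["abacate", "pé de moleque"]
dictionary = set(entries)  # each whole line is stored as a single entry

print(check_lines(entries, dictionary))  # ['pé', 'de', 'moleque']
```

Under this assumption, the compound entry is stored as one unit, so its parts are flagged when looked up individually even though the text is correct.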
