# `spell_checker`

This is a simple `.ipynb` file meant to quickly demonstrate the capabilities of this module.

First, we add the current working directory to `sys.path` (Python's module search path, not the `PATH` environment variable) so that the `spell_checker` module can be imported:

In [1]:
from os import getcwd
from sys import path as sys_path
sys_path.append(getcwd())
from spell_checker import *

Next, we import the `time` function, which is used below to time the loading steps:

In [2]:
from time import time

With that out of the way, we can get down to business:

## Basic testing

### `CharacterTree`

In [3]:
a = CharacterTree("abacate","mamão","maniçoba","queijo")
print("Tree created with the words \"abacate\", \"mamão\", \"maniçoba\" and \"queijo\" in it.")
print("Is maniçoba in it? Answer: {}.".format("maniçoba" in a))
print("Is abacate in it?  Answer: {}.".format("abacate" in a))
print("And aba?           Answer: {}.".format("aba" in a))
print("So, let's add aba.")
a.insert("aba")
print("The word aba was added.")
print("Is aba in the tree? Answer: {}.".format("aba" in a))
print("Perfect.")

Tree created with the words "abacate", "mamão", "maniçoba" and "queijo" in it.
Is maniçoba in it? Answer: True.
Is abacate in it?  Answer: True.
And aba?           Answer: False.
So, let's add aba.
The word aba was added.
Is aba in the tree? Answer: True.
Perfect.
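
To make the behaviour above easier to follow, here is a minimal sketch of a character trie. It is not the module's implementation: the class name `MiniCharTree` and its attributes are hypothetical, and only `insert` and the `in` operator are mirrored from the cell above. The point it illustrates is why a prefix such as "aba" is not reported as present until it is inserted as a word of its own.

```python
class MiniCharTree:
    """Hypothetical character trie: one node per character."""

    def __init__(self, *words):
        self.children = {}     # maps a single character to a child node
        self.is_word = False   # True only if a word ends at this node
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self
        for char in word:
            # Reuse the child for this character, or create it.
            if char not in node.children:
                node.children[char] = MiniCharTree()
            node = node.children[char]
        node.is_word = True

    def __contains__(self, word):
        node = self
        for char in word:
            node = node.children.get(char)
            if node is None:
                return False
        # Reaching a node is not enough: a word must end here.
        return node.is_word


tree = MiniCharTree("abacate", "mamão", "maniçoba", "queijo")
before = "aba" in tree   # "aba" is only a prefix of "abacate" so far
tree.insert("aba")
after = "aba" in tree
print(before, after)     # prints: False True
```

The `is_word` flag is the design choice that matters here: without it, every prefix of a stored word would count as a stored word.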


### `RadixTree`

In [4]:
b = RadixTree("abacate","mamão","maniçoba","queijo")
print("Tree created with the words \"abacate\", \"mamão\", \"maniçoba\" and \"queijo\" in it.")
print("Is maniçoba in it? Answer: {}.".format("maniçoba" in b))
print("Is abacate in it?  Answer: {}.".format("abacate" in b))
print("And aba?           Answer: {}.".format("aba" in b))
print("So, let's add aba.")
b.insert("aba")
print("The word aba was added.")
print("Is aba in the tree? Answer: {}.".format("aba" in b))
print("Perfect.")

Tree created with the words "abacate", "mamão", "maniçoba" and "queijo" in it.
Is maniçoba in it? Answer: True.
Is abacate in it?  Answer: True.
And aba?           Answer: False.
So, let's add aba.
The word aba was added.
Is aba in the tree? Answer: True.
Perfect.
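
A radix tree differs from a character trie in that each edge carries a whole string rather than a single character, and edges are split only when a new word diverges partway through one. The sketch below shows that idea under stated assumptions: `MiniRadixTree` is a hypothetical name, the layout (a dict from edge label to child node) is one of several possible, and nothing here is taken from the module's source.

```python
from os.path import commonprefix  # character-wise common prefix of strings


class MiniRadixTree:
    """Hypothetical radix tree: each edge is labelled with a string."""

    def __init__(self, *words):
        self.children = {}     # maps an edge label to a child node
        self.is_word = False   # True only if a word ends at this node
        for word in words:
            self.insert(word)

    def insert(self, word):
        if word == "":
            self.is_word = True
            return
        for label in list(self.children):
            shared = commonprefix([label, word])
            if shared == "":
                continue
            child = self.children[label]
            if shared != label:
                # Split the edge, e.g. "abacate" becomes "aba" -> "cate".
                middle = MiniRadixTree()
                middle.children[label[len(shared):]] = child
                del self.children[label]
                self.children[shared] = middle
                child = middle
            child.insert(word[len(shared):])
            return
        # No edge shares a prefix with the word: add a new leaf edge.
        leaf = MiniRadixTree()
        leaf.is_word = True
        self.children[word] = leaf

    def __contains__(self, word):
        if word == "":
            return self.is_word
        for label, child in self.children.items():
            if word.startswith(label):
                return word[len(label):] in child
        return False


tree = MiniRadixTree("abacate", "mamão", "maniçoba", "queijo")
before = "aba" in tree
tree.insert("aba")
after = "aba" in tree
print(before, after)           # prints: False True
print(sorted(tree.children))   # prints: ['aba', 'ma', 'queijo']
```

In this sketch the root's edge labels are multi-character strings such as "ma" (shared by "mamão" and "maniçoba"). That may be why the `RadixTree` summaries later in this notebook list entries longer than one character, although that reading is an inference from the output, not something confirmed by the module.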


## Spell-checking capabilities

With basic testing done, we can move on to the module's actual spell-checking capability. Many improvements can still be made, but the basic workings of the module should remain as displayed below.

Under the `dictionaries` folder, a PT-BR dictionary can be found. It was downloaded from [pythonprobr/palavras](https://github.com/pythonprobr/palavras), and different versions were made from it. The ones used in this section are `palavras.txt` (the whole dictionary) and `palavras_sample.txt`. The sample file was made by keeping each line of the original file independently with probability 0.20, which yields roughly 20% of the lines, through the following function:

In [5]:
from random import random

def sample_file(path,new_path,percentage=.20):
    """Randomly selects lines from files.

    Each line is kept independently with probability `percentage`,
    so the sample size is only approximately that share of the file.

    Parameters
    ----------
    path : str (path)
        The path of the file to be sampled.
    new_path : str (path)
        The path of the file to be created with the chosen lines.
    percentage : float (default = .20)
        The fraction of lines to be chosen, between 0 and 1.
    """
    sample = []
    with open(path,"r",encoding="utf-8") as file:
        for line in file:
            if random() < percentage: sample.append(line)
    with open(new_path,"w",encoding="utf-8") as file:
        file.writelines(sample)
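
Because `sample_file` decides on each line separately, the number of lines it keeps varies from run to run. If an exact sample size were needed, one alternative would be to draw line indices with `random.sample`. The sketch below compares the two approaches on an in-memory list; `sample_exact` and `sample_bernoulli` are hypothetical helpers written for this illustration and are not part of the module.

```python
from random import Random


def sample_bernoulli(lines, fraction=0.20, seed=None):
    """Keep each line independently with probability `fraction`."""
    rng = Random(seed)
    return [line for line in lines if rng.random() < fraction]


def sample_exact(lines, fraction=0.20, seed=None):
    """Keep exactly round(len(lines) * fraction) lines, in original order."""
    rng = Random(seed)
    count = round(len(lines) * fraction)
    # Draw distinct indices, then sort them to preserve file order.
    chosen = sorted(rng.sample(range(len(lines)), count))
    return [lines[index] for index in chosen]


words = ["word{}\n".format(i) for i in range(1000)]
approx = sample_bernoulli(words, 0.20, seed=0)
exact = sample_exact(words, 0.20, seed=0)
print(len(approx), len(exact))  # the second number is always 200
```

The per-line approach has the advantage of working in a single pass without knowing the file length in advance, which is likely why it suits a large dictionary file.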

### Loading a dictionary

This process takes roughly 30 seconds on my computer; your mileage may vary.

#### `CharacterTree`

In [6]:
start_time = time()
print("Executing from_dict()...",end=" ")
ct_pt_br = from_dict("./dictionaries/palavras.txt","character")
final_time = time() - start_time
print("Time elapsed (in seconds): {}".format(final_time))
del(start_time,final_time)

Executing from_dict()... Time elapsed (in seconds): 29.969088077545166
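
The timing boilerplate in the cell above is repeated for every loading step in this notebook. One way to factor it out is a small context manager; the sketch below is only a suggestion, `timed` is a hypothetical helper, and it uses `time.perf_counter`, which is generally better suited to measuring intervals than `time.time`.

```python
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timed(label):
    """Print how long the enclosed block took, in seconds."""
    print("Executing {}...".format(label), end=" ")
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        print("Time elapsed (in seconds): {}".format(elapsed))


# Stand-in workload; in this notebook the body would be a call
# such as from_dict("./dictionaries/palavras.txt", "character").
with timed("sum()"):
    total = sum(range(1_000_000))
```

With this helper there are no `start_time` and `final_time` variables left to delete afterwards.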


In [7]:
ct_pt_br

<CharacterTree object>
320140 words loaded.
Available Initial Characters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, ª, µ, º, Á, Â, É, Ê, Í, Ò, Ó, Ô, Ø, Ú, à, á, â, ã, ç, é, ê, í, ó, ô, õ, ø, ú

#### `RadixTree`

In [8]:
start_time = time()
print("Executing from_dict()...",end=" ")
rt_pt_br = from_dict("./dictionaries/palavras.txt","radix")
final_time = time() - start_time
print("Time elapsed (in seconds): {}".format(final_time))
del(start_time,final_time)

Executing from_dict()... Time elapsed (in seconds): 29.447091579437256


In [9]:
rt_pt_br

<RadixTree object>
320140 words loaded.
Available Initial Characters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z, ª, µ, º, Á, Â, É, Ê, Í, Ò, Ó, Ônibus, Ø, Ú, à, á, â, ã, ç, é, ê, í, ó, ô, ões, ø, ú

### Loading a text file

Text files can be loaded up as well! Take a look:

#### `CharacterTree`

In [10]:
start_time = time()
print("Executing from_txt()...",end=" ")
ct_constituicao = from_txt("./texts/constituicao.txt","CHARACTER")
final_time = time() - start_time
print("Time elapsed (in seconds): {}".format(final_time))
del(start_time,final_time)

Executing from_txt()... Time elapsed (in seconds): 2.799135684967041


In [11]:
ct_constituicao

<CharacterTree object>
7142 words loaded.
Available Initial Characters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, z, º, À, Á, Â, É, Í, à, á, â, é, í, ó, ô, ú

#### `RadixTree`

In [12]:
start_time = time()
print("Executing from_txt()...",end=" ")
rt_constituicao = from_txt("./texts/constituicao.txt","RADIX")
final_time = time() - start_time
print("Time elapsed (in seconds): {}".format(final_time))
del(start_time,final_time)

Executing from_txt()... Time elapsed (in seconds): 3.8580868244171143


In [13]:
rt_constituicao

<RadixTree object>
7133 words loaded.
Available Initial Characters: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Qu, R, S, T, U, V, W, X, Yunes, Z, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, z, º, À, Á, Ângelo, É, Í, à, á, âmbito, é, índi, ó, ônus, ú

### Spell-checking the dictionary sample

`CharacterTree.check(path)` returns a list containing only the misspelled words, that is, the words not loaded into the tree. Note, however, that although `palavras.txt` holds one entry per line (separated by `\n`), some entries contain spaces, for whatever reason. The individual parts of such a compound entry are sometimes absent from the dictionary on their own, so they might be returned even though they are written correctly.

In [14]:
ct_misspellings = ct_pt_br.check("./dictionaries/palavras_sample.txt")
rt_misspellings = rt_pt_br.check("./dictionaries/palavras_sample.txt")
print("{} incorrect words found in the CharacterTree: {}".format(len(ct_misspellings), ct_misspellings))
print("{} incorrect words found in the RadixTree: {}".format(len(rt_misspellings), rt_misspellings))

0 incorrect words found in the CharacterTree: []
0 incorrect words found in the RadixTree: []
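
The caveat about entries containing spaces can be reproduced with a toy example. The sketch below assumes that checking splits the input on whitespace, which is a guess and not confirmed from the module's source; `check_words` is a hypothetical stand-in for `check`, a plain `set` stands in for the tree, and the entry "pé de moleque" is made up for illustration and not claimed to be in `palavras.txt`.

```python
def check_words(text, dictionary):
    """Return the whitespace-separated tokens of `text` not in `dictionary`."""
    return [token for token in text.split() if token not in dictionary]


# Toy dictionary with one entry per line; the first entry contains spaces.
entries = "pé de moleque\nabacate\nqueijo\n".splitlines()
dictionary = set(entries)

missing = check_words("abacate pé de moleque queijo", dictionary)
print(missing)  # prints: ['pé', 'de', 'moleque']
```

Under that assumption, each part of the compound entry is reported because only the full entry, spaces included, was stored. The run above found no such cases in the sample file, so the effect depends on which entries end up being checked.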
