Load packages

Run `!pip3 install textdistance pyyaml requests` if the packages are not installed yet.

In [7]:
import textdistance
import yaml
import requests

Load the glossary from the YAML file. The result is a list of entries, one per term.

In [66]:
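url = 'https://raw.githubusercontent.com/gvwilson/glossary/master/glossary.yml'

req = requests.get(url)
req.raise_for_status()  # stop here if the download failed

# safe_load only builds plain Python objects, which is all a glossary file needs
dictionary = yaml.safe_load(req.text)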

We index the entries by their `slug` so that each term can be looked up by that key.

In [67]:
Terms = {term['slug']: term for term in dictionary}

If we type the search word exactly as the slug appears in the `Terms` dictionary, we get a match.

In [21]:
word_to_search = 'data_frame'

In [22]:
Terms[word_to_search]

{'slug': 'data_frame',
 'en': {'term': 'data frame',
  'def': 'A two-dimensional data structure for storing tabular data in memory. Rows represent [records](#record) and columns represent [variables](variable_data).\n'},
 'ref': ['tidy_data']}

But if we deviate even slightly from the slug, we get a `KeyError` because the key is not in the dictionary.

In [68]:
word_to_search = 'data frame'

In [69]:
Terms[word_to_search]

KeyError: 'data frame'
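
Many near misses differ from the slug only in case, spacing, or hyphens, so one cheap step before any fuzzy matching is to normalize the search word into slug form. The sketch below assumes slugs are lowercase words joined by underscores, as `data_frame` suggests; `to_slug` is a hypothetical helper, not part of the `glossary` package.

```python
import re

def to_slug(text):
    """Normalize free text into a slug-like key (hypothetical helper).

    Assumes slugs are lowercase words joined by underscores.
    """
    # Trim, lowercase, then collapse runs of whitespace or hyphens into "_"
    return re.sub(r"[\s\-]+", "_", text.strip().lower())

print(to_slug("data frame"))
print(to_slug("Data-Frame "))
```

Normalization only helps with formatting differences; misspellings still need a similarity search.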

To solve this, we implement `search_similar_word()`, which scores every slug with the normalized cosine similarity from `textdistance` and returns the slug closest to the search word.

In [53]:
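def search_similar_word(word_to_search):
    """Return the slug in Terms that is most similar to word_to_search."""
    similarity_dict = {}

    # Score every slug against the search word
    for term in Terms:
        similarity_dict[term] = textdistance.cosine.normalized_similarity(word_to_search, term)

    # The best match is the slug with the highest similarity score
    return max(similarity_dict, key=similarity_dict.get)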

In [61]:
search_similar_word('data frame')

'data_frame'
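
To see why `data frame` scores so close to `data_frame`, it can help to compute a cosine similarity by hand. The sketch below uses only the standard library and treats each word as a vector of character counts; `char_cosine` is a hypothetical helper, and its scores may differ from what `textdistance.cosine.normalized_similarity` returns.

```python
from collections import Counter
from math import sqrt

def char_cosine(a, b):
    """Cosine similarity between the character-count vectors of two strings."""
    counts_a, counts_b = Counter(a), Counter(b)
    # Dot product over the characters the two strings share
    dot = sum(counts_a[ch] * counts_b[ch] for ch in counts_a.keys() & counts_b.keys())
    norm_a = sqrt(sum(n * n for n in counts_a.values()))
    norm_b = sqrt(sum(n * n for n in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)

print(char_cosine("data frame", "data_frame"))  # 15 / 16 = 0.9375
print(char_cosine("abc", "cab"))                # same characters, different order
```

Because only character counts matter here, anagrams score as (near) identical. A measure of this kind ignores character order, which is worth keeping in mind when several slugs share many letters.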

This final logic is what we wish to introduce into the `glossary` package: first search for an exact match; if there is none, show the closest match; if any other error occurs, print an error message.

In [65]:
try:
    print(Terms[word_to_search])
except KeyError:
    term_found = search_similar_word(word_to_search)
    print(f"{word_to_search} wasn't found, showing {term_found} \n")
    print(Terms[term_found])
except Exception:
    print("This word hasn't been found in the dictionary.")

data frame wasn't found, showing data_frame

{'slug': 'data_frame', 'en': {'term': 'data frame', 'def': 'A two-dimensional data structure for storing tabular data in memory. Rows represent [records](#record) and columns represent [variables](variable_data).\n'}, 'ref': ['tidy_data']}
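
One way to package this logic is a single function that returns the match or `None`. The sketch below is not the `glossary` package's implementation: it swaps in `difflib.get_close_matches` from the standard library for the cosine similarity so that it runs without extra packages, the `cutoff` value of 0.6 is an arbitrary assumption, and `sample_terms` is a made-up stand-in for `Terms`.

```python
from difflib import get_close_matches

# Made-up stand-in for the Terms dictionary built from the glossary
sample_terms = {
    "data_frame": {"slug": "data_frame"},
    "tidy_data": {"slug": "tidy_data"},
    "variable_data": {"slug": "variable_data"},
}

def lookup(word, terms, cutoff=0.6):
    """Return (slug, entry) for an exact or close match, else None."""
    if word in terms:
        return word, terms[word]
    # Keep only the single best candidate scoring at least `cutoff`
    matches = get_close_matches(word, list(terms), n=1, cutoff=cutoff)
    if not matches:
        return None  # nothing similar enough; better than a misleading guess
    return matches[0], terms[matches[0]]

print(lookup("data frame", sample_terms))
print(lookup("zzzz", sample_terms))
```

Unlike taking the maximum over all slugs, a cutoff lets the lookup report that nothing matched instead of always returning some term.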
