# WibblyWobbly

### Match data to a catalog

Import wibblywobbly and load your data and catalog as lists. If you are using pandas, convert a column with _.to_list()_.

In [1]:
import wibblywobbly as ww

catalog = ["Mouse", "Cat", "Dog", "Human"]
data = ["mice",  "CAT ", "doggo", "PERSON", 999]
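
If the values live in pandas objects, they can be turned into plain lists first. This is a minimal sketch: the DataFrames and the column names `animal` and `name` are made up for illustration and are not part of WibblyWobbly.

```python
import pandas as pd

# Hypothetical DataFrames; the column names are illustrative only.
df = pd.DataFrame({"animal": ["mice", "CAT ", "doggo", "PERSON", 999]})
catalog_df = pd.DataFrame({"name": ["Mouse", "Cat", "Dog", "Human"]})

# Series.to_list() returns an ordinary Python list of the column values.
data = df["animal"].to_list()
catalog = catalog_df["name"].to_list()
```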

WibblyWobbly compares each data entry to the catalog and returns the most likely catalog options, each with a similarity score.
If it cannot find a good match, it returns the original value.

WibblyWobbly automatically accepts catalog options whose similarity score is higher than `thr_accept` and rejects those whose score is lower than `thr_reject`. These threshold values can be adjusted depending on the data quality. Non-string values are ignored and returned unchanged.

By default it returns a pandas DataFrame, which can be saved as a CSV or Excel file with _.to_csv()_ or _.to_excel()_.

In [2]:
ww.map_list_to_catalog(data, catalog, thr_accept=95, thr_reject=40)

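|   | Data | Option1 | Score1 | Option2 | Score2 | Option3 | Score3 |
|---|------|---------|--------|---------|--------|---------|--------|
| 0 | CAT | Cat | 100 | | | | |
| 1 | doggo | Dog | 90 | Mouse | 20.0 | Cat | 0.0 |
| 2 | mice | Mouse | 44 | Cat | 29.0 | Human | 22.0 |
| 3 | PERSON | PERSON | 0 | | | | |
| 4 | 999 | 999 | 0 | | | | |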
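
The accept/reject logic described above can be illustrated with the standard library alone. This sketch is not WibblyWobbly's implementation: it assumes `difflib.SequenceMatcher` as the similarity measure, so its scores will differ from the ones in the table, and the helpers `score` and `classify` are hypothetical names.

```python
from difflib import SequenceMatcher


def score(value, option):
    """Similarity from 0 to 100, ignoring case and surrounding whitespace."""
    a, b = value.strip().lower(), option.strip().lower()
    return round(100 * SequenceMatcher(None, a, b).ratio())


def classify(value, catalog, thr_accept=95, thr_reject=40):
    """Return (result, best_score, status) for a single data entry."""
    # Non-string values are passed through untouched.
    if not isinstance(value, str):
        return value, None, "ignored"
    best = max(catalog, key=lambda option: score(value, option))
    best_score = score(value, best)
    if best_score > thr_accept:
        return best, best_score, "accepted"
    if best_score < thr_reject:
        # No good match: keep the original value.
        return value, best_score, "rejected"
    # In between the two thresholds: a candidate worth a manual look.
    return best, best_score, "wobbly"


catalog = ["Mouse", "Cat", "Dog", "Human"]
results = [classify(v, catalog) for v in ["mice", "CAT ", "doggo", "PERSON", 999]]
```

Raising `thr_accept` sends more entries to manual review, while lowering `thr_reject` keeps more weak candidates instead of falling back to the original value.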


WibblyWobbly can also return a dictionary that maps each data entry to its best option. This dictionary can be used to clean a pandas DataFrame with _.replace()_ or _.map()_.

In [3]:
ww.map_list_to_catalog(data, catalog, output_format="dictionary")

{'doggo': 'Dog', 999: 999, 'PERSON': 'PERSON', 'CAT ': 'Cat', 'mice': 'mice'}
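
Applying such a dictionary to a pandas column might look like the sketch below. The mapping is copied from the output above, and the DataFrame and its column name `animal` are assumptions made for the example.

```python
import pandas as pd

# Mapping as returned in the example output above.
mapping = {'doggo': 'Dog', 999: 999, 'PERSON': 'PERSON', 'CAT ': 'Cat', 'mice': 'mice'}

# Hypothetical DataFrame holding the raw values.
df = pd.DataFrame({"animal": ["mice", "CAT ", "doggo", "PERSON", 999]})

# Series.map looks every value up in the dictionary.
df["animal_clean"] = df["animal"].map(mapping)
cleaned = df["animal_clean"].to_list()
```

Note that `.map()` turns values missing from the dictionary into `NaN`, whereas `.replace()` leaves them unchanged, so `.replace()` is the safer choice when the column contains entries that were not in the matched list.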

It is possible to set a `reject_value` that replaces rejected entries instead of keeping the original data.

In [4]:
ww.map_list_to_catalog(data, catalog, output_format="dictionary", reject_value='Other')

{'doggo': 'Dog', 999: 999, 'PERSON': 'Other', 'CAT ': 'Cat', 'mice': 'Other'}

WibblyWobbly can also print warnings for suspicious values, those scoring between `thr_reject` and `thr_accept`, to make visual inspection easier.

In [5]:
ww.map_list_to_catalog(data, catalog, output_format="dictionary", 
                       thr_accept=95, thr_reject=40,  warnings=True)

WOBBLY: doggo
	Options: Dog (90), Mouse (20), Cat (0)
WOBBLY: mice
	Options: Mouse (44), Cat (29), Human (22)


{'doggo': 'Dog', 999: 999, 'PERSON': 'PERSON', 'CAT ': 'Cat', 'mice': 'Mouse'}