Name		Name	Last commit message	Last commit date
parent directory ..
data		data
failed_parses		failed_parses
image/README		image/README
incorrect_fields		incorrect_fields
manual_replace		manual_replace
README.md		README.md
add_corrections.js		add_corrections.js
approx_match.js		approx_match.js
correct_areas.js		correct_areas.js
correct_categories.js		correct_categories.js
correct_dates.js		correct_dates.js
correct_destinations.js		correct_destinations.js
correct_names.js		correct_names.js
helpers.js		helpers.js
index.js		index.js
run_corrections.js		run_corrections.js
simplify_destination.js		simplify_destination.js
simplify_name.js		simplify_name.js

README.md

Report Corrections

This folder contains methods used to correct column values for each report. These corrections are then applied to each report as it is fetched or can be manually applied to the pre-existing raw reports.

There's also tooling to add manual corrections for edge cases that can't be automatically corrected and quite a lot of approximate string matching algorithms to facilitate corrections.

The purpose of each file is given below:

`approx_match.js`

This file gives some common algorithms for appoximate string matching and some custom ones that I came up with for specific situations.

`edit_distance`

When defining our edit distance we include transpositions (A Damerau–Levenshtein distance) and allow the costs of each edit to be configured, i.e. we can set the cost of additions to 1.5 instead of the default 1.

My implementation also returns the full edit distance array, for other algorithms to use.

`min_edit_slice`

Whilst cutting edge algorithms (using bitwise automatons) can acheive quadratic ($O(mn)$) complexities, I basically had no idea how they worked. So I just implemented a decently efficient ($O(m^2 n)$) implementation that reuses the full edit distance array for each slice start position.

My implementation also returns the slice with the minimum distance, the location of this slice, the distance and full edit distance array for this slice.

`hierachic_match`

When attempting matching to use string matching on sentences, it can be useful to first compare words, then compare individual letters. For this purpose, we can define a hierachical edit distance as the edit distance in words between the two sentences, ignoring any replacements where the edit distance between words is sufficiently small.

Our implementation also includes:

the ability to ignore case
the ability to ignore words from word distance
configuration for the number of allowed word and letter errors
whether to pick a slice match or a full match

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

correct

correct

README.md

Report Corrections

`approx_match.js`

`edit_distance`

`min_edit_slice`

`hierachic_match`

Files

correct

Directory actions

More options

Directory actions

More options

Latest commit

History

correct

Folders and files

parent directory

README.md

Report Corrections

approx_match.js

edit_distance

min_edit_slice

hierachic_match

`approx_match.js`

`edit_distance`

`min_edit_slice`

`hierachic_match`