Interactive command-line utility for merging two CSV databases
Python
fuzzymatch
.gitignore
README.md
setup.py
test.py

fuzzymatch

Interactive command-line utility that merges two tables based on the text similarity of two columns.

How it works

fuzzymatch uses the Levenshtein package to compute the similarity between strings taken from two CSV files. In ambiguous cases it asks the user to choose among the top guesses.
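To illustrate the scoring idea, here is a minimal pure-Python sketch. It is a stand-in, not fuzzymatch's actual code: the README does not say which function of the Levenshtein package is used, and the names `indel_distance`, `similarity` and `top_matches` are hypothetical. The sketch assumes an edit distance in which a substitution costs 2, normalized by the combined length of both strings. Under that assumption, "Aachener Nachrichten" and "Aichacher Nachrichten" score 0.878, the same value shown in the sample session.

```python
def indel_distance(a, b):
    """Edit distance where insert/delete cost 1 and a substitution costs 2."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Matching characters cost nothing.
                current.append(previous[j - 1])
            else:
                # A substitution equals one delete plus one insert,
                # so only those two moves need to be considered.
                current.append(1 + min(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means the strings are identical."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    return (total - indel_distance(a, b)) / total


def top_matches(name, candidates, limit=9):
    """Return up to `limit` (score, candidate) pairs, best score first."""
    scored = [(similarity(name, candidate), candidate) for candidate in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:limit]
```

Calling `top_matches("Aachener Nachrichten", names)` with the names from the target column would produce the ranked list a user is asked to confirm.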

The resulting matches are stored in a separate JSON file. You can cancel the merging process (Ctrl-C) and resume later from the point where you stopped. When you run fuzzymatch on two CSV files for the first time (that is, when the JSON file does not exist yet), it asks which columns it should use for text matching.
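The cancel-and-resume behaviour can be sketched roughly as follows. This is an assumption-laden illustration, not the tool's implementation: the JSON layout (a flat mapping from source name to chosen match) and the helpers `load_matches`, `save_matches` and `merge` are hypothetical, and `decide` stands for whatever step picks a match for one name.

```python
import json
import os
import tempfile


def load_matches(path):
    """Return previously stored matches, or an empty dict on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_matches(path, matches):
    """Write to a temporary file first, then swap it in, so an interrupt
    cannot leave a half-written JSON file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(matches, fh, ensure_ascii=False, indent=2)
    os.replace(tmp, path)


def merge(source_names, path, decide):
    """Match every name not handled yet; stop cleanly on Ctrl-C."""
    matches = load_matches(path)
    try:
        for name in source_names:
            if name in matches:
                continue  # already decided in an earlier session
            matches[name] = decide(name)
            save_matches(path, matches)  # persist after every decision
    except KeyboardInterrupt:
        pass  # progress so far is already on disk
    return matches
```

Saving after every decision costs a little I/O, but it means at most the decision in progress is lost when the process is interrupted.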

$ fuzzymatch kek-presse.csv ivw-printauflagen.tsv out.json
Source table: kek-presse.csv
Target table: ivw-printauflagen.tsv

Confirm possible match for Aachener Nachrichten
   [0]: --skip--
   [1]: Aichacher Nachrichten (id: 1026810812, score: 0.878)
   [2]: Dachauer Nachrichten (id: 1472411012, score: 0.850)
   [3]: Schongauer Nachrichten (id: 1472411038, score: 0.810)
   [4]: Cuxhavener Nachrichten (id: 1276232800, score: 0.810)
   [5]: Schleswiger Nachrichten (id: 1201410400, score: 0.791)
   [6]: Rieser Nachrichten (id: 1026813000, score: 0.789)
   [7]: Schorndorfer Nachrichten (id: 1655012412, score: 0.773)
   [8]: Eckernförder Nachrichten (id: 1371211400, score: 0.773)
   [9]: Holsteiner Nachrichten (id: 1643212200, score: 0.762)

? default: [0]
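
A prompt like the one above, where an empty answer falls back to the default `[0]` (skip), could be interpreted along these lines. The README does not document how fuzzymatch handles input, so `parse_choice` and its behaviour for invalid answers are assumptions made for illustration only.

```python
def parse_choice(answer, n_candidates, default=0):
    """Map the text typed at the prompt to a menu index.

    Returns `default` for an empty answer and None for anything that is
    not a number between 0 (skip) and n_candidates.
    """
    answer = answer.strip()
    if answer == "":
        return default
    try:
        choice = int(answer)
    except ValueError:
        return None
    if 0 <= choice <= n_candidates:
        return choice
    return None
```

A caller would typically repeat the question while `parse_choice` returns `None`.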