Analyze string similarity using Levenshtein's distance.

mcaputto/similitude

similitude

Description

similitude computes edit distances between strings using the Levenshtein distance.

Algorithmic complexity

Given a string of length m and a string of length n, similitude runs in O(mn) time and O(min(m,n)) space.
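
The bounds above match the standard two-row dynamic-programming formulation of Levenshtein distance. The following is a minimal Python sketch of that technique, not similitude's own source; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b.

    Illustrative two-row version: O(len(a) * len(b)) time and
    O(min(len(a), len(b))) space.
    """
    # Keep the shorter string along the row so each row holds
    # min(m, n) + 1 cells.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,          # delete ca
                current[j - 1] + 1,       # insert cb
                previous[j - 1] + cost,   # substitute (or match)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

Only two rows of the full (m+1) x (n+1) table exist at any time, which is where the O(min(m,n)) space bound comes from.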

Installation

make

Usage

similitude compares the lines of two files and prints each pair of lines with its edit distance to stdout.

Example

$ ./bin/similitude test/foo test/bar
19958 S BAKERS FERRY RD, 2387 PIMLICO DR, 20
19958 S BAKERS FERRY RD, 1706 22ND AVE, 20
19958 S BAKERS FERRY RD, 512 SE BASELINE ST, 16
etc.

Python extension

Requirements

Python version 3.6 or greater (due to f-strings).

Installation

pip install -r requirements.txt

Usage

similitude.py compares the lines of two files and loads the edit distances into a pandas DataFrame.

Example

$ python3 similitude.py test/foo test/bar
Pivot table:
                                                         distance
source             target
0104 SW LANE ST     1 CONDOLEA DR                              12
                    1 JEFFERSON PKWY  APT 266                  20
                    100 SW 195TH AVE SPC 13                    14
...                                                           ...
9906 SE REEDWAY ST  9510 S WILDCAT RD                          11
                    9517 SE 75TH AVE                           12
                    9532 SW WHITFORD LN                        14

[2000000 rows x 1 columns]
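
For readers unfamiliar with the layout above, here is a hedged sketch of how (source, target, distance) records can be arranged into a DataFrame indexed by source and target. It is not taken from similitude.py. The three records are copied from the command-line example earlier in this README, and the column names follow the headings in the output above.

```python
import pandas as pd

# Records copied from the command-line example in this README:
# (source line, target line, edit distance).
records = [
    ("19958 S BAKERS FERRY RD", "2387 PIMLICO DR", 20),
    ("19958 S BAKERS FERRY RD", "1706 22ND AVE", 20),
    ("19958 S BAKERS FERRY RD", "512 SE BASELINE ST", 16),
]

# One row per pair, then a (source, target) MultiIndex so that
# "distance" is the only remaining column.
df = (
    pd.DataFrame(records, columns=["source", "target", "distance"])
    .set_index(["source", "target"])
    .sort_index()
)

if __name__ == "__main__":
    print(df)
```

A single pair can then be looked up with `df.loc[(source, target), "distance"]`.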
