Skip to content

subpath/pystrsim

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pystrsim

Python wrapper for Rust's strsim library

Github repo: https://github.com/subpath/pystrsim

Installation:

pip install pystrsim

Usage:

import pystrsim

print(f"hamming: {pystrsim.hamming('hamming', 'hammers')} should be 3")
print(f"levenshtein: {pystrsim.levenshtein('kitten', 'sitting')} should be 3")
print(
    f"normalized_levenshtein: {pystrsim.normalized_levenshtein('kitten', 'sitting')} should be ~0.571"
)
print(f"osa_distance: {pystrsim.osa_distance('ac', 'cba')} should be 3")
print(f"damerau_levenshtein: {pystrsim.damerau_levenshtein('ac', 'cba')} should be 2")
print(
    f"normalized_damerau_levenshtein: {pystrsim.normalized_damerau_levenshtein('levenshtein', 'löwenbräu')} should be ~0.272"
)
print(
    f"jaro: {pystrsim.jaro('Friedrich Nietzsche', 'Jean-Paul Sartre')} should be ~0.392"
)
print(
    f"jaro_winkler: {pystrsim.jaro_winkler('cheeseburger', 'cheese fries')} should be ~0.911"
)
print(
    f"sorensen_dice: {pystrsim.sorensen_dice('web applications', 'applications of the web')} should be ~0.7878787878787878"
)

Is it blazingly fast?

Well, no : ) Jellyfish and Levenshtein are faster.

See the benchmark/benchmark.py file.

algorithm library function time
DamerauLevenshtein jellyfish damerau_levenshtein_distance 0.00593378
Hamming Levenshtein hamming 0.000683438
Hamming jellyfish hamming_distance 0.00112426
Jaro jellyfish jaro_similarity 0.00206124
JaroWinkler jellyfish jaro_winkler_similarity 0.00221943
Levenshtein Levenshtein distance 0.00115115
Levenshtein jellyfish levenshtein_distance 0.00257007
damerau_levenshtein pystrsim damerau_levenshtein 0.380067
hamming pystrsim hamming 0.0116847
jaro pystrsim jaro 0.0547281
jaro_winkler pystrsim jaro_winkler 0.057244
levenshtein pystrsim levenshtein 0.102525
normalized_damerau_levenshtein pystrsim normalized_damerau_levenshtein 0.389092
normalized_levenshtein pystrsim normalized_levenshtein 0.107314
osa_distance pystrsim osa_distance 0.15746
sorensen_dice pystrsim sorensen_dice 0.0973786

Releases

No releases published

Packages

No packages published