
Fuzzy is a Python library that implements common phonetic algorithms quickly. They are typically used in string-similarity exercises, but they are quite versatile.

It uses C extensions (via Cython) for speed.

The algorithms are:

* Soundex
* NYSIIS
* Double Metaphone

Usage

The functions are quite easy to use!

>>> import fuzzy
>>> soundex = fuzzy.Soundex(4)
>>> soundex('fuzzy')
'F200'
>>> dmeta = fuzzy.DMetaphone()
>>> dmeta('fuzzy')
['FS', None]
>>> fuzzy.nysiis('fuzzy')
'FASY'
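The ``4`` passed to ``fuzzy.Soundex`` appears to be the code length, since ``'F200'`` has four characters. The minimal pure-Python sketch below illustrates what Soundex computes. It follows the standard American Soundex rules:

* Keep the first letter.
* Map the remaining consonants to digits.
* Collapse adjacent equal digits. Vowels separate equal digits; H and W do not.
* Pad the result with zeros.

``soundex_sketch`` is a hypothetical name and is not part of fuzzy. Fuzzy's C implementation may differ from this sketch in edge cases.

```python
# Letter-to-digit table for standard American Soundex (sketch, not fuzzy's code).
_CODES = {}
for letters, digit in (
    ("BFPV", "1"),
    ("CGJKQSXZ", "2"),
    ("DT", "3"),
    ("L", "4"),
    ("MN", "5"),
    ("R", "6"),
):
    for ch in letters:
        _CODES[ch] = digit


def soundex_sketch(word: str, length: int = 4) -> str:
    """Return the Soundex code of `word`, zero-padded or truncated to `length`."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    # The first letter's digit counts when collapsing repeats (e.g. "Pfister").
    prev = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W are skipped and do NOT separate equal digits.
            continue
        code = _CODES.get(ch, "")
        if code == "":
            # Vowels (A, E, I, O, U, Y) are dropped but DO separate equal digits.
            prev = ""
            continue
        if code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "0" * length)[:length]


print(soundex_sketch("fuzzy"))  # F200, matching fuzzy.Soundex(4) above
```

Names that sound alike share a code. For example, ``"Robert"`` and ``"Rupert"`` both give ``R163``, which is what makes Soundex useful for string-similarity matching.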

Performance

In recent testing, Fuzzy's Double Metaphone was about 10 times faster than Andrew Collins's pure Python implementation. Soundex and NYSIIS should see similar speedups. Timings from IPython's timeit:

In [3]: timeit soundex('fuzzy')
1000000 loops, best of 3: 326 ns per loop

In [4]: timeit dmeta('fuzzy')
100000 loops, best of 3: 2.18 us per loop

In [5]: timeit fuzzy.nysiis('fuzzy')
100000 loops, best of 3: 13.7 us per loop
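Outside IPython, you can get a similar "best of 3" figure with the standard library's ``timeit.repeat``. The helper ``best_per_call`` below is a hypothetical sketch and is not part of fuzzy. The lambda wrapper adds a little call overhead, so its numbers will not match IPython's output exactly.

```python
import timeit
from typing import Any, Callable


def best_per_call(func: Callable[..., Any], *args: Any,
                  number: int = 100_000, repeat: int = 3) -> float:
    """Return the best average time per call, in seconds, over `repeat` trials."""
    # timeit.repeat makes `number` calls per trial and returns one total per trial.
    totals = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(totals) / number


if __name__ == "__main__":
    # With fuzzy installed, pass e.g. fuzzy.Soundex(4) in place of str.upper.
    t = best_per_call(str.upper, "fuzzy")
    print(f"str.upper: {t * 1e9:.0f} ns per loop")
```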

Distance Metrics

We recommend the Python-Levenshtein module for fast, C-based string distance/similarity metrics. Among other functions, it includes:

In testing, it has been several times faster than comparable pure Python implementations of those algorithms.
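For comparison, below is a minimal pure-Python Levenshtein distance using the classic two-row dynamic-programming formulation. This is the kind of pure Python implementation the C module outperforms; it is a sketch and not code from Python-Levenshtein. With that package installed, ``Levenshtein.distance(a, b)`` should return the same values.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```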