Jellyfish is a Python library for approximate and phonetic matching of strings.

Written by James Turk <james.p.turk@gmail.com> and Michael Stephens.

See https://github.com/jamesturk/jellyfish/graphs/contributors for contributors.

Source is available at http://github.com/jamesturk/jellyfish.

Included Algorithms

String comparison:

  • Levenshtein Distance
  • Damerau-Levenshtein Distance
  • Jaro Distance
  • Jaro-Winkler Distance
  • Match Rating Approach Comparison
  • Hamming Distance
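
To make two of the comparison measures above concrete, here is a minimal pure-Python sketch of Levenshtein distance and Hamming distance. It is an illustration only, not Jellyfish's own implementation; the function names `levenshtein` and `hamming` are hypothetical, and the Hamming version assumes the classical definition, which is defined only for strings of equal length.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the empty prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    # Assumption: classical definition, so unequal lengths are rejected.
    if len(a) != len(b):
        raise ValueError("this sketch requires strings of equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


if __name__ == "__main__":
    print(levenshtein("jellyfish", "smellyfish"))
    print(hamming("karolin", "kathrin"))
```

The row-by-row formulation keeps only two rows of the dynamic-programming table in memory, which is a common choice when only the distance, and not the edit script, is needed.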

Phonetic encoding:

  • American Soundex
  • Metaphone
  • NYSIIS (New York State Identification and Intelligence System)
  • Match Rating Codex
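
As an illustration of what a phonetic encoding does, the following is a minimal pure-Python sketch of American Soundex. It is not Jellyfish's implementation, and the helper names (`_CODES`, `soundex`) are hypothetical. It assumes the commonly described rules: keep the first letter, map the remaining consonants to digits, collapse adjacent letters with the same code, treat `h` and `w` as transparent, and pad or truncate to four characters. Edge-case behavior may differ from the library's.

```python
# Digit assigned to each consonant group; vowels and y get no code.
_CODES = {}
for digit, letters in enumerate(
    ("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1
):
    for letter in letters:
        _CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return a four-character Soundex code, or '' for input without letters."""
    # Assumption: only ASCII letters are considered; everything else is dropped.
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    # The first letter's code takes part in duplicate suppression.
    last_code = _CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code  # a vowel resets this to "", allowing a repeat
    return result.ljust(4, "0")


if __name__ == "__main__":
    for word in ("Jellyfish", "Robert", "Rupert", "Ashcraft"):
        print(word, soundex(word))
```

Names that sound alike, such as "Robert" and "Rupert", map to the same code under these rules, which is what makes the encoding useful for fuzzy name lookup.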

Example Usage

>>> import jellyfish
>>> jellyfish.levenshtein_distance(u'jellyfish', u'smellyfish')
>>> jellyfish.jaro_distance(u'jellyfish', u'smellyfish')
>>> jellyfish.damerau_levenshtein_distance(u'jellyfish', u'jellyfihs')
>>> jellyfish.metaphone(u'Jellyfish')
>>> jellyfish.soundex(u'Jellyfish')
>>> jellyfish.nysiis(u'Jellyfish')
>>> jellyfish.match_rating_codex(u'Jellyfish')