fuzzy string matching library for ruby

What is fuzzy-string-match

  • fuzzy-string-match is a fuzzy string matching library for Ruby.
  • It is fast. (It is written in C with RubyInline.)
  • It supports only the Jaro-Winkler distance algorithm.
  • It was ported by hand from lucene-3.0.2. (Lucene is a Java product.)
  • If you want to add another string distance algorithm, please port it yourself and contact me at kiyoka@sumibi.org.

Why I developed fuzzy-string-match

  • I tried amatch-0.2.5, but it had some issues:
    1. It has some memory leaks.
    2. I found it difficult to maintain.
  • So I decided to create another gem by porting lucene-3.0.x.

Installing

  1. gem install fuzzy-string-match

Features

  • Calculate Jaro-Winkler distance of two strings.
    • The pure Ruby version handles both ASCII and UTF-8 strings, but it is slow.
    • The native version handles only ASCII strings, but it is fast.
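
To illustrate what a Jaro-Winkler score measures, here is a minimal sketch of the textbook formulation in plain Ruby. It is not the gem's code: the method names `jaro_similarity` and `jaro_winkler_similarity`, the scaling factor of 0.1 and the 4-character prefix cap are assumptions taken from the common description of the algorithm. Some implementations apply the prefix bonus only above a threshold; this sketch omits that step. The score is a similarity, where 1.0 means the strings are identical.

```ruby
# Textbook Jaro-Winkler similarity; an illustrative sketch, not the gem's source.
def jaro_similarity(s1, s2)
  a = s1.chars
  b = s2.chars
  return 1.0 if a.empty? && b.empty?
  return 0.0 if a.empty? || b.empty?

  # Two characters match only if they are equal and within this distance.
  window = [[a.length, b.length].max / 2 - 1, 0].max
  a_matched = Array.new(a.length, false)
  b_matched = Array.new(b.length, false)
  matches = 0

  a.each_with_index do |ch, i|
    lo = [i - window, 0].max
    hi = [i + window, b.length - 1].min
    (lo..hi).each do |j|
      next if b_matched[j] || b[j] != ch
      a_matched[i] = b_matched[j] = true
      matches += 1
      break
    end
  end
  return 0.0 if matches.zero?

  # Transpositions: matched characters that appear in a different order.
  a_seq = a.each_index.select { |i| a_matched[i] }.map { |i| a[i] }
  b_seq = b.each_index.select { |j| b_matched[j] }.map { |j| b[j] }
  transpositions = a_seq.zip(b_seq).count { |x, y| x != y } / 2

  m = matches.to_f
  (m / a.length + m / b.length + (m - transpositions) / m) / 3.0
end

def jaro_winkler_similarity(s1, s2, scaling: 0.1, max_prefix: 4)
  jaro = jaro_similarity(s1, s2)
  # Reward a shared prefix of up to max_prefix characters.
  prefix = s1.chars.zip(s2.chars).take(max_prefix).take_while { |x, y| x == y }.length
  jaro + prefix * scaling * (1.0 - jaro)
end

p jaro_winkler_similarity("al", "al")
p jaro_winkler_similarity("dixon", "dicksonx")
```

Because the sketch works on `String#chars`, it compares characters rather than bytes, so multibyte UTF-8 input such as "ああ" is handled the same way as ASCII.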

Sample code

  • Native version

require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.new.create( :native )
p jarow.getDistance( "jones", "johnson" )

  • Pure Ruby version

require 'fuzzystringmatch'
jarow = FuzzyStringMatch::JaroWinkler.new.create( :pure )
p jarow.getDistance( "ああ", "あい" )
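
A common use of such a score is picking the closest candidates for a query string. The following is a small sketch of that idea. `rank_matches` and its `min_score` parameter are hypothetical helpers, not part of the gem; the only thing assumed about the matcher is that it responds to `getDistance(a, b)` as in the samples above.

```ruby
# Hypothetical helper (not part of fuzzy-string-match): rank candidates
# against a query with any object that responds to getDistance(a, b).
def rank_matches(matcher, query, candidates, min_score: 0.0)
  candidates
    .map { |c| [c, matcher.getDistance(query, c)] }   # pair each candidate with its score
    .select { |_, score| score >= min_score }          # drop weak matches
    .sort_by { |_, score| -score }                     # best match first
end

# With the gem installed, the matcher would be created as in the samples:
#   require 'fuzzystringmatch'
#   matcher = FuzzyStringMatch::JaroWinkler.new.create( :pure )
#   p rank_matches( matcher, "jones", ["johnson", "james", "jonas"], min_score: 0.8 )
```

Passing the matcher in as an argument keeps the helper independent of whether the native or the pure Ruby version is used.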

Sample on irb

irb(main):001:0> require 'fuzzystringmatch'
=> true

irb(main):002:0> jarow = FuzzyStringMatch::JaroWinkler.new.create( :native )
=> #<FuzzyStringMatch::JaroWinklerNative:0x000001011b0010>

irb(main):003:0> jarow.getDistance( "al",        "al"        )
=> 1.0

irb(main):004:0> jarow.getDistance( "dixon",     "dicksonx"  )
=> 0.8133333333333332

Benchmarks

$ rake bench
ruby ./benchmark/vs_amatch.rb
 ---
 --- Each match functions will be called 1Mega times.
 ---
 ---
[Amatch]
      user     system      total        real
  1.160000   0.050000   1.210000 (  1.218259)
[this Module (pure)]
      user     system      total        real
 39.940000   0.160000  40.100000 ( 40.542448)
[this Module (native)]
      user     system      total        real
  0.480000   0.000000   0.480000 (  0.484187)

Requires

  • RubyInline
  • Ruby 1.9.1 or higher

Author

  • Copyright (C) Kiyoka Nishiyama kiyoka@sumibi.org
  • Ported from the Java source code of lucene-3.0.2.

See also

License

  • Apache 2.0 LICENSE