A distance-based hash (one where similar inputs give similar outputs, the opposite of a cryptographic hash), suitable for text applications.

nilsimsa

Nilsimsa is a distance-based hash, which is the opposite of more familiar hashes like MD5. In a hash like MD5, a small change to the input produces a very different hash (to avoid collisions). A distance-based hash instead gives similar inputs similar outputs. This is useful for detecting near-duplicate documents without having to store the original text.
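
To make the contrast concrete, here is a minimal sketch using only Ruby's standard `digest` library. It hashes two strings that differ by one character with MD5 and counts how many hex positions of the two digests happen to agree. The sample strings and variable names are illustrative assumptions, not part of this gem.

```ruby
require 'digest'

text_a = "The quick brown fox"
text_b = "The quick brown fax" # one character changed

md5_a = Digest::MD5.hexdigest(text_a)
md5_b = Digest::MD5.hexdigest(text_b)

# Count the hex positions where the two digests agree.
matching = md5_a.chars.zip(md5_b.chars).count { |a, b| a == b }

puts md5_a
puts md5_b
puts "matching hex positions: #{matching} of #{md5_a.length}"
```

With MD5 the two digests are unrelated despite the near-identical inputs; a distance-based hash is designed so that the digests of such inputs stay close.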

Standard usage is as follows:

    require 'nilsimsa'

    n1 = Nilsimsa.new
    text1 = "The quick brown fox"
    n1.update(text1)
    puts "Text '#{text1}': #{n1.hexdigest}"
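
Once two hex digests are available, one way to judge how similar they are is to count the bits in which they differ. The sketch below is plain Ruby and does not use the gem. The helper name `hamming_distance` is hypothetical, and the two digests are made-up placeholders rather than real Nilsimsa output.

```ruby
# Count the bits that differ between two equal-length hex digests.
# A smaller count means the digests are more alike.
def hamming_distance(hex_a, hex_b)
  raise ArgumentError, "digests must have equal length" unless hex_a.length == hex_b.length

  # XOR the two values, then count the 1 bits in the result.
  (hex_a.to_i(16) ^ hex_b.to_i(16)).to_s(2).count("1")
end

# Placeholder 64-character digests for illustration only;
# these are NOT real Nilsimsa output.
digest_a = "0f" * 32
digest_b = "0f" * 31 + "0e"

puts hamming_distance(digest_a, digest_b)
```

In practice the two strings would come from calling `hexdigest` on two separate hash objects, each updated with its own text, and a low bit count would suggest the texts are near duplicates.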