# francois/jaccard

A library to make calculating the Jaccard Coefficient Index a snap

# Jaccard

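The Jaccard Coefficient Index measures how similar two sets are: it is the number of elements the sets share divided by the number of distinct elements in either set, ranging from 0.0 (nothing in common) to 1.0 (identical sets). This library makes the coefficient easy to calculate and provides helpers for related tasks, such as finding the closest set or the most similar pair.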
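For reference, here is a plain-Ruby sketch of the coefficient, |A ∩ B| / |A ∪ B|, built on `Array#&` (intersection) and `Array#|` (union). It is an illustration, not the gem's own implementation, and `jaccard_coefficient` is a hypothetical name. Returning 0.0 when both sets are empty is a convention assumed by this sketch; the gem's behaviour in that case may differ.

```ruby
# Hypothetical helper, not part of the jaccard gem: |A ∩ B| / |A ∪ B|.
def jaccard_coefficient(x, y)
  union = (x | y).size          # Array#| keeps the unique elements of both arrays
  return 0.0 if union.zero?     # assumption: two empty sets score 0.0
  (x & y).size.to_f / union     # Array#& keeps elements present in both arrays
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:women", "likes:red"]

# One shared attribute ("likes:jeans") out of four distinct attributes.
puts jaccard_coefficient(a, b)  # prints 0.25
```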

# Examples

Calculate how similar two sets are:

```ruby
require 'jaccard'

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:women", "likes:red"]
c = ["likes:women", "likes:red"]

# Determines how similar a pair of sets are
Jaccard.coefficient(a, b)
#=> 0.25

Jaccard.coefficient(a, c)
#=> 0.0

Jaccard.coefficient(b, c)
#=> 0.6666666666666666

# According to the input data, b and c have the most similar likes.
```

We can also extract the distance quite easily:

```ruby
Jaccard.distance(a, b)
#=> 0.75
```

The Jaccard distance is the complement of the coefficient: `1 - coefficient`.
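Because the coefficient is |A ∩ B| / |A ∪ B|, the distance `1 - coefficient` is also the fraction of the union made up of elements found in only one of the two sets. The sketch below computes the distance both ways for the `a` and `b` sets from the examples; the helper names are hypothetical and not part of the gem, and the results agree up to floating-point rounding. Treating two empty sets as distance 1.0 is an assumption chosen to stay consistent with a coefficient of 0.0 for that case.

```ruby
# Hypothetical helpers for illustration; they are not part of the jaccard gem.
def coefficient(x, y)
  union = (x | y).size
  union.zero? ? 0.0 : (x & y).size.to_f / union
end

# Distance counted directly: elements found in exactly one of the two sets,
# divided by the size of the union. Two empty sets give 1.0, matching
# 1 - coefficient under the convention above (an assumption).
def distance_by_counting(x, y)
  union = x | y
  return 1.0 if union.empty?
  (union - (x & y)).size.to_f / union.size
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:women", "likes:red"]

puts 1 - coefficient(a, b)       # prints 0.75
puts distance_by_counting(a, b)  # prints 0.75
```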

Find the candidate set that is closest to a given set of attributes, that is, the one at the smallest distance:

```ruby
Jaccard.closest_to(a, [b, c])
#=> ["likes:jeans", "likes:women", "likes:red"]

Jaccard.closest_to(b, [a, c])
#=> ["likes:women", "likes:red"]
```
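Picking the closest set amounts to taking the candidate with the minimum distance. The sketch below reproduces both results above with `Enumerable#min_by`; `closest` and `distance` are hypothetical helpers, not necessarily how `Jaccard.closest_to` is implemented, and how the gem breaks ties is not stated in this README. In this sketch, `min_by` keeps the first candidate when several share the smallest distance.

```ruby
# Hypothetical helpers for illustration; they are not part of the jaccard gem.
# Assumption: two empty sets are treated as maximally distant (1.0).
def distance(x, y)
  union = (x | y).size
  union.zero? ? 1.0 : 1.0 - (x & y).size.to_f / union
end

# Returns the candidate at the smallest distance from target.
# min_by keeps the first candidate when several share the minimum.
def closest(target, candidates)
  candidates.min_by { |candidate| distance(target, candidate) }
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:women", "likes:red"]
c = ["likes:women", "likes:red"]

p closest(a, [b, c])  # prints ["likes:jeans", "likes:women", "likes:red"]
p closest(b, [a, c])  # prints ["likes:women", "likes:red"]
```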

Finally, we can find the most similar pair among several sets:

```ruby
require "pp"

# Prints the most similar pair:
pp Jaccard.best_match([a, b, c])
# [["likes:jeans", "likes:women", "likes:red"],
#  ["likes:women", "likes:red"]]
```
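One straightforward way to find the most similar pair is to score every unordered pair of sets and keep the one with the highest coefficient. The sketch below does that with `Array#combination(2)` and `max_by`; `best_pair` and `coefficient` are hypothetical helpers, and this is not necessarily how `Jaccard.best_match` works internally. It returns the first pair on ties and `nil` when given fewer than two sets.

```ruby
# Hypothetical helpers for illustration; they are not part of the jaccard gem.
def coefficient(x, y)
  union = (x | y).size
  union.zero? ? 0.0 : (x & y).size.to_f / union
end

# Scores every unordered pair once and returns the pair with the highest
# coefficient (the first such pair if several tie, nil for fewer than two sets).
def best_pair(sets)
  sets.combination(2).max_by { |x, y| coefficient(x, y) }
end

a = ["likes:jeans", "likes:blue"]
b = ["likes:jeans", "likes:women", "likes:red"]
c = ["likes:women", "likes:red"]

p best_pair([a, b, c])
# prints [["likes:jeans", "likes:women", "likes:red"], ["likes:women", "likes:red"]]
```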

# Notes on scalability

This library wasn't designed to handle millions of entries. You'll have to benchmark it on your own data to see whether it meets your needs.
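A rough sense of cost can come from timing a pairwise search on synthetic data with Ruby's standard `Benchmark` module. The sketch below assumes an all-pairs search, which scores n(n - 1) / 2 pairs, so doubling the number of sets roughly quadruples the work. It does not call the gem; the helper names, set sizes, and data are hypothetical. To measure the gem itself, time a call such as `Jaccard.best_match` on your own data in the same way.

```ruby
require "benchmark"

# Hypothetical helpers for illustration; they are not part of the jaccard gem.
def coefficient(x, y)
  union = (x | y).size
  union.zero? ? 0.0 : (x & y).size.to_f / union
end

# Number of unordered pairs an all-pairs search has to score.
def pair_count(n)
  n * (n - 1) / 2
end

# Synthetic data: n sets of up to `size` attributes drawn from a fixed universe,
# seeded so that repeated runs produce the same sets.
def random_sets(n, universe: 100, size: 10, seed: 42)
  rng = Random.new(seed)
  Array.new(n) { Array.new(size) { "likes:#{rng.rand(universe)}" }.uniq }
end

[100, 200, 400].each do |n|
  sets = random_sets(n)
  seconds = Benchmark.realtime do
    sets.combination(2).max_by { |x, y| coefficient(x, y) }
  end
  puts format("%4d sets, %6d pairs: %.3fs", n, pair_count(n), seconds)
end
```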

# Note on Patches/Pull Requests

* Fork the project.