Smartly merge multiple objects together

Twins

Twins sorts through the small differences between multiple objects and smartly consolidates them into one.

Usage

Let's say you have a collection of objects representing the same book but gathered from different sources, so each object may differ slightly from the others.

books = [{
  title: "Shantaram: A Novel",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
},
{
  title: "Shantaram",
  author: "Gregory David Roberts & Alejandro Palomas",
  published: 2012,
  details: {
    paperback: false
  }
},
{
  title: "Shantaram",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
},
{
  title: "Shantaram",
  author: "Gregory D. Roberts",
  published: 2005,
  details: {
    paperback: true
  }
}]

Consolidate

Assembles a new Hash based on every element in the collection. By default, Twins.consolidate determines the candidate value for a given key by picking the most frequent value present for that key, also known as the mode.

Twins.consolidate(books)
{
  title: "Shantaram",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}
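
As a rough illustration of the mode-based strategy described above, here is a minimal sketch in plain Ruby. The helpers `mode_of` and `consolidate_by_mode` are hypothetical names made up for this example; they are not part of the Twins API, and the gem's actual implementation may differ.

```ruby
# Hypothetical helper (not part of Twins): most frequent value in a list.
def mode_of(values)
  values.group_by { |value| value }.max_by { |_value, group| group.size }.first
end

# Hypothetical sketch of a mode-based merge, assuming every element is a Hash.
def consolidate_by_mode(hashes)
  keys = hashes.flat_map(&:keys).uniq
  keys.each_with_object({}) do |key, merged|
    # Only consider the elements that actually carry this key.
    values = hashes.select { |hash| hash.key?(key) }.map { |hash| hash[key] }
    merged[key] =
      if values.all? { |value| value.is_a?(Hash) }
        consolidate_by_mode(values) # recurse into nested hashes
      else
        mode_of(values)
      end
  end
end

samples = [
  { title: "Shantaram: A Novel", published: 2012, details: { paperback: true } },
  { title: "Shantaram",          published: 2012, details: { paperback: false } },
  { title: "Shantaram",          published: 2005, details: { paperback: true } }
]

merged = consolidate_by_mode(samples)
# merged is { title: "Shantaram", published: 2012, details: { paperback: true } }
```

Nested hashes are merged key by key in this sketch, which is one plausible reading of how the `details` value above is resolved.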

You may also provide Twins.consolidate with priorities for String and Numeric attributes, which take precedence over the mode when determining the candidate value.

options = {
  priority: {
    title: "Novel"
  }
}

Twins.consolidate(books, options)
{
  title: "Shantaram: A Novel",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}

Pick

Selects the collection's most representative element. By default, Twins.pick chooses the element that holds the highest number of modes, that is, the element whose values most often match the most frequent value for each key.

Twins.pick(books)
{
  title: "Shantaram",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}
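
The default selection rule can be pictured with a short sketch in plain Ruby. The names `mode_of` and `pick_by_mode` are hypothetical and exist only for this example; this is an assumption about the general idea, not the gem's actual code.

```ruby
# Hypothetical helper (not part of Twins): most frequent value in a list.
def mode_of(values)
  values.group_by { |value| value }.max_by { |_value, group| group.size }.first
end

# Hypothetical sketch: return the element whose values match the per-key
# mode most often. Nested hashes are compared as whole values for brevity.
def pick_by_mode(hashes)
  keys  = hashes.flat_map(&:keys).uniq
  modes = keys.map { |key| [key, mode_of(hashes.map { |hash| hash[key] })] }.to_h
  hashes.max_by { |hash| keys.count { |key| hash[key] == modes[key] } }
end

candidates = [
  { title: "Shantaram: A Novel", author: "Gregory David Roberts", published: 2012 },
  { title: "Shantaram",          author: "Gregory David Roberts", published: 2012 },
  { title: "Shantaram",          author: "Gregory D. Roberts",    published: 2005 }
]

picked = pick_by_mode(candidates)
# picked is the second element: all three of its values are modes.
```

Unlike consolidation, this returns one of the original elements untouched rather than building a new Hash.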

You may also provide Twins.pick with priorities for String and Numeric attributes, which are used to compute each element's overall distance when determining the candidate element.

options = {
  priority: {
    title: "Novel"
  }
}

Twins.pick(books, options)
{
  title: "Shantaram: A Novel",
  author: "Gregory David Roberts",
  published: 2012,
  details: {
    paperback: true
  }
}

Internals

Distance

String distances are calculated using a longest subsequence algorithm, and Numeric distances are calculated as the difference between the two values.
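
To make the two measures concrete, here is a minimal sketch in plain Ruby. It assumes that "longest subsequence" means the classic longest common subsequence and that the numeric difference is taken as an absolute value; `lcs_length` and `numeric_distance` are hypothetical names for this example and are not taken from the gem.

```ruby
# Hypothetical helper: length of the longest common subsequence of two
# strings, computed row by row with dynamic programming.
def lcs_length(a, b)
  previous = Array.new(b.length + 1, 0)
  a.each_char do |char_a|
    current = [0]
    b.each_char.with_index(1) do |char_b, j|
      value = char_a == char_b ? previous[j - 1] + 1 : [previous[j], current[j - 1]].max
      current << value
    end
    previous = current
  end
  previous.last
end

# Hypothetical helper: numeric distance as an absolute difference (assumption).
def numeric_distance(a, b)
  (a - b).abs
end

titles = ["Shantaram", "Shantaram: A Novel"]

# One plausible way a priority such as "Novel" could favor a title:
# keep the title sharing the longest subsequence with the priority.
closest = titles.max_by { |title| lcs_length(title, "Novel") }
# closest is "Shantaram: A Novel" (shares 5 characters, against 0 for "Shantaram")

year_gap = numeric_distance(2012, 2005) # 7
```

A longer shared subsequence means the strings are closer; how the gem turns that length into a distance or a score is not specified here.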

Contributing

  1. Fork it
  2. Create a topic branch
  3. Add specs for your unimplemented modifications
  4. Run bundle exec rspec. The new specs should fail; if they pass, return to step 3.
  5. Implement your modifications
  6. Run bundle exec rspec. If specs fail, return to step 5.
  7. Commit your changes and push
  8. Submit a pull request
  9. Thank you!

TODO

  • Think about using the Jaccard index to weight items

Author

Philippe Dionne

License

See LICENSE