A smarter way to have many-to-many relationships in Ruby on Rails.
Ruby
Pull request Compare This branch is 13 commits behind ebobby:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
lib
test
.gitignore
Gemfile
MIT-LICENSE
README.textile
Rakefile
has-many-with-set.gemspec

README.textile

has-many-with-set

A smarter way of doing many-to-many relationships in Ruby On Rails.

Introduction

Rails has two ways to model many-to-many relationships: has_and_belongs_to_many and has_many :through, this gem introduces a third one: has_many_with_set.

has_many_with_set is equivalent to has_and_belongs_to_many in functionality. It works only when you do not want information about a relationship but the relationship itself, behind the curtains though, they do not work anything alike, has_many_with_set is far more efficient in terms of data size as it reduces the redundancy that occurs in a normal many-to-many relationship when the cardinality is low, that is, the same combination occurs many times. For example, in a blog application, when many posts share the same tags.

How so?

The regular way of doing many-to-many relationships is using a join table to relate two tables, both ways of doing it in Ruby On Rails use this method, the only difference is the degree of control they give you on the “intermediary” table, one hides it from you (which is nice) and the other allows you to put more data in it besides the relationship, use validations, callbacks, etc.

The join table model is a very redundant way of storing these relationships if the same combination happens more than once because you have to create the same amount rows in the join table each time you save this combination for each different parent.

For example:

Tag.new(:name => 'programming').save
Tag.new(:name => 'open source').save
Tag.new(:name => 'startups').save
Tag.new(:name => 'ruby').save
Tag.new(:name => 'development').save

tags = Tag.all

1000.times do
  a = Article.new(:title => "Buzzword about buzzwords!",
                  :body => "Lorem ipsum")

  rand(tags.size + 1).times do
    t = tags[rand(tags.size)]
    a.tags << t
  end

  a.tags.uniq!
  a.save
end

ArticlesTags = Class.new(ActiveRecord::Base)
ArticlesTags.count # this class doesn't exist by default,
                   # I had to create it by hand for the example.
=> 1932

So we create five tags, and we create 1000 articles with a random combination of tags, not surprisingly, our join table has plenty of rows to represent all the relationships between our articles and their tags, if this were to behave linearly, if we had 1,000,000 articles we would have 1,932,000 rows just to represent the relationship.

This example (albeit a bit unrealistic) shows how redundant this is, even though we are using the same combination of tags over and over again we get more and more rows, if we are speaking about thousands it is not a big problem but when your databases grow to the hundreds of thousands or the millions, stuff like this starts to matter.

This is what this gem fixes, it makes sure that when you create a combination of items it is unique and it gets used as many times as its needed when requested again, like a set.

has-many-with-set is here to help.

Installation

Rails 3.x

To use it, add it to your Gemfile:

gem 'has-many-with-set'

That’s pretty much it!

Usage

To to use has-many-with-set to relate two already existing models you have to create the underlying tables that are going to be used by it, this is very easily done by generating a migration for them:

rails generate has_many_with_set:migration PARENT CHILD

And add the relationship to your parent model:

class Parent < ActiveRecord::Base
  has_many_with_set :children
end

And that’s it! You can start using it in your application. This can be done for as many models as you want, (you have to create migrations for all combinations!) you can even use multiple sets to relate different data to the same parent model (like Authors and Tags for your Articles).

Example

Using our previous example:

rails g has_many_with_set:migration Article Tag
      create  db/migrate/20121106063326_create_articles_tags_set.rb
class Article < ActiveRecord::Base
  attr_accessible :body, :title

  has_many_with_set :tags   # <--- key part!
end

Tag.new(:name => 'programming').save
Tag.new(:name => 'open source').save
Tag.new(:name => 'startups').save
Tag.new(:name => 'ruby').save
Tag.new(:name => 'development').save

tags = Tag.all

1000.times do
  a = Article.new(:title => "Buzzword about buzzwords!",
                  :body => "Lorem ipsum")

  rand(tags.size + 1).times do
    t = tags[rand(tags.size)]
    a.tags << t
  end

  a.tags.uniq!
  a.save
end

ArticlesTagsSetsTag = Class.new(ActiveRecord::Base)
ArticlesTagsSetsTag.count # this class doesn't exist by default,
                          # I had to create it by hand for the example.
=> 80

Article.first.tags
=> [#<Tag id: 1, name: "programming", ...>]

Article.last.tags
=> [#<Tag id: 1, name: "programming", ...>, #<Tag id: 5, name: "development", ...]

Same example as before, just now using has_many_with_set. We get the impressive number of 80 rows to represent the same information that we had before with thousands of rows (roughly the same, since we use random combinations is not exactly the same article/tag layout).

The funny thing in this particular example, is that since we have only five tags, there are only 32 possible ways to combine five tags together, these 32 combinations amount to 80 rows in our relationship table…. that is, even if we had a million articles we would still have the same 80 rows to represent our relationships, we don’t need to create any more rows!!

Final remarks

If you want to read a detailed post about how this works and what this gem does with some useful diagrams and sample queries, you can check out this blog post : Using sets for many-to-many-relationships.

Please keep in mind that has-many-with-set is not without some caveats:

  • It can only be used when you do not need to put extra information in the relationships rows since they are shared among many parents.
  • It is only effective when there is a high natural redundancy in your data, that is, when many sets can be shared among many parents.
  • Although the retrieval queries are the same as with regular has_and_belongs_to_many and have no extra cost, it does have a tiny bit of extra cost when saving or updating since we have to find or create a suitable set before actually saving the parent record to the database. This cost is probably negligible as opposed to writing all the time, but I can’t say it’s free.

This is one humble attempt to help make Ruby On Rails a bit more useful with large data sets and applications, I hope you enjoy it and is useful to you, please email me with comments or suggestions (or even code!).

Author

  • Francisco Soto <ebobby@ebobby.org>

Copyright © 2012 Francisco Soto (http://ebobby.org) released under the MIT license.