Home
The Dynashard gem provides an easy way to configure ActiveRecord 3.x models to shard across multiple databases.
ActiveRecord uses the model class to determine which database connection to use. By default, all models use the connection from ActiveRecord::Base (generally the entry from config/database.yml with the same name as the current environment), but you may override the default by establishing a connection in your model:
class Widget < ActiveRecord::Base
  establish_connection 'other_database'
end
This approach could be used for sharding by creating an entry per shard in config/database.yml and creating a subclass per shard:
class Widget < ActiveRecord::Base ; end
# A widget that lives on shard1
class Shard1Widget < Widget
  establish_connection 'shard1'
end
# A widget that lives on shard2
class Shard2Widget < Widget
  establish_connection 'shard2'
end
# ...
Once you know which shard you want, use the corresponding subclass to manage models on that shard:
@widget = Shard1Widget.find(:first)
@other_widget = Shard2Widget.create(:name => 'My awesome widget')
This approach can become complex if you want to shard models based on information in other models. For example, if you have an association between users and widgets and want to shard by user, you might do something like this:
class User < ActiveRecord::Base ; end
class Widget < ActiveRecord::Base ; end
# A widget that lives on shard1
class Shard1Widget < Widget
  establish_connection 'shard1'
  belongs_to :shard1_user
end
# A widget that lives on shard2
class Shard2Widget < Widget
  establish_connection 'shard2'
  belongs_to :shard2_user
end
# A user whose widgets live on shard1
class Shard1User < User
  has_many :shard1_widgets
end
# A user whose widgets live on shard2
class Shard2User < User
  has_many :shard2_widgets
end
# ...
This can be difficult to maintain as the number of shards grows, and requires a change to config/database.yml and an app restart whenever shards are added or removed.
Dynashard uses a similar approach, but generates shard classes and sharded model classes dynamically. The first time a shard is used by a model, Dynashard will create classes something like this:
class Dynashard::Shard0
  establish_connection the_new_shard
end
class Dynashard::Shard0::Widget < Widget ; end
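To make the idea of generating classes on first use more concrete, here is a minimal sketch in plain Ruby. It is not Dynashard's implementation: `FakeBase` stands in for ActiveRecord::Base so the code runs without Rails, and `ShardSketch`, `shard_class_for`, and `sharded_model` are hypothetical names. It also assumes that each generated model subclass simply records its shard's connection spec.

```ruby
# Hypothetical stand-in for ActiveRecord::Base so the sketch runs without Rails.
class FakeBase
  class << self
    attr_reader :connection_spec

    # Records the spec instead of opening a real connection.
    def establish_connection(spec)
      @connection_spec = spec
    end
  end
end

class Widget < FakeBase; end

module ShardSketch
  @shards = {} # connection spec => generated shard class

  # Returns the container class for a spec, creating it on first use.
  def self.shard_class_for(spec)
    @shards[spec] ||= begin
      klass = Class.new(FakeBase) { establish_connection spec }
      const_set("Shard#{@shards.size}", klass)
    end
  end

  # Returns a subclass of `model` nested under the shard class,
  # creating it on first use and reusing it afterwards.
  def self.sharded_model(model, spec)
    shard = shard_class_for(spec)
    name = model.name
    # `false` restricts the lookup to the shard class itself,
    # so the top-level Widget constant is not found by mistake.
    return shard.const_get(name, false) if shard.const_defined?(name, false)
    shard.const_set(name, Class.new(model) { establish_connection shard.connection_spec })
  end
end

first = ShardSketch.sharded_model(Widget, 'shard1')
first.name            # => "ShardSketch::Shard0::Widget"
first.connection_spec # => "shard1"
```

Caching the generated classes by connection spec means that repeated use of the same shard returns the same class, while the original `Widget` class is left untouched.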
You never need to use the generated classes directly. Instead, configure your models to be shard-aware and provide a shard context when using them so that Dynashard can find the appropriate connection:
class Widget < ActiveRecord::Base
  shard :by => :user
end
In the above example, all database access for the Widget model requires that a sharding context for :user be specified. This can be done once to be used for all subsequent access, or can be done around a block:
# This context will be the same for any subsequent access. All models configured to shard :by => :planet
# will use the 'earth' connection from config/database.yml.
Dynashard.shard_context[:planet] = 'earth'
@leader = Leader.find(:first)
# This context will only be used for access inside a block. All models configured to shard :by => :user
# will use the 'nickh' connection from config/database.yml.
Dynashard.with_context(:user => 'nickh') do
  @widgets = Widget.find(:all)
end
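The difference between a persistent context and a block-scoped one can be sketched in plain Ruby as follows. This is an illustration only, not Dynashard's code: `ContextSketch` is a hypothetical module whose method names mirror the calls above, and it assumes a single process-wide hash, where a real implementation might need per-thread storage.

```ruby
module ContextSketch
  # Process-wide context hash (an assumption made for this sketch).
  def self.shard_context
    @shard_context ||= {}
  end

  # Merges `overrides` into the context for the duration of the block,
  # then restores the previous context even if the block raises.
  def self.with_context(overrides)
    previous = shard_context.dup
    shard_context.merge!(overrides)
    yield
  ensure
    shard_context.replace(previous) if previous
  end
end

# Set once: stays in effect for all later access.
ContextSketch.shard_context[:planet] = 'earth'

# Set around a block: :user is only present inside the block.
ContextSketch.with_context(:user => 'nickh') do
  ContextSketch.shard_context # => {:planet=>"earth", :user=>"nickh"}
end
ContextSketch.shard_context   # => {:planet=>"earth"}
```

Restoring the context in an `ensure` clause keeps a failed block from leaking its shard selection into later queries.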
Dynashard supports sharded associations. For example, in the following models the User determines which shard the Widget should use:
class User < ActiveRecord::Base
  shard :associated, :using => :shard_name
  has_many :widgets
  def shard_name
    # return the name of an entry from config/database.yml
  end
end
class Widget < ActiveRecord::Base
  shard :by => :user
  belongs_to :user
end
@user = User.find(:first)
@user.widgets.create(:name => 'This will be created on the database returned by @user.shard_name')
Shard context values can refer to entries in config/database.yml but may also be hashes that contain database connection parameters:
class User < ActiveRecord::Base
  shard :associated, :using => :shard_params
  def shard_params
    {
      :adapter => 'sqlite3',
      :database => "db/#{self.name}.sqlite3"
    }
  end
end
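One way to picture how a name and a hash could both end up as connection parameters is the sketch below. It is an assumption about the general technique, not Dynashard's actual lookup: `SpecSketch`, `CONFIG`, and `connection_params_for` are hypothetical, and the inline YAML stands in for the parsed contents of config/database.yml.

```ruby
require 'yaml'

module SpecSketch
  # Hypothetical stand-in for the parsed config/database.yml.
  CONFIG = YAML.load(<<-YML)
shard1:
  adapter: sqlite3
  database: db/shard1.sqlite3
YML

  # Returns connection parameters for a shard context value:
  # a Hash is used as-is, anything else is treated as the name of a config entry.
  def self.connection_params_for(value)
    return value if value.is_a?(Hash)
    CONFIG.fetch(value.to_s) do
      raise ArgumentError, "no database entry named #{value.inspect}"
    end
  end
end

# A name is looked up in the configuration.
SpecSketch.connection_params_for('shard1')

# A hash of parameters is passed through unchanged.
SpecSketch.connection_params_for(:adapter => 'sqlite3', :database => 'db/nickh.sqlite3')
```

Note that entries loaded from YAML have string keys while the hash in the example above uses symbol keys. This sketch does not normalize them.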
In addition, values can be any object that responds to :call and returns either a reference to an entry in config/database.yml or a hash of database connection parameters: