Add support for read-only slave/writable master databases and database sharding

The support is described in the sharding.rdoc file included with this
commit. This commit makes significant changes to every adapter in
order to support this new functionality.  I only have the ability to
test PostgreSQL, MySQL, and SQLite (both via the native drivers and
via JDBC), so it's possible I could have broken something on other
adapters.  If you use another adapter, please test this and see if it
breaks anything.  I try to be fairly careful whenever I change
something I can't test, but it's always possible I made an error.

This commit makes the following internal changes (a short usage sketch
follows the list):

* The Database and Dataset execute and execute_dui methods now take
  an options hash.  The prepared statement support was integrated
  into this hash, resulting in a simpler implementation.
* The connection pool internals were changed significantly to allow
  connections to different servers.  The previous methods all still
  work the same way, but now take an optional server argument
  specifying which server to use.
* Many low-level methods (transaction, test_connection, synchronize,
  tables) take an optional server argument to specify the server to
  use.
* Some adapter database and dataset methods were made private.
* Adapter Dataset #fetch_rows methods that used Database#synchronize
  explicitly were modified to use Dataset#execute with a block.
  Adapter Database #execute methods were modified for these adapters
  to yield inside of #synchronize.
* Database#connect now requires a server argument.  The included
  adapters use this with the new private Database#server_opts method
  that allows overriding the default opts with the server-specific
  opts.
* The JDBC and MySQL adapters were significantly refactored.
* The PostgreSQL adapter #execute_insert now takes a hash of options
  instead of table and values arguments.
* Adapters with specific support for named prepared statements now
  treat a symbol passed as the first argument to execute as a call
  to a prepared statement.  The execute_prepared_statement method in
  these adapters is now private.
* Adapter execute_select methods were removed in favor of execute,
  with the original uses of execute changed to execute_dui.  This
  follows the convention of using execute for SELECT queries and
  execute_dui for DELETE/UPDATE/INSERT queries.
* Removed the adapter_skeleton adapter.  The existing adapters
  provide better examples of how things should be done than this
  skeleton file did.
* Model methods are no longer defined for non-public dataset methods
  specified in plugins.
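
A rough usage sketch of the API surface described above (not part of
this commit): it assumes a reasonably current Sequel with the sqlite
adapter available; the demo.db file, the items table, and the empty
:read_only options hash are purely illustrative.

  require 'sequel'

  # One default (master) server plus a :read_only server; the empty hash
  # means "use the same options as the default server", so both point at
  # the same demo.db file here.
  DB = Sequel.connect('sqlite://demo.db', :servers=>{:read_only=>{}})

  unless DB.table_exists?(:items)
    DB.create_table(:items) do
      primary_key :id
      String :name
    end
  end

  items = DB[:items]
  items.insert(:name=>'widget')  # INSERT uses execute_dui on the default (master) server
  items.all                      # SELECT uses execute on the :read_only server
  items.server(:default).all     # Dataset#server forces a particular server/shard

  # Lower-level methods now accept a server argument:
  DB.synchronize(:read_only){|conn| conn} # check out a connection from the :read_only pool

In a real master/slave setup the :read_only entry would carry options
such as a different :host, as in the sharding.rdoc examples included
below.
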
Commit 7aeea22dd348f55341cb2bb99b462ee6d5ab564d (1 parent: 7b2cee2), committed by @jeremyevans on Aug 4, 2008
@@ -1,5 +1,7 @@
=== HEAD
+* Add support for read-only slave/writable master databases and database sharding (jeremyevans)
+
* Remove InvalidExpression, InvalidFilter, InvalidJoinType, and WorkerStop exceptions (jeremyevans)
* Add prepared statement/bound variable support (jeremyevans)
@@ -0,0 +1,113 @@
+= Read-Only Slaves/Writable Master and Database Sharding
+
+Starting with version 2.4.0, Sequel has support for read only slave databases
+with a writable master database, as well as database sharding (where you can
+pick a database connection to use for a given dataset). Support for both
+features is database independent, and should work for all database adapters
+included with Sequel.
+
+== The :servers Database option
+
+Both features use the new :servers Database option. The :servers option should
+be a hash with symbol keys and values that are either hashes or procs that
+return hashes. Note that all servers should have the same schema, unless you
+really know what you are doing.
+
+== Master and Slave Database Configurations
+
+=== Single Read-Only Slave, Single Master
+
+To use a single, read-only slave that handles SELECT queries, the following
+is the simplest configuration:
+
+ DB=Sequel.connect('postgres://master_server/database', \
+   :servers=>{:read_only=>{:host=>'slave_server'}})
+
+This will use the host slave_server for SELECT queries and master_server for
+other queries.
+
+=== Multiple Read-Only Slaves, Single Master
+
+Let's say you have 4 slave database servers with names slave_server0,
+slave_server1, slave_server2, and slave_server3.
+
+ DB=Sequel.connect('postgres://master_server/database', \
+   :servers=>{:read_only=>proc{|db| {:host=>db.get_slave_host}}})
+ def DB.get_slave_host
+   @current_host ||= -1
+   "slave_server#{(@current_host+=1)%4}"
+ end
+
+This will use one of the slave servers for SELECT queries and use the
+master_server for other queries. It's also possible to pick a random host
+instead of using the round robin approach presented above, but that can result
+in less optimal resource usage.
+
+=== Multiple Read-Only Slaves, Multiple Masters
+
+This involves the same basic idea as the multiple slaves, single master, but
+it shows that the master database is named :default. So for 4 masters and
+4 slaves:
+
+ DB=Sequel.connect('postgres://master_server/database', \
+   :servers=>{:read_only=>proc{|db| {:host=>db.get_slave_host}}, \
+     :default=>proc{|db| {:host=>db.get_master_host}}})
+ def DB.get_slave_host
+   @current_slave_host ||= -1
+   "slave_server#{(@current_slave_host+=1)%4}"
+ end
+ def DB.get_master_host
+   @current_master_host ||= -1
+   "master_server#{(@current_master_host+=1)%4}"
+ end
+
+== Sharding
+
+There is specific support in Sequel for handling master/slave database
+combinations, with the only necessary setup being the database configuration.
+However, since sharding is always going to be implementation dependent, Sequel
+supplies the basic infrastructure, but you have to tell it which server to use
+for each dataset. Let's assume the simple scenario, a distributed rainbow
+table for SHA-1 hashes, sharding based on the first hex character (for a total
+of 16 shards). First, you need to configure the database:
+
+ servers = {}
+ (('0'..'9').to_a + ('a'..'f').to_a).each do |hex|
+   servers[hex.to_sym] = {:host=>"hash_host_#{hex}"}
+ end
+ DB=Sequel.connect('postgres://hash_host/hashes', :servers=>servers)
+
+This configures 17 servers, the 16 shard servers (/hash_host_[0-9a-f]/), and 1
+default server which will be used if no shard is specified ("hash_host"). If
+you want the default server to be one of the shard servers (e.g. hash_host_a),
+it's easiest to do:
+
+ DB=Sequel.connect('postgres://hash_host_a/hashes', :servers=>servers)
+
+That will still set up a second pool of connections for the default server,
+since it considers the default server and shard servers independent. Note that
+if you always set the shard on a dataset before using it in queries, it will
+not attempt to connect to the default server. Sequel may use the default
+server in queries it generates itself, such as to get column names or table
+schemas, so it is always good to have a default server that works.
+
+To set the shard for a given query, you use the Dataset#server method:
+
+ DB[:hashes].server(:a).filter(:hash=>/31337/)
+
+That will return all matching rows on the hash_host_a shard that have a hash
+column that contains 31337.
+
+Rainbow tables are generally used to find specific hashes, so to save some
+work, you might want to add a method to the dataset that automatically sets
+the shard to use. This is fairly easy using a Sequel::Model:
+
+ class Rainbow < Sequel::Model(:hashes)
+   def_dataset_method(:plaintext_for_hash) do |hash|
+     raise(ArgumentError, 'Invalid SHA-1 Hash') unless /\A[0-9a-f]{40}\z/.match(hash)
+     row = self.server(hash[0...1].to_sym).first(:hash=>hash)
+     row[:plaintext] if row
+   end
+ end
+
+ Rainbow.plaintext_for_hash("e580726d31f6e1ad216ffd87279e536d1f74e606")
@@ -1,54 +0,0 @@
-module Sequel
- module Adapter
- class Database < Sequel::Database
- set_adapter_scheme :adapter
-
- def connect
- AdapterDB.new(@opts[:database], @opts[:user], @opts[:password])
- end
-
- def disconnect
- @pool.disconnect {|c| c.disconnect}
- end
-
- def dataset(opts = nil)
- Adapter::Dataset.new(self, opts)
- end
-
- def execute(sql)
- log_info(sql)
- @pool.hold {|conn| conn.exec(sql)}
- end
- end
-
- class Dataset < Sequel::Dataset
- def literal(v)
- case v
- when Time
- literal(v.iso8601)
- when Date, DateTime
- literal(v.to_s)
- else
- super
- end
- end
-
- def fetch_rows(sql, &block)
- @db.synchronize do
- cursor = @db.execute sql
- begin
- @columns = cursor.get_col_names.map {|c| c.to_sym}
- while r = cursor.fetch
- row = {}
- r.each_with_index {|v, i| row[@columns[i]] = v}
- yield row
- end
- ensure
- cursor.close
- end
- end
- self
- end
- end
- end
-end
@@ -14,14 +14,15 @@ module ADO
class Database < Sequel::Database
set_adapter_scheme :ado
- def connect
- @opts[:driver] ||= 'SQL Server'
- case @opts[:driver]
+ def connect(server)
+ opts = server_opts(server)
+ opts[:driver] ||= 'SQL Server'
+ case opts[:driver]
when 'SQL Server'
require 'sequel_core/adapters/shared/mssql'
extend Sequel::MSSQL::DatabaseMethods
end
- s = "driver=#{@opts[:driver]};server=#{@opts[:host]};database=#{@opts[:database]}#{";uid=#{@opts[:user]};pwd=#{@opts[:password]}" if @opts[:user]}"
+ s = "driver=#{opts[:driver]};server=#{opts[:host]};database=#{opts[:database]}#{";uid=#{opts[:user]};pwd=#{opts[:password]}" if opts[:user]}"
handle = WIN32OLE.new('ADODB.Connection')
handle.Open(s)
handle
@@ -35,9 +36,13 @@ def dataset(opts = nil)
ADO::Dataset.new(self, opts)
end
- def execute(sql)
+ def execute(sql, opts={})
log_info(sql)
- @pool.hold {|conn| conn.Execute(sql)}
+ synchronize(opts[:server]) do |conn|
+ r = conn.Execute(sql)
+ yield(r) if block_given?
+ r
+ end
end
alias_method :do, :execute
end
@@ -54,10 +59,8 @@ def literal(v)
end
end
- def fetch_rows(sql, &block)
- @db.synchronize do
- s = @db.execute sql
-
+ def fetch_rows(sql)
+ execute(sql) do |s|
@columns = s.Fields.extend(Enumerable).map do |column|
name = column.Name.empty? ? '(no column name)' : column.Name
name.to_sym
@@ -71,6 +74,8 @@ def fetch_rows(sql, &block)
self
end
+ private
+
def hash_row(row)
@columns.inject({}) do |m, c|
m[c] = row.shift
@@ -6,29 +6,15 @@ class Database < Sequel::Database
set_adapter_scheme :db2
include DB2CLI
- # AUTO_INCREMENT = 'IDENTITY(1,1)'.freeze
- #
- # def auto_increment_sql
- # AUTO_INCREMENT
- # end
-
- def check_error(rc, msg)
- case rc
- when SQL_SUCCESS, SQL_SUCCESS_WITH_INFO
- nil
- else
- raise Error, msg
- end
- end
-
rc, @@env = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE)
- check_error(rc, "Could not allocate DB2 environment")
+ #check_error(rc, "Could not allocate DB2 environment")
- def connect
+ def connect(server)
+ opts = server_opts(server)
rc, dbc = SQLAllocHandle(SQL_HANDLE_DBC, @@env)
check_error(rc, "Could not allocate database connection")
- rc = SQLConnect(dbc, @opts[:database], @opts[:user], @opts[:password])
+ rc = SQLConnect(dbc, opts[:database], opts[:user], opts[:password])
check_error(rc, "Could not connect to database")
dbc
@@ -44,26 +30,26 @@ def disconnect
end
end
- def test_connection
- @pool.hold {|conn|}
+ def test_connection(server=nil)
+ synchronize(server){|conn|}
true
end
def dataset(opts = nil)
DB2::Dataset.new(self, opts)
end
- def execute(sql, &block)
+ def execute(sql, opts={})
log_info(sql)
- @pool.hold do |conn|
+ synchronize(opts[:server]) do |conn|
rc, sth = SQLAllocHandle(SQL_HANDLE_STMT, @handle)
check_error(rc, "Could not allocate statement")
begin
rc = SQLExecDirect(sth, sql)
check_error(rc, "Could not execute statement")
- block[sth] if block
+ yield(sth) if block_given?
rc, rpc = SQLRowCount(sth)
check_error(rc, "Could not get RPC")
@@ -75,9 +61,22 @@ def execute(sql, &block)
end
end
alias_method :do, :execute
+
+ private
+
+ def check_error(rc, msg)
+ case rc
+ when SQL_SUCCESS, SQL_SUCCESS_WITH_INFO
+ nil
+ else
+ raise Error, msg
+ end
+ end
end
class Dataset < Sequel::Dataset
+ MAX_COL_SIZE = 256
+
def literal(v)
case v
when Time
@@ -89,21 +88,19 @@ def literal(v)
end
end
- def fetch_rows(sql, &block)
- @db.synchronize do
- @db.execute(sql) do |sth|
- @column_info = get_column_info(sth)
- @columns = @column_info.map {|c| c[:name]}
- while (rc = SQLFetch(@handle)) != SQL_NO_DATA_FOUND
- @db.check_error(rc, "Could not fetch row")
- yield hash_row(sth)
- end
+ def fetch_rows(sql)
+ execute(sql) do |sth|
+ @column_info = get_column_info(sth)
+ @columns = @column_info.map {|c| c[:name]}
+ while (rc = SQLFetch(@handle)) != SQL_NO_DATA_FOUND
+ @db.check_error(rc, "Could not fetch row")
+ yield hash_row(sth)
end
end
self
end
- MAX_COL_SIZE = 256
+ private
def get_column_info(sth)
rc, column_count = SQLNumResultCols(sth)