Skip to content

Commit

Permalink
add reconcile feature, incorporate code review comments from eric
Browse files Browse the repository at this point in the history
  • Loading branch information
ryanlecompte committed Apr 15, 2012
1 parent 0373961 commit c8c49e3
Show file tree
Hide file tree
Showing 6 changed files with 48 additions and 17 deletions.
2 changes: 2 additions & 0 deletions Changes.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ HEAD
- No longer force newly available slaves to master if already slaves of that master
- Honor a node's slave-serve-stale-data configuration option; do not mark a sync-with-master-in-progress slave as available if its slave-serve-stale-data is disabled
- Change reachable/unreachable wording to available/unavailable
- Added node reconciliation, i.e. if a node comes back up, make sure that the node manager and the node agree on current role
- More efficient use of redis client connections

0.3.0
-----------
Expand Down
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -87,6 +87,10 @@ The full set of options that can be passed to RedisFailover::Client are:
:retry_failure - indicate if failures should be retried (default true)
:max_retries - max retries for a failure (default 3)

## Considerations

Note that by default the failover server will mark slaves that are currently syncing with their master as "available" based on the configuration value set for "slave-serve-stale-data" in redis.conf. By default this value is set to "yes" in the configuration, which means that slaves still syncing with their master will be available for servicing read requests. If you don't want this behavior, just set "slave-serve-stale-data" to "no" in your redis.conf file.

## Limitations

The redis_failover gem currently has limitations. It currently does not gracefully handle network partitions. In cases where
Expand Down
1 change: 0 additions & 1 deletion lib/redis_failover.rb
Original file line number Diff line number Diff line change
@@ -1,4 +1,3 @@
require 'set'
require 'redis'
require 'thread'
require 'logger'
Expand Down
3 changes: 0 additions & 3 deletions lib/redis_failover/client.rb
Original file line number Diff line number Diff line change
Expand Up @@ -22,16 +22,13 @@ class Client
:hlen,
:hmget,
:hvals,
:info,
:keys,
:lastsave,
:lindex,
:llen,
:lrange,
:mapped_hmget,
:mapped_mget,
:mget,
:ping,
:scard,
:sdiff,
:select,
Expand Down
25 changes: 15 additions & 10 deletions lib/redis_failover/node.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,8 @@ module RedisFailover
class Node
include Util

BLPOP_WAIT_TIME = 3

attr_reader :host, :port

def initialize(options = {})
Expand All @@ -19,13 +21,23 @@ def slave?
!master?
end

def slave_of?(master)
current_master == master
end

def current_master
info = fetch_info
return unless info[:role] == 'slave'
Node.new(:host => info[:master_host], :port => info[:master_port].to_i)
end

# Waits until something interesting happens. If the connection
# with this node dies, the blpop call will raise an error. If
# the blpop call returns without error, then this will be due to
# a graceful shutdown signaled by #stop_waiting.
# a graceful shutdown signaled by #stop_waiting or a timeout.
def wait
perform_operation do
redis.blpop(wait_key, 0)
redis.blpop(wait_key, 0, BLPOP_WAIT_TIME)
redis.del(wait_key)
end
end
Expand Down Expand Up @@ -103,7 +115,7 @@ def wait_key
end

def redis
Redis.new(:host => @host, :password => @password, :port => @port)
@redis ||= Redis.new(:host => @host, :password => @password, :port => @port)

This comment has been minimized.

Copy link
@eric

eric Apr 15, 2012

Collaborator

I believe doing this could cause a deadlock when stop_waiting() is called — there is a mutex that prevents multiple commands from happening on a redis connection at the same time, and the blpop() call should have the lock at the time.

This makes me realize that it would be pretty helpful if the tests were running against real local redis instances to catch issues like this that have to do with the implementation of the lower-level components.

This comment has been minimized.

Copy link
@eric

eric Apr 15, 2012

Collaborator

Sending SIGSTOP (and eventually SIGCONT) to the redis daemon would be a good way to test the case of a hung system...

This comment has been minimized.

Copy link
@ryanlecompte

ryanlecompte Apr 15, 2012

Author Owner

@eric, you're absolutely correct. This was the original reason why I was creating a new redis client instead of using a memoized one. I will revert this change and ensure that I call #disconnect on the redis client to avoid having left-over connections to the server.

This comment has been minimized.

Copy link
@ryanlecompte

ryanlecompte Apr 15, 2012

Author Owner

This should be addressed now @ b4e452a

rescue
raise NodeUnavailableError.new(self)
end
Expand All @@ -113,12 +125,5 @@ def perform_operation
rescue
raise NodeUnavailableError.new(self)
end

def slave_of?(master)
info = fetch_info
info[:role] == 'slave' &&
info[:master_host] == master.host &&
info[:master_port] == master.port.to_s
end
end
end
30 changes: 27 additions & 3 deletions lib/redis_failover/node_manager.rb
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ class NodeManager
def initialize(options)
@options = options
@master, @slaves = parse_nodes
@unavailable = Set.new
@unavailable = []
@queue = Queue.new
@lock = Mutex.new
end
Expand Down Expand Up @@ -72,6 +72,8 @@ def handle_unavailable(node)
end

def handle_available(node)
reconcile(node)

# no-op if we already know about this node
return if @master == node || @slaves.include?(node)
logger.info("Handling available node: #{node}")
Expand All @@ -89,6 +91,8 @@ def handle_available(node)
end

def handle_syncing(node)
reconcile(node)

if node.prohibits_stale_reads?
logger.info("Node #{node} not ready yet, still syncing with master.")
@unavailable << node
Expand Down Expand Up @@ -116,14 +120,14 @@ def promote_new_master(node = nil)
end

def parse_nodes
nodes = @options[:nodes].map { |opts| Node.new(opts) }
nodes = @options[:nodes].map { |opts| Node.new(opts) }.uniq
raise NoMasterError unless master = find_master(nodes)
slaves = nodes - [master]

logger.info("Managing master (#{master}) and slaves" +
" (#{slaves.map(&:to_s).join(', ')})")

[master, Set.new(slaves)]
[master, slaves]
end

def spawn_watchers
Expand Down Expand Up @@ -156,5 +160,25 @@ def redirect_slaves_to_master
end
end
end

# It's possible that a newly available node may have been restarted
# and completely lost its dynamically set run-time role by the node
# manager. This method ensures that the node resumes its role as
# determined by the manager.
def reconcile(node)
return if @master == node && node.master?
return if @master && node.slave_of?(@master)

if @master == node && !node.master?
# we think the node is a master, but the node doesn't
node.make_master!
return
end

# verify that node is a slave for the current master
if @master && !node.slave_of?(@master)
node.make_slave!(@master)
end
end
end
end

0 comments on commit c8c49e3

Please sign in to comment.