Commits on Sep 27, 2011
  1. @jonmeredith
  2. @jonmeredith
  3. @jonmeredith
  4. @jonmeredith
  5. @jonmeredith

    Perform final sync once all handoff data has been sent.

    The new cluster membership code switched to forwarding
    once handoff is complete.  Without this change the vnode
    starts forwarding while the new owner is still processing
    buffered TCP data.
    jonmeredith committed Sep 27, 2011
Commits on Sep 26, 2011
  1. @jtuple

    Fix bug with nodes leaving the cluster earlier than intended.

    Change ring_ready to wait on exiting nodes in addition to valid and leaving
    nodes. This ensures the ring converges on a node's intent to leave before the
    node leaves the cluster.
    
    Stop the claimant from moving itself from exiting to invalid. Instead, after
    the claimant moves to exiting, a new claimant will emerge that will move the
    previous claimant to invalid and initiate shutdown.
    jtuple committed Sep 26, 2011
Commits on Sep 23, 2011
  1. @jonmeredith

    Fixed update_forwarding_mode return in deleted case.

    The caller wraps the state with the next state information.
    jonmeredith committed Sep 23, 2011
  2. @jaredmorrow
  3. @jonmeredith
  4. @jonmeredith
  5. @jonmeredith

    Made Mod:delete happen before unregister.

    Prevent a race with the master starting a new vnode.
    Changed coverage to run while in handoff - otherwise
    listkeys et al will bomb during partition transfer.
    jonmeredith committed Sep 23, 2011
  6. @jonmeredith

    Added infinity timeout on finish_handoff call.

    On a very busy 6-node stagedevrel cluster, we were hitting:
    11:35:18.950 [error] gen_fsm <0.171.0> in state active terminated with reason: {timeout,{gen_server,call,[riak_core_gossip,{finish_handoff,45671926166590716193865151022383844364247891968,'dev1@127.0.0.1','dev3@127.0.0.1',riak_pipe_vnode}]}}
    
    The process is local and the call is monitored in case gossip dies.
    jonmeredith committed Sep 23, 2011
  7. @jonmeredith

    Changed vnode to unregister from master before cleaning up.

    Fullsync repl was hanging because it delivered a fold message
    while finish_handoff was being called.  The message was never
    processed as the vnode immediately shut down rather than
    forwarding the messages in the queue.
    
    On completion of handoff, asynchronously unregister from the vnode master.
    The unregister call now passes the pid of the unregistering vnode, and
    the master sends an unregistered event once the vnode has been removed
    from the master ETS table.
    
    While waiting for the acknowledgment of unregister the vnode goes
    into forwarding mode.
    jonmeredith committed Sep 23, 2011
  8. @jtuple
  9. @jtuple
Commits on Sep 21, 2011
  1. @jtuple
  2. @jtuple

    Update new partition claim algorithm after review + bug fixes

    Change claim_simulation.erl eunit test to run a simulation with both the
    new and old claim algorithm as suggested.
    
    Rename riak_core_new_claim:new_claim/2 to new_choose_claim/2 to match
    default_choose_claim/2.
    
    Fix two bugs in riak_core_new_claim.erl that are on code paths that cannot
    occur in 1.0 due to existing invariants, but should be fixed nevertheless:
    - Match error in prefilter_violations: change CNth to {CNth, _}.
    - Handle case where new_choose_claim fails to claim partitions by falling
      back to claim_rebalance_n.
    jtuple committed Sep 21, 2011
Commits on Sep 20, 2011
  1. @jtuple

    Add new partition claim function and claim simulator

    Add riak_core_new_claim:new_wants_claim/2 and new_claim/2.
    Merge in claim simulation code provided by Greg Nelson (grourk@dropcam.com).
    Add pretty_print function to riak_core_ring.
    
    The new claim function is designed to reduce the number of partition transfers
    that occur when rebalancing the ring, aiming to come as close as possible to minimal
    consistent hashing.
    jtuple committed Sep 20, 2011
  2. @jaredmorrow
  3. @Vagabond

    Fix a variable conflict

    Vagabond committed Sep 20, 2011
  4. @Vagabond
  5. @Vagabond
  6. @Vagabond

    Fix bug with worker checkin tracking

    bz1188
    Vagabond committed Sep 15, 2011
  7. @Vagabond
  8. @Vagabond

    Initial attempt at clean vnode shutdown that waits for queued work

    bz1188
    
    This patch adds a patched supervisor module that supports graceful
    shutdown from a simple_one_for_one, so when a node stops gracefully, we
    can block shutdown long enough to process any queued work and do any
    other cleanups.
    Vagabond committed Sep 14, 2011
  9. @jonmeredith @Vagabond
Commits on Sep 16, 2011
  1. @rustyio

    Merge pull request #86 from basho/AZ721-louder-2i-errors

    AZ721 - Fail Loudly on 2i Errors
    rustyio committed Sep 16, 2011
  2. @kellymclaughlin
  3. @kellymclaughlin

    Fix subtle bug in riak_core_coverage_fsm.

    Fixes: az726
    
    This change fixes a bug in riak_core_coverage_fsm where the updated
    state is not passed to the module implementing the behavior in the
    finish call. This can lead to incomplete results for operations that
    accumulate results in the state and do something with them in the
    finish function.
    kellymclaughlin committed Sep 15, 2011
Commits on Sep 15, 2011
  1. @kellymclaughlin
  2. @kellymclaughlin
  3. @rustyio

    Stop coverage fsm when there is an error.

    AZ721
    rustyio committed Sep 15, 2011
Commits on Sep 14, 2011
  1. @slfritchie
  2. @slfritchie
  3. @slfritchie