
Archive the checkpoint partition recovery state machine

jfriesse edited this page Sep 20, 2012 · 1 revision

The purpose of the checkpoint synchronization algorithm is to synchronize checkpoints after a partition or a merge of two or more partitions. The secondary purpose of the algorithm is to determine the cluster-wide reference count for every checkpoint.

Every cluster contains a group of checkpoints. Each checkpoint has a checkpoint name and a checkpoint number. The number is used to uniquely reference an unlinked but still open checkpoint in the cluster.

Every checkpoint contains a reference count which is used to determine when that checkpoint may be released. The algorithm rebuilds the reference count information each time a partition or merge occurs.

Local Variables

| Variable Name | Description |
| --- | --- |
| my_sync_state | may have the values SYNC_CHECKPOINT, SYNC_REFCOUNT |
| my_current_iteration_state | contains any data used to iterate the checkpoints and sections |

Checkpoint data

| Variable Name | Description |
| --- | --- |
| refcount_set | contains the reference count for every node, each entry consisting of the number of opened connections to the checkpoint and the node identifier |
| refcount | contains a summation of every reference count in the refcount_set |
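As an illustration of the checkpoint data described above, the following is a minimal sketch of one checkpoint record. The class name `Checkpoint`, the field types, and the sample values are assumptions made for this example and are not taken from the implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Checkpoint:
    """Hypothetical in-memory record for one checkpoint."""
    name: str
    # The number distinguishes an unlinked but still open checkpoint
    # from other checkpoints that carry the same name.
    number: int
    # refcount_set: node identifier -> number of opened connections on that node
    refcount_set: Dict[int, int] = field(default_factory=dict)

    @property
    def refcount(self) -> int:
        # refcount is the summation of every reference count in refcount_set
        return sum(self.refcount_set.values())


# Node 1 holds two open connections, node 3 holds one.
ckpt = Checkpoint(name="orders", number=7, refcount_set={1: 2, 3: 1})
print(ckpt.refcount)  # prints 3
```

Deriving `refcount` from `refcount_set` keeps the two values from drifting apart; a real implementation may store the sum separately.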

### Event Execution

init event

call sync_checkpoint_enter

process event called in the SYNC_CHECKPOINT state

if this processor has the lowest processor identifier of the old ring in the new ring
        transmit checkpoints or sections starting from my_current_iteration_state
if all checkpoints and sections could be queued
        call sync_refcounts_enter
else
        record my_current_iteration_state
require process to continue
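The resumable iteration in the SYNC_CHECKPOINT process event can be sketched as follows. This is only an illustration under stated assumptions: the function `process_sync_checkpoint`, the flat `items` list, and the `try_queue` callback (which returns False when the transmit queue is full) are hypothetical names, and the iteration state is modelled as a plain list index.

```python
SYNC_CHECKPOINT = "SYNC_CHECKPOINT"
SYNC_REFCOUNT = "SYNC_REFCOUNT"


def process_sync_checkpoint(state, items, try_queue, is_lowest):
    """One call of the process event while in SYNC_CHECKPOINT.

    Returns True because the process event must be called again.
    """
    i = state["my_current_iteration_state"]
    if is_lowest:
        # Transmit starting from the recorded position until the queue is full.
        while i < len(items) and try_queue(items[i]):
            i += 1
    else:
        i = len(items)  # other processors have nothing to transmit
    if i == len(items):
        state["my_sync_state"] = SYNC_REFCOUNT  # sync_refcounts_enter
    else:
        state["my_current_iteration_state"] = i  # resume here next time
    return True


# Hypothetical transmit queue that accepts two items before filling up.
sent = []
budget = {"slots": 2}


def try_queue(item):
    if budget["slots"] == 0:
        return False
    budget["slots"] -= 1
    sent.append(item)
    return True


state = {"my_sync_state": SYNC_CHECKPOINT, "my_current_iteration_state": 0}
items = ["checkpoint A", "A / section 1", "checkpoint B"]
process_sync_checkpoint(state, items, try_queue, True)  # queues two items, then stalls
budget["slots"] = 2                                     # the queue drains
process_sync_checkpoint(state, items, try_queue, True)  # queues the remaining item
print(state["my_sync_state"])  # prints SYNC_REFCOUNT
```

Recording the position instead of restarting means no checkpoint or section is transmitted twice by the same processor when the queue fills up.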

process event called in the SYNC_REFCOUNT state

if this processor has the lowest processor identifier of the old ring in the new ring
        transmit checkpoint reference counts
if all checkpoint reference counts could be queued
        require process to not continue
else
        record my_current_iteration_state for checkpoint reference counts

the sync_checkpoint_enter operation

my_sync_state = SYNC_CHECKPOINT
my_current_iteration_state set to start of checkpoint list

the sync_refcounts_enter operation

my_sync_state = SYNC_REFCOUNT

receipt of foreign ring id message

ignore message

receipt of checkpoint update

if checkpoint exists in temporary storage
        ignore message
else
        create checkpoint
        reset checkpoint refcount array

receipt of checkpoint section update

if checkpoint section exists in temporary storage
        ignore message
else
        create checkpoint section
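The two receipt handlers above are idempotent: a second copy of the same checkpoint or section is ignored. A minimal sketch follows, assuming temporary storage is a dictionary keyed by (name, number); the function names and the storage layout are hypothetical. The behaviour for a section whose checkpoint is unknown is not stated in the text, so ignoring it here is an assumption.

```python
def on_checkpoint_update(temporary, name, number):
    """Create the checkpoint unless it already exists in temporary storage."""
    key = (name, number)
    if key in temporary:
        return False  # ignore message
    # New checkpoint starts with no sections and a reset refcount array.
    temporary[key] = {"sections": {}, "refcount_set": {}}
    return True


def on_section_update(temporary, name, number, section_id, data):
    """Create the section unless it already exists in temporary storage."""
    checkpoint = temporary.get((name, number))
    if checkpoint is None:
        return False  # assumption: this case is not covered by the text above
    if section_id in checkpoint["sections"]:
        return False  # ignore message
    checkpoint["sections"][section_id] = data
    return True


temporary = {}
on_checkpoint_update(temporary, "orders", 7)
on_section_update(temporary, "orders", 7, "s1", b"first copy")
on_section_update(temporary, "orders", 7, "s1", b"second copy")  # ignored
print(temporary[("orders", 7)]["sections"]["s1"])  # prints b'first copy'
```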

receipt of reference count update

update temporary checkpoint data storage reference count set by adding any reference counts in temporary message set to those from the event
update the checkpoint's reference count
set the global checkpoint id to the current checkpoint id + 1 if it would increase the global checkpoint id
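A sketch of the reference count update, reading "adding" as a per-node addition of the counts carried in the message. That reading, the function name `on_refcount_update`, and the dictionary layout are assumptions made for this example.

```python
def on_refcount_update(checkpoint, message_refcounts, global_checkpoint_id):
    """Merge reference counts from a message and return the new global id."""
    refcount_set = checkpoint["refcount_set"]
    for node_id, count in message_refcounts.items():
        # Add the received count to whatever is already stored for that node.
        refcount_set[node_id] = refcount_set.get(node_id, 0) + count
    # The checkpoint's reference count is the sum over the whole set.
    checkpoint["refcount"] = sum(refcount_set.values())
    # Only ever move the global checkpoint id upwards.
    return max(global_checkpoint_id, checkpoint["number"] + 1)


checkpoint = {"number": 4, "refcount_set": {1: 1}, "refcount": 1}
new_global_id = on_refcount_update(checkpoint, {1: 2, 2: 1}, 3)
print(checkpoint["refcount"], new_global_id)  # prints 4 5
```

Taking the maximum means a checkpoint id lower than the current global id leaves the global id unchanged.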

activate event

for all checkpoints
        free all previously committed checkpoints and sections
        convert temporary checkpoints and sections to regular checkpoints and sections
copy my_saved_ring_id to my_old_ring_id

abort event

free all temporary checkpoints and temporary sections
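The activate and abort events together behave like a commit or rollback of the temporary storage built up during synchronization. The sketch below illustrates that idea only; the class `CheckpointStore` and its attribute layout are hypothetical, and freeing is modelled by dropping dictionary references.

```python
class CheckpointStore:
    """Hypothetical holder for committed and temporary checkpoint data."""

    def __init__(self):
        self.committed = {}   # checkpoints and sections visible to applications
        self.temporary = {}   # data received during synchronization
        self.my_saved_ring_id = None
        self.my_old_ring_id = None

    def activate(self):
        # Free the previously committed data and promote the temporary data.
        self.committed = self.temporary
        self.temporary = {}
        self.my_old_ring_id = self.my_saved_ring_id

    def abort(self):
        # Free only the temporary data; committed data is left untouched.
        self.temporary = {}


store = CheckpointStore()
store.committed = {("orders", 6): "old"}
store.temporary = {("orders", 7): "new"}
store.my_saved_ring_id = 42
store.activate()
print(store.committed)  # prints {('orders', 7): 'new'}
```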