Archive the checkpoint partition recovery state machine
jfriesse edited this page Sep 20, 2012
The purpose of the checkpoint synchronization algorithm is to synchronize checkpoints after a partition or a merge of two or more partitions. Its secondary purpose is to determine the cluster-wide reference count for every checkpoint.
Every cluster contains a group of checkpoints. Each checkpoint has a checkpoint name and a checkpoint number. The number is used to uniquely reference an unlinked but still open checkpoint in the cluster.
Every checkpoint contains a reference count which is used to determine when that checkpoint may be released. The algorithm rebuilds the reference count information each time a partition or merge occurs.
Variable Name | Description |
---|---|
my_sync_state | may have the values SYNC_CHECKPOINT, SYNC_REFCOUNT |
my_current_iteration_state | contains any data used to iterate the checkpoints and sections. |
checkpoint data: refcount_set | contains a reference count for every node; each entry consists of the node identifier and the number of connections that node has open to the checkpoint |
checkpoint data: refcount | contains a summation of every reference count in the refcount_set |
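As a rough illustration of the checkpoint data described in the table, the sketch below models `refcount_set` as a mapping from node identifier to open connection count, with `refcount` derived as its sum. The class name `Checkpoint`, the field `ckpt_id`, the dict representation, and the release-at-zero rule are assumptions made for this example; they are not taken from the corosync source.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Checkpoint:
    """Hypothetical model of one checkpoint's reference-count data."""
    name: str
    ckpt_id: int  # the "checkpoint number" (field name is an assumption)
    # refcount_set: node identifier -> connections that node has open
    refcount_set: Dict[int, int] = field(default_factory=dict)

    @property
    def refcount(self) -> int:
        # refcount is the summation of every entry in refcount_set
        return sum(self.refcount_set.values())

    def may_be_released(self) -> bool:
        # Assumption: a checkpoint may be released once no node references it.
        return self.refcount == 0


ckpt = Checkpoint(name="demo", ckpt_id=7, refcount_set={1: 2, 2: 1})
print(ckpt.refcount)  # 3
```

Deriving `refcount` from `refcount_set` instead of storing it separately keeps the two values from drifting apart in this sketch; a real implementation may well cache the sum.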
### Event Execution
call process_checkpoints_enter

if lowest processor identifier of the old ring in the new ring
    transmit checkpoints or sections starting from my_current_iteration_state
if all checkpoints and sections could be queued
    call sync_refcounts_enter
else
    record my_current_iteration_state
    require process to continue

if lowest processor identifier of old ring in new ring
    transmit checkpoint reference counts
if all checkpoint reference counts could be queued
    require process to not continue
else
    record my_current_iteration_state for checkpoint reference counts
my_sync_state = SYNC_CHECKPOINT
my_current_iteration_state set to start of checkpoint list

my_sync_state = SYNC_REFCOUNT
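Read together, the steps above appear to describe a two-state, resumable process loop: checkpoints and sections are transmitted first, then reference counts, and the iteration position is recorded whenever the send queue fills up. The sketch below is one possible reading, not the corosync implementation. The class `Syncer`, the `queue_capacity` parameter (a stand-in for a bounded transmit queue), the guard on `my_sync_state`, and the convention that `process` returns `True` when it must be called again are all assumptions.

```python
from collections import deque
from enum import Enum
from typing import Deque, List, Tuple


class SyncState(Enum):
    SYNC_CHECKPOINT = 1
    SYNC_REFCOUNT = 2


class Syncer:
    """Hypothetical resumable synchronization step."""

    def __init__(self, items: List[str], refcounts: List[str],
                 is_lowest_in_old_ring: bool, queue_capacity: int) -> None:
        self.items = items          # checkpoints and sections to transmit
        self.refcounts = refcounts  # reference count messages to transmit
        self.is_lowest = is_lowest_in_old_ring
        self.queue_capacity = queue_capacity  # messages accepted per call
        self.sent: Deque[Tuple[str, str]] = deque()
        self.process_checkpoints_enter()

    def process_checkpoints_enter(self) -> None:
        self.my_sync_state = SyncState.SYNC_CHECKPOINT
        self.my_current_iteration_state = 0  # start of checkpoint list

    def sync_refcounts_enter(self) -> None:
        self.my_sync_state = SyncState.SYNC_REFCOUNT
        self.my_current_iteration_state = 0  # assumption: iteration restarts

    def _transmit(self, kind: str, pending: List[str]) -> bool:
        """Queue as much as fits; return True if everything was queued."""
        budget = self.queue_capacity
        pos = self.my_current_iteration_state
        # Only the lowest processor of the old ring transmits anything.
        while self.is_lowest and pos < len(pending):
            if budget == 0:
                self.my_current_iteration_state = pos  # record resume point
                return False
            self.sent.append((kind, pending[pos]))
            pos += 1
            budget -= 1
        return True

    def process(self) -> bool:
        """Run one step; return True if process must be called again."""
        if self.my_sync_state is SyncState.SYNC_CHECKPOINT:
            if self._transmit("checkpoint", self.items):
                self.sync_refcounts_enter()
            # Assumption: reference counts are sent on a later call.
            return True
        # SYNC_REFCOUNT: stop once every reference count has been queued.
        return not self._transmit("refcount", self.refcounts)


s = Syncer(["ckpt-a", "sec-a1", "ckpt-b"], ["ref-a", "ref-b"],
           is_lowest_in_old_ring=True, queue_capacity=2)
steps = 1
while s.process():
    steps += 1
print(steps, len(s.sent))  # 3 5
```

A node that is not the lowest processor of the old ring walks through the same two states without queuing any messages, which matches the structure of the pseudocode where the "could be queued" checks sit outside the "lowest processor" condition.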
ignore message

if checkpoint exists in temporary storage
    ignore message
else
    create checkpoint
    reset checkpoint refcount array

if checkpoint section exists in temporary storage
    ignore message
else
    create checkpoint section
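The two receive handlers above are idempotent: a checkpoint or section that already exists in temporary storage is not created again, so duplicate messages are harmless. A minimal sketch of that behavior follows. The handler names, the dict-based storage, and the choice of `(name, ckpt_id)` as the key are assumptions for illustration only.

```python
from typing import Dict, Tuple

CkptKey = Tuple[str, int]        # (checkpoint name, checkpoint number)
SectionKey = Tuple[str, int, str]  # checkpoint key plus section id

# Temporary storage filled during synchronization (hypothetical layout).
temp_checkpoints: Dict[CkptKey, Dict[int, int]] = {}
temp_sections: Dict[SectionKey, bytes] = {}


def on_checkpoint_message(name: str, ckpt_id: int) -> bool:
    """Create the checkpoint unless it is already in temporary storage."""
    key = (name, ckpt_id)
    if key in temp_checkpoints:
        return False  # ignore message
    temp_checkpoints[key] = {}  # create checkpoint with an empty refcount set
    return True


def on_section_message(name: str, ckpt_id: int,
                       section_id: str, data: bytes) -> bool:
    """Create the section unless it is already in temporary storage."""
    key = (name, ckpt_id, section_id)
    if key in temp_sections:
        return False  # ignore message
    temp_sections[key] = data  # create checkpoint section
    return True


first = on_checkpoint_message("ckpt-a", 1)
duplicate = on_checkpoint_message("ckpt-a", 1)
print(first, duplicate)  # True False
```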
update temporary checkpoint data storage reference count set by adding any reference counts in temporary message set to those from the event
update the checkpoint's reference count
set the global checkpoint id to the current checkpoint id + 1 if it would increase the global checkpoint id
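The reference count handling above can be sketched as two small functions: one adds the per-node counts carried by a message to those already held in temporary storage and returns the new total, and one raises the global checkpoint id without ever lowering it. The function names and the dict representation of a reference count set are assumptions, not corosync identifiers.

```python
from typing import Dict


def merge_refcounts(stored: Dict[int, int], received: Dict[int, int]) -> int:
    """Add received per-node counts into stored; return the new total."""
    for node_id, count in received.items():
        stored[node_id] = stored.get(node_id, 0) + count
    # The checkpoint's reference count is the sum over the whole set.
    return sum(stored.values())


def next_global_ckpt_id(global_id: int, ckpt_id: int) -> int:
    """Move the global id to ckpt_id + 1 only if that increases it."""
    return max(global_id, ckpt_id + 1)


stored = {1: 2}
total = merge_refcounts(stored, {1: 1, 2: 3})
print(stored, total)                # {1: 3, 2: 3} 6
print(next_global_ckpt_id(5, 7))    # 8
print(next_global_ckpt_id(10, 7))   # 10
```

Keeping the global id monotonic means a checkpoint number learned from another partition is not handed out again after the merge.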
for all checkpoints
    free all previously committed checkpoints and sections
    convert temporary checkpoints and sections to regular checkpoints and sections
copy my_saved_ring_id to my_old_ring_id

free all temporary checkpoints and temporary sections
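The last two blocks read like a commit and a rollback: on success the temporary data replaces the committed data and the ring id is advanced, and on failure the temporary data is simply discarded. The sketch below shows that two-stage pattern. The class `CheckpointStore` and the method names `sync_activate` and `sync_abort` are assumptions chosen for this example.

```python
from typing import Dict


class CheckpointStore:
    """Hypothetical two-stage store: data built during synchronization
    stays temporary until it is activated."""

    def __init__(self) -> None:
        self.committed: Dict[str, bytes] = {}
        self.temporary: Dict[str, bytes] = {}
        self.my_old_ring_id = 0
        self.my_saved_ring_id = 0

    def sync_activate(self) -> None:
        # Drop the previously committed data and promote the temporary data.
        self.committed = self.temporary
        self.temporary = {}
        self.my_old_ring_id = self.my_saved_ring_id

    def sync_abort(self) -> None:
        # Discard everything built during the interrupted synchronization.
        self.temporary = {}


store = CheckpointStore()
store.committed = {"old": b"1"}
store.temporary = {"new": b"2"}
store.my_saved_ring_id = 42
store.sync_activate()
print(store.committed, store.my_old_ring_id)  # {'new': b'2'} 42
```

Because committed data is only touched in `sync_activate`, a synchronization that is interrupted by another membership change leaves the previously committed checkpoints intact.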