-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[INDY-1336] Stop dropping ppSeqNo to 0 after view_change #1301
[INDY-1336] Stop dropping ppSeqNo to 0 after view_change #1301
Conversation
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
if last_pp_view_no < view_no: | ||
if view_no != self.view_no: | ||
return False | ||
last_pp_seq_no = 0 | ||
if pp_seq_no - last_pp_seq_no > 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a check return pp_seq_no - last_pp_seq_no == 1
would be more clear and safe
…es and prePrepares Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
plenum/server/replica.py
Outdated
elif not self.isPrimary: | ||
self._checkpointer.set_watermarks(low_watermark=0, | ||
high_watermark=sys.maxsize) | ||
# elif not self.isPrimary: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we need to leave this code, since backup instances can not correctly restore last ordered after catchup and have to relax the watermarks to restore last ordered from upcoming 3PC messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
plenum/server/replica.py
Outdated
def clear_requests_and_fix_last_ordered(self): | ||
self._clear_all_3pc_msgs_after_vc() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this new code if already call gc
in _gc_before_new_view
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_gc_before_new_view will be called on primaryName changing. On starting view_change we set primaryName to None and _clear_all_3pc_msgs_after_vc does not needed before view_change
@@ -62,6 +62,14 @@ def __init__(self, name: str, validators: List[str], inst_id: int, is_master: bo | |||
self.primaries_batch_needed = False | |||
self.requestQueues = {} | |||
|
|||
@property | |||
def last_ordered_3pc(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to be a property?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's for debug purposes and would be removed.
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
self.is_master or self.last_ordered_3pc[1] != 0): | ||
seq_frm = last_pp_seq_no + 1 if pp_view_no == last_pp_view_no else 1 | ||
_, last_pp_seq_no = self.__last_pp_3pc | ||
# if pp_view_no >= last_pp_view_no and ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we check somewhere that pp_seq_no > last_pp_seq_no
, that is this is not a PrePrepare with a ppSeqNo we've already received?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already did this check in _is_next_preprepare call.
seq_to = pp_seq_no - 1 | ||
if seq_to >= seq_frm >= pp_seq_no - self._config.CHK_FREQ + 1: | ||
self._logger.warning( | ||
"{} missing PRE-PREPAREs from {} to {}, " | ||
"going to request".format(self, seq_frm, seq_to)) | ||
"{} missing PRE-PREPAREs for ppSeqNo {}, " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we remove seq_to
from the log message?
assert slow_replica.h == 0 | ||
assert slow_replica.H == LOG_SIZE | ||
assert slow_replica.h == low_watermark | ||
# assert slow_replica.H == sys.maxsize |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did we remove slow_replica.H
check? Should it be assert slow_replica.H == low_watermark + LOG_SIZE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
# Ensure that there are some quorumed stashed checkpoints | ||
assert slow_replica._checkpointer._stashed_checkpoints_with_quorum() | ||
|
||
# Ensure that now there are 3PC-messages stashed | ||
# as laying outside of the watermarks | ||
assert slow_replica.stasher.stash_size(STASH_WATERMARKS) == incoming_3pc_msgs_count(len(txnPoolNodeSet)) | ||
# assert slow_replica.stasher.stash_size(STASH_WATERMARKS) == incoming_3pc_msgs_count(len(txnPoolNodeSet)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was it removed?
timeout=tconf.Max3PCBatchWait) | ||
|
||
seq_no = 7 if view_no == 0 else 8 | ||
seq_no = num_batches if view_no == 0 else 2 * num_batches + 1 + tconf.Max3PCBatchWait |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this formula correct? Max3PCBatchWait
is seconds between batches, why do we use to calculate the seq_no?
seq_no = 8 if view_no == 0 else 9 | ||
# + 1 because of catchup | ||
# + 2 because of 2 catchup rounds | ||
seq_no = num_batches + 1 if view_no == 0 else 2 * num_batches + tconf.Max3PCBatchWait + 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this formula correct?
@@ -18,7 +18,7 @@ def reset(): | |||
return tconf | |||
|
|||
|
|||
def test_pp_seq_no_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper, | |||
def test_pp_seq_not_no_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test_pp_seq_no_not_starts_from_0_in_new_view
…y-1336 Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
…nto public/indy-1336
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
…y-plenum into public/indy-1336
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…y-plenum into public/indy-1336
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…y-plenum into public/indy-1336 Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
… into public/indy-1336
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
… into public/indy-1336
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…y-plenum into public/indy-1336
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
@@ -670,6 +671,7 @@ def last_ordered_3pc(self): | |||
@last_ordered_3pc.setter | |||
def last_ordered_3pc(self, lo_tuple): | |||
self._data.last_ordered_3pc = lo_tuple | |||
self.lastPrePrepareSeqNo = lo_tuple[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a check if lastPrePrepareSeqNo > last_ordered_3pc
here to have more clear logic (not relying on the setter) and avoid a lot of debug messages in logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -2233,6 +2231,12 @@ def get_sent_commit(self, viewNo, ppSeqNo): | |||
def replica_batch_digest(self, reqs): | |||
return replica_batch_digest(reqs) | |||
|
|||
def _clear_all_3pc_msgs_after_vc(self): | |||
self.sentPrePrepares.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to clear other messages? See, for example, process_view_change_started
message (which is called for the new view change protocol)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@@ -287,12 +287,11 @@ def last_ordered_3pc(self) -> tuple: | |||
|
|||
@last_ordered_3pc.setter | |||
def last_ordered_3pc(self, key3PC): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW why do we need last_ordered_3pc
setter in Replica? Who calls it, and can it call Ordering service instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now we have a lot of tests, which set last_ordered_3pc for replica directly. I can change it later.
plenum/server/replica.py
Outdated
|
||
@property | ||
def lastPrePrepareSeqNo(self): | ||
return self._ordering_service._lastPrePrepareSeqNo | ||
return self._ordering_service.lastPrePrepareSeqNo | ||
|
||
@lastPrePrepareSeqNo.setter | ||
def lastPrePrepareSeqNo(self, n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need lastPrePrepareSeqNo
setter in Replica? Who calls it, and can it call Ordering service instead?
@@ -35,12 +38,16 @@ def test_backup_replica_resumes_ordering_on_lag_in_checkpoints( | |||
|
|||
slow_replica, other_replicas = one_replica_and_others_in_backup_instance | |||
view_no = slow_replica.viewNo | |||
slow_replica._checkpointer._received_checkpoints.clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we clear _received_checkpoints
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's need for the second run of this test, i will move it into the end of test.
@@ -18,26 +18,31 @@ def reset(): | |||
return tconf | |||
|
|||
|
|||
def test_pp_seq_no_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper, | |||
sdk_pool_handle, sdk_wallet_client): | |||
def test_pp_seq_no_not_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the test name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
|
||
# Check `slow_nodes` and `fast_nodes` set different last_prepared | ||
assert last_prepared_fast != last_prepared_slow | ||
for node in txnPoolNodeSet: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this loop if we use with delay_rules
context manager?
@@ -102,22 +102,31 @@ def test_ordered_request_freed_on_replica_removal(looper, | |||
chkFreqPatched, | |||
view_change): | |||
node = txnPoolNodeSet[0] | |||
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client, 1) | |||
if node.master_replica._consensus_data.checkpoints: | |||
num_requests_to_stable_checkpoint = node.master_replica._consensus_data.stable_checkpoint - \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since consensus_data.checkpoints
contains checkpoints sent by us, all Checkpoint's seqNoEnd
there are greater or equal to consensus_data.stable_checkpoint
. So, what do we need this code, and why don't it return a value <= 0?
node.master_replica._consensus_data.checkpoints[-1].seqNoEnd | ||
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client, | ||
num_requests_to_stable_checkpoint) | ||
assert not node.master_replica._consensus_data.checkpoints |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should always have at least 1 Checkpoint (at least after #1317)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but at the moment in this case the list is empty, and we can't use data from it.
|
||
# Send one more request to stabilize checkpoint | ||
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client, 1) | ||
|
||
looper.run(eventually(check_for_nodes, txnPoolNodeSet, check_stable_checkpoint, 3)) | ||
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a duplicate line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We send 2 batches. But it will be better to use the sdk_send_batches_of_random_and_check()
.
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
@@ -1259,7 +1259,7 @@ def lastPrePrepareSeqNo(self, n): | |||
values else it will not. To forcefully override as in case of `revert`, | |||
directly set `self._lastPrePrepareSeqNo` | |||
""" | |||
if n > self._lastPrePrepareSeqNo: | |||
if n > self._lastPrePrepareSeqNo or not self.is_master: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about a case of normal ordering on non-master where last_ordered is less than lastPrePrepareSeqNo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i forgot this... sorry
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
…ered setting Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
test this please |
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
test this please |
Signed-off-by: Andrew Nikitin andrew.nikitin@dsr-corporation.com