Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[INDY-1336] Stop dropping ppSeqNo to 0 after view_change #1301

Merged
merged 41 commits into from
Sep 6, 2019

Conversation

lampkin-diet
Copy link

Signed-off-by: Andrew Nikitin andrew.nikitin@dsr-corporation.com

Andrew Nikitin added 2 commits August 19, 2019 11:36
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
if last_pp_view_no < view_no:
if view_no != self.view_no:
return False
last_pp_seq_no = 0
if pp_seq_no - last_pp_seq_no > 1:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a check return pp_seq_no - last_pp_seq_no == 1 would be more clear and safe

Andrew Nikitin added 2 commits August 20, 2019 11:02
…es and prePrepares

Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
elif not self.isPrimary:
self._checkpointer.set_watermarks(low_watermark=0,
high_watermark=sys.maxsize)
# elif not self.isPrimary:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect we need to leave this code, since backup instances can not correctly restore last ordered after catchup and have to relax the watermarks to restore last ordered from upcoming 3PC messages.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

def clear_requests_and_fix_last_ordered(self):
self._clear_all_3pc_msgs_after_vc()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this new code if already call gc in _gc_before_new_view?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_gc_before_new_view will be called on primaryName changing. On starting view_change we set primaryName to None and _clear_all_3pc_msgs_after_vc does not needed before view_change

@@ -62,6 +62,14 @@ def __init__(self, name: str, validators: List[str], inst_id: int, is_master: bo
self.primaries_batch_needed = False
self.requestQueues = {}

@property
def last_ordered_3pc(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need to be a property?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's for debug purposes and would be removed.

Andrew Nikitin added 3 commits August 21, 2019 19:08
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
self.is_master or self.last_ordered_3pc[1] != 0):
seq_frm = last_pp_seq_no + 1 if pp_view_no == last_pp_view_no else 1
_, last_pp_seq_no = self.__last_pp_3pc
# if pp_view_no >= last_pp_view_no and (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we check somewhere that pp_seq_no > last_pp_seq_no , that is this is not a PrePrepare with a ppSeqNo we've already received?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already did this check in _is_next_preprepare call.

seq_to = pp_seq_no - 1
if seq_to >= seq_frm >= pp_seq_no - self._config.CHK_FREQ + 1:
self._logger.warning(
"{} missing PRE-PREPAREs from {} to {}, "
"going to request".format(self, seq_frm, seq_to))
"{} missing PRE-PREPAREs for ppSeqNo {}, "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove seq_to from the log message?

assert slow_replica.h == 0
assert slow_replica.H == LOG_SIZE
assert slow_replica.h == low_watermark
# assert slow_replica.H == sys.maxsize
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove slow_replica.H check? Should it be assert slow_replica.H == low_watermark + LOG_SIZE?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


# Ensure that there are some quorumed stashed checkpoints
assert slow_replica._checkpointer._stashed_checkpoints_with_quorum()

# Ensure that now there are 3PC-messages stashed
# as laying outside of the watermarks
assert slow_replica.stasher.stash_size(STASH_WATERMARKS) == incoming_3pc_msgs_count(len(txnPoolNodeSet))
# assert slow_replica.stasher.stash_size(STASH_WATERMARKS) == incoming_3pc_msgs_count(len(txnPoolNodeSet))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was it removed?

timeout=tconf.Max3PCBatchWait)

seq_no = 7 if view_no == 0 else 8
seq_no = num_batches if view_no == 0 else 2 * num_batches + 1 + tconf.Max3PCBatchWait
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this formula correct? Max3PCBatchWait is seconds between batches, why do we use to calculate the seq_no?

seq_no = 8 if view_no == 0 else 9
# + 1 because of catchup
# + 2 because of 2 catchup rounds
seq_no = num_batches + 1 if view_no == 0 else 2 * num_batches + tconf.Max3PCBatchWait + 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this formula correct?

@@ -18,7 +18,7 @@ def reset():
return tconf


def test_pp_seq_no_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper,
def test_pp_seq_not_no_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_pp_seq_no_not_starts_from_0_in_new_view

Andrew Nikitin and others added 19 commits August 28, 2019 19:06
…y-1336

Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…y-plenum into public/indy-1336

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Andrew Nikitin and others added 4 commits September 3, 2019 13:32
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
@lampkin-diet lampkin-diet changed the title [WIP][INDY-1336] Stop dropping ppSeqNo to 0 after view_change [INDY-1336] Stop dropping ppSeqNo to 0 after view_change Sep 3, 2019
@@ -670,6 +671,7 @@ def last_ordered_3pc(self):
@last_ordered_3pc.setter
def last_ordered_3pc(self, lo_tuple):
self._data.last_ordered_3pc = lo_tuple
self.lastPrePrepareSeqNo = lo_tuple[1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a check if lastPrePrepareSeqNo > last_ordered_3pc here to have more clear logic (not relying on the setter) and avoid a lot of debug messages in logs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -2233,6 +2231,12 @@ def get_sent_commit(self, viewNo, ppSeqNo):
def replica_batch_digest(self, reqs):
return replica_batch_digest(reqs)

def _clear_all_3pc_msgs_after_vc(self):
self.sentPrePrepares.clear()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to clear other messages? See, for example, process_view_change_started message (which is called for the new view change protocol)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@@ -287,12 +287,11 @@ def last_ordered_3pc(self) -> tuple:

@last_ordered_3pc.setter
def last_ordered_3pc(self, key3PC):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why do we need last_ordered_3pc setter in Replica? Who calls it, and can it call Ordering service instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we have a lot of tests, which set last_ordered_3pc for replica directly. I can change it later.


@property
def lastPrePrepareSeqNo(self):
return self._ordering_service._lastPrePrepareSeqNo
return self._ordering_service.lastPrePrepareSeqNo

@lastPrePrepareSeqNo.setter
def lastPrePrepareSeqNo(self, n):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need lastPrePrepareSeqNo setter in Replica? Who calls it, and can it call Ordering service instead?

@@ -35,12 +38,16 @@ def test_backup_replica_resumes_ordering_on_lag_in_checkpoints(

slow_replica, other_replicas = one_replica_and_others_in_backup_instance
view_no = slow_replica.viewNo
slow_replica._checkpointer._received_checkpoints.clear()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we clear _received_checkpoints here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's need for the second run of this test, i will move it into the end of test.

@@ -18,26 +18,31 @@ def reset():
return tconf


def test_pp_seq_no_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper,
sdk_pool_handle, sdk_wallet_client):
def test_pp_seq_no_not_starts_from_0_in_new_view(tconf, txnPoolNodeSet, looper,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the test name

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok


# Check `slow_nodes` and `fast_nodes` set different last_prepared
assert last_prepared_fast != last_prepared_slow
for node in txnPoolNodeSet:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this loop if we use with delay_rules context manager?

@@ -102,22 +102,31 @@ def test_ordered_request_freed_on_replica_removal(looper,
chkFreqPatched,
view_change):
node = txnPoolNodeSet[0]
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client, 1)
if node.master_replica._consensus_data.checkpoints:
num_requests_to_stable_checkpoint = node.master_replica._consensus_data.stable_checkpoint - \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since consensus_data.checkpoints contains checkpoints sent by us, all Checkpoint's seqNoEnd there are greater or equal to consensus_data.stable_checkpoint. So, what do we need this code, and why don't it return a value <= 0?

node.master_replica._consensus_data.checkpoints[-1].seqNoEnd
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client,
num_requests_to_stable_checkpoint)
assert not node.master_replica._consensus_data.checkpoints
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always have at least 1 Checkpoint (at least after #1317)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but at the moment in this case the list is empty, and we can't use data from it.


# Send one more request to stabilize checkpoint
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client, 1)

looper.run(eventually(check_for_nodes, txnPoolNodeSet, check_stable_checkpoint, 3))
sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client, 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a duplicate line?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We send 2 batches. But it will be better to use the sdk_send_batches_of_random_and_check().

Andrew Nikitin added 3 commits September 3, 2019 20:02
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
@@ -1259,7 +1259,7 @@ def lastPrePrepareSeqNo(self, n):
values else it will not. To forcefully override as in case of `revert`,
directly set `self._lastPrePrepareSeqNo`
"""
if n > self._lastPrePrepareSeqNo:
if n > self._lastPrePrepareSeqNo or not self.is_master:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a case of normal ordering on non-master where last_ordered is less than lastPrePrepareSeqNo?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i forgot this... sorry

Andrew Nikitin added 2 commits September 5, 2019 13:00
Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
…ered setting

Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
@lampkin-diet
Copy link
Author

test this please

Signed-off-by: Andrew Nikitin <andrew.nikitin@dsr-corporation.com>
@lampkin-diet
Copy link
Author

test this please

@lampkin-diet lampkin-diet merged commit c781699 into hyperledger:master Sep 6, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants