Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INDY-1759: Restoration of lastPrePrepareSeqNo #971

Merged
merged 18 commits into from
Nov 20, 2018

Conversation

spivachuk
Copy link
Contributor

@spivachuk spivachuk commented Nov 8, 2018

  • Implemented restoration of lastPrePrepareSeqNo on a backup primary after restart.
  • Made some renaming in the code related to view change to make it clearer.
  • Corrected sdk_send_batches_of_random and sdk_send_batches_of_random_and_check test helper functions.
  • Wrote tests for lastPrePrepareSeqNo restoration feature.
  • Corrected existing tests that restarted node instances which had been stopped previously. Now a new node instance is created for restarting a node.
  • Made advancing last_ordered_3pc on a backup replica up to master's last_ordered_3pc on completion of a regular view change (not propagate primary) conditional: now it is made only if the backup replica is not primary.
  • Updated tests in test_no_propagate_request_on_different_last_ordered_before_vc module according to the change in the logic of advancing last_ordered_3pc on backup replicas on a regular view change.
  • Fixed intermittent failures in tests in test_no_propagate_request_on_different_last_ordered_before_vc module.

- Implemented restoration of lastPrePrepareSeqNo on a backup primary after restart.
- Made some renaming in the code related to view change to make it clearer.
- Corrected sdk_send_batches_of_random and sdk_send_batches_of_random_and_check test helper functions.
- Wrote a test for a positive case of lastPrePrepareSeqNo restoration on a backup primary after restart.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
…nto backup-primary-restoration

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>

# Conflicts:
#	plenum/common/messages/node_messages.py
#	plenum/server/view_change/view_changer.py
@spivachuk spivachuk changed the title INDY-1759: Restoration of lastPrePrepareSeqNo [WiP] INDY-1759: Restoration of lastPrePrepareSeqNo Nov 8, 2018
ArtObr and others added 8 commits November 9, 2018 10:33
Signed-off-by: ArtObr <artemobruchnikov@gmail.com>
Signed-off-by: ArtObr <artemobruchnikov@gmail.com>
Signed-off-by: ArtObr <artemobruchnikov@gmail.com>
- Corrected tests that restarted node instances which had been stopped previously. Now a new node instance is created for restarting a node.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
Signed-off-by: ArtObr <artemobruchnikov@gmail.com>
- Added more tests for lastPrePrepareSeqNo restoration feature.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
Signed-off-by: ArtObr <artemobruchnikov@gmail.com>
@@ -962,14 +962,17 @@ def sdk_send_batches_of_random_and_check(looper, txnPoolNodeSet, sdk_pool, sdk_w
if num_batches == 1:
return sdk_send_random_and_check(looper, txnPoolNodeSet, sdk_pool, sdk_wallet, num_reqs, **kwargs)

reqs_in_batch = num_reqs // num_batches
reqs_in_last_batch = reqs_in_batch + num_reqs % num_batches
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect, it should be num_reqs % num_batches?

@@ -674,6 +683,69 @@ def on_view_change_complete(self):
replica.clear_requests_and_fix_last_ordered()
self.monitor.reset()

def store_last_sent_pre_prepare_seq_no(self, inst_id, pp_seq_no):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Please move all logic related to last_sent_pre_prepare_seq_no to a separate helper class
  2. Cover it by unit tests

@@ -745,6 +745,7 @@ def send3PCBatch(self):
if ppReq is None:
continue
self.sendPrePrepare(ppReq)
self.node.store_last_sent_pre_prepare_seq_no(self.instId, ppReq.ppSeqNo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not store it for master instance.

logger.info("{} restoring lastPrePrepareSeqNo "
"from stored lastSentPrePrepare value {}"
.format(self, serialized_value))
primary_replica.lastPrePrepareSeqNo = pp_seq_no
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about watermarks?

if self.view_changer.previous_view_no == 0:
backup_primary_pp_seq_no_restored = \
self._try_restore_last_sent_pre_prepare_seq_no()
if not backup_primary_pp_seq_no_restored:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this flag? Why can't we always erase it?

looper, txnPoolNodeSet, sdk_pool_handle, sdk_wallet_client,
tconf, tdir, allPluginsPath):

for _ in range(6):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need 6 view changes here? This will be very long test.
I think the check for viewNo can be done on unit test level.

backup_inst_id = 1


def test_backup_primary_does_not_restore_pp_seq_no_if_view_is_not_same(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test looks too complex to just check that backup primary does not restore pp_seq_no if viewNo is not the same.
Why do we need to restart all nodes here?

backup_inst_id = 1


def test_backup_primary_restores_pp_seq_no_if_view_is_same(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also be done as a unit test.

nodeCount = 7


def test_backup_replica_does_not_restore_pp_seq_no_if_not_primary_anymore(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be done as a unit test.

@ashcherbakov
Copy link
Contributor

ashcherbakov commented Nov 13, 2018

I think we need to have the following tests:

  • Unit tests for LastPrePrepareSeqNoRestoreHelper:
    -- cover can_restore method:
    --- check that can not restore if not primary
    --- check that can not retore if viewNo is not equal
    --- other checks (cover every if-else with a test)
    -- cover restore method:
    --- check that replica and node values are restored properly

  • Integration tests:

  1. Restart Backup Primary [parametrize for viewNo=0 and 1]:
  • send reqs
  • restart backup primary
  • make sure that it restored prepare_seq_no, ordered_seq_no and watermarks
  • make sure backup can continue ordering
  1. Restart all nodes [parametrize for viewNo=0 and 1]:
  • send reqs
  • restart all nodes
  • make sure that nothing is restored
  • make sure backup can continue ordering
  1. Restart backup primary with high watermark [parametrize for viewNo=0 and 1]:
  • send a lot of reqs (more than stable checkpoint, so that watermarks are changed)
  • restart backup primary
  • make sure that it restored prepare_seq_no, ordered_seq_no and watermarks
  • make sure backup can continue ordering
  1. Negative case [parametrize for viewNo=0 and 1]:
  • send a lot of reqs (more than stable checkpoint, so that watermarks are changed)
  • stop backup primary on Inst2
  • do view change
  • start the node
  • make sure that nothing is restored
  • make sure backup can continue ordering

ArtObr and others added 3 commits November 13, 2018 11:55
Signed-off-by: ArtObr <artemobruchnikov@gmail.com>
- Extracted last_sent_pp_seq_no restoration feature to a separate class.
- Added shifting watermarks correspondingly to restored last_sent_pp_seq_no.
- Removed storing last_sent_pp_seq_no for master replica.
- Added unit tests for last_sent_pp_seq_no restoration class.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
def check_last_warning(expected_msg):
global warning_msg_count
warning_msg_count += 1
assert warning_msg_count == len(container)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather prefer error codes (instead of just true/false) instead of analyzing logs

- Added integration tests for last_sent_pp_seq_no restoration feature.
- Added a unit test for LastSentPpStoreHelper._restore_last_sent_pp_seq_no method.
- Rolled back changes in FakeNode class that had caused regression in existing tests.
- Removed OutputWarningHandler test helper class.
- Removed tests of last_sent_pp_seq_no restoration feature which had been replaced with new ones.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
@spivachuk spivachuk changed the title [WiP] INDY-1759: Restoration of lastPrePrepareSeqNo INDY-1759: Restoration of lastPrePrepareSeqNo Nov 18, 2018
- Removed tests of last_sent_pp_seq_no restoration feature which had been replaced with new ones.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
backup_inst_id = 1


def test_node_erases_stored_last_sent_pp_key_on_pool_restart(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tests stops a tested node, then restarts other nodes in the pool, then start the tested node.
It would be great to write another test, where all nodes (including the tested one) are restarted at the same time

- Added an integration test for erasing last_sent_pp_seq_no in case of simultaneous restart of all the nodes in the pool.
- Made advancing last_ordered_3pc on a backup replica up to master's last_ordered_3pc on completion of a regular view change (not propagate primary) conditional: now it is made only if the backup replica is not primary.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
…nto backup-primary-restoration

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>

# Conflicts:
#	plenum/server/node.py
- Updated tests in test_no_propagate_request_on_different_last_ordered_before_vc module according to the change in the logic of advancing last_ordered_3pc on backup replicas on a regular view change.
- Fixed intermittent failures in tests in test_no_propagate_request_on_different_last_ordered_before_vc module.

Signed-off-by: Nikita Spivachuk <nikita.spivachuk@dsr-company.com>
@ashcherbakov
Copy link
Contributor

(ci) test this please

@ashcherbakov ashcherbakov merged commit 9dbf18e into hyperledger:master Nov 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants