Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INDY-1404: Do not do view change until catchup is finished #757

Merged
merged 27 commits into from
Jul 2, 2018

Conversation

ashcherbakov
Copy link
Contributor

  • do not process any View Change related messages (including ViewChangeDone from Current State for primary propagation) until the current catchup is finished;
  • this fixes initialization of a node where we mixed together initial catchup and primary propagation (initial view change);
  • also fixed recovering when primary is disconnected and (n-f)th node joins the pool

Signed-off-by: ashcherbakov alexander.sherbakov@dsr-company.com

- do not process any View Change related messages (including ViewChangeDone from Current State for primary propagation) until the current catchup is finished;
- this fixes initialization of a node where we mixed together initial catchup and primary propagation (initial view change);
- also fixed recovering when primary is disconnected and (n-f)th node joins the pool

Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
@ashcherbakov ashcherbakov changed the title Do not do view change until catchup is finished [WIP] INDY-1404: Do not do view change until catchup is finished Jun 19, 2018
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
@andkononykhin
Copy link
Contributor

test this please

@andkononykhin
Copy link
Contributor

test this please

ashcherbakov and others added 5 commits June 21, 2018 11:54
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ArtObr <artemobruchnikov@gmail.com>
@ArtObr
Copy link
Contributor

ArtObr commented Jun 22, 2018

test this please

…rledger/indy-plenum into catchup-cancellation
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
…rledger/indy-plenum into catchup-cancellation
@ashcherbakov ashcherbakov changed the title [WIP] INDY-1404: Do not do view change until catchup is finished INDY-1404: Do not do view change until catchup is finished Jun 28, 2018
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
@@ -404,3 +404,9 @@ def _validate_message(self, dct):
self._raise_invalid_fields(
f.MSG.nm, msg,
"The message type must be {} ".format(expected_type_cls.typename))


class FutureViewChangeDone:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if purpose of this message was documented, as with all other messages in this file

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1905,6 +1909,9 @@ def is_catchup_needed(self) -> bool:
Check if all requests ordered till last prepared certificate
Check if last catchup resulted in no txns
"""
if not self.view_change_in_progress:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a bit counterintuitive, it would be nice to have a comment explaining why no catchup is needed if view change is not in progress

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# This is especially critical for Propagate Primary mode (on receiving CURRENT_STATE by a new node).
if self.view_no in self._next_view_indications:
for frm, vcd in self._next_view_indications[self.view_no].items():
self._on_verified_view_change_done_msg(vcd, frm)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably it's better to call process_vchd_msg here, which will call _on_verified_view_change_done_msg only when it's truly verified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will not work. Added a comment with a description.

# remove all for previous views
for view_no in tuple(self._next_view_indications.keys()):
if view_no <= self.view_no:
self._next_view_indications.pop(view_no)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

del self._next_view_indications[view_no] could be more appropriate, since pop returns removed item which we don't actually need

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ensure_view_change(looper, other_nodes)
checkProtocolInstanceSetup(looper=looper, nodes=other_nodes, numInstances=2)
ensure_all_nodes_have_same_data(looper, nodes=other_nodes)
looper.runFor(expectedPoolViewChangeStartedTimeout(len(txnPoolNodeSet)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a probably source of flakiness. Can we come up with some predicate which can indicate that view change won't happen for sure?

old_meths = do_view_change(txnPoolNodeSet)
for node in txnPoolNodeSet:
node.view_changer.sendInstanceChange(old_view_no + 1)
looper.runFor(expectedPoolViewChangeStartedTimeout(len(txnPoolNodeSet)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a probably source of flakiness. Can we come up with some predicate which can indicate that view change won't happen for sure?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

from plenum.test.waits import expectedPoolViewChangeStartedTimeout


@pytest.fixture(params=['starting', 'discovering', 'discovered', 'syncing'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need strings here and lot's of if/else? Would the following do the job?

@pytest.fixture(params=[Mode.starting, Mode.discovering...])
def mode(request)
    return request.param

from plenum.test.waits import expectedPoolViewChangeStartedTimeout


@pytest.fixture(params=['starting', 'discovering', 'discovered', 'syncing'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need strings here and lot's of if/else? Would the following do the job?

@pytest.fixture(params=[Mode.starting, Mode.discovering...])
def mode(request)
    return request.param

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
@ashcherbakov
Copy link
Contributor Author

test this please

1 similar comment
@ashcherbakov
Copy link
Contributor Author

test this please

Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
@ashcherbakov
Copy link
Contributor Author

test this please

2 similar comments
@ashcherbakov
Copy link
Contributor Author

test this please

@ashcherbakov
Copy link
Contributor Author

test this please

Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
@ashcherbakov
Copy link
Contributor Author

test this please

Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
skhoroshavin
skhoroshavin previously approved these changes Jul 2, 2018
Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>
…hup-cancellation

Signed-off-by: ashcherbakov <alexander.sherbakov@dsr-company.com>

# Conflicts:
#	plenum/test/recorder/test_replay_with_view_change.py
@andkononykhin andkononykhin merged commit deb82df into master Jul 2, 2018
@ashcherbakov ashcherbakov deleted the catchup-cancellation branch July 3, 2018 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants