INDY-1911: Send INSTANCE_CHANGE when state signatures are not fresh enough #1078
Signed-off-by: Sergey Khoroshavin <sergey.khoroshavin@dsr-corporation.com>
@@ -714,7 +713,21 @@ def get_msgs_for_lagged_nodes(self) -> List[ViewChangeDone]:
                   format(self, self.view_no))
        return messages

    def propose_view_change(self, suspicion=Suspicions.PRIMARY_DEGRADED):
        proposed_view_no = self.view_no
        if not self.view_change_in_progress or suspicion == Suspicions.INSTANCE_CHANGE_TIMEOUT:
But we don't call this method for INSTANCE_CHANGE_TIMEOUT, do we? See, for example, on_view_change_not_completed_in_time.
Now we do
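A minimal sketch of the logic under discussion, assuming the method bumps the proposed view number both when no view change is running and when the current one timed out. ViewChangerSketch and the enum values are simplified stand-ins, not the real plenum classes:

```python
from enum import Enum

class Suspicions(Enum):
    # Illustrative values; the real plenum Suspicions carry codes and reasons.
    PRIMARY_DEGRADED = 1
    INSTANCE_CHANGE_TIMEOUT = 2

class ViewChangerSketch:
    """Simplified stand-in for the real view changer."""
    def __init__(self, view_no=0, view_change_in_progress=False):
        self.view_no = view_no
        self.view_change_in_progress = view_change_in_progress

    def propose_view_change(self, suspicion=Suspicions.PRIMARY_DEGRADED):
        proposed_view_no = self.view_no
        # Bump the proposed view when no view change is running, or when the
        # running one timed out -- so nodes can escape a stuck view change.
        if not self.view_change_in_progress or \
                suspicion == Suspicions.INSTANCE_CHANGE_TIMEOUT:
            proposed_view_no += 1
        return proposed_view_no
```

Under this reading, INSTANCE_CHANGE_TIMEOUT is the only suspicion that re-proposes a higher view while a view change is already in progress.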
    sdk_ensure_pool_functional(looper, txnPoolNodeSet, sdk_wallet_client, sdk_pool_handle)


def test_view_change_happens_if_ordering_is_halted(looper, tconf, txnPoolNodeSet,
Do we have a test where ordering is stopped because the Primary doesn't send a PrePrepare in time?
Actually this test emulates the behaviour you described by blocking delivery of PrePrepares to all nodes.
Yes, I understand that. Knowing how it's implemented, it's clear that this is the same case and the distinction doesn't matter.
But from a black-box point of view these can be different cases: in the first case the Primary sends PrePrepares that are not malicious, but they are simply lost; in the second case the nodes are ready to order, but the Primary is malicious and doesn't send PrePrepares, or sends them too infrequently.
    for n in some_nodes:
        n.view_changer.on_master_degradation()
    looper.runFor(0.2)
Why do we need this run? Should we check instead that other nodes received 2 IC messages?
Yes, checking for received IC messages would be cleaner; the current implementation is just a quick hack.
Fixed
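The check for received IC messages could be sketched as a small polling helper. This is a hypothetical sketch: in a real test `get_ic_count` would wrap a spy on the node's InstanceChange handler, which is not shown here:

```python
import time

def wait_for_instance_changes(get_ic_count, expected, timeout=5.0, poll=0.05):
    """Poll until get_ic_count() reports at least `expected` InstanceChange
    messages; raise TimeoutError if that never happens within `timeout`."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        count = get_ic_count()
        if count >= expected:
            return count
        time.sleep(poll)
    raise TimeoutError("saw fewer than %d InstanceChange messages" % expected)
```

Unlike a bare `looper.runFor(0.2)`, this fails loudly when the messages never arrive and returns as soon as they do.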
1. some_nodes (Beta and Gamma) send InstanceChange for all nodes.
2. Restart other_nodes (Gamma and Delta).
3. last_node (Delta) sends InstanceChange for all nodes.
4. Ensure elections are done and the pool is functional.
If I understand correctly, in this test we have the following situation after step 3:
- Alpha (Primary) has 3 IC and started VC
- Beta has 3 IC and started VC
- Gamma has 1 IC
- Delta has 1 IC
Since Primary stopped ordering, freshness checks lead to start of a VC on all nodes.
So it makes sense to name the test something like test_vc_finished_when_less_than_quorum_started_including_primary.
But what if a different combination of nodes restarted, so that Primary doesn't have a quorum of IC?
For example,
- Alpha and Beta send IC
- Beta is restarted (wait until it's restarted before going to the next step)
- Alpha is restarted (wait until it's restarted before going to the next step)
- Gamma is restarted (wait until it's restarted before going to the next step)
- Delta sends IC
=>
- Alpha: 1 IC
- Beta: 1 IC
- Gamma: 1 IC
- Delta: 3 IC => starts VC.
As a result, Delta will be in infinite VC.
I think the freshness check cannot solve the situation above, so persistence of IC will be needed.
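For reference, the quorum arithmetic behind this scenario, assuming plenum's usual view-change quorum of n - f with f = (n - 1) // 3 (a sketch of the math, not code from the PR):

```python
def f_value(n):
    # Max Byzantine faults tolerated by an n-node BFT pool.
    return (n - 1) // 3

def view_change_quorum(n):
    # InstanceChange votes needed before a node actually starts a view change.
    return n - f_value(n)

# The 4-node scenario above: Alpha/Beta/Gamma each hold 1 IC, Delta holds 3.
ic_counts = {"Alpha": 1, "Beta": 1, "Gamma": 1, "Delta": 3}
started = [name for name, c in ic_counts.items()
           if c >= view_change_quorum(len(ic_counts))]
# Only Delta reaches the quorum of 3, so it starts a view change alone
# and can never complete it.
```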
Another test: all nodes start VC, but 1/2 of the nodes restart during the VC.
Sub-tests:
- 1/2 nodes are restarted immediately
- 1/2 nodes are restarted in > than INSTANCE_CHANGE_TIMEOUT
- 1/2 nodes are restarted in > than 3*INSTANCE_CHANGE_TIMEOUT
Nice to have:
Maybe we should implement something like unit/property-based tests for checking that we can recover from any state.
Take ViewChanger instances, put them into different views, and send IC and VC messages to them in different combinations. It's expected that all ViewChanger instances should eventually come to the same view.
We can do it with our new property-based model as well.
But if we could do it with a real View Changer class, we would have more confidence.
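The convergence property described above can be prototyped against a toy model. Here plain random shuffling stands in for a property-based framework like Hypothesis, and ToyViewChanger is a hypothetical simplification, not the real plenum class:

```python
import random

class ToyViewChanger:
    """Toy model: a node collects InstanceChange votes per proposed view and
    moves to that view once it sees a quorum (n - f) of votes for it."""
    def __init__(self, n=4):
        self.n = n
        self.f = (n - 1) // 3
        self.view_no = 0
        self._votes = {}  # proposed_view -> set of voter ids

    def on_instance_change(self, voter, proposed_view):
        if proposed_view <= self.view_no:
            return
        self._votes.setdefault(proposed_view, set()).add(voter)
        if len(self._votes[proposed_view]) >= self.n - self.f:
            self.view_no = proposed_view

def converges(seed, n=4):
    """Property: delivering every node's IC to every node, in any order,
    leaves all nodes in the same view."""
    rng = random.Random(seed)
    nodes = [ToyViewChanger(n) for _ in range(n)]
    msgs = [(voter, 1, node) for voter in range(n) for node in nodes]
    rng.shuffle(msgs)
    for voter, view, node in msgs:
        node.on_instance_change(voter, view)
    return len({nd.view_no for nd in nodes}) == 1
```

A real property-based test would also drop and duplicate messages and start nodes in different views; this toy only demonstrates the shape of the property.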
Actually I tried to implement a test where only a minority of nodes enters view change, however there are some problems with this:
- it is possible to do so on a 7-node pool; however, since the master primary also enters view change and ordering is stopped, this case is no different from the one already implemented
- preventing the primary from collecting an instance change quorum by restarting it will lead to primary disconnection events, and view change will happen anyway
- fun fact: in INDY-1903 the primary was not restarted, but it didn't receive instance change messages during network outages, which is why it didn't enter view change. However, it should have received them due to queueing (and I fail to see how to reproduce this correctly in tests), so probably there was yet another bug?
Implementing property-based tests for the view changer would be extremely helpful; however, the view changer depends a lot on the node and uses the real communication layer, not a simulated one. There are multiple implementation options, but they all have drawbacks:
- just implement a full-blown integration property-based test: this can be done pretty quickly, however such a test would take ages to run, and I have concerns about the reproducibility of its results
- use monkeypatching and mocking extensively to isolate the view changer without touching it too much: this can be done in a moderate amount of time, however I'm afraid such tests would be very fragile, as they would depend a lot on implementation details
- refactor the view changer so that it can be easily used in isolation: this is the cleanest solution, and it can be used as a foundation for similar changes in other parts of the code, but the question is whether we are okay with the amount of work it would take
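A rough sketch of the refactoring option: a view changer core whose only ties to the node are injected callables, so it could run against a simulated network. All names here are hypothetical, not the real plenum API:

```python
class ViewChangerCore:
    """Hypothetical isolated view changer: instead of reaching into the node,
    it receives `send` (broadcast a message) and `schedule` (arm a timer)
    as constructor arguments, so tests can substitute simulated versions."""
    def __init__(self, node_id, n, send, schedule):
        self.node_id = node_id
        self.n = n
        self.f = (n - 1) // 3
        self.view_no = 0
        self.send = send          # send(msg): broadcast to the pool
        self.schedule = schedule  # schedule(delay, callback): timer service
        self._votes = {}          # proposed_view -> set of senders

    def propose(self, proposed_view):
        # Broadcast our own InstanceChange vote.
        self.send(("INSTANCE_CHANGE", self.node_id, proposed_view))

    def on_instance_change(self, frm, proposed_view):
        self._votes.setdefault(proposed_view, set()).add(frm)
        if (len(self._votes[proposed_view]) >= self.n - self.f
                and proposed_view > self.view_no):
            self.view_no = proposed_view
```

With this shape, a property-based test only needs to wire several cores together through an in-memory message queue, no real transport involved.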
2. Restart nodes_to_restart (Beta, Gamma).
3. Wait OUTDATED_INSTANCE_CHANGES_CHECK_INTERVAL sec.
4. nodes_to_restart send InstanceChanges for all nodes.
5. Ensure elections are done.
Please mention that the VC didn't happen since the IC from the panicked node was discarded by timeout.
Done
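The timeout-based discarding of stale InstanceChange votes could look roughly like this. This is a hedged sketch: InstanceChangeCache and its API are hypothetical, and only the OUTDATED_INSTANCE_CHANGES_CHECK_INTERVAL name comes from the test above (its value here is an illustrative assumption):

```python
import time

OUTDATED_INSTANCE_CHANGES_CHECK_INTERVAL = 300  # seconds; assumed value

class InstanceChangeCache:
    """Keep only recent InstanceChange votes, so a vote from a node that
    panicked long ago is discarded instead of feeding a stale quorum."""
    def __init__(self, ttl=OUTDATED_INSTANCE_CHANGES_CHECK_INTERVAL,
                 clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock        # injectable for deterministic tests
        self._votes = {}          # (view_no, sender) -> arrival timestamp

    def add(self, view_no, sender):
        self._votes[(view_no, sender)] = self.clock()

    def votes_for(self, view_no):
        now = self.clock()
        # Drop votes older than ttl before counting the rest.
        self._votes = {k: t for k, t in self._votes.items()
                       if now - t <= self.ttl}
        return [s for (v, s) in self._votes if v == view_no]
```

Injecting the clock makes the discard behaviour testable without real waiting.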
Force-pushed from bdcae11 to dc1a7d9.