INDY-1683: bugfix backup_instance_faulty_processor with quorum logic #954
Conversation
Changes:
- add tests
- bugfix backup_instance_faulty_processor with quorum logic

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
from plenum.test.test_node import ensureElectionsDone, checkNodesConnected

@pytest.fixture(scope="module", params=[{"instance_degraded": "local", "instance_primary_disconnected": "local"},
I think we can write unit tests for `BackupInstanceFaultyProcessor`. Please have a look at the `fake_node` fixture we can use for it.
I can imagine the following unit tests:
- test that 1 call of `on_backup_degradation` leads to replica removal with local strategy
- test that 1 call of `on_backup_degradation` does not lead to replica removal with quorum strategy
- test that 1 call of `on_backup_primary_disconnected` leads to replica removal with local strategy
- test that 1 call of `on_backup_primary_disconnected` does not lead to replica removal with quorum strategy
- test that `process_backup_instance_faulty_msg` does not lead to removal in case of local strategy
- test that f calls of `process_backup_instance_faulty_msg` with the same messages do not lead to removal in case of quorum strategy
- test that f+1 calls of `process_backup_instance_faulty_msg` with the same messages lead to removal in case of quorum strategy
- test that a BackupInstanceFaulty from our own node is not required to remove a replica (just a quorum of any nodes)
- test that only BackupInstanceFaulty messages with the same viewNo as the current one lead to removal
- test that n BackupInstanceFaulty messages with different instances, where each instance accumulates only f votes in total, do not lead to removal (in case of quorum)
- test that n BackupInstanceFaulty messages with different instances, where one instance accumulates f+1 votes and the others only f, lead to removal of that one instance only
- test that n BackupInstanceFaulty messages with different instances, where all instances accumulate f+1 votes in total, lead to removal of all instances
- test that `restore_replicas` restores all replicas
- test BackupInstanceFaulty with empty `instances` values
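To illustrate the quorum-strategy cases above, here is a rough, self-contained sketch of the "f calls do not remove, f+1 calls remove" behaviour. `QuorumSketch` is a stand-in model, not the real `BackupInstanceFaultyProcessor` API:

```python
# Minimal model of the f+1 quorum check on BackupInstanceFaulty votes.
# All class and method names here are hypothetical stand-ins.

class QuorumSketch:
    def __init__(self, n):
        self.f = (n - 1) // 3          # max faulty nodes, assuming n = 3f + 1
        self.votes = {}                # inst_id -> set of sender names
        self.removed = set()

    def process_msg(self, inst_id, frm):
        self.votes.setdefault(inst_id, set()).add(frm)
        if len(self.votes[inst_id]) >= self.f + 1:   # quorum reached
            self.removed.add(inst_id)

proc = QuorumSketch(n=7)               # 7 nodes -> f = 2
for frm in ("Node1", "Node2"):         # only f messages: no removal yet
    proc.process_msg(1, frm)
assert 1 not in proc.removed
proc.process_msg(1, "Node3")           # f + 1 messages: removal triggered
assert 1 in proc.removed
```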
- Added test_on_backup_degradation_for_one with local strategy
- Added test_on_backup_degradation_for_one with quorum strategy
- Added test_on_backup_primary_disconnected_for_one with local strategy
- Added test_on_backup_primary_disconnected_for_one with quorum strategy
- Was in method __process_backup_instance_faulty_msg_work_with_different_msgs()
- Added a check in __process_backup_instance_faulty_msg_work_with_different_msgs(), but I'm not sure that this check is correct. Should we sleep for some time, and if so, for how long?
- In test_process_backup_instance_faulty_msg() we send n-1 messages, which is more than f. Is that enough?
- Changed sending in __process_backup_instance_faulty_msg_work_with_different_msgs()
- Added test_process_backup_instance_with_incorrect_view_no
- Why shouldn't it be removed?
- Was in test_restore_replicas
- Added test_process_backup_instance_empty_msg
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…nto task-1682-remove-replicas-with-ic
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…nto task-1682-remove-replicas-with-ic
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Changes:
- change logic to remove replica with quorum of own messages
- add tests to test_instance_faulty_processor.py

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Changes:
- refactoring test_instance_faulty_processor.py
- update test_replica_removing_after_node_started.py
- change REPLICAS_REMOVING_WITH_DEGRADATION to quorum

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…nto task-1682-remove-replicas-with-ic
if inst_id not in self.node.replicas.keys():
    continue
self.backup_instances_faulty.setdefault(inst_id, dict()).setdefault(frm, 0)
self.backup_instances_faulty[inst_id].setdefault(self.node.name, 0)
self.backup_instances_faulty[inst_id][frm] += 1
if not self.node.quorums.backup_instance_faulty.is_reached(
Please add a comment explaining this condition
I will add. This code adds a default value for the message sender and for our own node's messages. But maybe you see a cleaner way to do it.
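The `setdefault` bookkeeping under discussion can be seen in isolation; this is a simplified model of the vote accounting, not the actual processor code:

```python
# Simplified model: before incrementing the sender's counter, make sure
# counters exist for both the sender and this node itself, so a later
# quorum check over the dict's keys always sees our own node.
backup_instances_faulty = {}
own_name = "Alpha"   # hypothetical name of this node

def record_vote(inst_id, frm):
    counters = backup_instances_faulty.setdefault(inst_id, {})
    counters.setdefault(frm, 0)        # default for the message sender
    counters.setdefault(own_name, 0)   # default for our own node
    counters[frm] += 1

record_vote(1, "Beta")
assert backup_instances_faulty == {1: {"Beta": 1, "Alpha": 0}}
```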
""" | ||
instance_to_remove = 1 | ||
view_no = txnPoolNodeSet[0].viewNo | ||
Node will change view even though it does not find the master to be degraded |
Are we talking about a view change or removal of replicas here? Is the comment correct?
if not self.node.quorums.backup_instance_faulty.is_reached(
        len(self.backup_instances_faulty[inst_id])):
        len(self.backup_instances_faulty[inst_id].keys())) \
        and not self.node.quorums.backup_instance_faulty.is_reached(
- Should we use a different quorum here?
- Should we take into account only degradations from this node that occur in a row? That is, if we face degradation once per hour, it should not be accumulated, and the replica should not be removed, right?
As for the second item: should we send a BackupInstanceNotFaulty message when no degradation is observed during the next check in the monitor? BackupInstanceNotFaulty could clear all `backup_instance_faulty` entries for this node.
Unit tests need to be added for this.
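Since BackupInstanceNotFaulty is only being proposed here, a rough sketch of how such a clearing message might be handled (the handler name and data layout are hypothetical):

```python
# Hypothetical handler: a BackupInstanceNotFaulty message from a node clears
# that node's accumulated degradation votes across all instances, so stale
# votes observed hours apart do not pile up toward a quorum.
def process_backup_instance_not_faulty(backup_instances_faulty, frm):
    for counters in backup_instances_faulty.values():
        counters.pop(frm, None)   # drop this sender's votes everywhere

votes = {1: {"Alpha": 2, "Beta": 1}, 2: {"Alpha": 1}}
process_backup_instance_not_faulty(votes, "Alpha")
assert votes == {1: {"Beta": 1}, 2: {}}
```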
backup_instance_faulty_processor.on_backup_degradation(degraded_backups)

assert not (set(node.replicas.remove_replica_calls) - set(degraded_backups))
Why can't we just compare two sets here? `assert set(node.replicas.remove_replica_calls) == set(degraded_backups)`
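For reference, the two assertions are not equivalent; the subset-style check passes even when some degraded backup was never removed:

```python
# Toy values to show the difference between the two assertions.
removed = {1, 2}          # instances for which remove_replica was called
degraded = {1, 2, 3}      # instances reported as degraded

# The original assertion only checks "removed is a subset of degraded":
assert not (removed - degraded)    # passes, even though 3 was not removed

# Full set equality would additionally fail when a degraded backup
# was NOT removed:
assert removed != degraded         # equality check would raise here
```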
backup_instance_faulty_processor.restore_replicas()
# check that all replicas were restored and backup_instances_faulty has been cleaned
assert not backup_instance_faulty_processor.backup_instances_faulty
But was it non-empty before we called `restore_replicas`?
assert nodes.issubset(backup_instance_faulty_processor.backup_instances_faulty[instance_to_remove])
assert not node.replicas.remove_replica_calls

# check that messages from all nodes lead to replica removing
The comment says that we should send just 1 more message (sufficient for quorum)
node = FakeNode(tdir, config=tconf)
node.view_change_in_progress = False
node.requiredNumberOfInstances = len(node.replicas)
node.allNodeNames = ["Node{}".format(i)
Should it be `"Node{}".format(i+1)`?
We have a `range(1, (node.requiredNumberOfInstances - 1) * 3 + 2)`. But I will change it to `range((node.requiredNumberOfInstances - 1) * 3 + 1)` to make the code more understandable.
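The two forms generate the same node names only if the format index is shifted consistently; a quick check with an assumed value of 4 required instances:

```python
# Assuming requiredNumberOfInstances = 4, so (4 - 1) * 3 + 1 = 10 nodes.
required = 4
n = (required - 1) * 3 + 1

# Original: 1-based range with format(i); proposed: 0-based range with
# format(i + 1). Both yield Node1 .. Node10.
names_a = ["Node{}".format(i) for i in range(1, n + 1)]
names_b = ["Node{}".format(i + 1) for i in range(n)]
assert names_a == names_b == ["Node{}".format(k) for k in range(1, 11)]
```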