INDY-1112: change primaries election procedure for backup instances. #539

sergey-shilov · 2018-02-19T16:04:29Z

Primaries for backup instances are now chosen in a round-robin
manner, always starting from the primary. If the next node is already
a primary for some instance, that node is skipped, so the first
non-primary node is chosen as the primary for the current instance.
This approach avoids electing replicas of the same node as primaries
for different instances.
The election procedure for the primary of the master instance is unchanged.

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>
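
The selection described above can be sketched roughly as follows. This is a minimal illustration, not the plenum code: the helper names `next_backup_primary` and `select_primaries` are hypothetical, and it assumes that "starting from primary" means the walk begins at the node right after the master primary in rank order.

```python
from typing import List, Set


def next_backup_primary(node_names: List[str],
                        master_primary_rank: int,
                        primaries: Set[str]) -> str:
    """Return the first node after the master primary that is not yet a primary.

    Hypothetical helper for illustration only.
    """
    total = len(node_names)
    # Walk the ring once, starting right after the master primary (assumption).
    for step in range(1, total + 1):
        candidate = node_names[(master_primary_rank + step) % total]
        if candidate not in primaries:
            return candidate
    raise ValueError("every node is already a primary")


def select_primaries(node_names: List[str],
                     master_primary_rank: int,
                     instance_count: int) -> List[str]:
    """Pick one primary per instance; instance 0 is the master."""
    selected = [node_names[master_primary_rank]]
    primaries = {selected[0]}
    for _ in range(1, instance_count):
        name = next_backup_primary(node_names, master_primary_rank, primaries)
        primaries.add(name)
        selected.append(name)
    return selected
```

Because each chosen name is added to `primaries` before the next instance is handled, no node ends up as primary for two instances as long as there are at least as many nodes as instances.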

ghost · 2018-02-19T16:04:31Z

Could one of the committers please verify this patch?

lampkin-diet · 2018-02-20T07:01:29Z

plenum/server/primary_selector.py

+        logger.trace("{} selected {} as next primary node for instId {}, "
+                     "viewNo {} with rank {}, nodeReg {}".format(
+                         self, name, instance_id, self.viewNo, rank, nodeReg))
+        assert name, "{} failed to get next primary node name".format(self)


Shouldn't we "assert name" before logging that the primary was selected?
What should we do if name is None?

Of course, "assert name" should come before the logging.
If name is None, then the Apocalypse has come. It cannot be None by design; otherwise the implementation is incorrect.
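
A small sketch of the ordering agreed on here, assuming the standard library `logging` module in place of the project's own logger (the diff uses `logger.trace`, which is not part of stdlib logging). The function name `report_selected_primary` is hypothetical.

```python
import logging

logger = logging.getLogger(__name__)


def report_selected_primary(name, instance_id, view_no):
    # Fail first: an empty name means the selection logic itself is broken,
    # so nothing should be logged as "selected" in that case.
    assert name, "failed to get next primary node name"
    logger.debug("selected %s as next primary node for instId %s, viewNo %s",
                 name, instance_id, view_no)
    return name
```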

ashcherbakov · 2018-02-20T07:59:09Z

plenum/server/node.py

+                '''
+                if instance_id == 0:
+                    primary_rank = self.poolManager.get_rank_by_name(
+                        replica.primaryName.split(":", 1)[0], nodeReg)


Why not use the name variable here instead of replica.primaryName.split(":", 1)[0]?

Because replica.primaryName is in fact not just the node name; it has the form "node_name:instance_id". So it would be nice to rename replica.primaryName to replica.primaryInstanceName. I'll check how much refactoring that would cause.
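
To illustrate the "node_name:instance_id" form mentioned in the reply, here is a minimal sketch of splitting such a string. The helper `split_replica_name` is hypothetical and only mirrors the `split(":", 1)[0]` call in the diff.

```python
from typing import Optional, Tuple


def split_replica_name(replica_name: str) -> Tuple[str, Optional[int]]:
    """Split 'node_name:instance_id' into its two parts.

    Returns None for the instance id when no ':' suffix is present.
    """
    node_name, sep, inst = replica_name.partition(":")
    return node_name, (int(inst) if sep else None)
```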

ashcherbakov · 2018-02-20T08:04:30Z

plenum/server/primary_selector.py

+    def next_primary_replica_name_for_backup(self, instance_id, master_primary_rank,
+                                             primaries, nodeReg=None):
+        """
+        Returns name of the next node which is supposed to be a new Primary


Please correct the description: it's not a real round robin, since it takes the other primaries into account.
Please mention that we return two values, not just the name.

Hmm... In my opinion it is pure round robin; the only difference is that we may skip some nodes, but it is round robin anyway.
As for the primary for the master replica, that is not round robin, I think. There is a formula that is not changed in the scope of these changes:
(view_no + instance_id) % total_nodes
I really don't see round robin here.
As for the two returned values: done.
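
For reference, the formula quoted in the reply can be evaluated directly. The function below is a hypothetical wrapper around it; the master instance is taken to have instance id 0, as in the diff.

```python
def master_primary_rank(view_no: int, total_nodes: int,
                        instance_id: int = 0) -> int:
    """Rank of the primary according to (view_no + instance_id) % total_nodes."""
    return (view_no + instance_id) % total_nodes


# Ranks for a 4-node pool over six consecutive views.
ranks = [master_primary_rank(view_no, 4) for view_no in range(6)]
```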

ashcherbakov · 2018-02-20T08:04:34Z

plenum/server/primary_selector.py

+
+        return name
+
+    def next_primary_replica_name_for_master(self, nodeReg=None):
        """
        Returns name of the next node which is supposed to be a new Primary


Please mention that we return two values, not just name.

ashcherbakov · 2018-02-20T08:04:47Z

plenum/server/primary_selector.py

+
+        return name
+
+    def next_primary_replica_name_for_master(self, nodeReg=None):


Please add a unit test for this.

ashcherbakov · 2018-02-20T08:04:52Z

plenum/server/primary_selector.py

+        name = self.next_primary_node_name_for_master(nodeReg)
+        return name, Replica.generateName(nodeName=name, instId=0)
+
+    def next_primary_replica_name_for_backup(self, instance_id, master_primary_rank,


Please add a unit test for this.

ashcherbakov · 2018-02-20T08:05:12Z

plenum/server/node.py

@@ -2354,15 +2354,42 @@ def lost_master_primary(self):
        self._schedule_view_change()

    def select_primaries(self, nodeReg: Dict[str, HA]=None):
+        primaries = set()


Please add an integration test for this.

ashcherbakov · 2018-02-20T08:10:37Z

plenum/server/node.py

+        Build a set of names of primaries, it is needed to avoid
+        duplicates of primary nodes for different replicas.
+        '''
+        for instance_id, replica in enumerate(self.replicas):


Do we need to keep track of 'old' primaries just to avoid selecting the same primaries as before?
Please add a test for this!

ashcherbakov · 2018-02-20T08:11:14Z

plenum/server/node.py

@@ -2354,15 +2354,42 @@ def lost_master_primary(self):
        self._schedule_view_change()

    def select_primaries(self, nodeReg: Dict[str, HA]=None):
+        primaries = set()


Should we move all this logic to primary_selector?

We have no information about primaries at the primary_selector level.

ashcherbakov · 2018-02-20T08:11:54Z

plenum/server/primary_selector.py

+
+        return name
+
+    def next_primary_node_name_for_backup(self, instance_id, nodeReg=None):


Do we need this method at all?

No, it is already deleted.

ashcherbakov · 2018-02-20T08:12:11Z

plenum/server/primary_selector.py

@@ -54,14 +54,56 @@ def next_primary_node_name(self, instance_id, nodeReg=None):

        return name

-    def next_primary_replica_name(self, instance_id, nodeReg=None):
+    def next_primary_node_name_for_master(self, nodeReg=None):


Looks like this method must be private (with _ prefix).

sergey-shilov · 2018-02-20T12:57:26Z

Test this please.

ashcherbakov · 2018-02-21T16:00:37Z

plenum/test/primary_selection/test_primary_selection_after_primary_demotion_and_promotion.py

+    assert primariesIdxs[1] == 1
+
+    master_node = txnPoolMasterNodes[0]
+    client, wallet = stewardAndWalletForMasterNode


Can we use SDK here?

ashcherbakov · 2018-02-21T16:29:38Z

plenum/test/primary_selection/test_primary_selector.py

+        yield nodes
+
+
+def test_primaties_selection(txnPoolNodeSetWithElector):


There is a typo in the test name

ashcherbakov · 2018-02-21T16:29:54Z

plenum/test/primary_selection/test_primary_selector.py

+        yield nodes
+
+
+def test_primaties_selection(txnPoolNodeSetWithElector):


Please split the test into multiple small tests.

ashcherbakov · 2018-02-21T16:30:34Z

plenum/test/primary_selection/test_primary_selector.py

+        primaries = set()
+        view_no_bak = node.elector.viewNo
+        node.elector.viewNo = view_no
+        for instance_id in range(0, node.replicas.num_replicas):


Can we write it in a simpler way, without a loop and if-else conditions?

Re-factored.

ashcherbakov · 2018-02-27T07:36:49Z

test this please

ashcherbakov · 2018-02-27T13:10:44Z

plenum/server/pool_manager.py

@@ -272,6 +272,18 @@ def nodeHaChanged(self, txn):
        # TODO: Check if new HA is same as old HA and only update if
        # new HA is different.
        if nodeName == self.name:
+            # Update itself in node registry if needed
+            (ip, port) = self.node.nodeReg[nodeName]
+            if ip != txn[DATA][NODE_IP] or port != txn[DATA][NODE_PORT]:


What if the txn contains only the IP (or only the PORT)?

Bad things will happen in this case... Additional checks should be added.
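
One way the additional checks could look is sketched below. It is an illustration under stated assumptions, not the code that was merged: the function `updated_ha` is hypothetical, the values of the `NODE_IP` and `NODE_PORT` constants are placeholders, and the transaction data is modelled as a plain mapping.

```python
from typing import Mapping, Optional, Tuple

# Placeholder key names for illustration; the real constants live in plenum.
NODE_IP = "node_ip"
NODE_PORT = "node_port"


def updated_ha(current: Tuple[str, int],
               data: Mapping[str, object]) -> Optional[Tuple[str, int]]:
    """Return the new (ip, port) if the txn data changes it, else None."""
    ip, port = current
    # A txn may carry only one of the two fields, so fall back to the
    # current value instead of indexing a key that may be missing.
    new_ip = data.get(NODE_IP, ip)
    new_port = data.get(NODE_PORT, port)
    if (new_ip, new_port) == (ip, port):
        return None
    return new_ip, new_port
```

With this shape, a transaction that carries only the IP keeps the existing port, and the reverse holds for a port-only transaction.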

sergey-shilov · 2018-02-28T07:46:02Z

Test this please.

sergey-shilov force-pushed the fix/INDY-1112 branch from a9765be to 8773548 Compare February 19, 2018 16:09

lampkin-diet reviewed Feb 20, 2018

ashcherbakov requested changes Feb 20, 2018

Add some stylistical fixes, add comments.

dc7e5ca

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

Sergey Shilov added 2 commits February 20, 2018 19:01

Add test for primary demotion and promotion.

a6b4295

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

Add test for primaries selection routines.

e676717

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

ashcherbakov reviewed Feb 21, 2018

Sergey Shilov added 4 commits February 22, 2018 13:58

Re-factor primary selector tests.

8ba6eb8

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

Remove adding of node itself to nodeReg during initialisation of txnPoolManager. Now a node adds itself to nodeReg during catch-up of pool ledger.

ba95dd9

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

Merge remote-tracking branch 'base/master' into fix/INDY-1112

22402fa

Fix test.

b964c79

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

Update node's HA in node registry on pool ledger catch-up reply.

2e400d1

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

ashcherbakov reviewed Feb 27, 2018

Add separate checks for IP/port to be updated for HA and cliHA.

b59f0a8

Signed-off-by: Sergey Shilov <sergey.shilov@dsr-company.com>

ashcherbakov approved these changes Feb 28, 2018

ashcherbakov merged commit f4a5fa8 into hyperledger:master Feb 28, 2018

Toktar mentioned this pull request Mar 5, 2018

INDY-1112: change primeries election procedure for backup instances. … Toktar/indy-plenum#1

Merged


