INDY-1682: Remove replicas then instance performance degraded #948

Toktar · 2018-10-16T13:27:20Z

Changes:

Add new message BackupInstanceFaulty
Add sending BackupInstanceFaulty when a backup instance is degraded
Add backup degradation logic to all strategies in monitor
Add sending BackupInstanceFaulty when a backup primary is disconnected
Add processing of BackupInstanceFaulty messages
Switch on monitoring logic from AccumulatingMonitorStrategy
Add tests

ToDo:

Add more tests

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

ashcherbakov · 2018-10-17T11:16:05Z

plenum/server/monitor.py

+        avg_lat = self.getLatency(desired_inst_id)
+        avg_lat_others_by_inst = []
+        for inst_id in self.instances.ids:
+            if self.instances.masterId == inst_id:


I think we should compare with desired_inst_id, not with self.instances.masterId
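To illustrate the point of this comment, here is a minimal sketch (not the actual monitor code) of averaging the latencies of every instance other than the one being checked. The function name `avg_latency_of_others` and the plain-dict input are hypothetical; only `desired_inst_id` comes from the diff above.

```python
from statistics import mean
from typing import Dict, Optional


def avg_latency_of_others(latencies: Dict[int, float],
                          desired_inst_id: int) -> Optional[float]:
    """Average latency over every instance except `desired_inst_id`.

    `latencies` maps instance id -> measured average latency; it is a
    hypothetical stand-in for whatever the monitor tracks per instance.
    """
    # Skip the instance under test, not unconditionally the master.
    others = [lat for inst_id, lat in latencies.items()
              if inst_id != desired_inst_id]
    # With no other instances there is nothing to compare against.
    return mean(others) if others else None
```

Skipping `desired_inst_id` instead of the master id means the same helper works whether the instance under test is the master or a backup.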

ashcherbakov · 2018-10-17T11:18:37Z

plenum/server/monitor.py

-            logger.trace("{} master throughput ratio {} is acceptable.".
-                         format(self, r))
-        return tooLow
+        if logging and r:


We need to compare with Delta before logging (if logging and tooLow)
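A rough sketch of the ordering this comment asks for: compute the boolean first, then choose the log message from it. The function name, the `log` flag, and the treatment of a missing ratio are assumptions for illustration, not the monitor's real signature.

```python
import logging

logger = logging.getLogger(__name__)


def is_throughput_too_low(ratio, delta, log=True):
    """Return True when `ratio` falls below the threshold `delta`.

    A `ratio` of None (not enough data) is treated as "not too low";
    that choice is an assumption of this sketch.
    """
    if ratio is None:
        return False
    too_low = ratio < delta  # compare with Delta first
    if log:
        # The message depends on the comparison result, not on the raw ratio.
        if too_low:
            logger.info("throughput ratio %s is lower than Delta %s",
                        ratio, delta)
        else:
            logger.debug("throughput ratio %s is acceptable", ratio)
    return too_low
```

Testing `if logging and r` would instead log whenever the ratio is truthy, regardless of whether it is actually below the threshold.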

ashcherbakov · 2018-10-17T11:20:28Z

plenum/server/monitor.py

@@ -252,7 +252,7 @@ def metrics(self):
        Calculate and return the metrics.
        """
        masterThrp, backupThrp = self.getThroughputs(self.instances.masterId)
-        r = self.masterThroughputRatio()
+        r = self.is_instance_throughput_too_low(self.instances.masterId)


Should it be instance_throughput_ratio(self.instances.masterId)?

ashcherbakov · 2018-10-17T11:22:46Z

plenum/server/node.py

@@ -2883,7 +2892,7 @@ def _remove_replica_if_primary_lost(self, inst_id):
                and self.primaries_disconnection_times[inst_id] is not None \
                and time.perf_counter() - self.primaries_disconnection_times[inst_id] >= \
                self.config.TolerateBackupPrimaryDisconnection:
-            self.replicas.remove_replica(inst_id)
+            self.send_backup_instance_faulty([inst_id])


Is del missing here?

ashcherbakov · 2018-10-17T11:30:26Z

plenum/server/node.py

@@ -3578,3 +3587,25 @@ def mark_request_as_executed(self, request: Request):
        authenticator = self.authNr(request.as_dict)
        if isinstance(authenticator, ReqAuthenticator):
            authenticator.clean_from_verified(request.key)
+
+    def process_backup_instance_faulty_msg(self, backup_faulty: BackupInstanceFaulty, frm: str) -> None:


I think the following form of the method is cleaner and clearer:

    if getattr(backup_faulty, f.VIEW_NO.nm) != self.viewNo:
        return
    for inst_id in getattr(backup_faulty, f.INSTANCES.nm):
        self.backup_instances_faulty.setdefault(inst_id, set()).add(frm)
        if inst_id in self.replicas.keys():
            continue
        if not self.quorums.backup_instance_faulty.is_reached(
                len(self.backup_instances_faulty[inst_id])):
            continue
        if self.name not in self.backup_instances_faulty[inst_id]:
            continue
        self.replicas.remove_replica(inst_id)

Yes, thank you! But do you mean
if inst_id not in self.replicas.keys(): continue ?
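The guard order discussed in this thread can be sketched as a standalone function. This is a simplified illustration, assuming the reading proposed in the reply (skip instances that are not among the active replicas); the function name, the plain `set` of replica ids, and the integer `quorum` are hypothetical stand-ins for the node's real structures.

```python
from typing import Dict, Iterable, List, Set


def process_faulty_votes(votes: Dict[int, Set[str]],
                         active_replicas: Set[int],
                         own_name: str,
                         quorum: int,
                         inst_ids: Iterable[int],
                         frm: str) -> List[int]:
    """Record `frm`'s vote for each instance and remove those that qualify.

    Returns the ids removed by this call. `votes` and `active_replicas`
    are mutated in place.
    """
    removed = []
    for inst_id in inst_ids:
        votes.setdefault(inst_id, set()).add(frm)
        if inst_id not in active_replicas:
            continue  # nothing to remove (assumed reading of the guard)
        if len(votes[inst_id]) < quorum:
            continue  # not enough nodes have reported this instance
        if own_name not in votes[inst_id]:
            continue  # this node has not itself seen the degradation
        active_replicas.discard(inst_id)
        removed.append(inst_id)
    return removed
```

With the original `in` check, the loop would skip exactly the instances that still have a replica to remove, so the removal line would only ever run for instances that are already gone.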

ashcherbakov · 2018-10-17T12:32:39Z

plenum/server/node.py

@@ -386,7 +387,8 @@ def __init__(self,
            (CatchupReq, self.ledgerManager.processCatchupReq),
            (CatchupRep, self.ledgerManager.processCatchupRep),
            (CurrentState, self.process_current_state_message),
-            (ObservedData, self.send_to_observer)
+            (ObservedData, self.send_to_observer),
+            (BackupInstanceFaulty, self.process_backup_instance_faulty_msg)


Maybe we can move all logic related to removing backup instances into a separate class (BackupInstanceFaultyProcessor)?
It could contain the following methods:

process_backup_instance_faulty_msg

on_backup_degradation

restore

__send_backup_instance_faulty

__remove_replica
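One possible shape for such a class, using the method names listed above. Everything else is an assumption made for illustration: the constructor arguments, the integer quorum, recording the node's own vote locally before broadcasting, and `restore` simply clearing collected votes. It is a sketch of the proposed split of responsibilities, not the class that was eventually merged.

```python
from typing import Callable, Dict, Iterable, List, Set


class BackupInstanceFaultyProcessor:
    """Hypothetical skeleton grouping the backup-instance removal logic."""

    def __init__(self, name: str, replicas: Set[int], quorum: int,
                 send: Callable[[List[int]], None]) -> None:
        self.name = name
        self.replicas = replicas      # active instance ids, shared with owner
        self.quorum = quorum          # votes needed before removal
        self._send = send             # broadcasts a list of faulty inst ids
        self.backup_instances_faulty: Dict[int, Set[str]] = {}

    def on_backup_degradation(self, degraded_backups: Iterable[int]) -> None:
        self.__send_backup_instance_faulty(degraded_backups)

    def process_backup_instance_faulty_msg(self, inst_ids: Iterable[int],
                                           frm: str) -> None:
        for inst_id in inst_ids:
            voters = self.backup_instances_faulty.setdefault(inst_id, set())
            voters.add(frm)
            if (inst_id in self.replicas
                    and len(voters) >= self.quorum
                    and self.name in voters):
                self.__remove_replica(inst_id)

    def restore(self) -> None:
        # Assumption: restoring just forgets the votes collected so far.
        self.backup_instances_faulty.clear()

    def __send_backup_instance_faulty(self, inst_ids: Iterable[int]) -> None:
        known = [i for i in inst_ids if i in self.replicas]
        if not known:
            return
        # Assumption: count our own vote locally, then tell the other nodes.
        self.process_backup_instance_faulty_msg(known, self.name)
        self._send(known)

    def __remove_replica(self, inst_id: int) -> None:
        self.replicas.discard(inst_id)
```

Passing `send` and `replicas` in from outside keeps the class free of node internals, which also makes it straightforward to exercise in isolation with a list's `append` as the sender.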

ashcherbakov · 2018-10-17T12:32:55Z

plenum/server/view_change/view_changer.py

@@ -210,6 +211,11 @@ def is_behind_for_view(self) -> bool:

    # EXTERNAL EVENTS

+    def on_backup_degradation(self, degraded_backups):


Maybe we can move all logic related to removing backup instances into a separate class (BackupInstanceFaultyProcessor)?


ashcherbakov · 2018-10-22T12:41:53Z

plenum/server/backup_instance_faulty_processor.py

+logger = getlogger()
+
+
+class BackupInstanceFaultyProcessor:


It would be great to cover this class with unit tests.

Toktar added 7 commits October 16, 2018 15:24

INDY-1682: remove replicas in backup degraded

f43096c

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: add sending BackupInstanceFaulty then primary disconnected

e0c6ace

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: add test to backup degraded

ce1a4f7

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: switch on AccumulatingMonitorStrategy logic for monitor

cd0b76c

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: add REASON to BackupInstanceFaulty

9d5d1fa

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: Update tests

ab116da

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: Update tests

bec30bc

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

ashcherbakov reviewed Oct 17, 2018


Toktar added 3 commits October 17, 2018 15:48

INDY-1682: bugfix in monitor.py

deb0ab7

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: added BackupInstanceFaultyProcessor

ec06e50

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

INDY-1682: code style

9238683

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

ashcherbakov reviewed Oct 22, 2018


ashcherbakov closed this Oct 22, 2018

ashcherbakov reopened this Oct 22, 2018

ashcherbakov closed this Oct 23, 2018

ashcherbakov reopened this Oct 23, 2018

Toktar force-pushed the task-1682-remove-replicas-with-ic branch from 8cad1e7 to 9238683 October 23, 2018 17:59

Toktar changed the title from "[WIP][INDY-1682] Remove replicas then instance performance degraded" to "INDY-1682: Remove replicas then instance performance degraded" Oct 23, 2018

ashcherbakov approved these changes Oct 24, 2018


ashcherbakov merged commit 6ce3942 into hyperledger:master Oct 24, 2018
