-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INDY-1682: Remove replicas then instance performance degraded #948
INDY-1682: Remove replicas then instance performance degraded #948
Conversation
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
plenum/server/monitor.py
Outdated
avg_lat = self.getLatency(desired_inst_id) | ||
avg_lat_others_by_inst = [] | ||
for inst_id in self.instances.ids: | ||
if self.instances.masterId == inst_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should compare with desired_inst_id
, not with self.instances.masterId
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it
plenum/server/monitor.py
Outdated
logger.trace("{} master throughput ratio {} is acceptable.". | ||
format(self, r)) | ||
return tooLow | ||
if logging and r: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to compare with Delta before logs ( if logging and tooLow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it
plenum/server/monitor.py
Outdated
@@ -252,7 +252,7 @@ def metrics(self): | |||
Calculate and return the metrics. | |||
""" | |||
masterThrp, backupThrp = self.getThroughputs(self.instances.masterId) | |||
r = self.masterThroughputRatio() | |||
r = self.is_instance_throughput_too_low(self.instances.masterId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be instance_throughput_ratio(self.instances.masterId)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed it
plenum/server/node.py
Outdated
@@ -2883,7 +2892,7 @@ def _remove_replica_if_primary_lost(self, inst_id): | |||
and self.primaries_disconnection_times[inst_id] is not None \ | |||
and time.perf_counter() - self.primaries_disconnection_times[inst_id] >= \ | |||
self.config.TolerateBackupPrimaryDisconnection: | |||
self.replicas.remove_replica(inst_id) | |||
self.send_backup_instance_faulty([inst_id]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is del
missed here?
plenum/server/node.py
Outdated
@@ -3578,3 +3587,25 @@ def mark_request_as_executed(self, request: Request): | |||
authenticator = self.authNr(request.as_dict) | |||
if isinstance(authenticator, ReqAuthenticator): | |||
authenticator.clean_from_verified(request.key) | |||
|
|||
def process_backup_instance_faulty_msg(self, backup_faulty: BackupInstanceFaulty, frm: str) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the following form of the method is more clean and clear:
if getattr(backup_faulty, f.VIEW_NO.nm) != self.viewNo:
return
for inst_id in getattr(backup_faulty, f.INSTANCES.nm):
self.backup_instances_faulty.setdefault(inst_id, set()).add(frm)
if inst_id in self.replicas.keys():
continue
if not self.quorums.backup_instance_faulty.is_reached(
len(self.backup_instances_faulty[inst_id])):
continue
if self.name not in self.backup_instances_faulty[inst_id]:
continue
self.replicas.remove_replica(inst_id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you! But are you mean
if inst_id not in self.replicas.keys(): continue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
plenum/server/node.py
Outdated
@@ -386,7 +387,8 @@ def __init__(self, | |||
(CatchupReq, self.ledgerManager.processCatchupReq), | |||
(CatchupRep, self.ledgerManager.processCatchupRep), | |||
(CurrentState, self.process_current_state_message), | |||
(ObservedData, self.send_to_observer) | |||
(ObservedData, self.send_to_observer), | |||
(BackupInstanceFaulty, self.process_backup_instance_faulty_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can put all logic related to backup instance removing to a separate class (BackupInstanceFaultyProcessor)?
It can contains the following methods:
- process_backup_instance_faulty_msg
- on_backup_degradation
- restore
- __send_backup_instance_faulty
- __remove_replica
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -210,6 +211,11 @@ def is_behind_for_view(self) -> bool: | |||
|
|||
# EXTERNAL EVENTS | |||
|
|||
def on_backup_degradation(self, degraded_backups): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can put all logic related to backup instance removing to a separate class (BackupInstanceFaultyProcessor)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
logger = getlogger() | ||
|
||
|
||
class BackupInstanceFaultyProcessor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to cover the class by Unit Tests
8cad1e7
to
9238683
Compare
Changes:
ToDo: