Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INDY-1682: Remove replicas then instance performance degraded #948

Merged

Conversation

Toktar
Copy link
Contributor

@Toktar Toktar commented Oct 16, 2018

Changes:

  • Add new message BackupInstanceFaulty
  • Add sending BackupInstanceFaulty then backup instance degraded
  • Add backup degraded logic in all strategies in monitor
  • Add sending BackupInstanceFaulty then backup primary disconnected
  • Add processing BackupInstanceFaulty messages
  • Swith on monitoring logic from AccumulatingMonitorStrategy
  • Add tests

ToDo:

  • Add more tests

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
avg_lat = self.getLatency(desired_inst_id)
avg_lat_others_by_inst = []
for inst_id in self.instances.ids:
if self.instances.masterId == inst_id:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should compare with desired_inst_id, not with self.instances.masterId

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

logger.trace("{} master throughput ratio {} is acceptable.".
format(self, r))
return tooLow
if logging and r:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to compare with Delta before logs ( if logging and tooLow)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

@@ -252,7 +252,7 @@ def metrics(self):
Calculate and return the metrics.
"""
masterThrp, backupThrp = self.getThroughputs(self.instances.masterId)
r = self.masterThroughputRatio()
r = self.is_instance_throughput_too_low(self.instances.masterId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be instance_throughput_ratio(self.instances.masterId)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it

@@ -2883,7 +2892,7 @@ def _remove_replica_if_primary_lost(self, inst_id):
and self.primaries_disconnection_times[inst_id] is not None \
and time.perf_counter() - self.primaries_disconnection_times[inst_id] >= \
self.config.TolerateBackupPrimaryDisconnection:
self.replicas.remove_replica(inst_id)
self.send_backup_instance_faulty([inst_id])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is del missed here?

@@ -3578,3 +3587,25 @@ def mark_request_as_executed(self, request: Request):
authenticator = self.authNr(request.as_dict)
if isinstance(authenticator, ReqAuthenticator):
authenticator.clean_from_verified(request.key)

def process_backup_instance_faulty_msg(self, backup_faulty: BackupInstanceFaulty, frm: str) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following form of the method is more clean and clear:

if getattr(backup_faulty, f.VIEW_NO.nm) != self.viewNo:
    return
for inst_id in getattr(backup_faulty, f.INSTANCES.nm):
    self.backup_instances_faulty.setdefault(inst_id, set()).add(frm)
    if inst_id in self.replicas.keys():
       continue
    if not self.quorums.backup_instance_faulty.is_reached(
                            len(self.backup_instances_faulty[inst_id])):
        continue
    if self.name not in self.backup_instances_faulty[inst_id]:
        continue
    self.replicas.remove_replica(inst_id)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you! But are you mean
if inst_id not in self.replicas.keys(): continue ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -386,7 +387,8 @@ def __init__(self,
(CatchupReq, self.ledgerManager.processCatchupReq),
(CatchupRep, self.ledgerManager.processCatchupRep),
(CurrentState, self.process_current_state_message),
(ObservedData, self.send_to_observer)
(ObservedData, self.send_to_observer),
(BackupInstanceFaulty, self.process_backup_instance_faulty_msg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can put all logic related to backup instance removing to a separate class (BackupInstanceFaultyProcessor)?
It can contains the following methods:

  • process_backup_instance_faulty_msg
  • on_backup_degradation
  • restore
  • __send_backup_instance_faulty
  • __remove_replica

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -210,6 +211,11 @@ def is_behind_for_view(self) -> bool:

# EXTERNAL EVENTS

def on_backup_degradation(self, degraded_backups):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can put all logic related to backup instance removing to a separate class (BackupInstanceFaultyProcessor)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
logger = getlogger()


class BackupInstanceFaultyProcessor:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to cover the class by Unit Tests

@ashcherbakov ashcherbakov reopened this Oct 22, 2018
@ashcherbakov ashcherbakov reopened this Oct 23, 2018
@Toktar Toktar force-pushed the task-1682-remove-replicas-with-ic branch from 8cad1e7 to 9238683 Compare October 23, 2018 17:59
@Toktar Toktar changed the title [WIP][INDY-1682] Remove replicas then instance performance degraded INDY-1682: Remove replicas then instance performance degraded Oct 23, 2018
@ashcherbakov ashcherbakov merged commit 6ce3942 into hyperledger:master Oct 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants