Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INDY-1680: add function for removing replica #912

Merged
merged 22 commits into from
Sep 18, 2018

Conversation

Toktar
Copy link
Contributor

@Toktar Toktar commented Sep 7, 2018

Changes:

  • add function for remove a replica (and cleaning client requests for this replica)
  • restore the number of replicas during the view change
  • add tests for check this function and check the correct work of other systems
  • correctly process messages for switched off replicas
  • change a replicas list format in node.py to a SortedDict
  • change a replicas list format in monitors to a dict

Changes:
- add function to remove a replica (and cleaning client requests for this replica)
- add grow count of replicas in view change
- add tests for check this function and check the correct work of other systems

TODO:
- correctly process messages for switched off replicas (for implementation this replicas should be a dict)

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Changes:
- correctly process messages for switched off replicas (for implementation this replicas should be a dict)
- change a replicas list format to a dict

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@Toktar Toktar changed the title INDY-1680: added function for removing replica INDY-1680: add function for removing replica Sep 7, 2018
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…cas_status()

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Changes:
- add method add_replica
- add method remove_replica
- rename method to restoreReplicas
- adjustReplicas(from, to)
- change monitor replicas list to dict
- in _select_primaries add restore_replicas
- in restore_replicas in remove requests use requestQueues

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@@ -87,7 +87,7 @@ def create_replicas(self, config=None):
return _TestReplicas(self, self.monitor, config)

def adjustReplicas(self):
Copy link
Contributor

@spivachuk spivachuk Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this method violate the base class contract?
Also the logic of this method should be corrected since replicas is a dict now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>

@property
def masterId(self) -> Optional[int]:
"""
Return the index of the replica that belongs to the master protocol
instance
"""
return 0 if self.count > 0 else None
return 0 if len(self.started) > 0 else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct the condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

def removeInstance(self, index=None):
# TODO: This doesn't take into account index, but this function is never called with defined index,
# probably we can simplify this thing?
def removeInstance(self, index):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct the parameter name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. And in Instances.remove() changed parameter name too.

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
while len(self.replicas) > self.requiredNumberOfInstances:
self.replicas.shrink()
self.processStashedMsgsForReplica(old_required_number_of_instances)
old_required_number_of_instances += 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter name does not suppose the value changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@@ -73,15 +76,17 @@ def _master_replica(self):

def service_inboxes(self, limit: int = None):
number_of_processed_messages = \
sum(replica.serviceQueues(limit) for replica in self._replicas)
sum(replica.serviceQueues(limit) for replica in self._replicas.values())
return number_of_processed_messages

def pass_message(self, message, instance_id=None):
Copy link
Contributor

@spivachuk spivachuk Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also Node.msgHasAcceptableInstId method should be corrected. It must stash messages for instances with a number higher than the current maximum required number.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

def primaries(self) -> dict:
primaries = dict()
for r in self._replicas.values():
primaries[r.instId] = r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property must return not replicas but names of instances primaries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@@ -145,7 +145,7 @@ def __init__(self, *args, **kwargs):
notifierEventTriggeringConfig=notifierEventTriggeringConfig,
pluginPaths=pluginPaths)
for i in range(len(self.replicas)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more clear to iterate over the keys.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if index is None:
index = self.num_replicas - 1
replica = self._replicas.pop(index)
for request_key in replica.requestQueues:
Copy link
Contributor

@spivachuk spivachuk Sep 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requestQueues is not a request queue, it is a map from ledger IDs to request queues.
Is there a unit test verifying freeing requests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed it.
Now all tests from test_replica_removing.py check it.

Copy link
Contributor

@spivachuk spivachuk Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to add verification of freeing the request by the replica (i.e. forwardedTo has been decreased by 1) to test_replica_removing_in_ordering.

ensureElectionsDone(looper=looper, nodes=txnPoolNodeSet)


def _check_replica_removed(node, start_replicas_count):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should take an instance ID. Also it should check that items related to the replica are removed from all the relevant collections and it should not verify conditions not related to removing the replica (such as master degradation and requests freeing).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added check that replica removed from all relevant collections and changed parameters list. But I think, we should check things as a master degraded in _check_replica_removed() because it's a proof that replica was successfully removed without negative effects.

Copy link
Contributor

@spivachuk spivachuk Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should check this but not in _check_replica_removed method since it is not its responsibility according to its name.
Also assert len(node.requests) == 0 should not be a responsibility of this method too.

index = start_replicas_count - 1
node.replicas.remove_replica(index)
_check_replica_removed(node, start_replicas_count)
# trigger view change on all nodes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such the teardown sections in tests make them dependent on each other. Instead of this introduce a fixture which ensures view change and use this fixture in every test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

looper.run(checkNodesConnected(txnPoolNodeSet))
waitForViewChange(looper, txnPoolNodeSet, expectedViewNo=1,
customTimeout=2 * tconf.VIEW_CHANGE_TIMEOUT)
ensureElectionsDone(looper=looper, nodes=txnPoolNodeSet)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restoration of replicas should be verified in an obvious way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
old_required_number_of_instances: int,
new_required_number_of_instances: int):
r = super().adjustReplicas(old_required_number_of_instances,
new_required_number_of_instances)
if r > 0:
if hasattr(self, 'sent_pps'):
new_replicas = self.replicas[-r:]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.replicas is a dict now. Correct its usage here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
new_replicas = self.replicas[-r:]
new_replicas = [r for inst_id, r in self.replicas
if old_required_number_of_instances >=
inst_id < new_required_number_of_instances]
Copy link
Contributor

@spivachuk spivachuk Sep 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the inequation.

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
changes:
- updated test_request_time_tracker.py
- updated test_replica_removing_before_ordering()

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@andkononykhin andkononykhin merged commit 2e1b95f into hyperledger:master Sep 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants