Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INDY-1450: Request ledger statuses and consistency proofs in catchup #800

Merged
merged 13 commits into from
Jul 11, 2018

Conversation

Toktar
Copy link
Contributor

@Toktar Toktar commented Jul 9, 2018

Changes:

  • Send message request with ledger status if node did not recieve ledger status in some time.
  • Send ledger status for request consistency proof of last ordering if node have not f+1 consistensy proofs

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Changes:
- added tests for catchup with lost consistency_proof
- added resend ledger statuses for request first consistency proofs

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@Toktar Toktar changed the title [WIP][ci skip][INDY-1450] Request ledger statuses in catchup INDY-1450: Request ledger statuses and consistency proofs in catchup Jul 9, 2018
@Toktar
Copy link
Contributor Author

Toktar commented Jul 9, 2018

test this please

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@Toktar Toktar force-pushed the task-1450-lost-msgs-in-catchup branch from af5a1e8 to 2aa41b4 Compare July 9, 2018 13:39
@@ -86,6 +88,20 @@ def addLedger(self, iD: int, ledger: Ledger,
verifier=MerkleVerifier(ledger.hasher)
)

def request_ledger_status_if_needed(self, ledger_id):
ledgerInfo = self.getLedgerInfoByType(ledger_id)
nodes = [node for node in self.owner.nodeReg if node not in ledgerInfo.ledgerStatusOk]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add a check node.name != self.owner.name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't need because request_ledger_status_from_nodes has this check.

ha=nodeHa, cliha=nodeCHa,
pluginPaths=allPluginsPath)
# patch canProcessConsistencyProof
monkeypatch.setattr(node_to_disconnect.ledgerManager,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we patch canProcessConsistencyProof or processConsistencyProof?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry. Fixed it.

def unpatch_after_call(proof):
if node_to_disconnect.ledgerManager.spylog.count(
LedgerManager.processConsistencyProof) <= 2:
tmp_method(proof)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But tmp_method is for the old instance of disconnected node (before restart)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@Toktar Toktar force-pushed the task-1450-lost-msgs-in-catchup branch from 1bf3bd0 to 967e7c5 Compare July 10, 2018 07:53
@@ -174,7 +174,24 @@ def unpatch_after_call(proof):
config=tconf,
ha=nodeHa, cliha=nodeCHa,
pluginPaths=allPluginsPath)
# patch canProcessConsistencyProof

original_process_cp = node_to_disconnect.ledgerManager.canProcessConsistencyProof
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still canProcessConsistencyProof but need to be processConsistencyProof, right?

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
def request_ledger_status_if_needed(self, ledger_id):
ledgerInfo = self.getLedgerInfoByType(ledger_id)
nodes = [node for node in self.owner.nodeReg if node not in ledgerInfo.ledgerStatusOk]
self.owner.request_ledger_status_from_nodes(ledger_id, nodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will be if we do not gather sufficient count of LedgerStatuses this time too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we do this change in a separate task

recvdConsProof = ledgerInfo.recvdConsistencyProofs
ledger_status = self.owner.build_ledger_status(ledgerId)
nodes = [frm for frm in self.owner.nodeReg if
frm not in recvdConsProof and frm != self.owner.name]
Copy link
Contributor

@spivachuk spivachuk Jul 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should ask only the nodes that are definitely ahead of us and did not sent us ConsistencyProofs yet. (Now we also ask the nodes that have not even sent us LedgerStatuses.) To do this we should track the set of nodes which have already been asked for ConsistencyProofs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this comment for this PR since we decided to re-ask messages only once and so we should re-ask even those nodes which even potentially may be ahead of us.

ashcherbakov
ashcherbakov previously approved these changes Jul 10, 2018
@@ -661,6 +687,11 @@ def canStartCatchUpProcess(self, ledgerId: int):
adjustedQuorum.f + 1:
# At least once correct node believes that this node is behind.

# Stop last consistensy proof requesting.
if ledgerId in self.consistency_proof_action_ids:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also cancel re-asking for LedgerStatuses at this moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -315,6 +338,9 @@ def processLedgerStatus(self, status: LedgerStatus, frm: str):
format(self, ledgerInfo.ledgerStatusOk, ledgerId))
logger.info('{} found from ledger status {} that it does not need catchup'.
format(self, ledgerStatus))
if ledgerId in self.request_ledger_status_action_ids:
self._cancel(self.request_ledger_status_action_ids[ledgerId])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also cancel re-asking for last ConsistencyProofs at this moment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@spivachuk spivachuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comments.

consistency proof request (not for ledger status). In this case catchup node
has no quorum for proofing some size of transactions list. It need to request
CONSISTENCY_PROOFs again and finishes catchup. Test makes sure that the node
eventually finishes catchup'''
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this test does not verify what it must because all the ConsistencyProofs being received by the restarted node are the same. So when the restarted node gathers initial f+1 ConsistencyProofs, it starts catch-up, because these f+1 messages are the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this test and added new with check cancel of schedule with requesting ledger statuses and consistency proofs.

Changes:
- added cancel for request ledger statuses
- added cancel for request consistency proofs
- added test for check cancel scheduling ledger statuses and consistency proofs
- remove test test_catchup_with_lost_last_consistency_proof

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
def _cancel_request_ledger_statuses_and_consistency_proofs(self, ledger_id):
if ledger_id in self.request_ledger_status_action_ids:
self._cancel(self.request_ledger_status_action_ids[ledger_id])
self.request_ledger_status_action_ids[ledger_id] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way to remove a cancelled action from the map and the the condition being verified above mismatch. It will be uniform and simpler to remove an entry from the map rather than to use None value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like this?

if ledger_id in self.request_ledger_status_action_ids:
            action = self.request_ledger_status_action_ids.pop(ledger_id)
            self._cancel(action)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


if ledger_id in self.consistency_proof_action_ids:
self._cancel(self.consistency_proof_action_ids[ledger_id])
self.consistency_proof_action_ids[ledger_id] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as in the previous comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -291,6 +316,13 @@ def processLedgerStatus(self, status: LedgerStatus, frm: str):
if self.isLedgerOld(ledgerStatus):
ledger_status = self.owner.build_ledger_status(ledgerId)
self.sendTo(ledger_status, frm)
if ledgerId not in self.consistency_proof_action_ids or \
ledgerId in self.consistency_proof_action_ids and \
self.consistency_proof_action_ids[ledgerId] is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will be simplified if _cancel_request_ledger_statuses_and_consistency_proofs is corrected (see the comment above).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

self._cancel(self.consistency_proof_action_ids[ledger_id])
self.consistency_proof_action_ids[ledger_id] = None

def request_ledger_status_if_needed(self, ledger_id):
Copy link
Contributor

@spivachuk spivachuk Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suffix _if_needed in the method name seems to be redundant because cancellation of the scheduled action is not a responsibility of this method itself.
It would be better to name this method reask_for_ledger_status.

Also at first this method should remove the triggered action from the map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

nodes = [node for node in self.owner.nodeReg if node not in ledgerInfo.ledgerStatusOk]
self.owner.request_ledger_status_from_nodes(ledger_id, nodes)

def request_last_CP(self, ledgerId):
Copy link
Contributor

@spivachuk spivachuk Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to name this method reask_for_last_CP.

Also at first this method should remove the triggered action from the map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to reask_for_last_consistency_proof(). Is it ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
@ashcherbakov ashcherbakov merged commit 4c506ac into hyperledger:master Jul 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants