-
Notifications
You must be signed in to change notification settings - Fork 366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
INDY-1450: Request ledger statuses and consistency proofs in catchup #800
INDY-1450: Request ledger statuses and consistency proofs in catchup #800
Conversation
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Changes: - added tests for catchup with lost consistency_proof - added resend ledger statuses for request first consistency proofs Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…nto task-1450-lost-msgs-in-catchup
test this please |
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
af5a1e8
to
2aa41b4
Compare
plenum/common/ledger_manager.py
Outdated
@@ -86,6 +88,20 @@ def addLedger(self, iD: int, ledger: Ledger, | |||
verifier=MerkleVerifier(ledger.hasher) | |||
) | |||
|
|||
def request_ledger_status_if_needed(self, ledger_id): | |||
ledgerInfo = self.getLedgerInfoByType(ledger_id) | |||
nodes = [node for node in self.owner.nodeReg if node not in ledgerInfo.ledgerStatusOk] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to add a check node.name != self.owner.name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't need because request_ledger_status_from_nodes has this check.
ha=nodeHa, cliha=nodeCHa, | ||
pluginPaths=allPluginsPath) | ||
# patch canProcessConsistencyProof | ||
monkeypatch.setattr(node_to_disconnect.ledgerManager, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we patch canProcessConsistencyProof
or processConsistencyProof
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry. Fixed it.
def unpatch_after_call(proof): | ||
if node_to_disconnect.ledgerManager.spylog.count( | ||
LedgerManager.processConsistencyProof) <= 2: | ||
tmp_method(proof) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But tmp_method
is for the old instance of disconnected node (before restart)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
1bf3bd0
to
967e7c5
Compare
@@ -174,7 +174,24 @@ def unpatch_after_call(proof): | |||
config=tconf, | |||
ha=nodeHa, cliha=nodeCHa, | |||
pluginPaths=allPluginsPath) | |||
# patch canProcessConsistencyProof | |||
|
|||
original_process_cp = node_to_disconnect.ledgerManager.canProcessConsistencyProof |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still canProcessConsistencyProof
but need to be processConsistencyProof
, right?
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…nto task-1450-lost-msgs-in-catchup
plenum/common/ledger_manager.py
Outdated
def request_ledger_status_if_needed(self, ledger_id): | ||
ledgerInfo = self.getLedgerInfoByType(ledger_id) | ||
nodes = [node for node in self.owner.nodeReg if node not in ledgerInfo.ledgerStatusOk] | ||
self.owner.request_ledger_status_from_nodes(ledger_id, nodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will be if we do not gather sufficient count of LedgerStatuses this time too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that we do this change in a separate task
plenum/common/ledger_manager.py
Outdated
recvdConsProof = ledgerInfo.recvdConsistencyProofs | ||
ledger_status = self.owner.build_ledger_status(ledgerId) | ||
nodes = [frm for frm in self.owner.nodeReg if | ||
frm not in recvdConsProof and frm != self.owner.name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should ask only the nodes that are definitely ahead of us and did not sent us ConsistencyProofs yet. (Now we also ask the nodes that have not even sent us LedgerStatuses.) To do this we should track the set of nodes which have already been asked for ConsistencyProofs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore this comment for this PR since we decided to re-ask messages only once and so we should re-ask even those nodes which even potentially may be ahead of us.
plenum/common/ledger_manager.py
Outdated
@@ -661,6 +687,11 @@ def canStartCatchUpProcess(self, ledgerId: int): | |||
adjustedQuorum.f + 1: | |||
# At least once correct node believes that this node is behind. | |||
|
|||
# Stop last consistensy proof requesting. | |||
if ledgerId in self.consistency_proof_action_ids: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also cancel re-asking for LedgerStatuses at this moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
plenum/common/ledger_manager.py
Outdated
@@ -315,6 +338,9 @@ def processLedgerStatus(self, status: LedgerStatus, frm: str): | |||
format(self, ledgerInfo.ledgerStatusOk, ledgerId)) | |||
logger.info('{} found from ledger status {} that it does not need catchup'. | |||
format(self, ledgerStatus)) | |||
if ledgerId in self.request_ledger_status_action_ids: | |||
self._cancel(self.request_ledger_status_action_ids[ledgerId]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also cancel re-asking for last ConsistencyProofs at this moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see comments.
consistency proof request (not for ledger status). In this case catchup node | ||
has no quorum for proofing some size of transactions list. It need to request | ||
CONSISTENCY_PROOFs again and finishes catchup. Test makes sure that the node | ||
eventually finishes catchup''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this test does not verify what it must because all the ConsistencyProofs being received by the restarted node are the same. So when the restarted node gathers initial f+1 ConsistencyProofs, it starts catch-up, because these f+1 messages are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this test and added new with check cancel of schedule with requesting ledger statuses and consistency proofs.
Changes: - added cancel for request ledger statuses - added cancel for request consistency proofs - added test for check cancel scheduling ledger statuses and consistency proofs - remove test test_catchup_with_lost_last_consistency_proof Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
…nto task-1450-lost-msgs-in-catchup
plenum/common/ledger_manager.py
Outdated
def _cancel_request_ledger_statuses_and_consistency_proofs(self, ledger_id): | ||
if ledger_id in self.request_ledger_status_action_ids: | ||
self._cancel(self.request_ledger_status_action_ids[ledger_id]) | ||
self.request_ledger_status_action_ids[ledger_id] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way to remove a cancelled action from the map and the the condition being verified above mismatch. It will be uniform and simpler to remove an entry from the map rather than to use None
value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean something like this?
if ledger_id in self.request_ledger_status_action_ids:
action = self.request_ledger_status_action_ids.pop(ledger_id)
self._cancel(action)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
plenum/common/ledger_manager.py
Outdated
|
||
if ledger_id in self.consistency_proof_action_ids: | ||
self._cancel(self.consistency_proof_action_ids[ledger_id]) | ||
self.consistency_proof_action_ids[ledger_id] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same as in the previous comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
plenum/common/ledger_manager.py
Outdated
@@ -291,6 +316,13 @@ def processLedgerStatus(self, status: LedgerStatus, frm: str): | |||
if self.isLedgerOld(ledgerStatus): | |||
ledger_status = self.owner.build_ledger_status(ledgerId) | |||
self.sendTo(ledger_status, frm) | |||
if ledgerId not in self.consistency_proof_action_ids or \ | |||
ledgerId in self.consistency_proof_action_ids and \ | |||
self.consistency_proof_action_ids[ledgerId] is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition will be simplified if _cancel_request_ledger_statuses_and_consistency_proofs
is corrected (see the comment above).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
plenum/common/ledger_manager.py
Outdated
self._cancel(self.consistency_proof_action_ids[ledger_id]) | ||
self.consistency_proof_action_ids[ledger_id] = None | ||
|
||
def request_ledger_status_if_needed(self, ledger_id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The suffix _if_needed
in the method name seems to be redundant because cancellation of the scheduled action is not a responsibility of this method itself.
It would be better to name this method reask_for_ledger_status
.
Also at first this method should remove the triggered action from the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
plenum/common/ledger_manager.py
Outdated
nodes = [node for node in self.owner.nodeReg if node not in ledgerInfo.ledgerStatusOk] | ||
self.owner.request_ledger_status_from_nodes(ledger_id, nodes) | ||
|
||
def request_last_CP(self, ledgerId): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to name this method reask_for_last_CP
.
Also at first this method should remove the triggered action from the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to reask_for_last_consistency_proof(). Is it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Signed-off-by: toktar <renata.toktar@dsr-corporation.com>
Changes: