New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve transitive quorum set logic, persist it into the database #2045
Improve transitive quorum set logic, persist it into the database #2045
Conversation
|
||
auto vv = makeValue(1); | ||
|
||
auto checkInQuorum = [&](bool checkSelf, std::set<int> ids) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkSelf is always true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good catch, fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handful of clarification comments and a couple perf/correctness bits. Overall looks very good though! I'm happy to be killing off the existing calculation of this! It's a serious bottleneck in the SCP-receive path.
src/herder/PendingEnvelopes.cpp
Outdated
res = (r != SCP::TB_FALSE); | ||
|
||
mNodesInQuorum.put(node, res); | ||
mQuorumTracker.rebuild([&](NodeID const& id) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice for readability to put a return type on this lambda and a comment on the call to rebuild summarizing the logic in the lambda (I can read it off but its purpose and correctness takes a few seconds).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
src/herder/QuorumTracker.h
Outdated
class QuorumTracker : public NonMovableOrCopyable | ||
{ | ||
SCP& mSCP; | ||
std::unordered_map<NodeID, SCPQuorumSetPtr> mQuorum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment here: "mQuorum encodes our best current knowledge about the transitive closure of the local node's quorum set (qset). The keys of mQuorum are the nodes that we think are in that transitive closure, and the associated values are either that node's own qset or else nullptr if we do not know / have not heard the definition of that qset yet. The map is incrementally expanded as we hear new messages containing references to new qsets. If a node ever changes its qset -- that is, if we hear a message from a node referencing a qset that is different from the qset for that node stored in mQuorum -- we flush and rebuild mQuorum."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
bool isNodeInQuorum(NodeID const& id); | ||
|
||
// attempts to expand quorum at node `id` | ||
// returns true if expansion succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define success and failure of expansion here, and obligation to call rebuild when expansion fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
} | ||
|
||
bool | ||
QuorumTracker::expand(NodeID const& id, SCPQuorumSetPtr qSet) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider flushing cache in this method also? failure to expand is a "this data structure is invalid and needs to be rebuilt" condition, AIUI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am letting this up to the caller (added clarification to the contract of expand)
*qSet, [&](NodeID const& id) { backlog.insert(id); }); | ||
} | ||
mQuorum[n] = qSet; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a third condition you're not handling here, where it != mQuorum.end(), and it->second != nullptr. Implicitly I get that that's a no-op case if it->second == qSet (as I think it generally ought to be here?), but if it's not equal to qSet, you've got a problem. Either a logic error in lookup or maybe a data inconsistency somewhere? Suggest asserting in an else here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to return false
in this case (which is already the case):
at this layer, it's possible that we would go from "this node has a quorum X" but we're told that from now on "node has quorum Y" (Y !=X and Y null or not, doesn't matter), I already deal with the case where X == Y explicitely.
Resolution for caller is to call rebuild
when this happens
src/herder/PendingEnvelopes.h
Outdated
@@ -139,6 +139,8 @@ class PendingEnvelopes | |||
// returns true if we think that the node is in quorum | |||
bool isNodeInQuorum(NodeID const& node); | |||
|
|||
std::unordered_map<NodeID, SCPQuorumSetPtr> const& getCurrentQuorum() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As with comment on QuorumTracker's mQuorum member, should comment this map to explain its contents (what the keys mean, the presence of nullptr values in it and so forth). Or make a utility type in QuorumTracker that encapsulates this representation. Or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
std::string qSetHHex(binToHex(qSetH)); | ||
|
||
auto prep = db.getPreparedStatement( | ||
"UPDATE quoruminfo SET qsethash = :h WHERE nodeid = :id"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're on a version of sql that supports upserts (INSERT ... ON CONFLICT SET ...) now, if you want to simplify this a bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not too worried about this one yet, and sqlite doesn't support it
} | ||
|
||
SCPQuorumSetPtr | ||
HerderPersistence::getQuorumSet(Database& db, soci::session& sess, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a separate-commits-compile-on-their-own perspective, I think you are calling this function in the commit before this one that defines it. I don't care too much but if you want to clean up / rebase-edit, might want to fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
std::vector<uint8_t> qSetBytes; | ||
decoder::decode_b64(qset64, qSetBytes); | ||
|
||
xdr::xdr_get g1(&qSetBytes.front(), &qSetBytes.back() + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought xdr_{to,from}_opaque should handle this a little less-clunkily?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this was from the old code
fixed
if (res == nullptr) | ||
{ | ||
// see if we had some information for that node | ||
auto& db = mApp.getDatabase(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to me like it risks doing a SQL query on every SCP message (including those from strangers), even when they're returning null. I think we probably need a little more caching of the results here, both positive and negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it won't:
rebuild is only happening when we update quorum information which only happens when we actually process an SCP message, and we only process SCP messages for nodes in the known transitive quorum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so only known people -- but is it still potentially an awful lot of messages?
142616b
to
66962b0
Compare
66962b0
to
1dba743
Compare
r+ 1dba743ada250e0fc7205ce4c64a56663fb6bd1b |
1dba743
to
24f14e1
Compare
had to fix some tests that have some hacky setup: worked around it for now by adding a safeguard when we don't know the local node's quorum set |
r+ 24f14e1 |
24f14e1
to
55d9b18
Compare
missed a few more tests that had bogus setup |
2 similar comments
missed a few more tests that had bogus setup |
missed a few more tests that had bogus setup |
r+ 55d9b18 |
Improve transitive quorum set logic, persist it into the database Reviewed-by: MonsieurNicolas
Description
This PR makes a couple changes to how we deal with the transitive quorum: