-
Notifications
You must be signed in to change notification settings - Fork 969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potentially incorrect quorum intersection check (Version: 11.4.0) #2267
Comments
Additionally, the initial implementation, from which the stellar implementation was derived from, also uses the topological order and not the component that contains most elements. |
@andrenarchy and @liamsi thanks for the catch! This looks easily fixable, though I'm a bit curious -- you refer to the greatest element but the Lachowski paper refers to some greatest element, and indeed reachability is a partial order, there's no reason to assume there would be a single such element. So I think it suffices to just pick any element and then repeatedly follow the reachability relation to get to some greatest element; does that follow your understanding? Further: given that the algorithm immediately does an early return if there's any quorum in any other SCC at all, and given that almost all the graphs we see in practice have reachability pointing from any smaller SCCs to the largest (by size) SCC -- i.e. the topological and size orders of the SCCs happen to coincide in practice -- it seem to me that this is likely not causing any mis-classifications in the field today. Of course I'm happy to fix it (am doing so presently) but I just want to make sure I understand the severity of the bug. Does your understanding of the bug correspond to what I've said here? Thanks again! |
So yes, I'm sure @andrenarchy also meant some greatest element (and not the greatest). Maybe the topological and size orders of the SCCs happen to coincide in practice; but we were wondering if one couldn't one construct a massive (element-counting-wise) component B which does not have quorums completely contained in B (e.g. by making all nodes in B pointing to a node in another smaller component A) and make the checker fail although the rest of the network lives in the element-wise smaller component A and there is overall quorum intersection. Not sure if this could be done in the actual network (flood the network with these B nodes) realistically. As far as I understand this bug isn't severe anyways: the checker currently only emits a warning even if it finds that the network does not enjoy quorum intersection. Thank you, too! |
Yes I'm certainly interested in doing the right thing here, as you say, one could construct a malicious case that makes it fail (I will try to put such a testcase in my fix!), just double checking the likely practical severity on benign inputs currently. |
@liamsi and @andrenarchy I've opened PR #2278 here with a fix, would love an extra pair of eyes on it if you have a moment to review. Thanks again for catching this! |
You don't need to compute the reachability relation between SCCs. Simply iterate all SCCs and if more than one contains a quorum, then fail. Otherwise there is at most one SCC containing all minimal quorums (if you don't check if quorum slices point to existing nodes, then there might be no quorum at all) and it is the greatest element. some was used to simplify the definition of the algorithm, otherwise there should be additional check verifying that there is the greatest element. |
Thanks for the clarification @fixxxedpoint! I think the main point raised in this issue is that the current implementation always chose the component with the most elements (for the main iteration) which isn't necessarily the greatest element according to the topological order. Sure, if there is quorum intersection there is only one last / greatest / maximal element according to the reachability relation (otherwise there is a split). @graydon Thank you! I'll have a look at the PR later this week. |
I should actually leave my previous comment in that PR. I think that you are right here - SCC of maximal size is not necessary the greatest element of the topological order of SCCs. I will try to clarify it in the article. Simple example is like you mentioned - huge clique where additionally each member points at some different SCC, but nobody outside of that clique points it. If you further process it with the algorithm, it will immediately exit, since it includes no quorum, but another SCC might include disjoint quorums. |
@fixxxedpoint just to make sure I follow you here, why do you think that:
? I understand that sorting by reachability is irrelevant to the question of whether there are two or more SCCs with quorums; but I don't understand why "only one SCC has quorums" implies "that SCC is a greatest element in a reachability order". I also don't quite understand why it matters in the first place that we're looking at a greatest element in reachability order. Any further insight appreciated! I'm happy to rearrange the code to be as simple as possible, just want to make sure we're covering all the important aspects of the problem. |
(meanwhile opened #2279 with a simpler fix that just picks the first SCC it finds with a quorum -- failing if there's more than one) |
In the definition of the algorithm it should be "some maximal element", not some greatest element. Thank you for pointing it. It was used for the sake of simplicity. I think I will change it to something like: check if there is only one SCC containing some quorum; top <- SCC containing some quorum... Regarding your other question (informal proof): |
@fixxxedpoint hmm I think I'm a bit lost in this proof: you say that as a premise |
It's not a quorum found in |
While implementing the quorum intersection check in Python in stellar-observatory (as part of an SDF-supported academic research project) with @liamsi we noticed a potential issue in the corresponding Stellar Core implementation. It seems to be implemented incorrectly in the part where strongly connected components of the induced dependency/reachability graph are handled:
This means that the algorithm would be looking at the wrong strongly connected component if there is a component that is bigger than the top component in the topological order, thus giving wrong results.
The text was updated successfully, but these errors were encountered: