Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potentially incorrect quorum intersection check (Version: 11.4.0) #2267

Closed
andrenarchy opened this issue Sep 7, 2019 · 13 comments · Fixed by #2279
Closed

Potentially incorrect quorum intersection check (Version: 11.4.0) #2267

andrenarchy opened this issue Sep 7, 2019 · 13 comments · Fixed by #2279
Assignees
Labels

Comments

@andrenarchy
Copy link

andrenarchy commented Sep 7, 2019

While implementing the quorum intersection check in Python in stellar-observatory (as part of an SDF-supported academic research project) with @liamsi we noticed a potential issue in the corresponding Stellar Core implementation. It seems to be implemented incorrectly in the part where strongly connected components of the induced dependency/reachability graph are handled:

  • In the paper by Lachowski the "greatest element" or "top" has a special meaning: it is meant with respect to the topological (reachability) order, i.e., the "top" component element cannot "reach" other components. See also the discussion on PaperHive.
  • In Core it seems to use the biggest component (i.e., by the number of elements), see here.

This means that the algorithm would be looking at the wrong strongly connected component if there is a component that is bigger than the top component in the topological order, thus giving wrong results.

@andrenarchy andrenarchy added the bug label Sep 7, 2019
@liamsi
Copy link

liamsi commented Sep 8, 2019

Additionally, the initial implementation, from which the stellar implementation was derived from, also uses the topological order and not the component that contains most elements.

@graydon graydon self-assigned this Sep 9, 2019
@graydon
Copy link
Contributor

graydon commented Sep 13, 2019

@andrenarchy and @liamsi thanks for the catch! This looks easily fixable, though I'm a bit curious -- you refer to the greatest element but the Lachowski paper refers to some greatest element, and indeed reachability is a partial order, there's no reason to assume there would be a single such element. So I think it suffices to just pick any element and then repeatedly follow the reachability relation to get to some greatest element; does that follow your understanding?

Further: given that the algorithm immediately does an early return if there's any quorum in any other SCC at all, and given that almost all the graphs we see in practice have reachability pointing from any smaller SCCs to the largest (by size) SCC -- i.e. the topological and size orders of the SCCs happen to coincide in practice -- it seem to me that this is likely not causing any mis-classifications in the field today.

Of course I'm happy to fix it (am doing so presently) but I just want to make sure I understand the severity of the bug. Does your understanding of the bug correspond to what I've said here?

Thanks again!

@liamsi
Copy link

liamsi commented Sep 13, 2019

So yes, I'm sure @andrenarchy also meant some greatest element (and not the greatest).

Maybe the topological and size orders of the SCCs happen to coincide in practice; but we were wondering if one couldn't one construct a massive (element-counting-wise) component B which does not have quorums completely contained in B (e.g. by making all nodes in B pointing to a node in another smaller component A) and make the checker fail although the rest of the network lives in the element-wise smaller component A and there is overall quorum intersection. Not sure if this could be done in the actual network (flood the network with these B nodes) realistically.

As far as I understand this bug isn't severe anyways: the checker currently only emits a warning even if it finds that the network does not enjoy quorum intersection.

Thank you, too!

@graydon
Copy link
Contributor

graydon commented Sep 14, 2019

Yes I'm certainly interested in doing the right thing here, as you say, one could construct a malicious case that makes it fail (I will try to put such a testcase in my fix!), just double checking the likely practical severity on benign inputs currently.

@graydon
Copy link
Contributor

graydon commented Sep 18, 2019

@liamsi and @andrenarchy I've opened PR #2278 here with a fix, would love an extra pair of eyes on it if you have a moment to review. Thanks again for catching this!

@fixxxedpoint
Copy link

fixxxedpoint commented Sep 18, 2019

@andrenarchy and @liamsi thanks for the catch! This looks easily fixable, though I'm a bit curious -- you refer to the greatest element but the Lachowski paper refers to some greatest element, and indeed reachability is a partial order, there's no reason to assume there would be a single such element. So I think it suffices to just pick any element and then repeatedly follow the reachability relation to get to some greatest element; does that follow your understanding?

Further: given that the algorithm immediately does an early return if there's any quorum in any other SCC at all, and given that almost all the graphs we see in practice have reachability pointing from any smaller SCCs to the largest (by size) SCC -- i.e. the topological and size orders of the SCCs happen to coincide in practice -- it seem to me that this is likely not causing any mis-classifications in the field today.

Of course I'm happy to fix it (am doing so presently) but I just want to make sure I understand the severity of the bug. Does your understanding of the bug correspond to what I've said here?

Thanks again!

You don't need to compute the reachability relation between SCCs. Simply iterate all SCCs and if more than one contains a quorum, then fail. Otherwise there is at most one SCC containing all minimal quorums (if you don't check if quorum slices point to existing nodes, then there might be no quorum at all) and it is the greatest element. some was used to simplify the definition of the algorithm, otherwise there should be additional check verifying that there is the greatest element.

@liamsi
Copy link

liamsi commented Sep 18, 2019

Thanks for the clarification @fixxxedpoint!

I think the main point raised in this issue is that the current implementation always chose the component with the most elements (for the main iteration) which isn't necessarily the greatest element according to the topological order. Sure, if there is quorum intersection there is only one last / greatest / maximal element according to the reachability relation (otherwise there is a split).

@graydon Thank you! I'll have a look at the PR later this week.

@fixxxedpoint
Copy link

fixxxedpoint commented Sep 18, 2019

Thanks for the clarification @fixxxedpoint!

I think the main point raised in this issue is that the current implementation always chose the component with the most elements (for the main iteration) which isn't necessarily the greatest element according to the topological order. Sure, if there is quorum intersection there is only one last / greatest / maximal element according to the reachability relation (otherwise there is a split).

@graydon Thank you! I'll have a look at the PR later this week.

I should actually leave my previous comment in that PR. I think that you are right here - SCC of maximal size is not necessary the greatest element of the topological order of SCCs. I will try to clarify it in the article. Simple example is like you mentioned - huge clique where additionally each member points at some different SCC, but nobody outside of that clique points it. If you further process it with the algorithm, it will immediately exit, since it includes no quorum, but another SCC might include disjoint quorums.

@graydon
Copy link
Contributor

graydon commented Sep 18, 2019

@fixxxedpoint just to make sure I follow you here, why do you think that:

Otherwise there is at most one SCC containing all minimal quorums (if you don't check if quorum slices point to existing nodes, then there might be no quorum at all) and it is the greatest element

? I understand that sorting by reachability is irrelevant to the question of whether there are two or more SCCs with quorums; but I don't understand why "only one SCC has quorums" implies "that SCC is a greatest element in a reachability order".

I also don't quite understand why it matters in the first place that we're looking at a greatest element in reachability order. Any further insight appreciated! I'm happy to rearrange the code to be as simple as possible, just want to make sure we're covering all the important aspects of the problem.

@graydon
Copy link
Contributor

graydon commented Sep 18, 2019

(meanwhile opened #2279 with a simpler fix that just picks the first SCC it finds with a quorum -- failing if there's more than one)

@fixxxedpoint
Copy link

fixxxedpoint commented Sep 18, 2019

@fixxxedpoint just to make sure I follow you here, why do you think that:

Otherwise there is at most one SCC containing all minimal quorums (if you don't check if quorum slices point to existing nodes, then there might be no quorum at all) and it is the greatest element

? I understand that sorting by reachability is irrelevant to the question of whether there are two or more SCCs with quorums; but I don't understand why "only one SCC has quorums" implies "that SCC is a greatest element in a reachability order".

I also don't quite understand why it matters in the first place that we're looking at a greatest element in reachability order. Any further insight appreciated! I'm happy to rearrange the code to be as simple as possible, just want to make sure we're covering all the important aspects of the problem.

In the definition of the algorithm it should be "some maximal element", not some greatest element. Thank you for pointing it. It was used for the sake of simplicity. I think I will change it to something like: check if there is only one SCC containing some quorum; top <- SCC containing some quorum...

Regarding your other question (informal proof):
Assume that configuration is "sane", that is every node is included in some quorum, and that the only SCC (lets call it A) containing a quorum is not the greatest element of the topological order of SCCs. We have now two cases to consider: 1) there is some other SCC B and A and B are not comparable; 2) there is some other SCC B and we have A < B. In the former case, it is easy to notice that for any element of B we can find a quorum for it that doesn't intersect with A - take some quorum for some element of B (configuration is "sane") and remove from it all nodes that are not reachable from that selected element. Moreover, it also means that there is some strongly connected component C != A containing some minimal quorum included in that quorum and so we should find it in the first part of our algorithm. In the latter case, we can similarly prove that for any element of B some of its minimal quorums doesn't intersect with A and so we would find an SCC containing it in the first part of our algorithm. So the only SCC containing some quorum must be the greatest element of the reachability relation.

@graydon
Copy link
Contributor

graydon commented Sep 18, 2019

@fixxxedpoint hmm I think I'm a bit lost in this proof: you say that as a premise A is the only SCC containing a quorum. How can the remaining arguments -- concerning quorums found in B -- follow? There are no quorums in B.

@fixxxedpoint
Copy link

fixxxedpoint commented Sep 18, 2019

@fixxxedpoint hmm I think I'm a bit lost in this proof: you say that as a premise A is the only SCC containing a quorum. How can the remaining arguments -- concerning quorums found in B -- follow? There are no quorums in B.

It's not a quorum found in B. It is some quorum found for some element of B, which is not necessarily included in B (it might even intersect with A before we remove "unnecessary" elements). Then we prove there is some quorum that doesn't intersect with A. Like every quorum, it contains some minimal quorum. Such minimal quorum might not even include that element of B from which we started. Next, we prove that minimal quorum must be included in some SCC that is != A (and it might be different from B as well).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants