Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node in lagging part of a split may get unexpected untrusted message #1810

jeanphilippeD opened this issue Sep 24, 2019 · 0 comments · Fixed by #1919


Copy link

@jeanphilippeD jeanphilippeD commented Sep 24, 2019

The design of AckMessage requires that nodes are always able to validate trust on reception to avoid having to cache them.

Soak testing showed a very rare failure (30'000 churn tests).
Seed reproduced against fleming at 4310c9c

Summary of investigation in soak test failure:
Reveal a bug in our AckMessage/check trust mechanism when split occurs.
Should be relatively straight forward to mitigate.

Actions to fix it

  • Ensure no message is untrusted: ?Send the proof needed by both side of a split when we detect a split, until we get consensus of AckMessage from both side of the split?
  • Ensure If we receive an AckMessage before we split and that is for the other side of the split, we do not process it (?add target prefix in AckMessage?).
  • Add regression test: better test case that would trigger that bug much much faster.

Detail of investigation, extract from log attached.

What happens 10 split, 100(v24), 101 is not quite there yet.
100 send an AckMessage for 001 at version (001)v24(10:29:01.820723063), and 001 get consensus on it, so nodes that see that consensus will only send proof from (001)v24 and above for message to 100.
001 send an AckMessage for 100 at version (100)v25(10:29:01.820786819).

Elder(a6f083..(10)) receives it. It still has not seen the split, and thinks it is the authority for it, and realize it does not trust it (only trust to (001)v23).
=> This is a break of contract, nodes should be able to assume that all messages send by valid nodes will reach them with needed trust.
==> This could be addressed by ensuring both side of a new split Acked our version before pruning proof we send to either side.
==> (Pb: Could stop prunning for a while if one side of split does not communicate, not an issue, happen rarely and only result in some additional proof for a while).
=> How will that message be processed by the Elder(a6f083..(10)) if it had enough proof and so trust it as it should.
==> Could that message be confused, should it be disabiguated with the destination prefix
==> Probably: negligable cost since message happen rarely (on churn), so saving a few bytes for the destination prefix is not necessary.


@jeanphilippeD jeanphilippeD added this to To Do in BLS Sep 24, 2019
BLS automation moved this from To Do to Done Nov 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
1 participant
You can’t perform that action at this time.