merkle_tree of checkpoints for datachain #174

Merged
merged 1 commit into maidsafe:master on Sep 2, 2016

3 participants

@maqi
Member
maqi commented Aug 31, 2016 (edited)

This change is Reviewable

@afck
Member
afck commented Aug 31, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


text/0029-data-chains.md/0029-data-chains.md, line 334 [r1] (raw file):

3. Once all hashes from all groups received, each group calculates the merkle-tree, then broadcast
the root hash to the network

But we don't need the hashes from all groups to compute the Merkle tree's root hash! That would be an amount of data growing linearly with the network size, where O(log(n)) should be achievable:

E.g. nodes in group 000 would:

  • exchange their latest link's hash with 001 and vice versa, and compute the hash of those two hashes (call it hash 00),
  • exchange hash 00 with their bucket-1 contacts (probably 010), who share their 01-hash (probably computed from 010 and 011, but we don't need to know that) with us, so we can both compute the 0-hash,
  • exchange the 0-hash with our bucket-0 contacts, get the 1-hash, and compute the current network's root hash.

Then every node in the network has the root hash and all the partial hashes that lead to their latest link. So every node has an O(log n) proof for its identity that every other node in the network can verify.

Specifically, if I (a member of 000) want to prove my identity to any other node in the network that has the root hash, I just need to provide the 0- and 1-hash, so it can verify that the root is the hash of those two. Then I provide the 00- and 01-hashes, so it can verify that the 0-hash is the hash of those two. Then I provide the 001- and 000-hashes, and finally I provide my link block (i.e. the list of my group members), so the other node can verify that the 000-hash is actually the hash of that group.
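
As a rough illustration of that proof check (not something the RFC specifies), here is a minimal Rust sketch. Assumptions are mine: SHA-256 via the `sha2` crate stands in for whatever hash the network actually uses, and all names are hypothetical.

```rust
// Minimal sketch only. Assumptions (not from the RFC): SHA-256 via the `sha2`
// crate stands in for the network's actual hash; the prover supplies the
// sibling hashes leaf-to-root (001-hash, 01-hash, 1-hash) plus which side its
// own hash sits on at each level.
use sha2::{Digest, Sha256};

type Hash = [u8; 32];

/// Hash two child hashes into their parent, e.g. hash(000-hash, 001-hash) = 00-hash.
fn combine(left: &Hash, right: &Hash) -> Hash {
    let mut hasher = Sha256::new();
    hasher.update(left);
    hasher.update(right);
    hasher.finalize().into()
}

/// One level of the proof: the sibling hash and whether our own hash is the left child.
struct ProofStep {
    sibling: Hash,
    we_are_left: bool,
}

/// Check that `link_block_hash` (the hash of the prover's link block, i.e. its
/// group member list) is covered by `root`, walking the proof leaf-to-root.
fn verify_identity_proof(root: &Hash, link_block_hash: &Hash, proof: &[ProofStep]) -> bool {
    let mut current = *link_block_hash;
    for step in proof {
        current = if step.we_are_left {
            combine(&current, &step.sibling)
        } else {
            combine(&step.sibling, &current)
        };
    }
    current == *root
}
```

A node in 000 would hand over three sibling hashes plus its link block; the verifier needs nothing beyond the root hash it already holds.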


Comments from Reviewable

@afck
Member
afck commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 4 unresolved discussions.


text/0029-data-chains.md/0029-data-chains.md, line 332 [r2] (raw file):

A datachain of the group `locks` the node and data status of that group along the timeline. But this
still leave the risk that a whole forged blockchain cannot be detected by the network. A merkle-tree

"blockchain" -> "data chain"?


text/0029-data-chains.md/0029-data-chains.md, line 336 [r2] (raw file):

such forged chain easily. Once [DisjointGroup] is deployed, such merkle-tree can be computed as:

1. Every fixed interval (say one hour), each group computes a hash of all the checkpoints generated

I don't think it even needs to take all the checkpoints. Wouldn't the latest link block suffice?
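
Just to make the quoted step concrete, a minimal sketch of what "a hash of all the checkpoints generated" in one interval could look like; SHA-256 via the `sha2` crate and the serialized-checkpoint input are my assumptions, not something the RFC specifies:

```rust
// Hedged sketch only; `sha2`/SHA-256 and the serialized-checkpoint input are
// assumptions, not something the RFC specifies.
use sha2::{Digest, Sha256};

/// Fold every checkpoint a group generated during one interval into the
/// group's leaf hash for the merkle-tree.
fn group_interval_hash(checkpoints: &[Vec<u8>]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    for checkpoint in checkpoints {
        hasher.update(checkpoint);
    }
    hasher.finalize().into()
}
```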


text/0029-data-chains.md/0029-data-chains.md, line 347 [r2] (raw file):

4. To verify the computed merkle-tree is correct, the root hash needs to be published to the network
(or just contact the furthest group knows, which has the highest chance of mismatch, to reduce the
messages transferred).

So this is just a check that nothing got out of sync, is it? Because every group should already have computed the same root hash by itself?


text/0029-data-chains.md/0029-data-chains.md, line 349 [r2] (raw file):

messages transferred).

Such merkle-trees (snapshots of network) prove the validatity of datachains when restoring data

"validity"


Comments from Reviewable

@maqi
Member
maqi commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 4 unresolved discussions.


text/0029-data-chains.md/0029-data-chains.md, line 336 [r2] (raw file):

Previously, afck (Andreas Fackler) wrote…

I don't think it even needs to take all the checkpoints. Wouldn't the latest link block suffice?

Not sure here. As I understand it, a link covers some checkpoints. That may cause two issues: 1. there might be multiple links during an interval, so only the `latest` might not be enough; 2. the link might be too long, which would cause a wider range of data loss in case of a hash mismatch.

Anyway, we can easily define anything to be locked by the merkle tree later on, so I would prefer to stay with the current wording.


text/0029-data-chains.md/0029-data-chains.md, line 347 [r2] (raw file):

Previously, afck (Andreas Fackler) wrote…

So this is just a check that nothing got out of sync, is it? Because every group should already have computed the same root hash by itself?

yes.

Comments from Reviewable

@afck
Member
afck commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


text/0029-data-chains.md/0029-data-chains.md, line 336 [r2] (raw file):

Previously, maqi wrote…

Not sure here. As I understand it, a link covers some checkpoints. That may cause two issues:
1. there might be multiple links during an interval, so only the latest might not be enough;
2. the link might be too long, which would cause a wider range of data loss in case of a hash mismatch.

Anyway, we can easily define anything to be locked by the merkle tree later on, so I would prefer to stay with the current wording.

OK. Not sure I understand what exactly "checkpoint" means in this context.

Comments from Reviewable

@dirvine
Member
dirvine commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


text/0029-data-chains.md/0029-data-chains.md, line 336 [r3] (raw file):

such forged chain easily. Once [DisjointGroup] is deployed, such merkle-tree can be computed as:

1. Every fixed interval (say one hour), each group computes a hash of all the checkpoints generated

I would add that this interval should be event based and not time based. Then we should be good to merge this further-work section. If we decided to go this route, it would be a separate RFC, so maybe worth mentioning that as well?


Comments from Reviewable

@maqi
Member
maqi commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


text/0029-data-chains.md/0029-data-chains.md, line 336 [r2] (raw file):

Previously, afck (Andreas Fackler) wrote…

OK. Not sure I understand what exactly "checkpoint" means in this context.

To me, this checkpoint means something that needs to be locked not only along the timeline (by the data chain) but also cross-locked with the other chains. It can be a link, data or anything (combined or just one).

text/0029-data-chains.md/0029-data-chains.md, line 336 [r3] (raw file):

Previously, dirvine (David Irvine) wrote…

I would add that this interval should be event based and not time based. Then we should be good to merge this further-work section. If we decided to go this route, it would be a separate RFC, so maybe worth mentioning that as well?

The problem with event based is synchronization among data chains, as each group has its own event sequence. Also, when trying to validate a checkpoint, event based requires an explicit merkle-tree snapshot index to be recorded for each checkpoint, whereas with interval based the index can be deduced.
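
As a hedged illustration of the deduction point (the constant and names are hypothetical, not from the RFC): with a fixed interval, the snapshot a checkpoint belongs to follows from its timestamp alone, whereas an event-based scheme would have to record the index explicitly.

```rust
// Hypothetical sketch; the constant and names are illustrative only.
const SNAPSHOT_INTERVAL_SECS: u64 = 3_600; // "say one hour"

/// Deduce which merkle-tree snapshot covers a checkpoint created at
/// `timestamp_secs` (seconds since some agreed epoch).
fn snapshot_index(timestamp_secs: u64) -> u64 {
    timestamp_secs / SNAPSHOT_INTERVAL_SECS
}
```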

Comments from Reviewable

@dirvine
Member
dirvine commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


text/0029-data-chains.md/0029-data-chains.md, line 336 [r3] (raw file):

Previously, maqi wrote…

The problem with event based is synchronization among data chains, as each group has its own event sequence.
Also, when trying to validate a checkpoint, event based requires an explicit merkle-tree snapshot index to be recorded for each checkpoint, whereas with interval based the index can be deduced.

Yes, this is why we would present a full RFC for this further work. IMHO time is always simple but rarely correct in event-driven systems. Checkpoints should not change frequently unless we had a high split/merge frequency; to me that would be a design error.

Comments from Reviewable

@maqi
Member
maqi commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


text/0029-data-chains.md/0029-data-chains.md, line 336 [r3] (raw file):

Previously, dirvine (David Irvine) wrote…

Yes, this is why we would present a full RFC for this further work. IMHO time is always simple but rarely correct in event-driven systems. Checkpoints should not change frequently unless we had a high split/merge frequency; to me that would be a design error.

I agree a full RFC shall be raised to detail this further work. However, I think the vague word `interval` should be OK here to present the concept. `time interval` is just used as a simple example, which I think should be fine here?

Comments from Reviewable

@dirvine
Member
dirvine commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 1 unresolved discussion.


text/0029-data-chains.md/0029-data-chains.md, line 336 [r3] (raw file):

Previously, maqi wrote…

I agree a full RFC shall be raised to detail this further work. However, I think the vague word of interval should be OK here to present the concept. time interval is just used as a simple example, which I think should be fine here?

Time further specifies the interval type, and right now I don't think we should do that, though.

Comments from Reviewable

@dirvine
Member
dirvine commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions.


text/0029-data-chains.md/0029-data-chains.md, line 334 [r1] (raw file):

Previously, afck (Andreas Fackler) wrote…

But we don't need the hashes from all groups to compute the Merkle tree's root hash! That would be an amount of data growing linearly with the network size, where O(log(n)) should be achievable:

E.g. nodes in group 000 would:

  • exchange their latest link's hash with 001 and vice versa, and compute the hash of those two hashes (call it hash 00),
  • exchange hash 00 with their bucket-1 contacts (probably 010), who share their 01-hash (probably computed from 010 and 011, but we don't need to know that) with us, so we can both compute the 0-hash,
  • exchange the 0-hash with our bucket-0 contacts, get the 1-hash, and compute the current network's root hash.

Then every node in the network has the root hash and all the partial hashes that lead to their latest link. So every node has an O(log n) proof for its identity that every other node in the network can verify.

Specifically, if I (a member of 000) want to prove my identity to any other node in the network that has the root hash, I just need to provide the 0- and 1-hash, so it can verify that the root is the hash of those two. Then I provide the 00- and 01-hashes, so it can verify that the 0-hash is the hash of those two. Then I provide the 001- and 000-hashes, and finally I provide my link block (i.e. the list of my group members), so the other node can verify that the 000-hash is actually the hash of that group.

@maqi ^^

Comments from Reviewable

@maqi
Member
maqi commented Sep 2, 2016

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions.


text/0029-data-chains.md/0029-data-chains.md, line 334 [r1] (raw file):

Previously, dirvine (David Irvine) wrote…

@maqi ^^

yes, the doc has been updated according to this comment.

Comments from Reviewable

@dirvine
Member
dirvine commented Sep 2, 2016

Reviewed 1 of 1 files at r3.
Review status: 0 of 1 files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@dirvine
Member
dirvine commented Sep 2, 2016

Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, all discussions resolved.


Comments from Reviewable

@dirvine dirvine merged commit 9288be0 into maidsafe:master Sep 2, 2016

1 check passed

code-review/reviewable 1 file reviewed