RFC: Local snapshot file format #25

Merged
merged 21 commits into from
Dec 15, 2021

luca-moser
Member

@luca-moser luca-moser commented Aug 26, 2020

Contributor

@GalRogozinski GalRogozinski left a comment

If nodes write the data straight to a file and bypass the DB, they will miss out on all the automatic feature goodies a DB brings (such as integrity checks and rollbacks)...

Maybe it is a good idea to mention that it is suggested to save the data to a db and extract the information from it to create the file?


All types are serialized in little-endian and occur in the sequence of the rows defined below. Local snapshot files are compressed via zlib to further reduce size.

SEP = solid entry point. `Array[T]` are prefixed with a varint denoting the length.
Contributor

Maybe add a one-line explanation of what an SEP is?
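
To make the quoted serialization rules concrete, here is a minimal Python sketch of a varint-prefixed array of little-endian `uint64` values that is then compressed with zlib. The unsigned LEB128 varint and the helper names (`encode_varint`, `decode_varint`, `encode_u64_array`, `decode_u64_array`) are assumptions for illustration only; the quoted text says "varint" without pinning down the exact encoding.

```python
import struct
import zlib
from typing import List, Tuple


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128 varint (assumed encoding, not confirmed by the RFC text)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_u64_array(values: List[int]) -> bytes:
    """Length as varint, then each element as little-endian uint64."""
    return encode_varint(len(values)) + b"".join(struct.pack("<Q", v) for v in values)


def decode_u64_array(buf: bytes, pos: int = 0) -> Tuple[List[int], int]:
    count, pos = decode_varint(buf, pos)
    values = [struct.unpack_from("<Q", buf, pos + 8 * i)[0] for i in range(count)]
    return values, pos + 8 * count


raw = encode_u64_array([1, 300, 2**40])
compressed = zlib.compress(raw)  # the whole file would be compressed like this
restored, _ = decode_u64_array(zlib.decompress(compressed))
```

The round trip through `zlib.compress` and `zlib.decompress` mirrors the statement that the whole file is compressed after serialization; a real implementation would stream instead of holding everything in memory.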

</tr>
<tr>
<td>Version</td>
<td>byte</td>
Contributor

Version fields are specified as varints in all other RFCs.

luca-moser and others added 5 commits October 26, 2020 11:54
…format.md

Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>
…format.md

Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>
…format.md

Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>
…format.md

Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>
…format.md

Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>
Contributor

@GalRogozinski GalRogozinski left a comment

Few points/questions besides the comments:

  • I am not sure I actually understand your ideas completely
    In my imagination there should be one static snapshot file that represents the global snapshot and another one that just has the diff from the global that shouldn't change. I understand that you want people to share both files?
  • I suppose the node will only generate the diff file upon some api calls, or every interval according to configuration? Because it probably will be better to usually update data in db that takes care of edge cases for data dumps?
  • Maybe it is time we start thinking about an authentication mechanism for shared snapshot files. Something that always bothered me is that people can just start spreading cooked snapshot files around. Now that we are redefining the snapshot file maybe it is a good idea to think of some solution! Maybe add a signature field to it? And the node can verify it against a configurable public key? (just the simplest idea in my mind)

* delete old transaction data below a given milestone.

Current node implementations use a [local snapshot file format](https://github.com/iotaledger/iri-ls-sa-merger/tree/351020d3b5e342b6e9a41f2868575ab7ff8c251c#generating-an-export-file-from-a-localsnapshots-db) which only works with account based ledgers. For Chrysalis Phase 2 this file format has to be adapted to support a UTXO based ledger.

Contributor

Maybe add something about:
"Standardizing one single format across different node implementations"

I guess otherwise nodes could just share their dbs instead of files


### Formats

> All types are serialized in little-endian
Contributor

how is non-numerical data encoded?

![](https://i.imgur.com/bt5BUpe.png)

A delta ledger state local snapshot is denoted by the type byte `1`:

Contributor

Even though I agree a node should keep track of the diff at every milestone, should the snapshot file keep track of it?
I would just expect a diff from the base

Contributor

I think it makes sense to include information at every milestone actually: especially given the fact that it will be a good thing to introduce MS data as well (proof of inclusion specifically).


# Drawbacks

Nodes need to support this new format.
Contributor

I guess you had nothing to fill in :-) ?

Maybe:

  1. We have 2 different snapshot files which may be confusing
  2. The size of the file will grow as the number of outputs increases

@luca-moser
Member Author

@GalRogozinski

It is up to the nodes where they get the data for the snapshot files from, but they will mainly read it from the database. Have a look at Hornet if you're interested in an implementation.

> I am not sure I actually understand your ideas completely
> In my imagination there should be one static snapshot file that represents the global snapshot and another one that just has the diff from the global that shouldn't change.

That is what this RFC describes: two files. One contains a full ledger state and the other contains only the deltas after that state. A global snapshot and a full snapshot are equivalent.

> I understand that you want people to share both files?

You need to share both files if a node needs to bootstrap for the first time, as the delta files of course do not contain all the information needed for a node to bootstrap.

> I suppose the node will only generate the diff file upon some api calls, or every interval according to configuration? Because it probably will be better to usually update data in db that takes care of edge cases for data dumps?

This is an implementation detail. Nodes will probably create these files upon request (triggered by an API call) and/or at a defined interval. Please check out the Hornet code if you're interested in the implementation details.

> Maybe it is time we start thinking about an authentication mechanism for shared snapshot files. Something that always bothered me is that people can just start spreading cooked snapshot files around. Now that we are redefining the snapshot file maybe it is a good idea to think of some solution! Maybe add a signature field to it? And the node can verify it against a configurable public key? (just the simplest idea in my mind)

A node operator should download the snapshot files from a trusted source. I believe if you really want to ensure the data is correct, you'd have to expand the mechanism to compare multiple snapshot files from different sources.

Perhaps a tool would work in which a user defines a list of nodes, requests snapshot data from each of them, and then compares the results against each other.
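
As a rough illustration of such a comparison tool, the following Python sketch hashes snapshot data obtained from several sources and reports which sources disagree with the most common digest. The function names and the source labels are hypothetical, fetching the data is left out, and the sketch assumes that honest nodes produce byte-identical files for the same milestone, which this thread does not guarantee. Agreement among sources is only a sanity check, not a proof of correctness.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def snapshot_digest(data: bytes) -> str:
    """SHA-256 digest of the raw snapshot bytes (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def compare_snapshots(snapshots: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Return the most common digest and the sources that disagree with it."""
    if not snapshots:
        raise ValueError("no snapshots to compare")
    digests = {source: snapshot_digest(data) for source, data in snapshots.items()}
    majority, _count = Counter(digests.values()).most_common(1)[0]
    outliers = sorted(source for source, d in digests.items() if d != majority)
    return majority, outliers


# Hypothetical data: two sources agree, one serves a different file.
downloaded = {
    "node-a": b"snapshot-bytes",
    "node-b": b"snapshot-bytes",
    "node-c": b"tampered-bytes",
}
majority_digest, disagreeing = compare_snapshots(downloaded)
```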

In any case, faulty snapshots will cause the nodes to crash if the Merkle inclusion proofs are off.

@GalRogozinski
Contributor

Thanks for clarifying

What worries me about people that can create their own full snapshot files (rather than having a static global snapshot file) is that they can be mixed up with the wrong delta files. From what we had in the past with spent-addresses this is what I expect to happen.

> In any case, faulty snapshots will cause the nodes to crash if the inclusion Merkle proof are off.

Hmm, I think you meant that this is what will happen after the snapshot is loaded and fresh milestones come in, thus making the situation the same as before. However, what if we add the entire milestone (it is 560 bytes, assuming it has 3 signatures) to the diff file?
Then a syncing node can validate the milestones and the proofs of inclusion to make sure the ledger state is indeed correct.
If I am not mistaken this will add ~1.76 GB to the file per year. In my opinion it is a small price to pay and it is much better than the other alternatives we mentioned here.
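
The estimate can be reproduced with a few lines of Python. The 560-byte milestone size comes from the comment above; the 10-second milestone interval is an assumption that is not stated in this thread, chosen because it yields a figure close to the quoted one.

```python
MILESTONE_SIZE_BYTES = 560       # from the comment above (3 signatures)
MILESTONE_INTERVAL_SECONDS = 10  # assumption, not stated in this thread

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
milestones_per_year = SECONDS_PER_YEAR // MILESTONE_INTERVAL_SECONDS
bytes_per_year = milestones_per_year * MILESTONE_SIZE_BYTES

print(f"{bytes_per_year / 1e9:.2f} GB per year")  # prints: 1.77 GB per year
```

Under these assumptions the overhead is 1,766,016,000 bytes per year before compression, in line with the rough ~1.76 GB figure.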

Contributor

@charlesthompson3 charlesthompson3 left a comment

Here are a few very minor edits, mostly for grammar and clarity; please let me know if you have any questions about them!

-Charles


Since a UTXO based ledger is much larger in size, this RFC proposes two formats for snapshot files:
* A `full` format which represents a complete ledger state.
* A `delta` format which only contains diffs (consumed and spent outputs) of milestones from a given milestone index onwards.
Contributor

> consumed and spent

So far the only output type that can be consumed is a TX, which implies spending those funds. I would remove the "consumed" term as it implies that there are already outputs that can be consumed without spending, such as Coordicide's mana pledges.


This separation allows nodes to swiftly create new delta snapshot files, which then can be distributed with a companion full snapshot file to reconstruct a recent state.

Unlike the current format, these new formats do not include spent addresses since this information is no longer held by nodes.
Contributor

The reason for removing them is the discontinuation of WOTS (if we finally come to an agreement in that regard).


![](https://i.imgur.com/e6WuufK.png)

While the node producing such a full ledger state snapshot could theoretically pre-compute the actual snapshot milestone state, this is deferred to the consumer of the data to speed up local snapshot creation.
Contributor

I am starting to get a bit confused about the terminology: "actual snapshot milestone" == "seps_milestone_index"?

address_type<byte>
ed25519_address<array[32]>
value<uint64>
diffs<array[diffs_count]>:
Contributor

There is no specific ordering of diffs enforced here. You can have a diff for milestone X followed by a diff for milestone X+100. For the sake of integrity and validation, it would be nicer to have an array[ledger_milestone_index - seps_milestone_index] of an array of diffs between milestone X and X+1, sorted by X, starting from seps_milestone_index and ending on ledger_milestone_index.
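
A reader of the file could enforce the suggested ordering with a check along these lines. This is a sketch only: `validate_diff_order` is a hypothetical helper, and each diff is identified by the milestone X it starts from (the diff between X and X+1), which is one possible reading of the suggestion.

```python
from typing import List


def validate_diff_order(
    diff_start_indices: List[int],
    seps_milestone_index: int,
    ledger_milestone_index: int,
) -> None:
    """Raise ValueError unless there is exactly one diff per step X -> X+1,
    sorted by X, for X from seps_milestone_index to ledger_milestone_index - 1."""
    if ledger_milestone_index < seps_milestone_index:
        raise ValueError("ledger_milestone_index is below seps_milestone_index")
    expected = list(range(seps_milestone_index, ledger_milestone_index))
    if diff_start_indices != expected:
        raise ValueError(
            f"diffs are not contiguous and sorted: expected {len(expected)} diffs "
            f"starting at {seps_milestone_index}, got {diff_start_indices}"
        )


# Three diffs covering 100 -> 101, 101 -> 102 and 102 -> 103 pass the check.
validate_diff_order([100, 101, 102], seps_milestone_index=100, ledger_milestone_index=103)
```

With this layout the number of diffs equals `ledger_milestone_index - seps_milestone_index`, so a gap or a duplicate is detected before any diff is applied.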


* Is all the information needed to start up a node from the local snapshot available in the described format?
* Can we get rid of the spent addresses or do we still need to keep them?
* Do we need to account for different types of outputs already? (we currently only have them deposit to addresses)
Contributor

I am afraid so... #32 already introduces a new output type that is crucial for validation.


# Rationale and alternatives

* In conjunction with a companion full snapshot, a tool or node can "truncate" the data from a delta snapshot back to a single full snapshot. In that case, the `ledger_milestone_index` and `seps_milestone_index` would be the same. In the example above, given the full and delta snapshots, one could produce a new full snapshot for milestone 1350.
Contributor

Here you mean that, after the truncation, the seps_milestone_index of the delta will be equal to the ledger_milestone_index of the full snapshot. Correct?
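
The "truncation" discussed here can be pictured as replaying the milestone diffs of a delta snapshot on top of the ledger state of a full snapshot. The Python sketch below uses a deliberately simplified model: the `MilestoneDiff` type, the `apply_deltas` function, and the representation of an output as an `(address, value)` tuple are all hypothetical and are not taken from the RFC.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Output = Tuple[str, int]  # (address, value); simplified stand-in for a real output


@dataclass
class MilestoneDiff:
    milestone_index: int
    created: Dict[str, Output] = field(default_factory=dict)  # output ID -> output
    consumed: List[str] = field(default_factory=list)         # output IDs


def apply_deltas(
    ledger: Dict[str, Output], ledger_index: int, diffs: List[MilestoneDiff]
) -> Tuple[Dict[str, Output], int]:
    """Replay diffs in order and return the new ledger state and its index."""
    state = dict(ledger)  # do not mutate the caller's state
    index = ledger_index
    for diff in diffs:
        if diff.milestone_index != index + 1:
            raise ValueError(f"expected diff for milestone {index + 1}")
        # Add created outputs first so an output created and consumed
        # within the same milestone is handled correctly.
        for output_id, output in diff.created.items():
            if output_id in state:
                raise ValueError(f"output {output_id} already exists")
            state[output_id] = output
        for output_id in diff.consumed:
            if output_id not in state:
                raise ValueError(f"unknown consumed output {output_id}")
            del state[output_id]
        index = diff.milestone_index
    return state, index


full = {"out-a": ("addr-1", 10)}
delta = [MilestoneDiff(2, created={"out-b": ("addr-2", 10)}, consumed=["out-a"])]
new_full, new_index = apply_deltas(full, 1, delta)
```

The result of `apply_deltas` is what a new full snapshot at the later milestone would contain; the total value stays constant as long as every diff is balanced.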

@Wollac Wollac added the discussion needed RFC has pending protocol team discussions label Jan 6, 2021
@luca-moser
Member Author

We need to change the snapshot file format to the following:

  • Include the entire milestone per diff
  • The Merkle root will be computed from the output IDs instead of the message IDs
  • The outputs within the snapshot file have to be in the same order as they were for the Merkle root computation

We switch away from message IDs, so one can compute whether the outputs within a milestone diff actually correspond to the milestone for that cone.

@luca-moser luca-moser force-pushed the local-snapshot-file-format branch 2 times, most recently from 6b0169f to 2535b7f Compare January 18, 2021 07:45
consumed_outputs<array>:
message_hash<array[32]>
transaction_hash<array[32]>
output_index<uint16>
Member

I think this is missing the target TransactionId.

@lzpap lzpap merged commit 500c750 into iotaledger:main Dec 15, 2021
@lzpap
Member

lzpap commented Dec 15, 2021

merged due to TIP refactor

lzpap added a commit that referenced this pull request Jul 13, 2022
* add tip-23

* Update tip-0023.md

* Update tip-0023.md

* Update dynamic bytearray defs (#25)

* Update with "Message" to "Block" renaming to align with IOTA 2.0 terminology

Co-authored-by: Levente Pap <levente.pap@iota.org>