RFC: Local snapshot file format #25

luca-moser · 2020-08-26T09:51:02Z

GalRogozinski

If nodes write the data straight to a file and bypass the DB, they will miss out on all the automatic feature goodies a DB brings (such as integrity checks and rollbacks)...

Maybe it is a good idea to mention that it is suggested to save the data to a db and extract the information from it to create the file?

GalRogozinski · 2020-08-26T15:22:48Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+All types are serialized in little-endian and occur in the sequence of the rows defined below. Local snapshot files are compressed via zlib to further reduce size.
+
+SEP = solid entry point. `Array[T]` are prefixed with a varint denoting the length.


Maybe add a one-line explanation of what an SEP is?
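To make the framing in the quoted RFC text concrete, here is a minimal sketch of a varint-prefixed array of little-endian `uint64` values. The RFC text does not say which varint encoding is meant; unsigned LEB128 is assumed here, and all function names are hypothetical.

```python
import struct
from typing import List, Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint (assumed encoding)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint starting at offset; return (value, offset after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def encode_uint64_array(values: List[int]) -> bytes:
    """Length prefix as varint, then each element as a little-endian uint64."""
    return encode_uvarint(len(values)) + b"".join(struct.pack("<Q", v) for v in values)


def decode_uint64_array(buf: bytes, offset: int = 0) -> Tuple[List[int], int]:
    """Inverse of encode_uint64_array; return (values, offset after the array)."""
    count, offset = decode_uvarint(buf, offset)
    values = [struct.unpack_from("<Q", buf, offset + 8 * i)[0] for i in range(count)]
    return values, offset + 8 * count
```

The same pattern would apply to any fixed-size element type by swapping the `struct` format character.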

karimodm · 2020-08-27T13:18:58Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+    </tr>
+    <tr>
+        <td>Version</td>
+        <td>byte</td>


Version fields are specified as varints in all other RFCs.

GalRogozinski

A few points/questions besides the comments:

I am not sure I actually understand your ideas completely
In my imagination there should be one static snapshot file that represents the global snapshot and another one that just has the diff from the global that shouldn't change. I understand that you want people to share both files?
I suppose the node will only generate the diff file upon some api calls, or every interval according to configuration? Because it probably will be better to usually update data in db that takes care of edge cases for data dumps?
Maybe it is time we start thinking about an authentication mechanism for shared snapshot files. Something that always bothered me is that people can just start spreading cooked snapshot files around. Now that we are redefining the snapshot file maybe it is a good idea to think of some solution! Maybe add a signature field to it? And the node can verify it against a configurable public key? (just the simplest idea in my mind)

GalRogozinski · 2020-11-01T15:48:59Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+* delete old transaction data below a given milestone.
+
+Current node implementations use a [local snapshot file format](https://github.com/iotaledger/iri-ls-sa-merger/tree/351020d3b5e342b6e9a41f2868575ab7ff8c251c#generating-an-export-file-from-a-localsnapshots-db) which only works with account based ledgers. For Chrysalis Phase 2 this file format has to be assimilated to support a UTXO based ledger.
+


Maybe add something about:
"Standardizing one single format across different node implementations"

I guess otherwise nodes could just share their dbs instead of files

GalRogozinski · 2020-11-01T15:50:50Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+### Formats
+
+> All types are serialized in little-endian


how is non-numerical data encoded?

GalRogozinski · 2020-11-01T16:08:01Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+![](https://i.imgur.com/bt5BUpe.png)
+
+A delta ledger state local snapshot is denoted by the type byte `1`:
+


Even though I agree a node should keep track of the diff at every milestone, should the snapshot file keep track of it?
I would just expect a diff from the base

I think it makes sense to include information at every milestone actually: especially given the fact that it will be a good thing to introduce MS data as well (proof of inclusion specifically).

GalRogozinski · 2020-11-01T16:09:30Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+# Drawbacks
+
+Nodes need to support this new format.


I guess you had nothing to fill in :-) ?

Maybe:

We have 2 different snapshot files which may be confusing

The size of the file will grow as the number of outputs increases

luca-moser · 2020-11-05T10:43:32Z

@GalRogozinski

It is up to the nodes where they get the data for the snapshot files from, but they will mainly read it from the database. Have a look at the implementation in Hornet if you're interested in an implementation.

> I am not sure I actually understand your ideas completely
> In my imagination there should be one static snapshot file that represents the global snapshot and another one that just has the diff from the global that shouldn't change.

That is what this RFC describes: two files, one containing a full ledger state and one containing only the deltas after that state. A global snapshot and a full snapshot are equivalent.

> I understand that you want people to share both files?

You need to share both files if a node needs to bootstrap for the first time, as the delta files of course do not contain all the information needed for a node to bootstrap.

> I suppose the node will only generate the diff file upon some api calls, or every interval according to configuration? Because it probably will be better to usually update data in db that takes care of edge cases for data dumps?

This is an implementation detail. Nodes will probably create these files upon request (triggered by an API call) and/or at a defined interval. Please check out the Hornet code if you're interested in the implementation details.

> Maybe it is time we start thinking about an authentication mechanism for shared snapshot files. Something that always bothered me is that people can just start spreading cooked snapshot files around. Now that we are redefining the snapshot file maybe it is a good idea to think of some solution! Maybe add a signature field to it? And the node can verify it against a configurable public key? (just the simplest idea in my mind)

A node operator should download the snapshot files from a trusted source. I believe if you really want to ensure the data is correct, you'd have to expand the mechanism to compare multiple snapshot files from different sources.

Perhaps a tool would work where a user defines a list of nodes, requests snapshot data from each of them, and then compares the results against each other.
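A rough sketch of such a comparison tool, under the assumption that nodes produce byte-identical snapshot files for the same milestone (nothing in this thread guarantees that). The function name, the choice of SHA-256, and the strict-majority rule are all illustrative and not part of the RFC.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Tuple


def compare_snapshots(snapshots: Dict[str, bytes]) -> Tuple[str, List[str]]:
    """Hash each source's snapshot bytes and report the majority digest.

    `snapshots` maps a source name (e.g. a node URL) to the raw file bytes.
    Returns (majority_digest_hex, sorted list of sources that disagree).
    """
    if not snapshots:
        raise ValueError("no snapshots to compare")
    digests = {source: hashlib.sha256(data).hexdigest() for source, data in snapshots.items()}
    digest, votes = Counter(digests.values()).most_common(1)[0]
    # Require a strict majority so that a tie is never silently resolved.
    if votes * 2 <= len(digests):
        raise ValueError("no digest is shared by a strict majority of sources")
    dissenting = sorted(source for source, d in digests.items() if d != digest)
    return digest, dissenting
```

Fetching the files is left out on purpose; the sketch only covers the comparison step.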

In any case, faulty snapshots will cause the nodes to crash if the inclusion Merkle proofs are off.

GalRogozinski · 2020-11-09T09:28:37Z

Thanks for clarifying

What worries me about people being able to create their own full snapshot files (rather than having a static global snapshot file) is that these files can be mixed up with the wrong delta files. From what we saw in the past with spent addresses, this is what I expect to happen.

> In any case, faulty snapshots will cause the nodes to crash if the inclusion Merkle proofs are off.

Hmm, I think you meant that this is what will happen after the snapshot is loaded and fresh milestones come in, which makes the situation the same as before. However, what if we add the entire milestone (it is 560 bytes, assuming it has 3 signatures) to the diff file?
That way a syncing node can validate the milestones and the proof of inclusion to make sure the ledger state is indeed correct.
If I am not mistaken this will add ~1.76 GB to the file per year. In my opinion it is a small price to pay and it is much better than the other alternatives we mentioned here.
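The yearly estimate can be reproduced with a short calculation. The 560-byte milestone size comes from the comment; the 10-second milestone interval is an assumption that is not stated in this thread, chosen because it matches the quoted figure.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def milestone_bytes_per_year(milestone_size: int = 560, interval_seconds: int = 10) -> int:
    """Bytes added per year if every milestone is stored in full.

    milestone_size: size of one milestone in bytes (560 per the comment above).
    interval_seconds: time between milestones (assumed value, not from the thread).
    """
    milestones_per_year = SECONDS_PER_YEAR // interval_seconds
    return milestones_per_year * milestone_size


if __name__ == "__main__":
    total = milestone_bytes_per_year()
    print(f"{total} bytes per year, i.e. about {total / 1e9:.2f} GB")
```

With these inputs the result is 3,153,600 milestones per year and roughly 1.77 GB (decimal), in line with the ~1.76 GB mentioned.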

charlesthompson3

Here are a few very minor edits, mostly for grammar and clarity; please let me know if you have any questions about them!

-Charles

karimodm · 2020-12-11T13:40:03Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+Since a UTXO based ledger is much larger in size, this RFC proposes two formats for snapshot files: 
+* A `full` format which represents a complete ledger state.
+* A `delta` format which only contains diffs (consumed and spent outputs) of milestones from a given milestone index onwards.


> consumed and spent

So far the only output type that can be consumed is a TX, which implies spending those funds. I would remove the term "consumed", as it implies that there are already outputs that can be consumed without spending, such as Coordicide's mana pledges.

karimodm · 2020-12-11T13:40:55Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+This separation allows nodes to swiftly create new delta snapshot files, which then can be distributed with a companion full snapshot file to reconstruct a recent state.
+
+Unlike the current format, these new formats do not include spent addresses since this information is no longer held by nodes.


The reason for removing them is the discontinuation of WOTS (if we finally come to an agreement in that regard).

karimodm · 2020-12-11T13:42:23Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+![](https://i.imgur.com/e6WuufK.png)
+
+While the node producing such a full ledger state snapshot could theoretically pre-compute the actual snapshot milestone state, this is deferred to the consumer of the data to speed up local snapshot creation.


I am starting to get a bit confused about the terminology: "actual snapshot milestone" == "seps_milestone_index"?

karimodm · 2020-12-11T13:46:45Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+		address_type<byte>
+		ed25519_address<array[32]>
+	value<uint64>
+diffs<array[diffs_count]>:


There is no specific ordering of diffs you enforce here. You can have a diff for milestone X followed by a diff for milestone X+100. For integrity and validation's sake it would be nicer to have an array[ledger_milestone_index - seps_milestone_index] of arrays of diffs between milestone X and X+1, sorted by X, starting from seps_milestone_index and ending on ledger_milestone_index.
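A reader of the file could enforce the ordering proposed above with a check along these lines. This is only a sketch: each diff is identified by its lower milestone index X (the diff between X and X+1), and the function name is hypothetical.

```python
from typing import Sequence


def validate_diff_order(
    diff_indices: Sequence[int],
    seps_milestone_index: int,
    ledger_milestone_index: int,
) -> None:
    """Raise ValueError unless the diffs are sorted, gap-free and complete.

    diff_indices[i] is the lower milestone index X of the i-th diff in the file.
    Expected sequence: seps_milestone_index, ..., ledger_milestone_index - 1,
    i.e. exactly ledger_milestone_index - seps_milestone_index entries.
    """
    expected = range(seps_milestone_index, ledger_milestone_index)
    if len(diff_indices) != len(expected):
        raise ValueError(
            f"expected {len(expected)} diffs, found {len(diff_indices)}"
        )
    for position, (got, want) in enumerate(zip(diff_indices, expected)):
        if got != want:
            raise ValueError(
                f"diff at position {position} is for milestone {got}, expected {want}"
            )
```

A file containing a diff for milestone X followed by one for X+100 would be rejected at the second position.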

karimodm · 2020-12-11T13:48:19Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+* Is all the information to startup a node from the local snapshot available with the described format?
+* Can we get rid of the spent addresses or do we still need to keep them?
+* Do we need to account for different types of outputs already? (we currently only have them deposit to addresses)


I am afraid so... #32 already introduces a new output type that is crucial for validation.

karimodm · 2020-12-11T13:49:52Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+
+# Rationale and alternatives
+
+* In conjunction with a companion full snapshot, a tool or node can "truncate" the data from a delta snapshot back to a single full snapshot. In that case, the `ledger_milestone_index` and `seps_milestone_index` would be the same. In the example above, given the full and delta snapshots, one could produce a new full snapshot for milestone 1350.


Here you mean that, after the truncation, the seps_milestone_index of the delta will be equal to the ledger_milestone_index of the full snapshot. Correct?
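For orientation, the "truncation" discussed here can be pictured as replaying the delta's milestone diffs on top of the full ledger state. The sketch below uses a deliberately simplified model (a ledger as a mapping from output ID to value, a diff as created and consumed outputs); the class and field names are hypothetical and do not mirror the file layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class MilestoneDiff:
    milestone_index: int
    created: Dict[str, int]  # output ID -> value of outputs created in this milestone
    consumed: Set[str]       # IDs of outputs spent in this milestone


def apply_diffs(
    ledger: Dict[str, int], ledger_index: int, diffs: List[MilestoneDiff]
) -> Tuple[Dict[str, int], int]:
    """Replay consecutive diffs on a full ledger state; return (new state, new index)."""
    state = dict(ledger)  # do not mutate the caller's ledger
    index = ledger_index
    for diff in diffs:
        if diff.milestone_index != index + 1:
            raise ValueError(f"expected diff for milestone {index + 1}")
        # Add created outputs first so an output created and spent within
        # the same milestone is handled correctly.
        for output_id, value in diff.created.items():
            if output_id in state:
                raise ValueError(f"output {output_id} already exists")
            state[output_id] = value
        for output_id in diff.consumed:
            if output_id not in state:
                raise ValueError(f"output {output_id} is not unspent")
            del state[output_id]
        index = diff.milestone_index
    return state, index
```

Starting from a full state at milestone 1348 and applying diffs for 1349 and 1350 would give the data for a new full snapshot at milestone 1350.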

luca-moser · 2021-01-07T09:45:25Z

We need to change the snapshot file format to the following:

* Include the entire milestone per diff.
* The Merkle root will be computed out of the output IDs instead of the message IDs.
* The outputs within the snapshot file have to be ordered in the same order as they were for the Merkle root computation.

We switch away from message IDs so that one can compute whether the outputs within a milestone diff actually correspond to the milestone for that cone.
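To show why the ordering of outputs matters, here is a sketch of a Merkle root computed over ordered output IDs. The comment does not define the tree construction, so everything here is an assumption: BLAKE2b-256 as the hash, `0x00`/`0x01` domain-separation prefixes for leaves and inner nodes, an RFC 6962 style split, and an output ID built as the 32-byte transaction hash followed by a little-endian `uint16` output index.

```python
import hashlib
import struct
from typing import List


def output_id(transaction_hash: bytes, output_index: int) -> bytes:
    """Assumed output ID layout: 32-byte transaction hash + uint16 index (little-endian)."""
    if len(transaction_hash) != 32:
        raise ValueError("transaction hash must be 32 bytes")
    return transaction_hash + struct.pack("<H", output_index)


def _hash(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def merkle_root(leaves: List[bytes]) -> bytes:
    """Merkle root over the leaves in the given order (assumed construction)."""
    if not leaves:
        return _hash(b"")
    if len(leaves) == 1:
        return _hash(b"\x00" + leaves[0])  # leaf prefix
    # Split at the largest power of two strictly smaller than the leaf count.
    k = 1 << ((len(leaves) - 1).bit_length() - 1)
    return _hash(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))  # node prefix
```

Because the root depends on leaf order, a reader can only reproduce it if the outputs in the file appear in the same order that was used when the milestone was issued.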

thibault-martinez · 2021-01-18T23:08:54Z

text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md

+    consumed_outputs<array>:
+        message_hash<array[32]>
+        transaction_hash<array[32]>
+        output_index<uint16>


I think this is missing the target TransactionId.

lzpap · 2021-12-15T15:07:57Z

merged due to TIP refactor

* add tip-23 * Update tip-0023.md * Update tip-0023.md * Update dynamic bytearray defs (#25) * Update with "Message" to "Block" renaming to align with IOTA 2.0 terminology Co-authored-by: Levente Pap <levente.pap@iota.org>

luca-moser added 7 commits August 26, 2020 11:45

adds local snapshot file format RFC

2799179

removes hackmd heading

ddb0b04

adds more open questions

c990385

fixes GH omitting array types

7c2ca9b

also fix omitted type for output array

e77be6a

removes redundant </b> tag

c115b5e

adds version byte

c3d1ae3

GalRogozinski reviewed Aug 26, 2020

karimodm reviewed Aug 27, 2020

luca-moser added 2 commits August 27, 2020 18:35

use new format

f6a87d5

removes hackmd fields

d320cf4

luca-moser requested review from karimodm and Wollac August 27, 2020 16:38

luca-moser added 3 commits August 27, 2020 18:42

use uint16 for utxo index

39c2e70

adds missing header

1fba065

updates local snapshot RFC

2d09aab

thibault-martinez requested changes Oct 26, 2020

luca-moser and others added 5 commits October 26, 2020 11:54

Update text/0000-local-snapshot-file-format/0000-local-snapshot-file-…

6ad0d0d

…format.md Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>

Update text/0000-local-snapshot-file-format/0000-local-snapshot-file-…

3669eb9

…format.md Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>

Update text/0000-local-snapshot-file-format/0000-local-snapshot-file-…

07a4f55

…format.md Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>

Update text/0000-local-snapshot-file-format/0000-local-snapshot-file-…

119af9f

…format.md Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>

Update text/0000-local-snapshot-file-format/0000-local-snapshot-file-…

723e64c

…format.md Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>

GalRogozinski suggested changes Nov 1, 2020

luca-moser added 2 commits November 12, 2020 10:17

adds network id

0083cc0

removes milestone hashes

663e678

charlesthompson3 reviewed Nov 18, 2020

karimodm suggested changes Dec 11, 2020

Wollac added the deployed on testnet label Jan 6, 2021

Wollac added the discussion needed RFC has pending protocol team discussions label Jan 6, 2021

luca-moser added the ready for testnet label Jan 14, 2021

luca-moser force-pushed the local-snapshot-file-format branch 2 times, most recently from 6b0169f to 2535b7f Compare January 18, 2021 07:45

thibault-martinez reviewed Jan 18, 2021

luca-moser added 2 commits February 18, 2021 15:28

adjusts format

5ab8f36

update to table notation

f1d9c46

luca-moser force-pushed the local-snapshot-file-format branch from 2535b7f to f1d9c46 Compare February 23, 2021 10:39

lzpap merged commit 500c750 into iotaledger:main Dec 15, 2021
