RFC: Local snapshot file format #25
If nodes write the data straight to a file and bypass the DB, they will miss out on all the automatic goodies a DB brings (such as integrity checks and rollbacks).
Maybe it is a good idea to mention that nodes are advised to save the data to a DB and extract the information from it to create the file?
All types are serialized in little-endian and occur in the sequence of the rows defined below. Local snapshot files are compressed via zlib to further reduce size.
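A minimal Python sketch of the two encoding rules above: little-endian fields, and zlib compression of the whole file. The header layout (`HEADER`: a version byte, a type byte and two `uint32` milestone indices) and the helper names `write_snapshot`/`read_snapshot` are hypothetical and only illustrate the mechanics; the real field order and widths are those in the RFC's tables. It also assumes the file is a plain zlib stream as produced by `zlib.compress`, not gzip.

```python
import struct
import zlib

# HYPOTHETICAL header layout, for illustration only:
# version (byte), snapshot type (byte), two milestone indices (uint32).
# "<" forces little-endian byte order with no alignment padding.
HEADER = struct.Struct("<BBII")


def write_snapshot(version, snapshot_type, seps_index, ledger_index, body):
    """Pack the header little-endian, append the body, zlib-compress everything."""
    raw = HEADER.pack(version, snapshot_type, seps_index, ledger_index) + body
    return zlib.compress(raw)


def read_snapshot(blob):
    """Inverse of write_snapshot: decompress, then unpack the fixed-size header."""
    raw = zlib.decompress(blob)
    fields = HEADER.unpack_from(raw, 0)
    return fields, raw[HEADER.size:]


blob = write_snapshot(1, 1, 1000, 1350, b"payload")
print(read_snapshot(blob))  # ((1, 1, 1000, 1350), b'payload')
```

Because of the `<` prefix, the packed header is exactly 10 bytes and every multi-byte integer is stored least-significant byte first.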
SEP = solid entry point. `Array[T]` values are prefixed with a varint denoting the length.
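The varint encoding itself is not spelled out here. Assuming it is an unsigned LEB128 varint (the scheme used by Go's `encoding/binary.PutUvarint`), a length-prefixed `Array[T]` could be written and read as in this sketch; `encode_varint`, `decode_varint`, `encode_array` and `decode_array` are hypothetical helper names.

```python
def encode_varint(n):
    """Unsigned LEB128 (ASSUMED encoding): 7 payload bits per byte,
    least-significant group first, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode one varint starting at buf[pos]; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]  # IndexError on a truncated buffer
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_array(items, encode_item):
    """Array[T]: varint element count, then each element's encoding."""
    return encode_varint(len(items)) + b"".join(encode_item(item) for item in items)


def decode_array(buf, pos, decode_item):
    """decode_item(buf, pos) must return (item, next_pos)."""
    count, pos = decode_varint(buf, pos)
    items = []
    for _ in range(count):
        item, pos = decode_item(buf, pos)
        items.append(item)
    return items, pos


# Example: an Array of three 32-byte IDs.
ids = [bytes([i]) * 32 for i in range(3)]
blob = encode_array(ids, lambda item: item)
items, end = decode_array(blob, 0, lambda b, p: (b[p:p + 32], p + 32))
print(len(blob))  # 97
print(encode_varint(300).hex())  # ac02
```

A count below 128 costs a single byte, so short arrays carry almost no overhead compared with a fixed-width length field.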
Maybe add a one-liner explaining what an SEP is?
text/0000-local-snapshot-file-format/0000-local-snapshot-file-format.md
</tr>
<tr>
<td>Version</td>
<td>byte</td>
Version fields are specified as varints in all other RFCs.
…format.md Co-authored-by: Thibault Martinez <thibault.martinez.30@gmail.com>
A few points/questions besides the comments:
- I am not sure I completely understand your ideas. In my imagination, there should be one static snapshot file that represents the global snapshot, which shouldn't change, and another one that just has the diff from the global snapshot. I understand that you want people to share both files?
- I suppose the node will only generate the diff file upon some API calls, or at an interval set in the configuration? It would probably be better to usually update the data in the DB, which takes care of edge cases, rather than in the data dumps?
- Maybe it is time we start thinking about an authentication mechanism for shared snapshot files. Something that has always bothered me is that people can just start spreading doctored snapshot files around. Now that we are redefining the snapshot file, maybe it is a good idea to think of a solution! Maybe add a signature field to it, which the node can verify against a configurable public key? (Just the simplest idea in my mind.)
* delete old transaction data below a given milestone.
Current node implementations use a [local snapshot file format](https://github.com/iotaledger/iri-ls-sa-merger/tree/351020d3b5e342b6e9a41f2868575ab7ff8c251c#generating-an-export-file-from-a-localsnapshots-db) which only works with account-based ledgers. For Chrysalis Phase 2, this file format has to be adapted to support a UTXO-based ledger.
Maybe add something about
"Standardizing one single format across different node implementations"?
Otherwise, I guess nodes could just share their DBs instead of files.
### Formats
> All types are serialized in little-endian
How is non-numerical data encoded?
![](https://i.imgur.com/bt5BUpe.png)
A delta ledger state local snapshot is denoted by the type byte `1`:
Even though I agree that a node should keep track of the diff at every milestone, should the snapshot file keep track of it?
I would just expect a diff from the base.
I think it actually makes sense to include information for every milestone, especially since it would be a good thing to introduce MS data as well (specifically, proof of inclusion).
# Drawbacks
Nodes need to support this new format.
I guess you had nothing to fill in :-)? Maybe:
- We have two different snapshot files, which may be confusing.
- The size of the file will grow as the number of outputs increases.
It is up to the nodes where they get the data for the snapshot files from, but they will mainly read it from the database. Have a look at Hornet if you're interested in an implementation.
That is what this RFC describes: two files, one containing a full ledger and one containing only the deltas after that state. A global snapshot and a full snapshot are equivalent.
You need to share both files if a node needs to bootstrap for the first time, as the delta files, of course, do not contain all the information needed for a node to bootstrap.
This is an implementation detail. Nodes will probably create these files upon request (triggered by an API call) and/or at a defined interval. Please check out the Hornet code if you're interested in the implementation details.
A node operator should download the snapshot files from a trusted source. I believe that if you really want to ensure the data is correct, you'd have to extend the mechanism to compare multiple snapshot files from different sources. Perhaps a tool, in which a user defines a list of nodes to request snapshot data from and then compares the results against each other, would work. In any case, faulty snapshots will cause the nodes to crash if the inclusion Merkle proofs are off.
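The comparison tool suggested above could start as simply as this sketch, which hashes the snapshot bytes obtained from each source and flags the sources that disagree with the majority. `compare_sources` and the source names are hypothetical, and fetching the files is left out. Byte-for-byte hashing only works if every node serializes and compresses deterministically; otherwise the decompressed content would first have to be canonicalized (for example, outputs sorted by ID) before hashing.

```python
import hashlib
from collections import Counter


def compare_sources(snapshots):
    """Hash each source's snapshot bytes and report which sources disagree.

    `snapshots` maps a (hypothetical) source name to the raw file bytes.
    Returns (majority_digest, sorted list of disagreeing source names).
    """
    if not snapshots:
        raise ValueError("no snapshot sources given")
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in snapshots.items()}
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    outliers = sorted(name for name, digest in digests.items() if digest != majority_digest)
    return majority_digest, outliers


digest, outliers = compare_sources(
    {"node-a": b"ledger", "node-b": b"ledger", "node-c": b"tampered"}
)
print(outliers)  # ['node-c']
```

With a tie, the "majority" is simply whichever digest was counted first, so at least three independent sources are needed for the vote to mean anything.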
Thanks for clarifying. What worries me about people being able to create their own full snapshot files (rather than having a static global snapshot file) is that they can be mixed up with the wrong delta files. From what we had in the past with spent addresses, this is what I expect to happen.
Hmm, I think you meant that this is what will happen after the snapshot is loaded and fresh milestones come in, making the situation the same as before. However, what if we add the entire milestone (it is 560 bytes, assuming it has 3 signatures) to the diff file?
Here are a few very minor edits, mostly for grammar and clarity; please let me know if you have any questions about them!
-Charles
Since a UTXO-based ledger is much larger in size, this RFC proposes two formats for snapshot files:
* A `full` format which represents a complete ledger state.
* A `delta` format which only contains diffs (consumed and spent outputs) of milestones from a given milestone index onwards.
> consumed and spent

So far, the only output type that can be consumed is a TX, which implies spending those funds. I would remove the term "consumed", as it implies that there already are outputs that can be consumed without being spent, such as Coordicide's mana pledges.
This separation allows nodes to swiftly create new delta snapshot files, which can then be distributed with a companion full snapshot file to reconstruct a recent state.
Unlike the current format, these new formats do not include spent addresses since this information is no longer held by nodes.
The reason for removing them is the discontinuation of WOTS (if we finally come to an agreement in that regard).
![](https://i.imgur.com/e6WuufK.png)
While the node producing such a full ledger state snapshot could theoretically pre-compute the actual snapshot milestone state, this is deferred to the consumer of the data to speed up local snapshot creation.
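To make the deferred work concrete, here is a sketch of what the consumer might do, assuming the full snapshot stores the ledger at `ledger_milestone_index` plus one diff for every milestone after `seps_milestone_index` up to and including `ledger_milestone_index`, each listing the outputs that milestone created and consumed. Undoing the diffs from newest to oldest yields the ledger at the SEP milestone. `MilestoneDiff` and `roll_back` are hypothetical names, and outputs are reduced to an ID-to-data mapping.

```python
from dataclasses import dataclass, field


@dataclass
class MilestoneDiff:
    """Hypothetical in-memory form of one milestone's ledger changes."""
    index: int
    created: dict = field(default_factory=dict)   # output_id -> output data
    consumed: dict = field(default_factory=dict)  # output_id -> output data


def roll_back(ledger, ledger_index, seps_index, diffs):
    """Undo diffs from ledger_index down to seps_index + 1 and return the
    ledger state at seps_index. Raises KeyError if a diff is missing or an
    output it created is not present, i.e. the snapshot is inconsistent."""
    state = dict(ledger)
    by_index = {d.index: d for d in diffs}
    for ms in range(ledger_index, seps_index, -1):
        diff = by_index[ms]
        # Outputs spent by this milestone become unspent again...
        state.update(diff.consumed)
        # ...and outputs it created did not exist yet.
        for output_id in diff.created:
            del state[output_id]
    return state


diffs = [
    MilestoneDiff(1001, created={"b": 4, "c": 6}, consumed={"a": 10}),
    MilestoneDiff(1002, created={"d": 6}, consumed={"c": 6}),
]
print(roll_back({"b": 4, "d": 6}, 1002, 1000, diffs))  # {'a': 10}
```

Restoring consumed outputs before deleting created ones keeps the undo correct even when an output was created and spent within the same milestone.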
I am starting to get a bit confused about the terminology: "actual snapshot milestone" == "seps_milestone_index"?
address_type<byte>
ed25519_address<array[32]>
value<uint64>
diffs<array[diffs_count]>:
There is no specific ordering of diffs enforced here: you can have a diff for milestone X followed by a diff for milestone X+100. For integrity and validation's sake, it would be nicer to have an `array[ledger_milestone_index - seps_milestone_index]` of arrays of diffs between milestone X and X+1, sorted by X, starting from `seps_milestone_index` and ending at `ledger_milestone_index`.
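The ordering proposed in this comment is straightforward to enforce on the reading side. A sketch of such a check, assuming the diff at position `i` describes the transition from milestone `seps_milestone_index + i` to `seps_milestone_index + i + 1` and is tagged with the index of the milestone it produces; `validate_diff_sequence` is a hypothetical helper.

```python
def validate_diff_sequence(diff_indices, seps_index, ledger_index):
    """Check that diffs are contiguous and ascending.

    Assumes (hypothetically) that the i-th diff is tagged with the index of
    the milestone it produces, i.e. seps_index + 1 + i, so there must be
    exactly ledger_index - seps_index of them. Raises ValueError otherwise.
    """
    if ledger_index < seps_index:
        raise ValueError("ledger_milestone_index is below seps_milestone_index")
    expected_count = ledger_index - seps_index
    if len(diff_indices) != expected_count:
        raise ValueError(f"expected {expected_count} diffs, got {len(diff_indices)}")
    for position, index in enumerate(diff_indices):
        expected = seps_index + 1 + position
        if index != expected:
            raise ValueError(f"diff #{position} is for milestone {index}, expected {expected}")


validate_diff_sequence([1001, 1002, 1003], 1000, 1003)  # passes silently
try:
    validate_diff_sequence([1001, 1101], 1000, 1002)  # the X, X+100 case
except ValueError as err:
    print(err)  # diff #1 is for milestone 1101, expected 1002
```

With this layout, the diff count alone already tells a reader whether the file covers the full range, before any diff is applied.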
* Is all the information needed to start up a node from the local snapshot available in the described format?
* Can we get rid of the spent addresses or do we still need to keep them?
* Do we need to account for different types of outputs already? (We currently only have them deposit to addresses.)
I am afraid so... #32 already introduces a new output type that is crucial for validation.
# Rationale and alternatives
* In conjunction with a companion full snapshot, a tool or node can "truncate" the data from a delta snapshot back to a single full snapshot. In that case, the `ledger_milestone_index` and `seps_milestone_index` would be the same. In the example above, given the full and delta snapshots, one could produce a new full snapshot for milestone 1350.
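A sketch of the truncation described above, under simplifying assumptions: a full snapshot is reduced to its two milestone indices plus an output-ID-to-data mapping valid at its `ledger_milestone_index`, and each delta diff lists the outputs its milestone created and consumed. Replaying the diffs forward up to a target milestone yields a new full snapshot whose `ledger_milestone_index` and `seps_milestone_index` are both the target. `FullSnapshot`, `MilestoneDiff` and `truncate` are hypothetical names, and the SEP set itself is not recomputed here.

```python
from dataclasses import dataclass, field


@dataclass
class MilestoneDiff:
    """Hypothetical in-memory form of one milestone's ledger changes."""
    index: int
    created: dict = field(default_factory=dict)   # output_id -> output data
    consumed: dict = field(default_factory=dict)  # output_id -> output data


@dataclass
class FullSnapshot:
    """Hypothetical reduced full snapshot: indices plus the unspent outputs."""
    seps_milestone_index: int
    ledger_milestone_index: int
    outputs: dict  # output_id -> output data at ledger_milestone_index


def truncate(full, delta_diffs, target_index):
    """Replay delta diffs onto the full snapshot's ledger up to target_index."""
    outputs = dict(full.outputs)
    by_index = {d.index: d for d in delta_diffs}
    for ms in range(full.ledger_milestone_index + 1, target_index + 1):
        diff = by_index[ms]  # KeyError if the delta has a gap
        # Add created outputs first, so an output created and spent in the
        # same milestone ends up absent after its removal below.
        outputs.update(diff.created)
        for output_id in diff.consumed:
            del outputs[output_id]
    # After truncation both indices point at the same milestone.
    return FullSnapshot(target_index, target_index, outputs)


full = FullSnapshot(1000, 1000, {"a": 10})
delta = [
    MilestoneDiff(1001, created={"b": 4, "c": 6}, consumed={"a": 10}),
    MilestoneDiff(1002, created={"d": 6}, consumed={"c": 6}),
]
print(truncate(full, delta, 1002))
# FullSnapshot(seps_milestone_index=1002, ledger_milestone_index=1002, outputs={'b': 4, 'd': 6})
```

The input snapshot is copied rather than mutated, so the original full snapshot stays usable for producing truncations at other milestones.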
Here you mean that, after the truncation, the `seps_milestone_index` of the delta will be equal to the `ledger_milestone_index` of the full snapshot. Correct?
We need to change the snapshot file format to the following:
We are switching away from message IDs so that one can compute whether the outputs within a milestone diff actually correspond to the milestone for that cone.
6b0169f to 2535b7f
consumed_outputs<array>:
message_hash<array[32]>
transaction_hash<array[32]>
output_index<uint16>
I think this is missing the target TransactionId.
2535b7f to f1d9c46
Merged due to the TIP refactor.
* add tip-23
* Update tip-0023.md
* Update tip-0023.md
* Update dynamic bytearray defs (#25)
* Update with "Message" to "Block" renaming to align with IOTA 2.0 terminology
Co-authored-by: Levente Pap <levente.pap@iota.org>