RFC: Local snapshot file format #25

Merged: 21 commits, Dec 15, 2021. Showing changes from 7 commits.
@@ -0,0 +1,186 @@
+ Feature name: `local_snapshot_file_format`
+ Start date: 2020-08-25
+ RFC PR: [iotaledger/protocol-rfcs#0000](https://github.com/iotaledger/protocol-rfcs/pull/0000)

# Summary

This RFC defines a file format for local snapshots which is compatible with Chrysalis Phase 2.

# Motivation

Nodes create local snapshots to capture the ledger state as of a given milestone so that they can:
* start up from a recent milestone instead of having to synchronize from genesis
* delete transaction data below the given milestone

Current node implementations use a [local snapshot file format](https://github.com/iotaledger/iri-ls-sa-merger/tree/351020d3b5e342b6e9a41f2868575ab7ff8c251c#generating-an-export-file-from-a-localsnapshots-db) which only works with account-based ledgers. For Chrysalis Phase 2, this file format has to be adapted to support a UTXO-based ledger.

Review comment (Contributor):

Maybe add something about:
"Standardizing one single format across different node implementations"

I guess otherwise nodes could just share their dbs instead of files

# Detailed design

All types are serialized in little-endian byte order and occur in the order of the rows defined below. Local snapshot files are compressed with zlib to further reduce their size.

SEP = solid entry point. Every `Array[T]` is prefixed with a varint denoting its length.

Review comment (Contributor):

maybe add a one liner explanation of what is an SEP?

luca-moser marked this conversation as resolved.
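
The primitive encodings described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the RFC does not say which varint flavour is meant, so unsigned LEB128 is assumed here, and the array prefix is assumed to be the element count rather than a byte length. The function names are hypothetical.

```python
import struct
from typing import List, Tuple


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128 (assumed varint flavour)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value; returns (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_uint64(n: int) -> bytes:
    """uint64 in little-endian byte order, as required for all types."""
    return struct.pack("<Q", n)


def encode_hash_array(hashes: List[bytes]) -> bytes:
    """Array[ByteArray[32]]: varint prefix (assumed element count), then the items."""
    if any(len(h) != 32 for h in hashes):
        raise ValueError("every item must be exactly 32 bytes")
    return encode_varint(len(hashes)) + b"".join(hashes)
```

Under these assumptions, a list of 150 SEP hashes would occupy a 2-byte prefix plus 150 × 32 bytes.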

This format describes version 1:
<table>
<tr>
<th>Name</th>
<th>Type</th>
<th>Description</th>
</tr>
<tr>
<td>Version</td>
<td>byte</td>
Review comment (Contributor):
Version fields are specified as varints in all other RFCs.

<td>
The version of the local snapshot file format.
</td>
</tr>
<tr>
<td>Milestone Index</td>
<td>uint64</td>
<td>
The index of the milestone of the local snapshot.
</td>
</tr>
<tr>
<td>Milestone Hash</td>
<td>ByteArray[32]</td>
<td>
The BLAKE2b-256 hash of the milestone payload.
</td>
</tr>
<tr>
<td>Timestamp</td>
<td>uint64</td>
<td>
The UNIX epoch timestamp in seconds of when this snapshot was created.
</td>
</tr>
<tr>
<td>SEPs</td>
<td>Array[ByteArray[32]]</td>
<td>
The BLAKE2b-256 hashes of the SEP messages at the cut off point of the given milestone.
</td>
</tr>
<tr>
<td>UTXOs</td>
<td colspan="2">
<details open="true">
<summary>Array[UTXO]</summary>
<blockquote>
Describes the unspent transaction outputs per transaction.
</blockquote>
<table>
<tr>
<td><b>Name</b></td>
<td><b>Type</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td>Transaction Hash</td>
<td>ByteArray[32]</td>
<td>The BLAKE2b-256 hash of the transaction.</td>
</tr>
<tr>
<td>
Unspent Outputs
</td>
<td colspan="2">
<details open="true">
<summary>Array[Outputs]</summary>
<table>
<tr>
<td>Index</td>
<td>byte</td>
<td>The index of the output on the transaction.</td>
</tr>
<tr>
<td valign="top">Address <code>oneOf</code></td>
<td colspan="2">
<details>
<summary>WOTS Address</summary>
<table>
<tr>
<td><b>Name</b></td>
<td><b>Type</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td>Address Type</td>
<td>byte/varint</td>
<td>
Set to <strong>value 0</strong> to denote a <i>WOTS Address</i>.
</td>
</tr>
<tr>
<td>Address</td>
<td>ByteArray[49]</td>
<td>The T5B1 encoded WOTS address.</td>
</tr>
</table>
</details>
<details>
<summary>Ed25519 Address</summary>
<table>
<tr>
<td><b>Name</b></td>
<td><b>Type</b></td>
<td><b>Description</b></td>
</tr>
<tr>
<td>Address Type</td>
<td>byte/varint</td>
<td>
Set to <strong>value 1</strong> to denote an <i>Ed25519 Address</i>.
</td>
</tr>
<tr>
<td>Address</td>
<td>ByteArray[32]</td>
<td>The raw bytes of the Ed25519 address which is a BLAKE2b-256 hash of the Ed25519 public key.</td>
</tr>
</table>
</details>
</td>
</tr>
<tr>
<td>Value</td>
<td>uint64</td>
<td>The output value.</td>
</tr>
</table>
</details>
</td>
</tr>
</table>
</details>
</td>
</tr>
<tr>
<td>Integrity hash</td>
<td>ByteArray[32]</td>
<td>
The SHA256 hash of all the previous fields.
</td>
</tr>
</table>
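
To make the field order in the table concrete, the following Python sketch writes and reads a version 1 file. It is an illustration under several assumptions that the RFC leaves open: varints are unsigned LEB128, array prefixes count elements, the address type is written as a single byte (the table says `byte/varint`), the integrity hash covers the uncompressed bytes, and zlib is applied to the whole stream including that hash. `write_snapshot`, `read_snapshot` and the tuple layouts are hypothetical names chosen for this example.

```python
import hashlib
import struct
import zlib
from typing import List, Tuple

Output = Tuple[int, int, bytes, int]   # (index, address type, address, value)
UTXO = Tuple[bytes, List[Output]]      # (transaction hash, unspent outputs)
ADDRESS_LENGTHS = {0: 49, 1: 32}       # 0 = WOTS (T5B1), 1 = Ed25519


def varint(n: int) -> bytes:
    """Unsigned LEB128 (assumed varint flavour)."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def write_snapshot(ms_index: int, ms_hash: bytes, timestamp: int,
                   seps: List[bytes], utxos: List[UTXO]) -> bytes:
    body = bytearray(struct.pack("<BQ", 1, ms_index))  # Version, Milestone Index
    body += ms_hash + struct.pack("<Q", timestamp)     # Milestone Hash, Timestamp
    body += varint(len(seps)) + b"".join(seps)         # SEPs
    body += varint(len(utxos))                         # UTXOs
    for tx_hash, outputs in utxos:
        body += tx_hash + varint(len(outputs))
        for index, addr_type, address, value in outputs:
            if len(address) != ADDRESS_LENGTHS[addr_type]:
                raise ValueError("address length does not match its type")
            body += struct.pack("<BB", index, addr_type) + address
            body += struct.pack("<Q", value)
    digest = hashlib.sha256(body).digest()             # Integrity hash
    return zlib.compress(bytes(body) + digest)


def read_snapshot(blob: bytes):
    raw = zlib.decompress(blob)
    body, digest = raw[:-32], raw[-32:]
    if hashlib.sha256(body).digest() != digest:
        raise ValueError("integrity hash mismatch")
    version, ms_index = struct.unpack_from("<BQ", body, 0)
    if version != 1:
        raise ValueError("unsupported snapshot version")
    pos = 9
    ms_hash = body[pos:pos + 32]
    (timestamp,) = struct.unpack_from("<Q", body, pos + 32)
    pos += 40
    n_seps, pos = read_varint(body, pos)
    seps = [body[pos + 32 * i:pos + 32 * (i + 1)] for i in range(n_seps)]
    pos += 32 * n_seps
    n_tx, pos = read_varint(body, pos)
    utxos = []
    for _ in range(n_tx):
        tx_hash = body[pos:pos + 32]
        n_out, pos = read_varint(body, pos + 32)
        outputs = []
        for _ in range(n_out):
            index, addr_type = struct.unpack_from("<BB", body, pos)
            alen = ADDRESS_LENGTHS[addr_type]
            address = body[pos + 2:pos + 2 + alen]
            (value,) = struct.unpack_from("<Q", body, pos + 2 + alen)
            pos += 2 + alen + 8
            outputs.append((index, addr_type, address, value))
        utxos.append((tx_hash, outputs))
    return ms_index, ms_hash, timestamp, seps, utxos
```

Because the reader checks the hash before parsing any field, a truncated or corrupted file is rejected up front. A real implementation would likely stream the data instead of holding the whole file in memory.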

# Drawbacks

Nodes need to support this new format.

Review comment (Contributor):

I guess you had nothing to fill in :-) ?

Maybe:

  1. We have 2 different snapshot files which may be confusing
  2. The size of the file will grow as the number of outputs increases


# Rationale and alternatives

* Grouping the UTXOs per transaction reduces the file size, since each transaction hash is stored once rather than once per output.
* With zlib compression, a local snapshot containing ~2 million outputs (on Ed25519 addresses) across ~1 million transactions and 150 SEPs shrinks from ~115 MB to ~55 MB.
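
The uncompressed figure can be roughly reproduced from the field sizes in the table. The sketch below assumes unsigned LEB128 varints, a one-byte address type, and exactly two Ed25519 outputs per transaction; the "flat" layout, in which every output repeats its transaction hash, is a hypothetical alternative included only for comparison.

```python
HASH = 32
OUTPUT = 1 + 1 + 32 + 8   # index, address type, Ed25519 address, value
HEADER = 1 + 8 + HASH + 8  # version, milestone index, milestone hash, timestamp


def varint_len(n: int) -> int:
    """Bytes needed to store n as unsigned LEB128 (assumed varint flavour)."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def sep_size(n_seps: int) -> int:
    return varint_len(n_seps) + n_seps * HASH


def grouped_size(n_tx: int, outputs_per_tx: int, n_seps: int) -> int:
    """Size of the format in this RFC: outputs grouped under their transaction."""
    per_tx = HASH + varint_len(outputs_per_tx) + outputs_per_tx * OUTPUT
    utxos = varint_len(n_tx) + n_tx * per_tx
    return HEADER + sep_size(n_seps) + utxos + HASH  # trailing integrity hash


def flat_size(n_tx: int, outputs_per_tx: int, n_seps: int) -> int:
    """Hypothetical layout: each output carries its own transaction hash."""
    n_outputs = n_tx * outputs_per_tx
    utxos = varint_len(n_outputs) + n_outputs * (HASH + OUTPUT)
    return HEADER + sep_size(n_seps) + utxos + HASH


grouped = grouped_size(1_000_000, 2, 150)
flat = flat_size(1_000_000, 2, 150)
print(f"grouped: {grouped / 1e6:.1f} MB, flat: {flat / 1e6:.1f} MB")
```

With these inputs the grouped layout comes to about 117 MB, in line with the ~115 MB quoted above, while the flat layout would need about 148 MB before compression.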

Unlike the current format, this new format no longer includes:
* Spent addresses: since this information is no longer held by nodes.
* Seen milestones: as they can be requested via protocol messages.

# Unresolved questions

* Is all the information needed to start up a node from the local snapshot available in the described format?
* Can we get rid of the spent addresses or do we still need to keep them?
* Can we just use a byte for the index or do we need to use something different?
* Do we need to account for different types of outputs already? (we currently only have them deposit to addresses)