Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] initial foray into storing optional "relative" kernels #1260

Closed
wants to merge 10 commits into from

Conversation

antiochp
Copy link
Member

@antiochp antiochp commented Jul 13, 2018

For relative locktimes we need to be able to reference a prior kernel.
These will be optional and the vast majority of kernels will not need to reference a prior kernel.

Our MMR implementation only supports storing fixed length of bytes per entry so we cannot easily store optional data directly in the MMR.

So we will need to store "extra" data alongside the primary MMR.
I'm proposing storing these referenced kernel commitments in a data file alongside a leaf_set (bitmap) that represents the pos of kernels that include the optional referenced kernel commitment.

So we will have something like the following -

/kernel/pmmr_data.bin
/kernel/pmmr_hash.bin
/kernel/pmmr_leaf.bin
/kernel/pmmr_prun.bin
/kernel_extra/pmmr_data.bin
/kernel_extra/pmmr_leaf.bin

We can determine offsets for reading the /kernel_extra/pmmr_data.bin via the leaf_set.

Reading/writing kernels gets a bit more complicated -

  1. include the referenced kernel when hashing a kernel.
  2. include the referenced kernel when serializing the kernel.
  3. But - exclude the referenced kernel when serializing the kernel into the MMR.

To achieve (3) the kernel MMR no longer takes kernels directly. We wrap a TxKernel in a TxKernelEntry and store these in the kernel MMR (to serialize them correctly as fixed number of bytes, excluding the optional data).

This is similar to how we serialize OutputIdentifier into the output MMR, and not Output directly.

Why don't we do this with two full MMRs like we do with outputs and their corresponding rangeproofs?

  1. we do not want to add another root to each header
  2. we want to include the referenced kernel in the kernel hash

We don't need a full MMR of hashes (the optional data is hashed into the kernel hashes themselves) so we only need a data file and a corresponding way of offsetting correctly into it.

TODO -

  • implement TxKernelEntry and make it PMMRable
  • use TxKernelEntry for the kernel MMR (and not kernels directly)
  • add new kernel feature to represent one with a "relative lock height"
  • implement the new ExtraBackend for optional related data (the reference kernel in this case)
  • integrate the new ExtraBackend into the existing txhashset/kernel_mmr
  • implement rewind for the new ExtraBackend
  • thread real rel_kernel through to wallet impl
  • where does the wallet get a kernel from? cmd line arg?
  • use all of this to actually implement a proof of concept for "relative lock_height" on a kernel
  • migration process - how/why?

@antiochp
Copy link
Member Author

@quentinlesceller this overlaps slightly with our need to handle versioned MMRs.
But in this specific case we want to be able to add "optional" data to some MMR entries.
We cannot currently support this as our MMRs expect each entry to be of the same fixed number of bytes.

@antiochp
Copy link
Member Author

Looking for some early feedback/comments on this approach @ignopeverell @yeastplume.
Does this look like roughly the right direction to be going in with this?
I'm not yet convinced the level of abstraction is right given that the MMRs themselves are very generic.

@ignopeverell
Copy link
Contributor

Great job remembering the kernel will have to also sign the one it references :-) Ideally we'd store all the data together but I can't think of a way to do that that'd be better than the "extra". And the extra may be more generic and reusable overall (say to add optional data to outputs). So I'm good with the overall approach.

@antiochp antiochp self-assigned this Aug 29, 2018
@antiochp antiochp added this to the Mainnet milestone Aug 29, 2018
@antiochp
Copy link
Member Author

antiochp commented Sep 14, 2018

We may want to reference a previous kernel by its hash and not directly via the kernel excess for privacy reasons.
Given a pair of txs (neither on-chain yet) with k2 referencing a an earlier k1 - we would not know anything about k1 until the first tx was broadcast and accepted in a block.
If we referenced the kernel excess directly other parties would know in advance what to look for if they knew what tx2 looked like.

Also the hash is 32 bytes vs 33 bytes for the excess, so the MMR of extra data would be a slightly smaller.

Edit: the hash is also a lot harder to intentionally duplicate as we could take the signature into account when hashing. The excess itself can be trivially duplicated in another tx.

@antiochp
Copy link
Member Author

antiochp commented Mar 12, 2019

This is now 6 months old and subject to a wide variety of merge conflicts.
As a PoC there is some value here but it is due a rewrite either way.

The biggest thing that came out of this work is the question around how t support variable length data in our PMMRs.

Do we go with an approach like this with optional "extra" data to facilitate some limited flexibility around variable data? Or do we go further and actually support truly variable length data, presumably with variable offsets on a per position basis.

I still lean toward supporting optional "extra" data, possibly in a more flexible way than proposed in this PR. i.e. A core kernel MMR containing all the required fields then support optional lock_height and optional relative_kernel in separate MMR data files.

This would allow us to support -

  • plain kernel
  • kernel with associated lock_height
  • kernel with associated fee
  • kernel with associated relative_kernel

And all within a set of MMR data files that all still assume fixed length data.
The fixed length data does allow significant flexibility in terms of reading multiple adjacent pieces of data. We don't take advantage of this today but it would be nice to have this option.

Alternatively we could maintain a "length" associated with each data entry to account for the various optional data fields.

And maybe "length" is maintained in one additional MMR data file (and not as a prefix on each entry). That way we can read the length MMR to determine how much data to read from the actual data MMR. This would allow us to read multiple values with a bit of indirection (read n lengths, calculate total length of data to read, finally read the bytes).

Now that I think about this -

  • we currently maintain a shift_cache and a leaf_shift_cache
  • these are based on the current pruned-ness of the PMMR
  • we build these up in memory when we (re)read the prune_list
  • if we were to introduce a length position we may want to combine length, shift, leaf_shift and track these on disk (and not a simple in memory cache)
  • these basically define byte offsets and byte lengths required to read the data and hash files

@antiochp
Copy link
Member Author

Closing this. We can always refer back to this for some context on the overall approach.
Replaced a big chunk of the backend storage impl with #2690 which should hopefully give us a nice flexible way of handling MMRs with variable sized data elements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants