Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bolt04: Variable `hop_payload` for the sphinx onion #619

Open
wants to merge 10 commits into
base: master
from

Conversation

4 participants
@cdecker
Copy link
Collaborator

commented Jun 6, 2019

This is the variable payload length variant of #593. It removes the concept of fixed-length frames, in favor of just prefixing the payload with its varint-encoded length, followed by the HMAC.

The varint is limited to 1 and 3 byte encodings, since the maximum length of the payload is 1300 - 32 - 3 bytes, and the 5 and 9 byte varints are not useful. The HMAC is not part of the payload since it allows us to fit more of the shorter lengths in the single byte varint.

I didn't update the example code yet, since the Go version was not migrated yet, but it should not be too much work to update that later.

@cdecker cdecker force-pushed the variable-onion branch from 5dad251 to 54f03d5 Jun 6, 2019

@t-bast
Copy link
Collaborator

left a comment

Good stuff, only nit comments from me.

Maybe the only thing that we could discuss is the option of including the HMAC inside the hop_payload to be able to remove it entirely from the last hop and save 32 bytes (because the fact that a hop is the last hop can be encoded more efficiently than with 32 0x00 bytes)?

EDIT: in fact it probably isn't a good idea to put the HMAC inside the hop_payload, because it would cost 2 additional bytes per hop (the type and length, and potentially might require 3 bytes instead of 1 for the hop_payload length) so it ends up costing more than 32 0x00 bytes in the last hop for long routes. You can ignore that comment.

Show resolved Hide resolved 04-onion-routing.md Outdated
Show resolved Hide resolved 04-onion-routing.md Outdated
Show resolved Hide resolved 04-onion-routing.md
Show resolved Hide resolved 04-onion-routing.md Outdated
Show resolved Hide resolved 04-onion-routing.md
Show resolved Hide resolved 04-onion-routing.md Outdated
Show resolved Hide resolved 04-onion-routing.md Outdated
Show resolved Hide resolved bolt04/onion-test-multi-frame.json Outdated

@cdecker cdecker moved this from Scheduled to Accepted in Specification Meeting Agenda Jun 10, 2019

@t-bast t-bast referenced this pull request Jun 11, 2019

Open

Multi frame onion #976

2 of 2 tasks complete
@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 11, 2019

@cdecker I think the feature flag that was included in #593 didn't make it to this PR, could you re-add it?

@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 14, 2019

@Roasbeef would you have time to review this PR to unblock merging it?

@Roasbeef
Copy link
Member

left a comment

In addition to lack of the specification for the varint, I think this PR is also missing:

  • the new feature bit selection
  • an updated section on the new feature generation

I also think that we should keep the existing realm field in place. Otherwise, we're more or less locked into this new format for the current onion version, and it also makes code that handles the current and legacy format more complex.

One other thing that stands out is that since the encoded integers for the hop payload are variably sized, there isn't a clear "max route length" any longer. Instead, it's now dependent on the encoded size of each of these fields which makes things a bit harder to analyze.

2. data:
* [`1`:`realm`]
* [`32`:`per_hop`]
* [`1-3`:`length`]

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

I think we should leave the old section in place and have these new messages be defined in a distinct section as the "modern packet format". Even once we make the switch over the the spec level there're still be a number of nodes that haven't yet upgraded (we can't really force upgrades, other than make them appear to be very attractive) which prospective implementers of the spec will likely want to be able to interact with.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

We also cut off future upgrade paths by removing the realm here. Instead, we can keep the realm and use a fixed size 2-byte integer.

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 18, 2019

Collaborator

Only if we want to upgrade out of TLV right?
If our stance is that TLV is the future we don't need a realm, the TLV type is the effective realm.
But if you think we need a way to move away from TLV in the future, in that case the realm would be needed.
I don't have a strong opinion on that, I think we still have the top-level onion version if we ever need to move away from TLV.

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 24, 2019

Author Collaborator

I took a look at this again, and I think it's better to extract the common parts like the patch proposes, and then go into details further down. I'd like to treat the legacy version as a special case of the newer thing, rather than the other way around.


The `length` field determines both the length and the format of the `hop_payload` field; the following formats are defined:

- Legacy `hop_data` format, identified by a `0x00` length.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

by a single byte 0x00 length?

Otherwise will 0x00 and 0x00 0x00 0x00 both mean to parse the legacy format?

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 18, 2019

Author Collaborator

Yes, a single byte 0x00 is more precise. The second option would be invalid for minimal encoding anyway :-)


For each hop in the route, in reverse order, the sender applies the
following operations:

- The _rho_-key and _mu_-key are generated using the hop's shared secret.
- The `hops_data` field is right-shifted by 65 bytes, discarding the last 65
- `shift_size` is defined as the length of the `hop_payload` plus the varint encoding of the length and the length of that HMAC. Thus if the payload is `l` then the `shift_size` is `1 + l + 32` for `l < 253`, otherwise `3 + l + 32` due to the varint encoding of `l`.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

This varint encoding isn't defined anywhere in this PR.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

The Go code below also needs to be updated as it still references constants which are fixed values rather than variable due to the new payload format.

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 18, 2019

Author Collaborator

Yeah, I didn't implement the Go version, so I can replace the code with the C version only.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 24, 2019

Member

Gotcha, after I update can slot it in.


For each hop in the route, in reverse order, the sender applies the
following operations:

- The _rho_-key and _mu_-key are generated using the hop's shared secret.
- The `hops_data` field is right-shifted by 65 bytes, discarding the last 65
- `shift_size` is defined as the length of the `hop_payload` plus the varint encoding of the length and the length of that HMAC. Thus if the payload is `l` then the `shift_size` is `1 + l + 32` for `l < 253`, otherwise `3 + l + 32` due to the varint encoding of `l`.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

I think it should be:

if the length of the payload is l

@Roasbeef

This comment has been minimized.

Copy link
Member

commented Jun 18, 2019

Hey @t-bast, I don't think I'm blocking anything here. This PR as is is underspecified, and there're comments that are a week or two old which are currently unaddressed.

@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 18, 2019

@Roasbeef there probably was some miscommunication here, but we had an action item in the IRC meeting to get your feedback on this PR (because the rest of us were ok with the proposal), which is why I think @cdecker was waiting for your comments before fixing the others (which were mainly nits).

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 18, 2019

@Roasbeef there probably was some miscommunication here, but we had an action item in the IRC meeting to get your feedback on this PR (because the rest of us were ok with the proposal), which is why I think @cdecker was waiting for your comments before fixing the others (which were mainly nits).

No, I'm just working on other stuff in the meantime, since we won't merge this before monday anyway, I thought I'd better work on more pressing issues first. @Roasbeef wasn't really blocking anything here :-)

@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 18, 2019

since we won't merge this before monday anyway

Damn, I had misunderstood the last meeting's outcome, I got my hopes up for nothing then 😬

I'd love to help move this forward though as TLV + variable-length onion are the two features that unblock many other cool things (with non-negligible user impact).

Is there something I can help with or are you good? I can for example pick up the update of the Go implementation if no-one has time to work on it.

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 18, 2019

Added a bunch of fixups mostly addressing @t-bast's feedback. Will see if it makes sense to keep the old payload section in place, or whether it just reads better by restructuring.

I'm not planning on implementing the Go version, so if @t-bast wants to do it, that'd work for me 😁

Sorry for the mixup, I was under the impression we wanted to discuss one last time on Monday.

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 18, 2019

In addition to lack of the specification for the varint, I think this PR is also missing:

  • the new feature bit selection

I cherry-picked the commits defining the feature bit from #593 now.

  • an updated section on the new feature generation

I'm not sure I understand what you mean by this.

I also think that we should keep the existing realm field in place. Otherwise, we're more or less locked into this new format for the current onion version, and it also makes code that handles the current and legacy format more complex.

I can restore that section if it makes sense in the text. As for lockin, that's pretty much guaranteed at this point, sadly. That was one of the advantages of the multi-frame proposal, namely that we'd be left with some bits to define future, non-TLV, payload types.

One other thing that stands out is that since the encoded integers for the hop payload are variably sized, there isn't a clear "max route length" any longer. Instead, it's now dependent on the encoded size of each of these fields which makes things a bit harder to analyze.

Indeed, I have now removed any reference to 20 being the maximum route length, since that's no longer true. This was incidentally also the main reason I was holding on to the 65 frame size, it made sure that 20 would really be the maximum length. Not sure if this is a feature or a bug with the variable length payload tbh 😉

@cfromknecht
Copy link
Contributor

left a comment

@cdecker a couple things from me, see inline comments

@t-bast i don't think we are blocked on the implementation when we're still hammering out the details of the proposal


1. type: `per_hop` (for `realm` 0)
- Legacy `hop_data` format, identified by a `0x00` length.
- `tlv_payload` format, identified by any other length.

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jun 19, 2019

Contributor

imo it'd be preferable to keep the realm byte/version in-tact, and just call this the version 1 hop_data format. i'm hesitant to nerf future versioning of the inner payload format, given that the upgrade path of the outer version is unclear at this point.

we could even move to signal the final hop by setting the high order realm bit instead of using an empty hmac, saving 32-num_hops bytes

2. types:
1. type: 2 (`amt_to_forward`)
2. data:
* [`integer`:`amt_to_forward`]

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jun 19, 2019

Contributor

perplexed that we are now okay with introducing a custom varint scheme.. 🤔

what is the rationale behind amt_to_forward and outgoing_cltv_value meriting a custom varint where as length and type did not?

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator

I'm not sure this means a custom varint encoding, I understood it simply as using TLV to encode an integer value...
Or does it refer to the previous proposal by rusty to optimize the encoding of integers in TLV?

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jun 19, 2019

Contributor

i assumed that if it were fixed-size, it would have been denoted as 8 like short_chan_id, but should clarify

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 19, 2019

Author Collaborator

We can change the type (length) to read varint if that makes things easier 😉

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jun 19, 2019

Contributor

varint as in CompactSize?

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator

@cdecker correct me if I misunderstood, but here is an example to clarify. If we want to encode the amount 550, here is what the bytes would look like: 0x0203fd2602.
The type is 0x02 and the value is encoded as a varint (Bitcoin CompactSize): 0xfd2602. The length is redundant but necessary otherwise stream parsing would break.

Another option would be to encode it as 0x02020226 (or 0x02022602 if we prefer little-endian) and have the parser know that since it's supposed to be an int, it should be padded to the correct length (depending on the language used).

I agree with @cfromknecht that the proposal as-is could be interpreted in multiple ways, so it's worth clarifying :)

- The legacy `hop_data` is identified by a single `0x00`-byte prefix
- The variable length `hop_payload` is prefixed with a `varint` encoding
the length in bytes, excluding the prefix and the trailing HMAC.
- This specification makes us of available length encoded integer format,

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator
Suggested change
- This specification makes us of available length encoded integer format,
- This specification makes use of available length encoded integer format,
Show resolved Hide resolved 04-onion-routing.md
## Legacy `hop_data` payload format

The `hop_data` format is identified by a single `0x00`-byte length, for backward compatibility.
It's payload is defined as:

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator
Suggested change
It's payload is defined as:
Its payload is defined as:
2. types:
1. type: 2 (`amt_to_forward`)
2. data:
* [`integer`:`amt_to_forward`]

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator

I'm not sure this means a custom varint encoding, I understood it simply as using TLV to encode an integer value...
Or does it refer to the previous proposal by rusty to optimize the encoding of integers in TLV?

Show resolved Hide resolved 04-onion-routing.md
bytes that exceed its 1300-byte size.
- The `version`, `short_channel_id`, `amt_to_forward`, `outgoing_cltv_value`,
`padding`, and `HMAC` are copied into the following 65 bytes.
- The varint-serialized length, serialized `hop_payload` and `HMAC` are copied into the following `shift_size` bytes.
- The _rho_-key is used to generate 1300 bytes of pseudo-random byte stream
which is then applied, with `XOR`, to the `hops_data` field.

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator
Suggested change
which is then applied, with `XOR`, to the `hops_data` field.
which is then applied, with `XOR`, to the `hop_payloads` field.
@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 19, 2019

@t-bast i don't think we are blocked on the implementation when we're still hammering out the details of the proposal

I think that most of the proposal makes sense to implement now to be able to provide spec feedback. For example there's a mention that dropping the realm byte makes it hard in the code to handle legacy and new format. That wasn't really an issue for my implementation and I don't see how re-adding the realm byte would make the implementation simpler (it does provide a way out of TLV though, that argument remains valid).

But if you feel this is not necessary right now, I won't do anything on the go implementation, let me know if you need a hand at some point.


The reader:
- MUST return an error if `amt_to_forward` or `outgoing_cltv_value` are not present.
- MUST return an error if it is not the final node and `short_channel_id` is not present.

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jun 20, 2019

Contributor

Given that the proposal currently uses the HMAC to signal the exit hop, enforcing these requires a lookahead since the HMAC follows the TLV payload or remembering which types the TLV payload contained when checking the HMAC :( Ideally the final hop would be signaled before parsing the TLV payload, making it possible to enforce these requirements while parsing the packet linearly.

Perhaps this is another argument for signaling the final hop in the realm byte and dropping the final HMAC?

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 24, 2019

Author Collaborator

I think we should add a signal in the TLV (making it a low type such that it gets processed first)?

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 24, 2019

Collaborator

That means that the onion processing layer needs to parse and interpret the TLV stream to look for a particular TLV type though, isn't this a layering violation? Whereas signaling via the realm byte might be a bit cleaner, don't you think?

cdecker added some commits Mar 28, 2019

bolt04: Formatting cleanup and fold clarifications into conventions
The clarifications were tacked on after the fact, but they should really be
part of the conventions. I also updated the links to use the reference style,
which results in better text flow and makes it easier to read the source.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Describe the variable size `hop_payload`
Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Amend the filler generation and onion decoding to varpayload
This actually introduces the variable size shift and filler generation.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Remove in-spec test vector in favor of JSON test vector
Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Add the TLV types for the new payload format
Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Add reference to varint in the Bitcoin proto documentation
Suggested-by: Roasbeef <@Roasbeef>
Signed-off-by: Christian Decker <@cdecker>

@cdecker cdecker force-pushed the variable-onion branch from 13033b7 to e6dbdba Jun 24, 2019

@Roasbeef

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

an updated section on the new feature generation

Oops, meant filler generation.

As for lockin, that's pretty much guaranteed at this point, sadly.

I don't think we're locked in if we leave the existing realm in place. I think all the prior iterations we went through have allowed us to explore the level of extensibility the current "realm" field as to offer. Would be shame if we closed off all future paths prematurely. I think we can get the best of both worlds by keeping the realm in place, and just using 2 fixed bytes for a length. 65KB is far larger than the max payload size in the onion, it simplifies parsing, and also allows us to avoid unnecessarily adding a var int into the protocol.

cdecker added some commits Jun 24, 2019

bolt04: Note that realm 0x01 is reserved and not usable for TLVs
As proposed during the spec meeting, realm 0x01 provides us with an escape
hatch should we ever come up with the new best thing since sliced bread.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Introduce the destination_signal to the tlv_payload
As discussed during the spec meeting this allows us not to use the 32 byte
HMAC to identify the last hop, and use a 2-byte signal instead.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 25, 2019

I think all design comments (realm vs no realm, termination signaling via TLV) have been addressed during yesterday's meeting right?

There are still a few typos/nits and a clarification on the TLVs' integer notation, but apart from that are we missing something? What are the next steps before merging this?

@Roasbeef @cfromknecht are you happy with the current state of the proposal? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.