Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bolt04: Variable `hop_payload` for the sphinx onion #619

Merged
merged 10 commits into from Jul 26, 2019

Conversation

@cdecker
Copy link
Collaborator

commented Jun 6, 2019

This is the variable payload length variant of #593. It removes the concept of fixed-length frames, in favor of just prefixing the payload with its varint-encoded length, followed by the HMAC.

The varint is limited to 1 and 3 byte encodings, since the maximum length of the payload is 1300 - 32 - 3 bytes, and the 5 and 9 byte varints are not useful. The HMAC is not part of the payload since it allows us to fit more of the shorter lengths in the single byte varint.

I didn't update the example code yet, since the Go version was not migrated yet, but it should not be too much work to update that later.

@cdecker cdecker force-pushed the variable-onion branch from 5dad251 to 54f03d5 Jun 6, 2019

@t-bast
Copy link
Collaborator

left a comment

Good stuff, only nit comments from me.

Maybe the only thing that we could discuss is the option of including the HMAC inside the hop_payload to be able to remove it entirely from the last hop and save 32 bytes (because the fact that a hop is the last hop can be encoded more efficiently than with 32 0x00 bytes)?

EDIT: in fact it probably isn't a good idea to put the HMAC inside the hop_payload, because it would cost 2 additional bytes per hop (the type and length, and potentially might require 3 bytes instead of 1 for the hop_payload length) so it ends up costing more than 32 0x00 bytes in the last hop for long routes. You can ignore that comment.

04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Outdated Show resolved Hide resolved
bolt04/onion-test-multi-frame.json Outdated Show resolved Hide resolved
@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 11, 2019

@cdecker I think the feature flag that was included in #593 didn't make it to this PR, could you re-add it?

@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 14, 2019

@Roasbeef would you have time to review this PR to unblock merging it?

@Roasbeef
Copy link
Member

left a comment

In addition to lack of the specification for the varint, I think this PR is also missing:

  • the new feature bit selection
  • an updated section on the new feature generation

I also think that we should keep the existing realm field in place. Otherwise, we're more or less locked into this new format for the current onion version, and it also makes code that handles the current and legacy format more complex.

One other thing that stands out is that since the encoded integers for the hop payload are variably sized, there isn't a clear "max route length" any longer. Instead, it's now dependent on the encoded size of each of these fields which makes things a bit harder to analyze.

04-onion-routing.md Outdated Show resolved Hide resolved

The `length` field determines both the length and the format of the `hop_payload` field; the following formats are defined:

- Legacy `hop_data` format, identified by a `0x00` length.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

by a single byte 0x00 length?

Otherwise will 0x00 and 0x00 0x00 0x00 both mean to parse the legacy format?

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 18, 2019

Author Collaborator

Yes, a single byte 0x00 is more precise. The second option would be invalid for minimal encoding anyway :-)


For each hop in the route, in reverse order, the sender applies the
following operations:

- The _rho_-key and _mu_-key are generated using the hop's shared secret.
- The `hops_data` field is right-shifted by 65 bytes, discarding the last 65
- `shift_size` is defined as the length of the `hop_payload` plus the varint encoding of the length and the length of that HMAC. Thus if the payload is `l` then the `shift_size` is `1 + l + 32` for `l < 253`, otherwise `3 + l + 32` due to the varint encoding of `l`.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

This varint encoding isn't defined anywhere in this PR.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

The Go code below also needs to be updated as it still references constants which are fixed values rather than variable due to the new payload format.

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 18, 2019

Author Collaborator

Yeah, I didn't implement the Go version, so I can replace the code with the C version only.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 24, 2019

Member

Gotcha, after I update can slot it in.


For each hop in the route, in reverse order, the sender applies the
following operations:

- The _rho_-key and _mu_-key are generated using the hop's shared secret.
- The `hops_data` field is right-shifted by 65 bytes, discarding the last 65
- `shift_size` is defined as the length of the `hop_payload` plus the varint encoding of the length and the length of that HMAC. Thus if the payload is `l` then the `shift_size` is `1 + l + 32` for `l < 253`, otherwise `3 + l + 32` due to the varint encoding of `l`.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jun 18, 2019

Member

I think it should be:

if the length of the payload is l

@Roasbeef

This comment has been minimized.

Copy link
Member

commented Jun 18, 2019

Hey @t-bast, I don't think I'm blocking anything here. This PR as is is underspecified, and there're comments that are a week or two old which are currently unaddressed.

@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 18, 2019

@Roasbeef there probably was some miscommunication here, but we had an action item in the IRC meeting to get your feedback on this PR (because the rest of us were ok with the proposal), which is why I think @cdecker was waiting for your comments before fixing the others (which were mainly nits).

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 18, 2019

@Roasbeef there probably was some miscommunication here, but we had an action item in the IRC meeting to get your feedback on this PR (because the rest of us were ok with the proposal), which is why I think @cdecker was waiting for your comments before fixing the others (which were mainly nits).

No, I'm just working on other stuff in the meantime, since we won't merge this before monday anyway, I thought I'd better work on more pressing issues first. @Roasbeef wasn't really blocking anything here :-)

@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 18, 2019

since we won't merge this before monday anyway

Damn, I had misunderstood the last meeting's outcome, I got my hopes up for nothing then 😬

I'd love to help move this forward though as TLV + variable-length onion are the two features that unblock many other cool things (with non-negligible user impact).

Is there something I can help with or are you good? I can for example pick up the update of the Go implementation if no-one has time to work on it.

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 18, 2019

Added a bunch of fixups mostly addressing @t-bast's feedback. Will see if it makes sense to keep the old payload section in place, or whether it just reads better by restructuring.

I'm not planning on implementing the Go version, so if @t-bast wants to do it, that'd work for me 😁

Sorry for the mixup, I was under the impression we wanted to discuss one last time on Monday.

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 18, 2019

In addition to lack of the specification for the varint, I think this PR is also missing:

  • the new feature bit selection

I cherry-picked the commits defining the feature bit from #593 now.

  • an updated section on the new feature generation

I'm not sure I understand what you mean by this.

I also think that we should keep the existing realm field in place. Otherwise, we're more or less locked into this new format for the current onion version, and it also makes code that handles the current and legacy format more complex.

I can restore that section if it makes sense in the text. As for lockin, that's pretty much guaranteed at this point, sadly. That was one of the advantages of the multi-frame proposal, namely that we'd be left with some bits to define future, non-TLV, payload types.

One other thing that stands out is that since the encoded integers for the hop payload are variably sized, there isn't a clear "max route length" any longer. Instead, it's now dependent on the encoded size of each of these fields which makes things a bit harder to analyze.

Indeed, I have now removed any reference to 20 being the maximum route length, since that's no longer true. This was incidentally also the main reason I was holding on to the 65 frame size, it made sure that 20 would really be the maximum length. Not sure if this is a feature or a bug with the variable length payload tbh 😉

@cfromknecht
Copy link
Collaborator

left a comment

@cdecker a couple things from me, see inline comments

@t-bast i don't think we are blocked on the implementation when we're still hammering out the details of the proposal


1. type: `per_hop` (for `realm` 0)
- Legacy `hop_data` format, identified by a `0x00` length.
- `tlv_payload` format, identified by any other length.

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jun 19, 2019

Collaborator

imo it'd be preferable to keep the realm byte/version in-tact, and just call this the version 1 hop_data format. i'm hesitant to nerf future versioning of the inner payload format, given that the upgrade path of the outer version is unclear at this point.

we could even move to signal the final hop by setting the high order realm bit instead of using an empty hmac, saving 32-num_hops bytes

04-onion-routing.md Outdated Show resolved Hide resolved
- The legacy `hop_data` is identified by a single `0x00`-byte prefix
- The variable length `hop_payload` is prefixed with a `varint` encoding
the length in bytes, excluding the prefix and the trailing HMAC.
- This specification makes us of available length encoded integer format,

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator
Suggested change
- This specification makes us of available length encoded integer format,
- This specification makes use of available length encoded integer format,
04-onion-routing.md Show resolved Hide resolved
## Legacy `hop_data` payload format

The `hop_data` format is identified by a single `0x00`-byte length, for backward compatibility.
It's payload is defined as:

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator
Suggested change
It's payload is defined as:
Its payload is defined as:
04-onion-routing.md Outdated Show resolved Hide resolved
04-onion-routing.md Show resolved Hide resolved
bytes that exceed its 1300-byte size.
- The `version`, `short_channel_id`, `amt_to_forward`, `outgoing_cltv_value`,
`padding`, and `HMAC` are copied into the following 65 bytes.
- The varint-serialized length, serialized `hop_payload` and `HMAC` are copied into the following `shift_size` bytes.
- The _rho_-key is used to generate 1300 bytes of pseudo-random byte stream
which is then applied, with `XOR`, to the `hops_data` field.

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 19, 2019

Collaborator
Suggested change
which is then applied, with `XOR`, to the `hops_data` field.
which is then applied, with `XOR`, to the `hop_payloads` field.
@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 19, 2019

@t-bast i don't think we are blocked on the implementation when we're still hammering out the details of the proposal

I think that most of the proposal makes sense to implement now to be able to provide spec feedback. For example there's a mention that dropping the realm byte makes it hard in the code to handle legacy and new format. That wasn't really an issue for my implementation and I don't see how re-adding the realm byte would make the implementation simpler (it does provide a way out of TLV though, that argument remains valid).

But if you feel this is not necessary right now, I won't do anything on the go implementation, let me know if you need a hand at some point.

04-onion-routing.md Outdated Show resolved Hide resolved

The reader:
- MUST return an error if `amt_to_forward` or `outgoing_cltv_value` are not present.
- MUST return an error if it is not the final node and `short_channel_id` is not present.

This comment has been minimized.

Copy link
@cfromknecht

cfromknecht Jun 20, 2019

Collaborator

Given that the proposal currently uses the HMAC to signal the exit hop, enforcing these requires a lookahead since the HMAC follows the TLV payload or remembering which types the TLV payload contained when checking the HMAC :( Ideally the final hop would be signaled before parsing the TLV payload, making it possible to enforce these requirements while parsing the packet linearly.

Perhaps this is another argument for signaling the final hop in the realm byte and dropping the final HMAC?

This comment has been minimized.

Copy link
@cdecker

cdecker Jun 24, 2019

Author Collaborator

I think we should add a signal in the TLV (making it a low type such that it gets processed first)?

This comment has been minimized.

Copy link
@t-bast

t-bast Jun 24, 2019

Collaborator

That means that the onion processing layer needs to parse and interpret the TLV stream to look for a particular TLV type though, isn't this a layering violation? Whereas signaling via the realm byte might be a bit cleaner, don't you think?

@cdecker cdecker force-pushed the variable-onion branch from 13033b7 to e6dbdba Jun 24, 2019

@Roasbeef

This comment has been minimized.

Copy link
Member

commented Jun 24, 2019

an updated section on the new feature generation

Oops, meant filler generation.

As for lockin, that's pretty much guaranteed at this point, sadly.

I don't think we're locked in if we leave the existing realm in place. I think all the prior iterations we went through have allowed us to explore the level of extensibility the current "realm" field as to offer. Would be shame if we closed off all future paths prematurely. I think we can get the best of both worlds by keeping the realm in place, and just using 2 fixed bytes for a length. 65KB is far larger than the max payload size in the onion, it simplifies parsing, and also allows us to avoid unnecessarily adding a var int into the protocol.

@t-bast

This comment has been minimized.

Copy link
Collaborator

commented Jun 25, 2019

I think all design comments (realm vs no realm, termination signaling via TLV) have been addressed during yesterday's meeting right?

There are still a few typos/nits and a clarification on the TLVs' integer notation, but apart from that are we missing something? What are the next steps before merging this?

@Roasbeef @cfromknecht are you happy with the current state of the proposal? :)

04-onion-routing.md Outdated Show resolved Hide resolved
@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 29, 2019

I think we can get the best of both worlds by keeping the realm in place, and just using 2 fixed bytes for a length. 65KB is far larger than the max payload size in the onion, it simplifies parsing, and also allows us to avoid unnecessarily adding a var int into the protocol.

While I think we can discuss about leaving the realm byte intact, I'd definitely want to use the varint in there, it's a tiny change and it saves us up to 20 bytes in total. I'd actually argue that for a vast majority of cases we'll have payload lengths < 253 which results so we'd rarely ever use the 2 byte length prefix at all.

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jun 29, 2019

an updated section on the new feature generation

Oops, meant filler generation.

The filler generation literally doesn't change, we just generate a bit more since we might be shifting a bit more.

|------|-------------------|-------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| Bits | Name | Description | Link |
|------|-------------------|-------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------|
| 0/1 | `var_onion_optin` | This node requires/supports variable payload routing onions | [routing onion specification](https://github.com/lightningnetwork/lightning-rfc/blob/master/04-onion-routing.md) |

This comment has been minimized.

Copy link
@t-bast

t-bast Jul 2, 2019

Collaborator

If we want to be able to combine those with the local features (such as in #571) it would be better to use bits that aren't currently assigned (instead of 0/1).
I'll let you sync with @rustyrussell to see what bit values would make sense: many are tentatively assigned in #605 but I think var_onion_optin should replace one of those (since it's coming first). We probably need to update the list in #605.


1. tlv: `tlv_payload`
2. types:
1. type: 0 (`destination_signal`)

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jul 3, 2019

Member

So we have both a signal here and also a final set of zero bytes in the HMAC? I'm not sure we need another signal on this level, given that in the underlying Sphinx format, the zero'd HMAC is used as a tag for the final destination. This would also create two ways that packet processors are required to know of in order to determine if this is the final hop or not. This framing layer sits above the current onion layer IMO, meaning we're crossing layers here in order to detect a terminal point in the route.

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jul 3, 2019

Member

Or will the HMAC be left in place, and this is just for the higher level TLV parser?

This comment has been minimized.

Copy link
@t-bast

t-bast Jul 4, 2019

Collaborator

I currently kept both conditions in my implementation. If either the next packet's hmac is 0x00 or if the first byte of the TLV stream is 0x00 it means we are the last hop.
Instead of using TLV code inside the onion processor, I translated this condition as the first byte after the varint length of the payload is 0x00 (since TLV streams are ordered by TLV type values and this one as type 0x00).
I don't have a strong opinion on what we should do here because it's not necessarily the onion processor layer that need to figure out that we are the destination, it might make more sense that it's the layer that interprets the payload that should figure out that we are the destination.

This comment has been minimized.

Copy link
@rustyrussell

rustyrussell Jul 22, 2019

Collaborator

If we leave the HMAC, we should continue to use it to signal termination.

Half-parsing the TLV this way is just nasty, as is the alternative of using type 0xFFFFFFFFFFFFFFFF to signal the HMAC (which, of course would always put it at the end), which loses more bytes across any normal route than skipping the terminal HMAC would!

* Empty
1. type: 2 (`amt_to_forward`)
2. data:
* [`integer`:`amt_to_forward`]

This comment has been minimized.

Copy link
@Roasbeef

Roasbeef Jul 3, 2019

Member

Still not clear what integer is from strictly reading this document. I think using varint here makes it more clear when combined with #622.

This comment has been minimized.

Copy link
@rustyrussell

rustyrussell Jul 22, 2019

Collaborator

Should be tu64, agreed.

@cdecker cdecker force-pushed the variable-onion branch from e068134 to 157a708 Jul 22, 2019

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 22, 2019

Rebased and squashed for the IRC meeting today.

@t-bast

t-bast approved these changes Jul 22, 2019

Copy link
Collaborator

left a comment

@Roasbeef
Copy link
Member

left a comment

LGTM 🦞

Verified against our implementation using the new BigSize varint. I think this is ready to land after a final rebase!

@rustyrussell

This comment has been minimized.

Copy link
Collaborator

commented Jul 24, 2019

I think the 0 type TLV marker still has to be removed? I see it's already done, thanks!

http://www.erisian.com.au/meetbot/lightning-dev/2019/lightning-dev.2019-07-22-20.04.html

@cfromknecht
Copy link
Collaborator

left a comment

🍾 LGTM 🍾

@t-bast

t-bast approved these changes Jul 24, 2019

Copy link
Collaborator

left a comment

Congrats all and thanks @cdecker for driving this PR all the way ;)

Addressed

cdecker added some commits Mar 28, 2019

bolt04: Formatting cleanup and fold clarifications into conventions
The clarifications were tacked on after the fact, but they should really be
part of the conventions. I also updated the links to use the reference style,
which results in better text flow and makes it easier to read the source.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Describe the variable size `hop_payload`
Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Amend the filler generation and onion decoding to varpayload
This actually introduces the variable size shift and filler generation.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Remove in-spec test vector in favor of JSON test vector
Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Add the TLV types for the new payload format
Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt04: Introduce the destination_signal to the tlv_payload
As discussed during the spec meeting this allows us not to use the 32 byte
HMAC to identify the last hop, and use a 2-byte signal instead.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
bolt09: Cleanup the feature bits tables and extract URLs into footer
The tables were a bit unreadable in the source view, and the globalfeatures
table was not rendering correctly. This is just a minor cleanup pass.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
Suggested-by: Janus Troelsen <@ysangkok>
bolt04: Remove TLV based termination signal
As discussed during the IRC meeting on 2019-07-22 this would have been a
duplication of signals. It was decided to use one for now, with the option of
coming back should we ever need the last 32 bytes of the onion.

@cdecker cdecker force-pushed the variable-onion branch from b06386f to 437d79b Jul 25, 2019

@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 25, 2019

Rebased ^^

Here's the range-diff, to make it easier to look for changes during the rebase:

$ git range-diff $(git merge-base master origin/variable-onion)..origin/variable-onion master..variable-onion
 1:  4f4c3c3 =  1:  a93cc7f bolt04: Formatting cleanup and fold clarifications into conventions
 2:  dc7c2fc !  2:  29e7be9 bolt04: Describe the variable size `hop_payload`
    @@ -117,6 +117,6 @@
      When building the route, the origin node MUST use a payload for
      the final node with the following values:
     +
    - * `outgoing_cltv_value`: set to the final expiry specified by the recipient
    - * `amt_to_forward`: set to the final amount specified by the recipient
    - 
    + * `outgoing_cltv_value`: set to the final expiry specified by the recipient (e.g.
    +   `min_final_cltv_expiry` from a [BOLT #11](11-payment-encoding.md) payment invoice)
    + * `amt_to_forward`: set to the final amount specified by the recipient (e.g. `amount`
 3:  62d8d7b =  3:  2bff8e5 bolt04: Amend the filler generation and onion decoding to varpayload
 4:  cf0527b =  4:  3be749c bolt04: Remove in-spec test vector in favor of JSON test vector
 5:  be5add0 =  5:  74093fa bolt04: Add the TLV types for the new payload format
 6:  d0862a5 !  6:  6b2144c bolt04: Shut the spellchecker up
    @@ -14,12 +14,11 @@
      verifier
      verifiers
     @@
    - structs
    - CompactSize
    - encodings
    + BigSize
    + namespaces
    + tlvs
     +fips
     +rfc
     +varint
     +CompactSize
     +tlvs
    - \ No newline at end of file
 7:  b70ae1f =  7:  d3bb7f2 bolt07: Add feature bits for multi-frame support
 8:  82a73d2 =  8:  88e476f bolt04: Introduce the destination_signal to the tlv_payload
 9:  157a708 =  9:  35edd4e bolt09: Cleanup the feature bits tables and extract URLs into footer
10:  b06386f = 10:  437d79b bolt04: Remove TLV based termination signal
@t-bast

t-bast approved these changes Jul 25, 2019

Copy link
Collaborator

left a comment

ACK 437d79b

@rustyrussell

This comment has been minimized.

Copy link
Collaborator

commented Jul 26, 2019

Ack 437d79b

@cdecker please apply. I'd do it, but you deserve the honor :)

@cdecker cdecker merged commit d23f4b0 into master Jul 26, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
@cdecker

This comment has been minimized.

Copy link
Collaborator Author

commented Jul 26, 2019

Congrats everyone, this has been a long time in the making, but it got substantially better through the extensive feedback 👍 🎆

cdecker added a commit to cdecker/lightning that referenced this pull request Jul 26, 2019

sphinx: Switch to big-endian number encoding
See lightningnetwork/lightning-rfc#619 and
lightningnetwork/lightning-rfc#619 for discussion.

Signed-off-by: Christian Decker <decker.christian@gmail.com>

cdecker added a commit to cdecker/lightning that referenced this pull request Jul 26, 2019

sphinx: Switch to big-endian number encoding
See lightningnetwork/lightning-rfc#619 and
lightningnetwork/lightning-rfc#619 for discussion.

Signed-off-by: Christian Decker <decker.christian@gmail.com>

rustyrussell added a commit to ElementsProject/lightning that referenced this pull request Jul 30, 2019

sphinx: Switch to big-endian number encoding
See lightningnetwork/lightning-rfc#619 and
lightningnetwork/lightning-rfc#619 for discussion.

Signed-off-by: Christian Decker <decker.christian@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.