Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bolt 1: Specify that extensions to existing messages must use TLV #714

Merged
merged 5 commits into from Feb 28, 2020

Conversation

t-bast
Copy link
Collaborator

@t-bast t-bast commented Dec 9, 2019

The spec already prepared a hook to add additional information to existing
messages (additional bytes at the end of a message must be ignored).

Since we're using TLV in many places, it would make sense to use that optional
additional space at the end of each message to allow an optional tlv stream.

We keep the "additional bytes must be ignored" rule for what happens after that
tlv stream, giving us extra flexibility to add another type of optional content
to existing messages later if needed.

This was already proposed in #630 but rejected. I found additional reasons to
specify this now rather than on a per-message basis. We don't want to have two
competing and incompatible ways of extending existing messages.

Imagine node A and node B are peers, both running experiments by leveraging the
"ignore additional data" rule. However node A uses a TLV stream (as proposed in
this PR) while node B uses a different encoding. Unluckily node B also started his
encoding with something that can be interpreted as a varint, so node A interprets
this as a TLV stream, but it decodes as a two-bytes even number (which has a 50%
chance of happening). She (validly) thinks it's an even TLV she doesn't understand,
so she fails the connection and closes the channels.

This is a major inconvenience when experimenting with message extensions. I believe
what this PR proposes is reasonable and prevents these competing encodings to mess
up experimentations.

@t-bast t-bast requested review from cfromknecht, niftynei and pm47 Dec 9, 2019
@t-bast t-bast added the Meeting Discussion label Dec 9, 2019
@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Dec 9, 2019

Open question: should the TLV stream be prefixed with its length or not?

It seems like the gossip_queries tlv streams today don't prefix with the length (length is inferred from the overall message length).

The inconvenient when we don't prefix with the length is that we completely remove the possibility of having another bag of uninterpreted bytes at the end of the message (if we decide some day that we want to extend messages with something else than TLV). Maybe we don't care, maybe we do :)

I'm leaning towards removing the tlv stream length prefix: in that case this PR makes even more sense to do now, because we don't want to have some messages be extended with a TLV stream and others extended differently, that would be a big mess. WDYT @rustyrussell @cfromknecht @sstone ?

UPDATED: removed the length prefix in 6117c30.

@cdecker
Copy link
Collaborator

@cdecker cdecker commented Dec 9, 2019

If we limit message extensions to be TLV-based we should definitely drop the length prefix.

I'd like to point out that there is a small caveat for gossip broadcast messages (node_announcement, channel_announcement and channel_update) which requires that they include the fields in the signature, which is the main reason we went for a canonical encoding for TLVs (no duplicate types and monotonically increasing type order). Just as a caveat to the ignore word that is being used in the PR.

Other than that small wording issue, I'm definitely in favor of consolidating towards TLVs.

@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Dec 10, 2019

I added two commits:

  • 6117c30 simply removes the length-prefix as agreed during the spec meeting
  • 274a965 makes mandatory some message optional fields that would otherwise make parsing the extension non-deterministic: the commit message contains additional details. @Roasbeef @cfromknecht let me know if making these fields mandatory seems ok to you (thanks for raising the issue during the meeting).

@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Jan 23, 2020

Do you think we should re-work upfront_shutdown_script to move it to a TLV record instead of the current optional encoding? That would simplify the migration to TLV for all messages.

It could be done in an interesting backwards-compatible way:

  • assign type 0x00 to upfront_shutdown_script
  • then for all scripts that have a length < 253 the TLV encoding will perfectly match the current encoding

@halseth
Copy link
Contributor

@halseth halseth commented Jan 30, 2020

Do you think we should re-work upfront_shutdown_script to move it to a TLV record instead of the current optional encoding? That would simplify the migration to TLV for all messages.

It could be done in an interesting backwards-compatible way:

  • assign type 0x00 to upfront_shutdown_script
  • then for all scripts that have a length < 253 the TLV encoding will perfectly match the current encoding

Sounds easiest just to make it a mandatoy field? (similar to what this PR specifies now) Why would doing this "hack" make the migration to TLV simpler?

Is upfront_shutdown the only optional field across all wire messages that interfers with the TLV stream?

@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Jan 30, 2020

Thanks for the comment @halseth.

Sounds easiest just to make it a mandatoy field? (similar to what this PR specifies now) Why would doing this "hack" make the migration to TLV simpler?

If we change the format of open_channel to be (and do a similar thing for accept_channel):

  1. type: 32 (open_channel)
  2. data:
  • [chain_hash:chain_hash]
  • ...
  • [byte:channel_flags]
  • [open_tlvs:tlvs]
  1. tlvs: open_tlvs
  2. types:
    1. type: 0 (upfront_shutdown_script)
    2. data:
      • [...*byte:script]

Then we have a smooth upgrade path and we clean-up the spec by moving this field inside a TLV stream, because it will result in the exact same encoding as today. Since valid scripts have a length < 253, they will be encoded as either 0x0000 (when empty) or 0x00021234 (for a length 2 script 0x1234). Because of TLV streams canonical encoding, this will be the first record to be encoded. This allows us to either include or omit the upfront_shutdown_script field (whereas simply making it mandatory means we'd always be including it, even if it's an empty one 0x0000 which is a bit ugly). And it allows upgraded node to correctly decode data from non-upgraded nodes (in both cases where the field is present or absent) and encode data that non-upgraded nodes will understand (because they should ignore unknown trailing bytes).

Does that make sense? The more I think about it, the less hackish it seems. I think it nicely cleans up that field that otherwise feels a bit out of place, being optional but not a TLV.

Is upfront_shutdown the only optional field across all wire messages that interfers with the TLV stream?

It's the only one that's a bit painful to handle. See 274a965 for details (especially the commit message). The two other optional fields are quite easy to simply make mandatory or not touch.

t-bast added 5 commits Feb 4, 2020
The spec already prepared a hook to add additional information to existing
messages (additional bytes at the end of a message must be ignored).

Since we're using TLV in many places, it would make sense to use that optional
additional space at the end of each message to allow an optional tlv stream.

We keep the "additional bytes must be ignored" rule for what happens after that
tlv stream, giving us extra flexibility to add another type of optional content
to existing messages later if needed.
Clarify that extension bytes are included in signatures.
This applies to:

- option_upfront_shutdown_script: if you're not interested, just set the length to 0
- channel_reestablish commitment points: it makes sense to always include those regardless of whether `option_dataloss_protect` or `option_static_remotekey` are set

No need to change the `channel_update`'s `htlc_maximum_msat` because the `message_flags` encode its presence/absence.
It can still be either included or omitted without causing issues to the extension stream.
@TheBlueMatt made a point that always parsing a TLV stream
even when you don't expect anything is cumbersome.
If you don't expect any extension, you may ignore the tlv stream.

@rustyrussell noticed an unnecessary requirement now that
TLV stream isn't length-prefixed.
This makes more sense for future work.
It gives us more room to change the encoding or remove it.
@t-bast t-bast force-pushed the b01-message-extension branch from 274a965 to db31eaf Compare Feb 4, 2020
@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Feb 4, 2020

I made upfront_shutdown_script a TLV record in commit db31eaf.

If you all prefer, I can instead simply make the field mandatory, but please do consider the TLV option.
Having it as a TLV record allows us to later remove the sender requirement of always encoding an upfront_shutdown_script, or to change the preferred way of encoding it.
Having it as a mandatory field outside of the TLV stream means we can never get rid of it in the future.

Here are all the cases listed to show that backwards-compatibility fully works (but honestly you can easily convince yourself without going through all examples).

With option_upfront_shutdown_script set by both nodes

Legacy node including a shutdown_script

For example a P2WSH: 002011418a2d282a40461966e4f578e1fdf633ad15c1b7fb3e771d14361127233be1.
The encoded trailing bytes (after channel_flags) are 0022002011418a2d282a40461966e4f578e1fdf633ad15c1b7fb3e771d14361127233be1.

Updated nodes interpret that as:

  • 00: TLV type (upfront_shutdown_script)
  • 22: length
  • 002011418a2d282a40461966e4f578e1fdf633ad15c1b7fb3e771d14361127233be1: shutdown_scriptpubkey

Which works with the TLV definition.

Legacy node including an empty shutdown_script

The encoded trailing bytes (after channel_flags) are 0000.

Updated nodes interpret that as:

  • 00: TLV type (upfront_shutdown_script)
  • 00: length
  • ``: shutdown_scriptpubkey

Which works with the TLV definition.

Updated node including an empty upfront_shutdown_script TLV

The encoded trailing bytes (after channel_flags) are 0000.

Legacy nodes interpret that as:

  • 0000: shutdown_len

Which works with the legacy definition.

Updated node including an upfront_shutdown_script TLV

For example a P2WSH: 002011418a2d282a40461966e4f578e1fdf633ad15c1b7fb3e771d14361127233be1.
The encoded trailing bytes (after channel_flags) are 0022002011418a2d282a40461966e4f578e1fdf633ad15c1b7fb3e771d14361127233be1.

Legacy nodes interpret that as:

  • 0022: shutdown_len
  • 002011418a2d282a40461966e4f578e1fdf633ad15c1b7fb3e771d14361127233be1: shutdown_scriptpubkey

Which works with the legacy definition.

Updated node including additional TLVs

For example with an empty upfront_shutdown_script, but would work the same with a non-empty one.
The encoded trailing bytes (after channel_flags) are 0000 01012a (TLV record 1).

Legacy nodes interpret that as:

  • 0000: shutdown_len
  • 01012a: ignored trailing bytes (Bolt1: MUST ignore any additional data within a message beyond the length that it expects for that type)

Which works with the legacy definition.

With option_upfront_shutdown_script not set by at least one of the nodes

Legacy node including a shutdown_script

Works exactly the same as above.

Legacy node including an empty shutdown_script

Works exactly the same as above.

Legacy node not including shutdown_script

The encoded message stops after channel_flags.
Updated nodes interpret that as no open_channel_tlvs which is fine.

Updated node not including upfront_shutdown_script

The encoded message stops after channel_flags.
Legacy nodes interpret that as no shutdown_scriptpubkey which is fine.

Updated node including an empty upfront_shutdown_script TLV

Works exactly the same as above.

Updated node including an upfront_shutdown_script TLV

Works exactly the same as above.

Updated node including additional TLVs

Works exactly the same as above.

@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Feb 26, 2020

Last call for reviewers! As agreed during the previous meeting (where @rustyrussell and @Roasbeef ack-ed), I'll merge this PR probably tomorrow if no-one complains :)

I've implemented this in eclair and tested compatibility with lnd v0.9.1-beta.rc1 and c-lightning v0.8.1.

@t-bast t-bast merged commit 6ac177f into master Feb 28, 2020
2 checks passed
@t-bast t-bast deleted the b01-message-extension branch Feb 28, 2020
t-bast added a commit to ACINQ/eclair that referenced this issue Feb 28, 2020
Make DLP data mandatory in ChannelReestablish.
We make them mandatory to allow extending the message with TLVs.

Make upfront_shutdown_script a TLV record that we always include in
open_channel / accept_channel.

See lightning/bolts#714.
@Roasbeef
Copy link
Collaborator

@Roasbeef Roasbeef commented Feb 28, 2020

There weren't any explicit approvals on this :/

The general idea was acknowledged as being likely feasible (the backwards compat), but I (dunno about Rusty) haven't had a chance to actually closely review the changes.

@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Feb 29, 2020

I must admit I'm a bit surprised there...
Here is what was said at the meeting (ommitting the part where we discussed tlv namespace being per-message or global, which is orthogonal to this PR):

19:26:56 <roasbeef> all follows for me, and as decker said above we can only really do this once, so might as well do it early so we can enjoy the new bountiful world of tlv (actually optional fields)
...
19:27:37 <rusty> It's an ack from me, too.
...
19:31:54 <t-bast> great, so sounds like we have an ACK on this PR and can move the next topic, thanks for the review
19:32:17 <t-bast> #action t-bast give a few days for other people to potentially chime in on Github, then merge 714

And then the tracking issue (#735) was updated with the meeting actions, one of which was to merge this PR after a few days if no new reviewer objected to it.

If I had misunderstood your ACK, why didn't you say so when I created an action item to merge the PR? IRC isn't a perfect medium of communication so I thought that meant you had reviewed the change, which is my fault for interpreting this incorrectly, but you could have corrected this right when I said I'd merge the PR, or anytime after the meeting on the tracking issue?

We'd been discussing this PR for the last 3 meetings. The latest releases of lnd, c-lightning and eclair have already all implemented and shipped that behavior (sending a 0x0000 upfront_shutdown_script regardless of feature bits). I'm sorry if I misunderstood and merged faster than you expected, I can revert that change if you think it's not ready to be merged. Or I can do a follow-up PR to fix comments you may have. Let me know what you think is best to resolve this?

@t-bast
Copy link
Collaborator Author

@t-bast t-bast commented Feb 29, 2020

I reverted the commit for now. Will re-open this PR on monday and we'll take it from there.

@t-bast t-bast restored the b01-message-extension branch Mar 2, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Meeting Discussion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants