TLV failure message and length relaxation #1021

joostjager · 2022-09-02T09:30:07Z

Extend the failure message specification with a tlv stream in line with the onion tlv payload and p2p messages.

t-bast · 2022-09-02T14:10:00Z

Concept ACK. Note that this doesn't provide a lot of bytes for the tlv stream though, since we're limited to 256 bytes for the failure payload, but it's better than nothing.

An important point to note is that the tlv stream mustn't be length-prefixed (same as our tlv extensions at the end of each lightning message). It's worth adding a test vector to make sure people get that right.

joostjager · 2022-09-12T14:52:12Z

Yes, I can add a test vector. But I think I'll do it just for the failure message bytes excluding padding and encryption.

04-onion-routing.md

joostjager · 2022-09-12T18:19:09Z

Added test vector to ensure that implementers won't prepend the tlv stream with a length.

t-bast · 2022-09-12T19:39:14Z

IMHO it would really be more helpful to future readers to update the existing e2e test vector...
I can take a stab at it when implementing this change for eclair.

joostjager · 2022-09-13T10:45:34Z

Discussed in spec meeting: pair this change with an increase in the recommended failure message size. Any thoughts on the new size? I was thinking 1024, to bring it roughly in line with the forward pass payload?

t-bast · 2022-09-13T11:48:42Z

1024 sounds good to me since we don't have a real use-case yet, as long as all implementations change their behavior to handle potentially longer messages in the future (this way we can change the spec recommendation later if a use-case comes up that requires a bigger length without having to update intermediate nodes).

joostjager · 2022-09-13T12:03:14Z

So just to get it right, you're saying we don't change the recommendation yet? If that's indeed the intention, we have to be aware that a later increase allows for fingerprinting of the destination node. Perhaps this needs an activation date... All implementations switch to 1024 byte failure messages on Jan 1, 2023 :)

t-bast · 2022-09-13T12:07:17Z

So just to get it right, you're saying we don't change the recommendation yet?

No, I think it's a good idea to change the recommendation to something bigger, and 1024 sounds good. I'm just saying we can potentially go much bigger (we're only restricted by the whole message being 65kB), but since we don't know yet how we're going to use it, a first step to 1024 is probably a good idea.

But we should also change the spec to say that forwarding nodes should be able to handle messages with a different length than the recommended one, to give us freedom to change it in the future without requiring a network-wide upgrade. Does that make sense?

joostjager · 2022-09-13T12:10:34Z

Yes. Will also check to see what lnd does and does not support currently.

joostjager · 2022-09-13T14:20:21Z

It seems that lnd forwards the failure message fine. The only problem is that the payment sender isn't able to handle any other length than 256. So if we'd add a long but optional tlv extension to the failure message today, senders running lnd would interpret that as an unreadable failure message with all the undesired consequences for reputation tracking.

The fix lightningnetwork/lnd#6913 is small, but critical to unblock the usage of longer tlv failures.

joostjager · 2022-09-14T07:40:35Z

Updated PR with 1024 byte tlv failure test vector.

For my own reference: patch used to generate vector lightningnetwork/lnd@master...bottlepay:lnd:failure-message-test-vector and lightningnetwork/lightning-onion@master...joostjager:lightning-onion:failure-message-test-vector

Roasbeef · 2022-09-26T20:01:57Z

04-onion-routing.md

@@ -823,9 +823,8 @@ channel.
 ### Requirements

 The _erring node_:
-  - SHOULD set `pad` such that the `failure_len` plus `pad_len` is equal to 256.


Thinking about this change a bit more: I think we might need either a new feature bit or a new "empty" TLV key in the onion for the switch over to work properly. Without some sort of signal, then a destination or intermediate node doesn't know if the sender can actually handle/decrypt errors larger than 256 bytes. This is effectively hard coded in the spec and most implementations, so just changing it here without any additional context-specific signalling won't allow for a smooth switchover.

As an example, on the lnd side we may end up black listing a node for sending a larger errors (we have a hard check, and would discard it as internal network funny business or w/e). Instead, if we had the sender signal in the onion to each node (intermediate and also final) that they understand larger errors, then the error'ing node will know if they can attach additional context or not.

Yes, good point. The empty TLV key option looks to be the smallest in size. But if we invent a new failure message format in the future that for example fixes error attribution, we'd need another empty TLV key, or use the original TLV with a non-empty value. In the latter case, we need to specify it so that no implementation will trip over a non-empty value from the start.

Perhaps defining a TLV record for sender feature bits from the get go is the cleanest solution. Trade-off is obviously the added size, which I think is one byte extra per hop compared to an empty tlv. This assumes we start fresh with the assignment of failure message feature bits, to not exceed one byte initially.

Feedback from the spec meeting on this: since we don't have a short-term use-case for longer error messages, we can take the slow approach that requires less changes and no signaling:

Implementations add support now for reading failure messages of arbitrary length

In 1 year or so, we can consider that senders do support it and may start writing failure messages of arbitrary length

If a specific feature requires arbitrary length messages before that, the usage of that feature is an implicit signal that the sender knows how to read failure messages of arbitrary length, so it's ok to generate them

I think there is a short term use case: user defined extensions to the incorrect_details failure. For example for hodl invoices that are cancelled, more details could be added as to why it was cancelled. Or (just brainstorming now) return a "redirect" invoice in a tlv field.

Given the right extension points in a node implementation (fail hodl invoice with reason and expose failure tlv to user of the sending node), users could add and read their own data and this may exceed the current 256 byte limit. I don't think 3) applies to this scenario? LND's htlc interceptor API also allows users to return custom failure messages today, so one receiver-side extension point already exists.

You could argue that it is not only up to the lightning devs to decide that there is no current use case.

Regarding 2): Is it safe to assume that almost all senders support the feature after one year? I don't think there is a direct indication of the proportion of nodes that have upgraded, and this isn't a feature bit that is part of the node announcement either? It could be part of it, but it doesn't seem to be useful because nodes in a path don't know the key of the origin node anyway.

For inbound fees, we probably need this too. See lightningnetwork/lnd#6967, commit "signal failure message format".

The specification recommends using a length of 256 for onion errors, but it doesn't say that we should reject errors that use a different length. We may want to start creating errors with a bigger length than 256 if we need to transmit more data to the sender. In order to prepare for this, we keep creating 256-bytes onion errors, but allow receiving errors of arbitrary length. See the specification here: lightning/bolts#1021 Fixes #2438

TheBlueMatt · 2022-10-31T19:45:06Z

Right, that's what I meant by "no failure reason", sorry it was unclear. My point is that if you don't upgrade for two years suffering from degraded route selection performance seems like a perfectly fine outcome.

joostjager · 2022-10-31T20:06:36Z

Maybe inbound routing fees become a success, and then we need to wait for two years before we can return a complete fee_insufficient failure that exceeds 256 bytes. Or it does not become a success, exactly because the required failure cannot be returned. No fee update failure isn't ideal given the quirks that we sometimes see with gossip propagation.

Of course everyone can have their own opinion about the viability of protocol extensions, the value of experimentation and whether that's worth the trade-off with privacy.

TheBlueMatt · 2022-10-31T23:29:41Z

I'm kinda confused why a fee_insufficient failure is > 256 bytes? There's a ton of headroom above existing error messages to play with.

joostjager · 2022-11-01T08:57:04Z

If the fee is insufficient, it could be that the sender either used an outdated inbound fee policy, an outdated outbound fee policy or both. The routing node only sees a total fee paid, and doesn't know which one of the cases applies.

To cover all cases and ensure that the next attempt is successful, the routing node would need to return both the incoming channel update and the outgoing channel update with the fee_insufficient failure. 256 bytes is not enough for two channel updates.

Of course signed fee information can be returned in a more compact way, but that would deviate from the standard channel update format. I expect that to lead to lots of extra work and maintenance. The graph database for example would also need to support this new compact format.

t-bast · 2022-11-02T07:28:52Z

Of course signed fee information can be returned in a more compact way, but that would deviate from the standard channel update format. I expect that to lead to lots of extra work and maintenance. The graph database for example would also need to support this new compact format.

That sounds like the way forward if you want to do experimentation though? You really don't need to return complete channel updates, and returning only the values you're interested in lets you experiment today, without changes from any implementation nor the spec required.

joostjager · 2022-11-02T10:13:51Z

Yes, could do that. But the quicker way to go forward with the experiment is to just signal using a custom onion payload tlv record. Long failures are already relayed back, so no implementation changes required. And for the rest it's just between sender and routing node.

I thought that it would be nice to make that signaling official if we see a future need for long failure messages anyway. But if not, I'll just go ahead and do my own thing.

In that case, I can remove the increase of the recommended failure length to 1024 from this PR and note that longer failure messages are a risk because senders may not be able to parse.

The specification recommends using a length of 256 for onion errors, but it doesn't say that we should reject errors that use a different length. We may want to start creating errors with a bigger length than 256 if we need to transmit more data to the sender. In order to prepare for this, we keep creating 256-bytes onion errors, but allow receiving errors of arbitrary length. See the specification here: lightning/bolts#1021 Fixes #2438

joostjager · 2022-11-21T11:41:36Z

Removed increase of the recommended message length to 1024 from PR and added comment about old nodes not being able to interpret long messages.

Extend every onion failure with an optional tlv stream. See the specification here: lightning/bolts#1021

t-bast

ACK 1a48cdd

The test vectors match with eclair (see ACINQ/eclair#2455).

halseth

ACK 1a48cdd

Roasbeef

LGTM 🛹

lnd v0.16.0 is scheduled to ship with the relaxed behavior for the sender (the one decrypting the final payload).

t-bast · 2022-12-05T07:51:43Z

@joostjager do you have a branch of lnd where I can force a node to emit an error bigger than 256 bytes that contains dummy tlvs to test cross-compatibility?

joostjager · 2022-12-05T10:19:42Z

If you run an lnd routing node with lightningnetwork/lnd#6967 and the sender does not include enough fee, you will get back a long failure message with two channel updates. The sender does need to signal long failure compatibility through a tlv record though.

If this is all a bit too much for the purpose of interop testing, I can prep a simpler branch with a long failure for you.

t-bast · 2022-12-05T10:57:51Z

If this is all a bit too much for the purpose of interop testing, I can prep a simpler branch with a long failure for you.

Don't worry, this should work well enough for my tests. Can you simply update the description of lightningnetwork/lnd#6967 with the details of the tlv that needs to be set in the onion to opt-in to receive the long failure?

joostjager · 2022-12-05T11:38:19Z

Description updated.

Extend every onion failure with an optional tlv stream. See the specification here: lightning/bolts#1021

Extend every onion failure with an optional tlv stream. Added to the specification by: lightning/bolts#1021

t-bast mentioned this pull request Sep 8, 2022

Lightning Specification Meeting 2022/09/12 #1024

Closed

27 tasks

cdecker reviewed Sep 12, 2022

View reviewed changes

04-onion-routing.md Outdated Show resolved Hide resolved

joostjager force-pushed the failure-message-tlv branch 2 times, most recently from 3860351 to ff5b364 Compare September 12, 2022 18:17

joostjager force-pushed the failure-message-tlv branch 4 times, most recently from 82e35fd to 7baffed Compare September 14, 2022 07:36

joostjager force-pushed the failure-message-tlv branch from 7baffed to 954665d Compare September 14, 2022 09:08

joostjager changed the title ~~Onion failure message may be followed by a tlv stream~~ TLV failure message and recommended length to 1024 Sep 14, 2022

joostjager mentioned this pull request Sep 14, 2022

lnwire: increase onion failure message length to 1024 lightningnetwork/lnd#6916

Draft

joostjager force-pushed the failure-message-tlv branch 2 times, most recently from 51cc167 to 344440a Compare September 14, 2022 10:24

t-bast mentioned this pull request Sep 22, 2022

Lightning Specification Meeting 2022/09/26 #1028

Closed

27 tasks

Roasbeef requested changes Sep 26, 2022

View reviewed changes

t-bast mentioned this pull request Sep 27, 2022

Add support for onion errors larger than 256 bytes ACINQ/eclair#2438

Closed

joostjager mentioned this pull request Sep 27, 2022

htlcswitch: relax failure message length check lightningnetwork/lnd#6913

Merged

t-bast mentioned this pull request Sep 29, 2022

Add support for arbitrary length onion errors ACINQ/eclair#2441

Merged

t-bast mentioned this pull request Nov 2, 2022

Lightning Specification Meeting 2022/11/07 #1038

Closed

30 tasks

rustyrussell mentioned this pull request Nov 8, 2022

Handle non-standard onion reply size ElementsProject/lightning#5698

Merged

t-bast mentioned this pull request Nov 17, 2022

Lightning Specification Meeting 2022/11/21 #1041

Closed

28 tasks

BOLT 04: Update recommended failure message length and test vector

1a48cdd

joostjager force-pushed the failure-message-tlv branch from 720911a to 1a48cdd Compare November 21, 2022 11:40

t-bast added a commit to ACINQ/eclair that referenced this pull request Nov 22, 2022

Add tlv stream to onion failures

755ad64

Extend every onion failure with an optional tlv stream. See the specification here: lightning/bolts#1021

t-bast reviewed Nov 22, 2022

View reviewed changes

halseth approved these changes Nov 23, 2022

View reviewed changes

Roasbeef approved these changes Dec 2, 2022

View reviewed changes

t-bast mentioned this pull request Dec 5, 2022

Lightning Specification Meeting 2022/12/05 #1046

Closed

28 tasks

Roasbeef merged commit a0bbe47 into lightning:master Dec 5, 2022

joostjager changed the title ~~TLV failure message and recommended length to 1024~~ TLV failure message and length relaxation Dec 14, 2022

t-bast mentioned this pull request Dec 15, 2022

Lightning Specification Meeting 2022/12/19 #1047

Closed

27 tasks

t-bast added a commit to ACINQ/eclair that referenced this pull request Dec 16, 2022

Add tlv stream to onion failures

9167012

Extend every onion failure with an optional tlv stream. See the specification here: lightning/bolts#1021

t-bast mentioned this pull request Dec 30, 2022

Lightning Specification Meeting 2023/01/02 #1048

Closed

27 tasks

t-bast added a commit to ACINQ/eclair that referenced this pull request Jan 3, 2023

Add tlv stream to onion failures

f67f23c

Extend every onion failure with an optional tlv stream. See the specification here: lightning/bolts#1021

t-bast added a commit to ACINQ/eclair that referenced this pull request Jan 6, 2023

Add tlv stream to onion failures (#2455)

c9c5638

Extend every onion failure with an optional tlv stream. Added to the specification by: lightning/bolts#1021

joostjager mentioned this pull request Jan 18, 2023

draft: Upfront Fees to Mitigate Channel Jamming #1052

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TLV failure message and length relaxation #1021

TLV failure message and length relaxation #1021

joostjager commented Sep 2, 2022

t-bast commented Sep 2, 2022

joostjager commented Sep 12, 2022

joostjager commented Sep 12, 2022

t-bast commented Sep 12, 2022

joostjager commented Sep 13, 2022

t-bast commented Sep 13, 2022

joostjager commented Sep 13, 2022

t-bast commented Sep 13, 2022 •

edited

Loading

joostjager commented Sep 13, 2022

joostjager commented Sep 13, 2022

joostjager commented Sep 14, 2022 •

edited

Loading

Roasbeef Sep 26, 2022

joostjager Sep 27, 2022

t-bast Sep 27, 2022

joostjager Sep 27, 2022 •

edited

Loading

joostjager Oct 6, 2022

TheBlueMatt commented Oct 31, 2022

joostjager commented Oct 31, 2022

TheBlueMatt commented Oct 31, 2022

joostjager commented Nov 1, 2022

t-bast commented Nov 2, 2022

joostjager commented Nov 2, 2022 •

edited

Loading

joostjager commented Nov 21, 2022

t-bast left a comment

halseth left a comment

Roasbeef left a comment

t-bast commented Dec 5, 2022

joostjager commented Dec 5, 2022

t-bast commented Dec 5, 2022 •

edited

Loading

joostjager commented Dec 5, 2022

TLV failure message and length relaxation #1021

TLV failure message and length relaxation #1021

Conversation

joostjager commented Sep 2, 2022

t-bast commented Sep 2, 2022

joostjager commented Sep 12, 2022

joostjager commented Sep 12, 2022

t-bast commented Sep 12, 2022

joostjager commented Sep 13, 2022

t-bast commented Sep 13, 2022

joostjager commented Sep 13, 2022

t-bast commented Sep 13, 2022 • edited Loading

joostjager commented Sep 13, 2022

joostjager commented Sep 13, 2022

joostjager commented Sep 14, 2022 • edited Loading

Roasbeef Sep 26, 2022

Choose a reason for hiding this comment

joostjager Sep 27, 2022

Choose a reason for hiding this comment

t-bast Sep 27, 2022

Choose a reason for hiding this comment

joostjager Sep 27, 2022 • edited Loading

Choose a reason for hiding this comment

joostjager Oct 6, 2022

Choose a reason for hiding this comment

TheBlueMatt commented Oct 31, 2022

joostjager commented Oct 31, 2022

TheBlueMatt commented Oct 31, 2022

joostjager commented Nov 1, 2022

t-bast commented Nov 2, 2022

joostjager commented Nov 2, 2022 • edited Loading

joostjager commented Nov 21, 2022

t-bast left a comment

Choose a reason for hiding this comment

halseth left a comment

Choose a reason for hiding this comment

Roasbeef left a comment

Choose a reason for hiding this comment

t-bast commented Dec 5, 2022

joostjager commented Dec 5, 2022

t-bast commented Dec 5, 2022 • edited Loading

joostjager commented Dec 5, 2022

t-bast commented Sep 13, 2022 •

edited

Loading

joostjager commented Sep 14, 2022 •

edited

Loading

joostjager Sep 27, 2022 •

edited

Loading

joostjager commented Nov 2, 2022 •

edited

Loading

t-bast commented Dec 5, 2022 •

edited

Loading