Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

htlcswitch: relax failure message length check #6913

Merged
merged 5 commits into from
Dec 2, 2022

Conversation

joostjager
Copy link
Contributor

The spec does not dictate but only recommends a length of 256 bytes. Future tlv extensions may push the failure message length over this limit.

With this change, receivers can ignore the lengthier extensions without handling it as an unreadable failure. Before long tlv extensions can be used, a sufficiently large number of senders needs to be upgraded. Therefore the sooner this PR is released, the better.

Depends lightningnetwork/lightning-onion#59

@joostjager
Copy link
Contributor Author

joostjager commented Sep 14, 2022

Is there any chance that this can be part of v0.15.2? Strictly speaking lnd is in a vulnerable position. If other implementations would decide to deviate from the recommended failure message size, senders using lnd will see their mission control messed up because of all the penalties applied for unreadable failures.

@joostjager
Copy link
Contributor Author

Need to decide on how to signal sender capabilities as per lightning/bolts#1021 (comment)

@joostjager
Copy link
Contributor Author

joostjager commented Sep 27, 2022

This PR could be merged regardless of the signaling mechanism though. On its own, I don't think it does any harm and just makes lnd spec compliant.

The same fix in Eclair: ACINQ/eclair#2438

@@ -1235,10 +1235,6 @@ func DecodeFailure(r io.Reader, pver uint32) (FailureMessage, error) {
if err := ReadElement(r, &failureLength); err != nil {
return nil, fmt.Errorf("unable to read error len: %v", err)
}
if failureLength > FailureMessageLength {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this constant in use anywhere else? If so, we should check whether their use must be changed. If not, maybe it can be deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, there is a lot. Need to take a look at that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All fixed except for the length check here. See #6913 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think perhaps a less disruptive way to become closer to spec compliant is instead of removing this check just require the length to be at least FailureMessageLength.

We are already enforcing exactly FailureMessageLength in lightning_onion and it seems like we cannot easily just remove the check altogether without opening up a can of works wrt those converted errors.

If we instead start allowing larger errors than before, we can support the larger errors as we want (and that the spec says we should support), and still we can distinguish the malformed errors by their length, as no valid error message could be FailureMessageLength +4 including the HMAC.

Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely a nice intermediate step. One thing to consider is that if we increase the recommended failure length to 1024 bytes eventually, we need to do something extra to also extend converted malformed failures to that length.

Perhaps when we detect a malformed failure, clip off the padding again, and re-add a longer padding block? Or store the unencrypted failures without padding at all, so that they can be padded right before returning them to the switch.

Copy link
Contributor Author

@joostjager joostjager Oct 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we stick to lengths >= 256, I don't think we need to make any further changes to this PR. Every failure reason != 260 is interpreted as a regular failure already and this would also work for longer reasons.

@joostjager
Copy link
Contributor Author

joostjager commented Oct 12, 2022

Short cut coming back at us: #3027 (comment)

Lightning Labs (@Crypt-iQ perhaps?) - how do we get rid of this?

@joostjager
Copy link
Contributor Author

joostjager commented Oct 13, 2022

I've created two draft PRs that show different ways of getting variable length failure messages though the channel state machine:

The easy alternative is to just allow longer failure message and no shorter ones: #6913 (comment). PR is implementing this currently.

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

docs/release-notes/release-notes-0.16.0.md Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
@guggero guggero removed their request for review October 20, 2022 15:47
@joostjager joostjager force-pushed the relax-failure-len-check branch 3 times, most recently from 3fbb6f3 to 1828262 Compare November 3, 2022 13:19
lnwire/onion_error.go Outdated Show resolved Hide resolved
htlcswitch/link.go Show resolved Hide resolved
@joostjager joostjager force-pushed the relax-failure-len-check branch 2 times, most recently from 0db93ae to 09751ea Compare November 9, 2022 15:37
Copy link
Collaborator

@Crypt-iQ Crypt-iQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm once last test-related comment addressed

htlcswitch/link_test.go Show resolved Hide resolved
@lightninglabs-deploy
Copy link

@joostjager, remember to re-request review from reviewers when ready

@saubyk saubyk requested a review from carlaKC November 22, 2022 18:12
// Because the reason is unreadable for the payer
// anyway, we just replace it by a compliant-length
// series of random bytes.
msg.Reason = make([]byte, minimumFailReasonLength)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For debugging/visibility purposes, could it be an idea to pass back a static "I'm sorry the original error is unreadable"+pad error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we'd actually handle this outcome differently (so going beyond debugging/visibility), this might motivate an attacker to corrupt failure messages in this exact way.

Also if the origin node receives this message, they won't know which node set it because this is part of the intermediate transform logic. They do know that it must have been an lnd node 😄

I like the suggestion, but not sure if it is actually good or beneficial. It should never happen with unmodified nodes, because all is set to 256 bytes currently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, not very likely to happen.

Copy link
Contributor

@halseth halseth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@carlaKC carlaKC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one minor comment then lgtm!

lnwire/onion_error.go Show resolved Hide resolved
@Roasbeef
Copy link
Member

Roasbeef commented Dec 2, 2022

@joostjager the dependent PR has been merged, so we can bump the dep here to unblock the CI.

This mock is used in the switch test TestUpdateFailMalformedHTLCErrorConversion.
But because the mock isn't very realistic, it doesn't detect problems
in the handling of malformed failures in the link.
Adds extra checks to make sure the failure message
is well-formed.
@joostjager
Copy link
Contributor Author

Removed go.mod replace and bumped lightning-onion dep.

This commit modifies the link behavior so that every failure
reason that we pass back is length-compliant (>=256 bytes).
htlcswitch/failure_test.go Outdated Show resolved Hide resolved
htlcswitch/failure_test.go Outdated Show resolved Hide resolved
htlcswitch/failure_test.go Outdated Show resolved Hide resolved
htlcswitch/failure_test.go Outdated Show resolved Hide resolved
htlcswitch/failure_test.go Outdated Show resolved Hide resolved
This fixes an incompatibility where lnd enforces a strict 256 byte
failure message, where as the spec sets this only as the recommended
length.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants