Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug]: htlc_maximum_msat is not optional anymore in channel updates #7290

Closed
yaslama opened this issue Jan 5, 2023 · 9 comments · Fixed by #7415
Closed

[bug]: htlc_maximum_msat is not optional anymore in channel updates #7290

yaslama opened this issue Jan 5, 2023 · 9 comments · Fixed by #7415
Assignees
Labels
bug Unintended code behaviour channels interop interop with other implementations
Milestone

Comments

@yaslama
Copy link
Contributor

yaslama commented Jan 5, 2023

Background

lnd forwards channel updates without htlc_maximum_msat which is now forbidden after lightning/bolts#999 was merged.
The problem is that in this case, other nodes (cln for instance) send back a warning message which triggers a disconnect.

Your environment

lnd v0.15.5-beta or master

Steps to reproduce

Run lnd and create a channel to a node running cln.

Expected behaviour

No channel update messages without htlc_maximum_msat

Actual behaviour

Some channel update messages without htlc_maximum_msat. Then a warning is issued by the other node (cln for instance). Then the peer is disconnected in lnd v0.15.5 because the warning message is "unknown" and in master because a warning triggers a disconnect for an active channel.

@yaslama
Copy link
Contributor Author

yaslama commented Jan 5, 2023

BTW the logs were the same as #7273 and we backported the warning message handling from master to see that the problem was the issue described here

@saubyk saubyk added the interop interop with other implementations label Jan 5, 2023
@saubyk saubyk added this to the v0.16.1 milestone Jan 5, 2023
@saubyk saubyk added the channels label Jan 5, 2023
@kingonly
Copy link

kingonly commented Jan 5, 2023

@saubyk this is spec incompatibly issue that has some nasty side effects. Can you please consider prioritizing to the next update?

@Roasbeef
Copy link
Member

Roasbeef commented Jan 5, 2023

Interesting, I think another operative question is: what software is sending out update w/o max sat set? FWIW, the spec doesn't say precisely what to do if you encounter such an update. A conservative thing to do would just be to forward it anyway, since it's older software that may not be keeping up lockstep with the frequent spec changes. If we started to reject this wholesale, this could lead to other unintended propagation issues (we've fixed a number of them in the upcoming 0.16 release).

in master because a warning triggers a disconnect for an active channel.

This should have been resolved with this PR: #6840 -- are you able to repro?

@kingonly
Copy link

kingonly commented Jan 5, 2023

We don't know what software is sending these updates, but we're seeing many.

If we started to reject this wholesale, this could lead to other unintended propagation issues (we've fixed a number of them in the upcoming 0.16 release).

I think messages that don't comply with the spec shouldn't be propagated. Are there any more that are being propagated?

@Roasbeef
Copy link
Member

Roasbeef commented Jan 5, 2023

I think this is the CLN commit that starts to reject (parsed as zero so fails validation?): ElementsProject/lightning@5553b85

I looked at this Eclair PR, but can't tell for sure if they're rejecting these updates themselves: https://github.com/ACINQ/eclair/pull/2299/commits

Looks like RL has been doing this since July or so (before that spec PR above was merged): https://github.com/lightningdevkit/rust-lightning/pull/1519/files


We don't know what software is sending these updates, but we're seeing many.

Gotcha, that's interesting, since the original motive for this was that from observations pretty much no updates were setting the value. If we're still seeing a lot, then I think that argues for pushing out the deployment of the spec modification.

From the original spec PR:

I have just downloaded a fresh batch of 134k channel updates from an LND peer; of those channel updates, 364 don't have an HTLC max set, which is 0.27%.
I have separately downloaded 161k updates from a different peer, and got 399 instances of an unset HTLC max, or 0.25%.
Indeed, every single of the 149964 channel updates seen by my node has his set.


I think messages that don't comply with the spec shouldn't be propagated. Are there any more that are being propagated?

Generally I'd agree, but this is a situation where something in the spec was merged that affects gossip, but not everyone has started to observe the new rules. If there're a lot of updates from routing nodes that would no longer propagate with strict enforcement, then we might want to consider holding off if the old assumptions weren't correct (tho ofc the change is trivial for the most part).

I guess w/e unknown impl/version is sending update w/o it set will just be slowly cut off from the network 🤷

@saubyk saubyk modified the milestones: v0.16.1, v0.16.0 Jan 6, 2023
@github-project-automation github-project-automation bot moved this to 🆕 New in lnd v0.16.0 Jan 6, 2023
@saubyk saubyk moved this from 🆕 New to 📋 Backlog in lnd v0.16.0 Jan 6, 2023
@yaslama
Copy link
Contributor Author

yaslama commented Jan 6, 2023

This should have been resolved with this PR: #6840 -- are you able to repro?

The peer was disconnected until I changed the line https://github.com/lightningnetwork/lnd/blob/master/peer/brontide.go#L1705 to return false instead of true (or commented the case).

@yaslama
Copy link
Contributor Author

yaslama commented Jan 6, 2023

Generally I'd agree, but this is a situation where something in the spec was merged that affects gossip, but not everyone has started to observe the new rules. If there're a lot of updates from routing nodes that would no longer propagate with strict enforcement, then we might want to consider holding off if the old assumptions weren't correct ([tho ofc the change is trivial]

In our logs, the channel updates without htlc_maximum_msat are very old (from 2019). In fact anyone can run a fake node which send such channel updates. I think that it's better to ignore any such invalid message and not forward them if we are sure that the last stable version(s) of the main known implementations are not generating them.

@Crypt-iQ
Copy link
Collaborator

This should have been resolved with this PR: #6840 -- are you able to repro?

The peer was disconnected until I changed the line https://github.com/lightningnetwork/lnd/blob/master/peer/brontide.go#L1705 to return false instead of true (or commented the case).

Can you re-link but with a "permalink" - that link now points to a comment since master changed. I can't see how either master or v0.15.5-beta receiving a warning results in disconnect. I think the CLN node here is sending a warning and then immediately disconnecting.

Do you have logs here? Or does the warning message give any more info? That would be very helpful

@Crypt-iQ
Copy link
Collaborator

We confirmed after testing with a live CLN node that this is because the channel_update sent to CLN didn't have the must_be_one bit set in the message_flags bitfield: https://github.com/lightning/bolts/blob/master/07-routing-gossip.md#the-channel_update-message

To fix this issue, we need to make sure that they 1) no longer send out ChannelUpdate with the bit un-set and 2) no longer accept ChannelUpdate with the bit un-set.

Point 1) is necessary because current nodes may already have these ChannelUpdates in their graph. If they were to perform a historical sync with a CLN node, they may send one of these older ChannelUpdates to the CLN node. There are six places that we send out ChannelUpdates:

Point 2) is so that we align with the spec and no longer accept the legacy encoding. There are two places where we process ChannelUpdate:

It may be possible to handle point 1) by simply performing a graph migration that checks for all of our own legacy ChannelUpdate and then adding checks to the reliable sender to ensure that we don't accidentally send out a legacy ChannelUpdate despite this migration.

As discussed offline, for point 2), we don't need to disconnect a peer for sending us this legacy ChannelUpdate. Instead, we can just drop the ChannelUpdate to ensure we don't segment older LND nodes. In the future, we can revisit this logic and potentially disconnect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Unintended code behaviour channels interop interop with other implementations
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

7 participants