Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

routing+discovery: fail non-maxHTLC channel updates in validation #7415

Merged
merged 3 commits into from
Feb 22, 2023

Conversation

bitromortac
Copy link
Collaborator

Change Description

Fixes #7290. We don't broadcast or forward any channel updates that are considered invalid by the spec (see change lightning/bolts@6fee63f), channel updates that don't set the must_be_one (or option_channel_htlc_max) flag. It may also fix #7273.

There are several places where we handle channel updates (indicated by @Crypt-iQ in #7290 (comment)).

Flows that are handled by this PR are:

  • channel funding: addToRouterGraph
  • receiving and broadcasting channel updates from a peer: ProcessRemoteAnnouncement
  • updating local channel policies: PropagateChanPolicyUpdate

The local graph may still contain older channel updates that don't set the flag. In order to not send those in a historical sync, we check them with ValidateOptionalFields. A migration to re-sign possible local legacy channel updates is not straightforward as it would require signing capabilities at migration time. Removing remote channel updates with a migration could be another way to fix this.

Should I rename the ChanUpdateOptionMaxHtlc field since it's required and is now named must_be_one in the spec?

@Crypt-iQ
Copy link
Collaborator

This is very close - not sure why CI isn't running. I think it makes sense to have a commit that renames ChanUpdateOptionMaxHtlc as well as ValidateOptionalFields since the bit is no longer optional.

The change to ValidateOptionalFields ensures LND doesn't relay updates without the bit because:

  • handleChanUpdate calls the function before adding the edge or returning it as a gossip message to relay:
    err = routing.ValidateChannelUpdateAnn(pubKey, chanInfo.Capacity, upd)
  • UpdatesInHorizon now uses the function in the second commit
  • when processing an onion-wrapped error from a node in the route, we'll call ValidateChannelUpdateAnn before adding the edge to the graph:
    err = ValidateChannelUpdateAnn(pubKey, ch.Capacity, msg)

The first two points mean that we won't relay these updates and will just drop them on the floor. The third point means that an onion-encrypted update won't get into our graph. Note that addToRouterGraph in current master will always set the bit when sending the update to the gossiper.

// conform to the spec (anymore).
err := routing.ValidateChannelUpdateFields(0, edge1)
if err != nil {
log.Debugf("Invalid channel update %v: %v",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be an error log?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to an error log, I was not really sure here if we should surface it by default

@bitromortac
Copy link
Collaborator Author

I tested the following regtest setup: A (patched LND - sends invalid update) - B (LND - this PR) - C (core-lightning)
Node B detects the invalid update and does not relay it anymore.

Copy link
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! Nice and simple diff 🔥

@@ -16,7 +16,7 @@ type ChanUpdateMsgFlags uint8
const (
// ChanUpdateOptionMaxHtlc is a bit that indicates whether the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment no longer matches the variable name

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, thank you!

// optional htlc_maximum_msat field is present in this ChannelUpdate.
ChanUpdateOptionMaxHtlc ChanUpdateMsgFlags = 1 << iota
// required htlc_maximum_msat field is present in this ChannelUpdate.
ChanUpdateRequiredMaxHtlc ChanUpdateMsgFlags = 1 << iota
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here re comment and var name

case <-time.After(2 * time.Second):
t.Fatal("remote announcement not processed")
}
require.Error(t, err, "we expect the update to be invalid")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be slightly stricter here and assert that the error contains max htlc flag not set for channel update

@@ -2921,8 +2941,8 @@ func TestOptionalFieldsChannelUpdateValidation(t *testing.T) {
t.Fatalf("expected chan update to error, instead got %v", err)
}

// The final update should succeed, since setting the flag 0 means the
// nonsense max_htlc field will just be ignored.
// The final update should not succeed, a channel update with no message
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this extra "should fail case" but then still have a final test that succeeds.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fully agree, added 👍

capacityMsat := lnwire.NewMSatFromSatoshis(capacity)
if capacityMsat != 0 && maxHtlc > capacityMsat {
return errors.Errorf("max_htlc(%v) for channel "+
"update greater than capacity(%v)", maxHtlc,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: this error can now fit in 2 lines with the new indentation

// conform to the spec (anymore).
err := routing.ValidateChannelUpdateFields(0, edge1)
if err != nil {
log.Errorf("invalid channel update %v: %v",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just to make the log a bit clearer, perhaps: "not sending invalid channel update..."

@ellemouton
Copy link
Collaborator

qq @Crypt-iQ:

when processing an onion-wrapped error from a node in the route, we'll call ValidateChannelUpdateAnn before adding the edge to the graph which means that an onion-encrypted update won't get into our graph.

I think there is a chance that we add an invalid update to our graph, then update LND to include this change and restart. Then there is a (small) chance that we call this with an invalid update. right?

Should we perhaps just validate the update there too?

@Crypt-iQ
Copy link
Collaborator

Crypt-iQ commented Feb 20, 2023

I think there is a chance that we add an invalid update to our graph, then update LND to include this change and restart.

For the onion-wrapped updates, those are an update for a node in the route (so not us). So if we add them to the graph and then restart, at least we won't relay them. We could have went with deleting local+remote updates in the graph that are missing the bit, but I was a bit concerned about edge cases.

Then there is a (small) chance that we call this with an invalid update. right?

This fetches our channel update from the graph. There is a chance that an old node has the bit un-set and upgrades and then fails backwards in the link/switch. In the linked issue (#7290 (comment)) I also pointed out a case where the gossiper's reliableSender could have an old update persisted, but it seemed like a super-edge case. I also realized later that it's possible for the switch to be restarted and call reforwardSettleFails and send a fail message backwards with an old-style update w/o the bit. This htlcswitch case seems more likely but I don't think it will cause a CLN node to disconnect an LND node because the failing node could be several hops away.

This is all a long-winded way of saying that the fix isn't perfect, but I think it's crucial to get it in 0.16.0. There is some risk that an old node tries to send an old-style update, but then they can just upgrade

We rename `ChanUpdateOptionMaxHtlc` to `ChanUpdateRequiredMaxHtlc`
as with the latest changes it is now required.

Similarly, rename `validateOptionalFields` to
`ValidateChannelUpdateFields`, export it to use it in a later commit.
We require channel updates to have the max HTLC message flag set.

Several flows need to pass that check before channel updates are
forwarded to peers:
* after channel funding: `addToRouterGraph`
* after receiving channel updates from a peer:
  `ProcessRemoteAnnouncement`
* after we update channel policies: `PropagateChanPolicyUpdate`
We skip syncing channel updates that don't conform to the spec.
Copy link
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noice Noice Noice 🔥

@guggero guggero merged commit b3e27f9 into lightningnetwork:master Feb 22, 2023
@bitromortac bitromortac deleted the 2302-max-htlc branch April 17, 2024 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants