
Do not fail to apply RGS updates for removed channels #2046

Merged
merged 3 commits into lightningdevkit:main on Feb 28, 2023

Conversation

@TheBlueMatt (Collaborator)

If we receive a Rapid Gossip Sync update for channels where we are
missing the existing channel data, we should ignore the missing
channel. This can happen in a number of cases, whether because we
received updated channel information via an onion error from an
HTLC failure or because we've partially synced the graph from a
peer over the standard lightning P2P protocol.
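
A minimal sketch of the new behavior (hypothetical names, not the actual LDK code): an update referencing an unknown channel is now logged and skipped, and processing continues with the rest of the RGS file instead of returning an Err.

use std::collections::HashSet;

// Hypothetical sketch: apply each update, skipping channels we know nothing
// about instead of aborting the whole batch.
fn apply_updates(short_channel_ids: &[u64], known_channels: &HashSet<u64>) {
    for scid in short_channel_ids {
        if !known_channels.contains(scid) {
            // Previously: return Err(...), dropping all remaining updates.
            eprintln!("Skipping update for unknown channel {}", scid);
            continue;
        }
        // ... apply the update to the network graph ...
    }
}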

@TheBlueMatt TheBlueMatt added this to the 0.0.114 milestone Feb 23, 2023
@codecov-commenter commented Feb 23, 2023

Codecov Report

Patch coverage: 58.33% and project coverage change: +1.21% 🎉

Comparison is base (b5e5435) 87.25% compared to head (d2f5dc0) 88.47%.


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2046      +/-   ##
==========================================
+ Coverage   87.25%   88.47%   +1.21%     
==========================================
  Files         100      100              
  Lines       44480    50694    +6214     
  Branches    44480    50694    +6214     
==========================================
+ Hits        38810    44850    +6040     
- Misses       5670     5844     +174     
Impacted Files Coverage Δ
lightning-rapid-gossip-sync/src/lib.rs 77.61% <0.00%> (ø)
lightning/src/util/macro_logger.rs 91.48% <ø> (ø)
lightning-rapid-gossip-sync/src/processing.rs 91.70% <60.00%> (-1.13%) ⬇️
lightning-background-processor/src/lib.rs 81.88% <100.00%> (ø)
lightning/src/sync/nostd_sync.rs 42.10% <0.00%> (-0.40%) ⬇️
lightning/src/sync/mod.rs 50.00% <0.00%> (ø)
lightning/src/sync/test_lockorder_checks.rs 100.00% <0.00%> (ø)
lightning/src/ln/functional_tests.rs 97.53% <0.00%> (+0.24%) ⬆️
lightning/src/ln/functional_test_utils.rs 88.74% <0.00%> (+0.31%) ⬆️
lightning/src/routing/router.rs 93.75% <0.00%> (+0.97%) ⬆️
... and 11 more


@naumenkogs (Contributor)

Doesn't this change simply reduce the amount of information provided to the caller? Not sure I understand why this is useful.

@wpaulino (Contributor) commented Feb 24, 2023

> Doesn't this change simply reduce the amount of information provided to the caller? Not sure I understand why this is useful.

Somewhat. This is on the incremental update path, so the payload will only include the fields that were mutated based on the prior update. If you don't have a prior update, then it's possible (most likely?) you end up with an incorrect policy for said channel, allowing pathfinding to try the channel, only to end up with a failure. If we skip applying the update, we won't try the channel at all until we receive the full policy.

If you keep syncing the graph via RGS though, I would imagine you won't get the full update again, since you would just continue to receive incremental updates. If we were to apply the channel update, even with a potentially incorrect policy, we'll at least have a chance at obtaining the full policy once we try to route an HTLC over it and it fails.
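
To illustrate why the prior state matters (hypothetical types, not LDK's actual structs): an incremental update only carries the mutated fields, so everything else must be copied from the existing directional info, and skipping is the safe choice when that info is missing.

#[derive(Clone)]
struct Policy { cltv_expiry_delta: u16, htlc_minimum_msat: u64, fee_base_msat: u32 }

// Hypothetical sketch: merge a delta that only changed the base fee. Every
// other field must come from the existing policy; with no prior policy we
// return None (skip) rather than guess at defaults.
fn apply_delta(existing: Option<&Policy>, new_fee_base_msat: Option<u32>) -> Option<Policy> {
    let mut policy = existing?.clone();
    if let Some(fee) = new_fee_base_msat {
        policy.fee_base_msat = fee;
    }
    Some(policy)
}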

@arik-so arik-so self-requested a review February 24, 2023 20:06
@arik-so arik-so self-assigned this Feb 24, 2023
@TheBlueMatt (Collaborator, Author)

> Doesn't this change simply reduce the amount of information provided to the caller? Not sure I understand why this is useful.

Well, we change from returning an Err and refusing to process further updates from the RGS file to continuing to process further updates and just logging the missing channel. I suppose we could keep the Err return value and still process further updates, but it's not really an error - it may just mean the user synced part of the graph from some other source.

@arik-so (Contributor) commented Feb 24, 2023

This change is actually more fine-grained than that. Previously we were already skipping updates if we didn't have the channel at all. However, if we had the channel but were missing data for the given direction, we would previously fail, whereas with this change we now skip it.

My question, however, is this: under which circumstances can we lose only a channel's directional data, but either retain the channel or at the very least retain the other direction? If both directions' data are purged, do we then automatically also purge the channel from the network graph?

action: ErrorAction::IgnoreError,
})?;

if let Some(directional_info) =

It's probably just me, but I think the right-hand side of an if let expression shouldn't have four chained expressions, plus one nested one. Can we extract a variable or two?
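
One way to do that (a sketch against the code shown in the diff further down this thread, not the committed version) is to pull the lookup into its own binding so the if let right-hand side stays short:

// Sketch of the suggested extraction, using the bindings from the diff below.
let channel = read_only_network_graph.channels().get(&short_channel_id);
let directional_info =
    channel.and_then(|channel| channel.get_directional_info(channel_flags));
if let Some(directional_info) = directional_info {
    // ... populate synthetic_update from directional_info as before ...
}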

@TheBlueMatt (Collaborator, Author)

> Under which circumstances can we lose only a channel's directional data, but either retain the channel or at the very least retain the other direction?

In remove_stale_channels_and_tracking_with_time we remove directions individually when that direction has gone stale. We only remove the full channel data when both directions are stale.

@naumenkogs (Contributor) commented Feb 27, 2023

The change looks good.
ACK b4f5f9b

naumenkogs previously approved these changes Feb 27, 2023
@TheBlueMatt (Collaborator, Author)

Rebased to fix some "the neighboring line changed" issues.

@wpaulino (Contributor)

> In remove_stale_channels_and_tracking_with_time we remove directions individually when that direction has gone stale. We only remove the full channel data when both directions are stale.

Any reason we don't remove upon only one direction being stale (ref lightning/bolts#767)?

wpaulino previously approved these changes Feb 27, 2023
@TheBlueMatt (Collaborator, Author)

Oh, I'm sorry, I misread remove_stale_channels_and_tracking_with_time; we do indeed remove if either direction is stale. Still, we only do so if we didn't receive the channel recently (otherwise we'd prune stuff while syncing, which would be bad), so you can still hit this issue.

@arik-so (Contributor) commented Feb 28, 2023

Sorry, can you elaborate on the circumstances under which we'd hit this issue? When exactly would just one direction get pruned?

@TheBlueMatt (Collaborator, Author)

So remove_stale_channels_and_tracking_with_time prunes one side at a time. It then removes the full channel data if either side has been removed, but only if we've known about the channel for a while. If we haven't known about the channel for a while, we assume it's because we're currently syncing the graph (and we don't want to remove it - we'll learn about the other side of the channel in a minute or two). If we're doing P2P sync, this can definitely leave some channels with data for only one side.
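
A simplified illustration of that pruning rule (hypothetical types, not the actual remove_stale_channels_and_tracking_with_time implementation): each direction is checked on its own, and the whole channel is only dropped once a side is missing and the channel has been known long enough that we're unlikely to still be mid-sync.

struct DirectionInfo { last_update: u64 }
struct Channel {
    one_to_two: Option<DirectionInfo>,
    two_to_one: Option<DirectionInfo>,
    announcement_received: u64,
}

// Hypothetical sketch: returns true if the whole channel should be removed.
fn prune_channel(channel: &mut Channel, now: u64, stale_after: u64, min_known: u64) -> bool {
    let is_stale = |dir: &Option<DirectionInfo>| {
        dir.as_ref().map_or(false, |d| now.saturating_sub(d.last_update) > stale_after)
    };
    // Prune each stale direction individually.
    if is_stale(&channel.one_to_two) { channel.one_to_two = None; }
    if is_stale(&channel.two_to_one) { channel.two_to_one = None; }
    // Only drop the whole channel if a side is missing AND we've known about
    // the channel long enough that this isn't just an in-progress sync.
    let missing_side = channel.one_to_two.is_none() || channel.two_to_one.is_none();
    missing_side && now.saturating_sub(channel.announcement_received) > min_known
}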

@arik-so (Contributor) commented Feb 28, 2023

So that basically means that this error should only have been occurring for people who mixed RGS with P2P syncing?

I thought some folks complained that when they downloaded an incremental update, it hit this snag after only applying the original full sync.

@TheBlueMatt (Collaborator, Author)

> So that basically means that this error should only have been occurring for people who mixed RGS with P2P syncing?

It looks like we currently only prune there, so as long as the RGS server always provides both sides of a channel, RGS-only shouldn't cause it. However, if you get one side of a channel updated via a channel failure, it may have a newer timestamp than the other side, which could cause this as well. Finally, I'm not super convinced RGS will never generate such updates - I don't see any explicit filtering code for this, and while LDK's removal of stale directional updates should make it less likely, it's not impossible.

Rebased on upstream to fix CI and included one squashed fix:

diff --git a/lightning-rapid-gossip-sync/src/processing.rs b/lightning-rapid-gossip-sync/src/processing.rs
index 342517bc0..f1072e26b 100644
--- a/lightning-rapid-gossip-sync/src/processing.rs
+++ b/lightning-rapid-gossip-sync/src/processing.rs
@@ -171,7 +171,7 @@ impl<NG: Deref<Target=NetworkGraph<L>>, L: Deref> RapidGossipSync<NG, L> where L
                                let read_only_network_graph = network_graph.read_only();
                                if let Some(directional_info) =
                                        read_only_network_graph.channels().get(&short_channel_id)
-                                       .map(|channel| channel.get_directional_info(channel_flags)).unwrap_or(None)
+                                       .and_then(|channel| channel.get_directional_info(channel_flags))
                                {
                                        synthetic_update.cltv_expiry_delta = directional_info.cltv_expiry_delta;
                                        synthetic_update.htlc_minimum_msat = directional_info.htlc_minimum_msat;
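
The squashed fix is a pure refactor: for any opt: Option<T> and f returning Option<U>, opt.map(f).unwrap_or(None) and opt.and_then(f) yield the same result; and_then just avoids the intermediate Option<Option<U>>. A quick check of the equivalence (a standalone snippet, not part of the PR):

fn main() {
    // Maps even numbers to their half, odd numbers to None.
    let halve_even = |x: u32| if x % 2 == 0 { Some(x / 2) } else { None };
    for opt in [Some(4u32), Some(5), None] {
        assert_eq!(opt.map(halve_even).unwrap_or(None), opt.and_then(halve_even));
    }
}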

@wpaulino wpaulino merged commit 0b1a64f into lightningdevkit:main Feb 28, 2023