
Channel Graph Fetching from newer lightning nodes #615

Closed
TheBlueMatt opened this issue May 3, 2020 · 13 comments

@TheBlueMatt (Collaborator)

Newer versions of some lightning nodes (at least lnd, but not c-lightning) don't bother supporting sending the full routing table anymore in response to the initial_routing_sync bit. We should figure out what the minimal set of channel queries is to be able to tell such nodes to send us the full routing table at startup without taking zlib dependency. Long-term I'd probably prefer that we help get the sync stuff to a more robust redesign (see eg lightning/bolts#584 (comment) ) but certainly in the mean time we should at least be able to fetch the db.

@TheBlueMatt TheBlueMatt added this to the 0.1 milestone May 3, 2020
@naumenkogs (Contributor)

Related to #57, but seems more up-to-date.

@naumenkogs (Contributor) commented Jul 20, 2020

I'm planning to pick this up. To be clear, the plan is to use gossip_queries in a smart way while avoiding zlib compression?

@TheBlueMatt (Collaborator, Author)

Yep, gossip_queries kinda suck, but if we have to use them just to get the graph, we have to... still, ideally we don't implement more of the complexity than we have to.

@bmancini55 (Contributor)

I would like to help if you all want the assistance. I have experience implementing gossip_queries in the node-lightning project and would be happy to share that experience or work on the implementation if it's not too late.

@TheBlueMatt (Collaborator, Author)

Cool! I don't believe @naumenkogs is working on this, at least based on his PR's, but I'll let him correct me if not. As for where, you should check out peer_handler.rs, at least where it handles initial_routing_sync on the init message (https://github.com/rust-bitcoin/rust-lightning/blob/master/lightning/src/ln/peer_handler.rs#L589 and https://github.com/rust-bitcoin/rust-lightning/blob/master/lightning/src/ln/peer_handler.rs#L717). If you need to interface with the network graph to get information, do so through the RoutingMessageHandler trait (https://github.com/rust-bitcoin/rust-lightning/blob/master/lightning/src/ln/msgs.rs#L600-L622).

@naumenkogs (Contributor)

I was planning, but haven't started working on this issue yet, so please go ahead :)

@bmancini55 (Contributor)

I'm still getting familiar with the codebase but wanted to share some thoughts and experiences on how we might tackle this.

At a high level, a minimalist implementation should only support gossip_queries and not bother with gossip_queries_ex. We will need to modify feature negotiation during init to signal gossip_queries. Once that feature is negotiated, initial_routing_sync is ignored and the peer will not gossip to us until we send a gossip_timestamp_filter.

To keep things simple with gossip_timestamp_filter, we can assume we want all messages from the peer and set the filter as broadly as possible. If we set first_timestamp to 0, it is possible that we receive a full routing table dump. In practice, however, this is not a reliable way to backfill routing table information, though it would be nice if it were.
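The "give me everything" filter above can be sketched roughly as follows. This is an illustrative struct mirroring the BOLT 7 gossip_timestamp_filter fields, not any particular library's API:

```rust
// Illustrative sketch: field names mirror the BOLT 7
// gossip_timestamp_filter message, not a specific library's types.
#[derive(Debug)]
struct GossipTimestampFilter {
    chain_hash: [u8; 32],
    first_timestamp: u32,
    timestamp_range: u32,
}

// Ask the peer for everything: all gossip from timestamp 0 through
// the maximum representable timestamp.
fn broadest_filter(chain_hash: [u8; 32]) -> GossipTimestampFilter {
    GossipTimestampFilter {
        chain_hash,
        first_timestamp: 0,
        timestamp_range: u32::MAX,
    }
}
```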

When using query_channel_range, I have been requesting the entire block range when a node has been offline for more than a trivial amount of time. This ensures that channel_update and node_announcement messages aren't missed and we don't have to wait for the updates to propagate via relay if/when those are periodically updated.
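The policy described above ("request the entire block range after any non-trivial downtime") might look like this sketch; the helper and the gap threshold are hypothetical, and the tuple corresponds to the first_blocknum and number_of_blocks fields of query_channel_range:

```rust
// Hypothetical helper: choose (first_blocknum, number_of_blocks) for a
// query_channel_range message. If we have been offline for more than
// some trivial gap, request the entire range from the genesis block.
fn channel_range_query(last_synced_block: u32, current_block: u32, trivial_gap: u32) -> (u32, u32) {
    if current_block.saturating_sub(last_synced_block) > trivial_gap {
        // Full sync: every channel the peer knows about.
        (0, u32::MAX)
    } else {
        // Short outage: only blocks since our last synced height.
        (last_synced_block, u32::MAX - last_synced_block)
    }
}
```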

Prior to a clarification of the spec, reply_channel_range replies spanning multiple messages used inconsistent formats. The clarification establishes the meaning of the full_information flag and enables proper sequencing of multipart replies. IMO, we should focus on implementing the current specification and ignore past inconsistencies.

Not supporting zlib presents a challenge in that there is no way to dictate the encoding sent in reply_channel_range. From observation:

  • LND sends plaintext responses (though this may change)
  • eclair sends zlib responses
  • c-lightning sends zlib responses

If we connect to a c-lightning or eclair node, we are out of luck and will need to fail the sync and retry with a different peer. It might be best to push for a new feature flag in the spec to signal zlib support.
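For reference, the encoded short channel ids in reply_channel_range begin with a one-byte encoding type (0x00 for uncompressed, 0x01 for zlib per BOLT 7). A zlib-free node might decode them roughly like this (a sketch with our own function name, not LDK's actual code):

```rust
// BOLT 7 encoding-type bytes for encoded_short_channel_ids.
const ENCODING_UNCOMPRESSED: u8 = 0x00;
const ENCODING_ZLIB: u8 = 0x01;

// Sketch: decode the encoded short_channel_id list without a zlib
// dependency, failing on compressed replies so the caller can retry
// the sync with a different peer.
fn decode_short_channel_ids(encoded: &[u8]) -> Result<Vec<u64>, &'static str> {
    match encoded.split_first() {
        Some((&ENCODING_UNCOMPRESSED, rest)) => {
            if rest.len() % 8 != 0 {
                return Err("truncated short_channel_id list");
            }
            // Each short_channel_id is a big-endian u64.
            Ok(rest
                .chunks_exact(8)
                .map(|c| u64::from_be_bytes(c.try_into().unwrap()))
                .collect())
        }
        // No zlib dependency: give up on this peer.
        Some((&ENCODING_ZLIB, _)) => Err("zlib encoding unsupported"),
        _ => Err("unknown or empty encoding"),
    }
}
```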

In my implementation, I blackboxed the sync process. The two main benefits are:

  • we get a success/failure outcome for the process as a whole and can retry from a different peer if there is a problem
  • we can disable relay while we are performing a sync so we don't fill the buffer with old messages

I'll send a follow up with some thoughts on actual implementation as I get more familiar with the code. Thanks!

@TheBlueMatt (Collaborator, Author)

> It might be best to push for a new feature flag in the spec to signal zlib support.

Definitely. Am I correct, then, in my understanding that there is no reliable way to fetch the routing table from both an lnd node and a c-lightning node without taking a zlib dependency or without playing some game where we guess the remote node type and reconnect with different flags until things work?

> I have been requesting the entire block range when a node has been offline for more than a trivial amount of time.

Regarding using block filters, does this imply that we would miss any channel announcements which were only broadcast a while (e.g. let's say a day) after the block where the channel was opened if we set anything other than a "give me everything" filter? It's totally reasonable for nodes to be offline when their commitment transaction confirms and only broadcast the channel announcement much later, which seems to imply other nodes would miss it due to requesting a filter of "all the stuff since I was last online, as determined by when the channel confirmed, not when the channel_announcement was broadcast"?

> In my implementation, I blackboxed the sync process. The two main benefits are:

I'm not sure what you mean by this - are you implying that the RoutingMessageHandler has some ability to query the PeerHandler and learn about things across different peers, or are you saying this is just for testing? Maybe a demo PR would help.

@bmancini55 (Contributor)

> Definitely. Am I correct, then, in my understanding that there is no reliable way to fetch the routing table from both an lnd node and a c-lightning node without taking a zlib dependency or without playing some game where we guess the remote node type and reconnect with different flags until things work?

That is my understanding as well.

> Regarding using block filters, does this imply that we would miss any channel announcements which were only broadcast a while (e.g. let's say a day) after the block where the channel was opened if we set anything other than a "give me everything" filter? It's totally reasonable for nodes to be offline when their commitment transaction confirms and only broadcast the channel announcement much later, which seems to imply other nodes would miss it due to requesting a filter of "all the stuff since I was last online, as determined by when the channel confirmed, not when the channel_announcement was broadcast"?

You are correct: if your time horizon prior to your last known block is too short, you may miss a chan_ann message. For example:

  • Alice and Bob are both online at height 0
  • Alice broadcasts a funding tx and goes offline
  • The tx gets confirmed at height 100
  • Bob goes offline around block 400
  • Alice comes online around block 500 and broadcasts the chan_ann
  • Bob comes back online around block 700

Bob believes he has a complete view up until he went offline at 400, but if his time horizon is short (say one day's worth of blocks), he will construct his query for block 256+ and miss Alice's channel that is attached to block 100.
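The horizon arithmetic in this example, as a quick sketch (assuming roughly 144 blocks per day; the helper name is ours):

```rust
// Sketch of the horizon math: Bob's query starts at his last known
// block minus some lookback horizon, so anything anchored earlier
// than that falls outside the query range.
fn first_queried_block(last_seen: u32, horizon_blocks: u32) -> u32 {
    last_seen.saturating_sub(horizon_blocks)
}
```

With last_seen = 400 and a one-day horizon of 144 blocks, the query starts at block 256, so Alice's channel at block 100 is missed.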

This might not be that common an issue since Alice's counterparty may fail the channel if she fails to send her funding_locked message in a reasonable amount of time. But it's still an issue.

It can also be resolved if Alice is active and sends periodic chan_upd messages. When Bob's node sees a chan_upd for an unknown channel, he can use query_short_channel_ids to ask for the chan_ann + chan_upd + node_ann messages.

Another reason I've been fetching the full range is a related problem: because query_channel_range is based on the block height of the channel while chan_upd and node_ann messages are timestamp based, you may also miss updates for older channels. For instance:

  • Alice has a channel from block 0
  • Bob has knowledge of this channel
  • Bob goes offline at block 500
  • Alice issues a chan_upd or updates her node and has new features and broadcasts a node_ann around block 600
  • Bob comes online at block 800
  • Bob syncs from his last known point of 500 less some horizon
  • Bob misses the chan_upd and node_ann messages because they refer to a channel confirmed in a block prior to the query range.

Doing a full sync is expensive, but it at least guarantees that we have all information that the peer has.

> I'm not sure what you mean by this - are you implying that the RoutingMessageHandler has some ability to query the PeerHandler and learn about things across different peers, or are you saying this is just for testing? Maybe a demo PR would help.

Sorry for the confusion; that was sort of an aside on the implementation side of things. I was referring to the implementation in node-lightning, which has a different architecture.

@TheBlueMatt (Collaborator, Author)

> Doing a full sync is expensive, but it at least guarantees that we have all information that the peer has.

Right, sounds like we pretty much need to request a full sync always until we get a set-reconciliation protocol to replace the existing sync stuff.

> Sorry for the confusion, ...

Ah, ok, looking forward to seeing your work here :)

@bmancini55 (Contributor)

BTW, I created this issue in the lightning-rfc requesting zlib support negotiation: lightning/bolts#811

@TheBlueMatt (Collaborator, Author)

I believe #1015 is the only thing left here.

@TheBlueMatt (Collaborator, Author)

Compression got yanked from the spec, this is now complete.
