
Direct connect for OnionMessage sending #2723

Merged: 18 commits, Dec 8, 2023

Conversation


@jkczyz jkczyz commented Nov 9, 2023

Not all nodes support onion messages, so it may not be possible for a MessageRouter to find a complete path to an OnionMessage's destination. A direct connection to the first node in an OnionMessagePath is therefore required. This PR adds a ConnectionNeeded event to signal to users to connect to a node.

When sending an OnionMessage or replying to one -- but not when forwarding -- OnionMessenger will buffer messages for immediate recipients that are not peers and upon connection make them available for processing as normal. A new release_pending_connections method is added to the OnionMessageHandler trait for the PeerManager to call and provide the necessary ConnectionNeeded events. Thus, PeerManager now implements EventsProvider and is called in BackgroundProcessor accordingly.

PeerManager calls the new method in its own timer_tick_occurred method. This allows OnionMessenger to drop any buffered messages for nodes that haven't connected within a certain number of timer ticks.
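As a rough sketch of how a user might consume the proposed event, with stand-in types in place of the real LDK ones (the variant and field names below are assumptions based on this discussion, not the final API):

```rust
// Hypothetical stand-ins for illustration only; the real LDK types differ.
#[derive(Debug, Clone, PartialEq)]
struct NodeAnnouncement {
    node_id: String,
    addresses: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
enum Event {
    ConnectionNeeded { node_announcement: NodeAnnouncement },
}

// In an event handler, a user would extract the node to connect to and hand
// it to their networking layer so the buffered onion messages can be sent.
fn node_to_connect(event: &Event) -> Option<(String, Vec<String>)> {
    match event {
        Event::ConnectionNeeded { node_announcement } => Some((
            node_announcement.node_id.clone(),
            node_announcement.addresses.clone(),
        )),
    }
}
```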

secp_ctx: Secp256k1<secp256k1::All>,
message_router: MR,
offers_handler: OMH,
custom_handler: CMH,
}

/// [`OnionMessage`]s buffered to be sent.
enum OnionMessageBuffer {
Contributor Author

Was thinking OnionMessageRecipient may be a better name.

Contributor Author

Renamed as mentioned in a separate commit.

lightning/src/events/mod.rs (resolved)
@@ -516,18 +583,23 @@ where
pub fn send_onion_message<T: OnionMessageContents>(
Contributor Author (@jkczyz, Nov 9, 2023)

Wondering if this should still be pub? We send through the handlers, so this is not used internally outside of this module. Internally, we need to examine the return value to decide if a connection is needed. So users calling this won't trigger a connection request nor will the message be buffered for them, except if we previously decided that a connection is needed.

Collaborator

It's public so users can send custom messages easily, which I think we should support, if possible. We'll need to handle the needs-connection case internally here, though, if we want to keep it.

Contributor Author

The interface here isn't congruent with other sending, FWIW. This takes an OnionMessagePath instead of a Destination, so the assumption is that the MessageRouter has already been called prior to calling send_onion_message. We'd probably want to do message routing internally, I'd imagine?

Collaborator

Mmm, yea, I think so. I mean we don't really need the interface at all, but it's nice to have, as long as it is easy to use.

Comment on lines 2404 to 2409
let nodes_to_connect =
self.message_handler.onion_message_handler.release_pending_connections();
let mut pending_events = self.pending_events.lock().unwrap();
for node_announcement in nodes_to_connect {
pending_events.push(Event::ConnectionNeeded { node_announcement });
}
Contributor Author

Originally, I had this in process_events, but I moved it to timer_tick_occurred so the timeout could be in terms of ticks. However, it has occurred to me that having it here means we may wait up to 10 seconds before even generating the events, which would result in bad UX. We could instead simply generate events directly in process_pending_events and process_pending_events_async, dropping the intermediary pending_events Vec. But, as before, we'd have to define the timeout in something other than ticks, or possibly add another method to OnionMessageHandler for calling in timer_tick_occurred.

Contributor Author

Updated the OnionMessageHandler trait to have a timer_tick_occurred method so that dropping buffered messages is independent of providing events.
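The tick-driven expiry described here can be sketched as follows; the struct, field names, and MAX_TIMER_TICKS constant are illustrative assumptions (not LDK's actual definitions), with roughly ten seconds per tick as noted above:

```rust
// Illustrative sketch: a buffer for a not-yet-connected peer that is dropped
// after a fixed number of timer ticks. Names are assumptions, not LDK's.
const MAX_TIMER_TICKS: usize = 2;

struct PendingConnection {
    pending_messages: Vec<Vec<u8>>, // serialized onion messages awaiting the peer
    ticks: usize,
}

impl PendingConnection {
    fn new() -> Self {
        Self { pending_messages: Vec::new(), ticks: 0 }
    }

    // Called roughly every ten seconds; returns true once the buffered
    // messages should be dropped because the peer never connected.
    fn timer_tick_occurred(&mut self) -> bool {
        self.ticks += 1;
        self.ticks >= MAX_TIMER_TICKS
    }
}
```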

Collaborator @TheBlueMatt left a comment

Some high level design questions I'd like to understand first.

lightning/src/onion_message/messenger.rs (resolved)
lightning/src/onion_message/messenger.rs (resolved)

/// [`MessageRouter`]: crate::onion_message::MessageRouter
/// [`Destination`]: crate::onion_message::Destination
/// [`OnionMessageHandler`]: crate::ln::msgs::OnionMessageHandler
ConnectionNeeded {
Collaborator

Why did you end up going the event route vs having a "start opening a connection async" trait (or even just doing it via the onion message router)? I guess an event could let us reuse it later in ChannelManager to just tell the user to reconnect every time a peer we care about disconnects, rather than users having to poll that, but it does seem like a lot of indirection to get the needs-connection event out.

Contributor Author

It would require adding another parameterization for users to implement (i.e., either a new trait on OnionMessenger or one on DefaultMessageRouter), which would then need to bubble up to the Simple* aliases.

And since it needs to be async, the trait implementation wouldn't be enough. Users would need to write code to process connection requests in the background from what their trait enqueued. If we wrote some of this logic for them, then we'd need to do it for each language's IO layer. But that wouldn't fit well into any peer management logic that they may have (e.g., LDK Node's PeerStore).

Seems much simpler just to surface an event for them so that no additional parameterization and processing code is needed, other than handling one more event variant.

Collaborator

Yea, fair, I guess. I think the current piping through of the needs-connection nodes is also a bit gross, but I'm not sure it's more gross. One further thing to consider is what we want the eventual end state to be. If we want to generate ConnectionNeeded events for peers with which we have a channel, we should stick with the event route (and probably start generating such events in the same release). If we don't, adding it via the router is easier to "walk back", and is at least only gross temporarily.

Contributor

Could we pipe the needs-connect nodes through OffersMessageHandler / CustomOMHandler with a new trait method? Then the ChannelManager could generate ConnectionNeeded events and we wouldn't have to trouble the background processor. I do see Jeff's argument that the current design aligns with the peer manager doing peer stuff though.

Contributor Author

Hmm... that wouldn't work for uses of send_onion_message or anyone using something other than ChannelManager to implement one of those traits.

Contributor

The custom trait implementations would need to support connecting to peers outbound; would it not work in some other way? I'm not convinced it's cleaner than the current approach though :/ just nicer in terms of the PeerMan/BgProcessor.

Contributor Author

Not sure that I follow. All custom implementations should work with the current approach of using OnionMessenger. Or are you saying a custom implementation of OnionMessageHandler (i.e., something other than OnionMessenger)? Note that CustomOnionMessageHandler and OffersMessageHandler are specific to OnionMessenger. Any implementations of those work fine with the current approach.

Contributor Author

Discussed offline. Updated to have events processed directly from any OnionMessageHandler by requiring implementations to also implement EventsProvider. I was able to do so without adding another type parameter to BackgroundProcessor since the handler can be fetched from PeerManager.

return;
match self.send_onion_message(path, contents, reply_path) {
Err(SendError::InvalidFirstHop(first_node_id, onion_message)) => {
let network_graph = self.network_graph.deref().read_only();
Collaborator

Hmm, is this the right place for this? We could avoid the network graph reference by pushing this into the onion message routing interface, and have it be allowed to return some kind of "My route requires you to connect to (Vec<SocketAddress>, PublicKey) flag". Alternatively, we could push it into the event handler, which probably already has such a reference, but some users implement that themselves so they'd be stuck doing it.

Contributor Author

Hmm, is this the right place for this? We could avoid the network graph reference by pushing this into the onion message routing interface, and have it be allowed to return some kind of "My route requires you to connect to (Vec<SocketAddress>, PublicKey) flag".

Eh, I suppose if DefaultMessageRouter will ultimately need a NetworkGraph reference, perhaps it would be better to have it there?

Having it in OnionMessenger was meant as a check against malicious actors sending onion messages containing bogus reply paths. Otherwise, we'd buffer messages that could never be delivered until the timeout drops them, which could be problematic if it causes us to exceed MAX_TOTAL_BUFFER_SIZE. So I figured that logic shouldn't be left to implementors of MessageRouter.

Also note that currently a MessageRouter is allowed to return a partial path if it can't find a full one back from the destination. And we'll direct connect to the start of that path. So returning a "needs connection" from there would at very least complicate the result type, if we want to support that still.

Further, it's a little roundabout to have MessageRouter tell OnionMessenger that it needs a connection to a node when OnionMessenger is the one that is already keeping track of the peers. And note that currently send_onion_message takes the path and contents, from which it creates an OnionMessage. So if MessageRouter is responsible for saying "needs connection" then we still need to call send_onion_message in order to have something to buffer. That at the very least complicates the processing of MessageRouter's return value.

I'm a little torn on the right approach here.

Alternatively, we could push it into the event handler, which probably already has such a reference, but some users implement that themselves so they'd be stuck doing it.

Not sure I understand this. Are you saying to just have the event say needs connection to some node id and have the handler look it up in the NetworkGraph? We'd blindly buffer messages in that case, too.
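The buffer-size concern in this thread can be illustrated with a minimal sketch; the MAX_TOTAL_BUFFER_SIZE constant and the byte accounting here are assumptions for illustration, not LDK's actual logic:

```rust
// Sketch of the DoS guard being discussed: refuse to queue further onion
// messages once a total cap is hit, so bogus reply paths from malicious
// senders cannot grow memory unboundedly. Values are illustrative.
const MAX_TOTAL_BUFFER_SIZE: usize = 4096;

fn try_buffer(buffers: &mut Vec<Vec<u8>>, msg: Vec<u8>) -> Result<(), ()> {
    let total: usize = buffers.iter().map(|m| m.len()).sum();
    if total + msg.len() > MAX_TOTAL_BUFFER_SIZE {
        Err(()) // drop instead of buffering a possibly undeliverable message
    } else {
        buffers.push(msg);
        Ok(())
    }
}
```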

Collaborator

Eh, I suppose if DefaultMessageRouter will ultimately need a NetworkGraph reference, perhaps it would be better to have it there?

Makes sense to me.

Having it in OnionMessenger was meant as a check against malicious actors sending onion messages containing bogus reply paths. Otherwise, we'd buffer messages that could never be delivered until the timeout drops them, which could be problematic if it causes us to exceed MAX_TOTAL_BUFFER_SIZE. So I figured that logic shouldn't be left to implementors of MessageRouter.

Hmm, that's a fair point. That said, I'm not sure how an implementor of MessageRouter could screw that up - like if they're required to return a NetAddress they either give us something we concretely should try to connect to, or not.

Also note that currently a MessageRouter is allowed to return a partial path if it can't find a full one back from the destination. And we'll direct connect to the start of that path. So returning a "needs connection" from there would at very least complicate the result type, if we want to support that still.

True, though only in the sense that we'd have to add a Vec<NetAddress> to OnionMessagePath, which doesn't seem insane, even if a bit strange.

Further, it's a little roundabout to have MessageRouter tell OnionMessenger that it needs a connection to a node when OnionMessenger is the one that already is keeping track of the peers.

Mmm, true. I mean we do already pass the list of connected peers to the router, so it's not like it shouldn't/won't have that info on hand, and for good reason. I guess in my head it doesn't seem wrong at all - the router's job is to look at the network graph plus the peers we have and where we want to get to, and decide the best way to get there. If that means connecting, it can make that decision, but it's up to the router, and once it's decided, it seems like it should be able to communicate that fully to the messenger, which is responsible for actually sending the message (and, awkwardly, passing some event through the peermanager to the user to trigger the connection, but really that is a router-originated event).

And note that currently send_onion_message takes the path and contents, from which it creates an OnionMessage. So if MessageRouter is responsible for saying "needs connection" then we still need to call send_onion_message in order to have something to buffer. That at the very least complicates the processing of MessageRouter's return value.

Yea, as discussed above, it seems to me we should change the send_onion_message API.

Not sure I understand this. Are you saying to just have the event say needs connection to some node id and have the handler look it up in the NetworkGraph? We'd blindly buffer messages in that case, too.

Yea, I hadn't considered that we're DC'ing in response to invoice_requests, not just send_payment_for_offers, so we can't really do that.

Contributor Author

Ok, let me go ahead and move NetworkGraph to DefaultMessageRouter. I guess the idea would be to add a Vec<SocketAddress> field to OnionMessagePath? I used NodeAnnouncement since it would give the user more information on deciding whether they want to connect to the node. But perhaps SocketAddresses are sufficient. And I suppose a MessageRouter may want to connect to a private node, so we should support that.

(and, awkwardly, passing some event through the peermanager to the user to trigger the connection, but really that is a router-originated event)

An alternative could be to parameterize BackgroundProcessor with OnionMessenger or MessageRouter, but I figured we'd want to avoid adding more parameterization there. Seemed to make sense to have PeerManager be responsible as a higher-level abstraction for peer-related needs.

Collaborator

Ok, let me go ahead and move NetworkGraph to DefaultMessageRouter. I guess the idea would be to add a Vec<SocketAddress> field to OnionMessagePath?

Yea, that was my thinking. I mean ideally we wouldn't require returning anything at all; we can figure out if we need to connect without it. But to your point we kinda need to here for DoS reasons. One issue is that in the ChannelManager, if we want to generate the new events, we have no idea what the network address is, which makes it hard to generate the events there.

I used NodeAnnouncement since it would give the user more information on deciding whether they want to connect to the node. But perhaps SocketAddresses are sufficient.

Yea, it just seemed like excess information that wasn't really required, and socket addresses are optional inside that so we kinda want to force them.

And I suppose a MessageRouter may want to connect to a private node, so we should support that.

I wasn't actually even thinking about that, but, yea, fair point.

An alternative could be to parameterize BackgroundProcessor with OnionMessenger or MessageRouter, but I figured we'd want to avoid adding more parameterization there. Seemed to make sense to have PeerManager be responsible as a higher-level abstraction for peer-related needs.

Yea, seems fine, just a lot of piping stuff through to get from the onion message router to the background processor.

Contributor Author

Related, we'll need to handle sciddir_or_pubkey somewhere. That is, going from a Destination containing a BlindedPath with an scid to an OnionMessagePath having a Destination containing a BlindedPath with a pubkey.

I suppose doing this in a MessageRouter is fine. But when wanting to construct such a blinded path, we'll need the NetworkGraph, too. This responsibility then may want to also live in MessageRouter, but the current draft is to have it in Router partly because ChannelManager already has one. So this may need to be rethought.

Collaborator

Hmm, yea, good point. We can do that when we get there on the receiving side, though.

Contributor Author

jkczyz commented Nov 16, 2023

Updated per offline discussion. Let me know if this new approach looks good, and I'll write up some tests.


codecov-commenter commented Nov 16, 2023

Codecov Report

Attention: 55 lines in your changes are missing coverage. Please review.

Comparison is base (37150b4) 88.63% compared to head (0b83116) 89.49%.
Report is 22 commits behind head on main.

Files Patch % Lines
lightning/src/onion_message/messenger.rs 74.44% 40 Missing and 6 partials ⚠️
lightning-background-processor/src/lib.rs 86.84% 1 Missing and 4 partials ⚠️
lightning/src/events/mod.rs 0.00% 2 Missing ⚠️
lightning/src/onion_message/functional_tests.rs 98.71% 2 Missing ⚠️


@@            Coverage Diff             @@
##             main    #2723      +/-   ##
==========================================
+ Coverage   88.63%   89.49%   +0.85%     
==========================================
  Files         115      115              
  Lines       90446    97229    +6783     
  Branches    90446    97229    +6783     
==========================================
+ Hits        80170    87017    +6847     
+ Misses       7875     7829      -46     
+ Partials     2401     2383      -18     


Contributor Author

jkczyz commented Nov 16, 2023

Latest push is to fix CI. I'll need to rebase to fix the fuzz test.

Contributor @valentinewallace left a comment

Looks good I think. Will give it another glance later.

lightning-background-processor/src/lib.rs (resolved)
lightning/src/events/mod.rs (resolved)
use lightning::events::EventsProvider;

let events = core::cell::RefCell::new(Vec::new());
peer_manager.onion_message_handler().process_pending_events(&|e| events.borrow_mut().push(e));
Contributor

Maybe after async functions in traits are stabilized, we can add an async event processing method to EventsProvider with a default implementation 🤔 I'm just wondering if this could be a temporary workaround.

Contributor Author

Yeah, possibly. EventsProvider is kinda weird in that up until now it never really needed to be a trait.

Collaborator

Not sure I understand the reason for this now? Can we not just add a process_pending_events_async method on OnionMessenger just like we do for ChannelManager?

Contributor Author

BackgroundProcessor gets the OnionMessenger through PeerManager, which only knows it is something that implements OnionMessageHandler and EventsProvider. If we instead pass OnionMessenger to BackgroundProcessor, then we would need to add three more type parameters to BackgroundProcessor.

@@ -1580,6 +1580,9 @@ pub trait OnionMessageHandler {
/// drop and refuse to forward onion messages to this peer.
fn peer_disconnected(&self, their_node_id: &PublicKey);

/// Performs actions that should happen on startup and roughly every ten seconds thereafter.
Contributor

FWIW, docs tend to be more extensive on this method elsewhere. You may have been waiting for a concept ACK first though.

Contributor Author

Expanded a bit, but note that other timer_tick_occurred methods are on structs whereas this is on a trait and thus is not associated with any concrete implementations.

Collaborator @TheBlueMatt

Concept ACK, I think it all looks good to me, can do a thorough review when val's comments are addressed.

@TheBlueMatt TheBlueMatt added this to the 0.0.119 milestone Nov 27, 2023
Contributor @valentinewallace left a comment

I'd be good with a rebase to fix CI

lightning/src/onion_message/messenger.rs (resolved)

match message_buffers.entry(first_node_id) {
hash_map::Entry::Vacant(e) => match addresses {
None => Err(SendError::InvalidFirstHop(first_node_id)),
Contributor

Error message docs need updating if we want to use this here.

Contributor Author

Hmm... this is the only place it is used. What's missing / wrong with the docs?

Contributor

The docs say "The first hop does not support onion message forwarding." But we don't really know that they don't advertise OM support at this point, we only know that they're not connected to us.

Contributor Author

Oh, right! This should now mean we don't know how to connect to the first hop.


match message_buffers.entry(first_node_id) {
hash_map::Entry::Vacant(e) => match addresses {
None => Err(SendError::InvalidFirstHop(first_node_id)),
Contributor

If we get to this point, it means that the peer was previously connected+known to support OMs but was disconnected since find_path was called, right? Given that, would it be better or is there some way to buffer it here instead?

Contributor Author (@jkczyz, Nov 30, 2023)

first_node_id isn't necessarily a peer anymore. DefaultMessageRouter will look up the node in the NetworkGraph if it isn't one. More generally, a MessageRouter can return a partial path back from the Destination, and we'll happily generate a ConnectionNeeded event with the first node in such a path.

Contributor

I meant that if someone is using DefaultMessageRouter, there's a race where we are connected to a peer when find_path is initially called, but disconnected by the time we go to actually send the OM to that first-hop peer. A fix would be to always poll the network graph for socket addresses in find_path, but this may be too rare to bother.

Contributor Author

Yeah, that can occur because we don't hold the lock while finding a path. Seems like we might as well populate addresses unconditionally. Hmmm... but that won't work if the peer is unannounced (as we don't have the SocketAddresses handy) or if another implementation of MessageRouter fails to populate the field, for whatever reason.

I think we'd have to store the addresses from the Init message on connection. Then upon disconnection, we could transition to a third DisconnectedPeer state to hold on to the addresses until some timer ticks have passed.

Contributor Author

@TheBlueMatt Any thoughts on this?

Collaborator

Not really; also not sure it's worth worrying about too much, as it should be incredibly rare timing-wise; in general we're supposed to be tolerant of onion messages getting lost. If we're really worried about it we can unconditionally populate the addresses (we should really always be sending OMs to public nodes, or they're not reliable anyway), but the extra graph lookup to send a message to our peer kinda sucks.

Contributor Author

The alternative to the graph lookup is storing them on connection from the Init message. Either way, we can easily do it later given we don't persist any of this data.

Collaborator

We don't have an address list in the Init, though? (Well, our address, but not theirs.)

Contributor Author

Oh, right. I incorrectly remembered the purpose of those addresses.

Contributor Author @jkczyz left a comment

Added tests and fixed a bug that was preventing the timer from ticking once the ConnectionNeeded event was generated. 🤦‍♂️

Collaborator @TheBlueMatt

Needs rebase.

Contributor Author

jkczyz commented Nov 30, 2023

Needs rebase.

Rebased

Collaborator @TheBlueMatt left a comment

Only really one comment, basically LGTM. Feel free to squash.

lightning/src/onion_message/messenger.rs (resolved)
let mut new_pending_messages = VecDeque::new();
core::mem::swap(pending_messages, &mut new_pending_messages);
*self = OnionMessageBuffer::ConnectedPeer(new_pending_messages);
}
Collaborator

nit: else { debug_assert!(false); }, right?

Contributor Author

This is chained after .or_insert_with(|| OnionMessageBuffer::ConnectedPeer(VecDeque::new())), so it would cause test failures. I think it's fine without the check, given that in that case it is a no-op.
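A minimal sketch of the chaining being described, with names loosely following the snippets in this thread (assumed, not exact LDK code):

```rust
use std::collections::{HashMap, VecDeque};

// `mark_connected` is chained after `or_insert_with`, so a freshly inserted
// ConnectedPeer hits the no-op arm rather than a debug assertion.
enum OnionMessageBuffer {
    ConnectedPeer(VecDeque<Vec<u8>>),
    PendingConnection(VecDeque<Vec<u8>>),
}

impl OnionMessageBuffer {
    fn mark_connected(&mut self) {
        if let OnionMessageBuffer::PendingConnection(pending_messages) = self {
            let mut new_pending_messages = VecDeque::new();
            core::mem::swap(pending_messages, &mut new_pending_messages);
            *self = OnionMessageBuffer::ConnectedPeer(new_pending_messages);
        }
        // Already ConnectedPeer: intentionally a no-op, not an assertion.
    }
}

fn peer_connected(buffers: &mut HashMap<u64, OnionMessageBuffer>, node_id: u64) {
    buffers
        .entry(node_id)
        .or_insert_with(|| OnionMessageBuffer::ConnectedPeer(VecDeque::new()))
        .mark_connected();
}
```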

lightning/src/events/mod.rs (resolved)

// Buffer an onion message for a disconnected peer
disconnect_peers(&nodes[0], &nodes[1]);
assert!(nodes[0].messenger.next_onion_message_for_peer(nodes[1].node_id).is_none());
nodes[0].messenger.send_onion_message(message, destination, None).unwrap();
Collaborator

Suggested change
nodes[0].messenger.send_onion_message(message, destination, None).unwrap();
nodes[0].messenger.send_onion_message(message, destination, None).unwrap();
assert!(nodes[0].messenger.next_onion_message_for_peer(nodes[1].node_id).is_none());

Contributor Author:
This would cause us to hit the debug assertion in dequeue_message. We don't prevent someone from calling next_onion_message_for_peer for a disconnected peer. Perhaps we should return None? Calling it means the PeerManager state is out-of-sync with the OnionMessenger state.

@jkczyz (Contributor Author) left a comment:
Will address the nits while waiting on responses for some outstanding comments.

use lightning::events::EventsProvider;

let events = core::cell::RefCell::new(Vec::new());
peer_manager.onion_message_handler().process_pending_events(&|e| events.borrow_mut().push(e));
Contributor Author:
BackgroundProcessor gets the OnionMessenger through PeerManager, which only knows it is something that implements OnionMessageHandler and EventsProvider. If we instead pass OnionMessenger to BackgroundProcessor, then we would need to add three more type parameters to BackgroundProcessor.


match message_buffers.entry(first_node_id) {
hash_map::Entry::Vacant(e) => match addresses {
None => Err(SendError::InvalidFirstHop(first_node_id)),
Contributor Author:
@TheBlueMatt Any thoughts on this?

@TheBlueMatt (Collaborator):

Needs rebase, but would be happy to see it squashed.

Onion messages are buffered for sending to the next node. Since the
network has limited adoption, connecting directly to a peer may be
necessary. Add an OnionMessageBuffer abstraction that can differentiate
between connected peers and those that are pending a connection. This allows
for buffering messages before a connection is established and applying
different buffer policies for peers yet to be connected.
@jkczyz (Contributor Author) commented Dec 6, 2023:

Needs rebase, but would be happy to see it squashed.

Rebased and squashed. Had to refactor the sending methods slightly to log onion message contents.

Comment on lines 599 to 680
let sender = self.node_signer
.get_node_id(Recipient::Node)
.map_err(|_| SendError::GetNodeIdFailed)?;

let peers = self.message_buffers.lock().unwrap()
.iter()
.filter(|(_, buffer)| matches!(buffer, OnionMessageBuffer::ConnectedPeer(_)))
.map(|(node_id, _)| *node_id)
.collect();

let path = self.message_router
.find_path(sender, peers, destination)
.map_err(|_| SendError::PathNotFound)?;

log_trace!(self.logger, "Constructing onion message {}: {:?}", log_suffix, contents);

let result = self.send_onion_message_using_path(path, contents, reply_path);

match result.as_ref() {
Err(SendError::GetNodeIdFailed) => {
log_warn!(self.logger, "Unable to retrieve node id {}", log_suffix);
},
Err(SendError::PathNotFound) => {
log_trace!(self.logger, "Failed to find path {}", log_suffix);
},
Err(e) => {
log_trace!(self.logger, "Failed sending onion message {}: {:?}", log_suffix, e);
},
Ok(SendSuccess::Buffered) => {
log_trace!(self.logger, "Buffered onion message {}", log_suffix);
},
}
Contributor Author:
Ugh, this needs to be refactored a little more. Otherwise, not all the errors will be logged.

Contributor Author:
This should be fixed now.

OnionMessenger::send_onion_message takes an OnionMessagePath. This isn't
very useful as it requires finding a path manually. Instead, have the
method take a Destination and use OnionMessenger's MessageRouter to
construct the path. Later, this will allow for buffering messages where
the first node in the path isn't a direct connection.
MessageRouter::find_path returns a path to use when sending an onion
message. If the first node on the path is not connected or does not
support onion messages, sending will fail with InvalidFirstHop. Instead
of failing outright, buffer the message for later sending once the first
node is a connected peer.
When buffering onion messages for a node that is not connected as a
peer, it's possible that the node does not exist. Include a NetworkGraph
reference in DefaultMessageRouter so that it can be used to check if the
node actually exists. Otherwise, a malicious node may send an onion
message where the reply path's introduction node doesn't exist. This
would result in buffering messages that may never be delivered.
MessageRouter::find_path is given a Destination to reach via a set of
peers. If a path cannot be found, it may return a partial path such that
OnionMessenger can signal a direct connection to the first node in the
path is needed. Include a list of socket addresses in the returned
OnionMessagePath to allow OnionMessenger to know how to connect to the
node.

This allows DefaultMessageRouter to use its NetworkGraph to return
socket addresses for gossiped nodes.
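The shape described above can be sketched as follows. Field and type names are illustrative stand-ins for the LDK API (the PR later renames the field along the lines of `first_node_addresses`), not its exact definitions:

```rust
// Hypothetical simplification of an onion message path whose first hop may
// require a new direct connection.
#[derive(Debug, PartialEq)]
struct SocketAddress(String);

#[allow(dead_code)]
struct OnionMessagePath {
    intermediate_nodes: Vec<[u8; 33]>,
    destination: [u8; 33],
    // Only set when a direct connection to the first node is required; this
    // lets OnionMessenger know how to connect (e.g. via a ConnectionNeeded
    // signal) instead of failing with InvalidFirstHop.
    first_node_addresses: Option<Vec<SocketAddress>>,
}

fn needs_connection(path: &OnionMessagePath) -> bool {
    path.first_node_addresses.is_some()
}

fn main() {
    let first_hop = [2u8; 33];
    let path = OnionMessagePath {
        intermediate_nodes: vec![],
        destination: first_hop,
        first_node_addresses: Some(vec![SocketAddress("203.0.113.1:9735".into())]),
    };
    assert!(needs_connection(&path));
}
```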
When there isn't a direct connection with the Destination of an
OnionMessage, look up socket addresses from the NetworkGraph. This is
used to signal to OnionMessenger that a direct connection is needed to
send the message.
A MessageRouter may be unable to find a complete path to an onion
message's destination. This could be because no such path exists or nodes
on a potential path don't support onion messages. Add an event
that indicates a connection with a node is needed in order to send the
message.
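Such an event might be consumed along these lines. The variant and field names follow the PR's description but are stand-ins for the real lightning Event type:

```rust
// Hypothetical model of a ConnectionNeeded-style event; not LDK's actual
// Event enum.
#[allow(dead_code)]
enum Event {
    ConnectionNeeded {
        node_id: [u8; 33],
        addresses: Vec<String>,
    },
}

// A user's event handler would attempt an outbound connection to one of the
// provided addresses; here we just pick the first.
fn handle_event(event: Event) -> Option<String> {
    match event {
        Event::ConnectionNeeded { addresses, .. } => addresses.into_iter().next(),
    }
}

fn main() {
    let event = Event::ConnectionNeeded {
        node_id: [3u8; 33],
        addresses: vec!["198.51.100.7:9735".into()],
    };
    assert_eq!(handle_event(event), Some("198.51.100.7:9735".to_string()));
}
```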
An OnionMessageHandler may buffer messages that can't be sent because
the recipient is not a peer. Have the trait extend EventsProvider so that
implementations can generate an Event::ConnectionNeeded for any nodes that
fall into this category. Also, implement
EventsProvider for OnionMessenger and IgnoringMessageHandler.
OnionMessenger buffers onion messages for nodes that are pending a
connection. To prevent DoS concerns, add a timer_tick_occurred method to
OnionMessageHandler so that buffered messages can be dropped. This will
be called in lightning-background-processor every 10 seconds.
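The buffer-expiry policy can be modeled minimally as below. The constant value and type names are illustrative assumptions, not LDK's actual implementation:

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative value; the real MAX_TIMER_TICKS may differ.
const MAX_TIMER_TICKS: usize = 2;

// Minimal model of per-recipient buffers: connected peers keep their
// messages, pending peers are dropped after too many timer ticks.
enum Recipient {
    Connected(VecDeque<String>),
    Pending { messages: VecDeque<String>, ticks: usize },
}

fn timer_tick_occurred(buffers: &mut HashMap<u8, Recipient>) {
    buffers.retain(|_, recipient| match recipient {
        Recipient::Connected(_) => true,
        Recipient::Pending { ticks, .. } => {
            *ticks += 1;
            // Drop buffered messages for peers that never connected.
            *ticks < MAX_TIMER_TICKS
        }
    });
}

fn main() {
    let mut buffers = HashMap::new();
    buffers.insert(
        1,
        Recipient::Pending { messages: VecDeque::from(vec!["m".to_string()]), ticks: 0 },
    );
    timer_tick_occurred(&mut buffers); // tick 1: still buffered
    assert!(buffers.contains_key(&1));
    timer_tick_occurred(&mut buffers); // tick 2: reached MAX_TIMER_TICKS, dropped
    assert!(!buffers.contains_key(&1));
}
```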
Some code hygiene before another parameter is added and rustfmt is
eventually used.
Simply to avoid excessive wrapping when possible.
OnionMessageHandler implementations now also implement EventsProvider.
Update lightning-background-processor to also process any events the
PeerManager's OnionMessageHandler provides.
lightning-background-processor processes events provided by the
PeerManager's OnionMessageHandler for when a connection is needed. If a
connection is not established in a reasonable amount of time, drop any
buffered onion messages by calling timer_tick_occurred.
Additional tests will be added needing a similar node struct, so
consolidate its usage.
Add tests for onion message buffering checking that messages are cleared
upon disconnection and timed out after MAX_TIMER_TICKS. Also, check
that ConnectionNeeded events are generated.
@jkczyz (Contributor Author) commented Dec 6, 2023:

Latest push should fix the cargo doc failure in CI.

Ok(())
}
e.get_mut().enqueue_message(onion_message);
Ok(SendSuccess::Buffered)
Collaborator:
Should this return value depend on the buffer's connected/pending state?

Contributor Author:
Yeah, I'll fix it in a follow-up.

///
/// Only needs to be set if a connection to the node is required. [`OnionMessenger`] may use
/// this to initiate such a connection.
pub addresses: Option<Vec<SocketAddress>>,
Collaborator:
nit: can we say, like, first_hop_addresses or first_hop_socket_addresses? Just addresses doesn't tell me much in the context of an onion message path.

Contributor Author:
Fair point. Will address.

@TheBlueMatt TheBlueMatt merged commit 2d26679 into lightningdevkit:main Dec 8, 2023
15 checks passed
@jkczyz jkczyz mentioned this pull request Dec 8, 2023
TheBlueMatt added a commit that referenced this pull request Dec 8, 2023