Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split prefunded Channel into Inbound/Outbound channels #2077

Merged
merged 25 commits into from Jun 16, 2023

Conversation

dunxen
Copy link
Contributor

@dunxen dunxen commented Mar 6, 2023

Currently, funded and unfunded channels are represented by a single Channel struct. This ends up conflating certain state and makes it harder to reason about / less safe to call appropriate methods on Channel to advance state.

This aims to make prefunded vs. funded states more type-safe and logically separated. Prefunded channels are further split into InboundChannels and OutboundChannels. This makes it a compile-time error to call methods only meant for InboundChannels on OutboundChannels and vice versa. This Inbound/Outbound separation for channels using V1 channel establishment is still very useful when it comes to V2 channel establishment (dual-funded channels): it enhances type safety for what actions the V2 channel initiator (Outbound) and non-iniator (Inbound) can take on these objects once they support V2 channel establishment in future work.

Partially addresses #1621

@TheBlueMatt
Copy link
Collaborator

Does it make sense to drop the ChannelKind enum and instead just actually have three separate maps for channels in the ChannelManager?

@TheBlueMatt
Copy link
Collaborator

Without having actually reviewed the mega-diff, and aside from that, this LGTM. It would be really nice to split this up into at least like 2 or three commits - maybe first create the context object in a regular channel, then add the trait, then split or something like that?

@TheBlueMatt
Copy link
Collaborator

We'll want to coordinate on landing this, I think, given its going to conflict with the world. Get some concept ACKs, then all get online at once for an hour or two and get it landed.

@dunxen
Copy link
Contributor Author

dunxen commented Mar 7, 2023

Does it make sense to drop the ChannelKind enum and instead just actually have three separate maps for channels in the ChannelManager?

So I started this way actually, and it did seem simpler up until it concerned me a little when we're at the stage where we have the channel promoted when funding is generated/signed, and it seems a bit disjoint handling a second map while you're busy with an OccupiedEntry as there are a quite a few such cases.

I think I will split this out like you suggested (thanks!) and we can see how much more "fluff" the enum causes with matching.

@dunxen
Copy link
Contributor Author

dunxen commented Mar 8, 2023

Close to having this split, but with the way I'm doing things with the ChannelContext currently as a field with a getter, I'm really trying to avoid RefCell which seems to want to crop its head up due to constraints with some of the common method signatures. Will try some different patterns, otherwise will push the split (but still kinda broken) changes for ideas.

@TheBlueMatt
Copy link
Collaborator

Hmm, not sure why interior mutability is cropping up? When we're converting from one channel tor to another we should have full ownership of the object, no?

@dunxen
Copy link
Contributor Author

dunxen commented Mar 8, 2023

Hmm, not sure why interior mutability is cropping up? When we're converting from one channel tor to another we should have full ownership of the object, no?

Yeah, so that's fine, it's happening with the methods (ones where there's no converting) and we're trying to maybe read from context via the getter (self.get_context()) in the trait, but maybe we take in &mut self in the signature. Cases like that.

I'll push up some stuff in the morning. Better than my lame explanation :P

@TheBlueMatt
Copy link
Collaborator

Hmm, yea, not 100% sure why still, would love to look at code.

@TheBlueMatt
Copy link
Collaborator

Also, can you write up a quick explainer why this is still the right design in dual-funding? I guess we should call the structs InitiatorChannel and InitiateeChannel rather than inbound/outbound? Is there not sufficient functionality overlap between the initiator/initiatee sides that we should just have the same PreFundingChannel?

@dunxen
Copy link
Contributor Author

dunxen commented Mar 9, 2023

Also, can you write up a quick explainer why this is still the right design in dual-funding? I guess we should call the structs InitiatorChannel and InitiateeChannel rather than inbound/outbound? Is there not sufficient functionality overlap between the initiator/initiatee sides that we should just have the same PreFundingChannel?

I mean we totally could have just one struct for prefunded channels, as tracking who the initiator is will be kept in the context of the InteractiveTxConstructor struct. However, we still end up with the situation for methods involving accept/open that do not overlap and would be nice to have some type-safety around those two in the form of different structs. For V1 channels there are even more non-overlapping methods. I suppose we don't need to worry too much about improving the status quo for V1 channels in the codebase, then I'm happy to just have one PreFundingChannel.

@dunxen
Copy link
Contributor Author

dunxen commented Mar 10, 2023

Lol who knew channel refactoring would be painfully trickier than thought?

With the multi-chan-to-id-map approach there are a bunch of assumptions on the channel_by_id map to be made. It seems quite easy for an uncovered bug to slip through with a refactor.

With the enum-and-single-map approach, it gets pretty ugly, pretty fast.

So taking into account comments above, I'm thinking that we don't mingle V1 and V2 channels in the prefunded state at all. We let V1 channels continue as per usual from start to finish with the Channel struct and then we only introduce a new PendingChannel (or prefunded or whatever) struct for V2 channels which will hold onto the InteractiveTxConstructor object which advances through all its states. Then all the interactive message handling is separate from the V1 channel stuff. (I still feel like splitting that into Initiator/NonInitiator for type-safety around the first bits of the handshake, but just have the pending struct is already way better). We'll keep the trait for sharing the standard behaviour for pending channels.

I feel like there will be less surface area for bugs this way since we'd be touching a lot less. And the case of review and merge conflicts being crazy.

@TheBlueMatt
Copy link
Collaborator

I mean we totally could have just one struct for prefunded channels, as tracking who the initiator is will be kept in the context of the InteractiveTxConstructor struct. However, we still end up with the situation for methods involving accept/open that do not overlap and would be nice to have some type-safety around those two in the form of different structs.

Ah, okay, yea, I guess that makes sense, I just wanted to check that those methods aren't becoming more symmetric and the design would only make sense for legacy channels. If the design only makes sense for them and not v2 channels I'm okay with the awkwardness for legacy, ultimately up to you, as long as it makes some sense for v2 channels.

With the multi-chan-to-id-map approach there are a bunch of assumptions on the channel_by_id map to be made. It seems quite easy for an uncovered bug to slip through with a refactor.

Hmm, what specifically? We have a bunch of assumptions that channels are there, but that's almost exclusively post-funding. Pre-funding we don't care all that much about a channel - we can't have any HTLCs in it, so its pretty safe to just lose it.

@jkczyz jkczyz self-requested a review March 15, 2023 19:10
@dunxen
Copy link
Contributor Author

dunxen commented Mar 20, 2023

Hmm, what specifically? We have a bunch of assumptions that channels are there, but that's almost exclusively post-funding. Pre-funding we don't care all that much about a channel - we can't have any HTLCs in it, so its pretty safe to just lose it.

Yeah you're right. I've finally settled on that route (ignore the latest WIP push which was the enum route) and it's also turning out to be muuuch simpler to review.

@TheBlueMatt
Copy link
Collaborator

The channelmanager.rs state as of this PR appears to still have a single map for a Channel, is that intentional?

@dunxen dunxen force-pushed the 2023-02-splitchannelstate branch 2 times, most recently from 7c2f306 to e1d27b7 Compare March 26, 2023 10:44
@dunxen
Copy link
Contributor Author

dunxen commented Mar 26, 2023

The channelmanager.rs state as of this PR appears to still have a single map for a Channel, is that intentional?

Sorry, was outdated!

@dunxen dunxen marked this pull request as ready for review March 26, 2023 10:47
@codecov-commenter
Copy link

codecov-commenter commented Mar 26, 2023

Codecov Report

Patch coverage: 92.47% and project coverage change: -0.01 ⚠️

Comparison is base (74a9ed9) 90.38% compared to head (5d17790) 90.37%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2077      +/-   ##
==========================================
- Coverage   90.38%   90.37%   -0.01%     
==========================================
  Files         106      106              
  Lines       54219    54350     +131     
  Branches    54219    54350     +131     
==========================================
+ Hits        49005    49120     +115     
- Misses       5214     5230      +16     
Impacted Files Coverage Δ
lightning/src/ln/channel.rs 89.79% <ø> (+0.10%) ⬆️
lightning/src/ln/functional_test_utils.rs 87.29% <0.00%> (-0.87%) ⬇️
lightning/src/ln/channelmanager.rs 86.73% <91.92%> (-0.02%) ⬇️
lightning/src/ln/chanmon_update_fail_tests.rs 98.63% <100.00%> (ø)
lightning/src/ln/functional_tests.rs 98.22% <100.00%> (ø)
lightning/src/ln/onion_route_tests.rs 98.26% <100.00%> (ø)
lightning/src/ln/payment_tests.rs 97.61% <100.00%> (+<0.01%) ⬆️
lightning/src/ln/shutdown_tests.rs 98.17% <100.00%> (ø)

... and 4 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't super carefully review all the channel motion, but generally LGTM.

lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
@dunxen
Copy link
Contributor Author

dunxen commented Mar 28, 2023

There is probably still so

Didn't super carefully review all the channel motion, but generally LGTM.

Thanks for another general review pass. The last commit is admittedly still not great for review. Thinking about splitting it up more. There's also some duplication I'm not super happy with now.

@dunxen
Copy link
Contributor Author

dunxen commented Apr 4, 2023

Spent the last while deduplicating some things in the final commit and resolving some nasty borrow conflicts and it's taken way longer than anticipated, but it has made me realise that we probably don't need to worry about generalising things too much.

When we encounter issues with Inbound/Outbound channels those are only ever "Close", but all we ever really need to do is discard and clean up maps. I've been able to simplify some things because of this and will get it up when all the tests are happy. There are a lot of moving parts and I regret touching this lol.

@dunxen dunxen force-pushed the 2023-02-splitchannelstate branch from 5d17790 to 773f785 Compare June 14, 2023 14:35
@dunxen
Copy link
Contributor Author

dunxen commented Jun 14, 2023

Apologies if I missed some feedback. I had to make some changes specifically around when we promote an OutboundV1Channel to a Channel. @wpaulino pointed out correctly that we should be doing that upon successful handling of funding_signed.

Don’t hesitate to point out something important and obvious I’ve missed after the latest push :)

lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
}
/// Returns a HTLCStats about inbound pending htlcs
fn get_inbound_pending_htlc_stats(&self, outbound_feerate_update: Option<u32>) -> HTLCStats {
let context = self;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a couple occurrences of this that aren't necessary. Should reduce the changeset to use self instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I initially did this as a pre-diff to reduce interleaving on a large diff, but ended up breaking it up anyway so can remove.

/// if-we-removed-it-already-but-haven't-fully-resolved-they-can-still-send-an-inbound-HTLC
/// corner case properly.
pub fn get_available_balances(&self) -> AvailableBalances {
let context = &self;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly for this variation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will address these in a follow-up I think. I believe they still make the diffs that follow more readable without interleaving.

lightning/src/ln/channelmanager.rs Show resolved Hide resolved
Comment on lines +1143 to 1145
pub fn is_funding_initiated(&self) -> bool {
self.channel_state >= ChannelState::FundingSent as u32
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, OutboundV1Channel and InboundV1Channel can neither be in this state. Thus, it's fine that we ignore the corresponding maps when writing ChannelManager. As of now Channel may or not be in this state, though. Going forward, what should be represented as a ChannelState variant vs defined by a specific channel struct type? I think preferably we don't have redundant state to avoid inconsistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would make sense just to have in Channel. I'll leave it as is for now and address it in a further "remove redundant state" PR

}
}

// A not-yet-funded outbound (from holder) channel using V1 channel establishment.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here and on the inbound channel, should be a doc comment, not a comment-comment.

@@ -5536,7 +5736,7 @@ where
), chan),
// Note that announcement_signatures fails if the channel cannot be announced,
// so get_channel_update_for_broadcast will never fail by the time we get here.
update_msg: Some(self.get_channel_update_for_broadcast(chan.get()).unwrap()),
update_msg: Some(self.get_channel_update_for_broadcast(&chan.get()).unwrap()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: probably not even worth fixing but this looks unnecessary, the get should return a reference anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. From when I attempted making it a context 🤦‍♂️

lightning/src/ln/channelmanager.rs Outdated Show resolved Hide resolved
msg,
});
match peer_state.channel_by_id.entry(chan.channel_id()) {
match peer_state.outbound_v1_channel_by_id.entry(chan.context.channel_id()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a huge fan of putting this back here, would kinda rather put it in channel_by_id here - indeed we don't have a monitor for this channel yet (we'll get one when we receive a funding_signed) but it does have a "real" channel_id at this point, and we have generated a funding tx which we need to pass a FundingAbandoned event for if we fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just weird to have funding_signed on Channel when it can only happen on OutboundV1Channel. We'll still get a DiscardFunding event here, so it's just the channel_id that's no longer the temporary one, but that can be documented.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke some offline with @dunxen and I wonder if we shouldn't have three stages here - the pre-funding stage with a temporary_channel_id key as inbound- and outbound-channels, the post-funding stage where we have a channel_id but haven't gotten to the channel_ready exchange as both inbound- and outbound-channels, and the operation stage where a channel is ready.

That would let us enforce the funding_signed, but also more importantly also remove access to the HTLC updates and commitment signing until the channel is actually ready for them.

I don't think this split should go in this PR, though, we're delayed enough as it is, but there's lots more splitting to do in the future.

Copy link
Contributor Author

@dunxen dunxen Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking forward, that further split makes sense and we might introduce enums for the maps if it makes that easier but not super necessary.

I think I will revert to having OutboundV1Channel::funding_signed on Channel::funding_signed so that at least we have temp_chan_id and chan_id consistency in the maps. Adding an extra state in between will clear be a better solution. Luckily there's room to do that in a further PR.

I know it's not ideal in with the splits we have now, but would we be okay with moving funding_signed back to Channel, @wpaulino? :)

I have not reverted that locally yet and have applied the other feedback. So would just like to hear some thoughts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM!

@dunxen dunxen force-pushed the 2023-02-splitchannelstate branch from 773f785 to d957f36 Compare June 15, 2023 20:45
Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to track all of the follow-up items in an issue so we don't forget and can better determine what should go in this release as well.

@@ -1695,6 +1715,21 @@ macro_rules! break_chan_entry {
}
}

macro_rules! try_v1_outbound_chan_entry {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could just be another arm of the existing try_chan_entry macro, no?

@@ -794,6 +794,17 @@ macro_rules! get_outbound_v1_channel_ref {
}
}

#[cfg(test)]
macro_rules! get_inbound_v1_channel_ref {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal, but this was added two commits ago, removed in the previous, and added back here.

@@ -6265,7 +6265,7 @@ impl<Signer: WriteableEcdsaChannelSigner> InboundV1Channel<Signer> {
Ok(chan)
}

pub fn inbound_is_awaiting_accept(&self) -> bool {
pub fn is_awaiting_accept(&self) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also rename get_funding_outbound_created.

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okayyy, this LGTM, at least enough to merge it. Sadly there's a handful of places in channelmanager where we are only looking at channel_by_id and need to update to look at the other maps. Can you open a followups issue so that we can track these and @wpaulino's comments and make sure we hit at least the bugs before 0.0.116?

  • close_channel_internal should fallback to force-closure if the channel is in the inbound/outbound sets, though I'm not sure its super critical,
  • update_partial_channel_config should probably be updated to work on any channel
  • internal_shutdown should FC any pending channels
  • do_chain_event needs to call best_block_updated even on unfunded channels so we time out channels if they aren't funded in 2 weeks
  • peer_connected tries to remove outbound unfunded channels on reconnect but doesn't look at the new outbound map (which, now that I look at it, should be generating a ChannelClosed event, but isn't...ugh we need to consolidate the channel close codepaths)

};
($self: ident, $err: expr, $channel_context: expr, $channel_id: expr, PREFUNDED) => {
match $err {
// We should only ever have `ChannelError::Close` when prefunded channels error.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean maybe that's true, but let's definitely fix this in a followup to handle the warn/ignore cases properly.

/// those explicitly stated to be allowed after shutdown completes, eg some simple getters).
/// Also returns the list of payment_hashes for channels which we can safely fail backwards
/// immediately (others we will have to allow to time out).
pub fn force_shutdown(&mut self, should_broadcast: bool) -> ShutdownResult {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to move this into the normal channel in the future, then the pre-funding one can just have an abandon method that doesn't take should_broadcast and is clearer.

@@ -3614,8 +3620,8 @@ fn test_peer_disconnected_before_funding_broadcasted() {
nodes[0].node.peer_disconnected(&nodes[1].node.get_our_node_id());
nodes[1].node.peer_disconnected(&nodes[0].node.get_our_node_id());

check_closed_event!(nodes[0], 1, ClosureReason::DisconnectedPeer);
check_closed_event!(nodes[1], 1, ClosureReason::DisconnectedPeer);
check_closed_event(&nodes[0], 1, ClosureReason::DisconnectedPeer, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ISTM this should be true, right? We should be doing a DiscardFunding here.

@@ -2166,19 +2257,19 @@ where
}

/// Helper function that issues the channel close events
fn issue_channel_close_events(&self, channel: &Channel<<SP::Target as SignerProvider>::Signer>, closure_reason: ClosureReason) {
fn issue_channel_close_events(&self, context: &ChannelContext<<SP::Target as SignerProvider>::Signer>, closure_reason: ClosureReason) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

God there's so many codepaths when closing channels, this called manually, or via something else or....good opportunity to clean this up a ton by unifying the various channel close codepaths.

@@ -2076,7 +2137,7 @@ where
Ok(temporary_channel_id)
}

fn list_channels_with_filter<Fn: FnMut(&(&[u8; 32], &Channel<<SP::Target as SignerProvider>::Signer>)) -> bool + Copy>(&self, f: Fn) -> Vec<ChannelDetails> {
fn list_funded_channels_with_filter<Fn: FnMut(&(&[u8; 32], &Channel<<SP::Target as SignerProvider>::Signer>)) -> bool + Copy>(&self, f: Fn) -> Vec<ChannelDetails> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now only called once, we should probably try to re-condense with list_channels or just drop this and inline it in its callsite.

@TheBlueMatt TheBlueMatt merged commit ae9e96e into lightningdevkit:main Jun 16, 2023
14 checks passed
@dunxen dunxen mentioned this pull request Jun 16, 2023
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants