-
Notifications
You must be signed in to change notification settings - Fork 421
Add a method to avoid re-persisting monitors during startup #3996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a method to avoid re-persisting monitors during startup #3996
Conversation
👋 Thanks for assigning @tnull as a reviewer! |
f968fa1
to
e45bfed
Compare
Yeah, I'd also lean towards not backporting this, also as it changes API (even though just extending). |
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3996 +/- ##
==========================================
- Coverage 88.93% 88.76% -0.17%
==========================================
Files 174 176 +2
Lines 124575 129396 +4821
Branches 124575 129396 +4821
==========================================
+ Hits 110790 114858 +4068
- Misses 11287 11929 +642
- Partials 2498 2609 +111
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
✅ Added second reviewer: @joostjager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to squash!
/// Using this over [`chain::Watch::watch_channel`] avoids re-persisting a [`ChannelMonitor`] | ||
/// that hasn't changed, slowing down startup. | ||
/// | ||
/// Note that this method *can* be used if additional blocks were replayed against the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understanding the assumptions and restrictions of this API is still relatively hard. In particular, how can users be sure which of the APIs on ChannelMonitor
are safe to use and which ones change(d) its state in some way that will not be replayed again on a restart.
?
And, say users currently just use safe APIs or don't access ChannelMonitor
, start using load_existing_monitor
, and then some time down the road decide they need to mess with ChannelMonitor
s for some reason. Do we expect them to remember the fineprint of this API?
IMO, it would be still very much preferable to end up with a single API method for loading monitors that does the optimal thing under the hood, depending on whether the monitor state is 'dirty' or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I agree in principle, but we also can't perfectly know - someone can in theory apply a ChannelMonitorUpdate
that won't be replayed on startup. But, ultimately, I think this will work in 99.99% of cases - a user deciding to do a Thing during their startup routing and then deciding not to it again if they hit a crash and restart seems kinda wild, unless they have a bug and don't persist but do delete the instructions to do the Thing.
Im open to ideas for how to better phrase this, I added a quick "basically this should never happen" when squasing tho:
$ git diff-tree -U5 d79e08f86 cfca2dc97
diff --git a/lightning/src/chain/chainmonitor.rs b/lightning/src/chain/chainmonitor.rs
index 951aad7f78..58a2dbcdb3 100644
--- a/lightning/src/chain/chainmonitor.rs
+++ b/lightning/src/chain/chainmonitor.rs
@@ -833,11 +833,11 @@ where
///
/// Note that this method *can* be used if additional blocks were replayed against the
/// [`ChannelMonitor`] or if a [`ChannelMonitorUpdate`] loaded from disk was replayed such that
/// it will replayed on startup, and in general can only *not* be used if you directly accessed
/// the [`ChannelMonitor`] and changed its state in some way that will not be replayed again on
- /// a restart.
+ /// a restart. Such direct access should generally never occur for most LDK-based nodes.
///
/// For [`ChannelMonitor`]s which were last serialized by an LDK version prior to 0.1 this will
/// fall back to calling [`chain::Watch::watch_channel`] and persisting the [`ChannelMonitor`].
/// See the release notes for LDK 0.1 for more information on this requirement.
///
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With ldk-node more and more being the predominant way to run this code, we can still ensure that it is safe in most cases.
d79e08f
to
cfca2dc
Compare
/// [`ChannelMonitor`]s which do not need to be persisted (i.e. were last written by LDK 0.1 or | ||
/// later) will be loaded without persistence and this method will return | ||
/// [`ChannelMonitorUpdateStatus::Completed`]. | ||
pub fn load_existing_monitor( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure: for ldk-node, the idea is that it will call into this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I anticipate nearly every implementation will call this instead.
(32, pending_funding, optional_vec), | ||
(34, alternative_funding_confirmed, option), | ||
}); | ||
let written_by_0_1_or_later = payment_preimages_with_info.is_some(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add comment why this determines the version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was pretty self-explanitory given the code, but, sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't to me. payment_preimage_with_info
was also added in 0.1 I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, answered in diff. Ty.
Prior to LDK 0.1, in rare cases we could replay payment claims to `ChannelMonitor`s on startup, which we then expected to be persisted prior to normal node operation. This required re-persisting `ChannelMonitor`s after deserializing the `ChannelManager`, delaying startup in some cases substantially. In 0.1 we fixed this, moving claim replays to the background to run after the `ChannelManager` starts operating (and only updating/persisting changes to the `ChannelMonitor`s which need it). However, we didn't actually enable this meaningfully in our API - nearly all users use our `ChainMonitor` and the only way to get a chanel into `ChainMonitor` is through the normal flow which expects to persist. Here we add a simple method to load `ChannelMonitor`s into the `ChainMonitor` without persisting them.
cfca2dc
to
d377958
Compare
Added a comment: $ git diff-tree -U1 cfca2dc97 d37795800c
diff --git a/lightning/src/chain/channelmonitor.rs b/lightning/src/chain/channelmonitor.rs
index e22b04570c..c0941480af 100644
--- a/lightning/src/chain/channelmonitor.rs
+++ b/lightning/src/chain/channelmonitor.rs
@@ -6093,2 +6093,4 @@ impl<'a, 'b, ES: EntropySource, SP: SignerProvider> ReadableArgs<(&'a ES, &'b SP
});
+ // Note that `payment_preimages_with_info` was added (and is always written) in LDK 0.1, so
+ // we can use it to determine if this monitor was last written by LDK 0.1 or later.
let written_by_0_1_or_later = payment_preimages_with_info.is_some(); |
} | ||
|
||
// At every point where we get a monitor update, we should be able to send a useful monitor | ||
// to a watchtower and disk... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems out of place? Or maybe I don't understand the purpose of this fn. What I see is serialize the monitor, deserialize it and then load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.1; | ||
// Note that a ChannelMonitor might not round-trip exactly here as we have tests that were | ||
// serialized prior to LDK 0.1 and re-serializing them will flip the "written after LDK | ||
// 0.1" flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the consequence of this, or the intention of this message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prior to LDK 0.1, in rare cases we could replay payment claims to
ChannelMonitor
s on startup, which we then expected to be persisted prior to normal node operation. This required re-persistingChannelMonitor
s after deserializing theChannelManager
, delaying startup in some cases substantially.In 0.1 we fixed this, moving claim replays to the background to run after the
ChannelManager
starts operating (and only updating/persisting changes to theChannelMonitor
s which need it). However, we didn't actually enable this meaningfully in our API - nearly all users use ourChainMonitor
and the only way to get a chanel intoChainMonitor
is through the normal flow which expects to persist.Here we add a simple method to load
ChannelMonitor
s into theChainMonitor
without persisting them.I'm kinda on the fence about backporting this. On the one hand its a pretty nontrivial speedup for users who switch to the new interface. On the other hand, it is a "feature" and its kinda weird to backport a feature.