Do not broadcast commitment txn on Permanent mon update failure #1106
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1106 +/- ##
==========================================
+ Coverage 90.40% 92.12% +1.71%
==========================================
Files 68 66 -2
Lines 34796 40933 +6137
==========================================
+ Hits 31458 37708 +6250
+ Misses 3338 3225 -113
Continue to review full report at Codecov.
Force-pushed bd79c03 to e6966ce
/// or, e.g. we've moved on to a different watchtower and cannot update with all watchtowers
/// that were previously informed of this channel).
Trying to parse this watchtower case -- in this situation, we're switching watchtowers, but have no way of contacting the old watchtower to just delete our data, therefore the channel needs to close via PermanentFailure?
Yea, but honestly it's confusing; I replaced it with a better example.
/// [`ChannelMonitor::get_latest_holder_commitment_txn`] once you've safely ensured no further
/// off-chain updates to the channel can occur.
Checking -- "ensure no further off-chain updates can occur" == "applied the final monitor update mentioned on L170"? Or?
It means "no ChannelManager still knows about this channel", I reworded it.
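To make that recovery flow concrete, here is a minimal sketch of broadcasting the latest holder state yourself once no ChannelManager still knows about the channel. The closures stand in for ChannelMonitor::get_latest_holder_commitment_txn and BroadcasterInterface::broadcast_transaction so the sketch does not depend on exact LDK signatures, which vary between releases.

// Hedged sketch only: `get_latest_txn` and `broadcast` are placeholders for
// the real monitor/broadcaster calls, passed in by the caller.
fn broadcast_latest_holder_state<Tx>(
    get_latest_txn: impl FnOnce() -> Vec<Tx>,
    broadcast: impl Fn(&Tx),
) {
    // Only do this once you are certain no ChannelManager can still produce
    // off-chain updates for the channel, otherwise the state you broadcast
    // may later be revoked.
    for tx in get_latest_txn() {
        // Holder commitment transaction plus any claimable HTLC transactions.
        broadcast(&tx);
    }
}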
Bit confused how to implement these new requirements in the sample. I guess we now need a proxy layer implementing
I think we, in general, need to think hard about how we handle disk failures in the sample/lightning-persister. If the disk gets unplugged while we're running, we should just shut down and move on, but there's currently no way to surface that error out of the persistence path. I think the end result for the sample needs to be a "really, definitely broadcast the latest state from the channelmonitor for channel X" command, which is marked unsafe. We should probably also have a programmatic way to detect monitors in this condition and not just rely on the logs.
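As a rough starting point for that proxy-layer / shut-down idea, a sketch of wrapping the sample's persister so any disk error is remembered and the node can stop taking off-chain updates and exit. Names and signatures here are invented for illustration; the real lightning-persister and Persist trait take different parameters.

use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical proxy layer around whatever persister the sample uses.
struct ShutdownOnDiskFailure<P> {
    inner: P,
    disk_failed: AtomicBool,
}

impl<P> ShutdownOnDiskFailure<P> {
    /// Called wherever the inner persister would have written a monitor update.
    fn persist_update(&self, do_write: impl FnOnce() -> std::io::Result<()>) -> bool {
        match do_write() {
            Ok(()) => true,
            Err(_) => {
                // Remember that persistence is broken so the node can stop
                // accepting new off-chain updates and shut down cleanly
                // instead of broadcasting anything.
                self.disk_failed.store(true, Ordering::Release);
                false
            }
        }
    }

    fn should_shut_down(&self) -> bool {
        self.disk_failed.load(Ordering::Acquire)
    }
}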
Force-pushed 5f848de to 6ad8583
Rebased on upstream. Need to make sure all the updated content in the failure enum docs is still kept, and we should rename TemporaryFailure -> AsyncPersist or so.
Force-pushed 6ad8583 to f99da12
Force-pushed f99da12 to 2572ac5
Codecov Report
Base: 90.78% // Head: 91.21% // Increases project coverage by +0.43%.
Additional details and impacted files:
@@ Coverage Diff @@
## main #1106 +/- ##
==========================================
+ Coverage 90.78% 91.21% +0.43%
==========================================
Files 86 87 +1
Lines 46631 50794 +4163
Branches 46631 50794 +4163
==========================================
+ Hits 42335 46333 +3998
- Misses 4296 4461 +165
☔ View full report at Codecov.
Force-pushed 2572ac5 to 0fe1f28
Finally getting back to the monitor work - updated this to include a monitor update failure type rework, but no further functional changes.
Force-pushed 0fe1f28 to b3ff257
Force-pushed dc2888a to a9f656a
Oops, somehow I totally lost track of this PR having a pending review, so sorry about that! Rebased on latest git and addressed feedback.
Force-pushed a9f656a to e3347f1
Force-pushed 019aadf to 2ad3987
Force-pushed 2ad3987 to 86a5a28
See doc updates for more info on the edge case this prevents, and there isn't really a strong reason why we would need to broadcast the latest state immediately. Specifically, in the case of HTLC claims (the most important reason to ensure we have state on chain if it cannot be persisted), we will still force-close if there are HTLCs which need claiming and are going to expire. Surprisingly, there were no tests which failed as a result of this change, but a new one has been added.
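For illustration, a hedged sketch of the failure mode being discussed: a disk that starts rejecting writes (e.g. after the Linux-default read-only remount) surfacing as a permanent monitor update failure. The function and error type here are made up, not the actual Persist trait; with this change, reporting such a failure force-closes the channel but does not broadcast the commitment transaction.

use std::io::Write;

// Hypothetical persister fragment mapping a failed write to a "permanent"
// monitor update failure.
fn write_monitor_bytes(path: &std::path::Path, bytes: &[u8]) -> Result<(), &'static str> {
    let res = std::fs::File::create(path).and_then(|mut f| f.write_all(bytes));
    match res {
        Ok(()) => Ok(()),
        // A disk remounted read-only will surface here as an io::Error on
        // every write; the caller would report this as a permanent failure
        // rather than broadcasting a possibly-stale commitment transaction.
        Err(_) => Err("permanent monitor update failure"),
    }
}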
Force-pushed 86a5a28 to 24f8279
Rebased to address conflicts and went ahead and squashed as it's been a while.
Force-pushed 24f8279 to 4825a96
lightning/src/ln/channelmanager.rs
Outdated
@@ -4492,7 +4492,7 @@ impl<Signer: Sign, M: Deref, T: Deref, K: Deref, F: Deref, L: Deref> ChannelMana
 // We do not do a force-close here as that would generate a monitor update for
 // a monitor that we didn't manage to store (and that we don't care about - we
 // don't respond with the funding_signed so the channel can never go on chain).
- let (_monitor_update, failed_htlcs) = chan.force_shutdown(true);
+ let (_monitor_update, failed_htlcs) = chan.force_shutdown(false);
Looks like there is no test for this scenario. All tests pass when reverting to true.
It's unobservable - note that the monitor_update, which is the place the bool is propagated to, is dropped.
Force-pushed 4825a96 to c0bb4f4
lightning/src/util/errors.rs
Outdated
/// An attempt to call watch/update_channel returned an Err (ie you did this!), causing the
/// attempted action to fail.
Could you update the docs since it's no longer from an Err? Would be good to link to the corresponding method docs and maybe phrase the docs to be a little clearer, too.
+1, also a bit weird that it's still considered an APIError variant, but that might require some more brainstorming than a simple rename/doc update.
Yea, did update the docs, but agree - this shouldn't really be an error at all, it should just be Ok. I think there's only one(-ish) place where it Errs - sending payments - so it should be easy to do in a followup.
/// When this is returned, [`ChannelManager`] will force-close the channel but *not* broadcast
/// our current commitment transaction. This avoids a dangerous case where a local disk failure
/// (e.g. the Linux-default remounting of the disk as read-only) causes [`PermanentFailure`]s
/// for all monitor updates. If we were to broadcast our latest commitment transaction and then
/// restart, we could end up reading a previous [`ChannelMonitor`] and [`ChannelManager`],
/// revoking our now-broadcasted state before seeing it confirm and losing all our funds.
IIUC, while the user could choose not to broadcast the latest commitment transaction, which has been revoked, it could still be done by LDK if it contained any pending HTLCs that are expiring soon.
Indeed, somewhat of a separate issue that should be addressed in #1593
Force-pushed c0bb4f4 to 74745cb
LGTM, feel free to squash.
Force-pushed 74745cb to fc18acb
Yes, please squash.
Force-pushed fc18acb to 4ac5804
Addressed one more comment and squashed. Diff since last push:
$ git diff-tree -U1 fc18acb4 4ac58048
diff --git a/lightning/src/util/errors.rs b/lightning/src/util/errors.rs
index 83324eab6..ad6993542 100644
--- a/lightning/src/util/errors.rs
+++ b/lightning/src/util/errors.rs
@@ -48,7 +48,9 @@ pub enum APIError {
},
- /// An attempt to call watch/update_channel returned a
- /// [`ChannelMonitorUpdateStatus::InProgress`] indicating the persistence of a monitor update
- /// is awaiting async resolution. Once it resolves the attempted action should complete
- /// automatically.
+ /// An attempt to call [`chain::Watch::watch_channel`]/[`chain::Watch::update_channel`]
+ /// returned a [`ChannelMonitorUpdateStatus::InProgress`] indicating the persistence of a
+ /// monitor update is awaiting async resolution. Once it resolves the attempted action should
+ /// complete automatically.
///
+ /// [`chain::Watch::watch_channel`]: crate::chain::Watch::watch_channel
+ /// [`chain::Watch::update_channel`]: crate::chain::Watch::update_channel
/// [`ChannelMonitorUpdateStatus::InProgress`]: crate::chain::ChannelMonitorUpdateStatus::InProgress
When a `chain::Watch` `ChannelMonitor` update method is called, the user has three options: (a) persist the monitor update immediately and return success, (b) fail to persist the monitor update immediately and return failure, (c) return a flag indicating the monitor update is in progress and will complete in the future. (c) is rather harmless, and in some deployments should be expected to be the return value for all monitor update calls, but currently requires returning `Err(ChannelMonitorUpdateErr::TemporaryFailure)`, which isn't very descriptive and sounds scarier than it is. Instead, here, we change the return type used to be a single enum (rather than a Result) and rename `TemporaryFailure` to `UpdateInProgress`.
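A rough rendering of that three-way return value as an enum, for readers following along. The variant names mirror this discussion and the diff above (the shipped type is chain::ChannelMonitorUpdateStatus), but treat the exact spelling as version-dependent rather than the definitive API.

// Illustrative only: the three outcomes described above as a plain enum
// instead of a Result.
enum MonitorUpdateOutcome {
    /// (a) The update was persisted synchronously; the operation proceeds.
    Completed,
    /// (c) Persistence is happening asynchronously; the operation completes
    /// automatically once the user signals the update is done. Harmless, and
    /// in some deployments the expected return for every call.
    InProgress,
    /// (b) Persistence failed for good; the channel must be closed, but the
    /// commitment transaction is *not* broadcast automatically (see the doc
    /// updates in this PR).
    PermanentFailure,
}

// A deployment that always persists asynchronously would simply return
// MonitorUpdateOutcome::InProgress from every update call.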
If we receive a monitor event from a forwarded-to channel which contains a preimage for an HTLC, we have to propagate that preimage back to the forwarded-from channel monitor. However, once we have that update, we're running in a relatively unsafe state - we have the preimage in memory, but if we were to crash, the forwarded-to channel monitor will not regenerate the update with the preimage for us. If we haven't managed to write the monitor update to the forwarded-from channel by that point, we've lost the preimage, and, thus, money!
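A minimal sketch of the ordering requirement described in that commit message, with invented names (this is not the actual ChannelManager code path): persist the preimage-bearing update for the forwarded-from monitor before acknowledging the downstream event, so a crash in between cannot lose the preimage.

// Illustrative only; the preimage learned from the downstream channel lives
// only in memory until the upstream monitor update is written.
struct PendingClaim {
    upstream_channel_id: [u8; 32],
    preimage: [u8; 32],
}

fn settle_upstream(
    claim: PendingClaim,
    persist_upstream_update: impl Fn(&PendingClaim) -> std::io::Result<()>,
    ack_downstream_event: impl Fn([u8; 32]),
) -> std::io::Result<()> {
    // 1) Durably record the preimage against the forwarded-from monitor.
    persist_upstream_update(&claim)?;
    // 2) Only then tell the downstream side we're done; if we crash before
    //    this point, the persisted update still lets us claim on restart.
    ack_downstream_event(claim.upstream_channel_id);
    Ok(())
}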
This much more accurately represents the error, indicating that a monitor update is in progress asynchronously and may complete at a later time.
Force-pushed 4ac5804 to 52934e0
Oops, sorry - reordered the missing hunk into the right commit; no change to the overall diff, though.
See doc updates for more info on the edge case this prevents, and there isn't really a strong reason why we would need to broadcast the latest state immediately. Specifically, in the case of HTLC claims (the most important reason to ensure we have state on chain if it cannot be persisted), we will still force-close if there are HTLCs which need claiming and are going to expire.

Surprisingly, there were no tests which failed as a result of this change, but a new one has been added.