Fix (and test) threaded payment retries #2009

Merged

TheBlueMatt
Collaborator

The new in-ChannelManager retry logic performs retries in two
separate steps, under two separate locks: first it calculates
the amount that needs to be retried, then it actually sends it.
Because the first step doesn't update the amount, a second thread
may come along, calculate the same amount, and end up sending a
duplicate retry.

Because we generally shouldn't ever be processing retries at the
same time, the fix is trivial - simply take a lock at the top of
the retry loop and hold it until we're done.

This resolves (I believe) all the pending followups from #1916.
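The race and the fix described above can be sketched in isolation. This is a minimal illustration, not LDK's code: `Retrier`, `remaining_msat`, `sent_msat`, `retry_pass`, and `run_concurrent_retries` are hypothetical names, and only the idea of a `retry_lock` held for the whole pass is taken from the PR.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the retry state; these are not LDK's real types.
struct Retrier {
    /// Amount that still needs to be retried.
    remaining_msat: Mutex<u64>,
    /// Amounts that were actually sent.
    sent_msat: Mutex<Vec<u64>>,
    /// Held for a whole retry pass so that two passes can never interleave.
    retry_lock: Mutex<()>,
}

impl Retrier {
    fn new(remaining_msat: u64) -> Self {
        Retrier {
            remaining_msat: Mutex::new(remaining_msat),
            sent_msat: Mutex::new(Vec::new()),
            retry_lock: Mutex::new(()),
        }
    }

    fn retry_pass(&self) {
        // The fix: serialize the whole pass. Without this guard, two threads
        // could both finish step 1 before either of them reaches step 2.
        let _single_thread = self.retry_lock.lock().unwrap();

        // Step 1: under one lock, work out how much needs retrying.
        // Note that this step does not update the amount.
        let to_retry = *self.remaining_msat.lock().unwrap();
        if to_retry == 0 {
            return;
        }

        // Step 2: under separate locks, send it and only then record it.
        self.sent_msat.lock().unwrap().push(to_retry);
        *self.remaining_msat.lock().unwrap() -= to_retry;
    }
}

/// Runs `threads` concurrent retry passes and returns what was sent.
fn run_concurrent_retries(threads: usize) -> Vec<u64> {
    let retrier = Arc::new(Retrier::new(1_000));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let retrier = Arc::clone(&retrier);
            thread::spawn(move || retrier.retry_pass())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let sent = retrier.sent_msat.lock().unwrap().clone();
    sent
}
```

With the guard in place, calling `run_concurrent_retries(8)` should record a single send of the full amount. Removing the `_single_thread` line makes duplicate sends possible, though not guaranteed on any given run.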

TheBlueMatt added this to the 0.0.114 milestone Feb 3, 2023

@valentinewallace
Contributor

New test is failing

@TheBlueMatt
Collaborator Author

Oops, the new test was a bit racy itself. Should be good now.

@codecov-commenter

codecov-commenter commented Feb 7, 2023

Codecov Report

Base: 87.25% // Head: 87.87% // Increases project coverage by +0.62% 🎉

Coverage data is based on head (0a5a906) compared to base (9f10203).
Patch coverage: 90.30% of modified lines in pull request are covered.

❗ Current head 0a5a906 differs from pull request most recent head 77a0f77. Consider uploading reports for the commit 77a0f77 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2009      +/-   ##
==========================================
+ Coverage   87.25%   87.87%   +0.62%     
==========================================
  Files         100      101       +1     
  Lines       44110    47662    +3552     
  Branches    44110    47662    +3552     
==========================================
+ Hits        38488    41885    +3397     
- Misses       5622     5777     +155     
Impacted Files                                 Coverage Δ
lightning/src/ln/functional_test_utils.rs      91.25% <0.00%> (+2.72%) ⬆️
lightning/src/sync/fairrwlock.rs               75.00% <0.00%> (ø)
lightning/src/sync/mod.rs                      50.00% <50.00%> (ø)
lightning/src/sync/debug_sync.rs               80.00% <92.85%> (-0.35%) ⬇️
lightning/src/ln/payment_tests.rs              96.70% <93.10%> (+0.18%) ⬆️
lightning/src/ln/channelmanager.rs             85.84% <100.00%> (-0.01%) ⬇️
lightning/src/ln/outbound_payment.rs           85.14% <100.00%> (+4.93%) ⬆️
lightning/src/sync/nostd_sync.rs               37.50% <100.00%> (+2.20%) ⬆️
lightning/src/sync/test_lockorder_checks.rs    100.00% <100.00%> (ø)
lightning/src/util/test_utils.rs               72.94% <100.00%> (+7.99%) ⬆️
... and 13 more

@TheBlueMatt
Collaborator Author

Grr, hit `thread 'ln::payment_tests::test_threaded_payment_retries' panicked at 'assertion failed: peer.try_lock().is_ok()', lightning/src/ln/channelmanager.rs:3732:17`, which means I have to update the lockorder tests again...

@TheBlueMatt
Collaborator Author

Done, hopefully it passes now. I had to add two more commits up front, though.

@TheBlueMatt
Collaborator Author

All CI failures were due to crates.io being down; I kicked them.

pub(crate) enum LockHeldState {
	HeldByThread,
	NotHeldByThread,
	Unsupported,
Contributor

I think a comment explaining when a lock held state cannot be determined would be helpful

Collaborator Author

Even better, I cfg-flagged it so it's not even there for test builds.

Contributor

alecchendev left a comment

First pass, some nits and questions :)

Comment on lines -116 to +120
-#[cfg(feature = "std")] // If we put this on the `if`, we get "attributes are not yet allowed on `if` expressions" on 1.41.1
 impl<'a> Drop for TestRouter<'a> {
 	fn drop(&mut self) {
-		if std::thread::panicking() {
-			return;
+		#[cfg(feature = "std")] {
+			if std::thread::panicking() {
Contributor

How come it's okay to go against this comment?

Contributor

I added that comment originally to get CI to pass, but this way also works and is better so we can get test coverage on no-std. Jeff pointed out the fix here: #1916 (comment)

Contributor

Ohh I didn't even realize it was using an extra scope, cool 👍
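For readers who have not seen the extra-scope trick discussed in this thread, here is a minimal standalone sketch. `Checked` and `pending` are hypothetical stand-ins for `TestRouter` and its state; the `std` feature name is taken from the diff. In a crate that does not define that feature, the whole block is compiled out (newer toolchains may warn about the unknown feature name).

```rust
/// Hypothetical type standing in for `TestRouter`.
struct Checked {
    pending: Vec<u32>,
}

impl Drop for Checked {
    fn drop(&mut self) {
        // The attribute sits on the block statement rather than on the `if`,
        // so the whole scope disappears when the `std` feature is off, and
        // the `Drop` impl itself is still compiled in no-std builds.
        #[cfg(feature = "std")]
        {
            if std::thread::panicking() {
                return;
            }
        }
        // Still checked in every configuration.
        assert!(self.pending.is_empty());
    }
}
```

Dropping a `Checked` with an empty `pending` passes the assertion in either configuration. The early return only matters when a drop happens during unwinding, where a second panic would abort the process.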

@@ -248,6 +262,13 @@ impl<T> Mutex<T> {
}
}

impl <T> LockTestExt for Mutex<T> {
Contributor

alecchendev Feb 10, 2023

More of a general question, but how come rust-lightning has its own synchronization primitives like Mutex and RwLock? From briefly looking at the fields I'm guessing it's to help with development/testing purposes, so I was also wondering if that has any meaningful impact on performance?

Collaborator Author

Yea, we don't actually use our own custom locks, we always stub out to the std ones, except in two cases:

a) in tests, we wrap them in all kinds of debug testing to ensure we don't have lockorder violations,
b) the FairRwLock provides fairness guarantees that the std RwLock does not (though it's a relatively thin wrapper around RwLock).
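To give a feel for what a thin fairness wrapper around the std `RwLock` can look like, here is a sketch of one possible approach. It is an assumption about the general technique, not a copy of the crate's `FairRwLock`: the type name `FairishRwLock` and the field `waiting_writers` are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{LockResult, RwLock, RwLockReadGuard, RwLockWriteGuard};

/// Illustrative wrapper that stops a stream of readers from starving writers.
pub struct FairishRwLock<T> {
    lock: RwLock<T>,
    /// Number of threads currently waiting to take the write lock.
    waiting_writers: AtomicUsize,
}

impl<T> FairishRwLock<T> {
    pub fn new(value: T) -> Self {
        FairishRwLock { lock: RwLock::new(value), waiting_writers: AtomicUsize::new(0) }
    }

    pub fn write(&self) -> LockResult<RwLockWriteGuard<'_, T>> {
        // Announce that a writer is waiting before blocking on the lock.
        self.waiting_writers.fetch_add(1, Ordering::Relaxed);
        let res = self.lock.write();
        self.waiting_writers.fetch_sub(1, Ordering::Relaxed);
        res
    }

    pub fn read(&self) -> LockResult<RwLockReadGuard<'_, T>> {
        if self.waiting_writers.load(Ordering::Relaxed) != 0 {
            // A writer is waiting: queue up behind it by briefly taking the
            // write lock ourselves, then release it and read as usual.
            let _queue_behind_writers = self.lock.write();
        }
        self.lock.read()
    }
}
```

The happy path for readers costs only one relaxed atomic load on top of the std lock, which fits the description of a thin wrapper.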

Contributor

Just took a better look at the locks and that makes more sense now 👍

@@ -478,6 +480,7 @@ impl OutboundPayments {
FH: Fn() -> Vec<ChannelDetails>,
L::Target: Logger,
{
let _single_thread = self.retry_lock.lock().unwrap();
Contributor

Should we return early if we fail to acquire the lock?

Collaborator Author

Eh? I mean, we could, but I don't want to bother adding more code than necessary (and we probably always want to at least run once, so we'd have to do the whole rigmarole that peer_handler does), and we really shouldn't have two threads calling this, at least as long as we only generate one PendingHTLCsForwardable event.
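The two options weighed in this thread can be put side by side in a small sketch. Both function names are hypothetical and the closure stands in for the retry loop. The early-return variant is the alternative the reviewer asked about, not what the PR does.

```rust
use std::sync::Mutex;

/// What the PR does: block until the previous pass finishes, then run.
fn retry_blocking(retry_lock: &Mutex<()>, work: impl FnOnce()) {
    let _single_thread = retry_lock.lock().unwrap();
    work();
}

/// The alternative: skip the pass if another thread is already retrying.
/// Returns whether `work` ran. Anything that became retryable after the
/// other thread's last check could be missed, which is why the caller
/// would need extra logic to guarantee at least one more pass.
fn retry_or_skip(retry_lock: &Mutex<()>, work: impl FnOnce()) -> bool {
    match retry_lock.try_lock() {
        Ok(_guard) => {
            work();
            true
        }
        // Covers both "would block" and a poisoned lock in this sketch.
        Err(_) => false,
    }
}
```

The blocking form is one line in the caller and always runs the work, at the cost of a second thread waiting on the first.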

@TheBlueMatt
Collaborator Author

Rebased and addressed comments.

Contributor

valentinewallace left a comment

LGTM after squash

@TheBlueMatt
Collaborator Author

Squashed and removed the spurious newline.

arik-so previously approved these changes Feb 16, 2023

In anticipation of the next commit(s) adding threaded tests, we
need to ensure our lockorder checks work fine with multiple
threads. Sadly, we currently have checks of the form
`assert!(mutex.try_lock().is_ok())` to assert that a given mutex is
not locked by the caller of a function, and these fail spuriously
when a different thread happens to hold the mutex.

The fix is rather simple given we already track mutexes locked by a
thread in our `debug_sync` logic - simply replace the check with a
new extension trait which (for test builds) checks the locked state
by only looking at what was locked by the current thread.
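The per-thread check described in this commit message can be sketched as follows. This is a simplified illustration and not the crate's `debug_sync` implementation: `TrackedMutex`, `TrackedGuard`, and `HELD_BY_THIS_THREAD` are hypothetical names, the enum omits the `Unsupported` variant, and locks are identified by address, so the mutex must not move while held (the usage below keeps it in an `Arc`).

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::sync::{Mutex, MutexGuard};

thread_local! {
    /// Addresses of the `TrackedMutex`es currently locked by this thread.
    static HELD_BY_THIS_THREAD: RefCell<HashSet<usize>> = RefCell::new(HashSet::new());
}

#[derive(Debug, PartialEq)]
pub enum LockHeldState {
    HeldByThread,
    NotHeldByThread,
}

pub struct TrackedMutex<T> {
    inner: Mutex<T>,
}

/// Guard that forgets the lock in the thread-local set when dropped.
pub struct TrackedGuard<'a, T> {
    key: usize,
    _guard: MutexGuard<'a, T>,
}

impl<T> TrackedMutex<T> {
    pub fn new(value: T) -> Self {
        TrackedMutex { inner: Mutex::new(value) }
    }

    fn key(&self) -> usize {
        self as *const Self as usize
    }

    pub fn lock(&self) -> TrackedGuard<'_, T> {
        let guard = self.inner.lock().unwrap();
        let key = self.key();
        HELD_BY_THIS_THREAD.with(|held| held.borrow_mut().insert(key));
        TrackedGuard { key, _guard: guard }
    }

    /// Unlike `try_lock().is_ok()`, this ignores locks held by other threads.
    pub fn held_by_thread(&self) -> LockHeldState {
        let key = self.key();
        if HELD_BY_THIS_THREAD.with(|held| held.borrow().contains(&key)) {
            LockHeldState::HeldByThread
        } else {
            LockHeldState::NotHeldByThread
        }
    }
}

impl<'a, T> Drop for TrackedGuard<'a, T> {
    fn drop(&mut self) {
        // Runs before the inner guard is released.
        HELD_BY_THIS_THREAD.with(|held| held.borrow_mut().remove(&self.key));
    }
}
```

An assertion such as `assert_eq!(mutex.held_by_thread(), LockHeldState::NotHeldByThread)` then stays true on one thread even while a second thread holds the same mutex, which is the situation where the `try_lock` form panicked.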
The new in-`ChannelManager` retry logic performs retries in two
separate steps, under two separate locks: first it calculates
the amount that needs to be retried, then it actually sends it.
Because the first step doesn't update the amount, a second thread
may come along, calculate the same amount, and end up sending a
duplicate retry.

Because we generally shouldn't ever be processing retries at the
same time, the fix is trivial - simply take a lock at the top of
the retry loop and hold it until we're done.
@TheBlueMatt
Collaborator Author

Removed spurious cfg tag inclusion:

$ git diff-tree -U2 77a0f7746 d98632973
diff --git a/lightning/src/sync/mod.rs b/lightning/src/sync/mod.rs
index bbf3998f7..50ef40e29 100644
--- a/lightning/src/sync/mod.rs
+++ b/lightning/src/sync/mod.rs
@@ -20,7 +20,4 @@ pub use debug_sync::*;
 mod test_lockorder_checks;
 
-#[cfg(all(any(feature = "_bench_unstable", not(test)), feature = "std"))]
-
-
 #[cfg(all(feature = "std", any(feature = "_bench_unstable", not(test))))]
 pub(crate) mod fairrwlock;

TheBlueMatt merged commit d0b8f45 into lightningdevkit:main Feb 16, 2023