Allow async events processing without holding total_consistency_lock #2199

Merged

Contributor

@tnull tnull commented Apr 18, 2023

Fixes #2003.

Unfortunately, the RAII guard types used by RwLock are not Send, which is why they can't be held across await boundaries. To allow asynchronous event processing in multi-threaded environments, we now allow events to be processed without holding the total_consistency_lock. We do so by cloning the events and only draining and persisting the queue after they have been processed successfully.
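For illustration only (this is not the PR's actual code), the clone-then-drain idea can be sketched as follows. `Event` and `EventQueue` are hypothetical stand-ins, not LDK's `Event` or `ChannelManager`, and a plain closure stands in for the event handler. The point is that no lock guard is alive while the handler runs, so the handler is free to touch the queue itself.

```rust
use std::sync::Mutex;

/// Hypothetical stand-in for an event type; not LDK's `Event`.
#[derive(Clone, Debug, PartialEq)]
enum Event {
    PaymentReceived(u64),
    ChannelClosed(u32),
}

/// Hypothetical queue owner; not LDK's `ChannelManager`.
struct EventQueue {
    pending_events: Mutex<Vec<Event>>,
}

impl EventQueue {
    /// Clones the queued events, runs the handler with no lock held, then
    /// drains only the events that were handled. Returns `true` if the
    /// queue is empty afterwards.
    fn process_pending_events<F: FnMut(Event)>(&self, mut handler: F) -> bool {
        // Snapshot under the lock; the guard is dropped at the end of the block.
        let snapshot: Vec<Event> = {
            let guard = self.pending_events.lock().unwrap();
            guard.clone()
        };
        let num_events = snapshot.len();

        // No guard is held here, so the handler may take as long as it needs
        // (in the async case, this is where the `await` would sit).
        for event in snapshot {
            handler(event);
        }

        // Re-acquire the lock and remove only the handled prefix. Events
        // pushed in the meantime stay queued for the next round.
        let mut pending = self.pending_events.lock().unwrap();
        pending.drain(..num_events);
        pending.is_empty()
    }
}
```

The cost of this design is one clone per event, which is the trade-off discussed further down in this thread.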

The first commit reverts a prior commit of #2177, as we now want the behavior of the two process_event methods to diverge, i.e., we want to avoid cloning in the sync case.

I tried to be minimally invasive, as event processing will receive a general overhaul with #2167 and follow-ups, and any more substantial changes would likely only make sense after those have landed.

Contributor Author

tnull commented Apr 18, 2023

Currently fails due to a previously-silent panic in BP tests that, due to the behavior of the tokio runtime, wasn't surfaced and caught before. Looking into that.

codecov-commenter commented Apr 18, 2023

Codecov Report

Patch coverage: 94.11% and project coverage change: +1.04 🎉

Comparison is base (2ebbe6f) 91.34% compared to head (a5358d0) 92.38%.

❗ Current head a5358d0 differs from pull request most recent head f2453b7. Consider uploading reports for the commit f2453b7 to get more accurate results

@@            Coverage Diff             @@
##             main    #2199      +/-   ##
==========================================
+ Coverage   91.34%   92.38%   +1.04%     
==========================================
  Files         102      104       +2     
  Lines       50470    61358   +10888     
  Branches    50470    61358   +10888     
==========================================
+ Hits        46103    56688   +10585     
- Misses       4367     4670     +303     
| Impacted Files | Coverage Δ |
| --- | --- |
| lightning/src/ln/channelmanager.rs | 91.65% <75.00%> (+2.48%) ⬆️ |
| lightning-background-processor/src/lib.rs | 83.51% <100.00%> (+6.40%) ⬆️ |
| lightning-net-tokio/src/lib.rs | 78.41% <100.00%> (ø) |

... and 64 files with indirect coverage changes

@tnull tnull force-pushed the 2023-04-fix-async-event-processing branch 3 times, most recently from 9d6077b to 5467f97 Compare April 18, 2023 14:36
Contributor Author

tnull commented Apr 18, 2023

> Currently fails due to a previously-silent panic in BP tests that, due to the behavior of the tokio runtime, wasn't surfaced and caught before. Looking into that.

Correction: after fixing the CI script, this should now really fail until we fix the bug.

@TheBlueMatt TheBlueMatt added this to the 0.0.115 milestone Apr 18, 2023
Just two trivial compiler warnings that are unrelated to the changes
made here.
Currently the BP `futures` tests rely on `std`. In order to actually
have them run, we should enable `std`, i.e., remove
`--no-default-features`.
@tnull tnull force-pushed the 2023-04-fix-async-event-processing branch from 5467f97 to c9cfd20 Compare April 19, 2023 09:13
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Sorry can you not break out the macro? Not because it's wrong here but because there's a lot more complexity coming in a followup PR in there and we'll just have to add it again.

@tnull tnull force-pushed the 2023-04-fix-async-event-processing branch from c9cfd20 to dd48d55 Compare April 20, 2023 10:43
Contributor Author

tnull commented Apr 20, 2023

> Sorry can you not break out the macro? Not because it's wrong here but because there's a lot more complexity coming in a followup PR in there and we'll just have to add it again.

Alright, dropped the revert commit and now also cloning in the sync case.

@tnull tnull force-pushed the 2023-04-fix-async-event-processing branch from dd48d55 to d7de357 Compare April 20, 2023 10:46
Collaborator

@TheBlueMatt TheBlueMatt left a comment

LGTM.

lightning/src/ln/channelmanager.rs
let mut pending_events = $self.pending_events.lock().unwrap();
pending_events.drain(..num_events);
processed_all_events = pending_events.is_empty();
$self.pending_events_processor.store(false, Ordering::Release);
Collaborator

Should this only happen if !processed_all_events? Not a big deal either way, I think.

Contributor Author

You mean if we processed all events? Yeah, I think I'd leave it as is.

Collaborator

Oh, no, I mean literally just move the setter here into a check for if we're about to go around again.

Contributor Author

Ah, yes, had understood as much, but we def. need to reset in the case we leave the method. We could have moved the compare_exchange out of the loop and only reset the flag on exit, but given that it's a rare edge case anyways I thought it made sense to leave as is.

Unfortunately, the RAII guard types used by `RwLock` are not `Send`, which is
why they can't be held across `await` boundaries. To allow asynchronous
event processing in multi-threaded environments, we now allow events to be
processed without holding the `total_consistency_lock`.
@tnull tnull force-pushed the 2023-04-fix-async-event-processing branch from a5358d0 to f2453b7 Compare April 21, 2023 16:05
- sender.send(()).unwrap();
+ match sender.send(()) {
+     Ok(()) => {},
+     Err(std::sync::mpsc::SendError(())) => println!("Persister failed to notify as receiver went away."),
Collaborator

Wait, why?

Contributor Author

@tnull tnull Apr 21, 2023

Because we're shutting the other task down after the first send. However, we also persist again on shutdown, which triggers a second send, which would panic as the receiver is already gone at that point.
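The behavior described here can be reproduced with the standard library alone. In the sketch below, `notify_persisted` and `first_and_second_notification` are hypothetical helpers written for illustration, not the test code from this PR. `Sender::send` returns `Err(SendError(..))` once the `Receiver` has been dropped, so calling `unwrap()` on the second send would panic, whereas matching on the result tolerates it.

```rust
use std::sync::mpsc::{channel, SendError, Sender};

/// Hypothetical helper: notifies a listener and tolerates a receiver that
/// has already gone away. Returns `true` if the message was queued.
fn notify_persisted(sender: &Sender<()>) -> bool {
    match sender.send(()) {
        Ok(()) => true,
        Err(SendError(())) => {
            // The receiving task shut down after the first notification.
            println!("Persister failed to notify as receiver went away.");
            false
        }
    }
}

/// Mimics the sequence from the comment above: the first persist is
/// received, the receiver is dropped, and a second persist happens on
/// shutdown.
fn first_and_second_notification() -> (bool, bool) {
    let (sender, receiver) = channel();
    let first = notify_persisted(&sender);
    receiver.recv().unwrap();
    drop(receiver);
    let second = notify_persisted(&sender);
    (first, second)
}
```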

Contributor

I think a comment for why this is ok would be helpful

Contributor

@alecchendev alecchendev left a comment

LGTM I think! Making sure I'm getting this right: do types need to implement Send across await boundaries because, in a multi-threaded environment, a task waiting on a future to complete may be moved to execute on another thread?

// we can be sure no other persists happen while processing events.
let _read_guard = $self.total_consistency_lock.read().unwrap();
let mut processed_all_events = false;
while !processed_all_events {
Contributor

How come this is all run in a while loop? IIUC there may be other events added to pending_events by other async tasks while handling the events, which is how we end up not having processed all events. But why do we keep processing until pending_events is empty, as opposed to just processing the events that were present when we first called this function? Does it make much of a difference, or is it more that we might as well do it while we're here?

Collaborator

Because we no longer allow multiple processors to run at the same time. If one process_events call starts and makes some progress, and then an event is generated, causing a second process_events call, the second call might return early even though there are events the user expects to have been processed. Thus, we need to make sure the first process_events call goes around again and processes the remaining events.
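A rough sketch of why the loop matters, under stated assumptions: `Processor` is a hypothetical type, events are plain `u32` values, and the flag handling follows the shape discussed in this thread (a `compare_exchange` at the top of each iteration and a `store(false, Ordering::Release)` at the bottom). It is not the actual macro in channelmanager.rs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the object that owns the event queue.
struct Processor {
    pending_events: Mutex<Vec<u32>>,
    pending_events_processor: AtomicBool,
}

impl Processor {
    /// Returns how many events this call handled (0 if another call was
    /// already processing and this one returned early).
    fn process_events<F: FnMut(u32)>(&self, mut handler: F) -> usize {
        let mut handled = 0;
        let mut processed_all_events = false;
        while !processed_all_events {
            // Only one processor at a time: a concurrent caller bails out here.
            if self
                .pending_events_processor
                .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_err()
            {
                return handled;
            }

            // Snapshot the queue; no lock is held while handling.
            let snapshot: Vec<u32> = self.pending_events.lock().unwrap().clone();
            let num_events = snapshot.len();
            for event in snapshot {
                handler(event);
                handled += 1;
            }

            {
                let mut pending = self.pending_events.lock().unwrap();
                pending.drain(..num_events);
                // Events queued while we were busy force another round,
                // because the caller that queued them may have returned early.
                processed_all_events = pending.is_empty();
            }
            self.pending_events_processor.store(false, Ordering::Release);
        }
        handled
    }
}
```

If the handler for the first event queues another event and triggers a nested `process_events` call, that nested call returns immediately with 0, and the outer call picks up the new event on its next pass through the loop.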

@TheBlueMatt TheBlueMatt merged commit 5f96d13 into lightningdevkit:main Apr 22, 2023
14 checks passed
Development

Successfully merging this pull request may close these issues.

Switch total_consistency_lock to a Send RwLock variant
6 participants