Restore lazy flag to KVStore::remove
#4189
Conversation
👋 Thanks for assigning @joostjager as a reviewer!
Codecov Report

@@            Coverage Diff             @@
##             main    #4189      +/-   ##
==========================================
- Coverage   88.87%   88.84%   -0.04%
==========================================
  Files         180      180
  Lines      137863   137870       +7
  Branches   137863   137870       +7
==========================================
- Hits       122522   122485      -37
- Misses      12532    12573      +41
- Partials    2809     2812       +3

Flags with carried forward coverage won't be shown.
/// potentially get lost on crash after the method returns. Therefore, this flag should only be
/// set for `remove` operations that can be safely replayed at a later time.
///
/// All removal operations must complete in a consistent total order with [`Self::write`]s
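For context, a rough sketch of what the restored `remove` signature looks like on the sync side of the store trait, with the `lazy` flag documented as above. This is a paraphrase for illustration, not the exact trait definition as merged:

```rust
use std::io;

// Sketch of a KVStoreSync-style trait; the real trait in lightning::util::persist has
// more methods (e.g. read/list) and an async counterpart.
pub trait KVStoreSync {
	/// Persists `buf` under the given namespaces and key.
	fn write(
		&self, primary_namespace: &str, secondary_namespace: &str, key: &str, buf: &[u8],
	) -> Result<(), io::Error>;

	/// Removes the entry for `key`.
	///
	/// If `lazy` is set, the store may return before the removal is durable, so the entry
	/// can reappear after a crash; callers should only pass `lazy = true` for removals
	/// that can safely be replayed later. Removals must still be ordered consistently
	/// with `write`s to the same key.
	fn remove(
		&self, primary_namespace: &str, secondary_namespace: &str, key: &str, lazy: bool,
	) -> Result<(), io::Error>;
}
```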
I'm still not sure if this would even work. For example, in FilesystemStore, if we simply call remove and leave the decision on when to sync the changes to disk to the OS, how could we be certain that the ordering is preserved? IIRC we basically concluded this can't be guaranteed, especially since the guarantees might vary across platforms?
Note that man 2 unlink states:
If the name was the last link to a file but any processes still have the file open, the file will remain in existence until the last file descriptor referring to it is closed.
That means that if there is a concurrent read, the actual deletion may be deferred, allowing it to interact with subsequent writes, e.g.:
| t1   | t2     | t3    |
|------|--------|-------|
| READ | unlink |       |
| READ | ...    | WRITE |
| READ | ...    |       |
| READ | ...    | SYNC  |
| READ | SYNC   |       |
Although, given we use rename for write, I do wonder if the unlink would simply get lost here as it would apply to the original file that is dropped already anyways?
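To make the scenario above concrete, here is a minimal sketch (not the actual FilesystemStore code; the temp-file convention and helper names are made up) of a write-via-rename plus an unlink-style lazy remove:

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Write by creating a temp file and renaming it over the destination, so readers either
// see the old contents or the new ones, never a partial write.
fn write_atomic(dest: &Path, buf: &[u8]) -> std::io::Result<()> {
	let tmp = dest.with_extension("tmp");
	let mut f = fs::File::create(&tmp)?;
	f.write_all(buf)?;
	f.sync_all()?; // make the new contents durable before exposing them under `dest`
	fs::rename(&tmp, dest)?; // atomically replaces `dest` if it already exists
	// A non-lazy store would also fsync the parent directory here so the rename (or a
	// prior removal) survives a crash; a lazy remove is allowed to skip that step.
	Ok(())
}

// Lazy remove: just unlink. Per unlink(2), the name disappears immediately, but if a
// concurrent reader still holds the file open the inode lingers until that descriptor
// is closed. A later write_atomic() creates a fresh inode, so the pending unlink of the
// old one doesn't affect it.
fn remove_lazy(dest: &Path) -> std::io::Result<()> {
	fs::remove_file(dest)
}
```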
To avoid this, maybe it is possible to constrain lazy removes to keys that won't ever be written again in the future?
I've read the context of this PR now, and it seems the perf issue was around monitor updates. I think those are never re-written?
I'm still not sure if this would even work. For example in FilesystemStore, if we simply call remove and leave the decision on when to sync the changes to disk to the OS, how could we be certain that the ordering is preserved?
Filesystems provide an order; the only thing they don't provide without an fsync is any kind of guarantee that it's on disk. I don't think this is a problem.
Although, given we use rename for write, I do wonder if the unlink would simply get lost here as it would apply to the original file that is dropped already anyways?
Yes, that is how it should work on any reasonable filesystem. In theory it's possible for some filesystems to fail the rename part of the write because the file still exists, but that's unrelated to the remove; that's just the read and write happening at the same time.
Filesystems provide an order; the only thing they don't provide without an fsync is any kind of guarantee that it's on disk. I don't think this is a problem.
A file that should have been removed, but is still there. Is that not a problem?
That's the explicit point of the lazy flag - it allows a store to not guarantee that the entry will be removed if there's a crash/ill-timed restart.
Discussed this offline, man 3p unlink had me convinced this is safe to do on POSIX.
Are there more details on why/when the lazy flag is important?
In this case it's important for perf when removing a large number of monitor updates on an fsstore (requiring …)
1k updates is a lot. Do you think it adds much over, say, 50 updates? Maybe that also makes the perf problem go away without the lazy flag...
1k updates seems entirely reasonable for a node doing lots of forwarding. Not being able to pick a reasonable update count just because of an API limitation in how we do removals seems like a pretty weird limitation, no?
The question was just whether 1000 is reasonable, and you made it clear that it is 👍
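For illustration, a hypothetical pruning loop over stale monitor updates using the `KVStoreSync` sketch from earlier; the namespace and key layout here are invented and do not reflect the actual `MonitorUpdatingPersister` key scheme:

```rust
/// Hypothetical helper: removes update entries `0..latest_update_id` for one monitor.
fn prune_stale_updates<K: KVStoreSync>(
	store: &K, monitor_key: &str, latest_update_id: u64,
) -> Result<(), std::io::Error> {
	for update_id in 0..latest_update_id {
		// `lazy = true`: if the node crashes before the removal hits disk, the stale
		// update simply survives and is cleaned up again on a later pass, which is
		// exactly why lazy removal is safe (and far cheaper, no per-removal fsync) here.
		store.remove("monitor_updates", monitor_key, &update_id.to_string(), true)?;
	}
	Ok(())
}
```

Without the flag, each iteration would have to make its removal durable before returning, which is where the per-removal fsync cost in a filesystem-backed store comes from.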
This reverts commit 561da4c. A user pointed out, when looking to upgrade to LDK 0.2, that the `lazy` flag is actually quite important for performance when using a `MonitorUpdatingPersister`, especially in synchronous persistence mode. Thus, we add it back here. Fixes lightningdevkit#4188
In the previous commit we reverted 561da4c. One of the motivations for that commit (in addition to `lazy` removals being somewhat less, though still arguably, useful in an async context) was that the ordering requirements of `lazy` removals were somewhat unclear. Here we simply default to the simplest safe option, requiring a total order across all `write` and `remove` operations to the same key, `lazy` or not.
9973d78 to 0f9548b
Fixed rustfmt:
$ git diff-tree -U1 9973d780a 0f9548bf8
diff --git a/lightning/src/util/persist.rs b/lightning/src/util/persist.rs
index 78fdba2113..5d34603c96 100644
--- a/lightning/src/util/persist.rs
+++ b/lightning/src/util/persist.rs
@@ -1084,4 +1084,3 @@ where
let latest_update_id = current_monitor.get_latest_update_id();
- self
- .cleanup_stale_updates_for_monitor_to(&monitor_key, latest_update_id, lazy)
+ self.cleanup_stale_updates_for_monitor_to(&monitor_key, latest_update_id, lazy)
.await?;
Only a rustfmt change since @joostjager ack'd, so landing.
Backported to 0.2 in #4193
🥳
Thanks for landing this! I think the comments above covered everything. We use the … I was curious if there are possible simplifications, such as all …
A user pointed out, when looking to upgrade to LDK 0.2, that the
`lazy` flag is actually quite important for performance when using a
`MonitorUpdatingPersister`, especially in synchronous persistence mode.
Thus, we add it back here.