
@tnull (Contributor) commented Sep 24, 2025

Alternative to #4113.

The utility of the `lazy` flag was never entirely clear, except for some cloud environments where you could actually save an explicit call in some scenarios by batching the remove with subsequent calls. However, given that the recently added async `KVStore` introduced additional ordering constraints, it's unclear how implementations could still benefit from the 'eventual' consistency properties originally envisioned.

As the `lazy` flag then just amounts to a bunch of additional complexity everywhere, we here simply drop it from the `KVStore`/`KVStoreSync` interfaces, simplifying implementations on both ends.
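
For readers unfamiliar with the interface, the shape of the change is roughly the following sketch. It paraphrases LDK's `KVStoreSync` rather than quoting it, and both trait names here are placeholders used only to show the before and after side by side:

```rust
use std::io;

// Before this PR (paraphrased): every removal carried a `lazy` hint that
// implementations could use to defer making the deletion durable.
pub trait KVStoreSyncBefore {
    fn remove(
        &self, primary_namespace: &str, secondary_namespace: &str, key: &str, lazy: bool,
    ) -> Result<(), io::Error>;
    // read/write/list elided
}

// After this PR (paraphrased): removals are unconditional, so implementations
// no longer need to reason about eventually-consistent deletes.
pub trait KVStoreSyncAfter {
    fn remove(
        &self, primary_namespace: &str, secondary_namespace: &str, key: &str,
    ) -> Result<(), io::Error>;
    // read/write/list elided
}
```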

@tnull added this to the 0.2 milestone Sep 24, 2025
@ldk-reviews-bot commented Sep 24, 2025

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@TheBlueMatt (Collaborator) left a comment

Hmm, I mean I'm definitely open to this, but the claim that it has no utility is obviously somewhat dubious given FilesystemStore does use it. Avoiding the fsync when removing in the FS Store is a pretty nontrivial speedup, and given we're about to start calling remove in the background processor (for lightning-liquidity pruning), it seems like a nice thing to have.

ISTM, though, that in fact ~all our removes are lazy, so arguably we could just make it the default.

In any case, I agree it's a bit confusing wrt ordering requirements, but maybe we could clarify it somewhat by being explicit that lazy has no impact on any ordering requirements or any other logic, and that it only means the file is allowed to re-appear after a crash?
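
To make the mechanism under discussion concrete, here is a Unix-flavored sketch with a hypothetical `remove_file_durably` helper; it is not FilesystemStore's actual code. A durable remove must fsync the parent directory after the unlink, which is exactly the step a lazy remove skips, so after a crash the directory entry (and thus the file) may re-appear:

```rust
use std::fs;
use std::io;
use std::path::Path;

fn remove_file_durably(path: &Path, lazy: bool) -> io::Result<()> {
    fs::remove_file(path)?;
    if !lazy {
        // Persist the unlink itself: fsync the parent directory so the
        // removal of the directory entry survives a crash. (Opening a
        // directory as a File and calling sync_all works on Unix, not
        // on Windows.)
        let parent = path.parent().unwrap_or_else(|| Path::new("."));
        fs::File::open(parent)?.sync_all()?;
    }
    Ok(())
}
```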

@joostjager (Contributor) commented

> pretty nontrivial speedup

Do you think it makes a meaningful difference in practice? What scenario should we have in mind, removal of a channel with a large number of updates? If it's just theoretical, I'd favor always doing only the atomic one.

@TheBlueMatt (Collaborator) commented

Avoiding an fsync for removal is a pretty huge speedup; it's the difference between queuing the removal in memory and actually hitting disk. Now, whether we care about the speedup in removals is a different question. Given old-channel archiving is currently a manual process I'm not sure it matters too much (though we probably want to start doing it in the BP soon, so there it might), but removing one or two channels at once probably also isn't the biggest time-suck in the world. The new LSPS persistence stuff might be more sensitive, though: we already do them in the BP and we could get quite a few of them. We can parallelize the removals, but that might be all the more reason to avoid the fsync: if we're removing a bunch of files from the same dir and fsyncing the dir over and over again, it might end up being somewhat single-threaded.

All of that is somewhat speculative, of course. Given we're moving towards async in most places, I wouldn't mind dropping the lazy flag generally for monitor-removal, but I am somewhat concerned about the LSPS persistence stuff trying to remove + fsync a bunch of stuff in parallel in the same folder at the same time. Maybe we don't anticipate much churn in the persisted state for LSPS*?
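
One hypothetical mitigation for the parallel-fsync concern, again a Unix-flavored sketch and not an LDK API: unlink the whole batch first, then fsync the shared parent directory once, instead of paying one directory fsync per removed file:

```rust
use std::fs;
use std::io;
use std::path::Path;

fn remove_batch_durably(dir: &Path, keys: &[&str]) -> io::Result<()> {
    for key in keys {
        fs::remove_file(dir.join(key))?;
    }
    // A single directory fsync makes all of the unlinks above durable
    // at once.
    fs::File::open(dir)?.sync_all()?;
    Ok(())
}
```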

@joostjager (Contributor) previously approved these changes Sep 25, 2025 and left a comment

As mentioned before, I am in favor of simplifying things. Especially if the benefits of keeping lazy are not all that clear in practice.

Did a brief scan for lightning-liquidity removes. It seems that's only happening when a client has no more channels and also isn't connected. Doesn't look very impactful to do a sync remove in that case, but leaving final judgement to @tnull.

@tnull force-pushed the 2025-09-dont-be-lazy branch 2 times, most recently from e109229 to dce16bf on September 25, 2025 at 09:51
@tnull (Contributor, Author) commented Sep 25, 2025

Rebased to address silent conflicts, have yet to look into why the fuzzer previously failed: https://github.com/lightningdevkit/rust-lightning/actions/runs/17969992988/job/51110133106?pr=4116

@tnull force-pushed the 2025-09-dont-be-lazy branch from dce16bf to 09f9621 on September 25, 2025 at 09:55
@tnull (Contributor, Author) commented Sep 25, 2025

> Rebased to address silent conflicts, have yet to look into why the fuzzer previously failed: https://github.com/lightningdevkit/rust-lightning/actions/runs/17969992988/job/51110133106?pr=4116

Fixed, thanks for the hint @joostjager.

@tnull (Contributor, Author) commented Sep 25, 2025

> All of that is somewhat speculative, of course. Given we're moving towards async in most places, I wouldn't mind dropping the lazy flag generally for monitor-removal, but I am somewhat concerned about the LSPS persistence stuff trying to remove + fsync a bunch of stuff in parallel in the same folder at the same time. Maybe we don't anticipate much churn in the persisted state for LSPS*?

I think it should be fine for now, at least until we gain more insight into user data that indicates otherwise. Note that in the LSPS context a remove would only happen once we had peer state that then went stale. And it's also not slower than the peer updating that state in some way, e.g., by updating a webhook, or the LSP resetting the LSPS2 state due to a forwarding failure. So, personally, I'm still in favor of simplifying the interface; we can always make it more complex once we learn about a specific user requirement.

The utility of the `lazy` flag was never entirely clear, except for
some cloud environments where you could actually save an explicit call
in some scenarios by batching the remove with subsequent calls.
However, given that the recently added async `KVStore` introduced
additional ordering constraints, it's unclear how implementations could
still benefit from the 'eventual' consistency properties originally
envisioned.

As the `lazy` flag then just amounts to a bunch of additional
complexity everywhere, we here simply drop it from the
`KVStore`/`KVStoreSync` interfaces, simplifying implementations on both
ends.
@tnull force-pushed the 2025-09-dont-be-lazy branch from 09f9621 to 561da4c on September 25, 2025 at 10:43
codecov bot commented Sep 25, 2025

Codecov Report

❌ Patch coverage is 58.33333% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.54%. Comparing base (d076584) to head (561da4c).
⚠️ Report is 9 commits behind head on main.

Files with missing lines                       Patch %   Lines
lightning-persister/src/fs_store.rs            42.10%    7 missing, 4 partials ⚠️
lightning/src/util/test_utils.rs               60.00%    2 missing ⚠️
lightning-background-processor/src/lib.rs       0.00%    1 missing ⚠️
lightning/src/util/persist.rs                  88.88%    1 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4116      +/-   ##
==========================================
+ Coverage   88.53%   88.54%   +0.01%     
==========================================
  Files         179      179              
  Lines      134329   134300      -29     
  Branches   134329   134300      -29     
==========================================
- Hits       118923   118918       -5     
+ Misses      12656    12618      -38     
- Partials     2750     2764      +14     
Flag      Coverage Δ
fuzzing   21.81% <0.00%> (+0.01%) ⬆️
tests     88.38% <58.33%> (+0.01%) ⬆️


@TheBlueMatt (Collaborator) left a comment

Alright

@TheBlueMatt (Collaborator) commented

This is trivial, landing.

@TheBlueMatt merged commit e929517 into lightningdevkit:main on Sep 25, 2025
25 checks passed