
Background eviction #4191

Merged: 7 commits merged into stellar:master on Apr 25, 2024

Conversation

@SirTyson (Contributor) commented Feb 13, 2024

Description

This PR enables background eviction scans when running BucketListDB. This increases the time we have to scan for expired entries to evict and will improve eviction throughput.

It is currently in draft form, as it is based on #4172. This will also fail CI at the moment since it has not been updated with the new XDR changes.

TODO:
  • Additional test for entries that should be evicted in ledger N but are recreated during ledger N tx application. (Done)
  • Graceful shutdown for the background eviction thread. (Not required: the background eviction scan is sufficiently short-lived.)
  • Rebase on the updated XDR. (Done)

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@MonsieurNicolas (Contributor):

I would recommend adding a flag of sorts that continues to perform eviction in the foreground:

  • That way we can merge this without performing an in-depth performance analysis (and avoid letting the PR rot).
  • This gives us time to understand the impact of performing so much work in the background.
    • Hopefully we'll have protocol 20 live on pubnet soon, so we can test against the real pubnet dataset without having to fork it.
    • Right now large merges happen, but not that often, whereas this potentially cranks up the amount of disk transfer by quite a bit.
      • We may need to tweak I/O priority or hints given to the OS so as to avoid thrashing disk caches. Right now the only thing we do with worker threads is call runCurrentThreadWithLowPriority, which gives them a nice value of 5.
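For context, here is a minimal sketch of how a nice value like that can be applied to a single thread on Linux. The function name and details below are illustrative only; this is not stellar-core's actual Thread.cpp (the real helper is runCurrentThreadWithLowPriority).

```cpp
// Illustrative sketch, Linux-specific: raise the calling thread's nice value
// so the kernel deprioritizes it relative to the main ledger-close thread.
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

void
lowerCurrentThreadPriority() // hypothetical name, not stellar-core's helper
{
    constexpr int LOW_PRIORITY_NICE = 5;
    // On Linux, setpriority with PRIO_PROCESS and a thread id adjusts just
    // this thread rather than the whole process.
    pid_t tid = static_cast<pid_t>(::syscall(SYS_gettid));
    ::setpriority(PRIO_PROCESS, static_cast<id_t>(tid), LOW_PRIORITY_NICE);
}
```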

@marta-lokhova (Contributor) left a comment

Sorry I'm late with reviewing; I've only done a partial review so far. Thanks for taking a stab at this! Unfortunately, I found it too difficult to reason about the changes, due to a few things:

  • With the current interface, both the background and main threads have access to basically every component of core, because both hold references to the Application class. This is very dangerous, and makes it nearly impossible to verify correctness of the code. Maybe we should take a step back and start with a diagram of interactions/dependencies between LedgerManager, BucketManager, etc., followed by a sketch of interface changes (for simplicity, start with the public API only; we can focus on intra-class synchronization later).

  • I think a design issue that makes things complicated is that the snapshot manages itself, so it has to query LedgerManager for the latest ledgers and BucketManager for the snapshot mutex (it feels weird that it has to acquire a mutex held by someone else to modify its own data). Why do we need state in the snapshot class in the first place? It could be a const wrapper around buckets (with no access to any of the managers) maintained by some higher-level abstraction, which just swaps copies with the new snapshot once ledger close is done. We could introduce a new class that manages this hand-off with some invariants: e.g. only the main thread can refresh the snapshot, while background threads can only retrieve a const copy of state to do whatever is needed (see the sketch after this list).

  • As part of the interface cleanup, can we think about encapsulation and removal of unneeded mutexes? For example, with the changes suggested above, I think you won't need the LCL mutex. We should encapsulate mutexes as much as possible, potentially inside the "handoff manager", so that callers don't need to worry about synchronization (note that we'll still need to keep track of ledger numbers, but conceptually they're part of "the snapshot", so it would all be guarded by one mutex).

  • I'd suggest incorporating additional invariants, so we can detect bugs faster and blow up rather than go into UB land. Some ideas: assert main vs. non-main thread, future state management like we do in FutureBucket, etc.
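A rough sketch of the hand-off abstraction suggested in the second bullet, under the assumption that the snapshot itself is an immutable value. All names here are hypothetical, not the PR's actual interface.

```cpp
// Hypothetical hand-off manager: the main thread is the only writer, worker
// threads only ever obtain a const snapshot they own, and the mutex never
// leaks outside this class.
#include <memory>
#include <mutex>

class BucketListSnapshot; // immutable snapshot of the bucket list + ledger seq

class SnapshotHandoffManager
{
    mutable std::mutex mMutex;
    std::shared_ptr<BucketListSnapshot const> mCurrent;

  public:
    // Main thread only: publish a fresh snapshot once ledger close is done.
    void
    updateSnapshot(std::shared_ptr<BucketListSnapshot const> fresh)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mCurrent = std::move(fresh);
    }

    // Any thread: take a const copy; no further synchronization is needed
    // because the snapshot itself never mutates.
    std::shared_ptr<BucketListSnapshot const>
    copySnapshot() const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mCurrent;
    }
};
```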

src/bucket/BucketManager.h
@@ -207,9 +207,11 @@ class BucketManager : NonMovableOrCopyable
     virtual void maybeSetIndex(std::shared_ptr<Bucket> b,
                                std::unique_ptr<BucketIndex const>&& index) = 0;

-    virtual std::unique_ptr<SearchableBucketListSnapshot const>
+    virtual std::unique_ptr<SearchableBucketListSnapshot>
Contributor:

the removal of const here doesn't look right: any consumer of snapshots shouldn't be able to modify the underlying object.

Contributor Author:

While I agree in principle, the only members of SearchableBucketListSnapshot are mSnapshotManager, which is already const, and std::unique_ptr<BucketListSnapshot const> mSnapshot{};. Because SearchableBucketListSnapshot checks and automatically updates mSnapshot on each load, to be const correct mSnapshot would have to be mutable. I don't think it makes much sense for the only piece of non-const state to be mutable, and I don't think it's dangerous, because the underlying BucketListSnapshot is const regardless.
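For reference, the const-correct alternative being weighed here would look roughly like the sketch below (simplified, hypothetical member names, not the PR's exact code).

```cpp
// The cost of keeping the query API const: the cached snapshot pointer has to
// be declared mutable so a const member function can refresh it.
#include <memory>

class BucketListSnapshot
{
}; // stand-in for the immutable snapshot type

class SearchableBucketListSnapshot
{
    mutable std::unique_ptr<BucketListSnapshot const> mSnapshot;

  public:
    // A const member function may reassign mSnapshot only because the member
    // is mutable; the pointed-to snapshot stays const either way.
    void
    maybeUpdateSnapshot(std::unique_ptr<BucketListSnapshot const> fresh) const
    {
        mSnapshot = std::move(fresh);
    }
};
```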

mEvictionFuture.wait();
Contributor:

this is redundant: get calls wait already

if (!mEvictionFuture.valid())
Contributor:

this doesn't look right: why isn't the future valid if we get to this state (resolving the eviction scan)?

Contributor Author:

My understanding is that std::future::valid() returns false if the future has been default-initialized or has had a call to get. It returns true if the future has been assigned shared state (i.e. mEvictionFuture = std::packaged_task<EvictionResult()>::get_future()) but get has not yet been called. Essentially, valid returns true if there is ready or pending work, and false if the work is done and has been retrieved, or if no work has been set yet.

This particular state is possible on startup, when we're closing our first ledger and haven't had a chance to call startBackgroundEvictionScan as part of closeLedger yet.
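A tiny standalone illustration of that lifecycle, using a plain std::future and std::packaged_task (not the PR's actual task):

```cpp
// valid() is false for a default-constructed future, true once it refers to
// shared state from get_future(), and false again after get().
#include <cassert>
#include <future>

int
main()
{
    std::future<int> fut; // default-initialized: no shared state
    assert(!fut.valid());

    std::packaged_task<int()> task([] { return 7; });
    fut = task.get_future(); // now refers to pending work
    assert(fut.valid());

    task();                  // run the task (a worker thread in the PR)
    assert(fut.valid());     // result is ready but not yet retrieved

    int result = fut.get();  // retrieves the result...
    assert(result == 7);
    assert(!fut.valid());    // ...and releases the shared state
    return 0;
}
```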

Contributor:

So is this only the case for tests then where we close v20 genesis ledger? In practice, genesis is always protocol 0, so it shouldn't need to do any eviction at all, right?

Contributor Author:

No, by "closing our first ledger", I don't mean the genesis ledger, but the first ledger that a node closes since that node's startup. Like during catchup after assuming state, we'd hit this path on the first call to ledgerClose, since the assume state path doesn't kick off the background eviction scan.

Contributor:

Like during catchup after assuming state, we'd hit this path on the first call to ledgerClose, since the assume state path doesn't kick off the background eviction scan.

why not? it should, right? otherwise, we'd stall on that first ledger and have to block until the scan is done. Seems like we should kick off the eviction scan in LedgerManagerImpl::setLastClosedLedger which is called by catchup

void
BucketManagerImpl::resolveBackgroundEvictionScan(
    AbstractLedgerTxn& ltx, uint32_t ledgerSeq,
    LedgerKeySet const& modifiedKeys)
{
Contributor:

please use asserts like threadIsMain in critical codepaths like this one
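In spirit, the request is something like the sketch below. stellar-core has its own threadIsMain() and releaseAssert helpers; the stand-ins here are only illustrative.

```cpp
// Generic sketch of a thread-affinity assert: blow up loudly if a worker
// thread ever wanders into a main-thread-only code path.
#include <cassert>
#include <thread>

static const std::thread::id MAIN_THREAD_ID = std::this_thread::get_id();

static bool
threadIsMain() // stand-in for stellar-core's helper
{
    return std::this_thread::get_id() == MAIN_THREAD_ID;
}

void
resolveBackgroundEvictionScan(/* ltx, ledgerSeq, modifiedKeys */)
{
    assert(threadIsMain());
    // ... apply eviction results to ledger state on the main thread ...
}

int
main()
{
    resolveBackgroundEvictionScan();
    return 0;
}
```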

startBackgroundEvictionScan(ledgerSeq);
Contributor:

we're on the main thread here scanning synchronously, so why not just do it with scanForEvictionLegacySQL, which is much less of a footgun?

Contributor Author:

While scanForEvictionLegacySQL and the background eviction scan should behave identically, their implementations are fairly different. The goal is to have background scanning by default and eventually deprecate the synchronous scan. For this reason, I think it's safer to only call one variant of the scan, and of the two I default to the background scan when it's enabled. Personally, I think it's less of a footgun to call into only one implementation instead of mixing and matching the two.

mEvictionFuture.wait();
auto evictionCandidates = mEvictionFuture.get();
Contributor:

I think we should reset the current future as soon as we retrieve the result: some compilers keep lambdas alive as long as the future is alive (even if it's invalidated). I'm not sure this is specifically a problem with what we capture at the moment, but it's definitely a footgun.

Contributor Author:

I'm not sure what you mean by reset here? Looking at the interface of std::future, I don't see any function like reset.

Contributor:

Oh I mean stop referencing the underlying object so it's destroyed properly, like we do in FutureBucket

mOutputBucketFuture = std::shared_future<std::shared_ptr<Bucket>>();

Contributor Author:

That should only be an issue with shared_future, not a regular future, though, right? There's no shared state being referenced here that can be kept alive; mEvictionFuture.get() moves the underlying state into auto evictionCandidates.

Contributor:

Oh yeah, you're right, I didn't see that it's just std::future. In that case, get should release the shared state as per the standard (note that std::move doesn't actually guarantee a move here, and the object can still be copied, but it doesn't matter as long as the future releases its shared state).
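For completeness, a small standalone example of the FutureBucket-style reset, which matters in the shared_future case because (unlike std::future) get() does not release the shared state there:

```cpp
// Dropping a shared_future's reference to its shared state by reassigning a
// default-constructed future, as FutureBucket does with mOutputBucketFuture.
#include <future>
#include <memory>

int
main()
{
    std::promise<std::shared_ptr<int>> p;
    std::shared_future<std::shared_ptr<int>> f = p.get_future().share();
    p.set_value(std::make_shared<int>(42));

    auto v = f.get(); // copies the result; f still references shared state
    f = std::shared_future<std::shared_ptr<int>>(); // release the reference

    // A plain std::future, by contrast, releases its shared state in get().
    return *v == 42 ? 0 : 1;
}
```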

});

mEvictionFuture = task->get_future();
mApp.postOnBackgroundThread(bind(&task_t::operator(), task),
Contributor:

Current worker threads are all low priority, so I think we need a high-priority thread for eviction scans, since they are time-sensitive.
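For context, the underlying pattern being discussed, sketched with a dedicated std::thread instead of the shared low-priority worker pool. The stellar-core posting API, type names, and priority handling are not reproduced here; everything below is illustrative.

```cpp
// A packaged_task produces the eviction scan result on its own thread while
// the main thread later blocks on the future only when it needs the result.
#include <future>
#include <thread>
#include <vector>

struct EvictionResult
{
    std::vector<int> candidateKeys; // placeholder for evictable entry keys
};

int
main()
{
    std::packaged_task<EvictionResult()> task([] {
        // The long-running scan over the bucket list snapshot would go here.
        return EvictionResult{{1, 2, 3}};
    });
    std::future<EvictionResult> evictionFuture = task.get_future();

    // A dedicated thread keeps a time-sensitive scan from queueing behind
    // bulky merge work on the shared worker pool.
    std::thread scanner(std::move(task));

    EvictionResult result = evictionFuture.get(); // blocks until the scan is done
    scanner.join();
    return static_cast<int>(result.candidateKeys.size());
}
```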

@MonsieurNicolas (Contributor):

I would recommend adding a flag of sorts that continues to perform eviction in the foreground: [...]

Actually, something else related to this (I am seeing reports on Discord of something that could be related):
if we switch I/O patterns to be spiky, there is a potential to exhaust burst capacity when using BucketListDB on top of EBS. When volumes run out of burst capacity, all I/O becomes extremely slow (depending on quotas), which will for sure impact not just core's ability to close ledgers, but also other colocated services like Horizon.

void
runCurrentThreadWithLowPriority()
{
    constexpr auto const LOW_PRIORITY_NICE = 5;
    runCurrentThreadWithPriority(THREAD_PRIORITY_LOWEST);
Contributor Author:

I'm not sure about changing worker threads from THREAD_PRIORITY_BELOW_NORMAL to THREAD_PRIORITY_LOWEST. Any input from Windows folks would be appreciated.

@SirTyson marked this pull request as ready for review April 10, 2024 03:04
@SirTyson force-pushed the background-eviction branch 2 times, most recently from c9b2cbe to 4bf9087 on April 10, 2024 18:24
@graydon (Contributor) left a comment

Looks ... fairly plausible! I mean, I didn't see earlier iterations of this, but as it stands I don't see anything serious left to change. A few minor things:

  • I'm a bit nervous about a thread / packaged task capturing an unsynchronized & to the statistics structure. I guess it's the only writer, and the only reader looks at the stats once the task is complete, but it's a somewhat implicit / distant synchronization. I might like to see these change to atomics, or be bundled together with the result as a value type that gets returned from the packaged task rather than a lambda capture, or something (a sketch of the value-type option follows this list).
  • The number of tests that have an isUsingBucketListDB branch that basically skips subsequent integrity checking makes me uneasy; I realize we haven't gotten rid of the legacy SQL path yet, so you're still getting logic coverage from the other branch, but I assume we will someday, and it'd be nice to actually do the integrity checking against the bucketlist DB too! So ... maybe at least file a follow-up issue to refactor the integrity checks there so that these branches can go away?
  • It'd be good to figure out what happened with the test hashes removed from the json files and reconcile them to what they're "supposed" to be on master currently.
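A sketch of the "bundled together with the result as a value type" option from the first bullet; all type names below are illustrative, not the PR's actual code.

```cpp
// The stats travel with the scan result as a value instead of being written
// through a captured reference, so reader and writer have nothing to race on.
#include <cstdint>
#include <future>
#include <vector>

struct EvictionStats
{
    uint64_t entriesScanned = 0;
    uint64_t bytesScanned = 0;
};

struct EvictionScanOutput
{
    std::vector<int> evictionCandidates; // placeholder key type
    EvictionStats stats;
};

int
main()
{
    std::packaged_task<EvictionScanOutput()> task([] {
        EvictionScanOutput out;
        out.stats.entriesScanned = 42; // the scan would populate these
        return out;
    });
    auto fut = task.get_future();
    task(); // in the PR this runs on a background thread
    EvictionScanOutput out = fut.get();
    return static_cast<int>(out.stats.entriesScanned);
}
```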

@SirTyson (Contributor Author):

  • I'm a bit nervous about a thread / packaged task capturing an unsynchronized & to the statistics structure. I guess it's the only writer and the only reader looks at the stats once the task is complete, but it's a somewhat implicit / distant synchronization. I might like to see these change to atomics, or be bundled together with the result as a value type that gets returned from the packaged task rather than a lambda capture, or something.

While probably not strictly necessary, to be on the safe side I've refactored the stats to be thread-safe. The medida metrics themselves are still passed by reference, but the library is thread-safe and they should always outlive the eviction scan task, so I think we're OK there.

• The number of tests that have an isUsingBucketListDB branch that basically skips subsequent integrity checking makes me uneasy; I realize we haven't gotten rid of the legacy SQL path yet, so you're still getting logic coverage from the other branch, but I assume we will someday, and it'd be nice to actually do the integrity checking against the bucketlist DB too! So ... maybe at least file a follow-up issue to refactor the integrity checks there so that these branches can go away?

Opened #4295.

  • It'd be good to figure out what happened with the test hashes removed from the json files and reconcile to what they're "supposed" to be on master currently.

I think something here got mixed up in a rebase; after rebasing on master and regenerating the meta files, everything seems fine.

@graydon dismissed marta-lokhova's stale review April 25, 2024 20:04

Marta verbally agreed this was ready to land, but is on vacation

@graydon (Contributor) commented Apr 25, 2024

r+ 3b8a91e

@latobarita merged commit 45b8f9d into stellar:master on Apr 25, 2024
15 checks passed
@marta-lokhova (Contributor):

Marta verbally agreed this was ready to land, but is on vacation

Unfortunately, I didn't have a chance to review/approve v2 of the PR since I was out. My previous review is on v1, and I've resolved some comments there.
