
BucketListDB Bucket Apply Optimization #4114

Merged (1 commit) Apr 19, 2024

Conversation

Contributor

@SirTyson SirTyson commented Jan 2, 2024

Description

Resolves #4113

This change improves Bucket apply times by approximately 80% when BucketListDB is enabled. In the current catchup flow, we first apply buckets and then index them. The Bucket apply step scans the entire BucketList even though only offers need to be applied. This PR reverses the ordering so that we index buckets first and then apply them, which lets us use the index to scan only the sections of the BucketList that contain offers.

Bucket apply currently writes every entry it sees to the SQL DB. Because the BucketList is implemented as an LSMT (log-structured merge tree), many versions of a given entry may exist in the BucketList, with only the most recent version being valid. This means Bucket apply must iterate through the BucketList in reverse order (oldest entries first) so that the last version of a key written is the most recent one. This causes significant write amplification: many versions of the same entry are written to disk even though only the most recent version is valid.

When BucketListDB is enabled, this change applies buckets in order (newest entries first) so that each key is written at most once. To avoid writing out-of-date entries to the DB, in-order application requires recording every key written to the DB in an in-memory set. This set would be too large if all entry types were being applied, but with BucketListDB only offer entries must be applied, which makes the optimization feasible.
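The in-order scheme above can be illustrated with a minimal sketch (all names here are invented for illustration, not the actual stellar-core types): walking buckets from newest to oldest and recording each key the first time it is seen, so that only the most recent version of each entry is ever written.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only.
using LedgerKey = std::string;
using LedgerEntry = std::pair<LedgerKey, int>; // key, version payload

// buckets[0] is the newest bucket; entries in newer buckets shadow
// the same key in older buckets.
std::vector<LedgerEntry>
applyInOrder(std::vector<std::vector<LedgerEntry>> const& buckets)
{
    std::unordered_set<LedgerKey> seenKeys;
    std::vector<LedgerEntry> written;
    for (auto const& bucket : buckets) // newest bucket first
    {
        for (auto const& entry : bucket)
        {
            // Write a key only the first time we see it: that is its most
            // recent version. Older shadowed versions are skipped, so each
            // key hits the DB at most once (no write amplification).
            if (seenKeys.insert(entry.first).second)
            {
                written.push_back(entry);
            }
        }
    }
    return written;
}
```

In the reverse-order scheme the same key may be written many times, with later (newer) writes overwriting earlier ones; here the set trades memory for a single write per key.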

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@SirTyson SirTyson force-pushed the bl-bucket-apply branch 2 times, most recently from 4e16098 to cf799df on February 14, 2024 20:14
@graydon (Contributor) left a comment

LGTM!

@marta-lokhova (Contributor) left a comment

Curious to see catchup application time after this change!

@@ -49,6 +51,10 @@ class ApplyBucketsWork : public Work

bool mDelayChecked{false};

static uint32_t startingLevel(bool offersOnly);
Contributor
nit on naming: could we rename this parameter to BucketListDBEnabled or something like that? Replaying only offers is basically an implementation detail of BucketListDB, that callers don't need to know about.

Contributor
Also, I think you don't need to pass the offersOnly parameter; both ApplyBucketsWork and BucketApplicator have access to the App's config to decide if the BucketList should be applied in reverse.

@@ -28,6 +28,10 @@ class BucketApplicator
BucketInputIterator mBucketIter;
size_t mCount{0};
std::function<bool(LedgerEntryType)> mEntryTypeFilter;
bool const mNewOffersOnly;
UnorderedSet<LedgerKey>& mSeenKeys;
Contributor
do you need randomized hashing of UnorderedSet?

Contributor Author
I don't think so. This is just storing seen offer keys, so any attack triggering lots of hash collisions would have to be very sophisticated and pretty worthless, as the result would only be a marginally slowed catchup.

Contributor
so.. how about std::unordered_set then? :)

src/bucket/BucketApplicator.cpp (outdated thread, resolved)
}
else
{
mFinishedIndexBucketsWork = true;
Contributor
No need for mFinishedIndexBucketsWork or mSpawnedIndexBucketsWork: just use the pointer returned by addWork to determine if work started/finished.

Contributor
(same applies to mSpawnedAssumeStateWork; would be nice to clean that up too, but no worries if not, since it's outside of the scope of this PR)

// The current size of this set is 1.6 million during BucketApply
// (as of 12/20/23). There's not a great way to estimate this, so
// reserving with some extra wiggle room
mSeenKeys.reserve(2'000'000);
Contributor
thoughts on expected memory consumption here? (keys are pretty small, but would be good to understand why we saw OOM kills in CI with this change)

Contributor Author
It's pretty variable and correlated with high DEX activity rather than BucketList size. Currently it takes about 180 MB. I also tested a period of very high DEX activity in 2018, which consumed 1.15 GB. For replaying all of history, I would estimate the upper bound is around 1.2 GB (I found an expensive period, but I'm not sure it's the absolute max).
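As a back-of-the-envelope check of these numbers (assumed per-key cost, not measured): ~180 MB for ~2 million recorded keys works out to roughly 90 bytes per key, which is plausible once hash-set node and bucket overhead are added to the key data itself.

```cpp
#include <cassert>
#include <cstddef>

// Rough memory estimate for the in-memory seen-keys set. bytesPerKey is
// an assumption (key payload plus unordered-set node/bucket overhead),
// not a measured figure from stellar-core.
double
estimateSetMB(std::size_t keys, std::size_t bytesPerKey)
{
    return static_cast<double>(keys) * static_cast<double>(bytesPerKey) /
           (1024.0 * 1024.0);
}
```

With keys = 2'000'000 and an assumed 90 bytes per key, this gives roughly 172 MB, consistent with the ~180 MB observed.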

Contributor Author
Looks like the peak memory usage of core on any given range is 3.8 GB during catchup (total, not just the offer apply index).

mFirstBucket.reset();
mSecondBucket.reset();
mFirstBucketApplicator.reset();
mSecondBucketApplicator.reset();
Contributor
mFinishedIndexBucketsWork and mSpawnedIndexBucketsWork aren't reset here; assuming those will go away, you'd need to reset Work pointer instead.

if (!mSpawnedAssumeStateWork)
// This status string only applies to step 2 when we actually apply the
// buckets.
if (mFinishedIndexBucketsWork && !mSpawnedAssumeStateWork)
Contributor
Should we log something meaningful about indexing?

Contributor Author
I don't think we need to, as IndexBucketsWork already prints logs regarding indexing progress.

@@ -174,31 +230,54 @@ ApplyBucketsWork::startLevel()
auto& level = getBucketLevel(mLevel);
HistoryStateBucket const& i = mApplyState.currentBuckets.at(mLevel);

bool applySnap = (i.snap != binToHex(level.getSnap()->getHash()));
bool applyCurr = (i.curr != binToHex(level.getCurr()->getHash()));
bool applyFirst = mOffersOnly
Contributor
All the new ternary operators are getting a bit hard to follow. That aside, this function applying the whole level has always been kind of strange: is there actually a reason to apply the whole level and keep track of snap and curr? Alternatively, we could just have an iterator of buckets in the correct order of application, and doWork could pick up one bucket at a time to process; that way I think we don't need to track snap and curr at all. This does require quite a bit of refactoring though, so maybe we can at least open an issue to clean this up?

Contributor Author

Yeah, I agree that this is worth doing but should happen post merge. I've opened an issue for it here #4290.
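The refactor discussed in this thread could look roughly like the sketch below (hypothetical names, not the actual stellar-core types): flatten the BucketList into one sequence of buckets in application order up front, so doWork consumes one bucket per step and never tracks curr/snap per level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only.
struct Bucket
{
    std::string hash;
};

struct Level
{
    Bucket curr; // within a level, curr is newer than snap
    Bucket snap;
};

// Build the full application-order sequence once; callers then just
// iterate it one bucket at a time instead of reasoning about levels.
std::vector<Bucket>
bucketsInApplyOrder(std::vector<Level> const& levels)
{
    std::vector<Bucket> order;
    for (auto const& level : levels) // levels[0] is the newest level
    {
        order.push_back(level.curr);
        order.push_back(level.snap);
    }
    return order;
}
```

This is only a sketch of the shape of the idea; the real ordering and skip logic (e.g. unchanged buckets) would live in the refactor tracked by the issue above.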

@marta-lokhova (Contributor) left a comment

thanks for the updates, LGTM! Just a minor code cleanup request. Please squash commits also.

if (!mSpawnedAssumeStateWork)
// Step 1: index buckets. Step 2: apply buckets. Step 3: assume state
bool isUsingBucketListDB = mApp.getConfig().isUsingBucketListDB();
if (isUsingBucketListDB)
Contributor
nit: I'd prefer if we could write this flow such that it's consistent with Work usage across the codebase:

if (!mIndexBucketsWork)
{
    // Spawn indexing work for the first time
    mIndexBucketsWork = addWork<IndexBucketsWork>(mBucketsToIndex);
    return State::WORK_RUNNING;
}
else if (mIndexBucketsWork->getState() != BasicWork::State::WORK_SUCCESS)
{
    // Exit early if indexing work is still running, or failed
    return mIndexBucketsWork->getState();
}

// Otherwise, continue with next steps

// If indexing has not finished or finished and failed, return result
// status. If indexing finished and succeeded, move on and spawn assume
// state work.
auto status = checkChildrenStatus();
Contributor
Please don't use checkChildrenStatus to determine the state of mIndexBucketsWork, as this parent work might have other children, which would cause the function to return unexpected results. To query state of each individual work, use that work's getState method.

@SirTyson (Contributor Author)

> thanks for the updates, LGTM! Just a minor code cleanup request. Please squash commits also.

Done!

@marta-lokhova (Contributor)

r+ 004a4ba

@latobarita latobarita merged commit 25525d4 into stellar:master Apr 19, 2024
15 checks passed