
BucketListDB Bucket Apply Optimization #4114

Merged (1 commit) Apr 19, 2024

Conversation

Contributor

@SirTyson SirTyson commented Jan 2, 2024

Description

Resolves #4113

This change improves Bucket apply times by approximately 80% when BucketListDB is enabled. In the current catchup flow, we first apply buckets and then index them. The Bucket apply step scans the entire BucketList even though only offers need to be applied. This PR reverses the ordering so that we index buckets first and then apply them, which lets us use the index to scan only the sections of the BucketList that contain offers.

Bucket apply currently writes every entry it sees to the SQL DB. Because the BucketList is implemented as an LSMT (log-structured merge tree), many versions of a given entry may exist in the BucketList, with only the most recent version being valid. This means Bucket apply must iterate through the BucketList in reverse order (oldest entries first) so that the last version of a key written is the most recent one. This causes significant write amplification: many versions of the same entry are written to disk even though only the most recent version is valid.

When BucketListDB is enabled, this change applies buckets in order (newest entries first) so that each key is written at most once. To avoid writing out-of-date entries to the DB, in-order application requires recording every key written to the DB in an in-memory set. This set would be too large if all entry types were being applied, but with BucketListDB only offer entries must be applied, which makes the optimization feasible.
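The in-order scheme above can be illustrated with a minimal sketch (all names here are invented for illustration, not the actual stellar-core types): walking buckets from newest to oldest and recording each key the first time it is seen, so that only the most recent version of each entry is ever written.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Simplified stand-ins for illustration only.
using LedgerKey = std::string;
using LedgerEntry = std::pair<LedgerKey, int>; // key, version payload

// buckets[0] is the newest bucket; entries in newer buckets shadow
// the same key in older buckets.
std::vector<LedgerEntry>
applyInOrder(std::vector<std::vector<LedgerEntry>> const& buckets)
{
    std::unordered_set<LedgerKey> seenKeys;
    std::vector<LedgerEntry> written;
    for (auto const& bucket : buckets) // newest bucket first
    {
        for (auto const& entry : bucket)
        {
            // Write a key only the first time we see it: that is its most
            // recent version. Older shadowed versions are skipped, so each
            // key hits the DB at most once (no write amplification).
            if (seenKeys.insert(entry.first).second)
            {
                written.push_back(entry);
            }
        }
    }
    return written;
}
```

In the reverse-order scheme the same key may be written many times, with later (newer) writes overwriting earlier ones; here the set trades memory for a single write per key.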

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

@SirTyson SirTyson force-pushed the bl-bucket-apply branch 2 times, most recently from 4e16098 to cf799df on February 14, 2024 20:14
@graydon (Contributor) left a comment

LGTM!

@marta-lokhova (Contributor) left a comment

Curious to see catchup application time after this change!

@@ -49,6 +51,10 @@ class ApplyBucketsWork : public Work

bool mDelayChecked{false};

static uint32_t startingLevel(bool offersOnly);
Contributor
nit on naming: could we rename this parameter to BucketListDBEnabled or something like that? Replaying only offers is basically an implementation detail of BucketListDB, that callers don't need to know about.

Contributor
Also, I think you don't need to pass the offersOnly parameter; both ApplyBucketsWork and BucketApplicator have access to the App's config to decide if the BucketList should be applied in reverse.

@@ -28,6 +28,10 @@ class BucketApplicator
BucketInputIterator mBucketIter;
size_t mCount{0};
std::function<bool(LedgerEntryType)> mEntryTypeFilter;
bool const mNewOffersOnly;
UnorderedSet<LedgerKey>& mSeenKeys;
Contributor
do you need randomized hashing of UnorderedSet?

Contributor Author
I don't think so. This is just storing seen offer keys, so any attack triggering lots of hash collisions would have to be very sophisticated and pretty worthless, as the result would only be a marginally slowed catchup.

Contributor
so.. how about std::unordered_set then? :)

src/bucket/BucketApplicator.cpp (outdated thread, resolved)
}
else
{
mFinishedIndexBucketsWork = true;
Contributor
No need for mFinishedIndexBucketsWork or mSpawnedIndexBucketsWork: just use the pointer returned by addWork to determine if work started/finished.

Contributor
(same applies to mSpawnedAssumeStateWork; would be nice to clean that up too, but no worries if not, since it's outside of the scope of this PR)

// The current size of this set is 1.6 million during BucketApply
// (as of 12/20/23). There's not a great way to estimate this, so
// reserving with some extra wiggle room
mSeenKeys.reserve(2'000'000);
Contributor
thoughts on expected memory consumption here? (keys are pretty small, but would be good to understand why we saw OOM kills in CI with this change)

Contributor Author
It's pretty variable and correlated with high DEX activity rather than BucketList size. Currently it takes about 180 MB. I also tested a period of very high DEX activity in 2018, which consumed 1.15 GB. For replaying all of history, I would estimate the upper bound is around 1.2 GB (I found an expensive period, but I'm not sure it's the absolute max).
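As a back-of-the-envelope check of these numbers (assumed per-key cost, not measured): ~180 MB for ~2 million recorded keys works out to roughly 90 bytes per key, which is plausible once hash-set node and bucket overhead are added to the key data itself.

```cpp
#include <cassert>
#include <cstddef>

// Rough memory estimate for the in-memory seen-keys set. bytesPerKey is
// an assumption (key payload plus unordered-set node/bucket overhead),
// not a measured figure from stellar-core.
double
estimateSetMB(std::size_t keys, std::size_t bytesPerKey)
{
    return static_cast<double>(keys) * static_cast<double>(bytesPerKey) /
           (1024.0 * 1024.0);
}
```

With keys = 2'000'000 and an assumed 90 bytes per key, this gives roughly 172 MB, consistent with the ~180 MB observed.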

Contributor Author
Looks like the peak memory usage of core on any given range is 3.8 GB during catchup (total, not just the offer apply index).

mFirstBucket.reset();
mSecondBucket.reset();
mFirstBucketApplicator.reset();
mSecondBucketApplicator.reset();
Contributor
mFinishedIndexBucketsWork and mSpawnedIndexBucketsWork aren't reset here; assuming those will go away, you'd need to reset Work pointer instead.

if (!mSpawnedAssumeStateWork)
// This status string only applies to step 2 when we actually apply the
// buckets.
if (mFinishedIndexBucketsWork && !mSpawnedAssumeStateWork)
Contributor
Should we log something meaningful about indexing?

Contributor Author
I don't think we need to, as IndexBucketsWork already prints logs regarding indexing progress.

@@ -174,31 +230,54 @@ ApplyBucketsWork::startLevel()
auto& level = getBucketLevel(mLevel);
HistoryStateBucket const& i = mApplyState.currentBuckets.at(mLevel);

bool applySnap = (i.snap != binToHex(level.getSnap()->getHash()));
bool applyCurr = (i.curr != binToHex(level.getCurr()->getHash()));
bool applyFirst = mOffersOnly
Contributor
All the new ternary operators are getting a bit hard to follow. That aside, this function applying the whole level has always been kind of strange: is there actually a reason to apply the whole level and keep track of snap and curr? Alternatively, we could just have an iterator of buckets in the correct order of application, and doWork could pick up one bucket at a time to process; that way I think we don't need to track snap and curr at all. This does require quite a bit of refactoring though, so maybe we can at least open an issue to clean this up?

Contributor Author

Yeah, I agree that this is worth doing but should happen post merge. I've opened an issue for it here #4290.
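The refactor discussed in this thread could look roughly like the sketch below (hypothetical names, not the actual stellar-core types): flatten the BucketList into one sequence of buckets in application order up front, so doWork consumes one bucket per step and never tracks curr/snap per level.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only.
struct Bucket
{
    std::string hash;
};

struct Level
{
    Bucket curr; // within a level, curr is newer than snap
    Bucket snap;
};

// Build the full application-order sequence once; callers then just
// iterate it one bucket at a time instead of reasoning about levels.
std::vector<Bucket>
bucketsInApplyOrder(std::vector<Level> const& levels)
{
    std::vector<Bucket> order;
    for (auto const& level : levels) // levels[0] is the newest level
    {
        order.push_back(level.curr);
        order.push_back(level.snap);
    }
    return order;
}
```

This is only a sketch of the shape of the idea; the real ordering and skip logic (e.g. unchanged buckets) would live in the refactor tracked by the issue above.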

@marta-lokhova (Contributor) left a comment

thanks for the updates, LGTM! Just a minor code cleanup request. Please squash commits also.

if (!mSpawnedAssumeStateWork)
// Step 1: index buckets. Step 2: apply buckets. Step 3: assume state
bool isUsingBucketListDB = mApp.getConfig().isUsingBucketListDB();
if (isUsingBucketListDB)
Contributor
nit: I'd prefer if we could write this flow such that it's consistent with Work usage across the codebase:

if (!mIndexBucketsWork)
{
    // Spawn indexing work for the first time
    mIndexBucketsWork = addWork<IndexBucketsWork>(mBucketsToIndex);
    return State::WORK_RUNNING;
}
else if (mIndexBucketsWork->getState() != BasicWork::State::WORK_SUCCESS)
{
    // Exit early if indexing work is still running, or failed
    return mIndexBucketsWork->getState();
}

// Otherwise, continue with next steps

// If indexing has not finished or finished and failed, return result
// status. If indexing finished and succeeded, move on and spawn assume
// state work.
auto status = checkChildrenStatus();
Contributor
Please don't use checkChildrenStatus to determine the state of mIndexBucketsWork, as this parent work might have other children, which would cause the function to return unexpected results. To query state of each individual work, use that work's getState method.

@SirTyson (Contributor Author)

> thanks for the updates, LGTM! Just a minor code cleanup request. Please squash commits also.

Done!

@marta-lokhova (Contributor)

r+ 004a4ba

@latobarita latobarita merged commit 25525d4 into stellar:master Apr 19, 2024
15 checks passed