Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove bucketlist shadows, avoid resolving snapshots on publish #2205

Merged

Conversation

marta-lokhova
Copy link
Contributor

This PR implements CAP-0025, and makes a few changes to how we publish and catchup. Main change is shadow removal from the bucketlist, which changes how buckets are merged (and hence, producing different buckets, which affects ledger header hash).

This change implies that on publish, it is no longer necessary to resolve future buckets. Moreover, it is not even necessary to store future buckets in half-merged state, since bucketlist can fully recover the merges by just using its currs and snaps.

This change also implements the upgrade mechanism. Specifically, upon the upgrade (which will be properly timed), in-progress merges are dropped, and new-style merges are started.

@marta-lokhova marta-lokhova force-pushed the avoid_resolving_buckets_on_publish branch from 503620b to 2b9cdfd Compare July 24, 2019 17:04
Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to need to revisit this (will try again tomorrow) but a preliminary scan looks plausible! I'm a little confused about the need to do the upgrade on a specific ledger boundary: why can't we just start doing new merges (after the upgrade) with the new logic? buckets all have their protocol version number written into them now so you can tell with a given input where it originated. I must not be getting something about this upgrade vs. the last one (which we didn't need to coordinate with a specific merge boundary).

src/catchup/ApplyBucketsWork.cpp Show resolved Hide resolved
src/catchup/ApplyBucketsWork.cpp Outdated Show resolved Hide resolved
src/bucket/FutureBucket.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
@MonsieurNicolas MonsieurNicolas added this to In progress in v12.0.0 via automation Aug 9, 2019
Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a quite a bit of churn in many places... I think I agree with what you told me offline that the better (and much simpler) solution might be to use bucket versions and just change FutureBucket::startMerge to not populate shadows if either input bucket is at protocol 12 or higher.

}

BasicWork::State
ResolveSnapshotWork::onRun()
{
if (mSnapshot->mLocalState.futuresAllClear())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need to special case the "all clear" case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like futuresAllResolved already accounts for "all clear", so I think we don't need it. Removed.

@@ -748,6 +752,13 @@ Upgrades::applyVersionUpgrade(AbstractLedgerTxn& ltx, uint32_t newVersion)
{
prepareLiabilities(ltx, header);
}

if (header.current().ledgerVersion >= 12 && prevVersion < 12)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be consistent: use Bucket::FIRST_PROTOCOL_SHADOWS_REMOVED

if (clearFutureBuckets)
{
// Future buckets aren't needed in the archives
// post protocol 12.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Bucket::FIRST_PROTOCOL_SHADOWS_REMOVED not 12

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, it's quite verbose that way. "Protocol 10" and "Protocol 11" are used consistently in the comments/tests throughout the codebase, I don't see how this is any different.

@@ -106,7 +106,7 @@ class LedgerManager

// return the HAS that corresponds to the last closed ledger as persisted in
// the database
virtual HistoryArchiveState getLastClosedLedgerHAS() = 0;
virtual HistoryArchiveState getPersistentHAS() = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the rename?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are two functions with very similar names, which are easy to confuse: HistoryManager::getLastClosedHistoryArchiveState and LedgerManager::getLastClosedLedgerHAS. The former returns HAS from an in-memory LCL, the latter queries the database for whatever last HAS was saved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LastClosedLedger is used all over the code, calling it persistent makes it less clear on what it is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see now what you're saying. We should remove the method in HistoryManager instead

src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
@marta-lokhova marta-lokhova force-pushed the avoid_resolving_buckets_on_publish branch 3 times, most recently from 8bb498a to 4eabb0e Compare August 23, 2019 22:39
Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of time for today, but definitely liking the look of this! A few nits that are largely stylistic, plus one worrying thing concerning the semantic blurring happening between maxProtocolVersion and protocolVersion. Thanks for keeping at it, I'll try to return to it next week.

src/catchup/ApplyBucketsWork.cpp Show resolved Hide resolved
src/catchup/ApplyBucketsWork.cpp Outdated Show resolved Hide resolved
src/history/HistoryArchive.cpp Outdated Show resolved Hide resolved
src/ledger/LedgerManagerImpl.cpp Outdated Show resolved Hide resolved
src/main/ApplicationUtils.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@MonsieurNicolas MonsieurNicolas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I think there is an opportunity for simplification:
right now the logic to deal with new/old merge is scattered around the code by passing around booleans/state information, a better approach might be to see if we can instead derive the behavior from information we already have (such as presence or absence of shadows, or version in bucket).

We may be able to perform additional checks (invariants) by performing changes similar to what you did, but it may not be worth it; instead, it may be better to just have a new "checkInvariants" method at the bucket list level.

// We only ever write fully-resolved HASs to files, when making
// checkpoints. This may change in the future if we start publishing
// input-only HASs.
assert(futuresAllResolved());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this check be performed conditionally? We still need to write those like that in protocol 11 and below

if (!shadows.empty())
{
throw std::runtime_error(
"Protocol 12 or later does not support shadows");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

12 or FIRST_PROTOCOL_SHADOWS_REMOVED ? pick one in the code base

}
else
{
++mc.mPreShadowRemovalProtocolMerges;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need those counters? if it's for tests, maybe it's better to add them when you add tests

src/bucket/BucketList.cpp Show resolved Hide resolved
@@ -145,6 +149,35 @@ ApplyBucketsWork::startLevel()
BasicWork::State
ApplyBucketsWork::onRun()
{
if (mResolveMerges && mDelayTimer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this should just be a different Work? I don't see why merges should be managed with Apply as they are unrelated.
Ideally what we can do is:

  1. download & verify buckets
  2. in parallel
    a. resolve merges
    b. apply buckets

so the way we can do this is simply:

  1. download & verify buckets
  2. start merges
  3. apply buckets
  4. wait for merges to be completed

@@ -106,12 +107,27 @@ CatchupWork::alreadyHaveBucketsHistoryArchiveState(uint32_t atCheckpoint) const
WorkSeqPtr
CatchupWork::downloadApplyBuckets()
{
// If first ledger we're going to replay is of older version (before shadow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(related to the other comment: we should probably move this out of the "apply buckets" method)

@@ -106,12 +107,27 @@ CatchupWork::alreadyHaveBucketsHistoryArchiveState(uint32_t atCheckpoint) const
WorkSeqPtr
CatchupWork::downloadApplyBuckets()
{
// If first ledger we're going to replay is of older version (before shadow
// removal), it'll perform old-style merges
auto ledgerVersion = mVerifiedLedgerRangeStart.header.ledgerVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just always wait for merges regardless of protocol version? The danger of blocking the main thread exists also in the current version of the protocol, right?

src/history/HistoryArchive.cpp Outdated Show resolved Hide resolved
@@ -251,7 +242,18 @@ HistoryManagerImpl::maybeQueueHistoryCheckpoint()
void
HistoryManagerImpl::queueCurrentHistory()
{
auto has = getLastClosedHistoryArchiveState();
// Subtle: depending on the protocol version,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

related to my previous comment: I don't see why we need this here.
Basically: to queue things up we should never have to wait.

Later (actual publish), we can tell if we need to wait in ResolveSnapshotWork directly by for example seeing that we have shadows or not.

src/main/Config.cpp Show resolved Hide resolved
@marta-lokhova
Copy link
Contributor Author

Still got a version-related question/concern, to discuss offline

@marta-lokhova marta-lokhova force-pushed the avoid_resolving_buckets_on_publish branch 2 times, most recently from 9dde0eb to a05acb0 Compare September 12, 2019 23:29
src/bucket/Bucket.cpp Show resolved Hide resolved
{
auto& level = currentBuckets[i];
auto& prev = currentBuckets[i - 1];
Hash emptyHash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const (emptyHash)

src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very close! Minor cosmetic changes for the most part, I reviewed 22/30 files, didn't get to tests yet. Will review tests this evening.

src/history/HistoryArchive.cpp Outdated Show resolved Hide resolved
src/history/HistoryArchive.cpp Outdated Show resolved Hide resolved
src/catchup/CatchupWork.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/bucket/BucketList.cpp Outdated Show resolved Hide resolved
src/catchup/ApplyBucketsWork.cpp Show resolved Hide resolved
@marta-lokhova marta-lokhova force-pushed the avoid_resolving_buckets_on_publish branch 2 times, most recently from c077255 to 3da7b56 Compare September 14, 2019 03:11
Copy link
Contributor

@graydon graydon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more "comments, clarifications, slight reorganizations" in a few of the tests, but overall looking quite nearly done!

Thank you for all the patience and iteration: as you've seen every one of these bucket logic changes multiplies the overall subsystem complexity a lot, so I really appreciate that you've whittled this change down to its core and tested it thoroughly.

src/bucket/test/BucketTests.cpp Outdated Show resolved Hide resolved
src/bucket/test/BucketTests.cpp Outdated Show resolved Hide resolved
src/bucket/test/BucketManagerTests.cpp Outdated Show resolved Hide resolved
src/history/test/HistoryTests.cpp Outdated Show resolved Hide resolved
src/history/test/HistoryTests.cpp Outdated Show resolved Hide resolved
src/history/test/HistoryTests.cpp Show resolved Hide resolved
src/history/test/HistoryTests.cpp Show resolved Hide resolved
src/herder/test/UpgradesTests.cpp Outdated Show resolved Hide resolved
@marta-lokhova marta-lokhova force-pushed the avoid_resolving_buckets_on_publish branch from 3da7b56 to 119b651 Compare September 16, 2019 19:25
@graydon
Copy link
Contributor

graydon commented Sep 16, 2019

@marta-lokhova Thanks for all the patient editing! Looks great. @MonsieurNicolas Anything left on your end?

@MonsieurNicolas
Copy link
Contributor

r+ 119b651

latobarita added a commit that referenced this pull request Sep 16, 2019
…n_publish

Remove bucketlist shadows, avoid resolving snapshots on publish

Reviewed-by: MonsieurNicolas
@latobarita latobarita merged commit 119b651 into stellar:master Sep 17, 2019
v12.0.0 automation moved this from In progress to Done Sep 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
No open projects
v12.0.0
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

6 participants