Remove bucketlist shadows, avoid resolving snapshots on publish #2205
Conversation
Force-pushed from 503620b to 2b9cdfd
I'm going to need to revisit this (will try again tomorrow), but a preliminary scan looks plausible! I'm a little confused about the need to do the upgrade on a specific ledger boundary: why can't we just start doing new merges (after the upgrade) with the new logic? Buckets all have their protocol version written into them now, so you can tell where a given input originated. I must not be getting something about this upgrade vs. the last one (which we didn't need to coordinate with a specific merge boundary).
There is quite a bit of churn in many places. I think I agree with what you told me offline that the better (and much simpler) solution might be to use bucket versions and just change FutureBucket::startMerge to not populate shadows if either input bucket is at protocol 12 or higher.
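The suggested simplification might be sketched as follows. This is a hedged stand-in, not the real stellar-core code: the `Bucket` struct, `shadowsForMerge` helper, and `FIRST_PROTOCOL_SHADOWS_REMOVED` constant here are simplified assumptions that only model the idea of gating shadows on the input buckets' recorded protocol versions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption: each bucket records the protocol version it was
// written under, so a merge can tell where its inputs originated.
struct Bucket
{
    uint32_t mProtocolVersion;
};

constexpr uint32_t FIRST_PROTOCOL_SHADOWS_REMOVED = 12;

// Sketch of the suggested FutureBucket::startMerge change: drop the
// shadow set entirely when either input bucket is at protocol 12 or
// higher, instead of threading extra booleans through the code.
inline std::vector<Bucket const*>
shadowsForMerge(Bucket const& curr, Bucket const& snap,
                std::vector<Bucket const*> const& proposedShadows)
{
    bool newStyle =
        curr.mProtocolVersion >= FIRST_PROTOCOL_SHADOWS_REMOVED ||
        snap.mProtocolVersion >= FIRST_PROTOCOL_SHADOWS_REMOVED;
    if (newStyle)
    {
        return {}; // post-shadow-removal merges take no shadows
    }
    return proposedShadows; // legacy merges keep their shadows
}
```

The appeal of this shape is that the decision is derived from data the merge already has (the input buckets), so no upgrade flag needs to be passed around.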
}

BasicWork::State
ResolveSnapshotWork::onRun()
{
    if (mSnapshot->mLocalState.futuresAllClear())
why do we need to special case the "all clear" case?
Looks like futuresAllResolved already accounts for "all clear", so I think we don't need it. Removed.
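A minimal model of why the special case is redundant. The `FutureState` enum and both predicates below are hypothetical simplifications, not the real stellar-core definitions; they just capture that an "all clear" state (no futures at all) trivially satisfies the "all resolved" check.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-level future state: cleared (no future), live
// (merge still running), or resolved (merge finished).
enum class FutureState
{
    Cleared,
    Live,
    Resolved
};

// "All clear": no level carries any future at all.
inline bool
futuresAllClear(std::vector<FutureState> const& futures)
{
    for (auto f : futures)
    {
        if (f != FutureState::Cleared)
        {
            return false;
        }
    }
    return true;
}

// "All resolved": no level carries a still-live future. Every
// all-clear state satisfies this too, so a separate check is
// unnecessary.
inline bool
futuresAllResolved(std::vector<FutureState> const& futures)
{
    for (auto f : futures)
    {
        if (f == FutureState::Live)
        {
            return false;
        }
    }
    return true;
}
```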
src/herder/Upgrades.cpp (outdated)
@@ -748,6 +752,13 @@ Upgrades::applyVersionUpgrade(AbstractLedgerTxn& ltx, uint32_t newVersion)
    {
        prepareLiabilities(ltx, header);
    }

    if (header.current().ledgerVersion >= 12 && prevVersion < 12)
be consistent: use Bucket::FIRST_PROTOCOL_SHADOWS_REMOVED
src/history/HistoryArchive.cpp (outdated)
    if (clearFutureBuckets)
    {
        // Future buckets aren't needed in the archives
        // post protocol 12.
Use Bucket::FIRST_PROTOCOL_SHADOWS_REMOVED, not 12.
Hmm, it's quite verbose that way. "Protocol 10" and "Protocol 11" are used consistently in the comments/tests throughout the codebase; I don't see how this is any different.
src/ledger/LedgerManager.h (outdated)
@@ -106,7 +106,7 @@ class LedgerManager

    // return the HAS that corresponds to the last closed ledger as persisted in
    // the database
-   virtual HistoryArchiveState getLastClosedLedgerHAS() = 0;
+   virtual HistoryArchiveState getPersistentHAS() = 0;
why the rename?
There are two functions with very similar names, which are easy to confuse: HistoryManager::getLastClosedHistoryArchiveState and LedgerManager::getLastClosedLedgerHAS. The former returns the HAS from an in-memory LCL; the latter queries the database for whatever last HAS was saved.
LastClosedLedger is used all over the code; calling it "persistent" makes it less clear what it is.
Oh, I see now what you're saying. We should remove the method in HistoryManager instead.
Force-pushed from 8bb498a to 4eabb0e
Out of time for today, but definitely liking the look of this! A few nits that are largely stylistic, plus one worrying thing concerning the semantic blurring happening between maxProtocolVersion and protocolVersion. Thanks for keeping at it; I'll try to return to it next week.
In general, I think there is an opportunity for simplification. Right now the logic to deal with new/old merges is scattered around the code by passing around booleans and state information; a better approach might be to see if we can instead derive the behavior from information we already have (such as the presence or absence of shadows, or the version in the bucket).
We may be able to perform additional checks (invariants) by making changes similar to what you did, but it may not be worth it; instead, it may be better to just have a new "checkInvariants" method at the bucket-list level.
    // We only ever write fully-resolved HASs to files, when making
    // checkpoints. This may change in the future if we start publishing
    // input-only HASs.
    assert(futuresAllResolved());
Shouldn't this check be performed conditionally? We still need to write partially-resolved HASs like that in protocol 11 and below.
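The conditional form being suggested might look like the following sketch. The helper names and the standalone `FIRST_PROTOCOL_SHADOWS_REMOVED` constant are assumptions for illustration; the real check would live in the HAS save path.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t FIRST_PROTOCOL_SHADOWS_REMOVED = 12;

// Sketch: pre-12 checkpoints may legitimately contain unresolved
// futures, so the "fully resolved" invariant should only be
// enforced from the shadow-removal protocol onward.
inline bool
mustWriteFullyResolved(uint32_t protocolVersion)
{
    return protocolVersion >= FIRST_PROTOCOL_SHADOWS_REMOVED;
}

inline void
checkBeforeSave(uint32_t protocolVersion, bool futuresAllResolved)
{
    if (mustWriteFullyResolved(protocolVersion))
    {
        assert(futuresAllResolved);
    }
    // protocol 11 and below: partially-resolved HASs are allowed
}
```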
src/bucket/Bucket.cpp (outdated)
    if (!shadows.empty())
    {
        throw std::runtime_error(
            "Protocol 12 or later does not support shadows");
12 or FIRST_PROTOCOL_SHADOWS_REMOVED? Pick one in the codebase.
src/bucket/Bucket.cpp (outdated)
    }
    else
    {
        ++mc.mPreShadowRemovalProtocolMerges;
Why do we need those counters? If it's for tests, maybe it's better to add them when you add the tests.
@@ -145,6 +149,35 @@ ApplyBucketsWork::startLevel()
BasicWork::State
ApplyBucketsWork::onRun()
{
    if (mResolveMerges && mDelayTimer)
Maybe this should just be a different Work? I don't see why merges should be managed with ApplyBucketsWork, as they are unrelated.
Ideally what we can do is:
- download & verify buckets
- in parallel:
  a. resolve merges
  b. apply buckets
So the way we can do this is simply:
- download & verify buckets
- start merges
- apply buckets
- wait for merges to be completed
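The sequencing proposed above can be sketched with plain std::async. The three stage functions are placeholders, not the real Work classes; the point is only the ordering: start merges right after download, apply buckets while they run, and block on the merges only at the end.

```cpp
#include <cassert>
#include <future>

// Placeholder stages standing in for the real Work items.
int
downloadAndVerifyBuckets()
{
    return 10; // pretend we fetched 10 buckets
}

void
resolveMerges()
{
    // pretend this resolves in-progress merges
}

int
applyBuckets(int nBuckets)
{
    return nBuckets * 2; // pretend-apply, returns entries applied
}

// Sketch of the suggested ordering.
int
catchupSequence()
{
    int buckets = downloadAndVerifyBuckets(); // 1. download & verify
    auto merges =
        std::async(std::launch::async, resolveMerges); // 2. start merges
    int applied = applyBuckets(buckets);               // 3. apply buckets
    merges.wait(); // 4. wait for merges to be completed
    return applied;
}
```

This keeps merge resolution off the critical path of bucket application while still guaranteeing merges are done before the sequence reports success.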
src/catchup/CatchupWork.cpp (outdated)
@@ -106,12 +107,27 @@ CatchupWork::alreadyHaveBucketsHistoryArchiveState(uint32_t atCheckpoint) const
WorkSeqPtr
CatchupWork::downloadApplyBuckets()
{
    // If first ledger we're going to replay is of older version (before shadow
(related to the other comment: we should probably move this out of the "apply buckets" method)
src/catchup/CatchupWork.cpp (outdated)
@@ -106,12 +107,27 @@ CatchupWork::alreadyHaveBucketsHistoryArchiveState(uint32_t atCheckpoint) const
WorkSeqPtr
CatchupWork::downloadApplyBuckets()
{
    // If first ledger we're going to replay is of older version (before shadow
    // removal), it'll perform old-style merges
    auto ledgerVersion = mVerifiedLedgerRangeStart.header.ledgerVersion;
why not just always wait for merges regardless of protocol version? The danger of blocking the main thread exists also in the current version of the protocol, right?
src/history/HistoryManagerImpl.cpp (outdated)
@@ -251,7 +242,18 @@ HistoryManagerImpl::maybeQueueHistoryCheckpoint()
void
HistoryManagerImpl::queueCurrentHistory()
{
    auto has = getLastClosedHistoryArchiveState();
    // Subtle: depending on the protocol version,
Related to my previous comment: I don't see why we need this here. Basically, to queue things up we should never have to wait. Later (at actual publish), we can tell whether we need to wait in ResolveSnapshotWork directly, for example by seeing whether we have shadows or not.
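The idea of deriving the need to wait from the queued state itself, rather than from a protocol flag, might look like this hypothetical sketch; the `HASLevel` struct and `needsResolution` helper are simplified assumptions, not the real HistoryArchiveState API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified HAS level: it either carries a live
// in-progress merge (an unresolved "next" future) or it does not.
struct HASLevel
{
    bool hasLiveFuture;
};

// Publish-time check: a ResolveSnapshotWork-style wait is only
// needed if the queued HAS still carries unresolved futures. No
// protocol-version check is needed at queue time.
inline bool
needsResolution(std::vector<HASLevel> const& levels)
{
    for (auto const& level : levels)
    {
        if (level.hasLiveFuture)
        {
            return true;
        }
    }
    return false;
}
```

With this shape, queueing stays unconditional and cheap, and the publish path inspects the snapshot it was handed to decide whether any waiting is required.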
Force-pushed from 4eabb0e to c4fd83a
Still got a version-related question/concern, to discuss offline.
Force-pushed from 9dde0eb to a05acb0
src/history/HistoryArchive.cpp (outdated)
    {
        auto& level = currentBuckets[i];
        auto& prev = currentBuckets[i - 1];
        Hash emptyHash;
Make emptyHash const.
Very close! Minor cosmetic changes for the most part, I reviewed 22/30 files, didn't get to tests yet. Will review tests this evening.
Force-pushed from c077255 to 3da7b56
A few more "comments, clarifications, slight reorganizations" in a few of the tests, but overall looking quite nearly done!
Thank you for all the patience and iteration: as you've seen, every one of these bucket logic changes multiplies the overall subsystem complexity a lot, so I really appreciate that you've whittled this change down to its core and tested it thoroughly.
Force-pushed from 3da7b56 to 119b651
@marta-lokhova Thanks for all the patient editing! Looks great. @MonsieurNicolas Anything left on your end?
r+ 119b651
…n_publish Remove bucketlist shadows, avoid resolving snapshots on publish Reviewed-by: MonsieurNicolas
This PR implements CAP-0025 and makes a few changes to how we publish and catch up. The main change is removing shadows from the bucketlist, which changes how buckets are merged (and hence produces different buckets, which affects the ledger header hash).
This change implies that on publish it is no longer necessary to resolve future buckets. Moreover, it is not even necessary to store future buckets in a half-merged state, since the bucketlist can fully recover the merges using just its currs and snaps.
This change also implements the upgrade mechanism. Specifically, upon the upgrade (which will be properly timed), in-progress merges are dropped and new-style merges are started.
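The upgrade behavior described above can be modeled with a minimal sketch. The `PendingMerge` struct and `applyShadowRemovalUpgrade` function are hypothetical illustrations of "drop in-progress merges and restart them new-style", not the actual stellar-core upgrade code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t FIRST_PROTOCOL_SHADOWS_REMOVED = 12;

// Hypothetical in-progress merge: records the protocol it was
// started under and whether it was seeded with shadows.
struct PendingMerge
{
    uint32_t startedAtProtocol;
    bool usesShadows;
};

// At the upgrade boundary, an old-style (shadowed) merge is
// discarded and restarted without shadows under the new protocol;
// merges already in the new style are left alone.
inline PendingMerge
applyShadowRemovalUpgrade(PendingMerge merge, uint32_t newLedgerVersion)
{
    if (newLedgerVersion >= FIRST_PROTOCOL_SHADOWS_REMOVED &&
        merge.usesShadows)
    {
        return PendingMerge{newLedgerVersion, false}; // restart new-style
    }
    return merge; // unaffected
}
```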