
Publish throttles catchup #2292

Merged
merged 3 commits into stellar:master on Oct 11, 2019

Conversation

marta-lokhova (Contributor):

Update the offline catchup flow to avoid creating a huge backlog of checkpoints to publish.

auto done = lm.getLastClosedLedgerNum() == mCheckpointRange.mLast;
auto lcl = mApp.getLedgerManager().getLastClosedLedgerNum();

// In case we just closed a ledger that unblocked publishing of the

Contributor:

why do we wait at all if we're done applying the checkpoint?

if (result && mWaitForPublish)
{
auto ledger = hm.prevCheckpointLedger(lcl);
if (ledger == lcl && hm.publishQueueLength() > 0)

Contributor:

isn't the condition on the size of the publish queue enough?

Contributor:

now that I think of it we should move this code into a precondition and use ConditionalWork to wrap ApplyCheckpointWork

Contributor Author:

> isn't the condition on the size of the publish queue enough?

The sequence of events is as follows: the last ledger in a checkpoint is closed, triggering a snapshot being queued for publish. ResolveSnapshotWork then waits for the next ledger (i.e. the first ledger of the next checkpoint) to be closed before it can proceed. So the "publish queue is not empty" condition is not enough here: we also need to close that next ledger to unblock ResolveSnapshotWork before going into the WORK_WAITING state.
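
A minimal sketch of that composite gate, restated as a standalone helper. The helper is hypothetical and not part of the PR; it only spells out the two clauses from the hunk quoted above, using the HistoryManager interface already shown there (include path assumed):

#include "history/HistoryManager.h" // assumed include path

// Hypothetical helper: should ApplyCheckpointWork pause for publishing?
static bool
shouldWaitForPublish(stellar::HistoryManager& hm, uint32_t lcl)
{
    // A snapshot was queued when the last ledger of the previous checkpoint
    // closed, so there is still something left to publish...
    bool snapshotQueued = hm.publishQueueLength() > 0;

    // ...and the ledger we just closed appears to be the one whose close
    // unblocks ResolveSnapshotWork (the first ledger of the new checkpoint),
    // which is what the quoted `prevCheckpointLedger(lcl) == lcl` check and
    // the truncated comment in the hunk above point at.
    bool publishJustUnblocked = hm.prevCheckpointLedger(lcl) == lcl;

    return snapshotQueued && publishJustUnblocked;
}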

> now that I think of it we should move this code into a precondition and use ConditionalWork to wrap ApplyCheckpointWork

Right, I think initially this was a bit tricky to do, since I did not allow any drift between catchup and publish. But with the min/max queue sizes to trigger waiting/applying that you suggested in the other comment, this should be cleaner to do with ConditionalWork: since we would be waiting for a checkpoint far enough in the past to complete publishing, I think it would not hit the ResolveSnapshotWork problem I mentioned above.
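
As a rough illustration of that direction (the lambda and constant names are assumptions, and the exact ConditionFn signature that ConditionalWork expects is not reproduced here), the precondition boils down to "the publish backlog has drained enough":

// Sketch only: the condition under which a ConditionalWork wrapper would let
// ApplyCheckpointWork run. kPublishQueueUnblockApply is a hypothetical
// constant; see the threshold discussion further down.
auto& hm = mApp.getHistoryManager();
auto publishDrainedEnough = [&hm]() {
    // Apply the next checkpoint only once checkpoints queued far enough in
    // the past have finished publishing, i.e. the backlog is small again.
    return hm.publishQueueLength() <= kPublishQueueUnblockApply;
};

Because the checkpoints being published are strictly older than the one about to be applied, this condition should not run into the ResolveSnapshotWork ordering issue described above.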

@@ -56,14 +56,16 @@ class ApplyCheckpointWork : public BasicWork
medida::Meter& mApplyLedgerFailure;

bool mFilesOpen{false};
bool const mWaitForPublish;
std::unique_ptr<VirtualTimer> mPublishTimer;

Contributor:

technically it's really mWaitForPublishQueueTimer

@@ -137,6 +140,13 @@ ApplyCheckpointWork::applyHistoryOfSingleLedger()
return false;
}

if (header.ledgerSeq > mLedgerRange.mLast)

Contributor:

unless you have a bug, this should never happen

Contributor:

Oh I see: that's because you're calling applyHistoryOfSingleLedger when coming back from a wait for publish even though you may be done (see my other comment)

auto& hm = mApp.getHistoryManager();
if (mPublishTimer)
{
if (hm.publishQueueLength() > 0)

Contributor:

you should use constants instead of 0 in both locations, and take the opportunity to define the two thresholds (0 is too aggressive, I think):

  • increase the size of the queue needed to trigger a wait - let's say 32
  • define the size of the queue to unblock applying ledgers - let's say 16 for now
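
Those two thresholds could be expressed as named constants; a minimal sketch with assumed names (the values are the ones proposed above):

#include <cstddef>

// Backlog size at which catchup stops applying and waits for publishing.
constexpr std::size_t kPublishQueueMaxSize = 32;
// Backlog size at which applying ledgers is unblocked again.
constexpr std::size_t kPublishQueueUnblockApply = 16;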

Contributor Author:

Seems like the way ConditionalWork is implemented, it would require changing state inside the lambda to know whether to wait or apply when the publish queue size is between 16 and 32 (which I think is quite error-prone and also doesn't comply with the ConditionFn contract). I added the lower boundary of 16 for now to preserve the const-ness of the lambda; what do you think?
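
A sketch of the stateless variant described here, reusing the assumed names from the sketches above; with a single threshold and a const lambda there is no hysteresis, so any backlog above the lower bound simply waits:

// Const, stateless condition: apply only while the backlog is at or below
// the lower threshold; otherwise report "not ready" and keep waiting.
auto condition = [&hm]() {
    return hm.publishQueueLength() <= kPublishQueueUnblockApply;
};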

Contributor:

What do you mean by changing state inside the lambda, @marta-lokhova? You mean the conditional is stateful? I don't think that is a problem: we do this all the time when we capture a pointer to a class.
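
For completeness, a sketch of the stateful variant being suggested: the lambda captures a small shared state object (analogous to capturing a pointer to the owning class) and implements the 16/32 hysteresis. All names are illustrative, not from the PR:

#include <memory>

struct PublishGateState
{
    bool mDraining{false}; // true while waiting for the backlog to drain
};

auto state = std::make_shared<PublishGateState>();
auto condition = [&hm, state]() {
    auto len = hm.publishQueueLength();
    if (len >= kPublishQueueMaxSize)
    {
        state->mDraining = true; // backlog too large: stop applying
    }
    else if (len <= kPublishQueueUnblockApply)
    {
        state->mDraining = false; // drained enough: resume applying
    }
    // Between the two thresholds, keep doing whatever we were doing last.
    return !state->mDraining;
};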

@MonsieurNicolas added this to In progress in v12.1.0 via automation on Oct 10, 2019

@MonsieurNicolas (Contributor):

r+ 4cec6af

latobarita added a commit that referenced this pull request Oct 11, 2019
Publish throttles catchup

Reviewed-by: MonsieurNicolas
@latobarita merged commit 4cec6af into stellar:master on Oct 11, 2019
v12.1.0 automation moved this from In progress to Done Oct 11, 2019