ingest: Reuse Stellar-Core on-disk DB in online mode #4471

bartekn · 2022-07-21T13:07:43Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md
files, etc... affected by this change). Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

This commit changes the behaviour of stellarCoreRunner when using an on-disk DB in online mode: it now checks whether the existing storage dir contains a DB in a state that allows Captive Core to start without rebuilding Stellar-Core state. In short, it runs the stellar-core offline-info command and checks whether the LCL (last closed ledger) of Stellar-Core matches the requested ledger in startFrom. If the command fails or the ledgers do not match, the storage dir is removed and Captive Core starts from scratch.

Close #4454.

Why

Applying state from buckets is relatively fast in Captive Core's in-memory mode, but it can be extremely slow when using disk. This change allows reusing the existing state in most cases.

Known limitations

[TODO or N/A]

sreuland · 2022-07-22T19:04:53Z

ingest/ledgerbackend/stellar_core_runner.go

+		if err != nil {
+			r.log.Infof("Error running offline-info: %v, removing existing storage-dir contents", err)
+			removeStorageDir = true
+		} else if uint32(info.Info.Ledger.Num) != from {


Do you know how Core maintains info.Info.Ledger.Num, i.e. does it only bump it once it knows the meta record for that sequence was read off the pipe? I'm wondering whether info.Info.Ledger.Num will tend to be further ahead than from, which represents the last sequence that Horizon read off the pipe (and serialized to history). If it drifts asynchronously from the meta pipe reader's activity (Horizon), then this condition won't get hit much, right? The result being that it ends up in the same routine of new-db/catchup?

To the best of my knowledge, and after some experimenting, it seems that Stellar-Core only closes the ledger once it has been read from the meta pipe. This leaves us with two cases:

Horizon is catching up (after a restart or state build): in this case bufferedLedgerMetaReader can read ledgers from the meta pipe ahead of time, which leaves Horizon behind Stellar-Core. If Horizon is stopped with ledgers still in the buffer, the solution in this PR will not work because the ledger sequences will not match on restart. We could try removing bufferedLedgerMetaReader in online mode, but I'm not sure about the performance impact of that change. We can explore it in a separate PR.

Horizon is ingesting the latest ledgers: in this case bufferedLedgerMetaReader will contain at most one ledger, and if Horizon is shut down gracefully it will process this ledger before shutting down.

OK, that's interesting. It means there's only one ledger of data present in that pipe at any time; it sounds like the Core writer blocks until the pipe is empty, which is the signal that the prior ledger was read. In any case, this at least recovers from any out-of-sync case, and the worst outcome is that it does the same as today: a full removal followed by init.

sreuland · 2022-07-22T19:11:57Z

ingest/ledgerbackend/stellar_core_runner.go

-			return errors.Wrap(err, "error initializing core db")
+		// Check if on-disk core DB exists and what's the LCL there. If not what
+		// we need remove storage dir and start from scratch.
+		removeStorageDir := false


might be worthwhile to add a unit test in stellar_core_runner_test.go to assert this new outcome?

Just a quick update on this: I'm working on refactoring stellarCoreRunner to allow writing better unit tests. I'll have a new commit ready by the end of today.

Actually while refactoring I changed some other parts of stellarCoreRunner that seemed inconsistent. Would you mind 👍 this PR (if there is nothing else that requires changes) and I'll open another PR with refactoring and tests?

@sreuland follow up PR: #4480

sreuland

nice solution with minimal coding!

This commit changes the behaviour of `stellarCoreRunner` when using an on-disk DB in online mode to check if existing storage dir contains the DB in a state that allows Captive Core to start without rebuilding Stellar-Core state. In short, it checks (by using `stellar-core offline-info` command) if the LCL of Stellar-Core matches the requested ledger in `startFrom`. This was done because while applying state from buckets was relatively fast in memory mode of Captive Core it can be extremely slow when using disk. This change allows reusing existing state in most cases. Close stellar#4454.

bartekn added 4 commits July 21, 2022 15:06

ingest: Reuse Stellar-Core on-disk DB in online mode

e2aa8ac

Merge branch 'master' into reuse-core-on-disk-db-online-mode

b7f834c

gofmt

5dab5b7

Merge branch 'reuse-core-on-disk-db-online-mode' of github.com:bartek…

e756d13

…n/go into reuse-core-on-disk-db-online-mode

bartekn requested a review from a team July 21, 2022 15:44

bartekn marked this pull request as ready for review July 21, 2022 15:44

bartekn mentioned this pull request Jul 21, 2022

/services/horizon/ingest: captive core on-disk ingestion, optimize catchup times #4454

Closed

sreuland reviewed Jul 22, 2022

sreuland approved these changes Jul 25, 2022

bartekn merged commit 4850d22 into stellar:master Jul 26, 2022

bartekn deleted the reuse-core-on-disk-db-online-mode branch July 26, 2022 10:47

sreuland mentioned this pull request Jul 26, 2022

exp/lighthorizon: Add an on-disk cache for frequently accessed ledgers. #4457

Merged

sreuland mentioned this pull request Aug 8, 2022

Consider adding a flag in Horizon to reset captive core DB #4067

Open
