chaindag: don't keep backfill block table in memory #3429

arnetheduck · 2022-02-22T15:24:50Z

This PR names and documents the concept of the archive: a range of slots
for which we have degraded functionality in terms of historical access -
in particular:

* we don't support rewinding to states in this range
* we don't keep an in-memory representation of the block dag

The archive de facto already exists in a trusted-node-synced node, but this
PR gives it a name and drops the in-memory digest index.

In order to satisfy `GetBlocksByRange` requests, we ensure that we have
blocks for the entire archive period via backfill. Future versions may
relax this further, adding a "pre-archive" period that is fully pruned.
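As a rough illustration of why the backfill has to cover the whole archive, here is a minimal Python sketch of a by-range lookup. It is not the nimbus code: `blocks_by_range`, the `archive` dict and `backfill_slot` are hypothetical names, and rejecting a request that starts before the backfill point is an assumption made for the example.

```python
from typing import Dict, List


def blocks_by_range(
    archive: Dict[int, bytes],  # hypothetical: slot -> encoded block
    backfill_slot: int,         # hypothetical: earliest slot backfilled so far
    start_slot: int,
    count: int,
) -> List[bytes]:
    """Return the blocks in [start_slot, start_slot + count), oldest first.

    Slots without a block (empty slots) are skipped. Before backfill_slot
    a missing entry is ambiguous: it could be an empty slot or a block that
    has not been downloaded yet, so this sketch refuses such requests.
    """
    if start_slot < backfill_slot:
        raise LookupError("range starts before the backfilled history")
    return [
        archive[slot]
        for slot in range(start_slot, start_slot + count)
        if slot in archive
    ]
```

Once backfill has reached the start of the archive, every missing entry can be read as an empty slot, which is what lets the request be answered in full.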

During by-slot searches in the archive (both for libp2p and REST
requests), an extra database lookup is used to convert the given `slot`
to a `root` - future versions will avoid this using era files, which
are natively indexed by `slot`. That said, the lookup is quite
fast compared to the actual block loading given how trivial the table
is - it's hard to measure, even.
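The two-step lookup can be sketched roughly as follows, using Python's `sqlite3` with an in-memory database. The table names (`finalized_blocks`, `blocks`), the schema and the helper functions are assumptions made for illustration and do not reflect the actual nimbus database layout.

```python
import sqlite3
from typing import Optional


def open_archive() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table layout."""
    db = sqlite3.connect(":memory:")
    # Small slot -> root index: this is the "extra" lookup.
    db.execute(
        "CREATE TABLE finalized_blocks (slot INTEGER PRIMARY KEY, root BLOB NOT NULL)"
    )
    # Block storage keyed by root: this is the expensive load.
    db.execute("CREATE TABLE blocks (root BLOB PRIMARY KEY, data BLOB NOT NULL)")
    return db


def put_block(db: sqlite3.Connection, slot: int, root: bytes, data: bytes) -> None:
    """Store a block and record its root in the slot index."""
    db.execute("INSERT INTO finalized_blocks VALUES (?, ?)", (slot, root))
    db.execute("INSERT INTO blocks VALUES (?, ?)", (root, data))


def root_at_slot(db: sqlite3.Connection, slot: int) -> Optional[bytes]:
    """Step 1: convert a slot to a root, or None if the slot is empty."""
    row = db.execute(
        "SELECT root FROM finalized_blocks WHERE slot = ?", (slot,)
    ).fetchone()
    return row[0] if row else None


def block_at_slot(db: sqlite3.Connection, slot: int) -> Optional[bytes]:
    """Step 2: load the block by the root found in step 1."""
    root = root_at_slot(db, slot)
    if root is None:
        return None
    row = db.execute("SELECT data FROM blocks WHERE root = ?", (root,)).fetchone()
    return row[0] if row else None
```

The index row is only a slot number and a root, so the first query touches far less data than the block load that follows it.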

A collateral benefit of this PR is that checkpoint-synced nodes will see
100-200MB memory usage savings, thanks to the dropped in-memory cache -
future pruning work will bring this benefit to full nodes as well.

* document chaindag storage architecture and assumptions
* look up parent using block id instead of full block in clearance
  (future-proofing the code against a future in which blocks come from era
  files)
* simplify finalized block init, always writing the backfill portion to
  db at startup (to ensure lookups work as expected)
* preallocate some extra memory for finalized blocks, to avoid immediate
  realloc

github-actions · 2022-02-22T16:43:04Z

Unit Test Results

    12 files ±0   821 suites ±0 37m 47s ⏱️ + 5m 15s
1 671 tests ±0 1 625 ✔️ ±0   46 💤 ±0 0 ❌ ±0
9 755 runs ±0 9 655 ✔️ ±0 100 💤 ±0 0 ❌ ±0

Results for commit a109dfc. ± Comparison against base commit 7de3f00.

♻️ This comment has been updated with latest results.


arnetheduck force-pushed the hello-archive branch from 99033c1 to 78d3eab Compare February 22, 2022 17:41

arnetheduck added 2 commits February 23, 2022 09:13

more backfill position tests

fae01cb

fix tail iteration

a109dfc

tersec approved these changes Feb 26, 2022


arnetheduck merged commit 40a4c01 into unstable Feb 26, 2022

arnetheduck deleted the hello-archive branch February 26, 2022 18:16
