Hot/cold block storage #754

Closed
arnetheduck opened this issue Feb 19, 2020 · 16 comments
Labels
bounty good first issue Good for newcomers

Comments

@arnetheduck
Member

We're currently using a key-value store for storing states and blocks. Due to the nature of eth2, when finalization happens a single history of blocks is chosen to be canonical, so it would be efficient to keep the finalized blocks in cold storage: a flat, append-only file.

There are a few ways to design this - an example is keeping two files: one for blocks (which are variable-size) and an index file which contains fixed-size offsets - this would allow random-access to blocks by their block number.
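As a rough illustration of the two-file layout, here is a minimal sketch. The `ColdStore` type, the file names `blocks.dat` and `blocks.idx`, and the choice of native-endian 8-byte start offsets are all assumptions made for the example, not anything taken from the codebase. A block's length is derived from the next index entry, or from the end of the block file for the last block.

```nim
import std/os

type
  ColdStore = object
    blocksPath, indexPath: string   # hypothetical file layout

proc initColdStore(dir: string): ColdStore =
  ## Creates `dir` and empty block/index files if they do not exist yet.
  createDir(dir)
  result = ColdStore(blocksPath: dir / "blocks.dat",
                     indexPath: dir / "blocks.idx")
  for path in [result.blocksPath, result.indexPath]:
    if not fileExists(path):
      writeFile(path, "")

proc appendBlock(s: ColdStore, data: string): int =
  ## Appends `data` to the block file and returns its block number.
  var start = uint64(getFileSize(s.blocksPath))
  result = int(getFileSize(s.indexPath) div 8)
  let blocks = open(s.blocksPath, fmAppend)
  defer: blocks.close()
  blocks.write(data)
  let index = open(s.indexPath, fmAppend)
  defer: index.close()
  # One fixed-size entry per block: its start offset (native endianness).
  if index.writeBuffer(addr start, sizeof(start)) != sizeof(start):
    raise newException(IOError, "short write to index")

proc readBlock(s: ColdStore, number: int): string =
  ## Random access by block number: one seek in the index, one in the data.
  let index = open(s.indexPath, fmRead)
  defer: index.close()
  index.setFilePos(int64(number) * 8)
  var start, stop: uint64
  if index.readBuffer(addr start, 8) != 8:
    raise newException(ValueError, "no such block: " & $number)
  # The next entry marks where this block ends; the last block runs to EOF.
  if index.readBuffer(addr stop, 8) != 8:
    stop = uint64(getFileSize(s.blocksPath))
  let blocks = open(s.blocksPath, fmRead)
  defer: blocks.close()
  blocks.setFilePos(int64(start))
  result = newString(int(stop - start))
  if result.len > 0 and
      blocks.readBuffer(addr result[0], result.len) != result.len:
    raise newException(IOError, "short read from block file")
```

Re-opening the files on every call keeps the sketch short; a real implementation would presumably keep the handles open (or memory-map the files) and pick a fixed byte order for the index.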

It also probably makes sense to store block hashes - these can either go in the block file or a third file containing only hashes.

Another design is to keep offsets in the ordinary key-value store (for example with slot number as key, offset and hash as value) so that in total we have the kvstore and one cold-store file to deal with.
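A sketch of what that kvstore-backed index could look like, under stated assumptions: an in-memory `Table` stands in for the real key-value store, and the byte layout (8-byte big-endian slot as key, 8-byte big-endian offset followed by a 32-byte root as value) is hypothetical. `IndexEntry`, `putEntry` and `getEntry` are names invented for the example.

```nim
import std/[tables, options]

type
  Root = array[32, byte]
  IndexEntry = object
    offset: uint64   # position of the block in the cold-store file
    root: Root       # block hash
  KvStore = Table[seq[byte], seq[byte]]   # stand-in for the real kvstore

proc toBytesBE(x: uint64): array[8, byte] =
  ## Big-endian encoding of `x`.
  for i in 0 ..< 8:
    result[i] = byte((x shr (8 * (7 - i))) and 0xff)

proc fromBytesBE(b: openArray[byte]): uint64 =
  ## Decodes the first 8 bytes of `b` as a big-endian integer.
  for i in 0 ..< 8:
    result = (result shl 8) or uint64(b[i])

proc putEntry(db: var KvStore, slot: uint64, e: IndexEntry) =
  ## Stores offset and root under the slot number.
  var value = @(toBytesBE(e.offset))
  value.add e.root
  db[@(toBytesBE(slot))] = value

proc getEntry(db: KvStore, slot: uint64): Option[IndexEntry] =
  ## Looks up the index entry for `slot`, if any.
  let key = @(toBytesBE(slot))
  if key notin db:
    return none(IndexEntry)
  let value = db[key]
  var e = IndexEntry(offset: fromBytesBE(value.toOpenArray(0, 7)))
  for i in 0 ..< 32:
    e.root[i] = value[8 + i]
  some(e)
```

The fixed 40-byte value keeps every entry the same size, which is the same property the separate index file relies on.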

Finally, the block graph is currently stored in-memory - possibly, this could sit in the database as well, saving memory but increasing database traffic - the tradeoff is not clear here as the block graph is "fairly" light-weight.

@arnetheduck arnetheduck added the good first issue Good for newcomers label Feb 19, 2020
@arnetheduck arnetheduck changed the title Hot/cold storage Hot/cold block storage Feb 19, 2020
@zah zah added the bounty label Feb 25, 2020
@zah
Contributor

zah commented Feb 25, 2020

The API of the cold database should allow us to efficiently memory-map the SSZ representation of a particular block without loading it in memory. We can use SszNavigator objects to extract any data of interest:

https://github.com/status-im/nim-beacon-chain/blob/8ab0248209aba82cfdf6e64dacf2e21753a5a55a/tests/test_ssz.nim#L85-L89

Since Nim cannot safely return openArrays yet, the best way to design the API is to rely on callback closures that will receive the memory-mapped data as an argument:

https://github.com/status-im/nim-beacon-chain/blob/2a67ac3c05859af682994facc36e646a3febc24a/tests/test_kvstore.nim#L32-L34
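For readers who don't want to follow the link, the callback style looks roughly like the sketch below. `MemStore`, `DataProc`, `get` and `byteSum` are hypothetical names for the example, and an in-memory table stands in for the memory-mapped file; in a real cold store the callback would presumably receive a view over the mapped region instead of a `seq`.

```nim
import std/tables

type
  DataProc = proc (data: openArray[byte])
  MemStore = object
    records: Table[string, seq[byte]]   # stand-in for memory-mapped data

proc get(s: MemStore, key: string, onData: DataProc): bool =
  ## Calls `onData` with a view of the stored bytes.
  ## Returns false (without calling `onData`) when the key is missing.
  if key notin s.records:
    return false
  onData(s.records[key])   # the seq is passed as an openArray, not returned
  true

proc byteSum(s: MemStore, key: string): int =
  ## Example consumer: extracts a value from inside the callback.
  var total = 0
  proc onData(data: openArray[byte]) =
    for b in data:
      total += int(b)      # the closure captures `total`
  discard s.get(key, onData)
  total
```

The caller never holds on to the `openArray` itself; anything it needs has to be copied or computed inside the callback, which is what makes the approach safe for mapped memory.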

The description above by @arnetheduck focuses on our needs for storing the history of BeaconBlocks. Please note that we'll also need to store the latest finalized state and potentially periodic snapshots of earlier states. It may be premature to propose designs for this as we're planning to introduce some level of data sharing between different beacon states that may also be used in the on-disk representation.

@disruptek

Here's a simple nimterop wrapper for lmdb. It's pretty great. Golden isn't a super example of its use, honestly. Maybe I'll finish it someday.

Anyway, if you like this API, you can use it to close this issue.

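import os

import nimterop/[build, cimport]

const
  # Directory where the LMDB sources are cached between builds
  baseDir = getProjectCacheDir("nimlmdb")

static:
  #cDebug()

  # Fetch the LMDB sources at compile time
  gitPull(
    "https://github.com/LMDB/lmdb",
    outdir = baseDir,
    checkout = "mdb.master"
  )

# Locate lmdb.h inside the checkout
getHeader(
  "lmdb.h",
  outdir = baseDir / "libraries" / "liblmdb"
)

type
  # lmdb.h refers to mode_t, so declare it before wrapping the header
  mode_t = uint32

# Wrap the header: static linking with -d:lmdbStatic, dynamic otherwise
when defined(lmdbStatic):
  cImport(lmdbPath)
else:
  cImport(lmdbPath, dynlib = "lmdbLPath")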

@arnetheduck
Member Author

arnetheduck commented Mar 1, 2020

see also https://github.com/status-im/nim-beacon-chain/blob/devel/beacon_chain/kvstore_lmdb.nim - we've tried lmdb but it has issues on 32-bit platforms and needs local patching on windows - it's not great for our use case.

we use sqlite for now which also uses mmap if available but something else otherwise.

the point here is though that we don't want a database at all - the nature of the data is such that it's append-only - it allows for a very robust and trivially simple implementation with a flat file and an accompanying flat index - the lmdb btree would be overkill.

@arnetheduck
Member Author

re nimterop, we have a preference not to have it as a dependency for whoever is building the code - see https://github.com/arnetheduck/nim-sqlite3-abi (we've produced wrappers manually as well as with c2nim, for this reason)

@disruptek

It sounds like the best course of action is to let @protolambda tell us when the design is fairly stable and then use it to inform the hot/cold storage approach. It sounds like there may be two layers required; one which is append-only and never requires compaction, and another that is append-only and rarely requires compaction.

But I'm really trying to read between the lines here on something I know nothing at all about. 😉

@arnetheduck
Member Author

This part of the design is stable: the way ethereum 2 works is that once finalization happens, there is not ever any rollback - the blocks that are older than the finalization point form a simple linear history, thus are append-only.

The blocks that are newer than finalization will be accessed randomly by hash - this is why they should be stored in an "ordinary" key/value store to begin with - even if it's likely that they are "almost-linear", we shouldn't make that assumption right now, as it may open up potential for DoS attacks if accessing random non-finalized blocks is not constant time.

for some intuition as to what kind of requests will be made from the database, the networking spec is a good source:
https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyrange
https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#beaconblocksbyroot

the finalized blocks are accessed pretty much by their slot number while non-finalized blocks are accessed randomly - the databases in use should reflect this, storing the former in an append-only file and the latter in... well, they can stay in the KV store for now - there's an upper bound of about two weeks' worth of blocks on how many there can be in the system.
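To make the split concrete, here is a small in-memory sketch of the two access paths. Everything in it is an assumption for illustration: `BlockDb`, `StoredBlock` and the procs are invented names, roots are plain strings, and a `seq` indexed by slot stands in for the append-only file.

```nim
import std/[tables, options]

type
  StoredBlock = object
    slot: uint64
    root: string                       # stand-in for a 32-byte block root
  BlockDb = object
    hot: Table[string, StoredBlock]    # non-finalized: random access by root
    cold: seq[Option[StoredBlock]]     # finalized: indexed by slot

proc putHot(db: var BlockDb, b: StoredBlock) =
  db.hot[b.root] = b

proc finalize(db: var BlockDb, canonical: openArray[string],
              finalizedSlot: uint64) =
  ## Moves the canonical blocks (all at or below `finalizedSlot`) to cold
  ## storage and drops every other hot block at or below that slot.
  if db.cold.len <= int(finalizedSlot):
    db.cold.setLen(int(finalizedSlot) + 1)   # empty slots stay `none`
  for root in canonical:
    let b = db.hot[root]
    db.cold[int(b.slot)] = some(b)
  var stale: seq[string]
  for root, b in db.hot:
    if b.slot <= finalizedSlot:
      stale.add root
  for root in stale:
    db.hot.del root

proc getBySlot(db: BlockDb, slot: uint64): Option[StoredBlock] =
  ## Range-style access to finalized history.
  if slot < uint64(db.cold.len): db.cold[int(slot)]
  else: none(StoredBlock)

proc getByRoot(db: BlockDb, root: string): Option[StoredBlock] =
  ## Constant-time (hash table) access to non-finalized blocks.
  if root in db.hot: some(db.hot[root])
  else: none(StoredBlock)
```

Because finalization never rolls back, `cold` only ever grows, which is what lets it be replaced by a flat file in a real implementation.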

@tersec tersec mentioned this issue Mar 13, 2020
@JGcarv
Contributor

JGcarv commented Mar 13, 2020

Hello! I'm interested in this one! Is there still time for taking it? Thanks!

@zah
Contributor

zah commented Mar 13, 2020

It's all yours, @JGcarv. We'll be happy to fund 2 days of work for creating a very basic initial implementation with an accompanying test suite. After reviewing the initial results, we will reassess the goals and suggest further directions.

@JGcarv
Contributor

JGcarv commented Mar 13, 2020

Awesome. Thank you!

@arnetheduck
Member Author

This topic has evolved a little since we last looked at it:

  • e2store: add era format #2382 provides a flat storage format that combines a state with the blocks that lead up to it - the interesting part here is that the file is self-contained, trivially verifiable and has all the roots and keys needed to fully validate the data - starting with an era file for the genesis state, we can produce a new era file every 8192 blocks (once per day more or less)
  • Because the flat file format is verifiable, it's also suitable for wider distribution, such as when dealing with weak subjectivity sync
  • Between head and the latest era, we can use https://github.com/status-im/nimbus-eth2/blob/stable/beacon_chain/statediff.nim and immutable validator database factoring #2297 to efficiently store states and diffs - these two features taken together mean we'll have a good balance between small footprint and simplicity of use, especially if the era files are indexed.
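The "once per day more or less" figure follows from simple arithmetic, sketched below. The 8192 figure comes from the first bullet (read here as slots); the 12-second slot time is the mainnet value and is an assumption as far as this thread is concerned, and `eraOf`, `eraStartSlot` and `eraHours` are names invented for the example.

```nim
const
  SlotsPerEra = 8192'u64     # from the bullet above, read as slots
  SecondsPerSlot = 12'u64    # assumption: mainnet slot time

func eraOf(slot: uint64): uint64 =
  ## Index of the era that `slot` falls into.
  slot div SlotsPerEra

func eraStartSlot(era: uint64): uint64 =
  ## First slot covered by era number `era`.
  era * SlotsPerEra

func eraHours(): float =
  ## Wall-clock length of one era in hours.
  float(SlotsPerEra * SecondsPerSlot) / 3600.0
```

With these numbers one era spans 8192 × 12 = 98304 seconds, a little over 27 hours.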

A downside of this approach is that we lose the "here's an sqlite database with everything" world - but that's already somewhat the case, with the slashing protection, validator keys and secrets being separate.

@TennisBowling
Contributor

this should be closed

@tersec
Contributor

tersec commented Feb 15, 2022

Why? Nimbus still doesn't really have the hot/cold storage distinction this issue proposes. Era files have been gradually developing (#3394 develops them a bit further, for example), but they're not yet functionally exposed to end-users except via ncli_db.

@TennisBowling
Contributor

TennisBowling commented Feb 16, 2022

It seems that hot/cold storage wasn't being pursued anymore:

since this PR was created, we've pivoted towards using era files and state diffs as a future direction for hot/cold - closing as obsolete

#835

@tersec
Contributor

tersec commented Feb 16, 2022

Yes, "using era files and state diffs as a future direction for hot/cold". That particular PR was closed as obsolete, but hot/cold block storage remains a goal, and as the sentence you quote suggests, era files and state diffs, neither of which is really end-user-visible yet modulo ncli/ncli_db, are the current approach to achieving that. This issue tracks hot/cold block storage overall, not just as a proxy for that one PR.

@TennisBowling
Contributor

ah I see. thank you

@arnetheduck
Member Author

The era store provides hot/cold storage functionality - further work in this area will be tracked separately: https://nimbus.guide/era-store.html


7 participants
@zah @arnetheduck @disruptek @tersec @JGcarv @TennisBowling and others