immutable validator database factoring #2297
Conversation
Force-pushed from 3cf527a to ba4e3f5
Force-pushed from 15b0c61 to 3c2a897
beacon_chain/beacon_chain_db.nim
Outdated

doAssert immutableValidators.len >= intermediateOutput[].validators.len

# TODO factor out, maybe
for i in 0 ..< intermediateOutput[].validators.len:
if the output has validators, we can skip "updating" them here.
also, this code should use a pointer trick (unfortunately) to assign the fields, i.e.

let validator = addr validators[i]
assign(validator[], immutableValidators[i].withdrawal_credentials)
...

or it will keep invalidating the cache for the same entry over and over, which is quite inefficient
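The pointer trick can be sketched in isolation. This is a minimal, self-contained model; the `Validator` fields here are simplified stand-ins for the real nimbus-eth2 types, and `fillFrom` is a hypothetical helper, not code from the PR:

```nim
type
  Validator = object
    withdrawal_credentials: array[32, byte]
    effective_balance: uint64

proc fillFrom(validators: var seq[Validator],
              immutableCreds: openArray[array[32, byte]]) =
  for i in 0 ..< validators.len:
    # take the address once and write through it, so only this entry's
    # memory is touched instead of copying a whole temporary Validator
    let validator = addr validators[i]
    validator[].withdrawal_credentials = immutableCreds[i]

var vals = newSeq[Validator](2)
var creds: array[2, array[32, byte]]
creds[1][0] = 0xab
vals.fillFrom(creds)
doAssert vals[1].withdrawal_credentials[0] == 0xab
```

The point of writing through `addr` is that each assignment mutates one entry in place, rather than constructing and copying a full element, which matters when each copy invalidates a hash cache.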
a neater way to solve it is actually to add func assign(tgt: var Validator, src: immutableValidator)
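A sketch of what such an overload might look like; the type and field names here are illustrative stand-ins, not the real nimbus-eth2 definitions:

```nim
type
  ImmutableValidatorData = object  # illustrative stand-in
    withdrawal_credentials: array[32, byte]
  Validator = object
    withdrawal_credentials: array[32, byte]
    effective_balance: uint64

func assign(tgt: var Validator, src: ImmutableValidatorData) =
  # copy only the immutable fields, leaving mutable ones untouched
  tgt.withdrawal_credentials = src.withdrawal_credentials

var v = Validator(effective_balance: 32)
var imm: ImmutableValidatorData
imm.withdrawal_credentials[0] = 0x01
assign(v, imm)
doAssert v.withdrawal_credentials[0] == 0x01
doAssert v.effective_balance == 32
```

Wrapping the field copies in an `assign` overload keeps the pointer arithmetic out of the call sites and makes the intent (partial, in-place update) explicit.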
I don't know if this version's faster, but I reimplemented this portion because it seemed to be a reasonable way to add an item to the end of the List while simultaneously constructing the object. If it's still better to add a default object and then fill it in with assign, this can be revisited.
well, really the thing that makes the most sense is to update output.validators rather than overwriting it and then re-adding the entries one by one; this will avoid 99% of the copying every time, because the immutable validator set is.. well.. immutable..
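The suggestion, modeled minimally: because the set is append-only, entries already present are identical, so only the new tail needs to be added. A plain `seq[int]` stands in for the SSZ List of validators, and `updateValidators` is a hypothetical name:

```nim
proc updateValidators(existing: var seq[int], immutable: seq[int]) =
  # append-only invariant: everything up to existing.len already matches
  doAssert immutable.len >= existing.len
  for i in existing.len ..< immutable.len:
    existing.add immutable[i]

var output = @[10, 11]
updateValidators(output, @[10, 11, 12, 13])
doAssert output == @[10, 11, 12, 13]
```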
that would essentially mean turning getBeaconStateNoImmutableValidators into an updateBeaconStateNoImmutableValidators
this would play well with how getState is typically used: we pass it some random but valid state that usually is "close" to what we're looking for, i.e. a different fork, a few slots back, etc
Force-pushed from 615bb9a to cf73c6c
So this branch applies blocks from the first 150k slots. While other rows are noisy, they're not consistently different between the two branches; the database store is about 60% faster in this branch on average.
one thing the current bench suite doesn't do is reload states; might be a worthy addition to have
It's slow, so it's only doing it every 32 states stored. Maybe that ratio should be even higher. This branch, 100k validators, running:
These results aren't comparable to the others here, because they're on a different system (NVMe SSD, but probably slower CPU). They do point to a weak point of this PR that needs improving before merging, though. It's good that stores remain several times faster, and the space savings are nice, but pessimizing load this much isn't reasonable.
After 8fe0f1d:
uh, yeah, that load looks completely weird; just as less data is being stored, loading less data should be faster..
It's loading in an entire state object. There's also a reason I commented in the code since the beginning that the correct way of doing that part isn't this copying approach, but reading the SSZ directly into the destination. It also allocates a wholly new validator list each time.
After f0a5f65:
Evidently, 3/4 of the database load time had been related to that.
It doesn't, when it actually tries to reuse the existing data. 4b93c6d has some brute-force fixes for this, which still seem to perform adequately:
After d41c429:
It seems to be stabilizing around there.
…ng into beacon state DB
…ffers; add ncli_db and a unit test to verify this
…ist cache invalidation issues
…forwards-compatibility
…bleValidators objects
@@ -582,6 +582,11 @@ type
  ValidatorStatus* = object
    # This is a validator without the expensive, immutable, append-only parts
    pubkey* {.dontserialize.}: ValidatorPubKey
why do these need to be here?
They enable
nimbus-eth2/beacon_chain/beacon_chain_db.nim
Lines 372 to 378 in f7c86be
proc putState*(db: BeaconChainDB, key: Eth2Digest, value: var BeaconState) =
  # TODO prune old states - this is less easy than it seems as we never know
  # when or if a particular state will become finalized.
  updateImmutableValidators(db, db.immutableValidatorsMem, value.validators)
  db.put(
    subkey(BeaconStateNoImmutableValidators, key),
    cast[ref BeaconStateNoImmutableValidators](addr value)[])
and
nimbus-eth2/beacon_chain/beacon_chain_db.nim
Lines 461 to 464 in f7c86be
case db.get(
  subkey(
    BeaconStateNoImmutableValidators, key),
  cast[ref BeaconStateNoImmutableValidators](addr output)[])
which is clarified slightly by
nimbus-eth2/beacon_chain/beacon_chain_db_immutable.nim
Lines 73 to 75 in f7c86be
static:
  doAssert sizeof(Validator) == sizeof(ValidatorStatus)
  doAssert sizeof(BeaconState) == sizeof(BeaconStateNoImmutableValidators)
Neither ValidatorStatus nor BeaconStateNoImmutableValidators is an object type that's ever exactly instantiated in this version; they're most usefully considered as instructions to the SSZ (de)serializer.
ok, let's document that in the type? it's a bit weird - i.e. the intent is basically to match the ABI, so it's good to write that down
From #2297 (comment) and #2297 (comment) one can see the improvement from this approach, from:
All time are ms
Average, StdDev, Min, Max, Samples, Test
Validation is turned off meaning that no BLS operations are performed
1340.984, 0.000, 1340.984, 1340.984, 1, Initialize DB
0.327, 0.987, 0.079, 255.120, 91447, Load block from database
61.806, 0.000, 61.806, 61.806, 1, Load state from database
0.627, 1.463, 0.025, 446.327, 96875, Advance slot, non-epoch
41.311, 3.812, 10.844, 72.374, 3125, Advance slot, epoch
3.447, 7.850, 0.057, 252.413, 91447, Apply block, no slot processing
24.812, 6.950, 17.393, 84.707, 2348, Database load
37.135, 4.519, 24.421, 73.783, 2348, Database store
real 13m10.429s
user 12m2.704s
sys 0m30.644s
to
All time are ms
Average, StdDev, Min, Max, Samples, Test
Validation is turned off meaning that no BLS operations are performed
1234.699, 0.000, 1234.699, 1234.699, 1, Initialize DB
0.377, 0.694, 0.071, 133.208, 91447, Load block from database
49.274, 0.000, 49.274, 49.274, 1, Load state from database
0.566, 1.264, 0.025, 385.073, 96875, Advance slot, non-epoch
39.309, 2.966, 9.613, 49.510, 3125, Advance slot, epoch
3.163, 7.633, 0.042, 233.757, 91447, Apply block, no slot processing
18.837, 5.613, 9.518, 53.715, 2348, Database load
32.238, 29.068, 17.496, 1401.262, 2348, Database store
real 12m4.686s
user 10m54.210s
sys 0m30.473s
Average database load times decreased by 25%, database store times by more than 10%, and overall ncli_db
runtime by almost 10% on the first 100k Pyrmont slots.
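The quoted percentages follow directly from the two tables above; as a quick sanity check (numbers copied from the benchmark output, `pctDecrease` is just a throwaway helper):

```nim
proc pctDecrease(before, after: float): float =
  # percentage improvement from `before` to `after`
  (before - after) / before * 100.0

doAssert int(pctDecrease(24.812, 18.837)) == 24  # database load, ~25%
doAssert int(pctDecrease(37.135, 32.238)) == 13  # database store, >10%
doAssert int(pctDecrease(790.429, 724.686)) == 8 # 13m10.429s -> 12m4.686s
```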
I also ran ncli_db pruneDatabase
on my Pyrmont node's 19GB database, which it compressed to about 11GB, for a 40% decrease in overall database size. This is reduced from the 85% here due to all the blocks, which remain unchanged, but it's still a useful improvement. It won't be representative of nbc until the full-state-writing can be dropped, though.
beacon_chain/beacon_chain_db.nim
Outdated

doAssert s.isOk # TODO(zah) Handle this in a better way
var sqliteStore =
  if inMemory:
    SqStoreRef.init("", "test", inMemory = true).expect("working database (disk broken/full?)")
ah, the joys of copy-paste :)
Shows the approximate gains one can get, though, and the general approach.
This is part of state diffs as a whole, but it's a prerequisite and is separately useful.
Overall ncli_db bench --slots=150000 times with mainnet blocks, v1.0.7 (first row) vs this PR (second row), where columns are (n, 25th percentile, mean, 75th percentile) in seconds. It's about 20% faster overall.
For "Database block store" specifically, which here includes state storage (in milliseconds), it's about 60% faster:
Typical ncli_db summaries, for v1.0.7 and for this branch:
For database size, v1.0.7 (with the noted ncli_db patch) creates a 17GiB benchmark/nbc.sqlite3 file snapshotting every epoch, while this branch creates a 3.9GiB benchmark/nbc.sqlite3 file under the same conditions, for a roughly 75% reduction in size and, presumably, disk I/O. The gap increases with slot number.

For v1.0.7 benchmarking, I first applied a patch to simulate the basic operation of state storage by chain_dag. It doesn't account for some of the finalization pruning, but that only shifts the total amounts, not the ratios/proportions here. Similarly, without checkpointing, it just builds up the WAL file, so it checkpoints exactly as often as nbc.