Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(VDB-1249) backfill storage #71

Merged
merged 7 commits into from
Apr 7, 2020
Merged

(VDB-1249) backfill storage #71

merged 7 commits into from
Apr 7, 2020

Conversation

rmulhol
Copy link

@rmulhol rmulhol commented Mar 31, 2020

TODO:

  • inline getStorageValue command to backfillStorage
  • intelligent insert backfill diff function (no dupes)

@rmulhol rmulhol changed the title [WIP] (VDB-1249) backfill storage (VDB-1249) backfill storage Mar 31, 2020
@rmulhol
Copy link
Author

rmulhol commented Mar 31, 2020

Dockerfile updates will be made in transformers repo since config is required

addresses and storage slots must be back-filled).
Requires CLI flags are backfill-storage-start-block (-s) and backfill-storage-end-block (-e) to define range of blocks
that need to be back-filled.
Optional CLI flag is backfill-storage-contract-address (-a) to specify a single contract address that needs to be

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this flag is passed in, is the config file still required?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah bc still need the transformer in the exporter for the keys loader

that need to be back-filled.
Optional CLI flag is backfill-storage-contract-address (-a) to specify a single contract address that needs to be
back-filled (if not necessary for all transformers).
Before running this command, verify that you have run headerSync and execute for the desired blocks. Headers are

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should add a note here (and/or in the README) about if this can be run while execute and headerSync are running.

RETURN 0;
END IF;

IF last_storage_value is null and create_back_filled_diff.storage_value = empty_storage_value THEN

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mentioned in standup that the backfill process will need to be run in order (by block number) since otherwise this piece would potentially miss setting a value to zero when it should be, right? Is there anyway we can enforce this, to make sure that we're backfilling in order?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I'm seeing now that HeaderRepository. GetHeadersInRange sorts the headers it's returning, nice!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, still a good thought - will think about ways we can verify that all headers have been synced/checked. Seems like we could verify that we have all headers in the range and that they have a check count > 0, at least

@@ -55,6 +55,14 @@ func (repository diffRepository) CreateStorageDiff(rawDiff types.RawDiff) (int64
return storageDiffID, err
}

func (repository diffRepository) CreateBackFilledStorageValue(rawDiff types.RawDiff) (int64, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, i like this being it's own method, rather than reusing CreateStorageDiff. 💯

I wonder if this needs to return a the diff id, since we're not using it in the StorageValueLoader. But it may be nice to keep a similar patter as CreateStorageDiff. 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dig it, removed 👍

@rmulhol
Copy link
Author

rmulhol commented Apr 2, 2020

This is pretty slow running on my local machine (~1:30/block, with an incomplete set of keys). Will continue brainstorming ways to speed it up. Unfortunately only getting relevant keys (e.g. exclude entries in mappings and dynamic arrays where event was fired at a later block) imposes some additional cost since we'd need to refresh keys for every header (as opposed to once for all headers), but maybe that's worth it if we can exclude rpc calls 🤔

OTOH I'm a bit tempted to merge and fire it off for the first chunk of missing diffs to see how performance differs in a different env

@rmulhol rmulhol force-pushed the vdb-1249-backfill branch 10 times, most recently from 432ae7a to aea3f2e Compare April 2, 2020 21:14
@rmulhol
Copy link
Author

rmulhol commented Apr 2, 2020

Update: batching improves performance by 80% locally to 18s/block 🎉

@@ -104,6 +104,33 @@ func (blockChain *BlockChain) LastBlock() (*big.Int, error) {
return block.Number, err
}

func (blockChain *BlockChain) BatchGetStorageAt(account common.Address, keys []common.Hash, blockNumber *big.Int) (map[common.Hash][]byte, error) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

- Enable backfilling storage from multiple contracts across a range of
  blocks
- redundant with backfillStorage (can pass single block for starting
  and ending block to achieve same results)
- Avoid inserting storage values if they duplicate a previous block or
  are zero-valued and do not have a previous value (e.g. when looking
  up storage keys that haven't yet been assigned).
- NOTE: excluding zero-valued storage values for storage keys that
  were not previously associated with a non-zero value assumes that we
  are backfilling by block number in ascending order. Using this
  function before all previous values have been populated could cause
  a failure to persist legitimate assignments to zero.
@rmulhol rmulhol merged commit 97afac1 into staging Apr 7, 2020
@rmulhol rmulhol deleted the vdb-1249-backfill branch April 7, 2020 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants