Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix consensus bug from deleting account in same block that creates it #1913

Merged
merged 5 commits into from Feb 10, 2020

Conversation

@carver
Copy link
Contributor

carver commented Jan 29, 2020

What was wrong?

Fix #1912

How was it fixed?

The bug was: when a previous transaction creates/modifies an account,
and then is "locked", those pending changes are written at the end of the block, even if the
account is deleted is a subsequent transaction.

The fix was: clear out all "locked" storage changes (those made during previous transactions) when you lock in an account deletion.

That fix brought up a new interesting case, where the "dirty" flag wasn't noticing when an account started the block as empty, then some storage was added in one transaction, then in a later transaction it was deleted again. The further solution was to track a history of of storage tries that can be reverted to (a new one added on every account delete), and generously treat the storage as dirty if there were ever any writes in the block, even if the trie currently has the same state root as the root was at the beginning of the block.

To-Do

  • Clean up commit history
  • more documentation
  • Add entry to the release notes

Cute Animal Picture

put a cute animal picture link inside the parentheses

eth/db/storage.py Outdated Show resolved Hide resolved
@carver carver force-pushed the carver:fix-1912-consensus-bug branch from f4cba85 to 86be04f Jan 30, 2020
@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 30, 2020

Well, making progress: Block 116526 is now importing!

... but it only got to 116530 before failing again 🤦‍♂

I'll wait to figure that out before posting this for review.

It's pretty concerning that we're getting all these consensus bugs while still passing the ethereum/tests! ☹️

state.lock_changes()
assert state.get_storage(ADDRESS, 0) == 0

# delete account in next "transaction"

This comment has been minimized.

Copy link
@veox

veox Jan 30, 2020

Contributor

Minor: probably meant to say "revive" instead of "delete".

@veox

This comment has been minimized.

Copy link
Contributor

veox commented Jan 30, 2020

Block #116530 is very similar: calls are made to create and destroy contracts... Not sure what's different from the previous one, I'll take a look now.

@veox

This comment has been minimized.

Copy link
Contributor

veox commented Jan 30, 2020

Well, making progress: Block 116526 is now importing!

It's #116525 there was trouble with... Are you sure you didn't import that using some other code by accident (e.g. f4cba85)? Because I'm trying Trinity (at current master) with this PR's branch (at 86be04f), and am still failing to import #116525.


Although 86be04f does help py-evm pass the test in 16a9b34.

I've made sure that my Trinity installation indeed has the changes from py-evm's 86be04f in its virtualenv dependency... Not sure what's going on here.

@pipermerriam

This comment has been minimized.

Copy link
Member

pipermerriam commented Jan 30, 2020

I think we may be near a point where we need to pay down tech debt in this area. It seems we've ended up with lots of layers here and I at least am having a lot of difficulty reasoning about how we manage all of the nuances related to updating account and contract state.

@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 30, 2020

It's #116525 there was trouble with... Are you sure you didn't import that using some other code by accident (e.g. f4cba85)?

No, you're right. I had imported it using the old code and skipped over it during sync.

@carver carver force-pushed the carver:fix-1912-consensus-bug branch from 86be04f to 36b0e5a Jan 30, 2020
@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 30, 2020

I think we may be near a point where we need to pay down tech debt in this area. It seems we've ended up with lots of layers here and I at least am having a lot of difficulty reasoning about how we manage all of the nuances related to updating account and contract state.

I'm also not happy with how complex the code is, and it being a pain to debug.

Some key things that seem to be interacting in a complex way are:

  • queuing up all the state changes without writing to the trie
  • state changes can be reverted back to the beginning of a transaction, but not across transaction boundaries
  • when storage tries are deleted, it changes the storage root hash on the same timeline as the other storage changes. The root hash change must be revertable, in lock-step with all the other storage changes.
  • thinking through pre and post Byzantium: whether you have to flush the changes to the trie at each transaction, or only at the end of the block

With all those interacting factors above, I have no immediate idea how to make it better. One option is to push the complexity into JournalDB, to make it trie-aware (to handle root hash changes) and boundary-aware (to handle changes that cannot be reverted, from previous transactions). I think that's just pushing complexity around, doesn't really seem like an improvement. I would probably need a solid chunk of time to really reset my mind on how to design it a whole new way.

@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 30, 2020

But some good news! The latest push actually does import block 116525, and is at 153492 and continuing...

@carver carver marked this pull request as ready for review Jan 31, 2020
@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 31, 2020

Wow, a shocking number of consensus bugs that are not covered by ethereum/tests.

Found the next failed import at block #336508. Pretty interesting, only the receipt root is broken:

 - header.receipt_root:
    (actual)  : b'\xfd\x03\xd3\xd3<\x9e\xf9\xc5\xed\xde\x02aW\xd5J\xe3\xa1\xed\x0c\x85,N\\\x13d\xb2\xa6`\xf4\xa0\x9eo'
    (expected): b'\xbfk\xb0>X\x19r\xe1A\xb4\xcf\xc2\x13X\xdf\xbdDQ\xff\xbe\xfa\xb1P6\xbe\x93\xd1&_\xb2^\x14'
eth/db/storage.py Show resolved Hide resolved
@veox

This comment has been minimized.

Copy link
Contributor

veox commented Jan 31, 2020

Found the next failed import at block #336508.

Verified - indeed!

JIC, I've checked with py-evm back at master, too - the issue persists. So, it's an existing bug, not a regression in this PR.


Reverting to known "good" Trinity and py-evm (at v0.2.0a42) allows the sync to continue, importing block #336508 and beyond (ran for 10 minutes, at least up to #346858).

Re-setting to block #336507 and switching py-evm to "bad" v0.2.0a43 results in import failure ("mix hash mismatch"), which among other things shows the same incorrect receipt_root as above (b'\xfd\x03..\xa0\x9eo').

In other words, the regression lies somewhere in the same py-evm update, v0.2.0a42 to v0.2.0a43. I haven't yet verified that the culprit is also PR #1735.

@veox

This comment has been minimized.

Copy link
Contributor

veox commented Jan 31, 2020

I'm checking the intermediate state_roots for each transaction receipt; in block #336508, there are 5 txs; at indexes 0, 1, 3, 4 are calls to setAvatar(uint256,bytes) (of a contract I didn't review); at index 2, there is the call to deleteAllAvatars().

state_root at index 2 doesn't match:

  • geth says it's 0xf1af37b9b975c2626d3b07ad995f06d5286af983d3b5f0ac0a0ff47bf0e1d098;
  • trinity says it's 0x39e4b7faa521431a15be1007fb19e9d386c057bde21917520f84312e2e2ac442.

state_root in all other (four) receipts matches.


FTR, trinity produces these (no 0x prefix here):

tx0 receipt state root: baf63127abe8f8445d98c137aeea39b77cbf1008ca41df31c3f63ba1f7add9aa
tx1 receipt state root: 5b2378dc119040215d6825f639fc097e000d01f902bda9cacaec17781194f772
tx2 receipt state root: 39e4b7faa521431a15be1007fb19e9d386c057bde21917520f84312e2e2ac442
tx3 receipt state root: c5b12436b28674f895b4158abd1b2326c590aca5f37e896ad38200c73e309aa6
tx4 receipt state root: c112806aedc65b2256714051c8178556d203daf0ea626fe37c9638407d0d763e

As to what's going on: the most likely reason we're not getting a state_root mismatch for the block: the deleted data in the contract's storage is overwritten (within the same block), and the resulting state is as expected. It's just the intermediate state that's "wrong" - or, to be more specific, its root seems to be miscalculated.

eth/db/storage.py Show resolved Hide resolved
@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 31, 2020

As to what's going on: the most likely reason we're not getting a state_root mismatch for the block: the deleted data in the contract's storage is overwritten (within the same block), and the resulting state is as expected.

Looks like it's even simpler than that: AccountDB isn't recognizing the storage as having changed, so it doesn't try to write out the new storage root to the account. (Note there is no Updating account 0x5dc2e401e29ea16e51ebcecaef129c15fed5a5e8 to storage root X line at the end of the transaction at index 2). Why it's not recognizing the storage as changed is currently a mystery. (ie~ AccountStorageDB.has_changed_root is returning False) I'm digging in now...

@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 31, 2020

Looks like it had to do with reverting the storage root back to the root at the beginning of the block, and not treating that as a change. By always treating the storage as changed, that fixes pre-Byzantium (but seems to break something in Istanbul's CREATE2).

@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 31, 2020

If the account is not created, and the new storage root is for empty storage, then we shouldn't create the account just to put a blank storage root in it.

@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 31, 2020

Up to block 415133 2.36M on my local full sync so far...

The full sync has tested some blocks from:

  • Frontier
  • Homestead (1.15M)
  • DAO fork (1.92M)
  • Shanghai attacks (2.28M-2.72M)
  • Tangerine Whistle (2.463M)
  • Spurious Dragon (2.675M)
  • Byzantium (4.37M)
  • Petersburg (7.28M)
  • Istanbul (9.069M)
  • Muir Glacier (9.2M)
@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Jan 31, 2020

So maybe come back for a refactor later, after we get sync reliable and trinity to mvp. Thoughts @pipermerriam ?

@carver carver changed the title Add test that deleted account stays deleted Fix consensus bug from deleting account in same block that creates it Feb 1, 2020
@carver

This comment has been minimized.

Copy link
Contributor Author

carver commented Feb 1, 2020

@cburgdorf Seems worth double-checking that this fix also fixes the goerli test you ran. How did you set that up again and/or are you up for re-running it against this branch?

@cburgdorf

This comment has been minimized.

Copy link
Contributor

cburgdorf commented Feb 1, 2020

@veox

This comment has been minimized.

Copy link
Contributor

veox commented Feb 1, 2020

Yup! I'll check this on main-net (from scratch), and then try to sync Goerli - that's where I started, after all.

@veox

This comment has been minimized.

Copy link
Contributor

veox commented Feb 1, 2020

Confirming that the issue is no longer present when using py-evm from PR #1913.

Main-net blocks #116525 and #336508 imported successfully (I ran it up to #347457).

So has Goerli block #226323 (still running, #472783 ATM).

Copy link
Member

pipermerriam left a comment

👍

@@ -120,7 +141,7 @@ def __delitem__(self, key: bytes) -> None:

@property
def has_changed_root(self) -> bool:
return self._write_trie and self._write_trie.root_hash != self._starting_root_hash
return self._write_trie

This comment has been minimized.

Copy link
@pipermerriam

pipermerriam Feb 6, 2020

Member

Assuming I'm correct that this is checking for the None value, I'd advocate for this to be return self._write_trie is not None.

@@ -129,9 +150,9 @@ def get_changed_root(self) -> Hash32:
raise ValidationError("Asked for changed root when no writes have been made")

def _clear_changed_root(self) -> None:
self._starting_root_hash = self._write_trie.root_hash

This comment has been minimized.

Copy link
@pipermerriam

pipermerriam Feb 6, 2020

Member

If you get back in here could you maybe drop a comment or two in this method explaining in some detail why we do each of these things. Since it's not clear how to address some of the debt in the code, maybe we can address some of with with better comments.

self._trie_nodes_batch: BatchDB = None

# When deleting an account, push the pending write info onto this stack.
# May happen multiple times for multiple deletes.
self._historical_write_tries: List[Tuple[HexaryTrie, BatchDB, Hash32]] = []

This comment has been minimized.

Copy link
@pipermerriam

pipermerriam Feb 6, 2020

Member

Maybe make a NamedTuple for the inner value type here. Probably makes some of the other code a bit easier to read/grok.

# Track how many times we have cleared the storage. This is journaled
# so that we can revert to an old clear count. The clear count is used
# as an index to find the base trie from before the revert.
self._clear_count = JournalDB(MemoryDB({CLEAR_COUNT_KEY_NAME: to_bytes(0)}))

This comment has been minimized.

Copy link
@pipermerriam

pipermerriam Feb 6, 2020

Member

Using the JournalDB here feels like it may be the wrong choice. Can this just be modeled using a list (stack) or somethng?

This comment has been minimized.

Copy link
@carver

carver Feb 10, 2020

Author Contributor

I updated the comment a bit to try to explain: it has to move in lockstep with the main storage journal, so we can detect when a revert crosses back over an account delete.

carver added 5 commits Jan 29, 2020
The bug was: when a previous transaction creates/modifies an account,
and then is "locked", those pending changes are written even if the
account is deleted is a subsequent transaction.

The fix was: clear out all locked storage changes when you lock in an
account deletion.

This requires carefully managing reverts that cross over to the
pre-delete state. So we track pending writes in a stack that can be
reverted to at the appropriate time.

---

It's important not to accidentally create empty accounts, so don't set
empty storage root in a missing account.

---

Start reporting a storage root as updated, even if the storage root is
identical.

Why? In pre-Byzantium, the storage root is reported at each transaction.
If one transaction changes the storage and then another changes it back,
then we need to treat the storage in the second transaction as changed,
even though it's identical to the start root, which is always kept as
the root at the beginning of the block.
@carver carver force-pushed the carver:fix-1912-consensus-bug branch from 30ee138 to 14a0785 Feb 10, 2020
@carver carver merged commit e3ccf92 into ethereum:master Feb 10, 2020
21 checks passed
21 checks passed
ci/circleci: py36-benchmark Your tests passed on CircleCI!
Details
ci/circleci: py36-core Your tests passed on CircleCI!
Details
ci/circleci: py36-database Your tests passed on CircleCI!
Details
ci/circleci: py36-docs Your tests passed on CircleCI!
Details
ci/circleci: py36-lint Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-byzantium Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-constantinople Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-frontier Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-homestead Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-istanbul Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-petersburg Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-spurious_dragon Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-tangerine_whistle Your tests passed on CircleCI!
Details
ci/circleci: py36-native-blockchain-transition Your tests passed on CircleCI!
Details
ci/circleci: py36-transactions Your tests passed on CircleCI!
Details
ci/circleci: py36-vm Your tests passed on CircleCI!
Details
ci/circleci: py37-core Your tests passed on CircleCI!
Details
ci/circleci: py37-database Your tests passed on CircleCI!
Details
ci/circleci: py37-lint Your tests passed on CircleCI!
Details
ci/circleci: py37-transactions Your tests passed on CircleCI!
Details
ci/circleci: py37-vm Your tests passed on CircleCI!
Details
@carver carver deleted the carver:fix-1912-consensus-bug branch Feb 10, 2020
@carver carver mentioned this pull request Feb 10, 2020
2 of 2 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

4 participants
You can’t perform that action at this time.