
Fix consensus bug from deleting account in same block that creates it #1913

Merged
merged 5 commits on Feb 10, 2020

Conversation

@carver (Contributor) commented Jan 29, 2020

What was wrong?

Fix #1912

How was it fixed?

The bug was: when a previous transaction creates/modifies an account, and then is "locked", those pending changes are written at the end of the block, even if the account is deleted in a subsequent transaction.

The fix was: clear out all "locked" storage changes (those made during previous transactions) when you lock in an account deletion.

That fix surfaced a new interesting case, where the "dirty" flag wasn't noticing when an account started the block as empty, then some storage was added in one transaction, then in a later transaction it was deleted again. The further solution was to track a history of storage tries that can be reverted to (a new one added on every account delete), and to generously treat the storage as dirty if there were ever any writes in the block, even if the trie's current state root is the same as the root at the beginning of the block.
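The behavior described above can be sketched with a toy model — this is illustrative only, not py-evm's actual AccountStorageDB (which journals against real tries); all names here are made up:

```python
# Toy model of the fix: an account delete clears *locked* changes from
# earlier transactions (not just the current pending ones), and any write
# during the block marks the storage dirty even if its contents later
# return to their state from the start of the block.
class StorageModel:
    def __init__(self, committed):
        self._committed = dict(committed)  # storage at the start of the block
        self._locked = {}                  # changes from finished transactions
        self._pending = {}                 # changes in the current transaction
        self._was_written = False          # sticky per-block dirty flag

    def get(self, slot):
        for layer in (self._pending, self._locked, self._committed):
            if slot in layer:
                return layer[slot]
        return 0

    def set(self, slot, value):
        self._pending[slot] = value
        self._was_written = True

    def lock_changes(self):
        # Transaction boundary: pending changes can no longer be reverted.
        self._locked.update(self._pending)
        self._pending.clear()

    def delete(self):
        # The fix: clear locked changes too, so writes from earlier
        # transactions in the block cannot resurface after the delete.
        self._committed.clear()
        self._locked.clear()
        self._pending.clear()
        self._was_written = True

    @property
    def is_dirty(self):
        # Generous: dirty if anything was written this block, even if the
        # storage now matches its state at the start of the block.
        return self._was_written
```

The pre-fix bug corresponds to `delete()` clearing only `_pending`: a value locked in by an earlier transaction would survive the delete and be written out at the end of the block.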

To-Do

  • Clean up commit history
  • more documentation
  • Add entry to the release notes


@carver (Contributor, Author) commented Jan 30, 2020

Well, making progress: Block 116526 is now importing!

... but it only got to 116530 before failing again 🤦‍♂️

I'll wait to figure that out before posting this for review.

It's pretty concerning that we're getting all these consensus bugs while still passing the ethereum/tests! ☹️

state.lock_changes()
assert state.get_storage(ADDRESS, 0) == 0

# delete account in next "transaction"
Minor: probably meant to say "revive" instead of "delete".

@veox (Contributor) commented Jan 30, 2020

Block #116530 is very similar: calls are made to create and destroy contracts... Not sure what's different from the previous one, I'll take a look now.

@veox (Contributor) commented Jan 30, 2020

Well, making progress: Block 116526 is now importing!

It's #116525 there was trouble with... Are you sure you didn't import that using some other code by accident (e.g. f4cba85)? Because I'm trying Trinity (at current master) with this PR's branch (at 86be04f), and am still failing to import #116525.


Although 86be04f does help py-evm pass the test in 16a9b34.

I've made sure that my Trinity installation indeed has the changes from py-evm's 86be04f in its virtualenv dependency... Not sure what's going on here.

@pipermerriam (Member) commented:
I think we may be near a point where we need to pay down tech debt in this area. It seems we've ended up with lots of layers here and I at least am having a lot of difficulty reasoning about how we manage all of the nuances related to updating account and contract state.

@carver (Contributor, Author) commented Jan 30, 2020

It's #116525 there was trouble with... Are you sure you didn't import that using some other code by accident (e.g. f4cba85)?

No, you're right. I had imported it using the old code and skipped over it during sync.

@carver (Contributor, Author) commented Jan 30, 2020

I think we may be near a point where we need to pay down tech debt in this area. It seems we've ended up with lots of layers here and I at least am having a lot of difficulty reasoning about how we manage all of the nuances related to updating account and contract state.

I'm also not happy with how complex the code is, and it being a pain to debug.

Some key things that seem to be interacting in a complex way are:

  • queuing up all the state changes without writing to the trie
  • state changes can be reverted back to the beginning of a transaction, but not across transaction boundaries
  • when storage tries are deleted, it changes the storage root hash on the same timeline as the other storage changes. The root hash change must be revertable, in lock-step with all the other storage changes.
  • thinking through pre and post Byzantium: whether you have to flush the changes to the trie at each transaction, or only at the end of the block

With all those interacting factors, I have no immediate idea how to make it better. One option is to push the complexity into JournalDB, making it trie-aware (to handle root hash changes) and boundary-aware (to handle changes from previous transactions that cannot be reverted). But that just pushes complexity around; it doesn't really seem like an improvement. I would probably need a solid chunk of time to reset my mind and design it a whole new way.
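The second bullet above — reverts within a transaction but never across a transaction boundary — can be illustrated with a toy journal (a sketch only; py-evm's real JournalDB has a different API):

```python
# Toy checkpoint journal: changes can be discarded back to the most
# recent checkpoint, but once lock() flattens them at a transaction
# boundary, no later revert can reach behind that point.
class ToyJournal:
    def __init__(self):
        self._changesets = [{}]  # one dict per open checkpoint
        self._locked = {}        # flattened, non-revertable changes

    def set(self, key, value):
        self._changesets[-1][key] = value

    def get(self, key, default=None):
        for changes in reversed(self._changesets):
            if key in changes:
                return changes[key]
        return self._locked.get(key, default)

    def checkpoint(self):
        self._changesets.append({})

    def revert(self):
        # Discard changes back to the most recent checkpoint.
        self._changesets.pop()
        if not self._changesets:
            self._changesets.append({})

    def lock(self):
        # Transaction boundary: flatten all open changesets into the
        # locked layer; reverts can no longer undo them.
        for changes in self._changesets:
            self._locked.update(changes)
        self._changesets = [{}]
```

The hard part in the real code is that storage-root changes (including those from account deletes) have to revert in lockstep with this kind of journal.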

@carver (Contributor, Author) commented Jan 30, 2020

But some good news! The latest push actually does import block 116525, and is at 153492 and continuing...

@carver carver marked this pull request as ready for review January 31, 2020 00:07
@carver (Contributor, Author) commented Jan 31, 2020

Wow, a shocking number of consensus bugs that are not covered by ethereum/tests.

Found the next failed import at block #336508. Pretty interesting, only the receipt root is broken:

 - header.receipt_root:
    (actual)  : b'\xfd\x03\xd3\xd3<\x9e\xf9\xc5\xed\xde\x02aW\xd5J\xe3\xa1\xed\x0c\x85,N\\\x13d\xb2\xa6`\xf4\xa0\x9eo'
    (expected): b'\xbfk\xb0>X\x19r\xe1A\xb4\xcf\xc2\x13X\xdf\xbdDQ\xff\xbe\xfa\xb1P6\xbe\x93\xd1&_\xb2^\x14'

@veox (Contributor) commented Jan 31, 2020

Found the next failed import at block #336508.

Verified - indeed!

JIC, I've checked with py-evm back at master, too - the issue persists. So, it's an existing bug, not a regression in this PR.


Reverting to known "good" Trinity and py-evm (at v0.2.0a42) allows the sync to continue, importing block #336508 and beyond (ran for 10 minutes, at least up to #346858).

Re-setting to block #336507 and switching py-evm to "bad" v0.2.0a43 results in import failure ("mix hash mismatch"), which among other things shows the same incorrect receipt_root as above (b'\xfd\x03..\xa0\x9eo').

In other words, the regression lies somewhere in the same py-evm update, v0.2.0a42 to v0.2.0a43. I haven't yet verified that the culprit is also PR #1735.

@veox (Contributor) commented Jan 31, 2020

I'm checking the intermediate state_roots for each transaction receipt; in block #336508, there are 5 txs; at indexes 0, 1, 3, 4 are calls to setAvatar(uint256,bytes) (of a contract I didn't review); at index 2, there is the call to deleteAllAvatars().

state_root at index 2 doesn't match:

  • geth says it's 0xf1af37b9b975c2626d3b07ad995f06d5286af983d3b5f0ac0a0ff47bf0e1d098;
  • trinity says it's 0x39e4b7faa521431a15be1007fb19e9d386c057bde21917520f84312e2e2ac442.

state_root in all other (four) receipts matches.


FTR, trinity produces these (no 0x prefix here):

tx0 receipt state root: baf63127abe8f8445d98c137aeea39b77cbf1008ca41df31c3f63ba1f7add9aa
tx1 receipt state root: 5b2378dc119040215d6825f639fc097e000d01f902bda9cacaec17781194f772
tx2 receipt state root: 39e4b7faa521431a15be1007fb19e9d386c057bde21917520f84312e2e2ac442
tx3 receipt state root: c5b12436b28674f895b4158abd1b2326c590aca5f37e896ad38200c73e309aa6
tx4 receipt state root: c112806aedc65b2256714051c8178556d203daf0ea626fe37c9638407d0d763e

As to what's going on: the most likely reason we're not getting a state_root mismatch for the whole block is that the deleted data in the contract's storage is overwritten within the same block, so the resulting state is as expected. It's just the intermediate state that's "wrong" - or, more specifically, its root seems to be miscalculated.

@carver (Contributor, Author) commented Jan 31, 2020

As to what's going on: the most likely reason we're not getting a state_root mismatch for the block: the deleted data in the contract's storage is overwritten (within the same block), and the resulting state is as expected.

Looks like it's even simpler than that: AccountDB isn't recognizing the storage as having changed, so it doesn't try to write out the new storage root to the account. (Note there is no "Updating account 0x5dc2e401e29ea16e51ebcecaef129c15fed5a5e8 to storage root X" line at the end of the transaction at index 2.) Why it's not recognizing the storage as changed is currently a mystery (i.e. AccountStorageDB.has_changed_root is returning False). I'm digging in now...

@carver (Contributor, Author) commented Jan 31, 2020

Looks like it had to do with reverting the storage root back to the root at the beginning of the block, and not treating that as a change. Always treating the storage as changed fixes pre-Byzantium (but seems to break something in Istanbul's CREATE2).
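The change in behavior can be shown as two simplified predicates — a sketch, not the actual py-evm code (which compares trie root hashes on a HexaryTrie):

```python
# Pre-fix: storage only counted as changed when the current write-trie
# root differed from the root at the start of the block. This misses the
# case where writes happened but the root reverted to its starting value
# within the same block.
def has_changed_root_old(write_trie_root, starting_root):
    return write_trie_root is not None and write_trie_root != starting_root

# Post-fix: any write at all (a write trie exists) counts as a change,
# so the storage root is re-reported even when it equals the starting
# root -- which pre-Byzantium receipts require.
def has_changed_root_new(write_trie_root):
    return write_trie_root is not None
```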

@carver (Contributor, Author) commented Jan 31, 2020

If the account is not created, and the new storage root is for empty storage, then we shouldn't create the account just to put a blank storage root in it.
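That rule can be sketched as a small guard — the function name is hypothetical (the real logic lives inside py-evm's AccountDB), but the blank-root constant is the well-known root hash of an empty hexary trie:

```python
# keccak(rlp(b"")): the root hash of an empty storage trie.
BLANK_ROOT_HASH = bytes.fromhex(
    "56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421"
)

def should_write_storage_root(account_exists: bool, new_root: bytes) -> bool:
    # Writing an empty storage root into a missing account would create
    # an empty account, which post-Spurious-Dragon rules (EIP-161) are
    # designed to avoid -- so skip that write.
    return account_exists or new_root != BLANK_ROOT_HASH
```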

@carver (Contributor, Author) commented Jan 31, 2020

Up to block ~~415133~~ 3.82M on my local full sync so far...

The full sync has tested some blocks from:

  • Frontier
  • Homestead (1.15M)
  • DAO fork (1.92M)
  • Shanghai attacks (2.28M-2.72M)
  • Tangerine Whistle (2.463M)
  • Spurious Dragon (2.675M)
  • Byzantium (4.37M)
  • Petersburg (7.28M)
  • Istanbul (9.069M)
  • Muir Glacier (9.2M)

@carver (Contributor, Author) commented Jan 31, 2020

So maybe come back for a refactor later, after we get sync reliable and trinity to mvp. Thoughts @pipermerriam ?

@carver carver changed the title Add test that deleted account stays deleted Fix consensus bug from deleting account in same block that creates it Feb 1, 2020
@carver (Contributor, Author) commented Feb 1, 2020

@cburgdorf Seems worth double-checking that this fix also fixes the goerli test you ran. How did you set that up again and/or are you up for re-running it against this branch?

@cburgdorf (Contributor) commented Feb 1, 2020 via email

@veox (Contributor) commented Feb 1, 2020

Yup! I'll check this on main-net (from scratch), and then try to sync Goerli - that's where I started, after all.

@veox (Contributor) commented Feb 1, 2020

Confirming that the issue is no longer present when using py-evm from PR #1913.

Main-net blocks #116525 and #336508 imported successfully (I ran it up to #347457).

So has Goerli block #226323 (still running, #472783 ATM).

@pipermerriam (Member) left a review comment:

👍

@@ -120,7 +141,7 @@ def __delitem__(self, key: bytes) -> None:

     @property
     def has_changed_root(self) -> bool:
-        return self._write_trie and self._write_trie.root_hash != self._starting_root_hash
+        return self._write_trie
A Member commented:
Assuming I'm correct that this is checking for the None value, I'd advocate for this to be return self._write_trie is not None.

@@ -129,9 +150,9 @@ def get_changed_root(self) -> Hash32:
raise ValidationError("Asked for changed root when no writes have been made")

def _clear_changed_root(self) -> None:
self._starting_root_hash = self._write_trie.root_hash
A Member commented:
If you get back in here, could you maybe drop a comment or two in this method explaining in some detail why we do each of these things? Since it's not clear how to address some of the debt in the code, maybe we can address some of it with better comments.

self._trie_nodes_batch: BatchDB = None

# When deleting an account, push the pending write info onto this stack.
# May happen multiple times for multiple deletes.
self._historical_write_tries: List[Tuple[HexaryTrie, BatchDB, Hash32]] = []
A Member commented:
Maybe make a NamedTuple for the inner value type here. Probably makes some of the other code a bit easier to read/grok.
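One way to take that suggestion — the field names and types are guesses derived from the `Tuple[HexaryTrie, BatchDB, Hash32]` annotation above, not py-evm's actual code:

```python
from typing import Any, List, NamedTuple

class PendingWrites(NamedTuple):
    """Pending trie state pushed onto the stack when an account is deleted."""
    write_trie: Any            # HexaryTrie in py-evm
    trie_nodes_batch: Any      # BatchDB in py-evm
    starting_root_hash: bytes  # Hash32 in py-evm

# Field access by name instead of tuple index makes call sites clearer:
historical_write_tries: List[PendingWrites] = []
historical_write_tries.append(PendingWrites(None, None, b"\x00" * 32))
```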

# Track how many times we have cleared the storage. This is journaled
# so that we can revert to an old clear count. The clear count is used
# as an index to find the base trie from before the revert.
self._clear_count = JournalDB(MemoryDB({CLEAR_COUNT_KEY_NAME: to_bytes(0)}))
A Member commented:
Using the JournalDB here feels like it may be the wrong choice. Can this just be modeled using a list (stack) or something?

@carver (Contributor, Author) replied:
I updated the comment a bit to try to explain: it has to move in lockstep with the main storage journal, so we can detect when a revert crosses back over an account delete.

The bug was: when a previous transaction creates/modifies an account,
and then is "locked", those pending changes are written even if the
account is deleted in a subsequent transaction.

The fix was: clear out all locked storage changes when you lock in an
account deletion.

This requires carefully managing reverts that cross over to the
pre-delete state. So we track pending writes in a stack that can be
reverted to at the appropriate time.

---

It's important not to accidentally create empty accounts, so don't set
empty storage root in a missing account.

---

Start reporting a storage root as updated, even if the storage root is
identical.

Why? In pre-Byzantium, the storage root is reported at each transaction.
If one transaction changes the storage and then another changes it back,
then we need to treat the storage in the second transaction as changed,
even though it's identical to the start root, which is always kept as
the root at the beginning of the block.
@carver carver merged commit e3ccf92 into ethereum:master Feb 10, 2020
@carver carver deleted the fix-1912-consensus-bug branch February 10, 2020 19:56
Successfully merging this pull request may close these issues:

  • Consensus failure on contract create/destruct within same block (both FrontierVM and PetersburgVM)