Journal before trie #676

carver · 2018-05-10T22:33:57Z

What was wrong?

Right now, when an account changes multiple times within a transaction, we:

- waste time calculating keccak on intermediate state
- waste time doing trie manipulation on intermediate state
- store unnecessary data for intermediate account values

How was it fixed?

- Journal the raw account values, before entering them into the trie.
- Values that get replaced between persist calls are dropped in the journal, so they never hit the trie.

Right now, this will save state at every persist call. We can do even better, saving it only at the end of every block, but this PR was getting full.
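To illustrate the squashing described above, here is a minimal sketch, not the actual JournalDB from this PR. `CountingTrie` and `AccountJournal` are hypothetical names, and the "trie" is a plain dict that only counts writes, standing in for the hashing and node updates a real trie would do.

```python
class CountingTrie:
    """Stand-in for a hash trie: counts how often it is written to."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1  # a real trie would hash and update nodes here
        self.data[key] = value

    def __getitem__(self, key):
        return self.data[key]


class AccountJournal:
    """Buffers raw account values; only the last value per key is persisted."""

    def __init__(self, backing):
        self._backing = backing
        self._pending = {}

    def __setitem__(self, key, value):
        # A later write simply replaces the earlier pending one.
        self._pending[key] = value

    def __getitem__(self, key):
        # Pending writes win over whatever the backing store holds.
        if key in self._pending:
            return self._pending[key]
        return self._backing[key]

    def persist(self):
        # Push the surviving values down, then start with a clean journal.
        for key, value in self._pending.items():
            self._backing[key] = value
        self._pending.clear()


trie = CountingTrie()
journal = AccountJournal(trie)
address = b'\x01' * 20

# Five changes to the same account inside one "transaction".
for nonce in range(5):
    journal[address] = b'account-with-nonce-%d' % nonce

writes_before_persist = trie.writes
journal.persist()
```

In this sketch the trie sees nothing until `persist()` and then receives a single write, no matter how many times the account changed in between.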

Side note: The account cache started returning stale results, because the state_root is no longer updated at every change. I tried to fix it in place, with a sprinkling of cache invalidations, but it was taking too long to hunt down all the bugs, so I wrote a simpler CacheDB instead. It works, but I'm not sure of the performance impact yet.
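A rough sketch of what a deliberately simple cache layer could look like, assuming a dict-like backing database. `SimpleCacheDB` and `reset_cache` are hypothetical names, not the CacheDB added in this PR. The delete path reflects the kind of bug named in the later commit "bugfix cachedb: KeyError in cache skipped real del": a cache miss must not prevent the delete on the underlying database.

```python
class SimpleCacheDB:
    """Read-through cache over a dict-like database (illustrative only)."""

    def __init__(self, db):
        self._db = db
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            self._cache[key] = self._db[key]  # raises KeyError if missing
        return self._cache[key]

    def __setitem__(self, key, value):
        self._cache[key] = value
        self._db[key] = value

    def __delitem__(self, key):
        # Drop the cached copy if there is one, but always delete from the
        # underlying database, even when the key was never cached.
        self._cache.pop(key, None)
        del self._db[key]

    def reset_cache(self):
        """Forget all cached values, e.g. after the state root changes."""
        self._cache.clear()


backing = {b'a': b'1', b'b': b'2'}
cache_db = SimpleCacheDB(backing)

first_read = cache_db[b'a']
del cache_db[b'b']  # b'b' was never read, so it is not in the cache

# A change made behind the cache's back is invisible until the cache resets.
backing[b'a'] = b'changed-behind-the-cache'
stale_read = cache_db[b'a']
cache_db.reset_cache()
fresh_read = cache_db[b'a']
```

The stale read in the middle is the failure mode described above: anything that changes the underlying data without going through the cache needs a matching invalidation.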

Plus a little unrelated VM refactor for creating state more succinctly.

~~Edit: probably need some more persists in there somewhere... Will come back to it after some reviews~~

Cute Animal Picture

pipermerriam

The journal class just got confusing for me, but I think that confusion will be alleviated with some comments. General approach seems fine.

pipermerriam · 2018-05-10T22:55:08Z

evm/db/account.py

+        self._journaldb = JournalDB(db)
+        self._trie = HashTrie(HexaryTrie(db, state_root))
+        self._trie_cache = CacheDB(self._trie)
+        self._journaltrie = JournalDB(self._trie_cache)


I could use comments/docstring/something here that explains what each of these is for.

carver · 2018-05-11

This is where I'm at for the docstring so far:

r"""
Internal implementation details (subject to rapid change):
Database entries go through several pipes, like so...

.. code::

                                                            -> hash-trie -> storage lookups
                                                          /
    db -----------------------> _journaldb ----------------> code lookups
     \
      > _trie -> _trie_cache -> _journaltrie --------------> account lookups

Journaling sequesters writes here ^, until persist is called.

_journaldb is a journaling of the keys and values used to store
code and account storage.

_trie is a hash-trie, used to generate the state root

_trie_cache is a cache tied to the state root of the trie. It
is important that this cache is checked *after* looking for
the key in _journaltrie, because the cache is only invalidated
after a state root change.

_journaltrie is a journaling of the accounts (an address->rlp mapping,
rather than the nodes stored by the trie). This enables
a squashing of all account changes before pushing them into the trie.

.. NOTE:: There is an opportunity to do something similar for storage

AccountDB synchronizes the snapshot/revert/persist of both of the
journals.
"""
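The last sentence of the docstring, about keeping both journals in step, can be sketched as follows. This is an illustration under assumptions rather than the AccountDB code: `CheckpointJournal` and `TwoJournalDB` are hypothetical classes, plain dicts stand in for the database and the trie, and a checkpoint is simply the index of a changeset layer.

```python
class CheckpointJournal:
    """Dict-backed journal with nested checkpoints (illustrative only)."""

    def __init__(self, backing):
        self._backing = backing
        self._layers = [{}]  # oldest changeset first

    def set(self, key, value):
        self._layers[-1][key] = value

    def get(self, key):
        for layer in reversed(self._layers):  # newest write wins
            if key in layer:
                return layer[key]
        return self._backing.get(key)

    def snapshot(self):
        self._layers.append({})
        return len(self._layers) - 1  # checkpoint id

    def revert(self, checkpoint):
        # Throw away every changeset opened at or after the checkpoint.
        del self._layers[checkpoint:]

    def persist(self):
        for layer in self._layers:
            self._backing.update(layer)
        self._layers = [{}]


class TwoJournalDB:
    """Keeps a code/storage journal and an account journal in lockstep."""

    def __init__(self):
        self.raw_store, self.account_store = {}, {}
        self.code_journal = CheckpointJournal(self.raw_store)
        self.account_journal = CheckpointJournal(self.account_store)

    def snapshot(self):
        # One combined checkpoint covers both journals.
        return (self.code_journal.snapshot(), self.account_journal.snapshot())

    def revert(self, checkpoint):
        code_checkpoint, account_checkpoint = checkpoint
        self.code_journal.revert(code_checkpoint)
        self.account_journal.revert(account_checkpoint)

    def persist(self):
        self.code_journal.persist()
        self.account_journal.persist()


db = TwoJournalDB()
db.account_journal.set(b'addr', b'nonce=0')

checkpoint = db.snapshot()
db.account_journal.set(b'addr', b'nonce=1')
db.code_journal.set(b'code-hash', b'\x60\x00')
db.revert(checkpoint)

reverted_account = db.account_journal.get(b'addr')
reverted_code = db.code_journal.get(b'code-hash')
db.persist()
```

Returning a pair of checkpoint ids keeps the two journals from drifting apart: a caller cannot revert one without the other.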

carver · 2018-05-11T19:10:16Z

I wanted to be really sure that it wasn't the little refactor that broke the tests... and it is: #680

Edit: removed the commit and tried again.

carver · 2018-05-12T01:17:52Z

evm/db/account.py

+                        account.nonce,
+                        encode_hex(account.storage_root),
+                        encode_hex(account.code_hash),
+                    )


This logging kind of slipped in by accident, but maybe I should just leave it in. It comes in handy during debugging. Thoughts?

pipermerriam · 2018-05-12

If it were pure logging, I'd be good with it, but the loop and whatnot don't sit well with me. Maybe move this out of the function body into a stand-alone function, and leave the call inline but commented out, so that it's easy to enable as a one-liner without cluttering up the actual function body.
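The suggestion above might look roughly like this. It is a sketch with hypothetical names (`Account`, `log_accounts`, `persist`, and the logger name are all assumptions), using only the standard library; `hexlify` stands in for `encode_hex` from the diff.

```python
import logging
from binascii import hexlify
from collections import namedtuple

logger = logging.getLogger('accountdb.debug')

# Hypothetical account record holding just the fields seen in the diff.
Account = namedtuple('Account', ['nonce', 'storage_root', 'code_hash'])


def log_accounts(accounts, log=logger):
    """Debug helper: emit one DEBUG line per account."""
    for address, account in sorted(accounts.items()):
        log.debug(
            "Account %s: nonce=%d storage_root=%s code_hash=%s",
            hexlify(address).decode(),
            account.nonce,
            hexlify(account.storage_root).decode(),
            hexlify(account.code_hash).decode(),
        )


def persist(accounts, store):
    # log_accounts(accounts)  # uncomment while debugging
    store.update(accounts)


store = {}
accounts = {
    b'\x01' * 20: Account(
        nonce=7,
        storage_root=b'\x00' * 32,
        code_hash=b'\xff' * 32,
    ),
}
persist(accounts, store)
```

The loop lives in the helper, so the function body carries only a single commented-out line.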

pipermerriam · 2018-05-12T02:30:20Z

tests/p2p/test_state_sync.py

@@ -22,22 +23,23 @@ def make_random_state(n):
        code = b'not-real-code'
        account_db.set_code(addr, code)
        contents[addr] = (balance, nonce, storage, code)
-    return account_db, contents
+    account_db.persist()
+    return raw_db, account_db.state_root, contents


 def test_state_sync():


This seems like a good candidate for using hypothesis.strategies.random_module

https://hypothesis.readthedocs.io/en/latest/data.html#hypothesis.strategies.random_module

You'll also need to convert the make_random_state function to use the random module for randomness generation.

gsalgado · 2018-05-14

I think using hypothesis won't buy us anything in those tests: #279 (comment)

pipermerriam · 2018-05-14

Hrm, I think you are correct, though I also don't think it will cost us anything. And in the event of some strange randomness-based test failure, it will be really nice to be able to reproduce it, which we currently can't do.
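The reproducibility concern can be shown without hypothesis. The sketch below swaps in a plain seeded `random.Random` from the standard library in place of `hypothesis.strategies.random_module`, which would manage and report the seed itself. This `make_random_state` is a simplified stand-in that only builds the `contents` mapping; the `rng` parameter and the value ranges are assumptions.

```python
import random


def make_random_state(n, rng):
    """Build n fake accounts using only the supplied random generator."""
    contents = {}
    for _ in range(n):
        addr = bytes(rng.getrandbits(8) for _ in range(20))
        balance = rng.randint(0, 10 ** 9)
        nonce = rng.randint(0, 10 ** 6)
        storage = rng.randint(0, 10 ** 6)
        code = b'not-real-code'
        contents[addr] = (balance, nonce, storage, code)
    return contents


seed = 12345  # log this in a real test so a failing run can be replayed
first = make_random_state(10, random.Random(seed))
replay = make_random_state(10, random.Random(seed))
other = make_random_state(10, random.Random(seed + 1))
```

Because every random draw goes through `rng`, rerunning with the same seed rebuilds the same state, which is what makes a randomness-dependent failure debuggable.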

carver force-pushed the journal-before-trie branch from ba58e54 to 9274d94 Compare May 10, 2018 22:35

carver added the PR state: WIP label May 10, 2018

pipermerriam reviewed May 10, 2018

carver force-pushed the journal-before-trie branch from 9274d94 to 32a7d0b Compare May 11, 2018 19:54

Journal DB before the trie

814a3eb

carver force-pushed the journal-before-trie branch from 32a7d0b to 814a3eb Compare May 11, 2018 19:54

carver added 2 commits May 11, 2018 16:46

new accountdb log: show persisted account details

f87485c

bugfix cachedb: KeyError in cache skipped real del

f207757

carver force-pushed the journal-before-trie branch from 47585e2 to f207757 Compare May 12, 2018 00:51

carver removed the PR state: WIP label May 12, 2018

carver commented May 12, 2018

pipermerriam approved these changes May 12, 2018

Do not log each persisted account

11bde2a

But leave a convenience method in place for future debugging

carver merged commit ebf9b8b into ethereum:master May 14, 2018

carver deleted the journal-before-trie branch May 14, 2018 18:06

carver mentioned this pull request Dec 21, 2018

Make sure state sync test failures are reproducible ethereum/trinity#21

Closed
