Journal before trie #676
The journal class just got confusing for me, but I think that confusion will be alleviated with some comments. General approach seems fine.
evm/db/account.py
self._journaldb = JournalDB(db)
self._trie = HashTrie(HexaryTrie(db, state_root))
self._trie_cache = CacheDB(self._trie)
self._journaltrie = JournalDB(self._trie_cache)
I could use comments/docstring/something here that explains what each of these is for.
This is where I'm at for the docstring so far:
r"""
Internal implementation details (subject to rapid change):
Database entries go through several pipes, like so...

.. code::

                                                            -> hash-trie -> storage lookups
                                                          /
    db -----------------------> _journaldb ----------------> code lookups
     \
      > _trie -> _trie_cache -> _journaltrie --------------> account lookups

Journaling sequesters writes here ^, until persist is called.

_journaldb is a journaling of the keys and values used to store
code and account storage.

_trie is a hash-trie, used to generate the state root

_trie_cache is a cache tied to the state root of the trie. It
is important that this cache is checked *after* looking for
the key in _journaltrie, because the cache is only invalidated
after a state root change.

_journaltrie is a journaling of the accounts (an address->rlp mapping,
rather than the nodes stored by the trie). This enables
a squashing of all account changes before pushing them into the trie.

.. NOTE:: There is an opportunity to do something similar for storage

AccountDB synchronizes the snapshot/revert/persist of both of the
journals.
"""
I wanted to be really sure that it wasn't the little refactor that broke the tests... and it is: #680
Edit: removed the commit and tried again.
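For readers following the docstring draft above, here is a minimal sketch of the journaling idea it describes: writes are held back in layers and only reach the backing store on persist, after being squashed so each key is written once. `JournalSketch` and its dict-like backing store are hypothetical stand-ins for illustration, not the JournalDB in this PR.

```python
class JournalSketch:
    """Hypothetical stand-in for a journal: buffers writes until persist()."""

    _DELETED = object()  # sentinel marking a pending delete

    def __init__(self, wrapped):
        self._wrapped = wrapped    # any dict-like backing store (assumption)
        self._layers = [{}]        # stack of pending-change layers

    def __getitem__(self, key):
        # The newest layer wins; fall through to the backing store last.
        for layer in reversed(self._layers):
            if key in layer:
                value = layer[key]
                if value is self._DELETED:
                    raise KeyError(key)
                return value
        return self._wrapped[key]

    def __setitem__(self, key, value):
        self._layers[-1][key] = value

    def __delitem__(self, key):
        self._layers[-1][key] = self._DELETED

    def snapshot(self):
        # Open a new layer and return its index as the checkpoint id.
        self._layers.append({})
        return len(self._layers) - 1

    def revert(self, checkpoint):
        # Drop the checkpoint's layer and every layer opened after it.
        if not 1 <= checkpoint < len(self._layers):
            raise ValueError("unknown checkpoint")
        del self._layers[checkpoint:]

    def persist(self):
        # Squash all layers so each key is written to the store only once.
        squashed = {}
        for layer in self._layers:
            squashed.update(layer)
        for key, value in squashed.items():
            if value is self._DELETED:
                self._wrapped.pop(key, None)
            else:
                self._wrapped[key] = value
        self._layers = [{}]


backing = {}
journal = JournalSketch(backing)
journal[b'addr'] = b'v1'
checkpoint = journal.snapshot()
journal[b'addr'] = b'v2'
journal.revert(checkpoint)      # back to b'v1'
journal[b'addr'] = b'v3'
untouched_before_persist = dict(backing)
journal.persist()               # only the final value reaches the store
```

In this sketch the backing store stays empty until `persist()` is called, which is the "sequesters writes" behavior the docstring refers to.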
account.nonce,
encode_hex(account.storage_root),
encode_hex(account.code_hash),
)
This logging kind of slipped in by accident, but maybe I should just leave it in. It comes in handy during debugging. Thoughts?
If it was pure logging, then I'd be good with it, but the loop and whatnot doesn't sit well with me. Maybe move this out of the function body into a stand-alone function and leave the call inline but commented out, so that it's easy to enable as a one-liner without cluttering up the actual function body.
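A rough sketch of that suggestion, under assumptions: the `Account` tuple, the logger name, the message format, and the `persist` placeholder are all hypothetical, and plain `.hex()` stands in for `encode_hex` so the snippet has no third-party imports.

```python
import logging
from collections import namedtuple

logger = logging.getLogger("accountdb.debug")  # hypothetical logger name

# Hypothetical stand-in for the real account object.
Account = namedtuple("Account", ["nonce", "balance", "storage_root", "code_hash"])


def log_accounts(accounts, log=logger.debug):
    """Log one line per (address, account) pair; intended for debugging only."""
    for address, account in accounts.items():
        log(
            "Account %s: balance %d, nonce %d, storage root %s, code hash %s",
            "0x" + address.hex(),
            account.balance,
            account.nonce,
            "0x" + account.storage_root.hex(),
            "0x" + account.code_hash.hex(),
        )


def persist(accounts):
    # log_accounts(accounts)  # uncomment while debugging
    return len(accounts)  # placeholder for the real persist work
```

The loop lives in the helper, so the function body carries only a single commented-out line.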
@@ -22,22 +23,23 @@ def make_random_state(n):
         code = b'not-real-code'
         account_db.set_code(addr, code)
         contents[addr] = (balance, nonce, storage, code)
-    return account_db, contents
+    account_db.persist()
+    return raw_db, account_db.state_root, contents


 def test_state_sync():
This seems like a good candidate for using hypothesis.strategies.random_module:
https://hypothesis.readthedocs.io/en/latest/data.html#hypothesis.strategies.random_module
You'll also need to convert the make_random_state function to use the random module for randomness generation.
I think using hypothesis won't buy us anything in those tests: #279 (comment)
Hrm, I think you are correct, though I also don't think it will cost us anything. In the event of some strange randomness-based test failure, it will be really nice to be able to reproduce it, which we can't do right now.
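To illustrate the reproducibility point without pulling in hypothesis, here is a stdlib-only sketch that swaps in an explicit seed instead of hypothesis.strategies.random_module. The simplified signature and return value below are assumptions for illustration and differ from the make_random_state in this PR.

```python
import random


def make_random_state(n, seed=None):
    """Build n hypothetical address -> (balance, nonce) entries from one seed."""
    if seed is None:
        # Pick a seed ourselves so it can be reported on failure.
        seed = random.randrange(2**32)
    rng = random.Random(seed)
    contents = {}
    for _ in range(n):
        addr = bytes(rng.getrandbits(8) for _ in range(20))
        contents[addr] = (rng.randint(0, 9999), rng.randint(0, 9999))
    return seed, contents


# First run: let the function choose the seed and remember it.
seed, first = make_random_state(5)
# Later run: replay the exact same state from the recorded seed.
_, replayed = make_random_state(5, seed=seed)
```

Printing or logging the returned seed when a test fails is enough to rerun the same random state later.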
But leave a convenience method in place for future debugging
What was wrong?
Right now, when an account changes multiple times within a transaction, then we:
How was it fixed?
persist, so they never hit the trie
Right now, this will save state at every persist call. We can do even better, saving it only at the end of every block, but this PR was getting full.
Side note: The account cache started returning stale results, because the state_root is no longer updated at every change. I tried to fix it in place, with a sprinkling of cache invalidations, but it was taking too long to hunt down all the bugs, so I wrote a simpler CacheDB instead. It works, but I'm not sure of the performance impact yet.
Plus a little unrelated VM refactor for creating state more succinctly.
Edit: probably need some more persists in there somewhere... Will come back to it after some reviews.
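For context on the side note, a simple cache over a database could look roughly like the sketch below: reads are remembered, and writes go through to the wrapped store so the cache never holds a value the store lacks. `CacheDBSketch` and `CountingDict` are hypothetical names for illustration, and this is not the CacheDB written for this PR.

```python
class CacheDBSketch:
    """Hypothetical read cache over a dict-like db with write-through."""

    def __init__(self, db):
        self._db = db
        self._cache = {}

    def __getitem__(self, key):
        if key not in self._cache:
            # A KeyError from the wrapped db propagates; nothing gets cached.
            self._cache[key] = self._db[key]
        return self._cache[key]

    def __setitem__(self, key, value):
        # Write-through: update the cache and the wrapped db together.
        self._cache[key] = value
        self._db[key] = value

    def __delitem__(self, key):
        self._cache.pop(key, None)
        del self._db[key]


class CountingDict(dict):
    """Dict that counts reads, to show how often the cache hits the store."""

    reads = 0

    def __getitem__(self, key):
        self.reads += 1
        return super().__getitem__(key)


store = CountingDict({b'k': b'v'})
cache = CacheDBSketch(store)
cache[b'k']
cache[b'k']          # second read is served from the cache
store_reads = store.reads
```

Because every write passes through the wrapper, this sketch needs no separate invalidation step, at the cost of holding every value it has seen in memory.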