Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Journal before trie #676

Merged
merged 4 commits into from
May 14, 2018
Merged

Journal before trie #676

merged 4 commits into from
May 14, 2018

Conversation

carver
Copy link
Contributor

@carver carver commented May 10, 2018

What was wrong?

Right now, when an account changes multiple times within a transaction, then we:

  • waste time calculating keccak on intermediate state
  • waste time doing trie manipulation on intermediate state
  • store unnecessary data for intermediate account values

How was it fixed?

  • Journal the raw account values, before entering them into the trie.
  • Replaced values get dropped in the journal, in between calling persist, so they never hit the trie

Right now, this will save state at every persist call. We can do even better, saving it only at the end of every block, but this PR was getting full.

Side note: The account cache started returning stale results, because the state_root is no longer updated at every change. I tried to fix it in place, with a sprinkling of cache invalidations, but it was taking too long to hunt down all the bugs, so I wrote a simpler CacheDB instead. It works, but I'm not sure of the performance impact yet.

Plus a little unrelated VM refactor for creating state more succinctly.

Edit: probably need some more persists in there somewhere... Will come back to it after some reviews

Cute Animal Picture

put a cute animal picture link inside the parentheses

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The journal class just got confusing for me, but I think that confusion will be alleviated with some comments. General approach seems fine.

self._journaldb = JournalDB(db)
self._trie = HashTrie(HexaryTrie(db, state_root))
self._trie_cache = CacheDB(self._trie)
self._journaltrie = JournalDB(self._trie_cache)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use comments/docstring/something here that explains what each of these is for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I'm at for the docstring so far:

        r"""
        Internal implementation details (subject to rapid change): 
        Database entries go through several pipes, like so... 

        .. code:: 

                                                     -> hash-trie -> storage lookups
                                                    /
            db -----------------------> _journaldb ----------------> code lookups
             \
              > _trie -> _trie_cache -> _journaltrie --------------> account lookups

        Journaling sequesters writes here ^, until persist is called.

        _journaldb is a journaling of the keys and values used to store
        code and account storage.

        _trie is a hash-trie, used to generate the state root

        _trie_cache is a cache tied to the state root of the trie. It
        is important that this cache is checked *after* looking for
        the key in _journaltrie, because the cache is only invalidated
        after a state root change.

        _journaltrie is a journaling of the accounts (an address->rlp mapping,
        rather than the nodes stored by the trie). This enables
        a squashing of all account changes before pushing them into the trie.

        .. NOTE:: There is an opportunity to do something similar for storage

        AccountDB synchronizes the snapshot/revert/persist of both of the
        journals. 

        """

@carver
Copy link
Contributor Author

carver commented May 11, 2018

I wanted to be really sure that it wasn't the little refactor that broke the tests... and it is: #680

Edit: removed the commit and tried again.

account.nonce,
encode_hex(account.storage_root),
encode_hex(account.code_hash),
)
Copy link
Contributor Author

@carver carver May 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logging kind of slipped in by accident, but maybe I should just leave it in. It comes in handy during debugging. Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it was pure logging, then I'd be good with it, but the loop and whatnot doesn't sit well with me. Maybe if you moved this outside the function body into a stand-alone and left it inline but commented out so that it's easy to enable as a one-liner but it doesn't clutter up the actual function body.

@@ -22,22 +23,23 @@ def make_random_state(n):
code = b'not-real-code'
account_db.set_code(addr, code)
contents[addr] = (balance, nonce, storage, code)
return account_db, contents
account_db.persist()
return raw_db, account_db.state_root, contents


def test_state_sync():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good candidate for using hypothesis.strategies.random_module

https://hypothesis.readthedocs.io/en/latest/data.html#hypothesis.strategies.random_module

You'll also need to convert the make_random_state function to use the random module for randomness generation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using hypothesis won't buy us anything in those tests: #279 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I think you are correct, though I also don't think it will cost us anything, and in the event that there is some strange randomness based test failure, it will be really nice to be able to reproduce it, which now, we wouldn't be able to do.

But leave a convenience method in place for future debugging
@carver carver merged commit ebf9b8b into ethereum:master May 14, 2018
@carver carver deleted the journal-before-trie branch May 14, 2018 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants