This repository has been archived by the owner on Jul 1, 2021. It is now read-only.

New native database adapter #859

Merged
merged 2 commits into from Aug 17, 2019

Conversation

pipermerriam
Member

@pipermerriam pipermerriam commented Jul 30, 2019

What was wrong?

We've been using the multiprocessing facilities to access the database and chain from multiple processes.

How was it fixed?

This implements a custom server that serves the AtomicDatabaseAPI over an IPC socket and a client that implements the full AtomicDatabaseAPI.

This requires converting all of the various Chain, ChainDB, and HeaderDB operations to run in the local process using a thread-based executor.

To-Do

  • Clean up commit history

Cute Animal Picture

put a cute animal picture link inside the parentheses

@pipermerriam
Member Author

Replaces #756

@pipermerriam
Member Author

Problems that arose while working on this.

Pretty much everywhere in Trinity where we interact with the chain, we use the async wrapper/proxy objects for things like Chain, ChainDB, HeaderDB, and BaseDB. These objects masquerade as the actual object, but when you call a method on them, the execution actually happens within the database process.

This is how Trinity keeps the networking code from being blocked by expensive CPU-bound operations related to the EVM or Chain.

If we want to replace the way we interact with the database, we will also need to replace this mechanism; i.e. we will still need some semblance of a proxy object that represents the actual Chain/HeaderDB/etc., which lives in a separate process, and which delegates method calls in a manner that doesn't block async execution.

With the direction we are currently moving, this probably means an architecture that looks something like:

  • A service that runs in its own process with a lahja endpoint which listens and responds to requests.
  • A proxy object that delegates calls over the lahja event bus to execute specific methods or properties and uses asyncio.run_in_executor or trio.run_sync_in_thread on the local side to avoid blocking the async execution.

Doing this today will involve a lot of boilerplate since we'd need to write events and handlers for all of the methods. @cburgdorf recently did some similar things for the ProxyPeerPool.

To make this less heavy on boilerplate code, and thus on the complexity of working with it, I think we need some machinery to automate the concept of serving an object's methods over an event bus: something like a metaclass that generates all of the events under the hood, as well as the handlers for serving the request/response cycle for those events.
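A minimal sketch of that idea, with all names here (`InProcessBus`, `make_proxy`, the demo `HeaderDB`) invented for illustration, and a worker thread standing in for the cross-process lahja hop:

```python
import asyncio
from typing import Any

class InProcessBus:
    """Hypothetical stand-in for a lahja endpoint: routes (method, args)
    requests to the object being served."""
    def __init__(self, target: Any) -> None:
        self._target = target

    async def request(self, method_name: str, *args: Any) -> Any:
        # In the real design this would be an event broadcast plus an awaited
        # response; here we call the served object in a worker thread so the
        # event loop is never blocked.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(
            None, getattr(self._target, method_name), *args
        )

def make_proxy(cls: type, bus: InProcessBus) -> type:
    """Auto-generate a proxy class whose public methods delegate over the bus;
    this is the kind of machinery a metaclass could provide."""
    def make_method(name: str):
        async def method(self: Any, *args: Any) -> Any:
            return await bus.request(name, *args)
        return method

    namespace = {
        name: make_method(name)
        for name in dir(cls)
        if not name.startswith('_') and callable(getattr(cls, name))
    }
    return type(f"{cls.__name__}Proxy", (), namespace)

class HeaderDB:
    """Hypothetical served object."""
    def get_canonical_head(self) -> str:
        return "header-123"

async def demo() -> str:
    proxy = make_proxy(HeaderDB, InProcessBus(HeaderDB()))()
    return await proxy.get_canonical_head()
```

In the real thing the request would cross a process boundary over the event bus rather than a thread, but the generated-proxy shape would be the same.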

@pipermerriam pipermerriam force-pushed the piper/new-database-adapter branch 2 times, most recently from a946e86 to aebe299 Compare August 1, 2019 19:41
@pipermerriam pipermerriam marked this pull request as ready for review August 1, 2019 19:42
@pipermerriam pipermerriam force-pushed the piper/new-database-adapter branch 6 times, most recently from 348c0f1 to 692483a Compare August 1, 2019 23:18
@pipermerriam
Member Author

@carver I ended up running with this to try an easier way to get it working. I made a minimal change so that all of the Chain/ChainDB/HeaderDB coroutine methods now run in a ThreadPoolExecutor, which should roughly preserve the previous property of Chain operations not blocking async processing. This is slightly different from the previous model in that we will see some performance changes: the database process will be doing less CPU work (maybe good?) and the other processes will be doing more CPU work via threads, which is likely to have some measurable effect on their performance.

I'm wondering if you can profile beam sync against this branch to see if we actually gain or lose any measurable performance using this mechanism

@pipermerriam pipermerriam force-pushed the piper/new-database-adapter branch 5 times, most recently from 21cdee5 to 3f3baa0 Compare August 2, 2019 17:16
@carver
Contributor

carver commented Aug 2, 2019

I'm wondering if you can profile beam sync against this branch to see if we actually gain or lose any measurable performance using this mechanism

Hm, maybe a small lift, but nothing obvious. At least, it doesn't seem to hurt! :)

At this point, it can be hard to tell because Beam Sync is usually staying caught up. I have a few profiling improvements I've been thinking about, which might help me give you a definitive answer later today.

@pipermerriam
Member Author

Cool, I'll work on cleaning this up with intent to merge after some review next week.

@pipermerriam
Member Author

Test failures are unrelated to this PR

@pipermerriam
Member Author

Local profiling shows these numbers:

old-db-manager: ~14,000 get+set ops/sec
new-db-manager: ~20,000 get+set ops/sec

A better benchmark would run something like Chain.import_block to get a more realistic usage profile, but I don't think we have the facilities in place to do that easily. Either way, I feel good about the direction this takes us.
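The numbers above come from a get+set micro-benchmark; a rough sketch of that kind of measurement (using a plain dict as a stand-in for the database client) might look like:

```python
import time

def bench_get_set(db: dict, num_ops: int = 10_000) -> float:
    """Return get+set operations per second against a dict-like database."""
    start = time.perf_counter()
    for i in range(num_ops):
        key = i.to_bytes(4, 'big')
        db[key] = key            # set
        assert db[key] == key    # get
    elapsed = time.perf_counter() - start
    return num_ops / elapsed
```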

@@ -1,6 +1,7 @@
import trio

import pytest
import pytest_trio
Member Author

extract this re-org to its own PR


@abstractmethod
async def coro_validate_chain(
self,
parent: BlockHeader,
chain: Tuple[BlockHeader, ...],
seal_check_random_sample_rate: int = 1) -> None:
pass
...
Member Author

TODO: extract to stand-alone cleanup PR



class BaseAsyncHeaderDB(ABC):
def async_thread_method(method: Callable[..., Any]) -> Callable[..., Any]:
Member Author

This probably belongs somewhere under trinity._utils
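One plausible implementation of `async_thread_method`, sketched under the assumption that it simply offloads the wrapped blocking method to the default thread-pool executor (the `ExampleHeaderDB` class is invented for illustration):

```python
import asyncio
import functools
from typing import Any, Callable

def async_thread_method(method: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a blocking method as a coroutine that runs in a worker thread."""
    @functools.wraps(method)
    async def wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
        loop = asyncio.get_running_loop()
        # functools.partial binds self and the arguments so the call can be
        # handed to the executor as a zero-argument callable.
        return await loop.run_in_executor(
            None, functools.partial(method, self, *args, **kwargs)
        )
    return wrapper

class ExampleHeaderDB:
    """Hypothetical user of the decorator."""
    @async_thread_method
    def get_score(self) -> int:
        return 7  # stands in for a blocking database read
```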

try:
manager.wait_stopped()
except KeyboardInterrupt:
pass
Member Author

TODO: I don't think this is actually running its cleanup step, because the IPC path is consistently left behind.

@@ -149,7 +149,7 @@ def __init__(self,
super().__init__(token)

def __str__(self) -> str:
return f"{self.__class__.__name__} {self.remote.uri}"
return f"{self.__class__.__name__} {self.remote}"
Member Author

bugfix can be extracted to standalone

@@ -295,87 +293,3 @@ def _make_receive_server(self) -> BCCReceiveServer:
peer_pool=self.peer_pool,
token=self.cancel_token,
)


def _test() -> None:
Member Author

removal of these tests can be extracted to standalone

@@ -233,7 +234,8 @@ def _schedule(self, node_key: Hash32, parent: SyncRequest, depth: int,
called.
"""
self.committed_nodes += 1
await self.db.coro_set(request.node_key, request.data)
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, self.db.set, request.node_key, request.data)
Member Author

AFAIK this is the only place that used the base async db methods so I just updated it to run the sync versions asynchronously using a thread.

@pipermerriam pipermerriam force-pushed the piper/new-database-adapter branch 2 times, most recently from a5e8ba7 to 8d73204 Compare August 3, 2019 01:44
@pipermerriam pipermerriam changed the title Piper/new database adapter New native database adapter Aug 5, 2019

clients = [
multiprocessing.Process(target=run_client, args=[ipc_path, i])
for i in range(1)
Member Author

This benchmark needs a CLI for specifying these values (how many clients, how many keys/values), and it needs to be added to the CI run.

elif result_byte == FAIL_BYTE:
return False
else:
raise Exception("Unknown result byte: {result_byte.hex}")
Member Author

missing f for f-string.
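As a side note, a correct version needs both the `f` prefix and call parentheses on `.hex`; without the parentheses, the f-string would interpolate the bound method object rather than the hex digits. A sketch:

```python
result_byte = b'\x02'

# Broken: "Unknown result byte: {result_byte.hex}" (no f prefix, braces stay literal)
# Fixed: f prefix added and .hex actually called.
message = f"Unknown result byte: {result_byte.hex()}"
```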

elif result_byte == FAIL_BYTE:
raise KeyError(key)
else:
raise Exception("Unknown result byte: {result_byte.hex}")
Member Author

missing f for f-string.

elif result_byte == FAIL_BYTE:
raise KeyError(key)
else:
raise Exception("Unknown result byte: {result_byte.hex}")
Member Author

missing f for f-string.

*delete_sizes,
)
kv_data = b''.join(itertools.chain(*pending_kv_pairs))
delete_data = b''.join(pending_deletes)
Member Author

can combine this with the statement above inside the chain call

@pipermerriam pipermerriam force-pushed the piper/new-database-adapter branch 2 times, most recently from d63dc5d to b3613b5 Compare August 6, 2019 16:59
Contributor

@lithp lithp left a comment

Submitting a batch of comments so you can start reading this review while I continue going through this PR.

manager.logger.info('started db manager')
yield manager
manager.logger.info('exiting db manager')
manager.logger.info('exited db manager')
Contributor

These three log lines (and the clos.* client lines just below) might not be necessary now that this has reached the point of a PR waiting to be merged.

@@ -382,7 +382,7 @@ def finalizer():
async with run_peer_pool_event_server(
event_bus, server_peer_pool, handler_type=LESPeerPoolEventServer
), run_request_server(
event_bus, FakeAsyncChainDB(chaindb_20.db), server_type=LightRequestServer
event_bus, AsyncChainDB(chaindb_20.db), server_type=LightRequestServer
Contributor

This is really cool, being able to drop all these FakeAsyncChainDBs.

@@ -160,8 +159,7 @@ def get_beacon_shell_context(database_dir: Path, trinity_config: TrinityConfig)

trinity_already_running = ipc_path.exists()
if trinity_already_running:
db_manager = beacon.manager.create_db_consumer_manager(ipc_path) # type: ignore
db = db_manager.get_db()
db = DBClient.connect(ipc_path)
Contributor

Man, this interface is so much better!

@@ -0,0 +1 @@
Replace ``multiprocessing`` based database access with a custom implementation.
Contributor

Since the newsfragment is user-facing, maybe mention the performance boost too!

ATOMIC_BATCH Response:

- Success Byte: 0x01
"""
Contributor

<3 these docstrings

self._stopped.set()
break
self.logger.debug('Server accepted connection: %r', addr)
threading.Thread(
Contributor

What made you decide to do this synchronously with a bunch of threads? I'm surprised! Writing a server which responds to requests from a lot of clients seems like exactly the asyncio use-case. Maybe it's a performance thing: calls into leveldb block the calling thread?

Member Author

We did both, and the synchronous implementation outperformed the asyncio implementation by roughly 2-3x. We didn't do any real measuring of why, but I suspect the event loop overhead is just way more significant than the gains we get from being able to do work while the I/O is occurring.
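The thread-per-connection shape under discussion can be sketched as a toy echo server over a UNIX domain socket (the real server speaks the database wire protocol instead; the helper names here are invented):

```python
import os
import socket
import tempfile
import threading

def _handle(conn: socket.socket) -> None:
    # One dedicated thread per client: plain blocking recv/send, no event loop.
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())

def roundtrip(payload: bytes) -> bytes:
    """Serve a single request, thread-per-connection style, and return the reply."""
    ipc_path = os.path.join(tempfile.mkdtemp(), 'db.ipc')
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(ipc_path)
    server.listen(1)

    def accept_one() -> None:
        conn, _addr = server.accept()
        # Hand each accepted connection off to its own worker thread.
        threading.Thread(target=_handle, args=(conn,)).start()

    threading.Thread(target=accept_one).start()

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(ipc_path)
    try:
        client.sendall(payload)
        return client.recv(1024)
    finally:
        client.close()
        server.close()
```

Because every connection blocks only its own thread, slow leveldb calls never stall other clients, which fits the measurement quoted above.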

try:
operation = Operation(operation_byte)
except TypeError:
self.logger.error("Unrecognized database operation: %s", operation_byte.hex())
Contributor

It might be useful to also print raw_socket here like above, getting out of sync like this is a pretty serious problem! It could be useful to know who's having trouble talking to the database.

Member Author

I don't actually think we get any useful information from that which would let us know who is misbehaving...

Contributor

You're right, not without probably too much extra work.

"""
logger = logging.getLogger('trinity.db.manager.DBManager')

def __init__(self, db: BaseAtomicDB):
Contributor

It might be worth adding a comment here: we're not exactly expecting a BaseAtomicDB, we're expecting one which is also thread-safe.



class empty:
pass
Contributor

Accidental commit?

return cls(s)


class AtomicBatch(BaseDB):
Contributor

It's frustrating that this so nearly duplicates py-evm's AtomicDBWriteBatch, there might be a refactoring of it which allows it to be used here.

Member Author

I'll open an issue.

pass


class DBClient(BaseAtomicDB):
Contributor

Dropping multiprocessing is already a nice improvement, and this gives me some ideas for future PRs:

I expect we'll see an even larger performance improvement if these methods become asynchronous! Because these methods are synchronous, every call to the database blocks the event loop; if the database ever gets backed up, the other processes will back up along with it, when they could be talking over the network or otherwise letting their other coros truck along.

And this is a far-future idea, but since so much of our database is immutable data, I wonder whether some kind of caching on this side of the process could also improve performance.

Member Author

I think that any caching would likely need to happen at a higher level but yes, there are future improvements that could be made and now that we fully own this API we have a lot of freedom to explore.


def __getitem__(self, key: bytes) -> bytes:
if self._track_diff is None:
raise ValidationError("Cannot get data from a write batch, out of context")
Contributor

These error messages assume that AtomicBatch will only ever be used by DBClient. It might be better for AtomicBatch to have a name that indicates it's private to this file, or for AtomicBatch to do its own context management, or for these messages to say something like "cannot use AtomicBatch after it has been finalized".

@@ -163,7 +163,8 @@ def next_batch(self, n: int = 1) -> List[SyncRequest]:
if node_key in self.nodes_cache:
self.logger.debug2("Node %s already exists in db", encode_hex(node_key))
return
if await self.db.coro_exists(node_key):
loop = asyncio.get_event_loop()
if await loop.run_in_executor(None, self.db.exists, node_key):
Contributor

We're already blocking the loop everywhere we call a method on this database, so it probably isn't a big problem to keep the code clean and call self.db.exists directly here, and the same with self.db.set below.

Member Author

Correct, however I don't actually foresee this being an issue, because all of our database operations occur behind HeaderDB or ChainDB, and for the most part we use async_thread_method to mitigate the blocking nature of these calls.

IIRC, making these calls asynchronous was an intentional choice back when this code was written and it produced a performance gain. Also, I think that this code is going to go away with beam/firehose so I'm not inclined to do much more than maintain status quo.

break

try:
if operation is GET:
Contributor

@lithp lithp Aug 16, 2019

I'm not fully on the if-less programming bandwagon, but I wanted to drop a comment here: I bet this could be turned into an operation.perform(self.db, sock). A world where the different operations are subclasses of Operation is also a world where the method to send an ATOMIC_BATCH and the method to receive one sit right next to each other, which might make future changes easier.

Member Author

@ChihChengLiang originally wrote them that way, but I felt they were harder to understand, though that could have just been the code organization. I'm going to leave that idea for a future refactor. Also noting that I tried to minimize overhead, since this is somewhat performance-sensitive code.
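For the record, the reviewer's if-less variant might look roughly like this (class names and the success byte are invented for illustration, with a plain dict standing in for the database):

```python
from abc import ABC, abstractmethod
from typing import Dict

class Operation(ABC):
    """Each wire operation knows how to perform itself against the database."""
    @abstractmethod
    def perform(self, db: Dict[bytes, bytes]) -> bytes:
        ...

class Get(Operation):
    def __init__(self, key: bytes) -> None:
        self.key = key

    def perform(self, db: Dict[bytes, bytes]) -> bytes:
        return db[self.key]

class Set(Operation):
    def __init__(self, key: bytes, value: bytes) -> None:
        self.key, self.value = key, value

    def perform(self, db: Dict[bytes, bytes]) -> bytes:
        db[self.key] = self.value
        return b'\x00'  # hypothetical success byte

# The server dispatch then shrinks to `operation.perform(self.db)`
# with no if/elif chain.
db: Dict[bytes, bytes] = {}
Set(b'head', b'0x01').perform(db)
head = Get(b'head').perform(db)
```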

Contributor

@lithp lithp left a comment

It's exciting to see so much multiprocessing code be deleted, this is a great improvement!

@pipermerriam pipermerriam merged commit e726c43 into ethereum:master Aug 17, 2019
@pipermerriam pipermerriam deleted the piper/new-database-adapter branch August 17, 2019 03:37
@cburgdorf cburgdorf mentioned this pull request Aug 27, 2019