Node should be using message-passing concurrency #121

Closed
MaksymZavershynskyi opened this issue Dec 1, 2018 · 11 comments
@MaksymZavershynskyi
Contributor

MaksymZavershynskyi commented Dec 1, 2018

In order to be better aligned with the interface of libp2p and other Rust libraries that use futures, we need to replace some of our state-sharing concurrency with message-passing concurrency. This would also allow us to get rid of some thread-safety primitives and hopefully be faster.

The following things should be run as tasks:

  1. network_listener. Listens to the stream of network messages from libp2p and spawns network_messages_handlers;
  2. network_messages_handler (currently called Protocol). We spawn a new instance of the handler for every network message received, as shown in the last example here: https://tokio.rs/docs/getting-started/echo/ , so that we can benefit from the concurrency.
  3. TxFlowTask, as designed, should run as a task.
  4. A pool of Runtimes that execute the contracts.
  5. RPC server.
  • The CLI would run on the main thread, spawn tasks 1, 3, 4, and 5, and provide them with the channels they would use to communicate (see the sketch below).
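
To make the intended shape concrete, here is a minimal sketch (not nearcore code) of a main thread spawning a listener task, per-message handler tasks, and a TxFlow-like consumer wired together with a channel, assuming the futures 0.1 / tokio 0.1 APIs that were current at the time; the message type and task bodies are placeholders:

```rust
extern crate futures;
extern crate tokio;

use futures::sync::mpsc;
use futures::Stream;

fn main() {
    // Channel over which the network listener forwards messages to TxFlow.
    let (to_txflow, from_network) = mpsc::unbounded::<Vec<u8>>();

    // Task 3: a TxFlow-like task that consumes messages from the channel.
    let txflow_task = from_network.for_each(|msg| {
        println!("TxFlow got {} bytes", msg.len());
        Ok(())
    });

    // Task 1: a stand-in for the network listener; it spawns a lightweight
    // handler (task 2) per incoming message, which forwards it to TxFlow.
    let listener = futures::stream::iter_ok::<_, ()>(vec![vec![1u8], vec![2, 3]])
        .for_each(move |msg| {
            let to_txflow = to_txflow.clone();
            tokio::spawn(futures::lazy(move || {
                to_txflow.unbounded_send(msg).map_err(|_| ())
            }));
            Ok(())
        });

    // The CLI / main thread wires the tasks together on one runtime.
    tokio::run(futures::lazy(|| {
        tokio::spawn(txflow_task);
        tokio::spawn(listener);
        Ok(())
    }));
}
```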
@MaksymZavershynskyi MaksymZavershynskyi added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Dec 1, 2018
@MaksymZavershynskyi MaksymZavershynskyi added this to the MVB milestone Dec 1, 2018
@MaksymZavershynskyi MaksymZavershynskyi self-assigned this Dec 1, 2018
@bowenwang1996
Collaborator

Unless I missed something, the network message handler is currently called protocol.

@MaksymZavershynskyi
Contributor Author

Typo. Fixed.

@MaksymZavershynskyi
Contributor Author

MaksymZavershynskyi commented Dec 1, 2018

Some tasks for me and @azban:

  • network/src/service.rs should be spawning a new protocol on each new network message, cloning the protocol in the process.

  • cli/.../lib.rs should be spawning tasks instead of creating a bunch of Arcs.

  • TxFlowTask should be connected to the protocol and the network (for sending gossips).

  • InMemorySigner should be copied instead of shared, because it is lightweight.

  • RpcImpl should send transactions on a channel to TxFlow rather than taking Client (in review); see the sketch after this list.

  • RpcImpl should spawn runtime task for view calls (deferring until later)

  • remove transaction pool from client?

  • remove import queue from client?
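
For the RpcImpl item above, a hypothetical sketch of what sending transactions on a channel instead of holding Client could look like, assuming futures 0.1 channels; the names SignedTransaction and submit_transaction are placeholders, not the real nearcore API:

```rust
extern crate futures;

use futures::sync::mpsc::UnboundedSender;

// Placeholder for the real transaction type.
pub struct SignedTransaction;

// Instead of holding the Client, RpcImpl keeps only the sending half of the
// channel that feeds TxFlow.
pub struct RpcImpl {
    transaction_sender: UnboundedSender<SignedTransaction>,
}

impl RpcImpl {
    // Hypothetical entry point called by the RPC handler for a submitted
    // transaction; the real method name and signature may differ.
    pub fn submit_transaction(&self, tx: SignedTransaction) {
        // unbounded_send never blocks; TxFlow drains the receiving end
        // as an async task.
        let _ = self.transaction_sender.unbounded_send(tx);
    }
}

fn main() {
    let (sender, _receiver) = futures::sync::mpsc::unbounded();
    let rpc = RpcImpl { transaction_sender: sender };
    rpc.submit_transaction(SignedTransaction);
}
```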

@ilblackdragon
Member

Signer is a trait - it can have an expensive implementation.
Specifically, there should be an implementation that maintains a connection to another process/machine to do the signing there.
If that's OK to copy, then no problem.
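
For illustration, a hypothetical sketch (not the actual nearcore trait) of a Signer trait with one implementation that is cheap to copy and one that would be expensive to duplicate:

```rust
trait Signer {
    fn sign(&self, data: &[u8]) -> Vec<u8>;
}

// Cheap: just key material held in memory, fine to clone per task.
#[derive(Clone)]
struct InMemorySigner {
    secret_key: [u8; 32],
}

impl Signer for InMemorySigner {
    fn sign(&self, data: &[u8]) -> Vec<u8> {
        // Placeholder "signature" so the example compiles; a real
        // implementation would use an actual signature scheme.
        data.iter()
            .zip(self.secret_key.iter().cycle())
            .map(|(d, k)| d ^ k)
            .collect()
    }
}

// Potentially expensive: talks to another process or machine to sign,
// so it should be shared rather than copied.
struct RemoteSigner {
    endpoint: String,
}

impl Signer for RemoteSigner {
    fn sign(&self, _data: &[u8]) -> Vec<u8> {
        unimplemented!("would forward the request to {}", self.endpoint)
    }
}

fn main() {
    let signer = InMemorySigner { secret_key: [0u8; 32] };
    let _copy = signer.clone(); // copying is as cheap as copying the key
    let _ = RemoteSigner { endpoint: "signing-host:3030".to_string() };
    println!("{:?}", signer.sign(b"hello"));
}
```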

@nearmax I'm unclear on what you mean by a pool of Runtimes.

Also, where is the chains/client?

@MaksymZavershynskyi
Contributor Author

s/Signer/InMemorySigner

We are going to be executing multiple contracts concurrently. Currently, the contracts are executed by Runtime. To execute them concurrently, we would need a pool of workers that run Runtimes. The reason we cannot spawn them as separate tasks is that we want to control the number of threads they run on and prevent them from affecting other async tasks, like the network listener.

Chain is a heavy resource with frequent access requirements; we will wrap it in a lock+Arc for now.
We also need a separate task that listens for chain updates from the network and updates the chain with the recent blocks (in cases where these blocks were produced without the participation of the current node).

Regarding the Client: consider how we produce a block. A block is produced by first computing the consensus and then executing the transactions in the consensus against the previous state, which produces the new state; so the creation of the block is initiated by the consensus and not by the client. The flow is as follows:

  1. TxFlow+BeaconChainConsensus would produce a consensus (a signed set of N transactions) and put it in a channel;
  2. Runtime Pool (or, for the sake of separation of concerns, a separate class called ConsensusListener) would read from this channel. Upon receiving the consensus, it would retrieve the current ChainContext from the Chain (ChainContext would provide all the necessary information for the Runtime) and produce N tasks for the workers that run Runtimes. The Runtime workers would then finish these tasks and produce a new state, which they would send back to the network and to the Chain (see the sketch at the end of this comment).

Therefore, the following functions that we currently use to produce fake blocks are going to go away: prod_block in chain.rs and produce_blocks.rs. What would be left in the Client struct is the functionality to import blocks when the node starts; we can then rename it to BlockImporter and execute it once, synchronously, on node start-up.
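
A rough sketch of the consensus-to-runtime-pool flow described above, using plain std threads and channels for brevity; all names here (Consensus, the worker loop) are placeholders rather than the real nearcore types:

```rust
use std::sync::mpsc;
use std::thread;

// Placeholder for the signed set of transactions that TxFlow produces.
struct Consensus {
    transactions: Vec<String>,
}

fn main() {
    // Channel 1: TxFlow+BeaconChainConsensus -> ConsensusListener.
    let (consensus_tx, consensus_rx) = mpsc::channel::<Consensus>();
    // Channel 2: ConsensusListener -> runtime workers.
    let (work_tx, work_rx) = mpsc::channel::<String>();

    // ConsensusListener: turns each consensus into one task per transaction.
    let listener = thread::spawn(move || {
        for consensus in consensus_rx {
            for tx in consensus.transactions {
                work_tx.send(tx).expect("runtime pool has shut down");
            }
        }
    });

    // One runtime worker for brevity; a real pool would have a fixed number
    // of threads sharing the receiving end.
    let worker = thread::spawn(move || {
        for tx in work_rx {
            println!("executing transaction: {}", tx);
        }
    });

    // Stand-in for TxFlow producing a single consensus.
    consensus_tx
        .send(Consensus { transactions: vec!["tx1".into(), "tx2".into()] })
        .unwrap();
    drop(consensus_tx); // closing the channel lets the tasks finish

    listener.join().unwrap();
    worker.join().unwrap();
}
```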

@bowenwang1996
Collaborator

We probably don't need a pool of runtimes for now, as doing so seems error-prone and requires more effort than we can afford. The main reason is that the order of transactions matters in a lot of cases, and trying to separate them into independent tasks requires careful manipulation and also incurs some overhead of its own.

I am also not entirely convinced that we need to spawn a network message handler whenever a new message comes in. There will be some shared state anyway (things like peer_info), and processing each message should take very little time at the protocol level. I guess we should do some experiments and see how much concurrency helps there.

@MaksymZavershynskyi
Contributor Author

We cannot be computing the state one transaction at a time; it is too slow. I also disagree that the separation is hard to implement. It is a simple greedy system: TxFlow already defines the global order of the transactions, which only matters for transactions that touch the same addresses. So at each step we look in the bucket of unprocessed transactions and take any transaction that does not have unprocessed dependencies; otherwise we wait (see the sketch at the end of this comment).

Regarding having one handler per message: see the first two examples in the tokio documentation: https://tokio.rs/docs/getting-started/echo/ . What we have currently is the first example; what I suggest is the second example. Even if protocol turns out to be lightweight, we need to spawn a separate task per message anyway to send the data down the channel, like here: https://github.com/nearprotocol/nearcore/blob/7f6ec43f8a9147d4bae80ff9859ec55fa7ccc516/core/txflow/src/txflow_task/mod.rs#L104
There is really no problem with copying the handler: the lightweight fields like config are fast to copy because the copy is a memcpy, while the heavy fields (I am not sure whether peer_info is a heavy field) can remain shared.
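
A minimal sketch of the greedy selection step described above, assuming a hypothetical representation in which each transaction lists the accounts it touches (not nearcore code):

```rust
use std::collections::HashSet;

struct Transaction {
    touched_accounts: Vec<String>,
}

/// Return the indices of transactions that can run now: a transaction is
/// runnable if no earlier (in TxFlow order) unprocessed transaction touches
/// any of the same accounts.
fn runnable(unprocessed: &[Transaction]) -> Vec<usize> {
    let mut blocked: HashSet<&str> = HashSet::new();
    let mut out = Vec::new();
    for (i, tx) in unprocessed.iter().enumerate() {
        if tx.touched_accounts.iter().all(|a| !blocked.contains(a.as_str())) {
            out.push(i);
        }
        // Whether it runs now or not, an unprocessed transaction blocks
        // later transactions that touch the same accounts.
        for a in &tx.touched_accounts {
            blocked.insert(a.as_str());
        }
    }
    out
}

fn main() {
    let txs = vec![
        Transaction { touched_accounts: vec!["alice".into(), "bob".into()] },
        Transaction { touched_accounts: vec!["bob".into()] },   // waits for the first
        Transaction { touched_accounts: vec!["carol".into()] }, // independent, runs now
    ];
    assert_eq!(runnable(&txs), vec![0, 2]);
}
```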

@bowenwang1996
Collaborator

I think we can probably parallelize the execution of cross-shard calls and same-shard calls without too much trouble. For calls in the same shard, it is much harder because one call could lead to some other calls in the same shard again and the current design is to execute them in the same block, which makes parallelizing them hard due to possibly complicated dependencies. Some static analysis could help, but I am not too sure about that. @ilblackdragon

@MaksymZavershynskyi
Contributor Author

For calls in the same shard, it is much harder because one call could lead to some other calls in the same shard again and the current design is to execute them in the same block, which makes parallelizing them hard due to possibly complicated dependencies.

Yes, that's a special case. I think most of our transactions are not going to be like that. We can do a static analysis of the contract, establish which addresses it can potentially touch, and then execute all contracts that are not affected by it independently, in parallel.

Also, for MVB we might consider dropping the guarantee to execute them in the same block for now.

Also, remember that we are targeting 100k tps. With 1-4k shards, that means each verifier needs to execute 25-100 contracts per second (100,000 tps spread over 1,000-4,000 shards). The beacon chain witnesses will need to execute even more! So if we execute them sequentially, each contract would have to finish in under 10-40 ms, which is ridiculously fast.

@vgrichina
Collaborator

@nearmax let's implement the single-threaded design first and make it work. I don't think we have time to handle additional complexity (like a static analyzer) now.

After it's working properly (but slowly), we'll measure whether anything needs to be sped up. Note that we don't want to over-utilize the CPU anyway (we run on mobile devices), so we shouldn't aim for 100% CPU load.

@MaksymZavershynskyi
Contributor Author

Ok, let's have one worker and parallelize it later after we measure the load.

MaksymZavershynskyi added a commit that referenced this issue Dec 3, 2018
* Starting reworking node code, starting with cli. #121

* Missing files
MaksymZavershynskyi added a commit that referenced this issue Dec 4, 2018
* First step in refactoring the network

* Added the loop that runs the Libp2p network

* Extract RPC from node/service into node/rpc_server

* Undo rpc_server