Collators get incoming parachain messages #149
Conversation
The changes LGTM, but I need to re-review more in-depth.
collator/src/lib.rs
Outdated
```rust
).map_err(Error::Collator)?;

let block_data_hash = block_data.hash();
let signature = key.sign(block_data_hash.as_ref()).into();
let egress_queue_roots
	= ::polkadot_validation::egress_roots(&mut extrinsic.outgoing_messages);
```
Can you move the `=` to the previous line?
network/src/router.rs
Outdated
```diff
 // this is the ingress from source to target, with given messages.
-let target_incoming = incoming_message_topic(self.parent_hash, target);
+let target_incoming
+	= validation::incoming_message_topic(self.parent_hash(), target);
```
`=` on previous line?
network/src/router.rs
Outdated
```rust
/// with `import_statement`.
pub(crate) fn checked_statements(&self) -> impl Stream<Item=SignedStatement,Error=()> {
	// spin up a task in the background that processes all incoming statements
	// TODO: propagate statements on a timer?
```
Let's track this?
---
yeah, that's an old TODO, just has been moved here.
```rust
let canon_root = occupied.get().clone();
let messages = messages.iter().map(|m| &m.0[..]);
if ::polkadot_validation::message_queue_root(messages) != canon_root {
	continue;
```
I know this code is just being moved and it's unchanged, but can you explain the reasoning behind this? If the roots never match for a given parachain, it seems we'll just drop all ingress data eventually.
---
We know what the roots are supposed to be, but this is a struct that's filtering out messages from gossip that are claiming that this is the message packet from another chain. If the messages in that packet don't have the correct root, then we want to ignore that. If they do, then we import and stop listening for that parachain's outgoing messages. Ideally, we would punish the peers circulating bad messages on that topic, but it is gossip so it's hard to tell who the originator was. In the future we will try to avoid gossiping around bad message packets, although it's a bit racy.
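The check being discussed can be sketched as follows. This is illustrative only: `message_queue_root` here is a toy stand-in (a plain hasher fold rather than the real trie root computation), and `accept_packet` is a hypothetical helper showing how a gossiped packet is compared against the canonical root before import.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy stand-in for `polkadot_validation::message_queue_root`: the real code
// computes a trie root over the message bodies; here we just fold a hasher
// over them so the example is self-contained.
fn message_queue_root<'a, I: IntoIterator<Item = &'a [u8]>>(messages: I) -> u64 {
    let mut hasher = DefaultHasher::new();
    for m in messages {
        m.hash(&mut hasher);
    }
    hasher.finish()
}

// Hypothetical helper: accept a gossiped message packet only if it hashes
// to the root we already know from the canonical chain; otherwise ignore it.
fn accept_packet(canon_root: u64, messages: &[Vec<u8>]) -> bool {
    message_queue_root(messages.iter().map(|m| &m[..])) == canon_root
}

fn main() {
    let good: Vec<Vec<u8>> = vec![b"a".to_vec(), b"b".to_vec()];
    let canon_root = message_queue_root(good.iter().map(|m| &m[..]));

    // A forged packet claiming to be the same queue is filtered out.
    let bad: Vec<Vec<u8>> = vec![b"a".to_vec(), b"forged".to_vec()];

    assert!(accept_packet(canon_root, &good));
    assert!(!accept_packet(canon_root, &bad));
}
```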
---
Just a few comments...
```rust
/// with `import_statement`.
pub(crate) fn checked_statements(&self) -> impl Stream<Item=SignedStatement,Error=()> {
	// spin up a task in the background that processes all incoming statements
	// validation has been done already by the gossip validator.
```
I think the "spin up a task in the background" here is incorrect, since `gossip_messages_for` will block waiting on the `mpsc::UnboundedReceiver`, and then the `filter_map` and `map` will happen in the current task. So it would be more accurate to write that:

- This entire operation will block the current thread.
- `filter_map` and `map` will happen in the current task.
- What will happen "in the background" is fetching the gossip messages.
I remember you mentioned a potential solution to this in a chat, which we'll have to look into in a different PR. I think for now the comment can be amended.
---
yes, the solution is to pass a `oneshot::Sender` to `gossip_messages_for`
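A rough sketch of that handoff, using `std::sync::mpsc` and a thread as stand-ins for the real gossip service and `oneshot` channel; `gossip_messages_for` below is illustrative, not the actual network API:

```rust
use std::sync::mpsc;
use std::thread;

// Sketch only: instead of returning a receiver directly (which would require
// synchronizing on shared state), the caller hands the service a one-shot
// sender, and the service delivers the message receiver through it from its
// own thread once it is ready.
fn gossip_messages_for(topic: &'static str, reply: mpsc::Sender<mpsc::Receiver<String>>) {
    thread::spawn(move || {
        let (tx, rx) = mpsc::channel();
        // Hand the receiver back without making the caller block on a lock.
        reply.send(rx).expect("caller kept the reply channel open");
        // Simulate gossip messages arriving on the topic.
        tx.send(format!("message on {}", topic)).unwrap();
    });
}

fn main() {
    let (reply_tx, reply_rx) = mpsc::channel();
    gossip_messages_for("ingress-topic", reply_tx);
    // The receiver arrives asynchronously through the reply channel.
    let messages = reply_rx.recv().unwrap();
    assert_eq!(messages.recv().unwrap(), "message on ingress-topic");
}
```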
network/src/validation.rs
Outdated
```rust
Async::Ready(mut inner) => {
	let poll_result = inner.poll();
	self.inner = Some(inner);
	poll_result.map_err(map_err)
```
I remember you once pointed out that it would be more robust to do:

```rust
// Don't first do the inner.poll()
self.inner = Some(inner);
self.poll()
```
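A minimal self-contained model of that pattern; the `Async`/`Future` types below are hand-rolled stand-ins for futures 0.1, not the real crate, and the wrapper names are hypothetical:

```rust
// Hand-rolled stand-ins for the futures-0.1 types under discussion.
enum Async<T> {
    Ready(T),
    NotReady,
}

trait Future {
    type Item;
    fn poll(&mut self) -> Async<Self::Item>;
}

// A trivially-ready inner future, for illustration.
struct Ready(u32);
impl Future for Ready {
    type Item = u32;
    fn poll(&mut self) -> Async<u32> {
        Async::Ready(self.0)
    }
}

// Wrapper that resolves an "outer" step and then drives the inner future.
// Storing the inner future and re-entering `self.poll()` keeps a single
// code path for polling it, instead of polling it once inline and again
// via a different path on later wakeups.
struct Wrapper {
    pending: Option<Ready>,
    inner: Option<Ready>,
}

impl Future for Wrapper {
    type Item = u32;
    fn poll(&mut self) -> Async<u32> {
        if let Some(inner) = self.pending.take() {
            // Don't poll `inner` here first; store it and re-poll ourselves.
            self.inner = Some(inner);
            return self.poll();
        }
        match self.inner.as_mut() {
            Some(inner) => inner.poll(),
            None => Async::NotReady,
        }
    }
}

fn main() {
    let mut fut = Wrapper { pending: Some(Ready(7)), inner: None };
    match fut.poll() {
        Async::Ready(v) => assert_eq!(v, 7),
        Async::NotReady => panic!("expected readiness"),
    }
}
```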
network/src/validation.rs
Outdated
```rust
{
	let mut incoming_fetched = self.fetch_incoming.lock();
	for (para_id, _) in incoming_fetched.drain() {
```
Maybe `close` each receiver as well? Not sure if other tasks could still be receiving messages on those while this is ongoing (and until the `drop_gossip` message is handled by the protocol thread).
---
What I'll do here is have an `exit_future::Signal` in the `SessionDataFetcher` and have the fetching futures `select` on that as well as the global `exit` handler. Then when the `SessionDataFetcher` is dropped, all the futures will be as well.
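The drop-driven shutdown idea can be sketched like this; `Signal` and `Exit` below are illustrative stand-ins for the `exit_future` crate's API, using an atomic flag in place of a pollable future:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in for `exit_future::Signal`: the owner (e.g. the fetcher) holds
// the Signal, the spawned futures hold cloned `Exit` handles, and dropping
// the Signal tells all of them to stop.
struct Signal(Arc<AtomicBool>);

#[derive(Clone)]
struct Exit(Arc<AtomicBool>);

impl Signal {
    fn new() -> (Signal, Exit) {
        let flag = Arc::new(AtomicBool::new(false));
        (Signal(flag.clone()), Exit(flag))
    }
}

impl Drop for Signal {
    fn drop(&mut self) {
        // Dropping the owner fires the exit signal.
        self.0.store(true, Ordering::SeqCst);
    }
}

impl Exit {
    // In the real crate this would be a future to `select` on; here it is
    // a simple flag check for illustration.
    fn is_exited(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

fn main() {
    let (signal, exit) = Signal::new();
    assert!(!exit.is_exited());
    drop(signal); // dropping the fetcher would drop its signal
    assert!(exit.is_exited());
}
```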
---
A question and a proposal on whether we can remove the locks in `SessionDataFetcher`. Maybe something better done in a follow-up. Otherwise looks good.
```rust
knowledge: Arc<Mutex<Knowledge>>,
parent_hash: Hash,
message_validator: RegisteredMessageValidator,
}
```
Ok so now that I understand the structure a bit better, I have a proposal for a larger restructuring, which perhaps could be done in a separate PR, or here:

I'd like to see if we can remove the two `Arc<Mutex<..>>` above, because they potentially introduce blocking in the system (can we know for sure that two tasks will never attempt a `note_statement` or `fetch_incoming` at the same time and end up having to wait on each other, in the meantime blocking other tasks?).

- Could we make `SessionDataFetcher` a `Stream`, and only share a sender (probably wrapped in a struct with methods like `fetch_incoming`) with the `Router`?
- Then something like `fetch_incoming` could be implemented by having an intermediary future with an "inner/outer" receiver like we have elsewhere, and sending a message containing a sender to the `SessionDataFetcher` stream, which would then do the work in its own task and be able to mutate `fetch_incoming` without a lock, finally responding by sending a receiver via the sender it received in the original message.
- Something like `note_statement` could also be done by sending a message to `SessionDataFetcher`, which would then do `self.knowledge.note_statement` without having to lock `knowledge`.
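A rough sketch of that actor-style proposal, using threads and `std::sync::mpsc` in place of futures tasks and streams; all names here (`Command`, `spawn_fetcher`, the `u32` para id) are hypothetical simplifications:

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Messages other tasks send to the fetcher instead of locking its state.
enum Command {
    // Carries a reply sender through which the fetcher hands back a
    // receiver for that parachain's incoming messages.
    FetchIncoming(u32, mpsc::Sender<mpsc::Receiver<String>>),
    NoteStatement(String),
}

fn spawn_fetcher() -> mpsc::Sender<Command> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // State owned exclusively by this task: no Arc<Mutex<..>> needed.
        let mut incoming: HashMap<u32, mpsc::Sender<String>> = HashMap::new();
        let mut knowledge: Vec<String> = Vec::new();
        // The loop "finishes" once all command senders have been dropped.
        for cmd in rx {
            match cmd {
                Command::FetchIncoming(para_id, reply) => {
                    let (msg_tx, msg_rx) = mpsc::channel();
                    incoming.insert(para_id, msg_tx);
                    let _ = reply.send(msg_rx);
                }
                Command::NoteStatement(s) => knowledge.push(s),
            }
        }
    });
    tx
}

fn main() {
    let fetcher = spawn_fetcher();
    let (reply_tx, reply_rx) = mpsc::channel();
    fetcher.send(Command::FetchIncoming(100, reply_tx)).unwrap();
    let _incoming = reply_rx.recv().unwrap();
    fetcher.send(Command::NoteStatement("seen".into())).unwrap();
}
```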
---
The inner/outer futures are terrible so I'd prefer to avoid that wherever possible.
---
Some of the operations are order-dependent, e.g. `note_statement` should be done before repropagation. What is the item of `SessionDataFetcher` anyway?
---
I think the item can be `()`; the main idea is that the `SessionDataFetcher` would run in its own independent task and internally poll channels whose senders have been shared with other tasks, and then "finish" once those senders have been dropped.
For order-dependent operations, you could either do some bookkeeping inside `SessionDataFetcher` (if the order would require cross-task coordination), or, if the "order" can be ensured by doing the operations one after the other from one task, you could just send one message after the other from that task and rely on the fact that they will be received in that order.
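A tiny demonstration of that ordering argument with `std::sync::mpsc` (illustrative only): messages sent from a single task over one channel are received in the order they were sent, so "do A before B" can be expressed as "send A, then send B".

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Sent in order from one task...
        tx.send("note_statement").unwrap();
        tx.send("repropagate").unwrap();
    });
    // ...received in the same order on the other side.
    let received: Vec<&str> = rx.iter().collect();
    assert_eq!(received, vec!["note_statement", "repropagate"]);
}
```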
I don't think the inner/outer pattern is great, but it does seem like a necessary piece of glue, because the other tasks expect to immediately get a receiver to start polling. It's something that could be addressed if we have a better idea. The good thing about inner/outer is that it cannot block, unlike the use of a lock.
If you look at `fetch_incoming(&self, parachain: ParaId) -> IncomingReceiver`, you can imagine it running like this:

- The router calls `self.fetcher.fetch_incoming`, which is supposed to immediately return a receiver.
- `fetch_incoming` will attempt to acquire the lock around `self.fetch_incoming.lock()`.
- At that point the thread, not the task, could block waiting for the lock to become available. If this happens it will block every other task tied to the same thread on the pool (`parking_lot` is optimized for threads, not tasks, and will happily park the thread if the lock doesn't become available "quickly").
- When the lock becomes available, the rest of `fetch_incoming` executes and returns an `IncomingReceiver`.
- So it looks like we have non-blocking code, with the `IncomingReceiver` immediately being available to be included in a chain of work on the thread pool, but actually we have a potential "stop the world" moment right in the middle of the operation.
Obviously, this can be seen as just an optimization problem, and the lock might actually never be contended; or it can be seen as a code design problem, to be solved by ensuring the code can never block. The ugliness of the inner/outer pattern that might be required to plug this into the system is actually a result of the system having unrealistic expectations (that an `IncomingReceiver` can always be returned immediately from a call to `fetch_incoming`).
I don't think it's a huge problem, and perhaps it's best addressed in a follow-up PR. I do think in general it would be good to move away from using `parking_lot` in the context of tasks. If locks really are the right pattern for a given solution, it would be better to see if we could use a lock that is futures-aware, like https://docs.rs/futures/0.1.25/futures/sync/struct.BiLock.html
---
I've opened #180 to further discuss this...
* Add skeleton for worst case import_unsigned_header
* Fix a typo
* Add benchmark test for best case unsigned header import
* Add finality verification to worst case bench
* Move `insert_header()` from mock to test_utils

  Allows the benchmarking code to use this without having to pull it in from the mock.
* Add a rough bench to test finalizing a "long" chain
* Try to use complexity parameter for finality bench
* Improve long finality bench
* Remove stray dot file
* Remove old "worst" case bench
* Scribble some ideas down for pruning bench
* Prune headers during benchmarking
* Clean up some comments
* Make finality bench work for entire range of complexity parameter
* Place initialization code into a function
* Add bench for block finalization with caching
* First attempt at bench with receipts
* Try and trigger validator set change
* Perform a validator set change during benchmarking
* Move `validators_change_receipt()` to shared location

  Allows unit tests and benchmarks to access the same helper function and const
* Extract a test receipt root into a constant
* Clean up description of pruning bench
* Fix cache and pruning tests
* Remove unnecessary `build_custom_header` usage
* Get rid of warnings
* Remove code duplication comment

  I don't think it's entirely worth it to split out so few lines of code. The benches aren't particularly hard to read anyways.
* Increase the range of the complexity parameter
* Use dynamic number of receipts while benchmarking

  As part of this change we have removed the hardcoded TEST_RECEIPT_ROOT and instead chose to calculate the receipt root on the fly. This will make tests and benches less fragile.
* Prune a dynamic number of headers
In #117, the `polkadot-consensus` (now `polkadot-validation`) traits were extended to support message-passing. The Polkadot network implementation got some utilities for broadcasting and fetching ingress messages to a parachain, but these utilities were only available to the validators' pipeline.

This PR extends and generalizes a bit further, so any part of the codebase can fetch ingress messages, and extends `polkadot-collator` to use that and to produce outgoing messages.