
HeaderSync optimization (#1372) #1400

Merged: 21 commits merged into mimblewimble:master on Aug 30, 2018

Conversation

@garyyu (Contributor) commented Aug 21, 2018

With this optimization for #1372, the HeaderSync procedure runs about twice as fast as before on my MacBook Air (early 2015), i.e. 3 minutes compared to 6 minutes before:

Aug 21 19:02:20.615 DEBG sync_state: sync_status: Initial -> HeaderSync { current_height: 0, highest_height: 61131 }
Aug 21 19:05:19.367 DEBG sync_state: sync_status: HeaderSync { current_height: 60698, highest_height: 61134 } -> TxHashsetDownload

@ignopeverell (Contributor) commented:

This is a great performance improvement! However, I'd like to propose a couple of changes:

  1. It seems most of the new code is just about buffering reads instead of consuming straight from the TCP connection. Couldn't we get the same gain by using BufStream, as described in #1402 (wrap TcpStream in BufStream for buffered reads and writes)? (A sketch of this idea follows the list.)
  2. If this is indeed the source of the gain, we should do it for every message, not only headers.
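A minimal sketch of the BufStream idea, assuming the third-party bufstream crate; the address and message here are placeholders, not grin's actual p2p wiring:

use bufstream::BufStream;
use std::io::{BufRead, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Wrap the raw socket so small reads and writes hit an in-memory
    // buffer instead of issuing one syscall each.
    let stream = TcpStream::connect("127.0.0.1:13414")?;
    let mut buffered = BufStream::new(stream);

    buffered.write_all(b"ping\n")?;
    buffered.flush()?; // BufStream only sends on flush or when the buffer fills

    let mut line = String::new();
    buffered.read_line(&mut line)?;
    println!("got: {}", line);
    Ok(())
}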

@garyyu (Contributor, Author) commented Aug 22, 2018

Please see my comment in #1402; BufStream won't help here.

Regarding:

It seems most of the new code is just about buffering reads instead of consuming straight from the TCP connection

Why do you have that impression? The new code has nothing to do with buffered reads.
It actually makes two changes:

  1. Instead of receiving all 511 headers and then processing them in one call, I split them into pieces of 8 headers each, process each piece immediately, and then download the next 8 headers. I call this new procedure "streaming" (see the sketch after this list). It avoids a long (about 1 second on my PC) processing time for 511 headers in a single call.
  2. The second change is in get_locator(), which you commented on and which I need to rework.
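A minimal sketch of the streaming idea under stated assumptions: read_headers_chunk, process_headers, and Header are hypothetical stand-ins for the real p2p types, and the chunk size of 8 matches the description above:

const CHUNK_SIZE: usize = 8;

struct Header; // placeholder for core::BlockHeader

// Stand-in for reading up to CHUNK_SIZE headers off the connection.
fn read_headers_chunk(remaining: &mut usize) -> Vec<Header> {
    let n = (*remaining).min(CHUNK_SIZE);
    *remaining -= n;
    (0..n).map(|_| Header).collect()
}

// Stand-in for handing a small batch to the chain adapter immediately,
// overlapping network reads with header processing.
fn process_headers(headers: &[Header]) {
    let _ = headers.len();
}

fn main() {
    let mut remaining = 511; // total headers announced in the message
    while remaining > 0 {
        let chunk = read_headers_chunk(&mut remaining);
        process_headers(&chunk);
    }
}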

@garyyu (Contributor, Author) commented Aug 22, 2018

I just realized it's better to split the get_locator() optimization out of this PR, following the good rule that one PR should solve only one problem.

Another reason is that get_locator() is security-related, so it's better to give it a dedicated code review.

I will remove it from this PR and submit a new PR for it.

@ignopeverell (Contributor) commented:

Sounds good, thank you for removing the get_locator() changes! To summarize a gitter discussion:

  1. Add a simpler method on Message that just returns the &mut TcpStream so it can be read from protocol.rs.
  2. Move your code to a simple utility function in protocol.rs.
  3. Potentially simplify, given that all headers have equal size.
  4. Remove the rustfmt changes from sync.rs, as you're not touching it.

And thank you for the great work!

@garyyu (Contributor, Author) commented Aug 22, 2018

Thanks for the good review!
Refactoring is finished according to the above comments.

@ignopeverell (Contributor) commented Aug 23, 2018

I just meant to amend my earlier comments, as I realized I made a mistake. Point 3 is actually incorrect, since we now allow Cuckoo sizes greater than 30. So the proof-of-work part of the header could be 30*42 bits, 31*42 bits, 32*42 bits, etc. Sorry for that.
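For concreteness, a rough calculation of those proof-of-work sizes, assuming the proof is bit-packed as 42 nonces of sizeshift bits each (the byte rounding here is illustrative, not the actual serialization layout):

fn proof_size_bytes(sizeshift: u32) -> u32 {
    (sizeshift * 42 + 7) / 8 // ceil(bits / 8)
}

fn main() {
    for s in 30u32..=32 {
        // e.g. Cuckoo30: 1260 bits, about 158 bytes
        println!("Cuckoo{}: {} bits, {} bytes", s, s * 42, proof_size_bytes(s));
    }
}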

@@ -39,6 +41,60 @@ impl Protocol {
pub fn new(adapter: Arc<NetAdapter>, addr: SocketAddr) -> Protocol {
Protocol { adapter, addr }
}

/// Read the Headers Vec size from the underlying connection, and calculate the header_size of one Header
pub fn headers_header_size(conn: &mut TcpStream, msg_len: u64) -> Result<u64, Error> {


@garyyu (Contributor, Author) commented Aug 23, 2018

Done: I addressed the comments by moving the function out of impl Protocol and removing pub.

@ignopeverell (Contributor) commented:

Have you seen my previous comment? It looks like the header size calculation will fail in the presence of headers with different Cuckoo sizes.

@garyyu (Contributor, Author) commented Aug 23, 2018

I just meant to amend my earlier comments, as I realized I made a mistake. Point 3 is actually incorrect, since we now allow Cuckoo sizes greater than 30. So the proof-of-work part of the header could be 30*42 bits, 31*42 bits, 32*42 bits, etc. Sorry for that.

Sorry, I don't know why I thought that didn't have an impact :) (perhaps because you first said "simplify" and then said you'd made a mistake, so I took it as -1 * -1 = +1). But indeed it does!

A Headers vector whose elements have different sizes is handled by neither the new code nor the old code; I can't see how it would be possible to deserialize a vector whose elements each have a different size. I will read more code to confirm. If it's an existing issue, that would be a major bug, better handled in an independent new PR.

@garyyu (Contributor, Author) commented Aug 23, 2018

The good news: after reading the Headers deserialization code, I confirm there's no problem in the old code, since Proof has its own deserialization code which can handle different Cuckoo sizes.

Now the bad news: a very upsetting finding. We don't have a size field in the BlockHeader structure! So, without deserializing a BlockHeader, we don't know its exact size. But to split the Headers message, I need to know the size of each BlockHeader before deserialization!

This could make the optimization infeasible.

Any suggestions? Perhaps it's too late to add a size field to BlockHeader, since that would be consensus-breaking?

@ignopeverell (Contributor) commented:

It's too late, and in theory it's not needed, as each header contains enough information to be deserialized (the Cuckoo size is in it).

One idea would be to read enough bytes for, say, Cuckoo34 headers. You'll overshoot, but you can then reuse the unused bytes in the next iteration, placing them first in the buffer.
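A minimal sketch of this overshoot-and-carry idea, under stated assumptions: deserialize_header and read_more are hypothetical stand-ins, and MAX_HEADER_SIZE is an illustrative upper bound, not the real Cuckoo34 header size:

const MAX_HEADER_SIZE: usize = 400; // assumed upper bound for one header

// Stand-in for deserializing one header from the front of the buffer,
// returning how many bytes it actually consumed.
fn deserialize_header(buf: &[u8]) -> ((), usize) {
    ((), buf.len().min(300)) // pretend this header occupied 300 bytes
}

// Stand-in for reading `want` more bytes from the TcpStream.
fn read_more(buf: &mut Vec<u8>, want: usize) {
    buf.extend(std::iter::repeat(0u8).take(want));
}

fn main() {
    let mut buf: Vec<u8> = Vec::new();
    for _ in 0..3 {
        // Top up the buffer so at least one maximum-size header fits.
        if buf.len() < MAX_HEADER_SIZE {
            let want = MAX_HEADER_SIZE - buf.len();
            read_more(&mut buf, want);
        }
        let (_header, used) = deserialize_header(&buf);
        // Carry the overshoot to the front for the next iteration.
        buf.drain(..used);
    }
}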

@garyyu (Contributor, Author) commented Aug 24, 2018

One idea would be to read enough bytes for, say, Cuckoo34 headers. You'll overshoot, but you can then reuse the unused bytes in the next iteration, placing them first in the buffer.

That sounds like a feasible solution, but :) it's a little bit ugly, and it requires rewriting a deserialize function.

Even if I agree that it may be too late to add a size field to BlockHeader, I would still strongly propose a hard fork on Testnet3, because it's not only for this minor optimization but also for another, maybe more important, use case: block receiving.

When a Grin node receives a block, we currently complete the whole block download from the network and then start block validation; if we find that this block was already received before (or its header validation fails), we throw the block away. That could be a hole for attacks. A better solution is to receive the header first and validate it; if the block isn't needed, we don't have to receive its bigger body. A sketch of this flow follows.
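A minimal sketch of that header-first receive flow, under stated assumptions; all the names here are hypothetical stand-ins, not grin's actual p2p API:

struct Header { already_known: bool, valid: bool }

// Stand-in for reading just the header part of an incoming block.
fn receive_header() -> Header {
    Header { already_known: false, valid: true }
}

// Stand-in for downloading the (much bigger) block body.
fn receive_body() {}

fn main() {
    let h = receive_header();
    if h.already_known || !h.valid {
        return; // drop early, before downloading the full block
    }
    receive_body();
}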

Does this make it worth a hard fork?

And remember we have a security bug in #1358, which also needs a hard fork.

Anyway, it's up to your decision :)

@ignopeverell (Contributor) commented:

You're arguing for header-first propagation. We already have compact blocks, which are pretty much the same, maybe better. By the way, none of these are strictly hard forks (nothing forks); they're just breaking protocol changes that can lead to network partitioning if not properly handled. Also, we only receive a single block or header in those cases, so the total size is the same as the message size.

I may be mistaken, but it doesn't seem that the lack of a header size requires rewriting the deserialization. You can just use the same deserialization; it'll do the right thing from a Reader as long as you feed it enough bytes.

@garyyu (Contributor, Author) commented Aug 25, 2018

Completed the enhancement to support variable BlockHeader sizes in one Headers message, from Cuckoo30 to Cuckoo36.

@@ -265,6 +299,20 @@ impl BlockHeader {
pub fn total_kernel_offset(&self) -> BlindingFactor {
self.total_kernel_offset
}

/// Ser size of this header
pub fn size_of_ser(&self) -> usize {

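A hedged sketch of what such a size_of_ser() might compute: fixed-size fields plus a proof-of-work part whose size depends on the Cuckoo size. The field widths and bit-packing here are illustrative assumptions, not grin's actual serialization layout:

struct Proof {
    cuckoo_sizeshift: u8,
    nonces: Vec<u64>,
}

impl Proof {
    // Assumed bit-packed nonces, rounded up to whole bytes.
    fn size_of_ser(&self) -> usize {
        (self.cuckoo_sizeshift as usize * self.nonces.len() + 7) / 8
    }
}

fn main() {
    let proof = Proof { cuckoo_sizeshift: 30, nonces: vec![0; 42] };
    // A header's total serialized size would be some fixed part plus this.
    println!("proof part: {} bytes", proof.size_of_ser());
}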

@ignopeverell (Contributor) left a review comment:

Minor comment but looking good otherwise! Restarted the failing test to see if it's just a transient issue.

@garyyu (Contributor, Author) commented Aug 26, 2018

Thanks for approving!

And the function name has been changed.

The travis-ci panic at grin_chain::txhashset::extending could be a real failure, but I can't reproduce it on Mac; it may only happen on Linux. I have to wait until tomorrow for a Linux test, since I only have my laptop with me while on a trip.

@ignopeverell (Contributor) commented:

I think you're running into an issue with the fast sync test that I fixed a few days ago. Can you merge master back to double-check?

@garyyu (Contributor, Author) commented Aug 27, 2018

The Linux test also passes locally.
And I merged master into this PR; it still fails in travis-ci. Checking the log, I find:

Aug 27 01:18:06.852 DEBG (Server ID: Port 40000) Mining Cuckoo10 for max 60s on 86 @ 60 [ea49bda7].
Aug 27 01:18:06.853 INFO (Server ID: Port 40000) Found valid proof of work, adding block b6f5472f.
Aug 27 01:18:06.853 DEBG pipe: process_block b6f5472f at 60 with 0 inputs, 1 outputs, 1 kernels
Aug 27 01:18:06.885 INFO Rejected block b6f5472f at 60: Error { inner: stack backtrace:

Note the "Mining Cuckoo10" in the log above.

I then checked inner_mining_loop() and found:

/// The minimum acceptable sizeshift
pub fn min_sizeshift() -> u8 {
	let param_ref = CHAIN_TYPE.read().unwrap();
	match *param_ref {
		ChainTypes::AutomatedTesting => AUTOMATED_TESTING_MIN_SIZESHIFT, // i.e. 10
		ChainTypes::UserTesting => USER_TESTING_MIN_SIZESHIFT,           // i.e. 16
		ChainTypes::Testnet1 => USER_TESTING_MIN_SIZESHIFT,              // i.e. 16
		_ => DEFAULT_MIN_SIZESHIFT,                                      // i.e. 30
	}
}

Please let me think up a solution that is compatible with this.

Commit: "…koo10 will be used for AutomatedTesting chain"
@ignopeverell (Contributor) commented:

I'm starting to lose track of Travis failures between this and other PRs. Are there still test regressions caused by this PR? I've restarted the tests many times, but they always fail (while master mostly passes).

@hashmap (Contributor) commented Aug 28, 2018

@ignopeverell on my machine I see a PR-specific issue:

Aug 27 14:16:28.767 DEBG Client 127.0.0.1:12001 connection lost: Connection(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })
Aug 27 14:16:28.771 DEBG Client 127.0.0.1:12000 connection lost: Connection(Custom { kind: InvalidData, error: StringError("headers_streaming_body") })
Aug 27 14:16:28.771 WARN sync: no peers available, disabling sync
Aug 27 14:16:28.771 DEBG sync_state: sync_status: HeaderSync { current_height: 0, highest_height: 30 } -> NoSync

@garyyu (Contributor, Author) commented Aug 28, 2018

@ignopeverell please let me finish #1434 first, to fix the annoying false alarms from travis-ci; then I will come back and check this PR again.

@garyyu (Contributor, Author) commented Aug 30, 2018

Finally! The last fix works and passes the travis-ci test.
The mistake was in calculating the serialized size of a block: I was using a hardcoded 42 instead of global::proofsize().

Now this PR is ready to merge.

BTW, there is a remaining issue with the test: that mistake should always have made the test fail, so it's strange that the local test on my Mac always passed! Something must be wrong in the test; I will track it down in another PR.
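An illustrative sketch of the kind of fix described above, under stated assumptions: proofsize() stands in for global::proofsize(), its return value is an assumed testing-chain setting, and the bit-packing formula is for illustration only:

// Stand-in for global::proofsize(): the configured number of proof
// nonces, which is smaller than 42 on the testing chain types.
fn proofsize() -> usize {
    4 // assumed value for an automated-testing chain
}

fn proof_ser_bytes(sizeshift: usize) -> usize {
    // Before the fix: (sizeshift * 42 + 7) / 8, with 42 hardcoded,
    // which overstates the size whenever proofsize() != 42.
    (sizeshift * proofsize() + 7) / 8
}

fn main() {
    println!("Cuckoo10 proof: {} bytes", proof_ser_bytes(10));
}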

@ignopeverell (Contributor) commented:

Indeed finally! Sorry it was so difficult, but thanks for another great PR!

@ignopeverell merged commit d719493 into mimblewimble:master on Aug 30, 2018
@garyyu mentioned this pull request on Aug 31, 2018
@garyyu deleted the HeaderSync branch on September 16, 2018