Consensus: Sync speedup #4

Closed · wants to merge 16 commits

Conversation

@kaar2 (Contributor) commented Jun 14, 2018

I'm reopening the PR from the old repo.
I've added tests and changed the default values of MAX_DOWNLOAD_TIME (from 10 seconds to 2 seconds) and MAX_UNFINISHED_JOBS (from 16 to 256). Those values yielded the best results in my tests.

kaar2 added 13 commits June 14, 2018 06:01
The method 'validateSetHashes' was added to fastSync for better code readability.

This method now also validates the block header, in addition to the hash validation.

Finally, it no longer compares the block number against latest block number as blocks are added to currentSet only if block number is greater than latest.
Remove redundant sorting of toFinalize.
Reduce the size of synchronization blocks.
Rename variables to add more clarity.
@orogvany (Collaborator)

Do you think these tunings are appropriate for all hardware/bandwidth?

Anyway, approved via the last review.

@kaar2 (Contributor, author) commented Jun 14, 2018

> Do you think these tunings are appropriate for all hardware/bandwidth?

I guess I could add those parameters to the config file so everyone can tune them for their hardware and bandwidth, but I think these defaults will work just fine for 99% of users. Even with really bad bandwidth, this may result in a slower sync, but the sync will still work: a timed-out block is added back to the download queue, so unless there is an error, the block will arrive eventually. In the worst case there will be several requests for the same block.

For very large blocks 2 seconds isn't enough, but the vast majority of blocks contain only a few transactions. I would say that when we get to the point where the average block size is > 2 MB, we might need to change this.
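The requeue-on-timeout behavior described here can be sketched roughly as follows. All names (RetrySketch, Job, requeueIfExpired) are hypothetical illustrations, not the actual semux-core API; only the 2-second timeout value comes from this PR.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the timeout/retry behavior described above: a block
// request that exceeds the download timeout is simply put back on the download
// queue, so a slow peer costs extra requests rather than a failed sync.
public class RetrySketch {
    static final long DOWNLOAD_TIMEOUT_MS = 2_000; // tuned down from 10s in this PR

    static class Job {
        final long blockNumber;
        final long startedAt;
        Job(long blockNumber, long startedAt) {
            this.blockNumber = blockNumber;
            this.startedAt = startedAt;
        }
    }

    // Returns true if the job timed out and its block was re-queued for download.
    static boolean requeueIfExpired(Job job, long now, Deque<Long> downloadQueue) {
        if (now - job.startedAt > DOWNLOAD_TIMEOUT_MS) {
            downloadQueue.addLast(job.blockNumber); // request the same block again
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Deque<Long> queue = new ArrayDeque<>();
        Job slow = new Job(42, 0);
        boolean requeued = requeueIfExpired(slow, 5_000, queue); // 5s elapsed > 2s timeout
        System.out.println(requeued + " " + queue);
    }
}
```

The worst case is therefore duplicated requests for the same block, never a permanently stuck sync.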

@orogvany (Collaborator) commented Jun 14, 2018

Well, I guess I'm more concerned about the memory implications. With a 1 MB block size, I'd expect an extra load of 256 MB+, depending on buffering and how we unmarshal the request (and some folks are already rather prone to complaints about memory usage). Now, we're nowhere near full 1 MB blocks, so this is unlikely to be an issue; I'm just thinking out loud. 2 seconds is rather a short period too; since the downloads are done in parallel, the 10 s limit is likely fine. Again, at full 1 MB blocks this requires 128 MB/s from around the world, which is unlikely to work as block size grows.

@kaar2 (Contributor, author) commented Jun 14, 2018

Well, the fast sync needs at least 200 blocks, as it first validates all the hashes in the validator set and only then adds the blocks to the chain. About the block timeout: I thought the same at first, but I noticed a dramatic boost in performance when I changed it from 10 to 2 seconds. I guess 5 seconds might work as well.

@orogvany (Collaborator)

OK. I trust your tests. I just worry that this value will need tuning as block size increases over time. Maybe a configuration value (with a default if not set) would be worthwhile. I don't think that should block this PR, though.

TreeSet<Pair<Block, Channel>> toProcess = new TreeSet<>(
Comparator.comparingLong(o -> o.getKey().getNumber()));

for (int i = 0; i < 200; i++) {
Collaborator review comment:
This and the 199 below should probably at least be static constants (or config values later on).

'Magic numbers' are an anti-pattern.

There's nothing special about 200 afaik, just a convenient number for sync.
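As an illustration of the suggestion, the loop bound could come from a named constant instead of a literal. The names below are illustrative (in the PR this value ultimately came from config.getValidatorUpdateInterval()):

```java
// Hypothetical illustration of replacing the magic number 200 with a named
// constant, per the review suggestion. The constant name is illustrative;
// in the actual PR the value comes from config.getValidatorUpdateInterval().
public class ConstantsSketch {
    static final int VALIDATOR_UPDATE_INTERVAL = 200;

    // "The 199 below" is just the interval minus one, not a second magic number.
    static int lastIndexInSet() {
        return VALIDATOR_UPDATE_INTERVAL - 1;
    }

    public static void main(String[] args) {
        int processed = 0;
        for (int i = 0; i < VALIDATOR_UPDATE_INTERVAL; i++) {
            processed++; // stand-in for processing block i of the validator set
        }
        System.out.println(processed + " " + lastIndexInSet());
    }
}
```

A named constant also makes the relationship between the 200 and 199 literals explicit at the call sites.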

Author reply:
Yes thank you. I was playing with the numbers and forgot to change it back to getValidatorUpdateInterval().


for (int i = 0; i < 200; i++) {
Block block = null;
;
Collaborator review comment:
errant semicolon.

Rename sync parameters to prevent confusion:
MAX_DOWNLOAD_TIME -> DOWNLOAD_TIMEOUT
MAX_QUEUED_BLOCKS -> MAX_QUEUED_JOBS
MAX_UNFINISHED_JOBS -> MAX_PENDING_JOBS

Add those three parameters along with MAX_PENDING_BLOCKS to the config to allow future tuning.
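A rough sketch of how such parameters might be read from a config file with defaults. The property keys and class name are hypothetical, not the actual semux-core config keys; of the default values, only the 2-second timeout and 256 pending jobs come from this PR's discussion, the others are placeholders.

```java
import java.util.Properties;

// Hypothetical sketch of reading the renamed sync parameters from a config
// file, falling back to defaults when unset. Property names are illustrative.
public class SyncConfigSketch {
    final long downloadTimeoutMs;
    final int maxQueuedJobs;
    final int maxPendingJobs;
    final int maxPendingBlocks;

    SyncConfigSketch(Properties props) {
        // 2000 ms and 256 are the defaults chosen in this PR; the rest are placeholders.
        downloadTimeoutMs = Long.parseLong(props.getProperty("sync.downloadTimeout", "2000"));
        maxQueuedJobs = Integer.parseInt(props.getProperty("sync.maxQueuedJobs", "10000"));
        maxPendingJobs = Integer.parseInt(props.getProperty("sync.maxPendingJobs", "256"));
        maxPendingBlocks = Integer.parseInt(props.getProperty("sync.maxPendingBlocks", "512"));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("sync.downloadTimeout", "5000"); // an operator tuning for slow bandwidth
        SyncConfigSketch cfg = new SyncConfigSketch(p);
        System.out.println(cfg.downloadTimeoutMs + " " + cfg.maxPendingJobs);
    }
}
```

This keeps the tuned values as defaults while letting operators on unusual hardware or bandwidth override them, as discussed above.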
@kaar2 (Contributor, author) commented Jun 14, 2018

@orogvany I've added all sync parameters to the config so they can be easily tuned in the future.

@kaar2 kaar2 changed the title Sync speedup Consensus: Sync speedup Jun 14, 2018
@semuxgo semuxgo self-requested a review June 15, 2018 01:51
@semuxgo (Contributor) left a comment:
Could we add an option to disable this experimental feature?

lastBlockInSet = latest + config.getValidatorUpdateInterval();
fastSync = true;
}
}
Contributor review comment:
I'm concerned that if more and more people are using fastSync, votes for blocks inside an interval are less likely to be validated. Eventually, we may never be able to sync the blocks in [start_block, last_block_of_interval].

@orogvany (Collaborator) commented Jun 15, 2018

Why would we not be able to sync those other blocks? Clients would still hold them, just trusting they were valid, ya?

I wonder if there would be a possibility of faking those blocks in between, but getting enough votes at %200 would be tough. It's not clear where abuse would happen.

@kaar2 (Contributor, author) commented Jun 15, 2018

I don't see how this could be abused either. fastSync is reset every validator interval and gets activated only if the target >= the height of the last block in the interval. Blocks in an ongoing validator round are synced the old way and are fully validated.

If fastSync is activated, it first fully validates the last block in the interval before applying any blocks to the chain. Once that block is validated, it also validates all previous blocks, as their hashes can be compared against their children's parentHash fields. The only weakness (security-wise) I can see compared to the normal sync is that if someone manages to create a fake block with a hash identical to the last block's parentHash, they might be able to fake all the blocks in the interval instead of just one block. However, this scenario is very unlikely, and if someone could do that, there are many other points of failure that could be exploited.
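The back-validation kaar2 describes can be sketched as follows, with SHA-256 standing in for the real block hash and all class and method names hypothetical:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: fully validate the last block of the interval, then
// accept each earlier block only if its hash matches its child's parentHash,
// walking backwards through the interval.
public class ParentHashSketch {
    static class Block {
        final byte[] parentHash;
        final byte[] body;
        Block(byte[] parentHash, byte[] body) {
            this.parentHash = parentHash;
            this.body = body;
        }
        byte[] hash() {
            try {
                MessageDigest d = MessageDigest.getInstance("SHA-256");
                d.update(parentHash);
                d.update(body);
                return d.digest();
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Assumes the last block in the list has already been fully validated;
    // every earlier block must hash to its child's parentHash to be accepted.
    static boolean validateInterval(List<Block> blocks) {
        for (int i = blocks.size() - 1; i > 0; i--) {
            if (!Arrays.equals(blocks.get(i).parentHash, blocks.get(i - 1).hash())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Block genesis = new Block(new byte[32], "b0".getBytes());
        Block child = new Block(genesis.hash(), "b1".getBytes());
        Block tampered = new Block(new byte[32], "b1".getBytes());
        System.out.println(validateInterval(List.of(genesis, child)));    // correctly linked
        System.out.println(validateInterval(List.of(genesis, tampered))); // broken link
    }
}
```

This is why faking any block inside the interval would require a hash collision with the trusted last block's parentHash chain.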

@semuxgo, could you please elaborate on how exactly we could get to a point where we are unable to sync?

@semuxgo (Contributor) commented Jun 15, 2018

@kaar2 the votes are not hashed.

@kaar2 (Contributor, author) commented Jun 15, 2018

@semuxgo yes, that is correct. But why do those votes matter? Are you concerned that we may need those votes in a future implementation and that they might be lost if everyone starts using fastSync?

Adding an option to enable/disable fastSync.

Adding a safe gap from the target block in which fastSync will be disabled and syncing will be performed in the normal fashion.
@kaar2 (Contributor, author) commented Jul 1, 2018

I've added an option to enable/disable fastSync from the config file. Also, fastSync will be disabled for the last 1024 blocks, as requested.
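The resulting activation rule might look roughly like this. The names are hypothetical; only the 1024-block safe gap and the config toggle come from the comment above.

```java
// Hypothetical sketch of the fastSync gate described above: fastSync runs only
// when enabled in config and when the sync target is far enough past the end
// of the current validator interval (a 1024-block safe gap); otherwise blocks
// are synced and validated the normal way.
public class FastSyncGateSketch {
    static final long SAFE_GAP = 1024;

    static boolean shouldFastSync(boolean enabledInConfig, long lastBlockInSet, long target) {
        return enabledInConfig && target >= lastBlockInSet + SAFE_GAP;
    }

    public static void main(String[] args) {
        System.out.println(shouldFastSync(true, 200, 2000));  // target well past the interval
        System.out.println(shouldFastSync(true, 200, 600));   // inside the safe gap: normal sync
        System.out.println(shouldFastSync(false, 200, 2000)); // disabled in config
    }
}
```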

About the conflicting file: it's caused by the use of NTP time, which was introduced in a later commit. Changing line 172 from "long timestamp = System.currentTimeMillis();" to "long timestamp = TimeUtil.currentTimeMillis();" and removing lines 174 and 175 will fix the problem.

@orogvany (Collaborator)

This has stagnated a bit. I vote we approve with default=off, so we at least have it available for folks' initial sync.

@semuxgo (Contributor) commented Jul 25, 2018

LGTM. Finished syncing in 40 minutes.

@semuxgo semuxgo mentioned this pull request Jul 25, 2018
semuxgo pushed a commit to semuxgo/semux-core that referenced this pull request Jul 26, 2018
@semuxgo (Contributor) commented Jul 26, 2018

Merged

@semuxgo semuxgo closed this Jul 26, 2018
semuxgo pushed a commit to semuxgo/semux-core that referenced this pull request Jul 26, 2018
3 participants