
Bench TPS performance is lacking with real PoH #4722

Closed
pgarg66 opened this issue Jun 18, 2019 · 9 comments

@pgarg66 (Contributor) commented Jun 18, 2019

Problem

The bench TPS client does not perform well when real PoH is enabled (hashes per tick set to auto). The following dashboard snapshot shows the statistics:

[Image: dashboard snapshot]

Setting hashes per tick to sleep improves performance (both TPS and overall stability).

Proposed Solution

Debug and fix it

pgarg66 added this to the Trestles v0.16.0 milestone Jun 18, 2019

@pgarg66 (Contributor Author) commented Jun 18, 2019

@pgarg66 (Contributor Author) commented Jun 19, 2019

This seems to be related to PR #4524. I tweaked the code to use CPU-based verification, and it performs better with that.


@pgarg66 (Contributor Author) commented Jun 19, 2019

@TristanDebrunner I timed the EntrySlice::verify() call for the GPU and non-GPU cases. The GPU version consistently takes ~1400 ms, whereas the CPU version takes ~200 ms.


@TristanDebrunner (Contributor) commented Jun 19, 2019

Which EntrySlice::verify() call were those timings from? Do we know any details of the EntrySlices? With small num_hashes and/or short slices, CPU verification is faster, which is a known issue: #4632.
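A simplified sketch of what sequential (CPU) PoH verification involves may help show why short slices favor the CPU. It assumes a toy model: each entry records only num_hashes and the resulting hash, and the mixing of transaction hashes into the chain is left out. Entry, hash_n, verify_cpu and make_chain are hypothetical names, not Solana's API. Because each entry is checked against the previous entry's recorded hash, entries can be verified independently of one another, which is presumably the parallelism a GPU path exploits; with only a few entries or small num_hashes there is little work to offset fixed GPU costs such as data transfer and kernel launch.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    num_hashes: int  # SHA-256 iterations since the previous entry (toy model)
    hash: bytes      # resulting PoH hash recorded in the entry


def hash_n(start: bytes, n: int) -> bytes:
    """Apply SHA-256 to `start` n times; each step depends on the previous one."""
    h = start
    for _ in range(n):
        h = hashlib.sha256(h).digest()
    return h


def verify_cpu(start_hash: bytes, entries: List[Entry]) -> bool:
    """Check that each entry's hash follows from the previous recorded hash.

    Each check uses only recorded data (the previous entry's hash), so the
    per-entry checks are independent and could run in parallel.
    Transaction mixing is omitted in this simplified model.
    """
    prev = start_hash
    for entry in entries:
        if hash_n(prev, entry.num_hashes) != entry.hash:
            return False
        prev = entry.hash
    return True


def make_chain(start_hash: bytes, hashes_per_entry: List[int]) -> List[Entry]:
    """Build a valid toy chain for experimentation."""
    entries, prev = [], start_hash
    for n in hashes_per_entry:
        prev = hash_n(prev, n)
        entries.append(Entry(n, prev))
    return entries


start = bytes(32)
chain = make_chain(start, [1, 2, 3])
print(verify_cpu(start, chain))  # True
chain[1].hash = bytes(32)        # corrupt one entry
print(verify_cpu(start, chain))  # False
```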

@pgarg66 (Contributor Author) commented Jun 19, 2019

@TristanDebrunner I created a draft PR with my changes. You can view it here: #4732

@pgarg66 (Contributor Author) commented Jun 19, 2019

To measure the CPU timing, I commented out the CUDA version of verify and removed the feature-flag check from the CPU version.

@TristanDebrunner (Contributor) commented Jun 19, 2019

That suggests the problem is small num_hashes and/or short EntrySlices. It's strange that the GPU case takes so long, though: on local benchmarks I'm getting under 600 µs for small EntrySlices.

@pgarg66 (Contributor Author) commented Jun 19, 2019

Can we verify small batches on the CPU instead of the GPU? Sigverify does something similar.

@TristanDebrunner (Contributor) commented Jun 19, 2019

That's what #4632 is. I just haven't gotten to it yet.
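The idea discussed above (send small batches to the CPU, tracked in #4632) could be sketched as a simple dispatch rule that covers both cases mentioned in the thread: short slices and small num_hashes. This is an illustration only, not the implementation from #4632; choose_backend, MIN_GPU_ENTRIES and MIN_GPU_HASHES are hypothetical, and real cutoffs would have to come from benchmarking on the target hardware.

```python
from typing import Sequence

# Hypothetical cutoffs; real values would need to be measured on the target GPU/CPU.
MIN_GPU_ENTRIES = 16       # fewer entries than this: too little parallelism for the GPU
MIN_GPU_HASHES = 100_000   # fewer total hashes than this: fixed GPU costs dominate


def choose_backend(num_hashes_per_entry: Sequence[int],
                   gpu_available: bool,
                   min_entries: int = MIN_GPU_ENTRIES,
                   min_hashes: int = MIN_GPU_HASHES) -> str:
    """Return "cpu" or "gpu" for a slice of entries, given each entry's num_hashes."""
    if not gpu_available:
        return "cpu"
    # Short slices or little total hashing work: run on the CPU.
    if len(num_hashes_per_entry) < min_entries or sum(num_hashes_per_entry) < min_hashes:
        return "cpu"
    return "gpu"


print(choose_backend([1, 1, 1], gpu_available=True))        # cpu
print(choose_backend([12_500] * 64, gpu_available=True))    # gpu
print(choose_backend([12_500] * 64, gpu_available=False))   # cpu
```

Keeping the rule as a pure function of the slice shape makes it easy to tune the cutoffs from benchmark data without touching either verify path.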
