Benchmark results #26
My current best speeds for ProgPOW as of today, built as described in #27, on our HPC Village box using GTX 1080, Titan X Maxwell, Titan Kepler, and Vega 64:
The speed on GTX 1080 achieved above wouldn't persist long-term, as that GPU's clock rate drops very significantly as the GPU gets hotter. The remaining 3 GPUs would likely achieve similar speeds long-term. For comparison, here are the best speeds achieved at Ethash using current ethminer:
Of course, in a sense comparing ProgPOW vs. Ethash is apples to oranges, but on the other hand we see that the old Titan Kepler is an outlier: it performed remarkably well at Ethash (the same 18M+ speed as the newer Titan X Maxwell's), but a lot worse at ProgPOW (13.9M on Titan X Maxwell vs. 5.8M on Titan Kepler). Forced use of OpenCL (rather than CUDA) on NVIDIA resulted in the same or worse speeds for both Ethash and ProgPOW on all 3 NVIDIA GPUs.
The above was for block 30k (1 GB DAG), as used in the testcase. The below is the same for block 7M (3 GB DAG), which reflects current Ethereum. ProgPOW:
Ethash:
At this DAG size, somehow there's a huge speed drop not only on Titan Kepler but also on Titan X Maxwell, and not only at ProgPOW but also at Ethash. Only the newer GTX 1080 and Vega 64 do well, at both ProgPOW and Ethash.
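These DAG sizes follow from Ethash's epoch schedule. Here's a minimal sketch of my own (not code from this repo), using the well-known ~1 GiB initial dataset size growing by ~8 MiB per 30,000-block epoch, and ignoring Ethash's prime-size adjustment:

```cpp
// Approximate Ethash DAG size for a given block number (illustration only;
// real Ethash also shrinks the byte count to a prime number of rows).
#include <cstdint>
#include <cstdio>

static double approx_dag_gib(uint64_t block) {
    const uint64_t epoch = block / 30000;          // epoch length in blocks
    const uint64_t bytes = (1ULL << 30)            // ~1 GiB initial size
                         + (1ULL << 23) * epoch;   // ~8 MiB growth per epoch
    return bytes / double(1ULL << 30);
}

int main() {
    printf("block 30k: ~%.2f GiB\n", approx_dag_gib(30000));    // ~1 GiB
    printf("block 7M:  ~%.2f GiB\n", approx_dag_gib(7000000));  // ~2.8 GiB
    return 0;
}
```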
The huge speed drop on older hardware is due to the DAG size being >2GB. This is a known limitation of the hardware: We have some slightly out-of-date expected hashrates at the end of this post: You're right that we should refresh them, though it'll take me a bit to collect all the hardware again.
Thanks @ifdefelse! I learned a few things from those links.
@solardiz I've benchmarked a few GPUs I had. See here. I'm working on benching the newest 0.9.3 spec with some AMD/Nvidia GPUs.
With the AMDGPU-PRO driver under Linux, I am now using:
Previously, the Vega 64 card would tend to run at 1084 MHz under load, and I thought that was what it'd have to be under its default power limit, but it seems not. The first two commands above try to set the frequency to the max in this card's default frequency list, which is 1663 MHz (might be vendor OC). With these settings, the actual frequency seen under load alternates between 1401 and 1576 MHz at Ethash, and stays at 1401 MHz at ProgPoW. The last command ("echo 4") switches the card from video to compute mode (as mentioned in @xazax310's blog post referenced above), but this appears to make (almost?) no performance difference in my testing. With these settings, I am getting speeds close to those seen in @ifdefelse's blog post also referenced above: ProgPoW block 7M speed went up from 19.8M to 22.7M (@ifdefelse's is 22.5M), and Ethash block 7M speed went up from 28.2M to 36.2M initially (@ifdefelse's is 37.1M) but drops to ~33M in longer runs as the GPU gets hotter and the lower 1401 MHz frequency is used more. This ProgPoW speed on Vega 64 is close to @xazax310's reported 22.6M for GTX 1080 Ti, but perhaps Vega 64 consumes significantly more power in this test.
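For reference, here is a minimal sketch of my own (not the exact commands used above) of the standard Linux amdgpu sysfs controls this tuning goes through; the card index, the clock level, and the power-profile index are assumptions that vary by card and driver version:

```cpp
// Pin an amdgpu card's core clock and power profile via sysfs (illustration;
// needs root, and the indices below are assumptions for a Vega 64).
#include <fstream>
#include <string>

static void sysfs_write(const std::string &path, const std::string &value) {
    std::ofstream f(path);  // silently no-ops if the file can't be opened
    f << value << '\n';
}

int main() {
    const std::string dev = "/sys/class/drm/card0/device/";
    // Take manual control of dynamic power management.
    sysfs_write(dev + "power_dpm_force_performance_level", "manual");
    // Request the highest level in the card's default frequency list
    // (level 7 on Vega 64; 1663 MHz on this particular card).
    sysfs_write(dev + "pp_dpm_sclk", "7");
    // Select a compute-oriented power profile; "4" matches the "echo 4"
    // mentioned above, but the profile table differs across driver versions.
    sysfs_write(dev + "pp_power_profile_mode", "4");
    return 0;
}
```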
@solardiz I'm going to get a Vega 56 for a day from a friend; I'll run a few benches. Unfortunately I don't think I'll have enough time to test ROCm in Linux, but gains from ROCm are supposed to be 10% or so. In my further testing of the 0.9.3 spec I noticed the core speed varies; this happens on both AMD and Nvidia. An example being the RX 480: its stock 1266 MHz core speed clocks down to 1069 MHz, resulting in a loss of hashrate. I've had to set my core voltage to ~1000 mV; then the core speed stays set at 1266 MHz. I've reached out to @ifdefelse to get their thoughts on it.
@solardiz Awesome! Please also show how you're running ethminer's benchmark mode. I assume you're running against block 7M? It is important to try a few spread-out blocks, as you're going to get different results on each block.
@lookfirst It's ProgPoW on Vega 64:
Ethash on all 4 GPUs mentioned before (Vega 64 is
I got partial success repairing the speeds on older GPUs, as per my comment here from Feb 8. Specifically, after applying #35 we can then use these settings:
This doubles the per-loop sequential DAG fetch, from 256 to 512 bytes. With this, I am getting the same speed as above (which was for ProgPoW 0.9.2) at block 7M on Vega 64 (still 22.7M), slightly lower speed on GTX 1080 (down from 15.15M to 14.8M), but much higher speed on Titan X Maxwell (up from 4.0M to 6.6M) and somewhat higher on the old Titan Kepler (up from 3.8M to 4.4M). I think the 65% speed increase on Maxwell may well justify the maybe 2.5% slowdown on Pascal. The twice larger register file usage might justify it, too, even if there were no speed increase on those older GPUs. This is based on a benchmark for one block number only. Since the generated program changes with these parameter changes, it's not a direct comparison of the two versions of the parameters. For a direct comparison, many benchmarks for different block numbers would need to be run for both versions and then average speeds compared. (So even the 2.5% slowdown observed on Pascal might as well not exist for real, or it might be different.) Also, we don't strictly have to double 0.9.3's DAG loads.
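For concreteness, here's a hypothetical sketch of the tunable in question (the parameter names are the spec's; reading the change as going from 4 to 8 loads is my interpretation of the doubling, not a quote from #35):

```cpp
// How PROGPOW_DAG_LOADS sets the per-loop sequential DAG fetch width
// (illustration; the doubled value of 8 is an assumption).
#include <cstdio>

#define PROGPOW_LANES     16  // lanes cooperating on one hash
#define PROGPOW_DAG_LOADS 8   // u32 loads per lane per loop; 0.9.3 uses 4

int main() {
    const int fetch_bytes = PROGPOW_LANES * PROGPOW_DAG_LOADS * 4;
    printf("per-loop DAG fetch: %d bytes\n", fetch_bytes);  // 512 vs. 256 at 4
    return 0;
}
```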
So I encountered the same issue with Vega 64 in Windows. While I didn't get a lot of time to play with the Vega 64, it's definitely something with AMD's driver/software controls. I couldn't get it to run correctly for Ethash either.
To your point of increasing DAG loads to 512, which benefits older GPUs: while interesting, it's irrelevant. As a fairly large farm myself, I don't run a single Maxwell GPU; in fact, nothing older than Pascal or Polaris. Most GPU farms, gamers, and hobbyists would not be running such old GPUs. If the increase had a positive effect on newer generations I would say it makes sense, but we're degrading performance in favor of older GPUs. That makes no sense. We should build the spec for current-generation GPUs. In 2 years, once a new generation comes out and many replace old equipment, as Kristy has said, ProgPoW could then be tuned towards Turing and Navi.
@xazax310 Thanks for that Medium post. I learned some news from it (I'm not following Ethereum news normally, but am looking into ProgPoW as part of Zcash's potential interest in it). You make a valid point about the irrelevance of older GPUs.
Implementing more of my idea from Feb 8 with a further hack, I got a 3x+ speedup on Titan X Maxwell (up from 4.0M to 12.3M) at a cost of a 3.5% slowdown on GTX 1080 (down from 15.15M to 14.6M). Is this perhaps adequate speed for some miners to reconsider using Maxwell again?

```diff
--- a/libprogpow/ProgPow.cpp
+++ b/libprogpow/ProgPow.cpp
@@ -133,6 +133,7 @@ std::string ProgPow::getKern(uint64_t block_number, kernel_t kern)
             ret << "barrier(CLK_LOCAL_MEM_FENCE);\n";
             ret << "offset = share[group_id];\n";
         }
+        ret << "uint32_t orig_offset = offset;\n";
         ret << "offset %= PROGPOW_DAG_ELEMENTS;\n";
         ret << "offset = offset * PROGPOW_LANES + (lane_id ^ loop) % PROGPOW_LANES;\n";
         ret << "data_dag = g_dag[offset];\n";
@@ -181,7 +182,12 @@ std::string ProgPow::getKern(uint64_t block_number, kernel_t kern)
             ret << "if (hack_false) __threadfence_block();\n";
         else
             ret << "if (hack_false) barrier(CLK_LOCAL_MEM_FENCE);\n";
+        ret << "if ((loop & 3) == 3) {\n";
         ret << merge("mix[0]", "data_dag.s[0]", rnd());
+        ret << "} else {\n";
+        ret << merge("mix[1]", "data_dag.s[0]", rnd());
+        ret << "mix[0] = orig_offset + 1;\n";
+        ret << "}\n";
         for (int i = 1; i < PROGPOW_DAG_LOADS; i++)
         {
             std::string dest = mix_dst();
```

Combined with #35 and the parameters change proposed above, this little patch implements sequential fetches of groups of 4 blocks of 512 bytes each, or effectively fetches of 2 KiB blocks, yet without requiring that much room to fetch into. I've also tested a variation of it. While the slowdowns on modern GPUs are really unfortunate, to me ProgPoW is far from final yet (I am considering many other tweaks), so performance differences of a few percent might be premature to take seriously, whereas the 3x+ speedup is a real thing. Consider this a proof of concept. Disclaimer: in the absence of test vectors for this revised code that we'd compare against a pure host-side implementation, it's always possible that I made some error and the code doesn't actually behave as I assume it does, which would invalidate the benchmark results. These results are consistent with my expectations and make sense to me, but they'd need to be verified.
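To illustrate the resulting access pattern, here's a host-side sketch of my own (not part of the patch): overwriting mix[0] with orig_offset + 1 on 3 of every 4 loops makes the next loop's DAG index sequential, so a fresh "random" index is only taken every 4th loop:

```cpp
// DAG index sequence produced by the hack: groups of 4 consecutive indices,
// re-randomized every 4th loop (host-side illustration only).
#include <cstdint>
#include <cstdio>
#include <random>

int main() {
    std::mt19937 rng(42);                    // stands in for the mix state
    const uint32_t dag_elements = 1u << 20;  // hypothetical element count
    uint32_t offset_src = rng();             // what the leading lane's mix[0] holds
    for (int loop = 0; loop < 12; loop++) {
        const uint32_t orig_offset = offset_src;
        printf("loop %2d: element %u\n", loop,
               (unsigned)(orig_offset % dag_elements));
        if ((loop & 3) == 3)
            offset_src = rng();              // normal path: next index is random
        else
            offset_src = orig_offset + 1;    // hack: next fetch is sequential
    }
    return 0;
}
```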
Where I overwrite mix[0] above, it's with orig_offset + 1, which feeds the next loop's DAG offset.
As xazax said, while academically interesting, I don't see any reason to tune for old Maxwell cards. Maxwell cards have performed terribly at Ethash ever since the DAG grew to >2GB back at the end of 2017. They've probably all been retired from mining farms over the past 1.5 years. There's no reason to target hardware that no longer exists.
Thank you for sharing your opinion @ifdefelse. I'm thinking of more than "just" Ethereum here. I guess some Maxwell GPUs are still mining some other altcoins. Those GPUs might switch to Ethereum if that becomes reasonable, or those other altcoins might switch to ProgPoW. But maybe I'm imagining this. There's also the 2x increase in register file usage with these changes, which is of some value even on newer hardware, whereas the slight performance drop might or might not persist after other tweaks yet to be made for other reasons.
It should be possible to sanity-check the performance of a ProgPOW build against its developers' expectations. For that, this repository should include benchmark results on a few system setups (which should also be documented, both hardware and software) for the latest and/or other specified versions of ProgPOW. So far, I've only found outdated results in the "Testing" section of the first comment in ZcashFoundation/GrantProposals-2018Q2#15, and those don't include detail on the system setups (they only list GPU types), nor the block number.