Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compile OpenCL kernels for correct ProgPoW period seed #44

Merged
merged 1 commit into from Jun 10, 2019

Conversation

solardiz
Copy link
Contributor

@solardiz solardiz commented Jun 7, 2019

Fixes #43

@solardiz
Copy link
Contributor Author

solardiz commented Jun 7, 2019

I tested this using the debugging output patch I posted in #25 (comment) and using block numbers 10000, 29000, 30000, 31000, 7000000, 10000000, 10001000 with ProgPoW 0.9.2. When run on NVIDIA cards (yet via OpenCL), this produced digests matching c-progpow's and CUDA implementation's. When run on AMD Vega 64 using AMDGPU-PRO 18.50, this produced digests matching c-progpow's and those seen on NVIDIA cards for all of these block numbers except 10000 and 10000000. That's puzzling, but is probably an unrelated issue to what we're fixing here. I suspect a miscompile by AMD's OpenCL backend on those block numbers (and this illustrates a maybe-drawback of ProgPoW in general). AMD's OpenCL is notorious for its miscompiles. Another guess I had is a timing issue (e.g., trying to use the DAG before it's ready), but this is sort of disproven by the block 10001000 test giving the correct result.

I only tested this with fixed block numbers (using the -M option to provide them). It's untested whether the logic to rebuild the kernel and when necessary the DAG on changing epoch and period works correctly or not. Specifically, this added check (and its placement) is untested:

		if (!new_epoch)
			return true;

I think this PR is a clear improvement compared to the obviously very wrong state the code was in prior to the changes here (with only block 30000 out of those I listed above giving the correct digest). So I suggest it be merged and then further testing can proceed and maybe further changes made.

@solardiz
Copy link
Contributor Author

solardiz commented Jun 7, 2019

Also works correctly (matching c-progpow) on the AMD for blocks 1000, 100000, 1000000, 9999000, 10002000, 10003000, 10004000, 10005000, 10006000.

@solardiz
Copy link
Contributor Author

solardiz commented Jun 7, 2019

Also works correctly (matching c-progpow) on the AMD for block 10000100, but fails on the AMD for block 10000050 (just one period after the also failing block 10M). Puzzling indeed. Maybe there's some similarity between kernels for adjacent seeds due to imperfection of the PRNG.

@OhGodAGirl
Copy link
Collaborator

OhGodAGirl commented Jun 8, 2019

We are currently reviewing this PR; this will not work due to the previously aforementioned compiler issue with AMD. I know Andrea had a workaround in his GitHub, but I am not aware of whether that is public or not anymore. I have reached out to clarify.

@AndreaLanfranchi
Copy link
Contributor

Let's separate problems here:

what this PR corrects is the detection of period switching: in OpenCL's workloop() the code is a leftover from very early implementation when progpow_period == ethash_epoch_size (i.e. 30k blocks.) On CUDA code this was already amended and this PR make the logic for switching periods to converge.
Nevertheless CUDAMiner and CLMiner still behave differently: on the latter the single call to init() (which generates both the DAG and PP Kernel) causes the DAG to be recreated every 50 blocks, set aside the fact init() does a lot more of redundant stuff.

AMD results vs NVIDIA results.
What @solardiz is referring to is what is known as the "Bogus periods" issue originally discovered by @jean-m-cyr : on AMD's OpenCL (actually NVIDIA OpenCL is compiled by the very same NVVM compiler used in CUDA) the random sequence of instructions contained in progPowLoop() produces erratic results when certain combinations (driven by the period) are met.
I have found that the only way to prevent this is to change the unroll factor in front of the loop: apparently the prevention of the unroll causes a race condition among lanes.

This solves the "Bogus Periods"

        // For whatver reason prevent unrolling this loop causes
        // bogus periods on AMD OpenCL. Use any unroll factor greater than 1
        #pragma unroll 2
        for (uint32_t l = 0; l < PROGPOW_CNT_DAG; l++)
            progPowLoop(l, mix, g_dag, c_dag, share[0].uint64s, hack_false, lane_id, group_id);

@AndreaLanfranchi
Copy link
Contributor

Please also keep in mind that this repo (per @ifdefelse words) is not to be considered a fully implemented and production-ready miner. It's a staging laboratory for the algo.

@solardiz
Copy link
Contributor Author

solardiz commented Jun 8, 2019

Thank you @OhGodAGirl and @AndreaLanfranchi for the replies.

what this PR corrects is the detection of period switching: in OpenCL's workloop() the code is a leftover from very early implementation when progpow_period == ethash_epoch_size

BTW, this explains why BitcoinInterestOfficial apparently didn't run into this issue: I think they're using a revision of ProgPoW from prior to that change.

Nevertheless CUDAMiner and CLMiner still behave differently: on the latter the single call to init() (which generates both the DAG and PP Kernel) causes the DAG to be recreated every 50 blocks

I think not: that's what the added if (!new_epoch) return true; mentioned above is meant to prevent, but like I said that's untested for now.

This solves the "Bogus Periods"

I've just confirmed that this little change, on top of my other changes, does result in correct computation for the previously problematic blocks 10000, 10M, 10M+50 for me. I didn't retest other blocks, so don't know if it possibly exposed a problem for any other block number (as it might happen with unreliable compilers). Perhaps this change should be merged into this ProgPoW tree as well, separately from merging of this PR.

Please also keep in mind that this repo (per @ifdefelse words) is not to be considered a fully implemented and production-ready miner. It's a staging laboratory for the algo.

I understand that, but it's a major problem for ProgPoW adoption (as well as testing, benchmarking, forking and tweaking) if it doesn't have one specific reference upstream implementation of its latest revision that's entirely correct and can be tested and benchmarked. It doesn't strictly have to be "a fully implemented and production-ready miner", but not being entirely correct and not outputting test vectors is problematic for it being "a staging laboratory".

@AndreaLanfranchi
Copy link
Contributor

AndreaLanfranchi commented Jun 8, 2019

@solardiz

BTW, this explains why BitcoinInterestOfficial apparently didn't run into this issue: I think they're using a revision of ProgPoW from prior to that change.

BCI implements a period == epoch and early versions of PP impl.

I didn't retest other blocks, so don't know if it possibly exposed

I ran tests from block 7M to 20M without any "bogus period" found at the moment.

I understand that, but it's a major problem for ProgPoW adoption [...]

There was one ... my work to make ethminer dual algo capable. Unfortunately EF's inertia and all the messes around possible adoption by CoreDevs, CatHerders, Influencers ... whatever ... prevented me from obtaining the funds to continue working on it. So I decided to withdraw my repo.
I think the most complete implementation publicly available at the moment is this one (which resurrected part (if not all) of my work till it was available).

https://github.com/miscellaneousbits/ethminer

@AndreaLanfranchi
Copy link
Contributor

AndreaLanfranchi commented Jun 8, 2019

I think not: that's what the added if (!new_epoch) return true; mentioned above is meant to prevent, but like I said that's untested for now

Unfortunately it is so. If you look at the workloop implementation for CUDA vs OpenCL one you'll notice.

  • CUDA splits epoch initialization and PP kernel generation/compilation. OpenCL doesn't
  • OpenCL does all within init() function which, on every iteration, rebuilds DAG. Thus when PP kernel is compiled, at each period switch, DAG is also rebuilt.

This mock has been built on an outdated release of ethminer (iirc was 0.13 or 0.14).

Edit ... sorry ... I didn't read carefully. Yes your return on no epoch change mitigates the problem even though a lot of unuseful stuff is still performed (eg the rescan of available devices).

@solardiz
Copy link
Contributor Author

solardiz commented Jun 8, 2019

I decided to withdraw my repo.

Maybe you'd resurrect it and add a mention to the very beginning of README.md that the repo is unmaintained (optionally also since when and why)? There are links to it, which are now broken.

a lot of unuseful stuff is still performed (eg the rescan of available devices).

Do you think we should address that in this repo, or maybe not? For my current use, being able to compute digests for fixed block numbers and to run benchmarks (that would represent performance of an actual miner) is sufficient. So I opted for fewer changes in this PR, especially given my lack of testing except on individual block numbers. We can improve mining efficiency (starting with second period and on) with separate PRs if desired.

@OhGodAGirl
Copy link
Collaborator

I understand that, but it's a major problem for ProgPoW adoption (as well as testing, benchmarking, forking and tweaking) if it doesn't have one specific reference upstream implementation of its latest revision that's entirely correct and can be tested and benchmarked. It doesn't strictly have to be "a fully implemented and production-ready miner", but not being entirely correct and not outputting test vectors is problematic for it being "a staging laboratory".

I hear you, and agree @solardiz.

We should follow best practices, and make this a correct “reference” implementation - to the best of our abilities.

@ifdefelse ifdefelse merged commit e87df97 into ifdefelse:master Jun 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ProgPoW OpenCL kernel is usually built for wrong epoch
4 participants