
OpenCL GPU Miner for x16rs#36

Merged
jojoin merged 11 commits into hacash:main from Iv-84:opencl_miner
Dec 7, 2025

Iv-84 (Contributor) commented Dec 3, 2025

Overview

Dear maintainers,

My name is Ivan, and I have been working on this OpenCL GPU miner for approximately 2–3 months. On a GeForce RTX 5090, the current implementation achieves around 100 MH/s.
Although I am an experienced programmer, I am not a specialist in mining software. This implementation was written from scratch, based on the legacy sources uploaded by jojoin to the hacash/x16rs repository.
This PR introduces a working GPU kernel for x16rs, along with optimizations and fixes that enable practical performance on modern hardware.
I kindly ask for your review, feedback, and guidance on further improvements.

Thank you,
Ivan

Challenges in x16rs GPU Mining

The x16rs algorithm sequence is unpredictable: each round's algorithm is selected from the previous hash, so the sequence varies with each nonce.

  • It is not possible to simply chain kernels in a fixed order.
  • Each round requires determining the next algorithm, which complicates parallelism.

My Approach

  • Instead of processing one nonce per work-item, each work-item processes multiple nonces (unit_size, e.g., 16, 32, 64, or 128).
  • Keeping many nonces in flight increases the likelihood that several of them need the same algorithm in a given round, which improves parallelism.
  • Within each work-group, hashes are shared and reordered by their next algorithm (determined by the last byte of the hash), so nonces that need the same algorithm are processed together. This enables even stronger parallel execution.
  • All algorithms are executed within the same kernel, reducing kernel launch overhead.

Implementation Details

  • Work-groups, work-items and... unit-items!
    • Each work-group runs with local_size work-items.
    • Each work-item processes unit_size nonces.
  • Reordering per round
    • After each hashing step, nonces are bucket-sorted by hash % 16.
    • Implemented with histogram, starting_index, and offset arrays in __local memory for faster sorting
  • Local memory usage
    • Tables for Blake, AES, LT, and mixtab are preloaded into __local memory by cooperative threads.
    • Synchronization is enforced with barrier(CLK_LOCAL_MEM_FENCE).
  • Best hash selection
    • After all rounds (x16rs_repeat), each work-item finds its best hash.
    • A reduction across the work-group then selects the work-group's best hash and its nonce.
  • Optimizations applied
    • #pragma unroll in critical loops.
    • Avoiding unnecessary hash copies (using pointers directly).
    • Integration of optimized sources by Wolf.
    • Removal of the legacy 80-byte block-hashing logic, limiting processing to 32-byte inputs.
    • Selective loop rolling/unrolling based on measured performance.
    • __attribute__((work_group_size_hint(256, 1, 1))) for compiler guidance.

Benchmarks

  • Tested on Windows and Ubuntu with similar results
  • Hashrates were measured between block heights 650,000 and 700,000
| GPU | work_groups | local_size | unit_size | Hashrate |
|---|---|---|---|---|
| RTX 4070 | 256 | 256 | 128 | ~38 MH/s |
| RTX 4070 | 2048 | 256 | 256 | ~40 MH/s |
| RTX 4090 | 2048 | 256 | 256 | ~80 MH/s |
| RTX 5090 | 2048 | 256 | 256 | ~100 MH/s |
| RTX 5090 | 2048 | 256 | 512 | ~100 MH/s |

Known Limitations / Future Work

  • Frequent barriers (barrier(CLK_LOCAL_MEM_FENCE)) are needed for the per-round reordering that gives x16rs its parallelism, but they may create bottlenecks.
  • OpenCL 2.0+ features (e.g., subgroups) and vector types (uint4, ulong4) could be leveraged for further optimization.
  • Currently tested only on NVIDIA hardware; AMD testing is pending.

Code Included

The full kernel (x16rs_main) is included in this PR, along with the supporting source files (util.cl, x16rs.cl, sha3_256.cl). The entire "opencl" folder is required to launch the poworker with GPU mining enabled.

As a suggestion, the "x16rs/opencl" folder could be added as an asset on the Releases page.

zip -r hacash_x16rs_opencl.zip x16rs/opencl

Config file

A [gpu] section is required to mine with a GPU. I will open a new PR in hacash/doc to update https://github.com/hacash/doc/blob/main/build/config_description.md

[gpu]
use_opencl = false
work_groups = 1024
local_size = 256
unit_size = 128
opencl_dir = opencl/
platform_id = 0
device_id = 0

*️⃣ These are the default values; set "use_opencl" to true to start mining with the GPU.

jojoin (Member) commented Dec 4, 2025

Thank you very much for your contribution! I am reviewing this part of the code, and once I ensure it doesn't affect the existing CPU mining section, this PR will be merged.

jojoin (Member) commented Dec 4, 2025

It would be great if you could ensure that the GPU mining tool is compatible with different platforms and operating systems, allowing as many GPUs as possible to participate in mining.

Iv-84 (Contributor, Author) commented Dec 4, 2025

@jojoin

> Thank you very much for your contribution! I am reviewing this part of the code, and once I ensure it doesn't affect the existing CPU mining section, this PR will be merged.

Thanks. Let me know if something needs to be changed.

> It would be great if you could ensure that the GPU mining tool is compatible with different platforms and operating systems, allowing as many GPUs as possible to participate in mining.

I tested this on Windows 10, Windows 11, Ubuntu 22.04, and Ubuntu 24.04 using NVIDIA GPUs. Only the standard NVIDIA drivers are required, and every NVIDIA GPU I tested worked fine.
I will check with the community Telegram group to see if anyone can assist with testing on AMD hardware and macOS systems.

Iv-84 (Contributor, Author) commented Dec 5, 2025

I have already contacted several beta testers. They are currently testing the miner and have provided valuable feedback.
I will continue working on the PR to address a few issues.

jojoin merged commit a98a70a into hacash:main on Dec 7, 2025
jojoin (Member) commented Dec 11, 2025

@Iv-84 An 'ocl' feature switch has been added for compiling OpenCL support. For details, please pull the latest code.

YouKenTrust (Member) commented

@Iv-84 Hi Ivan, thank you for pushing Hacash forward with your GPU miner work. Could you please share your ERC20 or BEP20 address? The community would like to donate 500 USDT as a small token of appreciation and support.

Iv-84 (Contributor, Author) commented Dec 12, 2025

@YouKenTrust I really appreciate it. I'll take it as an incentive to continue developing Hacash.
My BEP20 address is 0xd407652d2b64c8e2ac9fb219da20484fab593ec3

YouKenTrust (Member) commented

https://bscscan.com/tx/0xa5674d501f7a574e24d9c629eeac7e826b087ffd4c8b85f58321e08e730a4833

TaKKiD commented Dec 13, 2025

Working on Windows 11 with an AMD 7900 XT (CPU: 7950X3D); the integrated graphics also work. It does not work with the AMD Software: Adrenalin Edition driver, but it does work with AMD Software: PRO Edition. The config.ini settings should be:
use_opencl = true
work_groups = 128
local_size = 128
unit_size = 28
opencl_dir = -opencl/
platform_id = 0
device_ids = 0, 1
Device 0 is the AMD 7900 XT (min 60 MH/s, max 80 MH/s); device 1 is the integrated GPU of the AMD 7950X3D (min 1.20 MH/s, max 4 MH/s).
