Conversation
@@ -26,7 +26,7 @@
 #endif

 __global__ void
-__launch_bounds__(TPB, BPSM)
+//__launch_bounds__(TPB, BPSM)
Can you explain this to a newbie?
Generally, the fewer registers a kernel uses, the more threads and thread blocks are likely to reside on a multiprocessor, which can improve performance. The compiler therefore uses heuristics to minimize register usage while keeping register spilling and instruction count to a minimum, and __launch_bounds__(TPB, BPSM) gives it extra constraints to work with.
It takes 2 parameters:
maxThreadsPerBlock: the maximum number of threads per block with which the application will ever launch the kernel.
minBlocksPerMultiprocessor: optional; the desired minimum number of resident blocks per multiprocessor.
However, for the ethminer code running on a GTX 1060, the restriction from __launch_bounds__ is too tight (the register count is limited to about 47, while the optimum is about 70), so the overhead from register spilling outweighs the performance benefit of the reduced register count.
Removing the launch bounds gives about a 2% performance improvement on the GTX 1060.
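For a newbie-friendly picture, here is a hedged sketch of what the qualifier does. The TPB/BPSM values below are illustrative placeholders, not the ones defined in ethash_cuda_miner_kernel.cu:

```cuda
// Sketch only: TPB and BPSM values here are hypothetical.
#define TPB 128   // max threads per block the kernel will ever launch with
#define BPSM 2    // desired minimum resident blocks per multiprocessor

// With the qualifier, nvcc caps register usage so that at least BPSM
// blocks of TPB threads can be resident per SM, spilling registers to
// local memory if it has to:
__global__ void
__launch_bounds__(TPB, BPSM)
bounded_kernel() { /* ... */ }

// Without it, nvcc's heuristics pick the register count freely (around
// 70 in the case described above), trading occupancy for fewer spills:
__global__ void unbounded_kernel() { /* ... */ }
```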
Can this decrease performance on other cards?
It depends. As long as the card has a large enough register file, this should increase performance.
Can you just remove the code?
Are TPB and BPSM defined by us?
Sure, I can.
TPB and BPSM are defined in the same file (libethash-cuda/ethash_cuda_miner_kernel.cu) as __launch_bounds__.
Benchmarks for Nvidia GTX 1070 (mobile):
Before: ethminer -U -Z 4000000, ethminer -U -M
After: ethminer -U -Z 4000000, ethminer -U -M
Improvement: ~3%
libethash-cuda/keccak.cuh
@@ -328,12 +331,22 @@ __device__ __forceinline__ void keccak_f1600_init(uint2* s)

 /* iota: a[0,0] ^= round constant */
 s[0] ^= vectorize(keccak_round_constants[23]);

 for(uint32_t i=0; i<12; i++)
Can you reformat this to for (int i = 0; i < 12; ++i) and skip the {}?
libethash-cuda/keccak.cuh
uint2 t[5], u, v;

for (uint32_t i = 0; i<12; i++)
The same reformatting here.
There are several for loops in this file that have the "for (uint32_t i = 0; i<?; i++)" style; should I re-format them all?
Only the ones you have added or modified.
libethash-cuda/keccak.cuh
@@ -328,12 +331,19 @@ __device__ __forceinline__ void keccak_f1600_init(uint2* s)

 /* iota: a[0,0] ^= round constant */
 s[0] ^= vectorize(keccak_round_constants[23]);

 for(int i=0; i<12; ++i)
I also had in mind to add some spaces: after for, around =, and around <.
Sorry, I don't quite understand. Should I add a new line between "s[0] ^= vectorize(keccak_round_constants[23]);" and "}"?
I mean for_(int i_=_0; i_<_12; ++i). Required spaces are marked with _.
Ah, sorry for my misunderstanding. I have checked in my code; please help review it. Thank you.
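Putting the exchange together, the requested style (space after for, spaces around = and <, an int index, pre-increment, and no braces around a single-statement body) would read as below. The loop body is a hypothetical placeholder, not the actual keccak.cuh statement:

```cuda
for (int i = 0; i < 12; ++i)
    round_step(s, i);  // hypothetical single-statement body
```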
Tested on an Nvidia GRID K520 on Ubuntu; performance decreased by about 5%. Maybe it's specific to my card.
min/mean/max: 0/4858402/7689557 H/s
This code is optimized for the GTX 1060 according to its architecture. As you know, to get the highest performance we have to optimize the code for the underlying architecture of each specific device. Can we separate the code for each device, or add switches/macros in the code to distinguish device-related optimizations?
I understand this, but it looks like we cannot ship it in this form. Yes to separating the code per architecture (not per device), but I don't know how to do it.
Yeah, adding a command line switch will surely work.
I'm happy with enabling it for 10XX GPUs (this works well for at least some of them). We can add a switch in the next iteration. This is all up to you. Do you have access to more hardware to perform more tests?
Agreed, we can enable it and wait for users' feedback.
Is this going to help with performance on all GDDR5X GPUs, or is it addressing a different issue?
@brianorwhatever This code is optimized for the GP106 with the 1 GPC GDDR5 case, and may be helpful for other chips. You can scale "#define PARALLEL_HASH 4" in dagger_shuffled.cuh from 1 to 8 to check whether it improves performance for your GPU.
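To make that knob concrete: PARALLEL_HASH is a compile-time constant, so trying other widths means editing the define and rebuilding. A hedged sketch of the fragment (surrounding code elided):

```cuda
// dagger_shuffled.cuh (sketch): how many hashes are mixed per shuffle
// group. The values suggested above are 1 through 8; re-benchmark
// after each change, since the optimum is device-specific.
#define PARALLEL_HASH 4
```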
Tested on M60: Before:
After:
Which is a 2% improvement. Didn't notice any difference between 4 and 8 parallel hashes.
@qtxwang Hi, can you test my code again with "./ethminer -M -U --benchmark-warmup 100"? Thank you very much. (Re: "min/mean/max: 0/4858402/7689557 H/s") @andrusha Hi, thank you for your testing. The code has the best performance on my GTX 1060 when PARALLEL_HASH = 4.
Hi davilizh, which 1060 did you test on?
Can anyone provide a Windows binary with this change? I would love to test it on a GTX 1070. Thanks!
Windows binaries are always available on AppVeyor CI.
Not sure if you're checking this anymore, but here are my results:
Occurs after around 20 minutes of mining; unable to reproduce on Claymore, which appears to run fine. Running on 6x Gigabyte 1060 3GB cards.
@michael-pesce which NVIDIA driver version are you running? How were you able to change clock speeds in Linux? I have tried via nvidia-smi and nvidia-settings but can't with the 375.66 Linux driver and a 1070 (devtalk.nvidia discussion).
@spieiga 375.66. First, make sure that the xorg.conf configuration allows overclocking by using this command and restarting:
Then use nvidia-settings to overclock. Note: you must be in an X session to do so. Here's a script I use at startup:
Each command would look something like this:
That sets fans to 60% and the memory transfer speed offset to +1000, and the script iterates through each GPU (in my case 0..5). Output of nvidia-smi -q, just so you can see version numbers and such:
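The actual commands above were lost from this thread. The snippet below is a hypothetical reconstruction, not michael-pesce's real script: the nvidia-settings attribute names (GPUFanControlState, GPUTargetFanSpeed, GPUMemoryTransferRateOffset) exist in the driver, but the indices and values are illustrative, and echo makes this a dry run so nothing is actually applied.

```shell
# Hypothetical dry-run of a startup overclock loop: 60% fans, +1000
# memory transfer rate offset, GPUs 0..5. Remove `echo` to apply for real.
for i in 0 1 2 3 4 5; do
  echo nvidia-settings -a "[gpu:$i]/GPUFanControlState=1" \
                       -a "[fan:$i]/GPUTargetFanSpeed=60" \
                       -a "[gpu:$i]/GPUMemoryTransferRateOffset[3]=1000"
done
```

Note that fan control and clock offsets only work if overclocking was enabled in xorg.conf (the "Coolbits" option) and an X session is running.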
@sleepeeg3, the code is for CUDA only.
Works like a charm.
4x Palit GTX 1060 3GB
@spieiga You can also find out the maximum value for your GPU's
@michael-pesce any idea how to get this working on a headless Ubuntu machine? I tried a fresh Ubuntu Server 16.04 LTS installation with xfce4 and nvidia-375...
@eliclement From what I've read, it can't be headless; otherwise the NVIDIA driver is not loaded.
@eliclement I couldn't figure it out. Instead, I bought these: https://www.amazon.com/gp/product/B00JKFTYA8/ They fake a monitor UDID (or whatever) so X loads. Then you can set a file to autostart when X loads that does the overclocking/fan speed setting. If you need to adjust, you simply screen share into the X session.
@michael-pesce thanks for the tip. I have such a dummy plug and tried it, but when setting nvidia-settings params I always get: (logged in through VNC, Ubuntu 16.04, xfce4, nvidia-375)
It works now after installing Ubuntu 16.04 LTS Desktop instead of the server distro, with the dummy plug.
@michael-pesce @eliclement you don't need a monitor connected to make X work. At the time of installation, save the EDID of the monitor using nvidia-settings, then use the edid.bin file in your xorg.conf to fake X into thinking a monitor is connected. I have this working on my rig and X has no issues. You can add the edid by using
@michael-pesce @eliclement
Anyone using Linux having success with the --cuda-parallel-hash setting? Not sure if it's related to the OS, but it seems to have no noticeable effect on my hashrates for the GTX 1060 from MSI Gaming with 6 GB.
@efrister I have, on Ubuntu 17.04, on a GTX 1060. The default (4) seems best for me, with slight decreases for 2 and 8. Are you starting ethminer with the -U flag? Use a large number (4000) with --farm-recheck for accurate hashrates.
@pabloi I get 23 MH/s using "-U --cuda-parallel-hash 4 -S $pool -SP 1 -O $walletETH.$rigName --farm-recheck 4000". I got 23 before when using Claymore, and I get the same when I just leave out --cuda-parallel-hash.
@efrister That is strange. Did you compile it yourself?
@pabloi I'm using it pre-packaged with the SimpleMining OS (simplemining.net). I don't think it's self-compiled there, but I can't say for sure. I'm using their beta image, so maybe it is related to that.
@efrister I'm not familiar with SimpleMining, but the CUDA miner is not compiled by default, so I built it myself with the appropriate flag set (see instructions on the repository main page). Perhaps that is the issue.
So you are getting 23 MH/s with both ethminer and Claymore. This is perfect: use ethminer, it's still free and has no dev fees! What are you asking for? The default for --cuda-parallel-hash is 4, so everything is OK.
@petunder I had similar luck on Ethermine. The miner's reported hashrate was about 100-102 MH/s, but the pool reported an average hashrate of 94. Claymore's miner reports a hashrate of 94 as well.
Trust the hashrate you are seeing locally on the GPUs. The hashrates reported by the pools depend on how many shares you are contributing: they can't measure your true hashrate, all they can go by is how many shares you submit, and the shares you submit depend on the difficulty of the block you are mining.
Glad to say my hashrate improved from 62 to 66! (3x GTX 1050 Ti + 1x 1060 6GB)
@ajthemacboy It seems that this fork of ethminer does not send the pool all the shares it finds. The Genoil version works fine; its local hashrate closely matches the pool-accepted hashrate.
I downloaded the exe, put it into a folder with my bat file, and when I try to run it, it keeps saying it is getting a work package and then failing to submit the hash, with a client connection error saying it can't connect to http blah blah. I even tried copying the four DLL files from the ethermine folder I am currently using that works, and still the same thing. Thoughts?
The performance is improved from 'min/mean/max: 22369621/22579336/22719146 H/s' to 'min/mean/max: 23767722/23907532/24117248 H/s' on a flashed GTX 1060 with 2 GPCs and 9 TPCs (the production chip should have 10 TPCs). Note that this was tested on code pulled on May 11; the current code from GitHub cannot generate reasonable scores ('min/max/avg is 0/0/0 H/s').
Optimizations include:
1. ethash_cuda_miner_kernel.cu
We have commented out "__launch_bounds__" in the code. __launch_bounds__ is discussed in detail in http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#axzz4fzSzZc9p.
2. dagger_shuffled.cuh
The state handling in compute_hash of dagger_shuffled.cuh is modified.
Every thread is the master for calculating one hash value, and each thread initializes its own copy of state using keccak_f1600_init. In the original main loop, when i=0, threads 0-7 copy thread 0's state[0-7] into each thread's shuffle[0-7], do the main computation, and then thread 0 captures the result of shuffle[0-3] into its state[8-11]. On the next iteration, when i=1, threads 0-7 copy thread 1's state[0-7] into each thread's shuffle[0-7], do the main computation, and thread 1 captures the result of shuffle[0-3] into its state[8-11].
With the modification, taking PARALLEL_HASH=2 as an example: when i=0, threads 0-7 copy thread 0's state[0-7] into each thread's shuffle[0][0-7] and thread 1's state[0-7] into each thread's shuffle[1][0-7]. They do the main computation on these two shuffle vectors in parallel. Then thread 0 captures the result of shuffle[0][0-3] into its state[8-11], and thread 1 captures the result of shuffle[1][0-3] into its state[8-11].
3. keccak.cuh
Since the input argument uint2 *s is changed in dagger_shuffled.cuh, we have to modify keccak_f1600_init and keccak_f1600_final in keccak.cuh accordingly.
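The restructuring described above can be sketched as follows. This is a hedged illustration, not the actual dagger_shuffled.cuh code: the function name, the mix step, and THREADS_PER_HASH are placeholders, and the DAG mixing itself is elided.

```cuda
// Sketch of the PARALLEL_HASH outer loop. __shfl is the pre-Volta
// warp shuffle used by ethminer of this era; names are hypothetical.
#define THREADS_PER_HASH 8
#define PARALLEL_HASH 2   // 1..8; the PR reports 4 as best on GTX 1060

__device__ void outer_loop_sketch(uint2 state[12], const int thread_id)
{
    for (int i = 0; i < THREADS_PER_HASH; i += PARALLEL_HASH)
    {
        uint2 shuffle[PARALLEL_HASH][8];

        // Broadcast state[0..7] of the PARALLEL_HASH owner lanes
        // (lanes i .. i+PARALLEL_HASH-1) into every thread's registers.
        for (int p = 0; p < PARALLEL_HASH; ++p)
            for (int w = 0; w < 8; ++w)
            {
                shuffle[p][w].x = __shfl(state[w].x, i + p, THREADS_PER_HASH);
                shuffle[p][w].y = __shfl(state[w].y, i + p, THREADS_PER_HASH);
            }

        // ... the main DAG mixing runs on all PARALLEL_HASH shuffle
        // vectors here, giving the scheduler independent work to overlap ...

        // Each owner lane captures its own result back into state[8..11].
        for (int p = 0; p < PARALLEL_HASH; ++p)
            if (thread_id == i + p)
                for (int w = 0; w < 4; ++w)
                    state[8 + w] = shuffle[p][w];
    }
}
```

The point of widening the loop is latency hiding: with PARALLEL_HASH independent mixes in flight per iteration, memory stalls on one can be overlapped with arithmetic on another, at the cost of more registers per thread.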