This repository has been archived by the owner on Apr 24, 2022. It is now read-only.

Can't mix NVIDIA + multiple AMD cards #725

Closed
nachitox opened this issue Feb 10, 2018 · 29 comments

Comments

@nachitox

nachitox commented Feb 10, 2018

Hi,
I used to have 1 GTX 1070 and 1 AMD RX 470, and ethminer worked fine.
Now I have installed an AMD RX 480 and I can't make the 3 cards work together.

This is the device list
[screenshot: devices]

If I run ethminer -F http://eth-us.dwarfpool.com:80/WALLET --farm-recheck 200 -G only the AMD cards work
[screenshot: cmd-1]

If I run ethminer -F http://eth-us.dwarfpool.com:80/WALLET --farm-recheck 200 -X (the command I used when I had only 2 cards),
the DAG is loaded twice on one AMD card and the other one stays idle. The NVIDIA card works fine.
[screenshot: cmd-2]

I tried adding --opencl-devices 0 1 but the result is the same.

Am I doing something wrong?

Thank you

@DeadManWalkingTO
Contributor

Have you tried running two different instances, one for the NVidia cards (-U) and one for the OpenCL cards (-G)?

@nachitox
Author

nachitox commented Feb 10, 2018

Yes, I'm doing that, but it's a "hack": it doubles the number of requests sent to the pool.
Shouldn't ethminer handle this case properly with the -X flag?

@SnowLeopard71
Copy link
Contributor

Try "--opencl-platform 0"
I ran 5 GTX 1060s and an RX570, but the temperature and fan speed were not reported on the RX570 in mixed mode.

@nachitox
Author

--opencl-platform 0 is the default value. I tried it just in case, and the result is the same as with -X alone.
--opencl-platform 1 uses only the NVIDIA card but treats it like two cards (5 GB used instead of 2.5) 😕

@AnjinMeili

I am also having this same issue.

I have a mixed rig with three nvidia and two amd gpus. Using ethminer in either opencl or cuda mode works just fine for the card type (-G or -U), but not in mixed mode (-X).

With -X, and --opencl-platform 1 (AMD gpus are in as platform 1, NVidia is 0), and any combination of --opencl-devices & --cuda-devices, only the cuda devices actually mine.

My documentation matches the above case fairly well, but more information can be provided if needed.

Using the latest 4.15 linux build with Cuda 9.1, NV 390.25, & AMDGPU PRO 17.50. Ethminer 0.14.0.dev1

@nachitox
Author

Is this a bug, or am I doing something wrong?

@cmdrscotty

same issue here with a gtx 1060 and r9 390

shows up as:

OPENCL
[0] [0] GTX 1060
[0] [1] r9 390

CUDA

[0] [0] GTX 1060

Unable to get it to use both the r9 and the gtx under -X. I have to launch two instances for it to work.

@jean-m-cyr
Contributor

I think recent PR #710 might help with that. Not the same but I can split my Nvidia rig using -X --opencl-device parameters.

@cmdrscotty Not sure if it will work, I don't have a mixed rig. Have you tried with the head code?

ethminer -X --opencl-device 1

@cmdrscotty

@jean-m-cyr sure have. Tried --opencl-device 0 as well as 1, program still reverts to cuda mining on the gtx1060
Also tried --opencl-devices 0 and 1 no dice.

Only time I've gotten it to use something other than the gtx is --opencl-platform 1. But then it tries to do opencl on both the gtx and r9.

Wish there was a way to tell it to do only opencl on the AMD and only cuda on the Nvidia, but it seems it can't do that

@jean-m-cyr
Contributor

@cmdrscotty I've got an AMD GPU arriving tomorrow. Will see.

@Davesmacer
Contributor

Davesmacer commented Feb 15, 2018

Same issue here. In mixed mode it works with only one card and treats it like two. I am currently working around it with 2 separate consoles. I'm using Windows 10.

@AnjinMeili

AnjinMeili commented Feb 15, 2018

Some data for this issue. Issue is with mixed GPU rig and mining on both AMD and NVIDIA concurrently.

Linux 4.15 kernel, NVidia 390.25, AMDGPU-PRO 17.50, OpenCL 1.2, CUDA 9.1.84

ethminer 0.14.0.dev1

Output of --list-devices
[screenshot, 2018-02-15 2:16:54 pm]

Running in OpenCL mode "-G --opencl-platform 1", both AMD cards work fine as a group:
[screenshot, 2018-02-15 2:21:50 pm]

Next in CUDA mode "-U", all three cards work fine in a group:
[screenshot, 2018-02-15 2:40:20 pm]

And in MIXED mode "-X --opencl-platform 1", the cards are all identified properly, but only the CUDA devices post any work.
[screenshot, 2018-02-15 2:26:17 pm]

If I start without any options for "-X", it fails as follows:
[screenshot, 2018-02-15 2:47:43 pm]

And if I start in OpenCL mode without options "-G", it defaults to using the NVIDIA cards only.

@nachitox
Author

@jean-m-cyr Could you replicate the issue?

@SnowLeopard71
Contributor

@AnjinMeili Could you possibly try with only one AMD card (and keep all Nvidias)?
I'd like to know whether my successful run with 5 Nvidias and 1 AMD (apart from the AMD temperature) was a fluke or not.

@Davesmacer
Contributor

Davesmacer commented Feb 16, 2018

I just tried using only 1 card with --opencl-device 0, but it keeps selecting both of the cards and ignores the Nvidias (I used the -X option). It works fine this way, just as if I were running the command with -G. I'd need to shut down and disconnect one of the cards to test -X together with the --opencl-device option. Maybe I'll do it later tonight when I have more time.

The issue I have is that when I install both of my AMD cards, the program seems to use only one of them and treats it as if it were two. It gives me exactly the same hashrate as when I open two terminals running on the same card. The other card does not even warm up or raise its clock. I'm using the latest master build on Windows 10.

Note: AMD temps are working fine for me with -G. I'll paste some output images later tonight.

@Davesmacer
Contributor

@jean-m-cyr any luck? I'd like to take a look into this issue tomorrow night if you don't have the time or cards to replicate it... It is next on my Mining todo's...

@unknown2this

unknown2this commented Feb 18, 2018

Same issue as everyone else.
Listing OpenCL devices.
FORMAT: [platformID] [deviceID] deviceName
[0] [0] GeForce GTX 1060 6GB
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 6367739904
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1591934976
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
[0] [1] GeForce GTX 1050 Ti
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 4233625600
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1058406400
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
[0] [2] GeForce GTX 1060 6GB
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 6367739904
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1591934976
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
[0] [3] GeForce GTX 1060 6GB
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 6367739904
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1591934976
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
[1] [0] Ellesmere
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 1642287104
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1384792064
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
[1] [1] Ellesmere
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 1642287104
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1384792064
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256
[1] [2] Ellesmere
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 1646170112
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1385005056
CL_DEVICE_MAX_WORK_GROUP_SIZE: 256

Listing CUDA devices.
FORMAT: [deviceID] deviceName
[0] GeForce GTX 1060 6GB
Compute version: 6.1
cudaDeviceProp::totalGlobalMem: 6367739904
[1] GeForce GTX 1050 Ti
Compute version: 6.1
cudaDeviceProp::totalGlobalMem: 4233625600
[2] GeForce GTX 1060 6GB
Compute version: 6.1
cudaDeviceProp::totalGlobalMem: 6367739904
[3] GeForce GTX 1060 6GB
Compute version: 6.1
cudaDeviceProp::totalGlobalMem: 6367739904

Running this command, only the Nvidia (cuda) devices operate normally, the AMDs stay at 0.0 Mh/s.
./ethminer --farm-recheck 200 -HWMON -X -F http://127.0.0.1:8080/Grig1 --cuda-devices 0 1 2 3 --opencl-devices 3 --opencl-platform 1 --opencl-device 0 1 2

Running separate ethminer processes with -G and -U respectively works fine:
./ethminer --farm-recheck 200 -HWMON -G -F http://127.0.0.1:8080/Grig2 --opencl-platform 1
./ethminer --farm-recheck 200 -HWMON -U -F http://127.0.0.1:8080/Grig1 --cuda-devices 0 1 2 3

Surprisingly, with this command, all the cuda devices and the single referenced AMD opencl device work fine. But adding 1 or 2 more AMD gpus, the hashrate stays at 0.0 Mh/s for the AMD gpus.
./ethminer --farm-recheck 200 -HWMON -X -F http://127.0.0.1:8080/Grig1 --cuda-devices 0 1 2 3 --opencl-platform 1 --opencl-device 0

@Davesmacer
Contributor

@jean-m-cyr I think I found the issue here! Unless you reply and tell me otherwise, I will work on a solution later tonight. I think it comes down to the GPU index number used in mixed mode.

In the Farm::start() method (Farm.h line 115), mixed mode initializes the CL sealers with the next index after the NVIDIA GPUs (for example, if I have 9 NVIDIA cards, the index passed to the create-sealer method for the first AMD card is 9). Everything seemed OK up to that point. Later, when I reviewed the CLMiner::init() method (CLMiner.cpp line 505), I noticed that these lines:

    // get GPU device of the default platform
    vector<cl::Device> devices = getDevices(platforms, platformIdx);
    if (devices.empty())
    {
        ETHCL_LOG("No OpenCL devices found.");
        return false;
    }

    // use selected device
    unsigned deviceId = s_devices[index] > -1 ? s_devices[index] : index;
    m_hwmoninfo.deviceIndex = deviceId;
    cl::Device& device = devices[min<unsigned>(deviceId, devices.size() - 1)];

do something that may be causing my error. Because I specified --opencl-platform, getDevices returns only my 2 AMD cards, so devices.size() is 2 at that point. Since the index attribute of the class is greater than 2 (the OpenCL cards' indices start after the last Nvidia index), the line:

cl::Device& device = devices[min<unsigned>(deviceId, devices.size() - 1)];

always gives me the last AMD card. That would explain why, when I run with the -X option, the first AMD card never runs and the two AMD workers shown in the hashrate output are actually the same card, each getting half of its hashrate.
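
To make the clamping effect described above concrete, here is a minimal sketch. It is not ethminer code: clampedDevice and clampedAssignments are hypothetical names, and the sketch assumes that no per-miner selection is set (so deviceId equals the miner index) and that OpenCL miner indices continue right after the CUDA ones.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the quoted selection line: clamp the requested
// device id to the last valid slot of the platform's device list.
unsigned clampedDevice(unsigned deviceId, unsigned deviceCount)
{
    assert(deviceCount > 0);  // the quoted code returns early on an empty list
    return std::min<unsigned>(deviceId, deviceCount - 1);
}

// Returns the device slot each OpenCL miner lands on in mixed mode.
// Assumption: OpenCL miner i gets the farm-wide index cudaCount + i.
std::vector<unsigned> clampedAssignments(unsigned cudaCount, unsigned clCount)
{
    std::vector<unsigned> slots;
    for (unsigned i = 0; i < clCount; ++i)
        slots.push_back(clampedDevice(cudaCount + i, clCount));
    return slots;
}
```

Under these assumptions, clampedAssignments(9, 2) yields slot 1 for both miners, so both workers share the last AMD card, while clampedAssignments(0, 2) (no CUDA cards, as with -G) yields slots 0 and 1.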

@Davesmacer
Contributor

As I figured, a simple line change in

cl::Device& device = devices[min<unsigned>(deviceId, devices.size() - 1)];

to
cl::Device& device = devices[deviceId % devices.size()];

Fixed the main issue!! lml
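
As a rough illustration of why the modulo version spreads the miners out, the sketch below applies the same mapping to consecutive miner indices. The names wrappedDevice and wrappedAssignments are hypothetical, and the sketch again assumes that OpenCL miner indices follow directly after the CUDA ones and that no explicit per-miner device selection is in effect.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the changed line: wrap the device id around
// the size of the platform's device list instead of clamping it.
unsigned wrappedDevice(unsigned deviceId, unsigned deviceCount)
{
    assert(deviceCount > 0);  // avoid a modulo by zero on an empty list
    return deviceId % deviceCount;
}

// Returns the device slot each OpenCL miner lands on in mixed mode.
// Assumption: OpenCL miner i gets the farm-wide index cudaCount + i.
std::vector<unsigned> wrappedAssignments(unsigned cudaCount, unsigned clCount)
{
    std::vector<unsigned> slots;
    for (unsigned i = 0; i < clCount; ++i)
        slots.push_back(wrappedDevice(cudaCount + i, clCount));
    return slots;
}
```

Because the indices are consecutive, each device is hit exactly once, although not necessarily in order: with 7 CUDA cards and 2 OpenCL cards the slots come out as 1, 0, and with 4 and 3 they come out as 1, 2, 0.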

However, the hwinfo monitors break after the first couple of progress outputs. It keeps mining, but it reports 0 for all hwinfo monitors; see the picture below (gpu/7 and gpu/8 are the AMD cards):

[screenshot: image]

I guess a similar fix is needed somewhere in the hwinfo reading code... Anyway, I'm creating a new branch and will open the pull request once I get those monitors working.

@smurfy
Collaborator

smurfy commented Feb 19, 2018

The monitor problem is probably a wrong-index problem, but great find getting it working.
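
One way a wrong-index problem like this could be avoided is to translate the farm-wide miner index into an index local to the OpenCL device list before querying a monitor. The sketch below is only an illustration of that idea, not what the pull request does: localClIndex is a hypothetical helper, and it assumes OpenCL miners are numbered directly after the CUDA miners.

```cpp
#include <cassert>

// Hypothetical helper: map a farm-wide miner index to an index inside the
// OpenCL device list. Returns -1 when the index does not belong to an
// OpenCL miner (it is a CUDA miner or lies past the last OpenCL device).
int localClIndex(unsigned minerIndex, unsigned cudaCount, unsigned clCount)
{
    assert(clCount > 0);  // assumption: the caller already rejected an empty list
    if (minerIndex < cudaCount)
        return -1;  // this slot belongs to a CUDA miner
    unsigned local = minerIndex - cudaCount;
    if (local >= clCount)
        return -1;  // out of range for the OpenCL platform
    return static_cast<int>(local);
}
```

With 7 CUDA cards and 2 OpenCL cards, gpu/7 maps to local index 0 and gpu/8 to local index 1, while gpu/3 and gpu/9 are rejected.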

@nachitox
Author

Don't let the pull request die. A lot of people are waiting for this fix.

@Davesmacer
Contributor

It won't die. It is currently working on Windows. We are fixing some bugs so that the monitors display correctly in all cases, but I think it should work on Linux too (except for the monitors).

@smurfy
Collaborator

smurfy commented Feb 27, 2018

@nachitox please test the latest version of the PR. It hopefully works. I can't test on mixed setup

@unknown2this

I built a Linux version of ethminer a couple of days ago and can confirm that -X works with mixed cards. I'm also using these options to designate 3 AMD and 4 Nvidia cards: --opencl-platform 1 --opencl-devices 0 1 2 --cuda-devices 0 1 2 3.

The only issue is that no matter which single OpenCL device I select, it keeps picking the same card (--opencl-devices 0, --opencl-devices 1 and --opencl-devices 2 all choose the same card).
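
One possible explanation for this, not confirmed anywhere in the thread, follows from the selection line quoted earlier. If the selection table is filled from slot 0 but the single OpenCL miner looks itself up at its farm-wide index (4 on a rig with 4 CUDA cards), the lookup finds no selection and falls back to the miner index. The sketch below models that; pickDevice and selectionTable are hypothetical names, and the way the table is filled is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the quoted selection line with the modulo change:
// use the selected device if one is set for this miner, else the miner index.
unsigned pickDevice(const std::vector<int>& selected, unsigned minerIndex,
                    unsigned deviceCount)
{
    assert(deviceCount > 0 && minerIndex < selected.size());
    unsigned deviceId = selected[minerIndex] > -1
                            ? static_cast<unsigned>(selected[minerIndex])
                            : minerIndex;
    return deviceId % deviceCount;
}

// Assumption: a single "--opencl-devices N" stores N in slot 0 of the
// table and leaves every other slot at -1 (meaning "not set").
std::vector<int> selectionTable(int requested, unsigned size)
{
    assert(size > 0);
    std::vector<int> table(size, -1);
    table[0] = requested;
    return table;
}
```

Under these assumptions, pickDevice(selectionTable(n, 16), 4, 3) gives slot 1 for n = 0, 1 and 2 alike, which would match the behaviour described above. The same call with miner index 0 honours the request, which is consistent with -G alone behaving as expected.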

@nachitox
Author

./ethminer -F http://eth-us.dwarfpool.com:80/WALLET --farm-recheck 200 -X

[screenshot, 2018-02-26 23:32:28]

Looks like it works fine with 1 NVIDIA + 3 AMD cards. The hashrate seems fine and all cards are working and memory is properly set.

[screenshot, 2018-02-27 00:06:06]

But -HWMON is not working: gpu/2 and gpu/3 show the same temp and fan percentage.
Also, -HWMON 1 does not show the AMD power usage (it works on NVIDIA cards).

@smurfy
Collaborator

smurfy commented Feb 27, 2018

@nachitox thanks for testing

> gpu/2 and gpu/3 show the same temp and fan percentage.

i think i know what the problem is. i will do some tests over the next couple of days.

> does not show the AMD power usage (works on NVIDIA cards)

does ethminer run as root? Power needs root access on linux.

@smurfy
Collaborator

smurfy commented Feb 27, 2018

OK, I pushed a new version; it should fix the wrong temp / fan display.
I tested as root and I see the AMD power usage.

@Davesmacer
Contributor

Davesmacer commented Feb 28, 2018

The last commit on the PR works perfectly for my setup: 7 Nvidias and 2 AMDs on Windows.

@AndreaLanfranchi
Collaborator

Resolved by #1704


10 participants