-
Notifications
You must be signed in to change notification settings - Fork 2.3k
Conversation
Hey @smurfy, do you know if adl overdrive API and sysfs use the same index to poll devices? I'm guessing ADL API start first AMD with 0 index and ignores the NVIDIA, but I don't think sysfs is the same. I just don't know how sysfs polling works... Question is to see how I should tackle the hwinfo fix, by saving right index or by changing it before different API calls... |
Both go 0 to n of the same type. so if you have 5 nvidia and 3 amd you have: sysfs then maps to all amd cards in sysfs. but the interface uses indexes starting at 0 even if you have nvidia cards which are for example card0-card4 in the file system. then 0 sysfs internally map to card5 Adl should be the same, only that adl returns a bunch of logical cards and which then gets iterated to get the actual physical cards. |
Thank you, I think I got it... |
It is showing correctly the monitors with the last commit, but they seem to be in different order. Monitor for card 0 shows in card 1. I've been working on that, but couldn't manage to solve it yet. @smurfy any ideas? Maybe ADL uses different enumeration or something... Edit: I figured with an old image that the issue was there before with -G option, I just didn't pay attention. I'm trying to figure it out anyway |
@Davesmacer Just saw this go by in gitter, may be of interrest: "to be able to match sysfs first you have to match opencl device to pci device |
I doing the mapping to PCI actually! But @smurfy told me he had already the mapping for sysfs. I'm trying to figure that out to make a correct mapping for ADL... It is very simple to save PCI id given the card index (0...n), but then, mapping that to ADL format is the tricky part. I'll check on gitter to see if something of help comes out... |
i have method to get pci:id on sysfs :) we need a method to get pci:id from opencl |
the mapping need to be done in all 3 implementations. |
oh i see! and you also have that method in nvml right? I'll do it in all 3 wrappers then... |
yes, nvml has a mapping between nvml device id + cuda device id based on pci-id. |
commited and pushed pci-id fetching |
Awesome @smurfy!!! I'll do the matching in the wrappers so we have a common ID for the program. Should be ready at night. Thank you |
All right guys! It is working for ADL on Windows now, I just feel like it can be improved somehow before I pass the same code to the other wrappers to map OpenCL ID... I believe it got a little messy when I had to make calls to OpenCL from the wrapper. I guess it would be nicer to write a wrapper method in cl2 or something... |
I'm checking the tests... I'll commit when fixed. Don't review yet haha |
All right! I got it working for ADL, It is really simple to pass the mapping to the other wrappers (nvml and sysfs), but I wanted you to check if it can be improved before passing the code to the other wrappers... I tried fixing warnings on Linux, but the "casting between pointer-to-function and pointer-to-object" warning is practically a compiler issue... I couldn't find any workaroud to solve it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awsome Job so far,
In my opinion the mapping array gathering should be in the wrappers, like with cuda->nvml. So putting it inside the ADL file is good.
As mentioned in my comments, the actual use of the mapping should be done in the CALLER, not in the wrapper itself. all wrapper functions should use its own id, because for example nvml need to support opencl and cuda.
libhwmon/wrapadl.cpp
Outdated
@@ -180,7 +245,8 @@ int wrap_adl_get_tempC(wrap_adl_handle *adlh, int gpuindex, unsigned int *tempC) | |||
return -1; | |||
|
|||
ADLTemperature *temperature = new ADLTemperature(); | |||
rc = adlh->adlOverdrive5TemperatureGet(adlh->phys_logi_device_id[gpuindex], 0, temperature); | |||
int adlidx = adlh->opencl_adl_device_id[gpuindex]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gpuindex should be always adl id, the caller should use the mapping array if needed
libhwmon/wrapadl.cpp
Outdated
@@ -197,7 +263,8 @@ int wrap_adl_get_fanpcnt(wrap_adl_handle *adlh, int gpuindex, unsigned int *fanp | |||
|
|||
ADLFanSpeedValue *fan = new ADLFanSpeedValue(); | |||
fan->iSpeedType = 1; | |||
rc = adlh->adlOverdrive5FanSpeedGet(adlh->phys_logi_device_id[gpuindex], 0, fan); | |||
int adlidx = adlh->opencl_adl_device_id[gpuindex]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gpuindex should be always adl id, the caller should use the mapping array if needed
libhwmon/wrapadl.cpp
Outdated
@@ -213,7 +280,8 @@ int wrap_adl_get_power_usage(wrap_adl_handle *adlh, int gpuindex, unsigned int* | |||
return -1; | |||
} | |||
int power = 0; | |||
rc = adlh->adl2Overdrive6CurrentPowerGet(adlh->context, adlh->phys_logi_device_id[gpuindex], 0, &power); | |||
int adlidx = adlh->opencl_adl_device_id[gpuindex]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gpuindex should be always adl id, the caller should use the mapping array if needed
libethcore/Farm.h
Outdated
@@ -263,26 +273,29 @@ class Farm: public FarmFace | |||
unsigned int tempC = 0, fanpcnt = 0, powerW = 0; | |||
if (hwInfo.deviceIndex >= 0) { | |||
if (hwInfo.deviceType == HwMonitorInfoType::NVIDIA && nvmlh) { | |||
wrap_nvml_get_tempC(nvmlh, hwInfo.deviceIndex, &tempC); | |||
wrap_nvml_get_fanpcnt(nvmlh, hwInfo.deviceIndex, &fanpcnt); | |||
int typeidx = hwInfo.deviceIndex % m_cuda_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the mapping here ocl->nvml, but we need a additional field in hwInfo, if the deviceIndex source is CUDA or OpenCL. In case of cuda i would also use the cuda->nvml mapping to be on the safe side.
libethcore/Farm.h
Outdated
} | ||
} | ||
else if (hwInfo.deviceType == HwMonitorInfoType::AMD && adlh) { | ||
wrap_adl_get_tempC(adlh, hwInfo.deviceIndex, &tempC); | ||
wrap_adl_get_fanpcnt(adlh, hwInfo.deviceIndex, &fanpcnt); | ||
int typeidx = hwInfo.deviceIndex % m_opencl_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the ocl->adl mapping here
libethcore/Farm.h
Outdated
} | ||
} | ||
#if defined(__linux) | ||
// Overwrite with sysfs data if present | ||
if (hwInfo.deviceType == HwMonitorInfoType::AMD && sysfsh) { | ||
wrap_amdsysfs_get_tempC(sysfsh, hwInfo.deviceIndex, &tempC); | ||
wrap_amdsysfs_get_fanpcnt(sysfsh, hwInfo.deviceIndex, &fanpcnt); | ||
int typeidx = hwInfo.deviceIndex % m_opencl_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the ocl->sysfs mapping here
Got it!! I'll be making the changes and replying the ADL mapping to NVML and SYSFS wrappers later this afternoon. |
All right!! @smurfy ADL mapping with OpenCL is working with the proper changes... Please review so I can pass the code to all the wrappers... Should be very easy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wrappers look good, and the current OPENCL AMD Stuff should work with sysfs.
For nvidia we need the additional fetching of pci-id via opencl for that.
Also i hope i made clear the reasoning behind the need of physical/logical GPU stuff in stupid ADL ^^
libhwmon/wrapadl.cpp
Outdated
@@ -192,12 +255,12 @@ int wrap_adl_get_tempC(wrap_adl_handle *adlh, int gpuindex, unsigned int *tempC) | |||
int wrap_adl_get_fanpcnt(wrap_adl_handle *adlh, int gpuindex, unsigned int *fanpcnt) | |||
{ | |||
wrap_adlReturn_t rc; | |||
if (gpuindex < 0 || gpuindex >= adlh->adl_gpucount) | |||
if (gpuindex < 0 || gpuindex >= adlh->log_gpucount){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use adl_gpucount not log_gpucount. the adl count contain only physical devices and not all logical devices which can be a LOT. All other wrappers use the physical index so if you have 5 cards index 0-4 works.
If we use logical 0-50 or so would be valid indexes, but get the same values till we are on the next physical device. like 0-7 or so is first card...8-14 the next.
The logical count also varies per physical card. it depends on its connectors and stuff.
libhwmon/wrapadl.cpp
Outdated
ADLFanSpeedValue *fan = new ADLFanSpeedValue(); | ||
fan->iSpeedType = 1; | ||
rc = adlh->adlOverdrive5FanSpeedGet(adlh->phys_logi_device_id[gpuindex], 0, fan); | ||
rc = adlh->adlOverdrive5FanSpeedGet(gpuindex, 0, fan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
phys_logi_device_id maps to the first logical adapter so this needs to be used too
libhwmon/wrapadl.cpp
Outdated
@@ -140,6 +160,33 @@ return NULL; | |||
if (adapterID == last_adapter) { | |||
continue; | |||
} | |||
#if ETH_ETHASHCL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should be AFTER the logical to physical loop, not inside of it.
The logical gpus are just an annoyance and we should only use the physical ones.
(i also had a bug in the beginning, because i thought we get the actual physical gpus, but got logical. so i added the mapping after the first implementation)
So you can do a for (unsigned gpuindex =0; gpuindex<adlh->adl_gpucount;gpuindex++)
and use adlh->devs[adlh->phys_logi_device_id[gpuindex]].iBusNumber
libhwmon/wrapadl.cpp
Outdated
return -1; | ||
} | ||
int power = 0; | ||
rc = adlh->adl2Overdrive6CurrentPowerGet(adlh->context, adlh->phys_logi_device_id[gpuindex], 0, &power); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, keep the physical mapping in place please
I'm on it @smurfy... I got rid of phys->log mapping because it was getting me the wrong index. I have to see how to make proper mapping between opencl->phys and then phys->log. I guess that doing another loop after the phys->log mapping as you pointed out would solve that. Thank you |
I did the correction on ADL. I'll work on the other wrappers this night! |
libhwmon/wrapadl.cpp
Outdated
adapterIndex, adlh->devs[i].iBusNumber, adlh->devs[i].iDeviceNumber, adlh->devs[i].iFunctionNumber, | ||
j, (int)topology.pcie.bus, (int)topology.pcie.device, (int)topology.pcie.function); | ||
printf("[DEBUG] - ADL GPU[%d]%d,%d,%d matches OpenCL GPU[%d]%d,%d,%d\n", | ||
adapterIndex, adlh->devs[i].iBusNumber, adlh->devs[i].iDeviceNumber, adlh->devs[i].iFunctionNumber, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug uses not phys to logi :)
libhwmon/wrapadl.cpp
Outdated
|
||
int wrap_adl_get_tempC(wrap_adl_handle *adlh, int gpuindex, unsigned int *tempC) | ||
{ | ||
wrap_adlReturn_t rc; | ||
if (gpuindex < 0 || gpuindex >= adlh->adl_gpucount) | ||
if (gpuindex < 0 || gpuindex >= adlh->log_gpucount){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still using log_gpucount instead adl_gpucount here.(and in other functions too)
@smurfy @Davesmacer This is all getting fairly confusing! What is the guiding philosophy now with respect to device indices??? Is there a unique per card index... or two sets of indices for CUDA vs. CL... or, 3 sets by how you retrieve card info? |
its fairly simple:
|
@Davesmacer This is a rather long chain of commits. Do you know about rebase squash, and push --force? |
6325d28
to
ac11b3d
Compare
@jean-m-cyr I finally got some time to rebase last couple of days of failed commits haha. Sorry to reply until now, but I have to keep attending regular work. @smurfy Please check now. I believe all the code is there: |
Why not many will be running mixed rigs in Linux? In my case, I just buy whatever card with good efficiency that comes to my suppliers on a reasonable price. With video cards getting sold out every day, I can't really decide which brand to buy... |
I'll be able to test once the update is merged. |
Oh I get it, I didn't even manage to try configuring both drivers, I switched to Windows after not being able to OC the Nvidia cards... I guess I could, but I don't think that is an ethminer issue. |
Quick steps to configure NVIDIA and AMD on Ubuntu. You can overclock later with nvidia-settings
|
Hello everyone! @nachitox thank you! I'll try to configure both GPU's when I switch to Linux. On the meantime I'm stuck on Winshit. @smurfy the mapping using those undocumented tokens seemed to work ok, it actually mapped devices on the same order that OCL gave them. I don't know if this is only my machine or if they are always in the same order for NVML and OCL. However, when trying to actually mine with OCL and my Nvidia cards, the program stopped abruptly as if some uncaught exception raised after line 690 of CLMiner: ETHCL_LOG("Creating mining buffer"); I reviewed the code and noticed nothing strange. I think the bug is in the lines I added/changed in CLMiner.cpp. I'll dig deeper into this later tonight, but if you have any ideas of what this might be I'll be grateful. |
Sorry, i'm a bit busy at the moment but i will rebase and fix all remaining stuff here. |
e5a9106
to
f5ceb15
Compare
Ok, found some time :) Since i do not have a mixed setup to test with i cant test it.
|
358bee8
to
750f592
Compare
After some thinking while trying to sleep, i think the m_cuda_count/m_opencl_count approach is wrong and does not really work. especially with custom device selection. In my opinion the deviceIndex should be set correctly with the Miner class. I just need to figure out how to set it. I think the correct value is already there somewhere :) Samples of correct values: 4 Nvidia cuda cards: index 0-3 4 nvidia with device selection --cuda-devices 1 3: index 1 3 Especially the last sample will not work with current version. Since i can test this without mixed cards. I will do that tonight. |
I have no read all of the comments, but what do you think about enumerating devices globally, without distinguishing the vendor? |
@chfast well, the enumeration of devices will be done only once per wrapper on miner startup.
Only double code is opencl get devices + count. Its around 10 lines. (1:1 on adl/sysfs, slight modified on nvml) Which can be probably moved to wrapper. everything else is quite wrapper api specific. opencl and cuda fetching is to provide mapping arrays between internal api device index and opencl/cuda device index, which can and is different depending on operating system and possible enabled/disabled devices. |
i think you need perl installed |
Ok, i rebased again and added a fix of the device ids. Tested with AMD on linux only. So hopefully this PR is now complete. I have some HWMON optimisation in my mind but i will do them with a separate PR. |
Works like a charm on Windows 10 with 7 Nvidia and 2 AMD. Also with 18 Nvidia shows correct monitors. It also works specifying devices with --opencl-devices and cuda-devices! You're awesome @smurfy! |
Looks like this might be it! Time for: git rebase -i HEAD~n |
…cted cards. Multiple AMD+NVIDIA HW monitors fix Help corrected in -X option. Some minor bugs fixed. Working on switched monitors values Help minor change Added methods to get pci-id WrapADL is working now and showing correctly the monitors for my mixed rig with two AMD's. Lost include fix Fixing OpenCL index mapping in all the wrappers. Merge from master Farm.h mappings Mapping NVML with OCL Some cleanup and refactoring Using the device index, needs testing
Rebased and squashed, ready to merge in my opinion :) |
I managed to get my hashrate working for all the cards, I'm currently working on fixing hwinfo monitors which are not working for the AMD cards. Please refer to #725 last comments to see the problem and solution.
Please don't close the PR until I fix monitors.