-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More OSX glue #10
More OSX glue #10
Conversation
There should be build log error message. |
Hi, nice to see someone prepared to fix a PR 👍 I pushed an update - there was a 1024 character limit that was in the way of printing a build log. If you rebase you should be able to see what the error is (in all likelihood something caused by the old gfx card, which will be probably unfixable unfortunately) There are other issues that would need changing before I can just merge it (rather than reworking it)
|
Thanks for the log fix, it's now handing out quite a bit more information. :) Regarding the missing definitions, CL_VERSION_2_0 is undefined on macOS 10.12 with Xcode 8.1. It seems only up to 1.2 is supported which may well be our culprit. |
So I dug around a bit and couldn't find amd_bitalign or the cl_amd_media_ops2 extension on OSX, which is used in wolf-skein.cl:33 and only part of a more recent version of AMD's APP SDK (3.0 on Linux here has it) which coincidentally support OpenCL 2.0 as well. Thus, unless someone feels inclined to port the kernels to OSX OpenCL 1.2 - which I think is unreasonable to ask for - then this PR is only of questionable use. Then again, I could be wrong and the fix is easy. Just haven't wrapped my head around it yet. |
I will be very busy tomorrow, but I should have some time to look into it during the weekend. |
That would be amazing indeed! Just let me know if you can use me to get the necessary information. Until then I'll study up on the OpenCL parts and see if I can tweak a fix in myself. :) |
Just an update, it is possible to tweak it to not use the semi-new amd extensions, however I'm fairly short on time right now. If you want to play around with it - you are more than welcome to. You will need to implement two functions used from the extensions amd_bitalign and amd_bfe |
Already been looking into it. I'm not very proficient with OpenCL so we'll see how far I'll get. Also working with time constraints here. ;) |
Using the reference docs I tried to wrap my head around amd_bfe after finding a neat example for amd_bitalign. I haven't been able to verify it works for there are a few remaining errors during OpenCL kernel compilation. Attaching log: |
It is more likely to be a problem with the worksize - have you tried setting it to 1? If that doesn't work try setting the second argument on reqd_work_group_size attributes to 1 too. |
Also - great job! |
So with a worksize of 1 and an intensity of 40 I seem to be hitting the sweet spot on this R9 M370X (2GB). 25H/s... well, not worth it but it seems to be hashing away. I'll leave the miner running for a few hours to verify correctness. EDIT: |
So, getting a GPU compute error whenever a share is found (?). Other than that it seems to be happily hashing and detecting new blocks. Not very useful yet indeed. I'm betting it's the implementation of amd_bfe of mine which is amateurish at best. I'll have another shot at it when I find more time. |
That would cause it. Shares are double checked on CPU, if your algo is wrong then all of them will be incorrect. |
Another thing - nicehash has special nonce requirements. Double check on a kosher pool. |
Alright, here's the sweet taste of success (for me): 60H/s are really disappointing though. On windows with the same R9 M395X (iMac 27", Late 2015) I get roughly 450H/s with Claymore. I'd like to think the emulation is still too inefficient hoping it's not just Apple's stuff being crazy Photoshop and Premiere centrist when it comes to OpenCL. @fireice-uk Anyway, this basically works on macOS 10.12.4 albeit slow. So this could be merged if you are happy with the way it changes things. |
If you want to play around some more I don't mind guiding you. This is the line where the performance is breaking: Try doing it in other way such as an and mask and shift. https://github.com/fireice-uk/xmr-stak-cpu/blob/master/crypto/soft_aes.c#L163 |
`global_worker_size` needs to be a multiple of `local_worker_size`, this is not always the case in the current code. Remove the bit magic with a easy to read `ceil`.
amd_gpu/gpu.c
Outdated
@@ -163,10 +163,12 @@ const char* err_to_str(cl_int ret) | |||
return "CL_INVALID_LINKER_OPTIONS"; | |||
case CL_INVALID_DEVICE_PARTITION_COUNT: | |||
return "CL_INVALID_DEVICE_PARTITION_COUNT"; | |||
#ifndef __APPLE__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This guard is not only a fix for APPLE it fixes also ubuntu 14.04
. My suggestion is to check the AMD APP SDK VERSION and disable this code for old APP SDK's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd need to do both as there is no direct support of AMD for Apple's OS. We need to deal with Apple's SDK which seems to be an entirely different thing.
I'll investigate checking for AMD APPSDK version for you on one of my Ubuntu boxes.
EDIT:
Does not seem to be necessary to fix for Ubuntu. Seems with regular AMD APPSDK 3.0 and an extra -I in C/CXXFLAGS it just builds and works like a charm. Maybe you accidentally mixed in opencl-headers etc. due to cmake not detecting an SDK if not installed in /usr.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed it in #15 because this is an issue with OpenCl < 2.0 and independent of the APPSDK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, makes sense now. Should've taken the cucumber slices off my eyes first. :)
I'll check whether the OpenCL version check is enough already on Apple too. Since my PR isn't merged yet, I'm assuming fireice-uk agrees performance is too abysmal. Thus I hope yours gets merged first so I can pull that change in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I won't be able to spend too much time on this in the near future so the PR may end up pulling master a few times anyway.
guard unknown defined from OpenCl 2.X
- allow compiling without microhttpd - allow compiling without OpenSSL - use find opencl to provide platform independence - install `config.txt` with command `make install` - copy opencl folder on `make install` to the install directory
…sychocrypt/xmr-stak-amd
I read that you have not time for a rebase. I will put all your changes on top of my after they are merged, after that you could test if it is compiling on MAC. |
Merged your changes and it still builds like a charm. I removed Apple specifics where applicable too. Performs as before. |
@fireice-uk Opened a dev branch to simplify releases and and support to test non stable changes, I will create a new pull request (in your name) to the dev on top of all my changes, because this pull also contains some config changes. thx for testing!! |
{ "index" : 3, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false }, | ||
{ "index" : 4, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false }, | ||
{ "index" : 5, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false }, | ||
{ "index" : 0, "intensity" : 16, "worksize" : 1, "affine_to_cpu" : false } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw With my changes it should now possible to change the worksize 1
to 8
// Description | ||
// dst.s0 = (uint) (((((long)src0.s0) << 32) | (long)src1.s0) >> (src2.s0 & 31)) | ||
// similar operation applied to other components of the vectors. | ||
#define amd_bitalign(src0, src1, src2) (src0 << (32-src2)) | (src1 >> src2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BUG: the parentheses are wrong placed and missing cast to 64 bit: this is solve din #17
I will close this PR all needed changes are now in #17 and the BUGs are fixed. |
So I was trying to build the miner on macOS Sierra with both Xcode 8.1 and a homebrewed GCC 6.3 and failed. This PR includes - what I think are - fixes to get it to build properly.
I have tested on a MBP Mid-2015 with an R9 M370X that has 2GB dedicated VRAM but still can't get the miner to start for it fails to build the CL kernel it seems.
With 8 / 1000 I get:
With 8/ 256 I get:
Then again, maybe this PR helps someone to debug those issues. Let me know if there's anything I can try. Can potentially also give temporary access to a Mac with AMD graphics for debugging.