More OSX glue #10

kth5 · 2017-03-21T08:53:38Z

So I was trying to build the miner on macOS Sierra with both Xcode 8.1 and a homebrewed GCC 6.3 and failed. This PR includes - what I think are - fixes to get it to build properly.

I have tested on a MBP Mid-2015 with an R9 M370X that has 2GB dedicated VRAM but still can't get the miner to start for it fails to build the CL kernel it seems.

With 8 / 1000 I get:

Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.

With 8/ 256 I get:

Error CL_BUILD_PROGRAM_FAILURE when calling clBuildProgram.

Then again, maybe this PR helps someone to debug those issues. Let me know if there's anything I can try. Can potentially also give temporary access to a Mac with AMD graphics for debugging.

fireice-uk · 2017-03-21T12:13:05Z

There should be build log error message.

kth5 · 2017-03-21T15:06:32Z

This is unfortunately all I get with verbosity on 4:

My config:

"gpu_thread_num" : 1,
"gpu_threads_conf" : [
{ "index" : 0, "intensity" : 256, "worksize" : 8, "affine_to_cpu" : false }
],
"platform_index" : 0,
"use_tls" : false,
"tls_secure_algo" : true,
"tls_fingerprint" : "",
"pool_address" : "some.stratum.pool:1234",
"wallet_address" : "",
"pool_password" : "x",
"call_timeout" : 10,
"retry_time" : 10,
"giveup_limit" : 0,
"verbose_level" : 4,
"h_print_time" : 60,
"output_file" : "",
"httpd_port" : 0,
"prefer_ipv4" : true,

If there's another quick way to increase verbosity I'm not currently aware but I'm already diving into more detail.

fireice-uk · 2017-03-21T18:42:19Z

Hi, nice to see someone prepared to fix a PR 👍 I pushed an update - there was a 1024 character limit that was in the way of printing a build log. If you rebase you should be able to see what the error is (in all likelihood something caused by the old gfx card, which will be probably unfixable unfortunately)

There are other issues that would need changing before I can just merge it (rather than reworking it)

config.txt needs to stay tracked, otherwise people will need to write their own
I'm fairly sure CL_INVALID_PIPE_SIZE and CL_INVALID_DEVICE_QUEUE will exist in the right opencl version on MacOS too. Is CL_VERSION_2_0 defined in your headers?

kth5 · 2017-03-22T01:32:55Z

Thanks for the log fix, it's now handing out quite a bit more information. :)

log.txt

Regarding the missing definitions, CL_VERSION_2_0 is undefined on macOS 10.12 with Xcode 8.1. It seems only up to 1.2 is supported which may well be our culprit.

kth5 · 2017-03-22T02:05:26Z

So I dug around a bit and couldn't find amd_bitalign or the cl_amd_media_ops2 extension on OSX, which is used in wolf-skein.cl:33 and only part of a more recent version of AMD's APP SDK (3.0 on Linux here has it) which coincidentally support OpenCL 2.0 as well.

Thus, unless someone feels inclined to port the kernels to OSX OpenCL 1.2 - which I think is unreasonable to ask for - then this PR is only of questionable use. Then again, I could be wrong and the fix is easy. Just haven't wrapped my head around it yet.

fireice-uk · 2017-03-22T02:32:10Z

I will be very busy tomorrow, but I should have some time to look into it during the weekend.

kth5 · 2017-03-22T07:06:26Z

That would be amazing indeed! Just let me know if you can use me to get the necessary information. Until then I'll study up on the OpenCL parts and see if I can tweak a fix in myself. :)

fireice-uk · 2017-03-26T18:28:55Z

Just an update, it is possible to tweak it to not use the semi-new amd extensions, however I'm fairly short on time right now. If you want to play around with it - you are more than welcome to.

You will need to implement two functions used from the extensions amd_bitalign and amd_bfe

kth5 · 2017-03-27T01:35:01Z

Already been looking into it. I'm not very proficient with OpenCL so we'll see how far I'll get. Also working with time constraints here. ;)

kth5 · 2017-03-29T03:58:18Z

Using the reference docs I tried to wrap my head around amd_bfe after finding a neat example for amd_bitalign. I haven't been able to verify it works for there are a few remaining errors during OpenCL kernel compilation. Attaching log:

log.txt

kth5 · 2017-03-29T04:05:13Z

Last two commits got the kernel to compile, guess I had the return type wrong. Anyway, now the CL kernel builds but during execution throws the following:

Will investigate this further in a while, it may still be a problem caused by the two emulated functions introduced in this PR.

fireice-uk · 2017-03-29T15:02:23Z

It is more likely to be a problem with the worksize - have you tried setting it to 1? If that doesn't work try setting the second argument on reqd_work_group_size attributes to 1 too.

fireice-uk · 2017-03-29T15:02:35Z

Also - great job!

kth5 · 2017-03-30T03:54:09Z

So with a worksize of 1 and an intensity of 40 I seem to be hitting the sweet spot on this R9 M370X (2GB). 25H/s... well, not worth it but it seems to be hashing away. I'll leave the miner running for a few hours to verify correctness.

EDIT:
R9 M395 4GB gets a 60H/s out of worksize 1 / intensity 100. I'll let both run for a while. Hasn't found any shares yet so can't say still if good.

kth5 · 2017-03-30T04:11:30Z

So, getting a GPU compute error whenever a share is found (?). Other than that it seems to be happily hashing and detecting new blocks. Not very useful yet indeed.

I'm betting it's the implementation of amd_bfe of mine which is amateurish at best. I'll have another shot at it when I find more time.

fireice-uk · 2017-03-30T09:06:39Z

That would cause it. Shares are double checked on CPU, if your algo is wrong then all of them will be incorrect.

fireice-uk · 2017-03-30T09:08:29Z

Another thing - nicehash has special nonce requirements. Double check on a kosher pool.

kth5 · 2017-03-30T14:29:29Z

Alright, here's the sweet taste of success (for me):

60H/s are really disappointing though. On windows with the same R9 M395X (iMac 27", Late 2015) I get roughly 450H/s with Claymore. I'd like to think the emulation is still too inefficient hoping it's not just Apple's stuff being crazy Photoshop and Premiere centrist when it comes to OpenCL.

@fireice-uk Anyway, this basically works on macOS 10.12.4 albeit slow. So this could be merged if you are happy with the way it changes things.

fireice-uk · 2017-03-30T18:59:37Z

If you want to play around some more I don't mind guiding you. This is the line where the performance is breaking:
https://github.com/fireice-uk/xmr-stak-amd/blob/master/opencl/wolf-aes.cl#L74

Try doing it in other way such as an and mask and shift.
For reference this is a CPU version using this technique:

https://github.com/fireice-uk/xmr-stak-cpu/blob/master/crypto/soft_aes.c#L163

`global_worker_size` needs to be a multiple of `local_worker_size`, this is not always the case in the current code. Remove the bit magic with a easy to read `ceil`.

psychocrypt · 2017-04-03T19:53:04Z

amd_gpu/gpu.c

@@ -163,10 +163,12 @@ const char* err_to_str(cl_int ret)
 		return "CL_INVALID_LINKER_OPTIONS";
 	case CL_INVALID_DEVICE_PARTITION_COUNT:
 		return "CL_INVALID_DEVICE_PARTITION_COUNT";
+#ifndef __APPLE__


This guard is not only a fix for APPLE it fixes also ubuntu 14.04. My suggestion is to check the AMD APP SDK VERSION and disable this code for old APP SDK's

We'd need to do both as there is no direct support of AMD for Apple's OS. We need to deal with Apple's SDK which seems to be an entirely different thing.

I'll investigate checking for AMD APPSDK version for you on one of my Ubuntu boxes.

EDIT:
Does not seem to be necessary to fix for Ubuntu. Seems with regular AMD APPSDK 3.0 and an extra -I in C/CXXFLAGS it just builds and works like a charm. Maybe you accidentally mixed in opencl-headers etc. due to cmake not detecting an SDK if not installed in /usr.

I fixed it in #15 because this is an issue with OpenCl < 2.0 and independent of the APPSDK.

Ahh, makes sense now. Should've taken the cucumber slices off my eyes first. :)

I'll check whether the OpenCL version check is enough already on Apple too. Since my PR isn't merged yet, I'm assuming fireice-uk agrees performance is too abysmal. Thus I hope yours gets merged first so I can pull that change in.

Unfortunately I won't be able to spend too much time on this in the near future so the PR may end up pulling master a few times anyway.

guard unknown defined from OpenCl 2.X

- allow compiling without microhttpd - allow compiling without OpenSSL - use find opencl to provide platform independence - install `config.txt` with command `make install` - copy opencl folder on `make install` to the install directory

psychocrypt · 2017-04-07T09:52:30Z

Could you try to build this branch on top of #13, #15 and #16

I fixed a bug in #13 and refactor cmake in #16 thus all should be more platform independent.

…sychocrypt/xmr-stak-amd

…-stak-amd

psychocrypt · 2017-04-07T10:04:21Z

I read that you have not time for a rebase. I will put all your changes on top of my after they are merged, after that you could test if it is compiling on MAC.

kth5 · 2017-04-07T10:08:05Z

Merged your changes and it still builds like a charm. I removed Apple specifics where applicable too. Performs as before.

psychocrypt · 2017-04-07T10:19:33Z

@fireice-uk Opened a dev branch to simplify releases and and support to test non stable changes, I will create a new pull request (in your name) to the dev on top of all my changes, because this pull also contains some config changes.

thx for testing!!

psychocrypt · 2017-04-07T10:20:34Z

config.txt

-	{ "index" : 3, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false },
-	{ "index" : 4, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false },
-	{ "index" : 5, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false },
+	{ "index" : 0, "intensity" : 16, "worksize" : 1, "affine_to_cpu" : false }


btw With my changes it should now possible to change the worksize 1 to 8

psychocrypt · 2017-06-02T21:32:14Z

opencl/cryptonight.cl

+//    Description
+//      dst.s0 =  (uint) (((((long)src0.s0) << 32) | (long)src1.s0) >> (src2.s0 & 31))
+//      similar operation applied to other components of the vectors.
+#define amd_bitalign(src0, src1, src2) (src0  << (32-src2)) | (src1 >> src2)


BUG: the parentheses are wrong placed and missing cast to 64 bit: this is solve din #17

psychocrypt · 2017-06-02T21:37:09Z

I will close this PR all needed changes are now in #17 and the BUGs are fixed.

kth5 added 4 commits March 21, 2017 16:34

add config.txt to ignore list

dd9bd39

fix includes on OSX and exclude two error types on compile time

c16f47a

fix Clang error on osx for uint64 used as int by cast

2a08107

handle osx/Clang specifics in CmakeList

618c1de

remove stray char

09dbddd

kth5 added 2 commits March 22, 2017 09:28

remote config.txt from ignore again

f6912ad

Merge branch 'master' of https://github.com/fireice-uk/xmr-stak-amd

5f2e00c

kth5 added 3 commits March 29, 2017 11:01

emulate amd_bitalign in case extension isn't available

2836da6

add note where the code came from

bafe5a9

attempt to emulate amd_{bitalign,bfe} from cl_amd_media_ops(2)

9dca57e

kth5 added 2 commits March 29, 2017 12:00

no need for an else case

1231872

fix return type of emulated amd_bfe

23720c1

fix up amd_bfe emulation

8543c3b

fix condition

d676a22

`global_worker_size` needs to be a multiple of `local_worker_size`, this is not always the case in the current code. Remove the bit magic with a easy to read `ceil`.

psychocrypt reviewed Apr 3, 2017

View reviewed changes

psychocrypt mentioned this pull request Apr 3, 2017

fix global_worker_size calculation #13

Merged

1 task

add OpenCl 1.2 support

a111e01

guard unknown defined from OpenCl 2.X

psychocrypt mentioned this pull request Apr 7, 2017

add OpenCl 1.2 support #15

Merged

refactor cmake

cd422eb

- allow compiling without microhttpd - allow compiling without OpenSSL - use find opencl to provide platform independence - install `config.txt` with command `make install` - copy opencl folder on `make install` to the install directory

kth5 added 2 commits April 7, 2017 18:01

Merge branch 'fix-kernelStartRestrictionNotFulfilled' of github.com:p…

cd7c258

…sychocrypt/xmr-stak-amd

Merge branch 'topic-openCL_1.2_support' of github.com:psychocrypt/xmr…

812f99f

…-stak-amd

kth5 added 2 commits April 7, 2017 18:06

Merge branch 'topic-cmake' of github.com:psychocrypt/xmr-stak-amd

bc3fafa

Merge branch 'master' of github.com:kth5/xmr-stak-amd

08f31d3

psychocrypt reviewed Apr 7, 2017

View reviewed changes

psychocrypt mentioned this pull request Apr 8, 2017

Mac support #17

Merged

4 tasks

psychocrypt reviewed Jun 2, 2017

View reviewed changes

psychocrypt closed this Jun 2, 2017

lamba84 mentioned this pull request Oct 22, 2017

Error UNKNOWN_ERROR when calling clGetPlatformIDs for number of platforms. #115

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More OSX glue #10

More OSX glue #10

kth5 commented Mar 21, 2017

fireice-uk commented Mar 21, 2017

kth5 commented Mar 21, 2017 •

edited

Loading

fireice-uk commented Mar 21, 2017 •

edited

Loading

kth5 commented Mar 22, 2017 •

edited

Loading

kth5 commented Mar 22, 2017

fireice-uk commented Mar 22, 2017

kth5 commented Mar 22, 2017

fireice-uk commented Mar 26, 2017

kth5 commented Mar 27, 2017

kth5 commented Mar 29, 2017 •

edited

Loading

kth5 commented Mar 29, 2017

fireice-uk commented Mar 29, 2017

fireice-uk commented Mar 29, 2017

kth5 commented Mar 30, 2017 •

edited

Loading

kth5 commented Mar 30, 2017

fireice-uk commented Mar 30, 2017

fireice-uk commented Mar 30, 2017

kth5 commented Mar 30, 2017 •

edited

Loading

fireice-uk commented Mar 30, 2017 •

edited

Loading

psychocrypt Apr 3, 2017

kth5 Apr 5, 2017 •

edited

Loading

psychocrypt Apr 7, 2017

kth5 Apr 7, 2017

kth5 Apr 7, 2017

psychocrypt commented Apr 7, 2017

psychocrypt commented Apr 7, 2017

kth5 commented Apr 7, 2017

psychocrypt commented Apr 7, 2017

psychocrypt Apr 7, 2017 •

edited

Loading

psychocrypt Jun 2, 2017

psychocrypt commented Jun 2, 2017

More OSX glue #10

More OSX glue #10

Conversation

kth5 commented Mar 21, 2017

fireice-uk commented Mar 21, 2017

kth5 commented Mar 21, 2017 • edited Loading

fireice-uk commented Mar 21, 2017 • edited Loading

kth5 commented Mar 22, 2017 • edited Loading

kth5 commented Mar 22, 2017

fireice-uk commented Mar 22, 2017

kth5 commented Mar 22, 2017

fireice-uk commented Mar 26, 2017

kth5 commented Mar 27, 2017

kth5 commented Mar 29, 2017 • edited Loading

kth5 commented Mar 29, 2017

fireice-uk commented Mar 29, 2017

fireice-uk commented Mar 29, 2017

kth5 commented Mar 30, 2017 • edited Loading

kth5 commented Mar 30, 2017

fireice-uk commented Mar 30, 2017

fireice-uk commented Mar 30, 2017

kth5 commented Mar 30, 2017 • edited Loading

fireice-uk commented Mar 30, 2017 • edited Loading

psychocrypt Apr 3, 2017

Choose a reason for hiding this comment

kth5 Apr 5, 2017 • edited Loading

Choose a reason for hiding this comment

psychocrypt Apr 7, 2017

Choose a reason for hiding this comment

kth5 Apr 7, 2017

Choose a reason for hiding this comment

kth5 Apr 7, 2017

Choose a reason for hiding this comment

psychocrypt commented Apr 7, 2017

psychocrypt commented Apr 7, 2017

kth5 commented Apr 7, 2017

psychocrypt commented Apr 7, 2017

psychocrypt Apr 7, 2017 • edited Loading

Choose a reason for hiding this comment

psychocrypt Jun 2, 2017

Choose a reason for hiding this comment

psychocrypt commented Jun 2, 2017

kth5 commented Mar 21, 2017 •

edited

Loading

fireice-uk commented Mar 21, 2017 •

edited

Loading

kth5 commented Mar 22, 2017 •

edited

Loading

kth5 commented Mar 29, 2017 •

edited

Loading

kth5 commented Mar 30, 2017 •

edited

Loading

kth5 commented Mar 30, 2017 •

edited

Loading

fireice-uk commented Mar 30, 2017 •

edited

Loading

kth5 Apr 5, 2017 •

edited

Loading

psychocrypt Apr 7, 2017 •

edited

Loading