Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More OSX glue #10

Closed
wants to merge 20 commits into from
Closed

More OSX glue #10

wants to merge 20 commits into from

Conversation

kth5
Copy link

@kth5 kth5 commented Mar 21, 2017

So I was trying to build the miner on macOS Sierra with both Xcode 8.1 and a homebrewed GCC 6.3 and failed. This PR includes - what I think are - fixes to get it to build properly.

I have tested on a MBP Mid-2015 with an R9 M370X that has 2GB dedicated VRAM but still can't get the miner to start for it fails to build the CL kernel it seems.

With 8 / 1000 I get:

Error CL_INVALID_BUFFER_SIZE when calling clCreateBuffer to create hash scratchpads buffer.

With 8/ 256 I get:

Error CL_BUILD_PROGRAM_FAILURE when calling clBuildProgram.

Then again, maybe this PR helps someone to debug those issues. Let me know if there's anything I can try. Can potentially also give temporary access to a Mac with AMD graphics for debugging.

@fireice-uk
Copy link
Owner

There should be build log error message.

@kth5
Copy link
Author

kth5 commented Mar 21, 2017

This is unfortunately all I get with verbosity on 4:
image

My config:

"gpu_thread_num" : 1,
"gpu_threads_conf" : [
{ "index" : 0, "intensity" : 256, "worksize" : 8, "affine_to_cpu" : false }
],
"platform_index" : 0,
"use_tls" : false,
"tls_secure_algo" : true,
"tls_fingerprint" : "",
"pool_address" : "some.stratum.pool:1234",
"wallet_address" : "",
"pool_password" : "x",
"call_timeout" : 10,
"retry_time" : 10,
"giveup_limit" : 0,
"verbose_level" : 4,
"h_print_time" : 60,
"output_file" : "",
"httpd_port" : 0,
"prefer_ipv4" : true,

If there's another quick way to increase verbosity I'm not currently aware but I'm already diving into more detail.

@fireice-uk
Copy link
Owner

fireice-uk commented Mar 21, 2017

Hi, nice to see someone prepared to fix a PR 👍 I pushed an update - there was a 1024 character limit that was in the way of printing a build log. If you rebase you should be able to see what the error is (in all likelihood something caused by the old gfx card, which will be probably unfixable unfortunately)

There are other issues that would need changing before I can just merge it (rather than reworking it)

  • config.txt needs to stay tracked, otherwise people will need to write their own
  • I'm fairly sure CL_INVALID_PIPE_SIZE and CL_INVALID_DEVICE_QUEUE will exist in the right opencl version on MacOS too. Is CL_VERSION_2_0 defined in your headers?

@kth5
Copy link
Author

kth5 commented Mar 22, 2017

Thanks for the log fix, it's now handing out quite a bit more information. :)

log.txt

Regarding the missing definitions, CL_VERSION_2_0 is undefined on macOS 10.12 with Xcode 8.1. It seems only up to 1.2 is supported which may well be our culprit.

@kth5
Copy link
Author

kth5 commented Mar 22, 2017

So I dug around a bit and couldn't find amd_bitalign or the cl_amd_media_ops2 extension on OSX, which is used in wolf-skein.cl:33 and only part of a more recent version of AMD's APP SDK (3.0 on Linux here has it) which coincidentally support OpenCL 2.0 as well.

Thus, unless someone feels inclined to port the kernels to OSX OpenCL 1.2 - which I think is unreasonable to ask for - then this PR is only of questionable use. Then again, I could be wrong and the fix is easy. Just haven't wrapped my head around it yet.

@fireice-uk
Copy link
Owner

I will be very busy tomorrow, but I should have some time to look into it during the weekend.

@kth5
Copy link
Author

kth5 commented Mar 22, 2017

That would be amazing indeed! Just let me know if you can use me to get the necessary information. Until then I'll study up on the OpenCL parts and see if I can tweak a fix in myself. :)

@fireice-uk
Copy link
Owner

Just an update, it is possible to tweak it to not use the semi-new amd extensions, however I'm fairly short on time right now. If you want to play around with it - you are more than welcome to.

You will need to implement two functions used from the extensions amd_bitalign and amd_bfe

@kth5
Copy link
Author

kth5 commented Mar 27, 2017

Already been looking into it. I'm not very proficient with OpenCL so we'll see how far I'll get. Also working with time constraints here. ;)

@kth5
Copy link
Author

kth5 commented Mar 29, 2017

Using the reference docs I tried to wrap my head around amd_bfe after finding a neat example for amd_bitalign. I haven't been able to verify it works for there are a few remaining errors during OpenCL kernel compilation. Attaching log:

log.txt

@kth5
Copy link
Author

kth5 commented Mar 29, 2017

Last two commits got the kernel to compile, guess I had the return type wrong. Anyway, now the CL kernel builds but during execution throws the following:

image

Will investigate this further in a while, it may still be a problem caused by the two emulated functions introduced in this PR.

@fireice-uk
Copy link
Owner

It is more likely to be a problem with the worksize - have you tried setting it to 1? If that doesn't work try setting the second argument on reqd_work_group_size attributes to 1 too.

@fireice-uk
Copy link
Owner

Also - great job!

@kth5
Copy link
Author

kth5 commented Mar 30, 2017

So with a worksize of 1 and an intensity of 40 I seem to be hitting the sweet spot on this R9 M370X (2GB). 25H/s... well, not worth it but it seems to be hashing away. I'll leave the miner running for a few hours to verify correctness.

EDIT:
R9 M395 4GB gets a 60H/s out of worksize 1 / intensity 100. I'll let both run for a while. Hasn't found any shares yet so can't say still if good.

m395-4gb

@kth5
Copy link
Author

kth5 commented Mar 30, 2017

So, getting a GPU compute error whenever a share is found (?). Other than that it seems to be happily hashing and detecting new blocks. Not very useful yet indeed.

I'm betting it's the implementation of amd_bfe of mine which is amateurish at best. I'll have another shot at it when I find more time.

@fireice-uk
Copy link
Owner

That would cause it. Shares are double checked on CPU, if your algo is wrong then all of them will be incorrect.

@fireice-uk
Copy link
Owner

Another thing - nicehash has special nonce requirements. Double check on a kosher pool.

@kth5
Copy link
Author

kth5 commented Mar 30, 2017

Alright, here's the sweet taste of success (for me):

image

60H/s are really disappointing though. On windows with the same R9 M395X (iMac 27", Late 2015) I get roughly 450H/s with Claymore. I'd like to think the emulation is still too inefficient hoping it's not just Apple's stuff being crazy Photoshop and Premiere centrist when it comes to OpenCL.

@fireice-uk Anyway, this basically works on macOS 10.12.4 albeit slow. So this could be merged if you are happy with the way it changes things.

@fireice-uk
Copy link
Owner

fireice-uk commented Mar 30, 2017

If you want to play around some more I don't mind guiding you. This is the line where the performance is breaking:
https://github.com/fireice-uk/xmr-stak-amd/blob/master/opencl/wolf-aes.cl#L74

Try doing it in other way such as an and mask and shift.
For reference this is a CPU version using this technique:

https://github.com/fireice-uk/xmr-stak-cpu/blob/master/crypto/soft_aes.c#L163

`global_worker_size` needs to be a multiple of `local_worker_size`, this is not always the case in the current code.
Remove the bit magic with a easy to read `ceil`.
amd_gpu/gpu.c Outdated
@@ -163,10 +163,12 @@ const char* err_to_str(cl_int ret)
return "CL_INVALID_LINKER_OPTIONS";
case CL_INVALID_DEVICE_PARTITION_COUNT:
return "CL_INVALID_DEVICE_PARTITION_COUNT";
#ifndef __APPLE__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guard is not only a fix for APPLE it fixes also ubuntu 14.04. My suggestion is to check the AMD APP SDK VERSION and disable this code for old APP SDK's

Copy link
Author

@kth5 kth5 Apr 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to do both as there is no direct support of AMD for Apple's OS. We need to deal with Apple's SDK which seems to be an entirely different thing.

I'll investigate checking for AMD APPSDK version for you on one of my Ubuntu boxes.

EDIT:
Does not seem to be necessary to fix for Ubuntu. Seems with regular AMD APPSDK 3.0 and an extra -I in C/CXXFLAGS it just builds and works like a charm. Maybe you accidentally mixed in opencl-headers etc. due to cmake not detecting an SDK if not installed in /usr.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed it in #15 because this is an issue with OpenCl < 2.0 and independent of the APPSDK.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, makes sense now. Should've taken the cucumber slices off my eyes first. :)

I'll check whether the OpenCL version check is enough already on Apple too. Since my PR isn't merged yet, I'm assuming fireice-uk agrees performance is too abysmal. Thus I hope yours gets merged first so I can pull that change in.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I won't be able to spend too much time on this in the near future so the PR may end up pulling master a few times anyway.

guard unknown defined from OpenCl 2.X
@psychocrypt psychocrypt mentioned this pull request Apr 7, 2017
- allow compiling without microhttpd
- allow compiling without OpenSSL
- use find opencl to provide platform independence
- install `config.txt` with command `make install`
- copy opencl folder on `make install` to the install directory
@psychocrypt
Copy link
Collaborator

Could you try to build this branch on top of #13, #15 and #16

I fixed a bug in #13 and refactor cmake in #16 thus all should be more platform independent.

@psychocrypt
Copy link
Collaborator

I read that you have not time for a rebase. I will put all your changes on top of my after they are merged, after that you could test if it is compiling on MAC.

@kth5
Copy link
Author

kth5 commented Apr 7, 2017

Merged your changes and it still builds like a charm. I removed Apple specifics where applicable too. Performs as before.

@psychocrypt
Copy link
Collaborator

@fireice-uk Opened a dev branch to simplify releases and and support to test non stable changes, I will create a new pull request (in your name) to the dev on top of all my changes, because this pull also contains some config changes.

thx for testing!!

{ "index" : 3, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false },
{ "index" : 4, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false },
{ "index" : 5, "intensity" : 1000, "worksize" : 8, "affine_to_cpu" : false },
{ "index" : 0, "intensity" : 16, "worksize" : 1, "affine_to_cpu" : false }
Copy link
Collaborator

@psychocrypt psychocrypt Apr 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw With my changes it should now possible to change the worksize 1 to 8

@psychocrypt psychocrypt mentioned this pull request Apr 8, 2017
4 tasks
// Description
// dst.s0 = (uint) (((((long)src0.s0) << 32) | (long)src1.s0) >> (src2.s0 & 31))
// similar operation applied to other components of the vectors.
#define amd_bitalign(src0, src1, src2) (src0 << (32-src2)) | (src1 >> src2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUG: the parentheses are wrong placed and missing cast to 64 bit: this is solve din #17

@psychocrypt
Copy link
Collaborator

I will close this PR all needed changes are now in #17 and the BUGs are fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants