No Nvidia support: errors out with clEnqueueReadBuffer (-5) with both CUDA 7.5 libs and system libs #6

Closed
Motoma opened this issue Oct 29, 2016 · 20 comments


@Motoma

Motoma commented Oct 29, 2016

Ubuntu Linux 4.4.0 x86_64

CUDA Driver = CUDART
CUDA Driver Version = 8.0
CUDA Runtime Version = 7.5
Device0 = GeForce GTX 970
Device1 = GeForce GTX 970

I compiled silentarmy against both the libraries provided by apt-get and those included in the CUDA SDK.

Regardless of the command I run, I receive:

Building program
Hash tables will use 1744.8 MB
Running...
clEnqueueReadBuffer (-5)
@mbevand
Owner

mbevand commented Oct 30, 2016

Thanks for your report. As per cl.h, error -5 is:
#define CL_OUT_OF_RESOURCES -5

Could you report the full output of "clinfo"? It sounds like your cards don't support silentarmy allocating 1744.8 MB of GPU memory.
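
To make a bare "(-5)" easier to read in logs, a host program could translate the code into its cl.h name before printing. The following is only a minimal sketch: the numeric values match the OpenCL cl.h header, but the `MY_`-prefixed constants and the helpers `cl_error_name` and `format_cl_failure` are hypothetical names, not part of silentarmy or OpenCL.

```c
#include <stdio.h>

/* Values as defined in the OpenCL cl.h header. They are redefined here
 * under a hypothetical MY_ prefix so the sketch builds without OpenCL. */
enum {
    MY_CL_SUCCESS = 0,
    MY_CL_DEVICE_NOT_FOUND = -1,
    MY_CL_DEVICE_NOT_AVAILABLE = -2,
    MY_CL_COMPILER_NOT_AVAILABLE = -3,
    MY_CL_MEM_OBJECT_ALLOCATION_FAILURE = -4,
    MY_CL_OUT_OF_RESOURCES = -5,
    MY_CL_OUT_OF_HOST_MEMORY = -6
};

/* Map a status code to its symbolic name (hypothetical helper). */
const char *cl_error_name(int err)
{
    switch (err) {
    case MY_CL_SUCCESS:                       return "CL_SUCCESS";
    case MY_CL_DEVICE_NOT_FOUND:              return "CL_DEVICE_NOT_FOUND";
    case MY_CL_DEVICE_NOT_AVAILABLE:          return "CL_DEVICE_NOT_AVAILABLE";
    case MY_CL_COMPILER_NOT_AVAILABLE:        return "CL_COMPILER_NOT_AVAILABLE";
    case MY_CL_MEM_OBJECT_ALLOCATION_FAILURE: return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case MY_CL_OUT_OF_RESOURCES:              return "CL_OUT_OF_RESOURCES";
    case MY_CL_OUT_OF_HOST_MEMORY:            return "CL_OUT_OF_HOST_MEMORY";
    default:                                  return "unknown OpenCL error";
    }
}

/* Format "call (code): NAME" into buf; returns what snprintf returns. */
int format_cl_failure(char *buf, size_t len, const char *call, int err)
{
    return snprintf(buf, len, "%s (%d): %s", call, err, cl_error_name(err));
}
```

Called with "clEnqueueReadBuffer" and -5, `format_cl_failure` would produce a message that names CL_OUT_OF_RESOURCES next to the numeric code.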

@Motoma
Author

Motoma commented Oct 30, 2016

You may be right:
Max memory allocation 1072873472 (1023MiB)

Here is the output of clinfo: clinfo.txt

@mbevand
Owner

mbevand commented Oct 30, 2016

Could you try editing param.h and defining NR_ROWS_LOG to 16, then recompile and try again? This will reduce memory usage (at the cost of performance) and may work on your cards.
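
To see why fewer rows means less memory, here is a rough sketch of the arithmetic. Everything in it is an assumption rather than something read from param.h: two hash tables, 32-byte slots, a default of 20 for NR_ROWS_LOG, and slot counts per row (26 and 96) chosen only because they reproduce the sizes silentarmy prints in this thread. The function name `hash_tables_bytes` is hypothetical.

```c
#include <stdint.h>

/* Assumed layout, NOT taken from param.h: two tables of 32-byte slots. */
#define ASSUMED_NR_TABLES 2u
#define ASSUMED_SLOT_LEN  32u

/* Total bytes for all tables, given log2(rows) and slots per row. */
uint64_t hash_tables_bytes(unsigned nr_rows_log, unsigned nr_slots)
{
    uint64_t rows = (uint64_t)1 << nr_rows_log;
    return (uint64_t)ASSUMED_NR_TABLES * rows * nr_slots * ASSUMED_SLOT_LEN;
}
```

Under these assumptions, `hash_tables_bytes(20, 26)` is 1744830464 bytes (about 1744.8 MB) and `hash_tables_bytes(16, 96)` is 402653184 bytes (about 402.7 MB). Each row holds more slots when there are fewer rows, but the total still shrinks.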

@Motoma
Author

Motoma commented Oct 30, 2016

Still no luck:

$ ./silentarmy --nonces 100 -v -v
Solving default all-zero 140-byte header
Found 1 OpenCL platform(s)
Building program
Hash tables will use 402.7 MB
Running...

Solving nonce 0000000000000000000000000000000000000000000000000000000000000000
Round 0
Dropped: 0 (coll) 0 (stor)
Round 1
Dropped: 0 (coll) 0 (stor)
Round 2
clEnqueueReadBuffer (-5)

Thanks for the quick response and for looking into this.

@mbevand
Owner

mbevand commented Oct 30, 2016

Weird. I'll put it on my todo list to look into this issue. solardiz had reported the same problem when trying to run silentarmy on his Nvidia cards.

@mbevand
Owner

mbevand commented Oct 31, 2016

Could you try checking out the latest revision (5f68344)? I fixed unaligned memory accesses, and I think that may fix your issue on Nvidia cards.

@GibsT

GibsT commented Oct 31, 2016

I'm using your latest version on Linux with CUDA 8 and a GTX 960, and I changed the rows setting to 16.
The latest build still gets the error:

Solving default all-zero 140-byte header
Found 1 OpenCL platform(s)
Building program
Hash tables will use 402.7 MB
Running...

Solving nonce 0000000000000000000000000000000000000000000000000000000000000000
clEnqueueReadBuffer (-5)

Some info about the card
[0] GeForce GTX 960
CL_DEVICE_TYPE: GPU
CL_DEVICE_GLOBAL_MEM_SIZE: 4236312576
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 1059078144
CL_DEVICE_MAX_WORK_GROUP_SIZE: 1024
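
As a quick sanity check on these numbers, the sketch below compares a request against the per-allocation limit. It assumes, purely for illustration, that the hash tables would be requested as a single buffer, which this thread does not confirm. The helper names are hypothetical.

```c
#include <stdint.h>

/* Convert bytes to whole MiB (truncating). */
uint64_t bytes_to_mib(uint64_t bytes)
{
    return bytes / (1024u * 1024u);
}

/* Would one buffer of `requested` bytes fit under the device limit? */
int fits_single_alloc(uint64_t requested, uint64_t max_alloc)
{
    return requested <= max_alloc;
}

/* Is the per-allocation limit exactly one quarter of global memory? */
int is_quarter_of_global(uint64_t max_alloc, uint64_t global_mem)
{
    return max_alloc * 4 == global_mem;
}
```

With the values reported here, the GTX 960 limit of 1059078144 bytes is exactly a quarter of its 4236312576 bytes of global memory, and a 402.7 MB request fits well under it. The GTX 970 limit of 1072873472 bytes (1023 MiB) would reject a single 1744.8 MB buffer. The fact that the error persists at 402.7 MB suggests that the allocation limit alone does not explain it.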

@Motoma
Author

Motoma commented Oct 31, 2016

I can confirm @GibsT's assessment: the problem still persists on master.

@mbevand
Owner

mbevand commented Oct 31, 2016

Ok guys, thanks for testing. I'll try to get my hands on an Nvidia GPU to fix this bug.

@GibsT

GibsT commented Oct 31, 2016

I can debug for you if you want. If there is anything you need me to do over here, let me know. I'm a C++ programmer, but I don't really have any experience with crypto algorithms or kernels.


@ivanmladek

Same here, I can help with GPU debugging:
[OPENCL]:Found suitable OpenCL device [GeForce GTX 1070] with 8507555840 bytes of GPU memory
[OPENCL]:Using platform: NVIDIA CUDA
[OPENCL]:Using device: GeForce GTX 1070(OpenCL 1.2 CUDA)

@mbevand
Owner

mbevand commented Nov 2, 2016

I am told that on Nvidia, CL_OUT_OF_RESOURCES can be a very generic error (e.g. the kernel is accessing memory outside the bounds of a buffer). So try running with "-v -v -v" to see at which step of Equihash the clEnqueueReadBuffer() call fails (please attach the output to this bug). Try commenting out big chunks of the OpenCL kernel to see if the error disappears, and so on. Maybe Nvidia has a debugger? I don't know; I have only worked with AMD GPUs in the past. In a few days I should find the time to try debugging this.

@GibsT

GibsT commented Nov 4, 2016

[356702.608922] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 0): Misaligned Address
[356702.608928] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 0, TPC 0): Physical Multiple Warp Errors
[356702.608932] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x504648=0x5000f 0x504650=0x4 0x504644=0xd3eff2 0x50464c=0x7f
[356702.608945] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 1): Misaligned Address
[356702.608949] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 0, TPC 1): Physical Multiple Warp Errors
[356702.608952] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x504e48=0x14000f 0x504e50=0x4 0x504e44=0xd3eff2 0x504e4c=0x7f
[356702.608965] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 2): Misaligned Address
[356702.608968] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 0, TPC 2): Physical Multiple Warp Errors
[356702.608971] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x505648=0x7000f 0x505650=0x4 0x505644=0xd3eff2 0x50564c=0x7f
[356702.608984] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 0, TPC 3): Misaligned Address
[356702.608988] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 0, TPC 3): Physical Multiple Warp Errors
[356702.608991] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x505e48=0xa000f 0x505e50=0x4 0x505e44=0xd3eff2 0x505e4c=0x7f
[356702.609005] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 0): Misaligned Address
[356702.609009] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 0): Physical Multiple Warp Errors
[356702.609012] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x50c648=0xf 0x50c650=0x4 0x50c644=0xd3eff2 0x50c64c=0x7f
[356702.609026] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 1): Misaligned Address
[356702.609030] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 1): Physical Multiple Warp Errors
[356702.609033] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x50ce48=0x2000f 0x50ce50=0x4 0x50ce44=0xd3eff2 0x50ce4c=0x7f
[356702.609046] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 2): Misaligned Address
[356702.609049] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 2): Physical Multiple Warp Errors
[356702.609052] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x50d648=0x5000f 0x50d650=0x4 0x50d644=0xd3eff2 0x50d64c=0x7f
[356702.609065] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Warp Exception on (GPC 1, TPC 3): Misaligned Address
[356702.609068] NVRM: Xid (PCI:0000:01:00): 13, Graphics SM Global Exception on (GPC 1, TPC 3): Physical Multiple Warp Errors
[356702.609071] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ESR 0x50de48=0xf 0x50de50=0x4 0x50de44=0xd3eff2 0x50de4c=0x7f
[356702.609082] NVRM: Xid (PCI:0000:01:00): 13, Graphics Exception: ChID 0041, Class 0000b1c0, Offset 00001b0c, Data 00000000

and

Solving default all-zero 140-byte header
Found 1 OpenCL platform(s)
Using GPU device ID 0
Building program
Hash tables will use 1208.0 MB
Running...

Solving nonce 0000000000000000000000000000000000000000000000000000000000000000
Round 0
Dropped: 0 (coll) 0 (stor)
Round 1
Dropped: 0 (coll) 0 (stor)
Round 2
clEnqueueReadBuffer (-5)

@mbevand
Owner

mbevand commented Nov 4, 2016

It's very interesting to me that it fails at Round 2 and not earlier. I should have time to start investigating this bug in the next 2-3 days.

@GibsT

GibsT commented Nov 4, 2016

Not sure if this helps, but I remember reading somewhere that AMD uses 64 threads per wave while Nvidia likes to use 32 per warp. Other figures I recall: 512 bits of bandwidth, 64 KB of shared memory cache, and up to 64 warps per multiprocessor.

@mbevand mbevand changed the title from "clEnqueueReadBuffer (-5) when running Nvidia (both CUDA 7.5 libs and system libs)" to "No Nvidia support: errors out with clEnqueueReadBuffer (-5) with both CUDA 7.5 libs and system libs" on Nov 5, 2016
@mbevand
Owner

mbevand commented Nov 6, 2016

So I have good news. The CL_OUT_OF_RESOURCES error is caused by unaligned memory accesses, which happen at round 2 and above. The fix is relatively straightforward. This means Nvidia will be supported soon.
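
For readers unfamiliar with the term, an access is unaligned when the address is not a multiple of the size of the value being loaded or stored. The host-side C sketch below is only an analogy, not the actual kernel change made in silentarmy. It shows an alignment test and one common way to read a 64-bit value from an arbitrary offset without performing a misaligned 64-bit load. Both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True when p is a multiple of `alignment` bytes (alignment must be > 0). */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Read 8 bytes from any address. memcpy copies byte by byte as far as
 * the language is concerned, so no aligned 64-bit access is required. */
uint64_t load_u64_any_alignment(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Casting an odd offset to a `uint64_t *` and dereferencing it is the pattern to avoid. That kind of access is consistent with the "Misaligned Address" exceptions in the driver log earlier in this thread.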

@dacox

dacox commented Nov 7, 2016

@Motoma how are you using NVIDIA cards? Are you installing any of the AMD stuff from the README?

@mbevand
Owner

mbevand commented Nov 7, 2016

I will update the README with instructions for Nvidia.

@Motoma
Author

Motoma commented Nov 7, 2016

@dacox No, I'm not following the AMD instructions in the readme. I have used both the OpenCL libraries included in the apt repository as well as the ones included in the Nvidia CUDA SDK.

@mbevand
Owner

mbevand commented Nov 8, 2016

Nvidia is now supported in SILENTARMY v4: a03e308

README.md has been updated with instructions on how to install the Nvidia packages on Ubuntu 16.04

@mbevand mbevand closed this as completed Nov 8, 2016