Crash under OSX #1677

Open
GreenAsJade opened this issue Aug 1, 2018 · 13 comments

Comments

@GreenAsJade

This crash happens when built with

#define USE_OPENCL

It does not happen when built with

// #define USE_OPENCL

Steps to reproduce:

  • get weights from lizzie: network.gz
  • follow build instructions for Mac
  • ./leelaz -w network.gz
  • play black q4
  • genmove black

Result:

Leela: play black q4

Passes: 0            Black (X) Prisoners: 0
White (O) to move    White (O) Prisoners: 0

   a b c d e f g h j k l m n o p q r s t
19 . . . . . . . . . . . . . . . . . . . 19
18 . . . . . . . . . . . . . . . . . . . 18
17 . . . . . . . . . . . . . . . . . . . 17
16 . . . + . . . . . + . . . . . + . . . 16
15 . . . . . . . . . . . . . . . . . . . 15
14 . . . . . . . . . . . . . . . . . . . 14
13 . . . . . . . . . . . . . . . . . . . 13
12 . . . . . . . . . . . . . . . . . . . 12
11 . . . . . . . . . . . . . . . . . . . 11
10 . . . + . . . . . + . . . . . + . . . 10
 9 . . . . . . . . . . . . . . . . . . .  9
 8 . . . . . . . . . . . . . . . . . . .  8
 7 . . . . . . . . . . . . . . . . . . .  7
 6 . . . . . . . . . . . . . . . . . . .  6
 5 . . . . . . . . . . . . . . . . . . .  5
 4 . . . + . . . . . + . . . . .(X). . .  4
 3 . . . . . . . . . . . . . . . . . . .  3
 2 . . . . . . . . . . . . . . . . . . .  2
 1 . . . . . . . . . . . . . . . . . . .  1
   a b c d e f g h j k l m n o p q r s t

Hash: C231EF71B9CB954B Ko-Hash: 5223DC630503F965

Black time: 01:00:00
White time: 01:00:00

Leela: genmove black
Thinking at most 36.3 seconds...
NN eval=1.000000
Abort trap: 6

@gcp
Member

gcp commented Aug 1, 2018

Almost certainly broken graphics drivers, just like on many Macs. Nothing we can do to fix that.

You skipped all the output from the graphics/OpenCL detection.

@featurecat

Is there a fix for Mac users, or do they have no way to use GPU mode?

@uestccokey

I hit this crash, too.

@gcp
Member

gcp commented Aug 1, 2018

is there a fix for Mac users?

Whether you have problems or not depends entirely on the graphics card in your Mac and the driver versions that you have for it. If your drivers are broken you can hope Apple issues an update (hahaha) or use the CPU.

@Mardak
Collaborator

Mardak commented Aug 1, 2018

For those running into problems, what's the output from the beginning of running ./leelaz ? It should include the OpenCL version / date and devices with driver info, etc. as well as which device ended up being selected.

@GreenAsJade
Author

leela zero says

Initializing OpenCL (autodetect precision).
Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (Oct 31 2017 18:30:00)
Platform profile: FULL_PROFILE
Platform name: Apple
Platform vendor: Apple
Device ID: 0
Device name: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
Device type: CPU
Device vendor: Intel
Device driver: 1.1
Device speed: 2200 MHz
Device cores: 8 CU
Device score: 512
Device ID: 1
Device name: Iris Pro
Device type: GPU
Device vendor: Intel
Device driver: 1.2(Dec 19 2017 21:05:44)
Device speed: 1200 MHz
Device cores: 40 CU
Device score: 612
Selected platform: Apple
Selected device: Iris Pro
with OpenCL 1.2 capability.
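
The logs in this thread all end with the highest-scoring device being selected (the GPU at 612 over the CPU at 512). A minimal C++ sketch of that kind of selection, assuming the rule is simply "highest score wins"; the `Device` struct and `pick_best` are hypothetical names for illustration, not Leela Zero's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record holding a few of the fields printed in the detection log.
struct Device {
    std::string name;
    std::string type;  // "CPU" or "GPU"
    int score;
};

// Return the index of the highest-scoring device (the first one wins ties).
// Assumption: selection is by score alone, as the log output suggests.
std::size_t pick_best(const std::vector<Device>& devices) {
    assert(!devices.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < devices.size(); ++i) {
        if (devices[i].score > devices[best].score) {
            best = i;
        }
    }
    return best;
}
```

Fed the two devices from the log above, this picks index 1, the Iris Pro, which matches the "Selected device" line.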

@Mardak
Collaborator

Mardak commented Aug 4, 2018

So.. I updated to 10.13.6 and now this device causes my machine to either hard lock or Abort trap: 6

Device name:   Intel(R) HD Graphics 530
Device type:   GPU
Device vendor: Intel Inc.
Device driver: 1.2(May  8 2018 15:59:46)
Device speed:  1050 MHz
Device cores:  24 CU
Device score:  612

The output above is from 10.13.5, which didn't crash. After the update, the driver reports Device driver: 1.2(Jun 25 2018 19:28:17), and that version crashes.

@pasky

pasky commented Aug 8, 2018

Can confirm it crashes with Abort trap: 6 for me too (after the first NN eval print), and also hard locked once.

Detected 1 OpenCL platforms.
Platform version: OpenCL 1.2 (May 24 2018 20:07:03)
Platform profile: FULL_PROFILE
Platform name:    Apple
Platform vendor:  Apple
Device ID:     0
Device name:   Intel(R) Core(TM) i5-6360U CPU @ 2.00GHz
Device type:   CPU
Device vendor: Intel
Device driver: 1.1
Device speed:  2000 MHz
Device cores:  4 CU
Device score:  512
Device ID:     1
Device name:   Intel(R) Iris(TM) Graphics 540
Device type:   GPU
Device vendor: Intel Inc.
Device driver: 1.2(Jun 25 2018 19:28:17)
Device speed:  1000 MHz
Device cores:  48 CU
Device score:  612
Selected platform: Apple
Selected device: Intel(R) Iris(TM) Graphics 540
with OpenCL 1.2 capability.
Half precision compute support: NO

Started OpenCL SGEMM tuner.
Will try 290 valid configurations.
(1/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 2.8279 ms (41.7 GFLOPS)
(4/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 2.3406 ms (50.4 GFLOPS)
(5/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 2.1636 ms (54.5 GFLOPS)
(10/290) KWG=32 KWI=2 MDIMA=8 MDIMC=8 MWG=16 NDIMB=8 NDIMC=8 NWG=16 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=2 1.5910 ms (74.1 GFLOPS)
(125/290) KWG=16 KWI=2 MDIMA=8 MDIMC=8 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 1.5321 ms (77.0 GFLOPS)
(136/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 1.2712 ms (92.8 GFLOPS)
(169/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=2 1.2648 ms (93.3 GFLOPS)
(234/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=32 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=2 VWN=4 1.1822 ms (99.8 GFLOPS)
(263/290) KWG=16 KWI=2 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=4 1.1791 ms (100.0 GFLOPS)
(281/290) KWG=16 KWI=8 MDIMA=16 MDIMC=16 MWG=64 NDIMB=8 NDIMC=8 NWG=32 SA=1 SB=1 STRM=0 STRN=0 VWM=4 VWN=4 1.0947 ms (107.8 GFLOPS)
Wavefront/Warp size: 16
Max workgroup size: 256
Max workgroup dimensions: 256 256 256
Measuring performance - 1.37 n/s single vs. 0.37 n/s half - Using OpenCL single precision (less than 5% slower than half)
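
The last log line reads as if half precision is only chosen when it beats single precision by some margin. A small sketch of such a rule, inferred from the message text alone; `prefer_half` and the exact 5% comparison are assumptions, and the real check in Leela Zero may differ:

```cpp
// Assumed rule: switch to half precision only if it is more than 5% faster
// than single precision. Rates are in n/s, as printed in the log.
constexpr bool prefer_half(double single_rate, double half_rate) {
    return half_rate > single_rate * 1.05;
}
```

With the rates from the log (1.37 n/s single, 0.37 n/s half), half precision is far slower on this GPU, so the sketch stays on single precision, as the log reports.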

@pasky

pasky commented Aug 8, 2018

Regarding "integrated intel graphics not making much difference": the on-paper performance of the Iris 540 is 230 GFLOPS in single precision, and Leela can squeeze out 108 GFLOPS. Meanwhile, the CPU peak performance according to whetstone is 16 GFLOPS, so not using the GPU on my MacBook still makes Leela roughly 6x-14x slower. It makes a difference even for integrated GPUs. :)
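
The 6x-14x range is just the two GPU figures divided by the CPU figure. A quick sketch of that arithmetic, using only the numbers quoted in the comment (the constant and function names are made up for illustration):

```cpp
// GFLOPS figures quoted in the comment above.
constexpr double kGpuPeak = 230.0;      // Iris 540, on-paper single precision
constexpr double kGpuMeasured = 108.0;  // what the SGEMM tuner reached
constexpr double kCpuPeak = 16.0;       // CPU peak according to whetstone

// How many times faster the GPU figure is than the CPU figure.
constexpr double speedup(double gpu_gflops, double cpu_gflops) {
    return gpu_gflops / cpu_gflops;
}

// speedup(kGpuMeasured, kCpuPeak) is 6.75  (lower bound, measured)
// speedup(kGpuPeak, kCpuPeak)     is 14.375 (upper bound, on paper)
```

This compares peak numbers only, so it is a rough bound rather than a benchmark of actual playout speed.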

@bood
Collaborator

bood commented Aug 23, 2018

I also hit this issue lately, but only on the next branch; Leela Zero 0.15 does not have it.

It is probably related to the changes from PR #1643: it worked for me before that commit. Possibly the GPU on the Mac has trouble supporting them. How hard would it be to support both F(4x4, 3x3) and F(2x2, 3x3)? Writing new code on my MBP is painful right now. @gcp @Ttl
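
For readers unfamiliar with the notation: F(m x m, r x r) produces an m x m output tile from an r x r filter, working on an input tile of edge m + r - 1. A sketch of how the two variants compare per tile, based on the standard Winograd convolution formulation; the function names are hypothetical:

```cpp
// Edge length of the input tile for F(m x m, r x r).
constexpr int winograd_alpha(int m, int r) { return m + r - 1; }

// Multiplications per output tile (per channel) with direct convolution.
constexpr int direct_mults(int m, int r) { return m * m * r * r; }

// Element-wise multiplications per tile in the Winograd domain.
constexpr int winograd_mults(int m, int r) {
    return winograd_alpha(m, r) * winograd_alpha(m, r);
}

// F(2x2, 3x3): 4x4 input tile, 16 vs 36 multiplications (2.25x fewer).
static_assert(winograd_alpha(2, 3) == 4, "F(2x2, 3x3) uses 4x4 tiles");
// F(4x4, 3x3): 6x6 input tile, 36 vs 144 multiplications (4x fewer).
static_assert(winograd_alpha(4, 3) == 6, "F(4x4, 3x3) uses 6x6 tiles");
```

The larger tile saves more multiplications but needs bigger transform matrices, so supporting both means carrying two sets of transforms in the kernels.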

@bood
Collaborator

bood commented Aug 24, 2018

@Ttl To work around this problem, I tried reverting #1643 while keeping the batch support:
https://github.com/bood/leela-zero/tree/winograd_2x2_revert

Although no error is reported, the NN eval seems to be totally off on the first move:

Leela: genmove b
Thinking at most 36.0 seconds...
NN eval=1.000000
Playouts: 788, Win: 46.77%, PV: D3 E6 C12 pass R4 C14 O15 B15
Playouts: 1263, Win: 66.38%, PV: D3 E6 C12 pass R4 C14 O15 B15

Could you take a look and see what I did wrong here?

@Ttl
Member

Ttl commented Aug 24, 2018

You have transposed some of the matrices. Type the matrices from https://arxiv.org/abs/1509.09308 equation (7) exactly as they are written.

diff --git a/src/kernels/convolve3.opencl b/src/kernels/convolve3.opencl
index 4149632..83f005a 100644
--- a/src/kernels/convolve3.opencl
+++ b/src/kernels/convolve3.opencl
@@ -30,10 +30,10 @@ void __in_transform_eq(real x[WINOGRAD_ALPHA][WINOGRAD_ALPHA], __global net_t *
     real T2[WINOGRAD_ALPHA][WINOGRAD_ALPHA];
 
     const real Bt[WINOGRAD_ALPHA * WINOGRAD_ALPHA] = \
-                       {1.0, 0.0, 0.0, 0.0,
-                         0.0, 1.0, -1.0, 1.0,
-                         -1.0, 1.0, 1.0, 0.0,
-                         0.0, 0.0, 0.0, -1.0};
+                       {1.0, 0.0, -1.0, 0.0,
+                         0.0, 1.0, 1.0, 0.0,
+                         0.0, -1.0, 1.0, 0.0,
+                         0.0, 1.0, 0.0, -1.0};
 
     // Calculates transpose(B).x.B
     for (int i = 0; i < WINOGRAD_ALPHA; i++){
@@ -154,10 +154,8 @@ void __out_transform_eq(__global const net_t * restrict M, real o[WINOGRAD_M * W
     }
 
     const real At[WINOGRAD_M * WINOGRAD_ALPHA] = \
-                    {1.0, 1.0,
-                     1.0, 1.0,
-                     1.0, -1.0,
-                     0.0, -1.0};
+                    {1.0, 1.0, 1.0, 0.0,
+                    0.0, 1.0, -1.0, -1.0};
 
     // Calculates transpose(A).temp_m.A
     for (int i = 0; i < WINOGRAD_M; i++){

CPUPipe.cpp also has to be fixed.

I don't see how changing the matrix sizes would fix the problem. Before changing to F(2x2, 3x3) try commenting out args += " -DWINOGRAD_SIMD"; on OpenCL.cpp:790 and see if that helps.
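
A quick way to catch a transposed matrix is to check the transforms against direct convolution on the host. The sketch below does this for the 1-D building block F(2, 3), using the same B^T and A^T rows as the corrected diff above. The filter transform G is not shown in this thread and is written here from the referenced paper as I recall it, so treat it as an assumption; the function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// F(2, 3) via Winograd: y = A^T [ (G g) * (B^T d) ], '*' being element-wise.
std::array<double, 2> winograd_f23(const std::array<double, 4>& d,
                                   const std::array<double, 3>& g) {
    // B^T d, rows: {1,0,-1,0}, {0,1,1,0}, {0,-1,1,0}, {0,1,0,-1}
    const std::array<double, 4> v = {d[0] - d[2], d[1] + d[2],
                                     d[2] - d[1], d[1] - d[3]};
    // G g, rows: {1,0,0}, {.5,.5,.5}, {.5,-.5,.5}, {0,0,1} (assumed, see above)
    const std::array<double, 4> u = {g[0], (g[0] + g[1] + g[2]) / 2.0,
                                     (g[0] - g[1] + g[2]) / 2.0, g[2]};
    std::array<double, 4> m = {0.0, 0.0, 0.0, 0.0};
    for (std::size_t i = 0; i < m.size(); ++i) {
        m[i] = u[i] * v[i];
    }
    // A^T m, rows: {1,1,1,0}, {0,1,-1,-1}
    const std::array<double, 2> y = {m[0] + m[1] + m[2], m[1] - m[2] - m[3]};
    return y;
}

// Reference: the same two outputs computed directly.
std::array<double, 2> direct_f23(const std::array<double, 4>& d,
                                 const std::array<double, 3>& g) {
    const std::array<double, 2> y = {d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
                                     d[1] * g[0] + d[2] * g[1] + d[3] * g[2]};
    return y;
}
```

If the two functions disagree for some input, one of the matrices is wrong. The 2-D F(2x2, 3x3) case applies the same transforms along both dimensions, so a transposed B^T or A^T would show up as a mismatch in an analogous 2-D check.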

@bood
Collaborator

bood commented Aug 24, 2018

@Ttl Thanks. It works now. FYI, commenting out args += " -DWINOGRAD_SIMD" does not help.
Another symptom: when using F(4x4), m_program.build(args.c_str()); takes significantly more time.
