[rocm-opencl-runtime] Blender demo file eventually crashes while compiling kernels #258

Closed
kode54 opened this issue Jun 15, 2020 · 17 comments
Labels
runtime error wontfix This will not be worked on

Comments

@kode54

kode54 commented Jun 15, 2020

Attempting to use blender 17:2.83-1 to render the Blender 2.81 - The Junk Shop demo from here:

https://www.blender.org/download/demo-files/

Using:

llvm-amdgpu 3.50-2
comgr 3.50-2
rocm-opencl-runtime 3.50-1 (modified to use Release build, but the original did the same thing)

Results in Blender crashing partway through compiling the render kernels, never achieving any render output. This used to work with rocm-opencl-runtime 3.30-1 on the same system and GPU.

@c0d3st0rm

c0d3st0rm commented Jun 15, 2020

I had this happen with Blender, but I'm not sure which demo file (maybe classroom) - did you try any others? And what GPU do you have?

I think it's worth noting that HIP also didn't work, and would segfault when I tried a simple program. In both cases, after the program crashed, it left my GPU (Vega 56) in a strange power state (I think) where it electrically whined, loudly. I haven't touched it since, but something is definitely wrong.

@tpkessler
Member

Hey @kode54. I can reproduce your problem:

Unknown register class
UNREACHABLE executed at /home/torsten/Dokumente/rocm-arch/llvm-amdgpu/src/llvm-project-rocm-3.5.0/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:846!
Unknown register class
UNREACHABLE executed at /home/torsten/Dokumente/rocm-arch/llvm-amdgpu/src/llvm-project-rocm-3.5.0/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:846!
sh: line 1: 20333 Aborted                 (core dumped) "/usr/bin/blender" "--background" "--factory-startup" "--python-expr" "import _cycles; _cycles.opencl_compile(r'0', r'Vega 10 XL/XT [Radeon RX Vega 56/64]', r'AMD Accelerated Parallel Processing', r'-cl-no-signed-zeros -cl-mad-enable -cl-std=CL2.0 -D__KERNEL_OPENCL_AMD__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__KERNEL_EXPERIMENTAL__ -D__NODES_MAX_GROUP__=3 -D__NODES_FEATURES__=1 -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__', r'kernel_shader_eval.cl', r'/home/torsten/.cache/cycles/kernels/cycles_kernel_split_shader_eval_0E8EF254DEC7EFB502A3021719DBE5F8_00EB851A7B3EC9D347F0B0F8092C9C4F.clbin')" > /dev/null
Cycles: compiling OpenCL program split_shader_eval...
sh: line 1: 20550 Aborted                 (core dumped) "/usr/bin/blender" "--background" "--factory-startup" "--python-expr" "import _cycles; _cycles.opencl_compile(r'0', r'Vega 10 XL/XT [Radeon RX Vega 56/64]', r'AMD Accelerated Parallel Processing', r'-cl-no-signed-zeros -cl-mad-enable -cl-std=CL2.0 -D__KERNEL_OPENCL_AMD__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__KERNEL_EXPERIMENTAL__ -D__NODES_MAX_GROUP__=0 -D__NODES_FEATURES__=0 -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__', r'kernel_split_bundle.cl', r'/home/torsten/.cache/cycles/kernels/cycles_kernel_split_bundle_BCC3E5D9BEEFFC232AA9564AB760CF2E_F7A0F4162B5AFB07B0C148FBB8085F3C.clbin')" > /dev/null

I'm using llvm-amdgpu 3.5.0-2, comgr 3.5.0-2 and rocm-opencl-runtime 3.5.0-1 on a Vega 56. Could you also please post your error output here?
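
Editorial aside for readers comparing failures like the two above: each "Aborted" line embeds the full `_cycles.opencl_compile(...)` call, so the failing kernel source and its `-D` defines can be pulled out mechanically. The following is a minimal Python sketch, not part of Blender or ROCm. The function name `parse_failed_compile` is hypothetical, and the meaning assigned to each positional argument is an assumption inferred only from the log lines in this thread.

```python
import re

# Matches each r'...' argument of the _cycles.opencl_compile(...) call in the log line.
RAW_ARG = re.compile(r"r'([^']*)'")


def parse_failed_compile(line):
    """Return device, source file and -D defines for a crashed compile, else None."""
    marker = "_cycles.opencl_compile("
    start = line.find(marker)
    if start == -1:
        return None
    args = RAW_ARG.findall(line[start + len(marker):])
    if len(args) < 6:
        return None
    # Positional meaning is inferred from the log lines in this thread (assumption).
    _index, device, _platform, options, source, _clbin = args[:6]
    # Keep only the preprocessor defines, without the leading "-D".
    defines = [opt[2:] for opt in options.split() if opt.startswith("-D")]
    return {"device": device, "source": source, "defines": defines}
```

Applied to the two abort lines above, this would report `kernel_shader_eval.cl` and `kernel_split_bundle.cl` as the failing sources, and comparing the two `defines` lists shows they differ in `__NODES_MAX_GROUP__` and `__NODES_FEATURES__`.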

@tpkessler
Member

@c0d3st0rm Could you please open a separate issue for HIP? Can you run the HIP-Examples?

git clone https://github.com/ROCm-Developer-Tools/HIP-Examples.git
cd HIP-Examples && git submodule init && git submodule update && ./test_all.sh

@c0d3st0rm

c0d3st0rm commented Jun 15, 2020

@tpkessler It will be a while before I can, I think. The llvm-amdgpu build kept failing because it ran out of memory during linking (6C/12T, 32GB), and some of the other packages had broken build-time dependencies on other ROCm packages (for example, I had to install package X to be able to build Y). I should be able to reproduce this later; however, it might be my AUR manager (pikaur) making the builds fail, I'm not sure.

@tpkessler
Member

You can use the binary packages hosted by arch4edu. From our README:

It is also recommended to use the arch4edu binary repository as it will greatly speed up your installation time.
For directions see Add arch4edu to your Archlinux.

@c0d3st0rm

I'll build it from source now, and see if I can recreate the issue - what information do you need?

@c0d3st0rm

c0d3st0rm commented Jun 15, 2020

So even just running clinfo, my GPU starts whining like crazy and the GPU tach (see Vega reference cards) is stuck at a constant 50%. Something is definitely wrong. The whine starts when I run clinfo and doesn't stop. I'll run the HIP examples soon.

@c0d3st0rm

Here are the logs (I can't upload them as a file for some reason):

HIP-Examples logs:

$ ./test_all.sh

==== vectorAdd ====
rm -f ./vectoradd_hip.exe
rm -f vectoradd_hip.o
rm -f /opt/rocm/hip/src/*.o
/opt/rocm/hip/bin/hipcc -g -c -o vectoradd_hip.o vectoradd_hip.cpp
/opt/rocm/hip/bin/hipcc vectoradd_hip.o -o vectoradd_hip.exe
./vectoradd_hip.exe
System minor 0
System major 9
agent prop name Vega 10 XL/XT [Radeon RX Vega 56/64]
hip Device prop succeeded
PASSED!

==== gpu-burn ====
rm -rf build
mkdir -p build
/opt/rocm/hip/bin/hipcc -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -O3 -c -o build/AmdGpuMonitor.o AmdGpuMonitor.cpp
mkdir -p build
/opt/rocm/hip/bin/hipcc -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -O3 -c -o build/BurnKernel.o BurnKernel.cpp
mkdir -p build
/opt/rocm/hip/bin/hipcc -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -O3 -c -o build/common.o common.cpp
mkdir -p build
/opt/rocm/hip/bin/hipcc -I/opt/rocm/hip/include -I/opt/rocm/hcc/include -O3 -c -o build/gpuburn.o gpuburn.cpp
/opt/rocm/hip/bin/hipcc -lm -lpthread -o build/gpuburn-hip build/AmdGpuMonitor.o build/BurnKernel.o build/common.o build/gpuburn.o
Total no. of GPUs found: 1
Init Burn Thread for device (0)
Temps: [GPU3: 42 C] 5s
Burn Thread using device (0)
Temps: [GPU3: 47 C] 4s
Temps: [GPU3: 50 C] 3s
Temps: [GPU3: 51 C] 2s
Temps: [GPU3: 52 C] 1s
Stopping burn thread on device (0)

==== strided-access ====
rm -f strided-access *.o
/opt/rocm/hip/bin/hipcc -std=c++11 -O3 -o strided-access benchmark-hip.cpp

Using device: Vega 10 XL/XT [Radeon RX Vega 56/64]

stride time GB/sec

0 0.000713 336.606
1 0.003724 64.4468
2 0.007212 33.2779
3 0.003699 64.8824
4 0.003828 62.6959
5 0.004892 49.0597
6 0.00515 46.6019
7 0.005924 40.5132
8 0.00653 36.7534
9 0.007665 31.3112
10 0.008263 29.0451
11 0.009036 26.5604
12 0.009712 24.7117
13 0.010546 22.7574
14 0.011369 21.11
15 0.012308 19.4995
16 0.012879 18.635
17 0.013404 17.9051
18 0.01349 17.791
19 0.013862 17.3135
20 0.014034 17.1013
21 0.014299 16.7844
22 0.014481 16.5734
23 0.01457 16.4722
24 0.014944 16.06
25 0.014633 16.4013
26 0.014353 16.7212
27 0.014226 16.8705
28 0.01392 17.2414
29 0.013902 17.2637
30 0.013706 17.5106
31 0.013445 17.8505
32 0.01246 19.2616

==== rtm8 ====
Using HIP_PATH=/opt/rocm/hip
hipcc -std=c++11 -O3 -o rtm8_hip rtm8.cpp
memory (MB) = 984.096000
pts (billions) = 1.122751
Tflops = 0.075224
dt = 0.517063
pt_rate (millions/sec) = 2171.400645
flop_rate (Gflops) = 145.483843
speedup = 0.307021

==== reduction ====
rm -f reduction .o
/opt/rocm/hip/bin/hipcc -std=c++11 -O3 -o reduction reduction.cpp
./reduction 1024
1024*4
ARRAYSIZE: 1024
Array size: 0.00390625 MB
The average performance of reduction is 0.0275322 GBytes/sec
VERIFICATION: result is CORRECT

./reduction 8388608
ARRAYSIZE: 8388608
Array size: 32 MB
The average performance of reduction is 32.0517 GBytes/sec
VERIFICATION: result is CORRECT

./reduction 16777216
ARRAYSIZE: 16777216
Array size: 64 MB
The average performance of reduction is 48.2587 GBytes/sec
VERIFICATION: result is CORRECT

./reduction 33554432
ARRAYSIZE: 33554432
Array size: 128 MB
The average performance of reduction is 77.2912 GBytes/sec
VERIFICATION: result is CORRECT

./reduction 67108864
ARRAYSIZE: 67108864
Array size: 256 MB
The average performance of reduction is 119.766 GBytes/sec
VERIFICATION: result is CORRECT

./reduction 134217728
ARRAYSIZE: 134217728
Array size: 512 MB
The average performance of reduction is 172.528 GBytes/sec
VERIFICATION: result is CORRECT

./reduction 268435456
ARRAYSIZE: 268435456
Array size: 1024 MB
The average performance of reduction is 236.438 GBytes/sec
VERIFICATION: result is CORRECT

./reduction 536870912
ARRAYSIZE: 536870912
Array size: 2048 MB
The average performance of reduction is 284.13 GBytes/sec
VERIFICATION: result is CORRECT

==== mini-nbody ====
hipcc -I../ -DSHMOO nbody-orig.cpp -o nbody-orig
./nbody-orig 1024
1024, 1.464
./nbody-orig 2048
2048, 3.353
./nbody-orig 4096
4096, 7.153
./nbody-orig 8192
8192, 16.307
./nbody-orig 16384
16384, 31.840
./nbody-orig 32768
32768, 55.158
./nbody-orig 65536
65536, 69.340
./nbody-orig 131072
131072, 71.836
./nbody-orig 262144
262144, 75.026
./nbody-orig 524288
524288, 76.958
hipcc -I../ -DSHMOO nbody-soa.cpp -o nbody-soa
./nbody-soa 1024
1024, 1.599
./nbody-soa 2048
2048, 3.438
./nbody-soa 4096
4096, 7.798
./nbody-soa 8192
8192, 17.671
./nbody-soa 16384
16384, 31.747
./nbody-soa 32768
32768, 55.523
./nbody-soa 65536
65536, 69.510
./nbody-soa 131072
131072, 71.886
hipcc -I../ -DSHMOO nbody-block.cpp -o nbody-block
./nbody-block 1024
1024, 1.677
./nbody-block 2048
2048, 3.762
./nbody-block 4096
4096, 8.317
./nbody-block 8192
8192, 18.614
./nbody-block 16384
16384, 33.261
./nbody-block 32768
32768, 57.309
./nbody-block 65536
65536, 70.118
./nbody-block 131072
131072, 71.924

==== add4 ====
rm -f gpu-stream-hip *.o
/opt/rocm/hip/bin/hipcc -std=c++11 -O3 -c hip-stream.cpp -o hip-stream.o
g++ -std=c++11 -O3 -c -o common.o common.cpp
/opt/rocm/hip/bin/hipcc -std=c++11 -O3 common.o hip-stream.o -lm -o gpu-stream-hip
./gpu-stream-hip
GPU-STREAM
Version: 1.0
Implementation: HIP
GridSize: 26214400 work-items
GroupSize: 1024 work-items
Operations/Work-item: 1
Precision: double

Running kernels 10 times
Array size: 200.0 MB (=0.2 GB) 0 bytes padding
Total size: 1000.0 MB (=1.0 GB)
Using HIP device Vega 10 XL/XT [Radeon RX Vega 56/64] (compute_units=56)
Driver: 313700
d_a=0x7fba4d800000
d_b=0x7fba40e00000
d_c=0x7fba34400000
d_d=0x7fba27a00000
d_e=0x7fba1b000000
Function MBytes/sec Min (sec) Max Average
Copy 330015.115 0.00127 0.00146 0.00130
Mul 330787.073 0.00127 0.00146 0.00129
Add4 314286.956 0.00334 0.00375 0.00339
Triad 311378.523 0.00202 0.00204 0.00203
GEOMEAN 321495.089
./gpu-stream-hip --groups 256 --groupSize 256
GPU-STREAM
Version: 1.0
Implementation: HIP
GridSize: 65536 work-items
GroupSize: 256 work-items
Operations/Work-item: 400
Using looper kernels:
Precision: double

Running kernels 10 times
Array size: 200.0 MB (=0.2 GB) 0 bytes padding
Total size: 1000.0 MB (=1.0 GB)
Using HIP device Vega 10 XL/XT [Radeon RX Vega 56/64] (compute_units=56)
Driver: 313700
d_a=0x7fbeee600000
d_b=0x7fbee1c00000
d_c=0x7fbed5200000
d_d=0x7fbec8800000
d_e=0x7fbebbe00000
Function MBytes/sec Min (sec) Max Average
Copy 328988.242 0.00127 0.00146 0.00130
Mul 329890.265 0.00127 0.00147 0.00130
Add4 307937.908 0.00341 0.00382 0.00347
Triad 309474.060 0.00203 0.00206 0.00204
GEOMEAN 318903.526
./gpu-stream-hip --float
GPU-STREAM
Version: 1.0
Implementation: HIP
Warning: If number of iterations set >= 8, expect rounding errors with single precision
GridSize: 26214400 work-items
GroupSize: 1024 work-items
Operations/Work-item: 1
Precision: float

Running kernels 10 times
Array size: 100.0 MB (=0.1 GB) 0 bytes padding
Total size: 500.0 MB (=0.5 GB)
Using HIP device Vega 10 XL/XT [Radeon RX Vega 56/64] (compute_units=56)
Driver: 313700
d_a=0x7f62ffe00000
d_b=0x7f62f9800000
d_c=0x7f62f3200000
d_d=0x7f62ecc00000
d_e=0x7f62e6600000
Function MBytes/sec Min (sec) Max Average
Copy 298268.829 0.00070 0.00084 0.00075
Mul 297188.885 0.00071 0.00085 0.00075
Add4 310974.637 0.00169 0.00208 0.00185
Triad 304796.454 0.00103 0.00118 0.00107
GEOMEAN 302756.745
./gpu-stream-hip --float --groups 256 --groupSize 256
GPU-STREAM
Version: 1.0
Implementation: HIP
Warning: If number of iterations set >= 8, expect rounding errors with single precision
GridSize: 65536 work-items
GroupSize: 256 work-items
Operations/Work-item: 400
Using looper kernels:
Precision: float

Running kernels 10 times
Array size: 100.0 MB (=0.1 GB) 0 bytes padding
Total size: 500.0 MB (=0.5 GB)
Using HIP device Vega 10 XL/XT [Radeon RX Vega 56/64] (compute_units=56)
Driver: 313700
d_a=0x7f11f7c00000
d_b=0x7f11f1600000
d_c=0x7f11eb000000
d_d=0x7f11e4a00000
d_e=0x7f11de400000
Function MBytes/sec Min (sec) Max Average
Copy 315691.136 0.00066 0.00077 0.00070
Mul 315934.160 0.00066 0.00077 0.00070
Add4 307951.474 0.00170 0.00213 0.00184
Triad 301402.419 0.00104 0.00123 0.00109
GEOMEAN 310185.861

==== cuda-stream ====
rm -f stream *.o
/opt/rocm/hip/bin/hipcc -std=c++11 -O3 -o stream stream.cpp
STREAM Benchmark implementation in HIP
Array size (double precision) = 536.87 MB
using 192 threads per block, 349526 blocks
output in IEC units (KiB = 1024 B)

Function Rate (GiB/s) Avg time(s) Min time(s) Max time(s)

Copy: 316.1695 0.00327369 0.00316286 0.00348091
Scale: 315.7648 0.00329926 0.00316691 0.00374413
Add: 292.8438 0.00524600 0.00512218 0.00543213
Triad: 292.9665 0.00525757 0.00512004 0.00542402

==== Rodinia ====
--CLEAN: backprop
--CLEAN: bfs
--CLEAN: b+tree
--CLEAN: cfd
--CLEAN: dwt2d
--CLEAN: gaussian
--CLEAN: heartwall
--CLEAN: hotspot
--CLEAN: hybridsort
--CLEAN: kmeans
--CLEAN: lavaMD
--CLEAN: lud
--CLEAN: myocyte
--CLEAN: nn
--CLEAN: nw
--CLEAN: pathfinder
--CLEAN: srad
--CLEAN: streamcluster
--TESTING: backprop
BUILD FAILURE!!
--TESTING: bfs
executing: ../../test/bfs/run0.cmd... PASSED!
executing: ../../test/bfs/run1.cmd... PASSED!
--TESTING: b+tree
executing: ../../test/b+tree/run0.cmd... FAILED!
--TESTING: cfd
executing: ../../test/cfd/run0.cmd... PASSED!
executing: ../../test/cfd/run1.cmd... PASSED!
--TESTING: dwt2d
executing: ../../test/dwt2d/run0.cmd... PASSED!
executing: ../../test/dwt2d/run1.cmd... PASSED!
--TESTING: gaussian
executing: ../../test/gaussian/run0.cmd... PASSED!
executing: ../../test/gaussian/run1.cmd... PASSED!
executing: ../../test/gaussian/run2.cmd... PASSED!
executing: ../../test/gaussian/run3.cmd... PASSED!
executing: ../../test/gaussian/run4.cmd... PASSED!
--TESTING: heartwall
executing: ../../test/heartwall/run0.cmd... PASSED!
--TESTING: hotspot
executing: ../../test/hotspot/run0.cmd... PASSED!
--TESTING: hybridsort
executing: ../../test/hybridsort/run0.cmd... PASSED!
--TESTING: kmeans
executing: ../../test/kmeans/run0.cmd... PASSED!
executing: ../../test/kmeans/run1.cmd... PASSED!
executing: ../../test/kmeans/run2.cmd... PASSED!
executing: ../../test/kmeans/run3.cmd... PASSED!
--TESTING: lavaMD
executing: ../../test/lavaMD/run0.cmd... PASSED!
executing: ../../test/lavaMD/run1.cmd... PASSED!
executing: ../../test/lavaMD/run2.cmd... PASSED!
executing: ../../test/lavaMD/run3.cmd... PASSED!
executing: ../../test/lavaMD/run4.cmd... PASSED!
--TESTING: lud
executing: ../../test/lud/run0.cmd... PASSED!
--TESTING: myocyte
BUILD FAILURE!!
--TESTING: nn
executing: ../../test/nn/run0.cmd... PASSED!
--TESTING: nw
executing: ../../test/nw/run0.cmd... PASSED!
--TESTING: pathfinder
executing: ../../test/pathfinder/run0.cmd... PASSED!
--TESTING: srad
executing: ../../test/srad/run0.cmd... PASSED!
--TESTING: streamcluster
executing: ../../test/streamcluster/run0.cmd... PASSED!

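
Editorial aside: in the Rodinia section of the log above, backprop and myocyte fail to build and b+tree fails at run time, while the other benchmarks pass. For longer pastes, that summary can be produced mechanically. The following is a minimal Python sketch; `summarize_rodinia` is a hypothetical helper, and it assumes the `--TESTING:` / `PASSED!` / `FAILED!` / `BUILD FAILURE!!` line shapes seen in this log. It tolerates literal `\033[...m` color codes in the pasted text as well as real escape bytes.

```python
import re

# Color codes either as the literal text "\033[0;35m" or as real ESC bytes.
ANSI = re.compile(r"(?:\\033|\x1b)\[[0-9;]*m")


def summarize_rodinia(log_text):
    """Map each benchmark to 'PASSED', 'FAILED', 'BUILD FAILURE' or 'NO RESULT'."""
    results = {}
    current = None
    for raw in log_text.splitlines():
        line = ANSI.sub("", raw).strip()
        if line.startswith("--TESTING:"):
            current = line.split(":", 1)[1].strip()
            results[current] = "NO RESULT"
        elif current is None:
            continue  # lines before the first benchmark, e.g. --CLEAN entries
        elif "BUILD FAILURE" in line:
            results[current] = "BUILD FAILURE"
        elif line.endswith("FAILED!"):
            results[current] = "FAILED"
        elif line.endswith("PASSED!") and results[current] == "NO RESULT":
            # A pass never overrides an earlier failure of the same benchmark.
            results[current] = "PASSED"
    return results
```

Filtering the returned dictionary for values other than `"PASSED"` gives the short list worth reporting upstream.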

@c0d3st0rm

c0d3st0rm commented Jun 15, 2020

Also, I dumped /sys/class/drm/card0/device/hwmon/hwmonN before+after the whine starts/ends (dump, run clinfo, dump): here are the significant ones (the others were temps etc):

diff before_whine/freq1_input after_whine/freq1_input
1c1
< 26000000
---
> 1115000000
diff before_whine/freq2_input after_whine/freq2_input
1c1
< 167000000
---
> 800000000
diff before_whine/in0_input after_whine/in0_input
1c1
< 750
---
> 931
diff before_whine/power1_average after_whine/power1_average
1c1
< 3000000
---
> 8000000

Looks like the GPU might not be clocking down?
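
Editorial aside: the raw numbers in the diff above are easier to read once scaled. The following is a minimal Python sketch that compares two dump directories such as `before_whine` and `after_whine`. It assumes the standard hwmon sysfs units (`freq*_input` in Hz, `in*_input` in millivolts, `power*_average` in microwatts); the helper names `read_snapshot`, `humanize` and `compare` are hypothetical.

```python
from pathlib import Path

# Divisor and unit per attribute-name prefix (assumes standard hwmon sysfs units).
UNITS = {
    "freq": (1_000_000, "MHz"),   # Hz -> MHz
    "in": (1, "mV"),              # already millivolts
    "power": (1_000_000, "W"),    # microwatts -> W
}


def read_snapshot(directory):
    """Read every regular file in a dump directory into {name: stripped text}."""
    return {p.name: p.read_text().strip()
            for p in Path(directory).iterdir() if p.is_file()}


def humanize(attr, raw):
    """Scale a raw hwmon value to a readable string; unknown attributes pass through."""
    for prefix, (divisor, unit) in UNITS.items():
        if attr.startswith(prefix):
            return f"{int(raw) / divisor:g} {unit}"
    return str(raw)


def compare(before, after):
    """Return 'before -> after' strings for attributes whose value changed."""
    return {attr: f"{humanize(attr, before[attr])} -> {humanize(attr, after[attr])}"
            for attr in before
            if attr in after and before[attr] != after[attr]}
```

With the values from the diff above, `compare(read_snapshot("before_whine"), read_snapshot("after_whine"))` would report `freq1_input` going from 26 MHz to 1115 MHz and `power1_average` from 3 W to 8 W.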

Edit: additional info from /sys/kernel/debug/dri/0/amdgpu_pm_info:

30,31c30,31
<       167 MHz (MCLK)
<       26 MHz (SCLK)
---
>       800 MHz (MCLK)
>       1117 MHz (SCLK)
34,35c34,35
<       800 mV (VDDGFX)
<       3.0 W (average GPU)
---
>       931 mV (VDDGFX)
>       8.0 W (average GPU)
37c37
< GPU Temperature: 43 C
---
> GPU Temperature: 45 C

Also, /sys/class/drm/card0/device/power_dpm_state says the GPU is stuck in performance mode.

I think this warrants a separate issue by this point?

@tpkessler
Member

OK, the HIP examples work. In view of the related issues you found, it seems to be an upstream issue. Do you mind reporting your problem there? Have you tried to set the DPM states as suggested here?

@c0d3st0rm

I have, and it makes no difference. I'll report this upstream - sorry to spam this thread. OpenCL now seems to work too, however I was experiencing the same issue as the OP.

@c0d3st0rm

So regarding the Blender crashes - I wasn't wrong. OpenCL on my Vega with ROCm 3.5.0 still crashes just after compiling (or failing to compile) the render kernels for the classroom demo:

find_node_operation: Failed for (DRIVER, 'cycles.seed')
find_node_operation: Failed for (DRIVER, 'cycles.seed')
Cycles: compiling OpenCL program split_shadow_blocked_dl...
sh: line 1:  2122 Segmentation fault      (core dumped) "/usr/bin/blender" "--background" "--factory-startup" "--python-expr" "import _cycles; _cycles.opencl_compile(r'0', r'Vega 10 XL/XT [Radeon RX Vega 56/64]', r'AMD Accelerated Parallel Processing', r'-cl-no-signed-zeros -cl-mad-enable -cl-std=CL2.0 -D__KERNEL_OPENCL_AMD__ -D__KERNEL_CL_KHR_FP16__ -D__SPLIT_KERNEL__ -D__COMPUTE_DEVICE_GPU__ -D__NODES_MAX_GROUP__=3 -D__NODES_FEATURES__=4 -D__NO_HAIR__ -D__NO_OBJECT_MOTION__ -D__NO_CAMERA_MOTION__ -D__NO_BAKING__ -D__NO_VOLUME__ -D__NO_SUBSURFACE__ -D__NO_BRANCHED_PATH__ -D__NO_PATCH_EVAL__ -D__NO_SHADER_RAYTRACE__', r'kernel_shadow_blocked_dl.cl', r'/home/user/.cache/cycles/kernels/cycles_kernel_split_shadow_blocked_dl_D516B9F3527A3AA2C4BC42514E8BF4DA_469FDE4105D570A0129B8C6976545EF8.clbin')" > /dev/null
Cycles: compiling OpenCL program split_shadow_blocked_dl...
Writing: /tmp/classroom.crash.txt
[1]    2040 segmentation fault (core dumped)  blender
# backtrace from /tmp/classroom.crash.txt:
blender(BLI_system_backtrace+0x34) [0x5606e86baba4]
blender(+0xd03e7d) [0x5606e6403e7d]
/usr/lib/libpthread.so.0(+0x14960) [0x7fa5dec1a960]
/opt/rocm/lib/libamd_comgr.so(+0x5d7170) [0x7fa47557c170]
/opt/rocm/lib/libamd_comgr.so(+0x3a0f4fd) [0x7fa4789b44fd]
/opt/rocm/lib/libamd_comgr.so(+0x37b3c45) [0x7fa478758c45]
/opt/rocm/lib/libamd_comgr.so(+0x463a3c0) [0x7fa4795df3c0]
/opt/rocm/lib/libamd_comgr.so(+0x40bdd71) [0x7fa479062d71]
/opt/rocm/lib/libamd_comgr.so(+0x463be19) [0x7fa4795e0e19]
/opt/rocm/lib/libamd_comgr.so(+0xf8244d) [0x7fa475f2744d]
/opt/rocm/lib/libamd_comgr.so(+0xf1a2e6) [0x7fa475ebf2e6]
/opt/rocm/lib/libamd_comgr.so(+0x1cc1bd9) [0x7fa476c66bd9]
/opt/rocm/lib/libamd_comgr.so(+0x1c7967e) [0x7fa476c1e67e]
/opt/rocm/lib/libamd_comgr.so(+0x6c1895) [0x7fa475666895]
/opt/rocm/lib/libamd_comgr.so(+0x677ce6) [0x7fa47561cce6]
/opt/rocm/lib/libamd_comgr.so(+0x67908c) [0x7fa47561e08c]
/opt/rocm/lib/libamd_comgr.so(+0x679896) [0x7fa47561e896]
/opt/rocm/lib/libamd_comgr.so(+0x6b9319) [0x7fa47565e319]
/opt/rocm/lib/libamd_comgr.so(amd_comgr_do_action+0x3b9) [0x7fa475662329]
/opt/rocm/lib/libamdocl64.so(+0xfd8be) [0x7fa59237b8be]
/opt/rocm/lib/libamdocl64.so(+0x100dad) [0x7fa59237edad]
/opt/rocm/lib/libamdocl64.so(+0x105449) [0x7fa592383449]
/opt/rocm/lib/libamdocl64.so(+0xb40c7) [0x7fa5923320c7]
blender(_ZN3ccl12OpenCLDevice13OpenCLProgram12build_kernelEPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x204) [0x5606e7553834]
blender(_ZN3ccl12OpenCLDevice13OpenCLProgram14compile_kernelEPKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x303) [0x5606e75568d3]
blender(_ZN3ccl12OpenCLDevice13OpenCLProgram7compileEv+0xaaa) [0x5606e755a77a]
blender(_ZN3ccl13TaskScheduler10thread_runEi+0x96) [0x5606e8072b06]
blender(_ZN3ccl6thread3runEPv+0x1f) [0x5606e8074d6f]
/usr/lib/libstdc++.so.6(+0xcfb74) [0x7fa5d4a4fb74]
/usr/lib/libpthread.so.0(+0x9422) [0x7fa5dec0f422]
/usr/lib/libc.so.6(clone+0x43) [0x7fa5d4771bf3]

and then leaves my Vega in the weird power state.
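
Editorial aside: most frames in the backtrace above sit in `libamd_comgr.so`, below `amd_comgr_do_action` and Blender's `OpenCLProgram::build_kernel`. A quick way to see where a crash like this lives is to count frames per module. The following is a minimal Python sketch, assuming the `module(symbol+0xoffset) [0xaddress]` frame format shown in this crash file; `frames_per_module` and `named_symbols` are hypothetical helper names.

```python
import re
from collections import Counter

# One frame: /path/lib.so(symbol+0x10) [0xaddr]  or  blender(+0xd03e7d) [0xaddr]
FRAME = re.compile(
    r"^(?P<module>[^\s(]+)\((?P<symbol>[^+)]*)\+0x[0-9a-fA-F]+\)\s+\[0x[0-9a-fA-F]+\]$"
)


def _frames(backtrace):
    """Yield (module basename, symbol) for every line that looks like a frame."""
    for line in backtrace.splitlines():
        match = FRAME.match(line.strip())
        if match:
            yield match.group("module").rsplit("/", 1)[-1], match.group("symbol")


def frames_per_module(backtrace):
    """Count frames per shared object or executable."""
    return Counter(module for module, _ in _frames(backtrace))


def named_symbols(backtrace):
    """List the frames that carry a symbol name, innermost first."""
    return [symbol for _, symbol in _frames(backtrace) if symbol]
```

The mangled `_ZN3ccl...` names returned by `named_symbols` can be made readable with a demangler such as `c++filt`.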

@acxz changed the title from "[rocm-opencl-runtime 3.50-1] Blender demo file eventually crashes while compiling kernels" to "[rocm-opencl-runtime] Blender demo file eventually crashes while compiling kernels" on Jun 21, 2020
@tpkessler
Member

tpkessler commented Jun 22, 2020

I opened an issue upstream regarding the blender issues. Today I got an answer:

The ROCm stack is primarily focused on ML and HPC applications. Unfortunately Blender does not fall into those categories, hence the ROCm releases might not be validated against it.
I'd suggest using the amdgpu-pro driver instead if you're planning to use Blender. Even though it's the same codebase, that driver release goes through a full workstation certification.

ROCm/ROCm-OpenCL-Runtime#123

@acxz
Member

acxz commented Dec 2, 2020

@kode54 @c0d3st0rm @tpkessler Has anyone been able to reproduce the error with the latest rocm-opencl-runtime?

If not, or if the error still exists, I'll close this issue with an upstream-wontfix label within a week.

@kode54
Author

kode54 commented Dec 2, 2020

With the latest runtime (3.9.0-2) and dependencies, and blender 17:2.91.0-4, it still crashes. However, it now gets past the kernel compilation stage and actually "starts" the rendering phase, at which point it crashes and dumps a truncated core. The backtrace is a single address, which happens to be total nonsense, and nothing more.

@acxz added the wontfix (This will not be worked on) label on Dec 2, 2020
@acxz closed this as completed on Dec 2, 2020
@kode54
Author

kode54 commented Dec 3, 2020

Right, gotcha, I'll keep using opencl-amd version 19.50 until some day in the next 15-20 years when I may be able to afford a new gpu again. Then maybe rocm and amd and blender will have their shit together.
