CUDA Runtime error #41

Closed
jczh98 opened this issue Sep 21, 2020 · 11 comments

Comments

@jczh98

jczh98 commented Sep 21, 2020

Environment: Windows 10, 2x RTX 2080 Ti, CUDA 11.0, OptiX 7.1
Test scenes: every scene in pbrt-v4-scenes
Error message:

20200921.130209 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:1054 ] FATAL CUDA error: unspecified launch failure
0x00007FF6D0BA4160 - pbrt::PrintStackTrace + line 120
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6D0BA4520 - pbrt::CheckCallbackScope::Fail + line 148
(D:\work\pbrt-v4\src\pbrt\util\log.cpp   )      0x00007FF6D07555D0 - pbrt::LogFatal + line 177
(D:\work\pbrt-v4\src\pbrt\util\log.h     )      0x00007FF6D07348B0 - pbrt::LogFatal<char const *> + line 112
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF6D0C99080 - pbrt::GPUAccel::getParamBuffer + line 1056
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF6D0C962C0 - pbrt::GPUAccel::IntersectShadowTr + line 1153
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6D0816D80 - pbrt::GPUPathIntegrator::TraceShadowRays + line 247
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6D0815E50 - pbrt::GPUPathIntegrator::Render + line 437
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6D0813DA0 - pbrt::GPURender + line 620
(D:\work\pbrt-v4\src\pbrt\cmd\pbrt.cpp   )      0x00007FF6D0719F40 - main + line 239
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6D1089C90 - invoke_main + line 79
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6D1089A40 - __scrt_common_main_seh + line 288
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6D1089A20 - __scrt_common_main + line 331
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp) 0x00007FF6D1089D50 - mainCRTStartup + line 17
(unknown                                 )      0x00007FF8173E7BC0 - BaseThreadInitThunk
(unknown                                 )      0x00007FF81818CEB0 - RtlUserThreadStart
@mmp
Owner

mmp commented Sep 22, 2020

Hm, that's strange... It's not immediately clear to me what the problem might be. Would you mind trying a few things?

  • If you could try a debug build, that might give a bit more information about what's going wrong.
  • Then if you could also run with --log-level verbose and attach the output, that would also help make clear where exactly it's crashing. (The problem is likely not in that IntersectShadowTr method, due to the CPU and GPU executing asynchronously...)
  • I can't imagine why it would matter, but I'd be interested to hear what happens if you run one time with --gpu-device 0 and another time with --gpu-device 1.

Thanks!
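
The point about asynchronous execution can be illustrated with a small CPU-only model. This is a sketch, not CUDA code: FakeStream, launch, synchronize, Report, renderOneSample, and the kernel names are all hypothetical, and the failing kernel is chosen arbitrarily. It only shows why a failure can be reported from whichever host call happens to synchronize first, rather than from the launch that caused it.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// CPU-only model of an asynchronous GPU stream (hypothetical, not the CUDA API).
class FakeStream {
  public:
    // Queue a named "kernel". Returns immediately without running it,
    // so a failure cannot be reported at the launch site.
    void launch(std::string name, std::function<bool()> kernel) {
        pending_.emplace(std::move(name), std::move(kernel));
    }

    // Run all queued work in order. Returns the name of the first kernel
    // that failed, or an empty string if everything succeeded.
    std::string synchronize() {
        std::string failed;
        while (!pending_.empty()) {
            std::pair<std::string, std::function<bool()>> job = std::move(pending_.front());
            pending_.pop();
            // After the first failure, later kernels are dropped unexecuted.
            if (failed.empty() && !job.second())
                failed = job.first;
        }
        return failed;
    }

  private:
    std::queue<std::pair<std::string, std::function<bool()>>> pending_;
};

struct Report {
    std::string observedAt;  // host step that noticed the error
    std::string culprit;     // kernel that actually failed
};

Report renderOneSample() {
    FakeStream stream;
    stream.launch("EarlierKernel", [] { return false; });  // the real failure
    stream.launch("ShadowRays", [] { return true; });
    // The host first waits on the stream here, inside the shadow-ray step,
    // so this is where the error is observed and where a stack trace would point.
    std::string culprit = stream.synchronize();
    return Report{culprit.empty() ? "" : "ShadowRays step", culprit};
}
```

Calling renderOneSample() yields a report whose observedAt and culprit fields differ, which mirrors why the IntersectShadowTr frame in the trace is probably not where the fault lies.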

@jczh98
Author

jczh98 commented Sep 23, 2020

Thank you for your prompt reply. I compiled pbrt-v4 in Debug mode at commit 9624b3e3a95ecafa5229e32c1b37ce90d85a1384; the test scene is head.

  • Output with --log-level verbose:
[ 4468.000 20200923.100710 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 1
[ 4216.000 20200923.100710 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 2
[ 2700.000 20200923.100710 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 3
[ 960.000 20200923.100710 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 4
[ 12420.000 20200923.100710 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 5
[ 2256.000 20200923.100710 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 6
[ 11884.000 20200923.100710 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 7
[ 592.000 20200923.100710 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:34 ] VERBOSE GPU CUDA driver 11.1, CUDA runtime 11.0
[ 592.000 20200923.100710 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:53 ] VERBOSE CUDA device 0 (GeForce RTX 2080 Ti) with 11264 MiB, 68 SMs running at 1620 MHz with shader model 7.5
[ 592.000 20200923.100710 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:53 ] VERBOSE CUDA device 1 (GeForce RTX 2080 Ti) with 11264 MiB, 68 SMs running at 1620 MHz with shader model 7.5
[ 592.000 20200923.100710 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:66 ] VERBOSE Selecting GPU device 0
[ 592.000 20200923.100710 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:82 ] VERBOSE Reset stack size to 8192
[ 592.000 20200923.100711 D:\work\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for D:\work\pbrt-v4-scenes-master\head\head.pbrt
[ 592.000 20200923.100711 D:\work\pbrt-v4\src\pbrt/film.h:189 ] VERBOSE Created film with full resolution [ 1920, 1080 ], pixelBounds [ [ 576, 162 ] - [ 1536, 756 ] ]
[ 592.000 20200923.100711 D:/work/pbrt-v4/src/pbrt/cameras.cpp:222 ] VERBOSE Camera min pos differentials: [ 0, 0, -0 ], [ 0, 0, -0 ]
[ 592.000 20200923.100711 D:/work/pbrt-v4/src/pbrt/cameras.cpp:224 ] VERBOSE Camera min dir differentials: [ -0.0003902614, -0.000063970685, 0 ], [ -0.000021576883, 0.00042261186, -5.9604645e-8 ]
[ 592.000 20200923.100713 D:\work\pbrt-v4\src\pbrt\util\image.cpp:1137 ] VERBOSE Read EXR image D:\work\pbrt-v4-scenes-master\head\textures\doge2_equiarea.exr (4096 x 4096)
[ 592.000 20200923.100736 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:568 ] VERBOSE Optix successfully initialized
[ 592.000 20200923.100736 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:858 ] VERBOSE Loading 0,1 textures in parallel, 0,0 serially
[ 592.000 20200923.100748 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:888 ] VERBOSE Loading serial textures
[ 592.000 20200923.100748 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:915 ] VERBOSE Done creating textures
[ 592.000 20200923.100748 D:/work/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:204 ] VERBOSE Will render in 1 passes 594 scanlines per pass

[ 592.000 20200923.100749 D:/work/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:359 ] VERBOSE Starting to submit work for sample 0
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset ray queue]: block size 1024
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate Camera rays]: block size 512
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Update camera ray stats]: block size 1024
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate ray samples - HaltonSampler]: block size 1024
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset queues before tracing rays]: block size 1024
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle escaped rays]: block size 512
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle emitters hit by indirect rays]: block size 512
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [DiffuseMaterial + BxDF Eval (Basic tex)]: block size 512
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [SubsurfaceMaterial + BxDF Eval (Basic tex)]: block size 512
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Incorporate shadow ray contribution]: block size 1024
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset shadowRayQueue]: block size 1024
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle medium transitions]: block size 1024
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Get BSSRDF and enqueue probe ray]: block size 512
[ 592.000 20200923.100749 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle out-scattering after SSS]: block si
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6B3EE4160 - pbrt::PrintStackTrace + line 120
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6B3EE4520 - pbrt::CheckCallbackScope::Fail + line 148
(D:\work\pbrt-v4\src\pbrt\util\log.cpp   )      0x00007FF6B3A955D0 - pbrt::LogFatal + line 177
(D:\work\pbrt-v4\src\pbrt\util\log.h     )      0x00007FF6B3A748B0 - pbrt::LogFatal<char const *> + line 112
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF6B3FD9080 - pbrt::GPUAccel::getParamBuffer + line 1056
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF6B3FD60A0 - pbrt::GPUAccel::IntersectShadow + line 1117
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B56D80 - pbrt::GPUPathIntegrator::TraceShadowRays + line 252
(D:\work\pbrt-v4\src\pbrt\gpu\subsurface.cpp)   0x00007FF6B3FAD7C0 - pbrt::GPUPathIntegrator::SampleSubsurface + line 197
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B55E50 - pbrt::GPUPathIntegrator::Render + line 441
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B53DA0 - pbrt::GPURender + line 620
(D:\work\pbrt-v4\src\pbrt\cmd\pbrt.cpp   )      0x00007FF6B3A59F40 - main + line 239
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9C90 - invoke_main + line 79
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9A40 - __scrt_common_main_seh + line 288
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9A20 - __scrt_common_main + line 331
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp) 0x00007FF6B43C9D50 - mainCRTStartup + line 17
(unknown                                 )      0x00007FFB67CF7BC0 - BaseThreadInitThunk
(unknown                                 )      0x00007FFB6896CE30 - RtlUserThreadStart
  • Output with --gpu-device 0 and --log-level verbose:
[ 3064.000 20200923.100818 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:34 ] VERBOSE GPU CUDA driver 11.1, CUDA runtime 11.0
[ 5536.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 3
[ 13092.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 5
[ 13004.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 1
[ 6096.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 4
[ 7960.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 2
[ 1376.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 6
[ 10040.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 7
[ 3064.000 20200923.100818 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:53 ] VERBOSE CUDA device 0 (GeForce RTX 2080 Ti) with 11264 MiB, 68 SMs running at 1620 MHz with shader model 7.5
[ 3064.000 20200923.100818 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:66 ] VERBOSE Selecting GPU device 0
[ 3064.000 20200923.100818 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:82 ] VERBOSE Reset stack size to 8192
[ 3064.000 20200923.100818 D:\work\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for D:\work\pbrt-v4-scenes-master\head\head.pbrt
[ 3064.000 20200923.100818 D:\work\pbrt-v4\src\pbrt/film.h:189 ] VERBOSE Created film with full resolution [ 1920, 1080 ], pixelBounds [ [ 576, 162 ] - [ 1536, 756 ] ]
[ 3064.000 20200923.100819 D:/work/pbrt-v4/src/pbrt/cameras.cpp:222 ] VERBOSE Camera min pos differentials: [ 0, 0, -0 ], [ 0, 0, -0 ]
[ 3064.000 20200923.100819 D:/work/pbrt-v4/src/pbrt/cameras.cpp:224 ] VERBOSE Camera min dir differentials: [ -0.0003902614, -0.000063970685, 0 ], [ -0.000021576883, 0.00042261186, -5.9604645e-8 ]
[ 3064.000 20200923.100821 D:\work\pbrt-v4\src\pbrt\util\image.cpp:1137 ] VERBOSE Read EXR image D:\work\pbrt-v4-scenes-master\head\textures\doge2_equiarea.exr (4096 x 4096)
[ 3064.000 20200923.100843 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:568 ] VERBOSE Optix successfully initialized
[ 3064.000 20200923.100843 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:858 ] VERBOSE Loading 0,1 textures in parallel, 0,0 serially
[ 3064.000 20200923.100855 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:888 ] VERBOSE Loading serial textures
[ 3064.000 20200923.100855 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:915 ] VERBOSE Done creating textures
[ 3064.000 20200923.100855 D:/work/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:204 ] VERBOSE Will render in 1 passes 594 scanlines per pass

[ 3064.000 20200923.100855 D:/work/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:359 ] VERBOSE Starting to submit work for sample 0
[ 3064.000 20200923.100855 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset ray queue]: block size 1024
[ 3064.000 20200923.100855 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate Camera rays]: block size 512
[ 3064.000 20200923.100855 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Update camera ray stats]: block size 1024
[ 3064.000 20200923.100855 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate ray samples - HaltonSampler]: block size 1024
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset queues before tracing rays]: block size 1024
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle escaped rays]: block size 512
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle emitters hit by indirect rays]: block size 512
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [DiffuseMaterial + BxDF Eval (Basic tex)]: block size 512
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [SubsurfaceMaterial + BxDF Eval (Basic tex)]: block size 512
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Incorporate shadow ray contribution]: block size 1024
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset shadowRayQueue]: block size 1024
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle medium transitions]: block size 1024
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Get BSSRDF and enqueue probe ray]: block size 512
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle out-scattering after SSS]: block size 512
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Update Film]: block size 640
[ 3064.000 20200923.100856 D:\work\pbrt-v4\src\pbrt/util/progressreporter.h:57 ] FATAL Check failed: cudaEventRecord(gpuEvents[gpuEventsLaunchedOffset]) == cudaSuccess with cudaEventRecord(gpuEvents[gpuEventsLaunchedOffset]) = 719, cudaSuccess = 0
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6B3EE4160 - pbrt::PrintStackTrace + line 120
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6B3EE4520 - pbrt::CheckCallbackScope::Fail + line 148
(D:\work\pbrt-v4\src\pbrt\util\log.cpp   )      0x00007FF6B3A955D0 - pbrt::LogFatal + line 177
(D:\work\pbrt-v4\src\pbrt\util\log.h     )      0x00007FF6B3B65660 - pbrt::LogFatal<char const (&)[52],char const (&)[12
(D:\work\pbrt-v4\src\pbrt\util\progressreporter.h)      0x00007FF6B3B82B80 - pbrt::ProgressReporter::Update + line 56
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B55E50 - pbrt::GPUPathIntegrator::Render + line 459
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B53DA0 - pbrt::GPURender + line 620
(D:\work\pbrt-v4\src\pbrt\cmd\pbrt.cpp   )      0x00007FF6B3A59F40 - main + line 239
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9C90 - invoke_main + line 79
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9A40 - __scrt_common_main_seh + line 288
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9A20 - __scrt_common_main + line 331
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp) 0x00007FF6B43C9D50 - mainCRTStartup + line 17
(unknown                                 )      0x00007FFB67CF7BC0 - BaseThreadInitThunk
(unknown                                 )      0x00007FFB6896CE30 - RtlUserThreadStart
  • Output with --gpu-device 1 and --log-level verbose:
[ 10088.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 1
[ 8616.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 2
[ 12836.000 20200923.100943 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:34 ] VERBOSE GPU CUDA driver 11.1, CUDA runtime 11.0
[ 1556.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 4
[ 2164.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 3
[ 11188.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 5
[ 10816.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 6
[ 1668.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 7
[ 12836.000 20200923.100943 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:53 ] VERBOSE CUDA device 0 (GeForce RTX 2080 Ti) with 11264 MiB, 68 SMs running at 1620 MHz with shader model 7.5
[ 12836.000 20200923.100943 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:66 ] VERBOSE Selecting GPU device 0
[ 12836.000 20200923.100943 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:82 ] VERBOSE Reset stack size to 8192
[ 12836.000 20200923.100943 D:\work\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for D:\work\pbrt-v4-scenes-master\head\head.pbrt
[ 12836.000 20200923.100943 D:\work\pbrt-v4\src\pbrt/film.h:189 ] VERBOSE Created film with full resolution [ 1920, 1080 ], pixelBounds [ [ 576, 162 ] - [ 1536, 756 ] ]
[ 12836.000 20200923.100943 D:/work/pbrt-v4/src/pbrt/cameras.cpp:222 ] VERBOSE Camera min pos differentials: [ 0, 0, -0 ], [ 0, 0, -0 ]
[ 12836.000 20200923.100943 D:/work/pbrt-v4/src/pbrt/cameras.cpp:224 ] VERBOSE Camera min dir differentials: [ -0.0003902614, -0.000063970685, 0 ], [ -0.000021576883, 0.00042261186, -5.9604645e-8 ]
[ 12836.000 20200923.100946 D:\work\pbrt-v4\src\pbrt\util\image.cpp:1137 ] VERBOSE Read EXR image D:\work\pbrt-v4-scenes-master\head\textures\doge2_equiarea.exr (4096 x 4096)
[ 12836.000 20200923.101009 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:568 ] VERBOSE Optix successfully initialized
[ 12836.000 20200923.101009 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:858 ] VERBOSE Loading 0,1 textures in parallel, 0,0 serially
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:888 ] VERBOSE Loading serial textures
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt\parsedscene.cpp:915 ] VERBOSE Done creating textures
[ 12836.000 20200923.101021 D:/work/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:204 ] VERBOSE Will render in 1 passes 594 scanlines per pass

[ 12836.000 20200923.101021 D:/work/pbrt-v4/src/pbrt/gpu/pathintegrator.cpp:359 ] VERBOSE Starting to submit work for sample 0
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset ray queue]: block size 1024
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate Camera rays]: block size 512
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Update camera ray stats]: block size 1024
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Generate ray samples - HaltonSampler]: block size 1024
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset queues before tracing rays]: block size 1024
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle escaped rays]: block size 512
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle emitters hit by indirect rays]: block size 512
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [DiffuseMaterial + BxDF Eval (Basic tex)]: block size 512
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [SubsurfaceMaterial + BxDF Eval (Basic tex)]: block size 512
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Incorporate shadow ray contribution]: block size 1024
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Reset shadowRayQueue]: block size 1024
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle medium transitions]: block size 1024
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Get BSSRDF and enqueue probe ray]: block size 512
[ 12836.000 20200923.101021 D:\work\pbrt-v4\src\pbrt/gpu/launch.h:45 ] VERBOSE [Handle out-scattering after SSS]: block
[ 12836.000 20200923.101021 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:1054 ] FATAL CUDA error: unspecified launch failure
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6B3EE4160 - pbrt::PrintStackTrace + line 120
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6B3EE4520 - pbrt::CheckCallbackScope::Fail + line 148
(D:\work\pbrt-v4\src\pbrt\util\log.cpp   )      0x00007FF6B3A955D0 - pbrt::LogFatal + line 177
(D:\work\pbrt-v4\src\pbrt\util\log.h     )      0x00007FF6B3A748B0 - pbrt::LogFatal<char const *> + line 112
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF6B3FD9080 - pbrt::GPUAccel::getParamBuffer + line 1056
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF6B3FD60A0 - pbrt::GPUAccel::IntersectShadow + line 1117
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B56D80 - pbrt::GPUPathIntegrator::TraceShadowRays + line 252
(D:\work\pbrt-v4\src\pbrt\gpu\subsurface.cpp)   0x00007FF6B3FAD7C0 - pbrt::GPUPathIntegrator::SampleSubsurface + line 197
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B55E50 - pbrt::GPUPathIntegrator::Render + line 441
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF6B3B53DA0 - pbrt::GPURender + line 620
(D:\work\pbrt-v4\src\pbrt\cmd\pbrt.cpp   )      0x00007FF6B3A59F40 - main + line 239
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9C90 - invoke_main + line 79
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9A40 - __scrt_common_main_seh + line 288
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF6B43C9A20 - __scrt_common_main + line 331
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp) 0x00007FF6B43C9D50 - mainCRTStartup + line 17
(unknown                                 )      0x00007FFB67CF7BC0 - BaseThreadInitThunk
(unknown                                 )      0x00007FFB6896CE30 - RtlUserThreadStart

Thanks for your help
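
A note on reading the three logs: the --gpu-device 0 run fails a check on cudaEventRecord with code 719, while the other two runs print "unspecified launch failure". In CUDA 10.1 and later, 719 is the numeric value of cudaErrorLaunchFailure, whose message is that same string, so all three runs appear to hit the same error at different host call sites. The lookup below is a hand-written excerpt for illustration only; DescribeCudaError is a hypothetical helper, and real code would call cudaGetErrorString from the CUDA runtime instead.

```cpp
#include <cassert>
#include <string>

// Hand-written excerpt of CUDA runtime error codes (values as of CUDA 10.1+).
// Hypothetical helper for reading logs; not part of CUDA or pbrt.
std::string DescribeCudaError(int code) {
    switch (code) {
    case 0:
        return "cudaSuccess: no error";
    case 700:
        return "cudaErrorIllegalAddress: an illegal memory access was encountered";
    case 719:
        return "cudaErrorLaunchFailure: unspecified launch failure";
    default:
        return "unknown (not in this excerpt)";
    }
}
```

For example, DescribeCudaError(719) returns the cudaErrorLaunchFailure entry, matching the FATAL message printed at accel.cpp:1054 in the other runs.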

@mmp
Owner

mmp commented Sep 23, 2020

It looks like either you're not actually running a debug build or (more likely) the build rules on Windows aren't setting the preprocessor definitions correctly. Either way, NDEBUG should not be #defined in debug builds, so some additional logging and synchronization around kernel launches should be happening, and it isn't showing up in your logs.

In any case, could you try adding #undef NDEBUG at line 25 of src/pbrt/gpu/launch.h and recompiling and send a log from that? Sorry for the trouble and thanks for your help chasing this down...
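
How NDEBUG typically gates this kind of extra logging and synchronization can be sketched as follows. LaunchKernel, g_debugSyncs, and kDebugChecksEnabled are hypothetical names for illustration and are not taken from pbrt's launch.h; the "kernel" is an ordinary function call standing in for a GPU launch.

```cpp
// Uncommenting the next line forces the debug path in this file even when
// the build system passes -DNDEBUG (the workaround suggested above).
// #undef NDEBUG

#include <cassert>
#include <functional>
#include <iostream>
#include <string>

// Counts how often the debug-only path ran, so its effect is observable.
static int g_debugSyncs = 0;

// True only when this translation unit is compiled without NDEBUG.
#ifdef NDEBUG
constexpr bool kDebugChecksEnabled = false;
#else
constexpr bool kDebugChecksEnabled = true;
#endif

// Hypothetical stand-in for a kernel-launch wrapper (not pbrt's actual code).
void LaunchKernel(const std::string &description,
                  const std::function<void()> &kernel) {
#ifndef NDEBUG
    std::clog << "Launching " << description << '\n';
#endif
    kernel();  // stands in for the asynchronous GPU launch
#ifndef NDEBUG
    // A debug build would wait for the kernel here and check for errors,
    // so a failure is reported next to the launch that caused it.
    ++g_debugSyncs;
    std::clog << "Finished " << description << '\n';
#endif
}
```

If the logs contain no per-kernel "Launching"/"Finished" style messages and the counter never advances, the file was compiled with NDEBUG defined, whatever the build configuration is called.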

@jczh98
Author

jczh98 commented Sep 24, 2020

  • Output with --log-level verbose:
[ 17696.000 20200924.200920 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:34 ] VERBOSE GPU CUDA driver 11.1, CUDA runtime 11.0
[ 10528.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 2
[ 8788.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 1
[ 16920.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 6
[ 16796.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 4
[ 1996.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 3
[ 16828.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 5
[ 13224.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 7
[ 17696.000 20200924.200920 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:53 ] VERBOSE CUDA device 0 (GeForce RTX 2080 Ti) with 11264 MiB, 68 SMs running at 1620 MHz with shader model 7.5
[ 17696.000 20200924.200920 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:53 ] VERBOSE CUDA device 1 (GeForce RTX 2080 Ti) with 11264 MiB, 68 SMs running at 1620 MHz with shader model 7.5
[ 17696.000 20200924.200920 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:66 ] VERBOSE Selecting GPU device 0
[ 17696.000 20200924.200920 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:82 ] VERBOSE Reset stack size to 8192
[ 17696.000 20200924.200920 D:\work\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for D:\work\pbrt-v4-scenes-master\head\head.pbrt
[ 17696.000 20200924.200920 D:\work\pbrt-v4\src\pbrt/film.h:189 ] VERBOSE Created film with full resolution [ 1920, 1080 ], pixelBounds [ [ 576, 162 ] - [ 1536, 756 ] ]
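[ 17696.000 20200924.200920 D:/work/pbrt-v4/src/pbrt/cameras.cpp:222 ] VERBOSE Camera min pos differentials: [ 0, 0, -0 ], [ 0, 0, -0 ]
[ 17696.000 20200924.200920 D:/work/pbrt-v4/src/pbrt/cameras.cpp:224 ] VERBOSE Camera min dir differentials: [ -0.0003902614, -0.000063970685, 0 ], [ -0.000021576883, 0.00042261186, -5.9604645e-8 ]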
[ 17696.000 20200924.200922 D:\work\pbrt-v4\src\pbrt\util\image.cpp:1137 ] VERBOSE Read EXR image D:\work\pbrt-v4-scenes-master\head\textures\doge2_equiarea.exr (4096 x 4096)
[ 17696.000 20200924.200943 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:575 ] VERBOSE Optix successfully initialized
[ 17696.000 20200924.200943 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:610 ] FATAL OptiX call optixModuleCreateFromPTX(optixContext, &moduleCompileOptions, &pipelineCompileOptions, ptxCode.c_str(), ptxCode.size(), log, &logSize, &optixModule) failed with code 7001: "Invalid value"
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF692361220 - pbrt::PrintStackTrace + line 120
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6923615E0 - pbrt::CheckCallbackScope::Fail + line 148
(D:\work\pbrt-v4\src\pbrt\util\log.cpp   )      0x00007FF691F14790 - pbrt::LogFatal + line 177
(D:\work\pbrt-v4\src\pbrt\util\log.h     )      0x00007FF6924610F0 - pbrt::LogFatal<int,char const *> + line 112
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF69244F3C0 - pbrt::GPUAccel::GPUAccel + line 608
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF691FD3C40 - pbrt::GPUPathIntegrator::GPUPathIntegrator + line 159
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF691FD3690 - pbrt::GPURender + line 570
(D:\work\pbrt-v4\src\pbrt\cmd\pbrt.cpp   )      0x00007FF691ED9B40 - main + line 237
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF692842F10 - invoke_main + line 79
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF692842CC0 - __scrt_common_main_seh + line 288
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF692842CA0 - __scrt_common_main + line 331
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp) 0x00007FF692842FD0 - mainCRTStartup + line 17
(unknown                                 )      0x00007FFB67CF7BC0 - BaseThreadInitThunk
(unknown                                 )      0x00007FFB6896CE30 - RtlUserThreadStart
  • Output with --gpu-device 0 or --gpu-device 1, plus --log-level verbose:
[ 5668.000 20200924.201632 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:34 ] VERBOSE GPU CUDA driver 11.1, CUDA runtime 11.0
[ 17944.000 20200924.201632 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 2
[ 16844.000 20200924.201632 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 1
[ 16720.000 20200924.201632 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 4
[ 7120.000 20200924.201632 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 5
[ 18080.000 20200924.201632 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 3
[ 12908.000 20200924.201632 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 6
[ 15892.000 20200924.201632 D:\work\pbrt-v4\src\pbrt\util\parallel.cpp:138 ] VERBOSE Started execution in worker thread 7
[ 5668.000 20200924.201632 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:53 ] VERBOSE CUDA device 0 (GeForce RTX 2080 Ti) with 11264 MiB, 68 SMs running at 1620 MHz with shader model 7.5
[ 5668.000 20200924.201632 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:66 ] VERBOSE Selecting GPU device 0
[ 5668.000 20200924.201632 D:/work/pbrt-v4/src/pbrt/gpu/init.cpp:82 ] VERBOSE Reset stack size to 8192
[ 5668.000 20200924.201633 D:\work\pbrt-v4\src\pbrt\parser.cpp:121 ] VERBOSE Creating Tokenizer for D:\work\pbrt-v4-scenes-master\head\head.pbrt
[ 5668.000 20200924.201633 D:\work\pbrt-v4\src\pbrt/film.h:189 ] VERBOSE Created film with full resolution [ 1920, 1080 ], pixelBounds [ [ 576, 162 ] - [ 1536, 756 ] ]
[ 5668.000 20200924.201633 D:/work/pbrt-v4/src/pbrt/cameras.cpp:222 ] VERBOSE Camera min pos differentials: [ 0, 0, -0 ], [ 0, 0, -0 ]
[ 5668.000 20200924.201633 D:/work/pbrt-v4/src/pbrt/cameras.cpp:224 ] VERBOSE Camera min dir differentials: [ -0.0003902614, -0.000063970685, 0 ], [ -0.000021576883, 0.00042261186, -5.9604645e-8 ]
[ 5668.000 20200924.201635 D:\work\pbrt-v4\src\pbrt\util\image.cpp:1137 ] VERBOSE Read EXR image D:\work\pbrt-v4-scenes-master\head\textures\doge2_equiarea.exr (4096 x 4096)
[ 5668.000 20200924.201659 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:575 ] VERBOSE Optix successfully initialized
[ 5668.000 20200924.201659 D:/work/pbrt-v4/src/pbrt/gpu/accel.cpp:610 ] FATAL OptiX call optixModuleCreateFromPTX(optixContext, &moduleCompileOptions, &pipelineCompileOptions, ptxCode.c_str(), ptxCode.size(), log, &logSize, &optixModule) failed with code 7001: "Invalid value"
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF692361220 - pbrt::PrintStackTrace + line 120
(D:\work\pbrt-v4\src\pbrt\util\check.cpp )      0x00007FF6923615E0 - pbrt::CheckCallbackScope::Fail + line 148
(D:\work\pbrt-v4\src\pbrt\util\log.cpp   )      0x00007FF691F14790 - pbrt::LogFatal + line 177
(D:\work\pbrt-v4\src\pbrt\util\log.h     )      0x00007FF6924610F0 - pbrt::LogFatal<int,char const *> + line 112
(D:\work\pbrt-v4\src\pbrt\gpu\accel.cpp  )      0x00007FF69244F3C0 - pbrt::GPUAccel::GPUAccel + line 608
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF691FD3C40 - pbrt::GPUPathIntegrator::GPUPathIntegrator + line 159
(D:\work\pbrt-v4\src\pbrt\gpu\pathintegrator.cpp)       0x00007FF691FD3690 - pbrt::GPURender + line 570
(D:\work\pbrt-v4\src\pbrt\cmd\pbrt.cpp   )      0x00007FF691ED9B40 - main + line 237
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF692842F10 - invoke_main + line 79
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF692842CC0 - __scrt_common_main_seh + line 288
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl)       0x00007FF692842CA0 - __scrt_common_main + line 331
(D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_main.cpp) 0x00007FF692842FD0 - mainCRTStartup + line 17
(unknown                                 )      0x00007FFB67CF7BC0 - BaseThreadInitThunk
(unknown                                 )      0x00007FFB6896CE30 - RtlUserThreadStart

I've tried to track down the actual bug but haven't been able to find it. I'd really appreciate your help.

@jiangwei007

@neverfelly Yes, I'm running on Windows and get the same error as you with GPU rendering. An earlier commit ran correctly.

mmp added a commit that referenced this issue Jan 22, 2021
@mmp
Owner

mmp commented Jan 22, 2021

Unfortunately I don't have the bandwidth to debug Windows GPU support at the moment and am unlikely to be able to for a few months. I have therefore updated the build so that building with GPU support on Windows is prohibited for now. Needless to say, PRs to fix this would be highly welcome. :-)

@pierremoreau
Contributor

Some additional information from running PBRT through the CUDA debugger on Windows:

  • The crash happens while running the intersect-shadow OptiX program for the very first hit at the very first pixel, so possibly something is not configured properly for the limited unified memory support found on Windows (I'm guessing).
  • Inside the OptiX program, the crash appears to occur in SOA<SampleSpectrum>::operator[]() when performing the Load() inside the for-loop; sadly, I cannot see the contents of any of the variables there, nor the call stack that led to this call.

@mmp
Owner

mmp commented Feb 11, 2021

Interesting... There haven't been many changes to the OptiX code between the last time that it worked on Windows and now, but one of the few relates to shadow rays and updating the SampledSpectrum for pixels when they are unoccluded: 82ace32.

However, I can't see anything suspicious in there...

@pierremoreau
Contributor

Thanks for the pointer! I will try reverting that commit, or commenting out the reads and writes to params.pixelSampleState, to see if that helps.

@pierremoreau
Contributor

Reverting that commit allows me to render killeroo-gold.pbrt without any issues (apart from #108) in Debug mode on Windows. When trying in Release mode instead, I still get a launch error (could be in a different place though, I have not checked yet).

mmp added a commit that referenced this issue Apr 17, 2021
This fixes the debug build on Windows on the GPU. (Release crashes with OptiX complaining about malformed PTX.)

The issue is essentially the same as why *this is copied in GPU lambdas
rather than being passed as a pointer; we are accessing the GPUPathIntegrator
in read-only fashion from the CPU during rendering and with unified
memory on Windows, it isn't allowed to concurrently access it on the GPU.

This also fits with the data point that 82ace32 is when things first started crashing.

Issues #41, #48, #72, #89, and #96.
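
The capture difference described in the commit message above can be illustrated in plain C++, without CUDA. This is only a sketch under assumptions: `Integrator`, `MakeKernelByPointer`, and `MakeKernelByCopy` are hypothetical names rather than pbrt's code, and `std::function` stands in for a GPU kernel launch.

```cpp
#include <functional>

// Hypothetical stand-in for an object in unified (managed) memory that the
// CPU keeps reading while GPU work is in flight. Not pbrt's actual class.
struct Integrator {
    int maxDepth = 5;
    float scale = 2.0f;

    // Captures the `this` pointer: every call dereferences the original
    // object, so the "kernel" touches the same memory the CPU is using.
    std::function<float(float)> MakeKernelByPointer() const {
        return [this](float x) { return scale * x; };
    }

    // Captures a copy of the object made when the lambda is created: the
    // "kernel" only reads its private copy and never the original.
    std::function<float(float)> MakeKernelByCopy() const {
        return [self = *this](float x) { return self.scale * x; };
    }
};
```

If `scale` is changed on the original object after both callables are created, only the pointer-capturing one sees the new value; the copy keeps what it saw at creation time, which is the property that avoids concurrent access to the original object.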
@mmp
Owner

mmp commented Apr 17, 2021

While Windows still has issues, at least this particular one is fixed now!

@mmp mmp closed this as completed Apr 17, 2021
Dolkar pushed a commit to Dolkar/pbrt-v4-myod-integration that referenced this issue May 8, 2023