[Renderer] GPU crash - IOCTL_KGSL_GPU_COMMAND #263

Closed
kanerogers opened this issue Jul 12, 2022 · 6 comments
Labels: bug, rendering, showstopper

Comments

@kanerogers
Collaborator

Background

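Shared memory between the CPU and GPU needs to be handled carefully. With the stress test at 237 models, the Adreno driver starts failing IOCTL_KGSL_GPU_COMMAND calls: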

07-12 18:40:02.018 11598 11675 I VrApi   : FPS=62/72,Prd=40ms,Tear=0,Early=0,Stale=20,VSnc=0,Lat=-1,Fov=0,CPU4/GPU=2/3,1171/490MHz,OC=FF,TA=0/0/0,SP=N/N/N,Mem=1804MHz,Free=560MB,PLS=0,Temp=31.0C/0.0C,TW=1.05ms,App=13.52ms,GD=0.00ms,CPU&GPU=20.85ms,LCnt=1,GPU%=0.96,CPU%=0.07(W0.10),DSF=1.00
07-12 18:40:02.069 11598 11624 I RustStdoutStderr: Adding Damaged Helmet to world
07-12 18:40:02.069 11598 11624 I RustStdoutStderr: [HOTHAM_STRESS_TEST] There are now 237 models
07-12 18:40:02.400 11598 11625 W Adreno-GSL: <gsl_ldd_control:556>: ioctl fd 56 code 0xc040094a (IOCTL_KGSL_GPU_COMMAND) failed: errno 71 Protocol error
07-12 18:40:02.400 11598 11625 W Adreno-GSL: <log_gpu_snapshot:464>: panel.gpuSnapshotPath is not set.not generating user snapshot
07-12 18:40:02.458 11598 11625 W Adreno-GSL: <gsl_ldd_control:556>: ioctl fd 56 code 0xc040094a (IOCTL_KGSL_GPU_COMMAND) failed: errno 35 Resource deadlock would occur
07-12 18:40:02.458 11598 11625 W Adreno-GSL: <log_gpu_snapshot:464>: panel.gpuSnapshotPath is not set.not generating user snapshot
kanerogers self-assigned this Jul 12, 2022
kanerogers added the bug and rendering labels Jul 12, 2022
kanerogers added a commit that referenced this issue Jul 12, 2022
Closes #238.

Known issues:
- Culling begins to cause deadlocks at around 200 objects
- This will be resolved in #263
kanerogers added the showstopper label Jul 12, 2022
kanerogers changed the title from "[Renderer] Create frame independent buffers for all resources" to "[Renderer] GPU crash on deadlocked" Jul 13, 2022
kanerogers changed the title from "[Renderer] GPU crash on deadlocked" to "[Renderer] GPU crash - IOCTL_KGSL_GPU_COMMAND" Jul 13, 2022
@kanerogers
Collaborator Author

Work done so far:

  • Created a separate buffer for all resources that are written by the CPU / read by the GPU each frame.
  • Added an explicit pipeline barrier in the compute stage.
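
A rough sketch of what the first bullet means, in plain Rust. This is not Hotham's actual code: `PerFrame`, `frame_slot`, `FRAMES_IN_FLIGHT` and the use of an ordinary value in place of a mapped Vulkan buffer are all assumptions made for illustration. The point is that the CPU only writes the copy that belongs to the current frame, so it never touches memory the GPU may still be reading for the previous one.

```rust
/// Number of frames that may be in flight at once (assumed value).
const FRAMES_IN_FLIGHT: usize = 2;

/// Map a monotonically increasing frame number to a buffer slot.
pub fn frame_slot(frame_number: u64) -> usize {
    (frame_number % FRAMES_IN_FLIGHT as u64) as usize
}

/// One copy of a CPU-written, GPU-read resource per frame in flight.
/// `T` stands in for a mapped GPU buffer; hypothetical type, not Hotham's API.
pub struct PerFrame<T> {
    copies: [T; FRAMES_IN_FLIGHT],
}

impl<T> PerFrame<T> {
    /// Build one copy per slot; `make` receives the slot index.
    pub fn new(make: impl FnMut(usize) -> T) -> Self {
        Self {
            copies: std::array::from_fn(make),
        }
    }

    /// The copy the CPU is allowed to write during `frame_number`.
    pub fn current_mut(&mut self, frame_number: u64) -> &mut T {
        &mut self.copies[frame_slot(frame_number)]
    }

    /// Read-only access to the copy used by `frame_number`.
    pub fn get(&self, frame_number: u64) -> &T {
        &self.copies[frame_slot(frame_number)]
    }
}
```

In a real renderer the caller would still have to wait on that slot's fence before reusing it; the sketch only covers the indexing.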

@kanerogers
Collaborator Author

I'm still seeing this issue, which is pretty frustrating. My next step is to run with the Arm best practices layer and keep digging through any other optimisation problems.

@kanerogers
Collaborator Author

Closed in #266

@kanerogers
Collaborator Author

As an update, we've reached out to Meta (Oculus) re: this issue and they're looking into it.

kanerogers reopened this Jul 24, 2022
@kanerogers
Collaborator Author

With Oculus' help I was able to uncover the following:

09-26 11:46:04.531     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault ctx 37 ctx_type VK ts 92 status 00E795A7 rb 1f80/1fa1 ib1 00000040000CE000/007d ib2 00000040000D5350/0000
09-26 11:46:04.531     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault rb 2 rb sw r/w 1f80/1fa1
09-26 11:46:04.531     0     0 E platform 3d6a000.qcom,gmu: Suspended GMU
09-26 11:46:04.548     0     0 E kgsl-3d0: ust.the_station[22066]: gpu skipcmd ctx 37 ts 93 policy 80
09-26 11:46:04.641     0     0 F kgsl-3d0: MISC: GPU hang detected
09-26 11:46:04.641     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault ctx 37 ctx_type VK ts 100 status 00E795A7 rb 0526/0526 ib1 00000040000CE000/007d ib2 0000004000015350/0000
09-26 11:46:04.641     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault rb 2 rb sw r/w 0526/05af
09-26 11:46:04.641     0     0 E platform 3d6a000.qcom,gmu: Suspended GMU
09-26 11:46:04.657     0     0 E kgsl-3d0: ust.the_station[22066]: gpu skipcmd ctx 37 ts 101 policy 80
09-26 11:46:04.710     0     0 F kgsl-3d0: MISC: GPU hang detected
09-26 11:46:04.710     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault ctx 37 ctx_type VK ts 105 status 00E795A7 rb 038b/038b ib1 00000040012FD000/007d ib2 00000040000E5350/0000
09-26 11:46:04.710     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault rb 2 rb sw r/w 038b/0414
09-26 11:46:04.710     0     0 E platform 3d6a000.qcom,gmu: Suspended GMU
09-26 11:46:04.727     0     0 E kgsl-3d0: ust.the_station[22066]: gpu skipcmd ctx 37 ts 106 policy 80
09-26 11:46:04.780     0     0 F kgsl-3d0: MISC: GPU hang detected
09-26 11:46:04.780     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault ctx 37 ctx_type VK ts 110 status 00E795A7 rb 038b/038b ib1 00000040000CE000/007d ib2 00000040000D5350/0000
09-26 11:46:04.780     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault rb 2 rb sw r/w 038b/0414
09-26 11:46:04.780     0     0 E platform 3d6a000.qcom,gmu: Suspended GMU
09-26 11:46:04.784     0     0 E kgsl-3d0: ust.the_station[22066]: gpu fault threshold exceeded 3 faults in 3000 msecs
09-26 11:46:04.784     0     0 E kgsl-3d0: ust.the_station[22066]: gpu skipped ctx 37 ts 110

That definitely points us forward. Will take a closer look.
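
For comparing faults across captures, a small parser along these lines can pull the interesting fields out of the `gpu fault ctx ...` lines. It is only a sketch: `GpuFault` and `parse_gpu_fault` are made-up names, and the field layout is inferred from the log lines above rather than from any kgsl documentation. In the capture above, every fault has the same context (37) and the same status (00E795A7).

```rust
/// Fields extracted from a kgsl `gpu fault ctx ...` line.
/// Hypothetical type; layout inferred from the log above.
#[derive(Debug, PartialEq, Eq)]
pub struct GpuFault {
    pub ctx: u32,
    pub ctx_type: String,
    pub ts: u32,
    pub status: u32,
}

/// Parse one log line. Returns `None` for lines that are not
/// `gpu fault ctx ...` records or that are missing a field.
pub fn parse_gpu_fault(line: &str) -> Option<GpuFault> {
    let (_, rest) = line.split_once("gpu fault ctx ")?;
    let mut tokens = rest.split_whitespace();
    let ctx: u32 = tokens.next()?.parse().ok()?;

    let mut ctx_type: Option<String> = None;
    let mut ts: Option<u32> = None;
    let mut status: Option<u32> = None;

    // The remaining fields arrive as `key value` pairs.
    while let (Some(key), Some(value)) = (tokens.next(), tokens.next()) {
        match key {
            "ctx_type" => ctx_type = Some(value.to_string()),
            "ts" => ts = value.parse().ok(),
            // The status word is printed in hexadecimal.
            "status" => status = u32::from_str_radix(value, 16).ok(),
            _ => {} // rb, ib1, ib2: ignored in this sketch
        }
    }

    Some(GpuFault {
        ctx,
        ctx_type: ctx_type?,
        ts: ts?,
        status: status?,
    })
}
```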

@kanerogers
Collaborator Author

kanerogers commented Jul 26, 2022

I am now certain that the cause of this bug is vkCmdDrawIndirect. The solution is to switch to instanced rendering instead: #284
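
To illustrate the general idea of instanced rendering (this is not the change made in #284, which isn't shown here), the CPU can group visible objects by mesh and issue one draw per mesh with an instance count, instead of having the GPU consume a buffer of indirect draw commands. The names `Drawable`, `InstancedDraw` and `batch_by_mesh` below are assumptions for the sketch; `first_instance` and `instance_count` correspond to the `firstInstance` and `instanceCount` parameters of `vkCmdDrawIndexed`.

```rust
use std::collections::BTreeMap;

/// One visible object: the mesh it uses and an index into per-object data.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Drawable {
    pub mesh_id: u32,
    pub transform_index: u32,
}

/// Parameters for one instanced draw of a single mesh.
#[derive(Debug, PartialEq, Eq)]
pub struct InstancedDraw {
    pub mesh_id: u32,
    pub first_instance: u32,
    pub instance_count: u32,
}

/// Group drawables by mesh. Returns the draws (ordered by mesh id) and a
/// flat list of transform indices laid out so that each draw's instances
/// occupy `first_instance .. first_instance + instance_count`.
pub fn batch_by_mesh(drawables: &[Drawable]) -> (Vec<InstancedDraw>, Vec<u32>) {
    let mut by_mesh: BTreeMap<u32, Vec<u32>> = BTreeMap::new();
    for d in drawables {
        by_mesh.entry(d.mesh_id).or_default().push(d.transform_index);
    }

    let mut draws = Vec::with_capacity(by_mesh.len());
    let mut instance_data = Vec::with_capacity(drawables.len());
    for (mesh_id, indices) in by_mesh {
        draws.push(InstancedDraw {
            mesh_id,
            first_instance: instance_data.len() as u32,
            instance_count: indices.len() as u32,
        });
        instance_data.extend(indices);
    }
    (draws, instance_data)
}
```

With the stress test above, where every object is the same Damaged Helmet model, this would collapse to a single draw whose instance count equals the number of visible helmets.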
