On my Titan RTX, the FusionPersistentSoftmaxLocalShared_CUDA test fails:
```
C++ exception with description "(dynamic_smem_size) < (available_dynamic_smem_without_reconfiguration + additional_dynamic_smem_available_through_reconfiguration) INTERNAL ASSERT FAILED at "/home/nmaruyama/pytorch/debug3/nvfuser/csrc/executor.cpp":910, please report a bug to PyTorch. The total shared memory allocation is larger than available memory. Dynamic size: 66048. Available size: 49136. Configured smem size: 49152. Device limit size: 65536
```
It seems this issue started happening with PR #148. It is unclear why that PR would affect shared memory usage.
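For context on the numbers: the two limits in the message appear to correspond to the standard CUDA per-block dynamic shared memory attributes (the 48 KiB default and the 64 KiB opt-in cap on a Turing GPU such as the Titan RTX). Here is a minimal standalone sketch, not taken from nvfuser, that prints them; the mapping to "Configured smem size" and "Device limit size" is my reading of the message:

```cpp
// Minimal standalone sketch (not nvfuser code): query the per-block dynamic
// shared memory limits that seem to underlie the assert message.
// Device index 0 is assumed.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int device = 0;
  int default_smem = 0;  // usable without reconfiguration (likely the "Configured smem size")
  int optin_smem = 0;    // cap reachable via opt-in (likely the "Device limit size")
  cudaDeviceGetAttribute(&default_smem, cudaDevAttrMaxSharedMemoryPerBlock, device);
  cudaDeviceGetAttribute(&optin_smem, cudaDevAttrMaxSharedMemoryPerBlockOptin, device);

  // On a Titan RTX (sm_75) these print 49152 and 65536. The requested dynamic
  // allocation of 66048 bytes exceeds even the opt-in limit, so no
  // cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, ...)
  // call could satisfy it, which is presumably why the executor asserts.
  printf("max dynamic smem per block (default): %d bytes\n", default_smem);
  printf("max dynamic smem per block (opt-in) : %d bytes\n", optin_smem);
  return 0;
}
```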