FusionPersistentSoftmaxLocalShared failure #163

naoyam · 2023-04-12T16:25:23Z

On my Titan RTX, the FusionPersistentSoftmaxLocalShared_CUDA test fails:

C++ exception with description "(dynamic_smem_size) < (available_dynamic_smem_without_reconfiguration + additional_dynamic_smem_available_through_reconfiguration) INTERNAL ASSERT FAILED at "/home/nmaruyama/pytorch/debug3/nvfuser/csrc/executor.cpp":910, please report a bug to PyTorch. The total shared memory allocation is larger than available memory. Dynamic size: 66048. Available size: 49136. Configured smem size: 49152. Device limit size: 65536

This failure appears to have started with PR #148. It is unclear why that PR would affect shared memory usage.
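For reference, the numbers in the assertion message can be checked by hand. The sketch below is not the code in `executor.cpp`; `SmemBudget`, `additionalThroughReconfiguration`, and `fitsInSharedMemory` are hypothetical names, and it assumes that the extra memory available through reconfiguration is the device limit minus the currently configured size. Under that assumption the request exceeds the budget by 528 bytes, and it also exceeds the 65536-byte device limit on its own.

```cpp
#include <cstdint>

// Hypothetical struct holding the three sizes printed in the assertion message.
struct SmemBudget {
  std::int64_t available_without_reconfiguration;  // "Available size"
  std::int64_t configured_size;                    // "Configured smem size"
  std::int64_t device_limit;                       // "Device limit size"
};

// ASSUMPTION: the headroom gained by reconfiguring is the gap between the
// device limit and the currently configured size. This is inferred from the
// message, not taken from executor.cpp.
constexpr std::int64_t additionalThroughReconfiguration(const SmemBudget& b) {
  return b.device_limit - b.configured_size;
}

// Mirrors the inequality quoted in the assertion message.
constexpr bool fitsInSharedMemory(std::int64_t dynamic_smem_size,
                                  const SmemBudget& b) {
  return dynamic_smem_size <
      b.available_without_reconfiguration + additionalThroughReconfiguration(b);
}

// Values reported in the failure above.
constexpr SmemBudget kReported{49136, 49152, 65536};
constexpr std::int64_t kRequestedDynamicSmem = 66048;

// 65536 - 49152 = 16384 bytes of assumed extra headroom.
static_assert(additionalThroughReconfiguration(kReported) == 16384, "headroom");
// 49136 + 16384 = 65520, which is less than 66048, so the check fails.
static_assert(!fitsInSharedMemory(kRequestedDynamicSmem, kReported), "too large");
```

Because the requested size is above the device limit itself, raising the configured size cannot make this allocation fit; the requested amount has to shrink.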

naoyam assigned zasdfgbnm Apr 12, 2023

naoyam mentioned this issue Apr 19, 2023

Fix alias analysis #185

Merged

naoyam closed this as completed in #185 Apr 20, 2023
