
Loss NaN in Mamba2 #352

Closed
tyshiwo1 opened this issue Jun 4, 2024 · 37 comments

Comments

@tyshiwo1

tyshiwo1 commented Jun 4, 2024

Hello guys,

When I applied Mamba2 to image generation, I found several NaN values in the gradients (ddt_bias, dx, and ddt_given) in _mamba_chunk_scan_combined_bwd of mamba_ssm/ops/triton/ssd_combined.py, so the loss becomes NaN.

The image generation code is DiM. I just replaced the original Mamba-1 block with Mamba-2. I used bf16 precision for training from scratch, and the NaN appears in the first training iteration.

My environment is triton==2.2.0, torch==2.2.1+cu121.

If anyone can help me, I will be very grateful!
[screenshot: NaN gradient values]

@zzzendurance

Sorry to bother you. I haven't run into the same problem as you, but I'd like to ask you a couple of questions.

1. Have you ever hit the following error?
File "/data/zh/miniconda3/envs/man/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 113, in decorate_fwd
return fwd(*args, **kwargs)
File "/data/zh/wa1/-main/IP/mamba/mamba_ssm/ops/triton/ssd_combined.py", line 757, in forward
causal_conv1d_cuda.causal_conv1d_fwd(rearrange(xBC, "b s d -> b d s"),
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
(arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: Optional[torch.Tensor], arg4: bool) -> torch.Tensor
From what I found, this is caused by the causal_conv1d version. Mine was originally 1.1.1, and after switching to 1.0.2 it still errors out. Which causal_conv1d version are you using?

2. How do you use Mamba2 directly in your own project? Do you download the whl file and update the mamba_ssm package in your virtual environment, or do you download the whole repository and use the mamba2 file inside it?
If you downloaded a whl file, how did you choose the version? Is there any real difference between the abi variants here?
mamba_ssm-2.0.3+cu118torch1.13cxx11abiTRUE-cp310-cp310-linux_x86_64.whl
mamba_ssm-2.0.3+cu118torch1.13cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

Sorry again for the bother, looking forward to your reply~

@tyshiwo1
Author

tyshiwo1 commented Jun 4, 2024

  1. My causal_conv1d version is 1.2.2.post1.
  2. I downloaded the whole project and compiled it locally using CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install --user -e .

@zzzendurance

Hahaha, thanks! I updated my causal_conv1d version to 1.2.2.post1 and got it running, and then hit the same problem as you. With the same hyperparameters, Mamba-1 never produced NaN before, as far as I remember.

@tyshiwo1
Author

tyshiwo1 commented Jun 4, 2024

Yes, the same issue


@Kiet0712

Kiet0712 commented Jun 4, 2024

Have you tried using float32 instead of bfloat16?

@tyshiwo1
Author

tyshiwo1 commented Jun 4, 2024

Have you tried using float32 instead of bfloat16?

The NaN remains when I train with fp32.

@zzzendurance

Wow. Do you know how to change this line of code (acc += tl.dot(cb, dout)) to solve the NaN problem? (I'm not good at this, so I don't know how to do it.)

@tyshiwo1
Author

tyshiwo1 commented Jun 4, 2024

Wow. Do you know how to change this line of code (acc += tl.dot(cb, dout)) to solve the NaN problem? (I'm not good at this, so I don't know how to do it.)

I am working on it. The code around this line may also be at fault.
Besides, other variables like ddt_bias also contain NaN, so a lot of code may need to change.

@zzzendurance

I don't fully understand this, but I'll dig into it myself too.

Do you think this is a bug in Mamba2 itself, or does the NaN only show up because people are applying it to their own tasks (i.e., everyone needs to adapt the code to their own task)?

@tyshiwo1
Author

tyshiwo1 commented Jun 4, 2024

I don't fully understand this, but I'll dig into it myself too.

Do you think this is a bug in Mamba2 itself, or does the NaN only show up because people are applying it to their own tasks (i.e., everyone needs to adapt the code to their own task)?

I applied Mamba2 to image generation rather than NLP tasks and got a NaN loss.

@tridao
Collaborator

tridao commented Jun 4, 2024

Would be hard for us to say what's causing NaN until we can reproduce it. Can you save all the tensors right before the function call that produced NaN and share with us?
Sth like

if dx.isnan().any():
    # save tensors to disk with torch.save

So that we can reproduce it like this:

# load x, b, c, dt, etc from disk with torch.load
# whatever function here that caused NaN
# we observe NaN in dx for example.
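
For illustration, a minimal sketch of that save/reload workflow (the helper name, tensor names, and file path are assumptions, not part of the library):

import torch

def dump_if_nan(path, **tensors):
    # Save the given tensors to disk if any of them contains NaN,
    # so the failing call can be replayed offline with torch.load.
    if any(t is not None and t.isnan().any() for t in tensors.values()):
        torch.save({k: v for k, v in tensors.items() if v is not None}, path)
        return True
    return False

# Inside _mamba_chunk_scan_combined_bwd, right after the gradients are computed (illustrative):
#     dump_if_nan("nan_repro.pt", x=x, dt=dt, dA_cumsum=dA_cumsum, B=B, CB=CB,
#                 dout=dout, dstates=dstates, D=D, dx=dx, ddt=ddt)
# Then, in a standalone script, torch.load("nan_repro.pt"), call the same
# function with the saved inputs, and check which outputs contain NaN.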

@tyshiwo1
Author

tyshiwo1 commented Jun 4, 2024

Would be hard for us to say what's causing NaN until we can reproduce it. Can you save all the tensors right before the function call that produced NaN and share with us? Sth like

if dx.isnan().any():
    # save tensors to disk with torch.save

So that we can reproduce it like this:

# load x, b, c, dt, etc from disk with torch.load
# whatever function here that caused NaN
# we observe NaN in dx for example.

Thank you for your reply!
I have uploaded my tensors to Google Drive.

The error occurs on this line, so I saved the input and output tensors into a zip file.

I also tried some operations like changing

dA_cs_k = tl.load(dA_cumsum_ptrs, mask=offs_k < K_MAX - k, other=0.0).to(tl.float32)

into

dA_cs_k = tl.load(dA_cumsum_ptrs, mask=offs_k < K_MAX - k, other=-1e6).to(tl.float32)

, but this does not help with other variables like dout.

@XiudingCai

A smaller A range (close to 1) and a smaller chunk size may make the training more stable

Thanks for the suggestion. I set chunk_size to 1, which delayed the point where the NaN appears, but it still shows up.

@realwenlongwang

I think chunk_size does not really matter. Empirically, I found that setting A_init_range to (1, 1.1) works for me.
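
For anyone trying this, a minimal sketch of the suggestion, assuming the Mamba2 constructor exposes an A_init_range argument as in mamba_ssm/modules/mamba2.py (the other hyperparameters are placeholders):

import torch
from mamba_ssm import Mamba2

# Narrow A's initialization range to (1, 1.1) as suggested above, instead of the wider default.
block = Mamba2(d_model=512, headdim=64, A_init_range=(1, 1.1)).cuda()

x = torch.randn(2, 1024, 512, device="cuda")  # (batch, seqlen, d_model)
y = block(x)
print(y.shape)  # torch.Size([2, 1024, 512])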

@XiudingCai

I think chunk_size does not really matter. Empirically, I found that setting A_init_range to (1, 1.1) works for me.

It did.

@tyshiwo1
Author

tyshiwo1 commented Jun 5, 2024

I think chunk_size does not really matter. Empirically, I found that setting A_init_range to (1, 1.1) works for me.

This stabilizes the first few training iterations, but the loss still becomes NaN later.

@ZijianYY

ZijianYY commented Jun 5, 2024

Same issue. In my code, the loss suddenly becomes NaN.
[Screenshot 2024-06-05 21:32: training log where the loss turns NaN]

@tyshiwo1
Author

tyshiwo1 commented Jun 5, 2024

Same issue. In my code, the loss suddenly becomes NaN.

Maybe you can also locate the NaN values and provide the tensors like I did 😂.
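
As an aside, a quick way to locate NaN gradients after loss.backward() (a generic PyTorch sketch; find_nan_grads is a hypothetical helper, not part of mamba_ssm):

import torch

# Option 1: let autograd flag the op that produced the NaN during the backward pass.
torch.autograd.set_detect_anomaly(True)

# Option 2: after loss.backward(), list the parameters whose gradients contain NaN.
def find_nan_grads(model: torch.nn.Module):
    return [name for name, p in model.named_parameters()
            if p.grad is not None and p.grad.isnan().any()]

# Usage in a training loop:
#     loss.backward()
#     bad = find_nan_grads(model)
#     if bad:
#         print("NaN gradients in:", bad)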

@bio-mlhui

If it is still not stable, the last resort is to lower the learning rate.

@ZijianYY

ZijianYY commented Jun 6, 2024

Same issue. In my code, the loss suddenly becomes NaN.

Maybe you can also locate the NaN values and provide the tensors like I did 😂.

Checked. It is also the ddt_bias tensor as you mentioned before.
[screenshot: ddt_bias gradient containing NaN]

@ZijianYY

ZijianYY commented Jun 6, 2024

If it is still not stable, the last resort is to lower the learning rate.

Tried. The result is the same as with decreasing the chunk size: it stays stable for more epochs, but the loss becomes NaN later.

@tyshiwo1
Author

tyshiwo1 commented Jun 6, 2024

Would be hard for us to say what's causing NaN until we can reproduce it. Can you save all the tensors right before the function call that produced NaN and share with us? Sth like

if dx.isnan().any():
    # save tensors to disk with torch.save

So that we can reproduce it like this:

# load x, b, c, dt, etc from disk with torch.load
# whatever function here that caused NaN
# we observe NaN in dx for example.

@ZijianYY You can follow the instructions here.
I have uploaded my tensors to
https://drive.google.com/drive/folders/1ojmQNDsAToNZaP3ZNOAeMJBu1AshMnXS?usp=sharing
for this function:

dx, ddt, dD_from_x = _chunk_scan_chunk_state_bwd_dx(x, dt, dA_cumsum, B, CB, dout, dstates, D=D, seq_idx=seq_idx, dx=dx)

You may also check it.

@Maykeye

Maykeye commented Jun 6, 2024

Would be hard for us to say what's causing NaN until we can reproduce it. Can you save all the tensors right before the function call that produced NaN and share with us? Sth like

Here's another very primitive, barebones "model" that very quickly generates NaN on a 3080 Ti laptop.

The model creates a random 8x8 "RGB image" and then tries to create an "upscaled" 64x64 version
using Mamba and 64x64 random values that are supposed to represent "I'm the N-th pixel; who am I, considering the past?".

The number of layers and d_model matter.
The dtype doesn't (both bfloat16 and float32 fail at the same epoch); expand, d_state, etc. are left at their defaults.

With mamba2simple I get ValueError: NaN loss at epoch #2
With mamba2 I get ValueError: NaN loss at epoch #1
With mamba1 I lose patience after 100 iterations: no NaN appears.

The start of the file sets the parameters (the most important is THE_MAMBA, which chooses the class used for Mamba: Mamba, Mamba2, or Mamba2Simple).
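
For readers without access to the original script, a rough, self-contained sketch of that kind of toy setup (the ToyUpscaler wrapper, layer count, widths, and learning rate are assumptions; only the overall shape follows the description above):

import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # swap in Mamba (or Mamba2Simple) to compare behavior

D_MODEL, N_LAYERS = 256, 4
device = "cuda"

class ToyUpscaler(nn.Module):
    def __init__(self):
        super().__init__()
        self.in_proj = nn.Linear(3, D_MODEL)
        self.blocks = nn.ModuleList(
            [Mamba2(d_model=D_MODEL) for _ in range(N_LAYERS)]
        )
        self.out_proj = nn.Linear(D_MODEL, 3)

    def forward(self, seq):  # seq: (batch, length, 3)
        h = self.in_proj(seq)
        for blk in self.blocks:
            h = blk(h)
        return self.out_proj(h)

model = ToyUpscaler().to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

small = torch.rand(1, 8 * 8, 3, device=device)      # random 8x8 "RGB image", flattened
queries = torch.rand(1, 64 * 64, 3, device=device)  # per-pixel random "who am I?" queries
target = torch.rand(1, 64 * 64, 3, device=device)   # fake "upscaled" 64x64 image

for epoch in range(100):
    # Condition on the small image by prepending it, then predict the 64x64 pixels.
    pred = model(torch.cat([small, queries], dim=1))[:, small.shape[1]:]
    loss = nn.functional.mse_loss(pred, target)
    if loss.isnan():
        raise ValueError(f"NaN loss at epoch #{epoch}")
    opt.zero_grad()
    loss.backward()
    opt.step()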

@EddieEduardo

Same here: it raised NaN when training with Mamba2.

@catalpaaa

I don't fully understand this, but I'll dig into it myself too.
Do you think this is a bug in Mamba2 itself, or does the NaN only show up because people are applying it to their own tasks (i.e., everyone needs to adapt the code to their own task)?

I applied Mamba2 to image generation rather than NLP tasks and got a NaN loss.

I also get NaN on an image classification task :(

@zzzendurance

I don't fully understand this, but I'll dig into it myself too.
Do you think this is a bug in Mamba2 itself, or does the NaN only show up because people are applying it to their own tasks (i.e., everyone needs to adapt the code to their own task)?

I applied Mamba2 to image generation rather than NLP tasks and got a NaN loss.

I also get NaN on an image classification task :(

So is this problem still unresolved? (I get NaN on voice classification tasks.)

@tridao
Collaborator

tridao commented Jun 12, 2024

Thanks for the bug reports. We were able to reproduce the NaN gradients when sequence length is not a multiple of 256. We pushed a fix, can you guys try v2.0.4?
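
For anyone stuck on an earlier version, a hedged workaround sketch: pad the sequence up to a multiple of 256 before the block and slice the outputs back. Because the model is causal, trailing padding should not change the outputs at the original positions; forward_padded and the zero padding are illustrative, not part of the library.

import torch
import torch.nn.functional as F

def forward_padded(block, x, multiple=256):
    # x: (batch, seqlen, d_model); pad seqlen on the right to a multiple of `multiple`.
    B, L, D = x.shape
    pad = (-L) % multiple
    if pad:
        x = F.pad(x, (0, 0, 0, pad))  # zero-pad the sequence dimension
    y = block(x)
    return y[:, :L]  # drop the padded tail; causality keeps the earlier outputs unchanged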

@EddieEduardo

EddieEduardo commented Jun 12, 2024 via email

@tridao
Collaborator

tridao commented Jun 12, 2024

My sequence length is 256, my task is object detection, it raised nan 😭


Please make a reproducible script (e.g. save the tensors right before the function that causes NaN). If we can't reproduce it, we can't do anything.

@tyshiwo1
Author

Thanks for the bug reports. We were able to reproduce the NaN gradients when sequence length is not a multiple of 256. We pushed a fix, can you guys try v2.0.4?

Thank you! We will try it.

@tyshiwo1
Author

Thanks for the bug reports. We were able to reproduce the NaN gradients when sequence length is not a multiple of 256. We pushed a fix, can you guys try v2.0.4?

Thank you! We will try it.

This fix works for me. At least the loss remains stable over a few thousand training iterations. Thanks again!

@catalpaaa

catalpaaa commented Jun 12, 2024

Thanks for the bug reports. We were able to reproduce the NaN gradients when sequence length is not a multiple of 256. We pushed a fix, can you guys try v2.0.4?

Huge fix; training works with d_model = 256 and 512. But once I lower d_model to 192, the error

RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8

was produced. If I follow #362 to force the training to run, it produces NaN for any d_model.

Any plan on fixing this issue? Let me know what you need.

@Maykeye

Maykeye commented Jun 12, 2024

We pushed a fix, can you guys try v2.0.4?

Works now!

@catalpaaa

Thanks for the bug reports. We were able to reproduce the NaN gradients when sequence length is not a multiple of 256. We pushed a fix, can you guys try v2.0.4?

Huge fix; training works with d_model = 256 and 512. But once I lower d_model to 192, the error

RuntimeError: causal_conv1d with channel last layout requires strides (x.stride(0) and x.stride(2)) to be multiples of 8

was produced. If I follow #362 to force the training to run, it produces NaN for any d_model.

Any plan on fixing this issue? Let me know what you need.

My bad for not investigating further. Your fix is perfect, and we should not use #362 with it; all we need is to make sure d_model * expand / headdim is a multiple of 8.
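
A tiny sanity check of that constraint (check_heads is a hypothetical helper; expand=2 and headdim=64 are assumed defaults used only for illustration):

def check_heads(d_model, expand=2, headdim=64):
    # d_model * expand / headdim is the number of heads; per the thread above,
    # it needs to be a multiple of 8 to avoid the causal_conv1d stride error.
    nheads = d_model * expand / headdim
    ok = nheads.is_integer() and nheads % 8 == 0
    print(f"d_model={d_model}: {nheads:g} heads -> {'ok' if ok else 'expect the stride error'}")

check_heads(192)  # 6 heads  -> hits the error reported above
check_heads(256)  # 8 heads  -> fine
check_heads(512)  # 16 heads -> fine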

@TimothyChen225

  1. My causal_conv1d version is 1.2.2.post1.
  2. I downloaded the whole project and compiled it locally using CAUSAL_CONV1D_FORCE_BUILD=TRUE pip install --user -e .

Yes, it did work for me.

@drhuangliwei

I think chunk_size does not really matter. Empirically, I found that setting A_init_range to (1, 1.1) works for me.

After running for a while, the NaN comes back.
