fix: handle 3D input tensors in flashAttention #45

Merged
MankyDanky merged 1 commit into staging from feat/web-backend
Apr 17, 2026

Conversation

@MankyDanky
Collaborator

flashAttention was hardcoded to assume 4D [B, H, T, D] input, but models commonly pass 3D [B*H, T, D] after merging batch and head dims. With 3D input, qShape[3] was undefined, causing NaN to propagate through the entire forward pass.

Now reads T and D from the last two dimensions regardless of rank.
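A minimal sketch of the approach described above, assuming a hypothetical helper name (`seqAndHeadDims`) and shape representation (`number[]`) since the actual source isn't shown here:

```typescript
// Hypothetical sketch: read the sequence length T and head dim D from the
// last two dimensions, so both 4D [B, H, T, D] and 3D [B*H, T, D] inputs
// work. Indexing qShape[2]/qShape[3] directly would yield undefined for
// 3D input and let NaN propagate through the forward pass.
function seqAndHeadDims(qShape: number[]): { T: number; D: number } {
  if (qShape.length < 2) {
    throw new Error(`flashAttention expects rank >= 2, got rank ${qShape.length}`);
  }
  const T = qShape[qShape.length - 2]; // sequence length
  const D = qShape[qShape.length - 1]; // head dimension
  return { T, D };
}
```

With this indexing, `seqAndHeadDims([8, 128, 64])` and `seqAndHeadDims([2, 4, 128, 64])` both resolve to `T = 128`, `D = 64`.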

@MankyDanky MankyDanky merged commit 7e0a561 into staging Apr 17, 2026
11 checks passed
