Conversation

@LoserCheems
Collaborator

Clarify the shape descriptions for attention_mask and attention_bias in the documentation. Eliminate unnecessary dimension checks in the attention forward function to streamline the code.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves the documentation and code for flash dynamic mask attention by clarifying tensor shape descriptions and removing redundant dimension handling logic.

  • Clarifies the shape documentation for the attention_mask and attention_bias parameters to be more explicit about the supported dimensions (see the sketch after this list)
  • Removes unnecessary dimension compatibility checks in the attention forward function
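
For illustration, here is a minimal docstring sketch of the kind of shape clarification described above. The function name and the dimension labels (batch_size, num_kv_heads, query_len, key_len) are assumptions made for this sketch, not text taken from the repository.

def flash_dynamic_mask_attention_forward(
    query_states,           # (batch_size, num_heads, query_len, head_dim), assumed layout
    key_states,             # (batch_size, num_kv_heads, key_len, head_dim), assumed layout
    value_states,           # (batch_size, num_kv_heads, key_len, head_dim), assumed layout
    attention_mask=None,
    attention_bias=None,
):
    """
    attention_mask (torch.Tensor, optional):
        Boolean mask where True marks positions that may be attended to, of shape
        (batch_size, num_kv_heads, query_len, key_len) or a shape broadcastable to it.
    attention_bias (torch.Tensor, optional):
        Additive bias applied to the attention scores before softmax, of shape
        (batch_size, num_kv_heads, query_len, key_len).
    """
    ...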

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • flash_dynamic_mask_attention.py: Updates parameter documentation to clarify supported tensor shapes
  • modeling_flash_dynamic_mask_attention_utils.py: Removes redundant dimension checking and reshaping logic

    and key_length > window_size
):
    min_dtype = torch.finfo(query_states.dtype).min
    if attention_mask is not None:

Copilot AI commented on Oct 13, 2025

After removing the dimension compatibility checks, this code will fail if attention_mask and attention_bias have incompatible shapes for broadcasting in the masked_fill operation. The removed checks were handling dimension mismatches between 3D and 4D tensors.

Suggested change

    if attention_mask is not None:
        # Ensure attention_mask is broadcastable to attention_bias
        if attention_mask.shape != attention_bias.shape:
            # Try to unsqueeze or expand attention_mask to match attention_bias.
            # Common case: attention_mask is (batch, seq_len) and attention_bias is
            # (batch, num_heads, seq_len, seq_len), or attention_mask is (batch, 1, seq_len, seq_len).
            # We'll try to expand attention_mask to match attention_bias.
            try:
                attention_mask = attention_mask.expand_as(attention_bias)
            except RuntimeError:
                # Try unsqueeze and expand for the common 3D -> 4D case
                while attention_mask.dim() < attention_bias.dim():
                    attention_mask = attention_mask.unsqueeze(1)
                attention_mask = attention_mask.expand_as(attention_bias)
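
To make the broadcasting concern concrete, here is a small self-contained sketch (all shapes invented purely for illustration) of how a 3D mask fails to broadcast against a 4D bias in masked_fill until a singleton head dimension is inserted:

import torch

# Invented shapes for illustration only.
batch, num_heads, q_len, k_len = 2, 4, 8, 8
attention_bias = torch.zeros(batch, num_heads, q_len, k_len)
attention_mask = torch.ones(batch, q_len, k_len, dtype=torch.bool)  # 3D mask, no head dimension

min_dtype = torch.finfo(attention_bias.dtype).min
try:
    # Fails: (batch, q_len, k_len) cannot broadcast against (batch, num_heads, q_len, k_len).
    attention_bias.masked_fill(~attention_mask, min_dtype)
except RuntimeError as err:
    print(f"broadcast failure: {err}")

# Adding a singleton head dimension makes the mask broadcastable.
attention_bias = attention_bias.masked_fill(~attention_mask.unsqueeze(1), min_dtype)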

@LoserCheems merged commit 3d91162 into main on Oct 13, 2025
1 check passed