
Conversation

@LoserCheems (Collaborator) commented Oct 23, 2025

Linked #195
Replace manual mask construction with a utility-based approach in both English and Chinese documentation to simplify examples and ensure consistency with the current API.

Replaces manual top-k mask construction with a utility-based dynamic sparse mask in the examples to reduce complexity and align with the current API.

Unifies variable names and updates example usage and gradient printouts across English and Chinese guides.
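
For context, the pattern being replaced and its utility-based replacement look roughly like the sketch below. The manual top-k construction is plain PyTorch; the `create_mask` call is an assumption pieced together from the import path and variable names discussed in this PR (`flash_dmattn.utils.mask`, `attn_bias`, `attn_mask`), not a documented signature.

```python
import torch
from flash_dmattn.utils.mask import create_mask  # utility referenced by this PR

# Illustrative shapes; the real values live in the README examples.
batch_size, num_kv_heads, seq_len, topk = 1, 2, 4096, 1024
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.bfloat16

attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)

# Before: manual top-k mask construction (the pattern this PR removes).
# Keep the topk keys with the highest bias per query and zero out the rest.
topk_idx = torch.topk(attn_bias, topk, dim=-1).indices
attn_mask = torch.zeros_like(attn_bias)
attn_mask.scatter_(-1, topk_idx, 1.0)

# After: delegate to the utility, passing the bias and an (optional) existing
# mask. The argument names and order are assumed for illustration only; see
# the updated README for the exact call.
attn_mask = None
attn_mask = create_mask(attn_bias, attn_mask)
```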

Copilot AI left a comment


Pull Request Overview

This PR updates the documentation to use the create_mask utility function instead of manual mask construction in both English and Chinese README files. The changes simplify the examples and promote consistency with the current API design.

Key changes:

  • Import and utilize the create_mask utility function from flash_dmattn.utils.mask
  • Replace manual top-k mask generation logic with a single utility function call
  • Standardize variable naming from attention_mask/attention_bias to attn_mask/attn_bias

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| ---- | ----------- |
| README.md | Updated English documentation to use the mask utility and simplified variable names |
| README_zh.md | Updated Chinese documentation with the same utility-based approach and variable naming |


README.md Outdated
Comment on lines 172 to 173

```python
attn_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
```


Copilot AI Oct 23, 2025


The comment on line 171 says 'Create bias for sparse attention' but the code creates attn_mask, not attn_bias. Additionally, attn_bias is used on line 177 but never defined. The line should create attn_bias instead of attn_mask, and the subsequent create_mask call should initialize attn_mask from None or an appropriate default.

Suggested change:

```diff
- attn_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_bias = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_mask = None
```

README_zh.md

```diff
- attention_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
- attention_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ # Create bias for sparse attention
+ attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
```

Copilot AI Oct 23, 2025


The variable attn_mask is passed to create_mask on line 178 but is never defined before this usage. Either initialize attn_mask before the conditional block or pass None if the utility function supports it.
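
A minimal sketch of the fix being suggested here, assuming create_mask accepts the bias plus an optional existing mask: give attn_mask a value (e.g. None) before the block that passes it along. The conditional flag and the exact create_mask arguments are illustrative, not the library's documented API.

```python
import torch
from flash_dmattn.utils.mask import create_mask

batch_size, num_kv_heads, seq_len = 1, 2, 4096
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.bfloat16

# Create bias for sparse attention.
attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
attn_mask = None  # defined up front so the later call never sees an unbound name

use_sparse_attention = True  # hypothetical flag standing in for the README's conditional
if use_sparse_attention:
    # Assumed argument order; consult the updated README for the actual usage.
    attn_mask = create_mask(attn_bias, attn_mask)
```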

@LoserCheems LoserCheems requested a review from Copilot October 23, 2025 13:57

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.



@LoserCheems LoserCheems merged commit 964973e into main Oct 23, 2025
@LoserCheems LoserCheems deleted the fix-195 branch October 27, 2025 08:56
