Update documentation to use mask utility in examples #198
Conversation
Replaces manual top-k mask construction with a utility-based dynamic sparse mask in the examples to reduce complexity and align with the current API. Unifies variable names and updates example usage and gradient printouts across the English and Chinese guides.
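For context, "manual top-k mask construction" refers to example code along these lines — a minimal PyTorch sketch of the pattern being removed, not the exact README snippet:

```python
import torch

batch_size, num_kv_heads, seq_len, keep = 1, 2, 8, 4
device, dtype = "cpu", torch.float32

# Keep only the `keep` largest bias entries per query row (top-k sparsity).
attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
topk_idx = attn_bias.topk(keep, dim=-1).indices
attn_mask = torch.zeros_like(attn_bias).scatter(-1, topk_idx, 1.0)
```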
Pull Request Overview
This PR updates the documentation to use the create_mask utility function instead of manual mask construction in both English and Chinese README files. The changes simplify the examples and promote consistency with the current API design.
Key changes:
- Import and use the `create_mask` utility function from `flash_dmattn.utils.mask` (see the sketch after this list)
- Replace manual top-k mask generation logic with a single utility function call
- Standardize variable naming from `attention_mask`/`attention_bias` to `attn_mask`/`attn_bias`
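With the utility in place, the updated example reduces to a single call. A minimal sketch — the exact `create_mask` signature is not shown in this PR, so the call shape below (bias plus an optional existing mask) is an assumption drawn from the review discussion:

```python
import torch
from flash_dmattn.utils.mask import create_mask

batch_size, num_kv_heads, seq_len = 1, 2, 8
device, dtype = "cuda", torch.bfloat16

attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
attn_mask = None  # no pre-existing mask; the utility derives one from the bias

# Assumed call shape — the real argument list may differ (e.g. extra top-k or window arguments).
attn_mask = create_mask(attn_bias, attn_mask)
```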
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| README.md | Updated English documentation to use the mask utility and simplified variable names |
| README_zh.md | Updated Chinese documentation with the same utility-based approach and variable naming |
README.md (outdated)
`attn_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)`
Copilot AI · Oct 23, 2025
The comment on line 171 says 'Create bias for sparse attention' but the code creates attn_mask, not attn_bias. Additionally, attn_bias is used on line 177 but never defined. The line should create attn_bias instead of attn_mask, and the subsequent create_mask call should initialize attn_mask from None or an appropriate default.
Suggested change:
- attn_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_bias = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_mask = None
README_zh.md (outdated)
- attention_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
- attention_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ # Create bias for sparse attention
+ attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
Copilot AI · Oct 23, 2025
The variable attn_mask is passed to create_mask on line 178 but is never defined before this usage. Either initialize attn_mask before the conditional block or pass None if the utility function supports it.
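Either fix could look like the following sketch, reusing `attn_bias` from the sketch above. Here `use_dynamic_mask` is a hypothetical flag standing in for the example's conditional block, and the `create_mask` call shape remains an assumption:

```python
from flash_dmattn.utils.mask import create_mask  # as in the sketch above

use_dynamic_mask = True  # hypothetical condition from the example

# Option 1: initialize attn_mask before the conditional block.
attn_mask = None
if use_dynamic_mask:
    attn_mask = create_mask(attn_bias, attn_mask)  # assumed call shape

# Option 2: pass None directly, if the utility accepts it.
attn_mask = create_mask(attn_bias, None) if use_dynamic_mask else None
```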
Commit: … create_mask (EN and ZH)
Pull Request Overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
Linked #195
Replace manual mask construction with a utility-based approach in both English and Chinese documentation to simplify examples and ensure consistency with the current API.