
Conversation

@LoserCheems (Collaborator) commented Oct 23, 2025

Linked #195
Replace manual mask construction with a utility-based approach in both English and Chinese documentation to simplify examples and ensure consistency with the current API.

Replaces manual top-k mask construction with a utility-based dynamic sparse mask in the examples to reduce complexity and align with the current API.

Unifies variable names and updates example usage and gradient printouts across English and Chinese guides.
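
For context, the pattern being replaced and its utility-based replacement look roughly like the sketch below. The manual top-k construction is plain PyTorch; the `create_mask` call is an assumption pieced together from the import path and variable names discussed in this PR (`flash_dmattn.utils.mask`, `attn_bias`, `attn_mask`), not a documented signature.

```python
import torch
from flash_dmattn.utils.mask import create_mask  # utility referenced by this PR

# Illustrative shapes; the real values live in the README examples.
batch_size, num_kv_heads, seq_len, topk = 1, 2, 4096, 1024
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.bfloat16

attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)

# Before: manual top-k mask construction (the pattern this PR removes).
# Keep the topk keys with the highest bias per query and zero out the rest.
topk_idx = torch.topk(attn_bias, topk, dim=-1).indices
attn_mask = torch.zeros_like(attn_bias)
attn_mask.scatter_(-1, topk_idx, 1.0)

# After: delegate to the utility, passing the bias and an (optional) existing
# mask. The argument names and order are assumed for illustration only; see
# the updated README for the exact call.
attn_mask = None
attn_mask = create_mask(attn_bias, attn_mask)
```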

Copilot AI left a comment


Pull Request Overview

This PR updates the documentation to use the create_mask utility function instead of manual mask construction in both English and Chinese README files. The changes simplify the examples and promote consistency with the current API design.

Key changes:

  • Import and utilize the create_mask utility function from flash_dmattn.utils.mask
  • Replace manual top-k mask generation logic with a single utility function call
  • Standardize variable naming from attention_mask/attention_bias to attn_mask/attn_bias

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| ---- | ----------- |
| README.md | Updated English documentation to use the mask utility and simplified variable names |
| README_zh.md | Updated Chinese documentation with the same utility-based approach and variable naming |


README.md Outdated
Comment on lines 172 to 173

```python
attn_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
```


Copilot AI Oct 23, 2025


The comment on line 171 says 'Create bias for sparse attention' but the code creates attn_mask, not attn_bias. Additionally, attn_bias is used on line 177 but never defined. The line should create attn_bias instead of attn_mask, and the subsequent create_mask call should initialize attn_mask from None or an appropriate default.

Suggested change:

```diff
- attn_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_bias = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ attn_mask = None
```

README_zh.md

```diff
- attention_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
- attention_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
+ # Create bias for sparse attention
+ attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
```

Copilot AI Oct 23, 2025


The variable attn_mask is passed to create_mask on line 178 but is never defined before this usage. Either initialize attn_mask before the conditional block or pass None if the utility function supports it.
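
A minimal sketch of the fix being suggested here, assuming create_mask accepts the bias plus an optional existing mask: give attn_mask a value (e.g. None) before the block that passes it along. The conditional flag and the exact create_mask arguments are illustrative, not the library's documented API.

```python
import torch
from flash_dmattn.utils.mask import create_mask

batch_size, num_kv_heads, seq_len = 1, 2, 4096
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.bfloat16

# Create bias for sparse attention.
attn_bias = torch.randn(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
attn_mask = None  # defined up front so the later call never sees an unbound name

use_sparse_attention = True  # hypothetical flag standing in for the README's conditional
if use_sparse_attention:
    # Assumed argument order; consult the updated README for the actual usage.
    attn_mask = create_mask(attn_bias, attn_mask)
```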

@LoserCheems LoserCheems requested a review from Copilot October 23, 2025 13:57

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.



@LoserCheems LoserCheems merged commit 964973e into main Oct 23, 2025
@LoserCheems LoserCheems deleted the fix-195 branch October 27, 2025 08:56
