Update-docs #154
Conversation
Replaces the packed-variants and variable-length sections with comprehensive Transformers integration documentation. Adds detailed documentation for the flash_dynamic_mask_attention_forward function, with complete usage examples showing dynamic attention bias generation and flexible backend selection. Reorganizes the content structure to prioritize practical integration patterns over low-level API variants. Includes a backend comparison table and updated installation instructions for better developer onboarding.
Revises documentation to reflect the transition from a two-stage ZOH-based approach to a unified sparse computation system with block-level skip logic. Key documentation updates include:

- Replaces ZOH states and active masks with attention mask and bias tensors
- Documents unified block-level skip logic for both forward and backward passes
- Updates API signatures to reflect new required parameters
- Adds comprehensive shared memory aliasing strategies
- Documents LSE caching for numerical stability in the backward pass
- Updates performance models to reflect block-level sparsity benefits
- Provides complete migration examples for existing codebases

Removes references to TopK selection and keep_window_size parameters in favor of direct mask and bias tensor inputs, simplifying the API while maintaining sparse computation benefits.
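The migration away from keep_window_size toward an explicit mask tensor can be sketched as follows. This is a hypothetical illustration, not the kernel's confirmed API: the sliding-window semantics assumed here (each query attends to the last `window` keys) and the helper name `window_mask` are assumptions based on the PR description.

```python
# Hypothetical sketch: replacing the removed keep_window_size parameter
# with an explicit mask tensor. The window semantics assumed here may
# differ from the kernel's original behavior.
import torch

def window_mask(seq_len, window, device="cpu", dtype=torch.float32):
    """Build a (seq_len, seq_len) mask keeping the last `window` keys per query."""
    idx = torch.arange(seq_len, device=device)
    # Keep key j for query i when i - window < j <= i (causal sliding window).
    keep = (idx[None, :] <= idx[:, None]) & (idx[None, :] > idx[:, None] - window)
    return keep.to(dtype)

mask2d = window_mask(seq_len=6, window=3)
# Broadcast to the 4D layout (batch, num_kv_heads, seq_len, seq_len)
# that the documentation describes for direct mask inputs.
mask4d = mask2d.expand(1, 2, 6, 6)
```

Passing a tensor like `mask4d` directly, instead of a keep_window_size scalar, matches the simplified API described above while preserving the sparse computation pattern.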
…-level optimizations

Restructures README to highlight core kernel advantages, including native 4D mask/bias tensor processing and intelligent computation-skipping mechanisms. Reorganizes feature sections to better showcase performance optimizations and separates basic usage from gradient computation examples. Improves technical explanations by focusing on unified skip logic, memory access patterns, and complete gradient chain support rather than abstract integration concepts. Updates code examples to demonstrate proper tensor shapes and sparse mask generation patterns for better user guidance.
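The native 4D mask/bias tensor processing mentioned above can be sketched as follows. This is a minimal illustration of the tensor shapes, assuming a PyTorch-style layout of (batch, num_kv_heads, seq_len, seq_len); the names `attention_mask` and `attention_bias` follow the PR description, but the exact layout is an assumption, not a confirmed specification.

```python
# Sketch of the 4D mask/bias tensors described in the PR. The
# (batch, num_kv_heads, seq_len, seq_len) layout is an assumption
# inferred from this review thread, not the library's documented API.
import torch

batch_size, num_kv_heads, seq_len = 2, 4, 128
device, dtype = "cpu", torch.float32

# Mask: 1.0 keeps a (query, key) pair, 0.0 lets the kernel skip it.
attention_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len,
                            device=device, dtype=dtype)

# Additive bias applied to attention scores before softmax.
attention_bias = torch.zeros(batch_size, num_kv_heads, seq_len, seq_len,
                             device=device, dtype=dtype)

# Example sparse pattern: a causal mask, so each query attends only to
# earlier (or equal) key positions; whole all-zero blocks can be skipped.
causal = torch.tril(torch.ones(seq_len, seq_len, device=device, dtype=dtype))
attention_mask = attention_mask * causal

print(attention_mask.shape)  # torch.Size([2, 4, 128, 128])
```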
Pull Request Overview
This PR updates the documentation to reflect a transition from a two-stage ZOH-based approach to a unified sparse computation system with block-level skip logic. The changes remove references to TopK selection parameters and update API signatures to reflect direct mask and bias tensor inputs.
- Comprehensive documentation update across API reference and README files
- Updated API signatures to reflect unified block-level sparse computation
- Added transformers integration documentation with complete usage examples
Reviewed Changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| docs/api_reference.md | Major API documentation overhaul with new backend details, transformers integration section, and updated function signatures |
| README_zh.md | Chinese README updates with revised feature descriptions and usage examples |
| README.md | English README updates with revised feature descriptions and usage examples |
```
- softcap: Softcap value for attention scores
- **kwargs: Additional arguments including:
    - is_causal: Whether to apply causal mask
    - keep_window_size: Size of window to keep
```
Copilot (AI) · Sep 8, 2025
The keep_window_size parameter is documented here but the PR description mentions removing references to keep_window_size parameters. This appears inconsistent with the stated goal of the documentation update.
```
- keep_window_size: Size of window to keep
```
README.md (Outdated)
```diff
                             device=device, dtype=dtype)
-attention_mask = torch.ones(batch_size, num_heads, seq_len, seq_len,
-                            device=device, dtype=dtype)
+attention_mask = torch.ones(batch_size, num_heads, seq_len, seq_len, device=device, dtype=dtype)
```
Copilot (AI) · Sep 8, 2025
The attention_bias tensor shape uses num_kv_heads but the attention_mask tensor on line 172 uses num_heads. This inconsistency in head dimensions should be clarified or made consistent.
```diff
-attention_mask = torch.ones(batch_size, num_heads, seq_len, seq_len, device=device, dtype=dtype)
+attention_mask = torch.ones(batch_size, num_kv_heads, seq_len, seq_len, device=device, dtype=dtype)
```
Corrects the attention mask to use num_kv_heads instead of num_heads, for dimensional consistency with the attention bias and key-value tensors in the sparse attention implementation. Updates both the English and Chinese documentation examples.
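The head-count consistency this fix enforces can be checked programmatically. The following is a hypothetical helper, not part of the library: the `check_head_consistency` name and the (batch, num_kv_heads, seq_len, head_dim) key layout are assumptions based on the shapes discussed in this review.

```python
# Hypothetical consistency check for the head dimensions discussed above.
# The tensor layouts and the rule that mask/bias share the key-value head
# count are assumptions drawn from this review thread.
import torch

def check_head_consistency(attention_mask, attention_bias, key):
    """Raise ValueError if mask/bias head counts disagree with the key tensor."""
    num_kv_heads = key.shape[1]  # key: (batch, num_kv_heads, seq_len, head_dim)
    if attention_mask.shape[1] != num_kv_heads:
        raise ValueError(
            f"attention_mask has {attention_mask.shape[1]} heads, "
            f"expected num_kv_heads={num_kv_heads}")
    if attention_bias.shape[1] != num_kv_heads:
        raise ValueError(
            f"attention_bias has {attention_bias.shape[1]} heads, "
            f"expected num_kv_heads={num_kv_heads}")
    return True

batch, num_kv_heads, seq_len, head_dim = 2, 4, 64, 32
key = torch.randn(batch, num_kv_heads, seq_len, head_dim)
mask = torch.ones(batch, num_kv_heads, seq_len, seq_len)
bias = torch.zeros(batch, num_kv_heads, seq_len, seq_len)
check_head_consistency(mask, bias, key)  # passes with consistent shapes
```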
Description
Revises documentation to reflect the transition from a two-stage ZOH-based approach to a unified sparse computation system with block-level skip logic.
Removes references to TopK selection and keep_window_size parameters in favor of direct mask and bias tensor inputs, simplifying the API while maintaining sparse computation benefits.
Key documentation updates include:
Type of Change
Please check the relevant option(s):
Related Issues
None.
Changes Made
Code Changes
Documentation
Testing
Please describe the tests you ran to verify your changes:
`python -m pytest tests/ -v`

Test Configuration
Performance Impact
If this change affects performance, please provide benchmarks:
Before
After
Breaking Changes
If this PR introduces breaking changes, please describe:
Checklist
Please check all that apply:
CUDA-specific (if applicable)
Additional Notes
Any additional information that reviewers should know:
Screenshots (if applicable)
If your changes include visual elements or performance improvements, please add screenshots or graphs.