Add tensor sanitization and improve test configurations #136
Conversation
Introduces a sanitization function that replaces NaN and infinite values with specified defaults (0.0) and applies it to output tensors in forward and backward passes. Prevents potential numerical instabilities that could propagate through the attention computation pipeline.
Comments out test configurations with head dimensions 128 and 256 due to insufficient shared memory on sm89 architecture when using the splitkv branch by default. Adds organizational comments to group test configurations by head dimension for better readability.
Improves test configuration organization by grouping test cases by head dimension and adding clarifying comments. Disables the head dimension 128 and 256 test configurations because of current limitations: the splitkv branch does not support head_dim >= 128, and head_dim = 256 additionally lacks sufficient shared memory for backward operations. Adds inline comments identifying the specific configurations that produce INF values in the dbias calculations.
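For tracking down the INF values in dbias mentioned above, a small diagnostic along these lines may help. It is only a sketch: `count_nonfinite` is a hypothetical helper name, not something this PR adds, and it only relies on the standard PyTorch predicates `torch.isnan`, `torch.isposinf`, and `torch.isneginf`.

```python
import torch


def count_nonfinite(t: torch.Tensor) -> dict:
    """Count NaN, +inf and -inf entries in a tensor (hypothetical helper)."""
    return {
        "nan": int(torch.isnan(t).sum()),
        "posinf": int(torch.isposinf(t).sum()),
        "neginf": int(torch.isneginf(t).sum()),
    }


# Stand-in for a dbias gradient containing a few bad entries.
dbias = torch.tensor([0.0, float("inf"), float("-inf"), float("nan"), float("inf")])
print(count_nonfinite(dbias))  # {'nan': 1, 'posinf': 2, 'neginf': 1}
```

Running this on dbias before any sanitization step would show whether a configuration produces only infinities or NaNs as well.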
Adds contiguous() calls after tensor transpose operations to ensure proper memory layout for CUDA functions, preventing potential memory access errors. Corrects benchmark configuration parameter from 4096 to 1024 for consistency with other test cases.
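The reason for the contiguous() calls can be illustrated with a short sketch. The shapes below are made up for illustration and are not taken from the benchmark; the point is only that transpose() returns a strided view, while contiguous() materializes a densely packed copy.

```python
import torch

# Hypothetical layout: (batch, seq_len, num_heads, head_dim).
x = torch.randn(1, 128, 2, 32)

# transpose() swaps strides and returns a view; no data is moved.
xt = x.transpose(1, 2)
print(xt.is_contiguous())  # False

# contiguous() copies the data into row-major order for the new shape.
xc = xt.contiguous()
print(xc.is_contiguous())  # True

# The values are unchanged; only the memory layout differs.
print(torch.equal(xt, xc))  # True
```

Kernels that index memory assuming a dense row-major layout can read the wrong elements when handed the strided view, which is presumably why the explicit copy is made before calling into CUDA.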
Documents the unified block-level dynamic mask skip logic for both forward and backward passes, establishing the technical foundation for the major release. Covers architectural overview, algorithm pseudocode, mathematical derivations, API specifications, memory management strategies, and performance optimization techniques. Provides detailed explanations of sparsity handling, numerical stability measures, and future enhancement roadmap to support ongoing development and user adoption.
Pull Request Overview
This PR introduces tensor sanitization to enhance numerical stability, organizes test configurations for GPU compatibility, and provides comprehensive technical documentation for the v1.0.0 release. The changes focus on preventing NaN and infinite values in computations while improving code organization and documentation.
- Add tensor sanitization function to replace NaN/infinite values with specified defaults
- Organize and disable unsupported test configurations for large head dimensions on sm89 GPUs
- Improve memory layout handling in backward performance benchmarks with explicit contiguous operations
- Provide comprehensive technical documentation covering architecture, algorithms, and API details
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| flash_dmattn/flash_dmattn_interface.py | Adds tensor sanitization function and applies it to forward/backward pass outputs |
| docs/v1.0.0_technical_report.md | Comprehensive technical documentation covering architecture, algorithms, and v1.0.0 features |
| benchmarks/forward_equivalence.py | Organizes test configurations and disables unsupported head dimensions for sm89 GPUs |
| benchmarks/backward_equivalence.py | Similar test configuration organization with additional comments on known issues |
| benchmarks/backward_performance.py | Improves memory layout handling and adjusts test configuration |
        softcap,
        deterministic,
    )
    _sanitize_tensors(dq, dk, dv, dbias)
Copilot AI, Aug 29, 2025
The dbias parameter may be None in some cases, but _sanitize_tensors doesn't handle None values, which could cause a runtime error.
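One way to address this would be to skip None entries inside the helper. The sketch below is not the PR's actual `_sanitize_tensors`; the name `sanitize_tensors_sketch` and the `value` parameter are assumptions for illustration. It uses the in-place `Tensor.nan_to_num_` method from PyTorch.

```python
from typing import Optional

import torch


def sanitize_tensors_sketch(*tensors: Optional[torch.Tensor], value: float = 0.0) -> None:
    """Replace NaN and +/-inf with `value` in place, skipping None entries."""
    for t in tensors:
        if t is None:
            # e.g. dbias when no bias was passed to the forward call
            continue
        t.nan_to_num_(nan=value, posinf=value, neginf=value)


dq = torch.tensor([1.0, float("nan"), float("inf"), float("-inf")])
dbias = None
sanitize_tensors_sketch(dq, dbias)  # does not raise on the None argument
print(dq)  # tensor([1., 0., 0., 0.])
```

Guarding inside the helper keeps every call site unchanged; the alternative is to filter out None at each call.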
    (1, 2, 1, 2048, 2048, 32, True),
    (1, 2, 1, 2048, 2048, 32, False),
    (1, 2, 1, 4096, 4096, 32, True),
    (1, 2, 1, 4096, 4096, 32, True), # some INF in dbias, Idk why
Copilot AI, Aug 29, 2025
Replace informal abbreviation 'Idk' with proper text: 'I don't know' or 'unknown reason'.
    (1, 2, 1, 128, 128, 64, True),
    (1, 2, 1, 128, 128, 64, False),
    (1, 2, 1, 256, 256, 64, True),
    (1, 2, 1, 256, 256, 64, True), # some INF in dbias, Idk why
Copilot AI, Aug 29, 2025
Replace informal abbreviation 'Idk' with proper text: 'I don't know' or 'unknown reason'.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Introduce a tensor sanitization function to prevent NaN and infinite values, enhancing numerical stability in computations. Organize and disable unsupported test configurations for large head dimensions on sm89 GPUs, ensuring clarity and consistency in benchmarks. Update memory layout handling in backward performance benchmarks and provide comprehensive technical documentation for the major release.