Add tensor sanitization and improve test configurations #136

LoserCheems · 2025-08-29T08:06:24Z

Introduce a tensor sanitization function to prevent NaN and infinite values, enhancing numerical stability in computations. Organize and disable unsupported test configurations for large head dimensions on sm89 GPUs, ensuring clarity and consistency in benchmarks. Update memory layout handling in backward performance benchmarks and provide comprehensive technical documentation for the major release.

Introduces a sanitization function that replaces NaN and infinite values with specified defaults (0.0) and applies it to output tensors in forward and backward passes. Prevents potential numerical instabilities that could propagate through the attention computation pipeline.
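A minimal sketch of what such a sanitization helper could look like, assuming PyTorch tensors. The name `sanitize_tensors_sketch`, its signature, and the choice of 0.0 for both infinities are assumptions for illustration, not the code merged in this PR. It also skips `None` entries, which matters when an optional gradient such as `dbias` is absent.

```python
import torch


def sanitize_tensors_sketch(*tensors, nan=0.0, posinf=0.0, neginf=0.0):
    """Replace NaN and +/-Inf in place with finite defaults.

    Hypothetical helper: the name and the default values are assumptions.
    """
    for t in tensors:
        if t is None:
            # Optional tensors (for example an absent dbias) are skipped.
            continue
        # Tensor.nan_to_num_ modifies the tensor in place.
        t.nan_to_num_(nan=nan, posinf=posinf, neginf=neginf)


dq = torch.tensor([1.0, float("nan"), float("inf"), float("-inf")])
sanitize_tensors_sketch(dq, None)
```

Sanitizing in place avoids allocating a second tensor for each gradient, at the cost of hiding where the non-finite values came from.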

Comments out test configurations with head dimensions 128 and 256 due to insufficient shared memory on sm89 architecture when using the splitkv branch by default. Adds organizational comments to group test configurations by head dimension for better readability.

Improves test configuration organization by grouping test cases by head dimension and adding clarifying comments. Disables the head dimension 128 and 256 test configurations because of current limitations: the splitkv branch does not support head_dim >= 128, and head_dim = 256 additionally lacks sufficient shared memory for backward operations. Adds inline comments identifying the specific configurations that produce INF values in dbias calculations.
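One way to express the same grouping without commenting configurations out is to key them by head dimension and filter on a limit. This is only a sketch: the tuple field order, the `head_dim` 128 and 256 entries, and the names `CONFIGS_BY_HEAD_DIM` and `enabled_configs` are assumptions, not the benchmark's actual layout.

```python
# Assumed field order:
# (batch, q_heads, kv_heads, q_len, kv_len, head_dim, is_causal)
CONFIGS_BY_HEAD_DIM = {
    32: [
        (1, 2, 1, 2048, 2048, 32, True),
        (1, 2, 1, 2048, 2048, 32, False),
        (1, 2, 1, 4096, 4096, 32, True),
    ],
    64: [
        (1, 2, 1, 128, 128, 64, True),
        (1, 2, 1, 128, 128, 64, False),
        (1, 2, 1, 256, 256, 64, True),
    ],
    # Hypothetical entries standing in for the disabled groups.
    128: [(1, 2, 1, 128, 128, 128, True)],
    256: [(1, 2, 1, 128, 128, 256, True)],
}


def enabled_configs(configs_by_head_dim, max_head_dim=64):
    """Return configs whose head_dim does not exceed max_head_dim."""
    return [
        cfg
        for head_dim, cfgs in sorted(configs_by_head_dim.items())
        if head_dim <= max_head_dim
        for cfg in cfgs
    ]


enabled = enabled_configs(CONFIGS_BY_HEAD_DIM)
```

Raising `max_head_dim` later would re-enable the larger groups without touching the configuration list itself.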

Adds contiguous() calls after tensor transpose operations to ensure proper memory layout for CUDA functions, preventing potential memory access errors. Corrects benchmark configuration parameter from 4096 to 1024 for consistency with other test cases.
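The layout issue can be illustrated with a short PyTorch sketch. The tensor shape and the helper name `to_kernel_layout` are assumptions chosen for the example, not taken from the benchmark.

```python
import torch


def to_kernel_layout(x):
    """Swap the sequence and head axes, then materialize a dense copy.

    Hypothetical helper: transpose() only returns a strided view, so
    contiguous() is needed when a kernel assumes densely packed memory.
    """
    return x.transpose(1, 2).contiguous()


# Assumed layout: (batch, seq_len, heads, head_dim)
x = torch.randn(1, 1024, 2, 32)
view = x.transpose(1, 2)       # same storage, permuted strides
dense = to_kernel_layout(x)    # new storage, row-major strides
```

Here `view.is_contiguous()` is false while `dense.is_contiguous()` is true, even though both hold the same values.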

Documents the unified block-level dynamic mask skip logic for both forward and backward passes, establishing the technical foundation for the major release. Covers architectural overview, algorithm pseudocode, mathematical derivations, API specifications, memory management strategies, and performance optimization techniques. Provides detailed explanations of sparsity handling, numerical stability measures, and future enhancement roadmap to support ongoing development and user adoption.
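The report itself is not reproduced on this page, so the following is only an illustration of the general idea of block-level mask skipping, not the kernel's logic: a tile is processed only if its mask block contains at least one active entry. The function name `active_tiles` and the nested-list mask representation are assumptions.

```python
def active_tiles(mask, block_m, block_n):
    """Return (row_block, col_block) indices of tiles with any nonzero mask entry.

    Hypothetical sketch: tiles that are entirely zero are skipped.
    """
    rows, cols = len(mask), len(mask[0])
    tiles = []
    for i in range(0, rows, block_m):
        for j in range(0, cols, block_n):
            block_is_active = any(
                mask[r][c]
                for r in range(i, min(i + block_m, rows))
                for c in range(j, min(j + block_n, cols))
            )
            if block_is_active:
                tiles.append((i // block_m, j // block_n))
    return tiles


mask = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
tiles = active_tiles(mask, 2, 2)
```

With 2x2 tiles, only the top-left and bottom-right tiles of this mask contain active entries, so the other two would be skipped.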

Copilot

Pull Request Overview

This PR introduces tensor sanitization to enhance numerical stability, organizes test configurations for GPU compatibility, and provides comprehensive technical documentation for the v1.0.0 release. The changes focus on preventing NaN and infinite values in computations while improving code organization and documentation.

Add tensor sanitization function to replace NaN/infinite values with specified defaults
Organize and disable unsupported test configurations for large head dimensions on sm89 GPUs
Improve memory layout handling in backward performance benchmarks with explicit contiguous operations
Provide comprehensive technical documentation covering architecture, algorithms, and API details

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
flash_dmattn/flash_dmattn_interface.py	Adds tensor sanitization function and applies it to forward/backward pass outputs
docs/v1.0.0_technical_report.md	Comprehensive technical documentation covering architecture, algorithms, and v1.0.0 features
benchmarks/forward_equivalence.py	Organizes test configurations and disables unsupported head dimensions for sm89 GPUs
benchmarks/backward_equivalence.py	Similar test configuration organization with additional comments on known issues
benchmarks/backward_performance.py	Improves memory layout handling and adjusts test configuration


flash_dmattn/flash_dmattn_interface.py

Copilot · 2025-08-29T08:09:18Z

flash_dmattn/flash_dmattn_interface.py

        softcap,
        deterministic,
    )
+    _sanitize_tensors(dq, dk, dv, dbias)


The dbias parameter may be None in some cases, but _sanitize_tensors doesn't handle None values, which could cause a runtime error.

Copilot · 2025-08-29T08:09:18Z

flash_dmattn/flash_dmattn_interface.py

        softcap,
        deterministic,
    )
+    _sanitize_tensors(dq, dk, dv, dbias)


The dbias parameter may be None in some cases, but _sanitize_tensors doesn't handle None values, which could cause a runtime error.

docs/v1.0.0_technical_report.md

Copilot · 2025-08-29T08:09:18Z

benchmarks/backward_equivalence.py

        (1, 2, 1, 2048, 2048, 32, True),
        (1, 2, 1, 2048, 2048, 32, False),
-        (1, 2, 1, 4096, 4096, 32, True),
+        (1, 2, 1, 4096, 4096, 32, True),            # some INF in dbias, Idk why


Replace informal abbreviation 'Idk' with proper text: 'I don't know' or 'unknown reason'.

Copilot · 2025-08-29T08:09:19Z

benchmarks/backward_equivalence.py

        (1, 2, 1, 128, 128, 64, True),
        (1, 2, 1, 128, 128, 64, False),
-        (1, 2, 1, 256, 256, 64, True),
+        (1, 2, 1, 256, 256, 64, True),              # some INF in dbias, Idk why


Replace informal abbreviation 'Idk' with proper text: 'I don't know' or 'unknown reason'.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

LoserCheems added 5 commits August 29, 2025 15:53

LoserCheems requested review from Evanwu1125, SNHuan, Thanksyy, Copilot and wubingheng111 and removed request for Copilot August 29, 2025 08:06

LoserCheems assigned LoserCheems, Copilot, Evanwu1125, SNHuan, Thanksyy and wubingheng111 Aug 29, 2025

LoserCheems requested review from Copilot and ftgreat August 29, 2025 08:07

LoserCheems assigned ftgreat Aug 29, 2025

Copilot AI reviewed Aug 29, 2025


LoserCheems and others added 2 commits August 29, 2025 16:09

Update flash_dmattn/flash_dmattn_interface.py

b14f098

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Update docs/v1.0.0_technical_report.md

779ef44

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

LoserCheems merged commit ce22f2d into main Aug 29, 2025

7 participants
