@LoserCheems
Collaborator

Introduce a tensor sanitization function to prevent NaN and infinite values, enhancing numerical stability in computations. Organize and disable unsupported test configurations for large head dimensions on sm89 GPUs, ensuring clarity and consistency in benchmarks. Update memory layout handling in backward performance benchmarks and provide comprehensive technical documentation for the major release.

Introduces a sanitization function that replaces NaN and infinite values with specified defaults (0.0) and applies it to output tensors in forward and backward passes.

Prevents potential numerical instabilities that could propagate through the attention computation pipeline.
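
As an illustration of the sanitization described above, here is a minimal sketch, assuming the replacement behaves like PyTorch's `torch.nan_to_num` with all three defaults set to 0.0. The name `sanitize` is hypothetical; the actual helper in `flash_dmattn_interface.py` may differ, for example by modifying tensors in place.

```python
import torch


def sanitize(t: torch.Tensor, value: float = 0.0) -> torch.Tensor:
    """Return a copy of `t` with NaN, +inf and -inf replaced by `value`.

    Hypothetical stand-in for the PR's helper, not its actual code.
    """
    return torch.nan_to_num(t, nan=value, posinf=value, neginf=value)


out = torch.tensor([1.0, float("nan"), float("inf"), float("-inf")])
clean = sanitize(out)  # finite entries are left unchanged
```

Setting `posinf` and `neginf` explicitly matters here: when they are left as `None`, `torch.nan_to_num` maps infinities to the largest and smallest finite values of the dtype rather than to zero.
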
Comments out test configurations with head dimensions 128 and 256 due to insufficient shared memory on sm89 architecture when using the splitkv branch by default.

Adds organizational comments to group test configurations by head dimension for better readability.

Improves test configuration organization by grouping test cases by head dimension and adding clarifying comments.

Disables the head dimension 128 and 256 test configurations due to current limitations: the splitkv branch does not support head_dim>=128, and head_dim=256 additionally lacks sufficient shared memory for backward operations.

Adds inline comments identifying specific configurations that produce INF values in dbias calculations.

Adds contiguous() calls after tensor transpose operations to ensure proper memory layout for CUDA functions, preventing potential memory access errors.

Corrects benchmark configuration parameter from 4096 to 1024 for consistency with other test cases.
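
The effect of the contiguous() calls mentioned above can be seen in a small sketch. The shapes are hypothetical and chosen only for illustration; they are not taken from the benchmark.

```python
import torch

# Hypothetical layout: (batch, seqlen, heads, head_dim)
x = torch.randn(1, 128, 2, 32)

# transpose() returns a view with permuted strides; no data is moved.
xt = x.transpose(1, 2)  # shape (1, 2, 128, 32), non-contiguous

# contiguous() copies the data into row-major order for the new shape.
xc = xt.contiguous()

print(xt.is_contiguous(), xc.is_contiguous())
print(xt.stride(), xc.stride())
```

The two tensors hold the same values, but only `xc` has the dense row-major strides that a kernel indexing raw memory would typically assume.
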
Documents the unified block-level dynamic mask skip logic for both forward and backward passes, establishing the technical foundation for the major release.

Covers architectural overview, algorithm pseudocode, mathematical derivations, API specifications, memory management strategies, and performance optimization techniques.

Provides detailed explanations of sparsity handling, numerical stability measures, and future enhancement roadmap to support ongoing development and user adoption.
Contributor

Copilot AI left a comment

Pull Request Overview

This PR introduces tensor sanitization to enhance numerical stability, organizes test configurations for GPU compatibility, and provides comprehensive technical documentation for the v1.0.0 release. The changes focus on preventing NaN and infinite values in computations while improving code organization and documentation.

  • Add tensor sanitization function to replace NaN/infinite values with specified defaults
  • Organize and disable unsupported test configurations for large head dimensions on sm89 GPUs
  • Improve memory layout handling in backward performance benchmarks with explicit contiguous operations
  • Provide comprehensive technical documentation covering architecture, algorithms, and API details

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| flash_dmattn/flash_dmattn_interface.py | Adds tensor sanitization function and applies it to forward/backward pass outputs |
| docs/v1.0.0_technical_report.md | Comprehensive technical documentation covering architecture, algorithms, and v1.0.0 features |
| benchmarks/forward_equivalence.py | Organizes test configurations and disables unsupported head dimensions for sm89 GPUs |
| benchmarks/backward_equivalence.py | Similar test configuration organization with additional comments on known issues |
| benchmarks/backward_performance.py | Improves memory layout handling and adjusts test configuration |

softcap,
deterministic,
)
_sanitize_tensors(dq, dk, dv, dbias)
Copilot AI Aug 29, 2025

The dbias parameter may be None in some cases, but _sanitize_tensors doesn't handle None values, which could cause a runtime error.
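
One way to address this is to skip `None` entries. The sketch below assumes the helper takes tensors as positional arguments and rewrites them in place; `sanitize_optional` is a hypothetical name, not the function in this PR.

```python
from typing import Optional

import torch


def sanitize_optional(*tensors: Optional[torch.Tensor]) -> None:
    """Replace NaN/inf with 0.0 in place, ignoring arguments that are None.

    Hypothetical sketch; the PR's `_sanitize_tensors` may be structured differently.
    """
    for t in tensors:
        if t is None:
            continue  # e.g. dbias when no bias gradient is produced
        t.nan_to_num_(nan=0.0, posinf=0.0, neginf=0.0)


dq = torch.tensor([float("nan"), 2.0])
dbias = None
sanitize_optional(dq, dbias)  # does not raise on the None argument
```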

softcap,
deterministic,
)
_sanitize_tensors(dq, dk, dv, dbias)
Copilot AI Aug 29, 2025

The dbias parameter may be None in some cases, but _sanitize_tensors doesn't handle None values, which could cause a runtime error.

(1, 2, 1, 2048, 2048, 32, True),
(1, 2, 1, 2048, 2048, 32, False),
(1, 2, 1, 4096, 4096, 32, True),
(1, 2, 1, 4096, 4096, 32, True), # some INF in dbias, Idk why
Copilot AI Aug 29, 2025

Replace informal abbreviation 'Idk' with proper text: 'I don't know' or 'unknown reason'.

(1, 2, 1, 128, 128, 64, True),
(1, 2, 1, 128, 128, 64, False),
(1, 2, 1, 256, 256, 64, True),
(1, 2, 1, 256, 256, 64, True), # some INF in dbias, Idk why
Copilot AI Aug 29, 2025

Replace informal abbreviation 'Idk' with proper text: 'I don't know' or 'unknown reason'.

LoserCheems and others added 2 commits August 29, 2025 16:09
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@LoserCheems LoserCheems merged commit ce22f2d into main Aug 29, 2025