Refactor CUDA interface for improved usability #80

LoserCheems · 2025-07-30T13:25:45Z

Simplify the CUDA interface by replacing direct module imports with a higher-level function that uses named parameters. Enhance usability by removing manual tensor preparation and adding runtime checks for CUDA availability before executing tests.

Replaces the direct CUDA module import with a higher-level interface function that provides a cleaner API with named parameters. Simplifies the function call by removing manual tensor preparation and using more intuitive parameter names such as dropout_p and softmax_scale. Adds a runtime check that the CUDA implementation is available before executing tests.
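For readers unfamiliar with the pattern, here is a minimal sketch of a keyword-only wrapper over a positional low-level call. Both function names are hypothetical stand-ins, not the real flash_dmattn_cuda binding or flash_dmattn_func, whose signatures are not shown in this thread. Only the parameter names dropout_p and softmax_scale come from the description; the default scale of 1/sqrt(head_dim) is an assumption based on common attention conventions.

```python
import math


def _positional_forward(q, k, v, dropout_p, softmax_scale):
    """Hypothetical stand-in for a low-level binding that takes positional
    arguments. It only records the scalar options it received."""
    return {"dropout_p": dropout_p, "softmax_scale": softmax_scale}


def attention_func(q, k, v, *, dropout_p=0.0, softmax_scale=None):
    """Hypothetical higher-level wrapper. Options after `*` are keyword-only,
    so callers cannot silently swap dropout_p and softmax_scale."""
    if softmax_scale is None:
        # Assumed default: 1 / sqrt(head_dim), with head_dim taken from q's rows.
        head_dim = len(q[0])
        softmax_scale = 1.0 / math.sqrt(head_dim)
    return _positional_forward(q, k, v, dropout_p, softmax_scale)


# Toy inputs: one row with head_dim = 16 (plain lists instead of tensors).
row = [[0.0] * 16]
result = attention_func(row, row, row, dropout_p=0.1)
```

Passing the options positionally, as in attention_func(row, row, row, 0.1), raises a TypeError in this sketch, which is the main safety benefit of named parameters.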

Copilot

Pull Request Overview

Refactor the CUDA interface to use a higher-level function with named parameters instead of direct module imports. The changes simplify tensor preparation by removing manual contiguous() calls and improve error handling with runtime checks for CUDA availability.

Replace the direct flash_dmattn_cuda module import with flash_dmattn_func from the interface module
Simplify the CUDA function call by using named parameters instead of positional arguments
Add runtime checks for CUDA availability before executing tests
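The runtime availability check could look something like the sketch below. It is an illustration only: the helper name is hypothetical, and the PR's actual check is not shown in this thread. The sketch uses the standard-library importlib.util.find_spec to test whether a top-level module can be imported without importing it.

```python
import importlib.util


def implementation_available(module_name):
    """Return True if a top-level module with this name can be found.

    Hypothetical helper: find_spec returns None when the module is absent,
    so no import side effects or exceptions occur for top-level names.
    """
    return importlib.util.find_spec(module_name) is not None


def run_cuda_tests(module_name="flash_dmattn_cuda"):
    """Skip the tests with a message instead of crashing when the compiled
    extension is missing."""
    if not implementation_available(module_name):
        return f"skipped: {module_name} is not installed"
    return "ran"


status_missing = run_cuda_tests("module_that_should_not_exist_12345")
status_present = run_cuda_tests("json")  # stdlib module used as a stand-in
```

A benchmark script can call such a check once at startup and report the skip reason, rather than failing with an ImportError partway through.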

Copilot · 2025-07-30T13:26:22Z

benchmarks/benchmark_forward_equivalence.py

+    query_states = query_states.transpose(1, 2)     # [batch, query_len, num_heads, head_dim]
+    key_states = key_states.transpose(1, 2)         # [batch, key_len, num_kv_heads, head_dim]
+    value_states = value_states.transpose(1, 2)     # [batch, key_len, num_kv_heads, head_dim]


Removing the .contiguous() calls may cause performance issues if the tensors are not contiguous in memory, since CUDA kernels typically require contiguous tensors for optimal performance. Consider adding .contiguous() back, or verify that the new interface handles non-contiguous tensors efficiently.

Suggested change

-    query_states = query_states.transpose(1, 2)     # [batch, query_len, num_heads, head_dim]
-    key_states = key_states.transpose(1, 2)         # [batch, key_len, num_kv_heads, head_dim]
-    value_states = value_states.transpose(1, 2)     # [batch, key_len, num_kv_heads, head_dim]
+    query_states = query_states.transpose(1, 2).contiguous()     # [batch, query_len, num_heads, head_dim]
+    key_states = key_states.transpose(1, 2).contiguous()         # [batch, key_len, num_kv_heads, head_dim]
+    value_states = value_states.transpose(1, 2).contiguous()     # [batch, key_len, num_kv_heads, head_dim]
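To make the contiguity concern concrete, here is a pure-Python sketch of row-major strides, assuming the usual strided-tensor model in which transpose(1, 2) swaps two shape and stride entries without moving data. The sizes are hypothetical, and the contiguity test is simplified (it ignores the special handling of size-1 dimensions that real tensor libraries apply). Whether the new interface needs fully contiguous inputs is not stated in this thread.

```python
def row_major_strides(shape):
    """Element strides of a freshly allocated row-major tensor."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def transpose(shape, strides, a, b):
    """Swap two dimensions the way a strided view does: no data is copied."""
    shape, strides = list(shape), list(strides)
    shape[a], shape[b] = shape[b], shape[a]
    strides[a], strides[b] = strides[b], strides[a]
    return tuple(shape), tuple(strides)


def is_contiguous(shape, strides):
    """Simplified check: strides must equal the row-major strides of shape."""
    return strides == row_major_strides(shape)


# Hypothetical sizes: [batch, num_heads, query_len, head_dim]
shape = (2, 4, 8, 16)
strides = row_major_strides(shape)

# After transpose(1, 2): [batch, query_len, num_heads, head_dim]
t_shape, t_strides = transpose(shape, strides, 1, 2)

# What .contiguous() would produce: same shape, fresh row-major strides.
c_strides = row_major_strides(t_shape)
```

In this sketch the transposed view is no longer contiguous, but its last-dimension stride is still 1. Some kernels only need that weaker property, so it is worth checking what the interface actually requires before restoring the copies.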

Copilot · 2025-07-30T13:26:23Z

benchmarks/benchmark_forward_equivalence.py

-    attn_mask = attn_mask.contiguous()                          # [batch, num_kv_heads, query_len, key_len]
+    query_states = query_states.transpose(1, 2)     # [batch, query_len, num_heads, head_dim]
+    key_states = key_states.transpose(1, 2)         # [batch, key_len, num_kv_heads, head_dim]
+    value_states = value_states.transpose(1, 2)     # [batch, key_len, num_kv_heads, head_dim]


The zoh_states calculation and expansion logic has been removed, but the calculate_zoh_states function is still called earlier in the function. This may cause the attention mechanism to work incorrectly, as zoh_states appears to be required for the dynamic mask attention computation.

LoserCheems requested review from Evanwu1125, SNHuan, Copilot and wubingheng111 and removed request for Copilot July 30, 2025 13:25

LoserCheems assigned LoserCheems, Copilot, Evanwu1125, SNHuan and wubingheng111 Jul 30, 2025

Copilot AI reviewed Jul 30, 2025

LoserCheems merged commit 4e93039 into main Jul 30, 2025

5 participants
