@LoserCheems
Collaborator

Uncomments all test configurations to enable comprehensive performance benchmarking across multiple dimensions including sequence length, batch size, head count, and head dimension variations.

Updates sequence lengths in batch size, head count, and head dimension tests from 1024 to 4096 for more realistic testing scenarios.

Activates non-causal attention testing with updated configuration.

Removes duplicate num_runs assignment to eliminate redundancy.

Contributor

Copilot AI left a comment

Pull Request Overview

This PR enables comprehensive performance benchmarking by uncommenting and updating all test configurations, adjusting sequence lengths for more realistic scenarios, activating non-causal attention tests, and removing a redundant num_runs assignment.

  • Uncomments and expands the configs list to cover sequence length, inference, batch size, head count, and head dimension variations.
  • Updates sequence lengths from 1024 to 4096 for batch size, head count, and head dimension tests.
  • Activates the non-causal attention test with the updated configuration and removes the duplicate num_runs reassignment.

Comment on lines 737 to +754
configs = [
# # Vary sequence length
# (1, 2, 1, 256, 256, 128, 2048, True),
# (1, 2, 1, 512, 512, 128, 2048, True),
# (1, 2, 1, 1024, 1024, 128, 2048, True),
# (1, 2, 1, 2048, 2048, 128, 2048, True),
# (1, 2, 1, 4096, 4096, 128, 2048, True),
# (1, 2, 1, 8192, 8192, 128, 2048, True),
# (1, 2, 1, 16384, 16384, 128, 2048, True),
# (1, 2, 1, 32768, 32768, 128, 2048, True),

# # Inference
# (1, 2, 1, 2, 256, 128, 2048, True),
# (1, 2, 1, 2, 512, 128, 2048, True),
# (1, 2, 1, 2, 1024, 128, 2048, True),
# (1, 2, 1, 2, 2048, 128, 2048, True),
# (1, 2, 1, 2, 4096, 128, 2048, True),
# (1, 2, 1, 2, 8192, 128, 2048, True),
# (1, 2, 1, 2, 16384, 128, 2048, True),
# (1, 2, 1, 2, 32768, 128, 2048, True),
# (1, 2, 1, 2, 65536, 128, 2048, True),
# (1, 2, 1, 2, 131072, 128, 2048, True),
# (1, 2, 1, 2, 262144, 128, 2048, True),
# (1, 2, 1, 2, 524288, 128, 2048, True),

# # Vary batch size
# (1, 2, 1, 1024, 1024, 32, 2048, True),
# (2, 2, 1, 1024, 1024, 32, 2048, True),
# (4, 2, 1, 1024, 1024, 32, 2048, True),
# (8, 2, 1, 1024, 1024, 32, 2048, True),

# # Vary head count
# (1, 1, 1, 1024, 1024, 32, 2048, True),
# (1, 2, 1, 1024, 1024, 32, 2048, True),
# (1, 4, 1, 1024, 1024, 32, 2048, True),
# (1, 8, 2, 1024, 1024, 32, 2048, True),

# # Vary head dimension
# (1, 2, 1, 1024, 1024, 32, 2048, True),
# (1, 2, 1, 1024, 1024, 64, 2048, True),
# (1, 2, 1, 1024, 1024, 96, 2048, True),
# (1, 2, 1, 1024, 1024, 128, 2048, True),
# (1, 2, 1, 1024, 1024, 192, 2048, True),
# (1, 2, 1, 1024, 1024, 256, 2048, True),
# Vary sequence length
(1, 2, 1, 256, 256, 128, 2048, True),
(1, 2, 1, 512, 512, 128, 2048, True),
(1, 2, 1, 1024, 1024, 128, 2048, True),
(1, 2, 1, 2048, 2048, 128, 2048, True),
(1, 2, 1, 4096, 4096, 128, 2048, True),
(1, 2, 1, 8192, 8192, 128, 2048, True),
(1, 2, 1, 16384, 16384, 128, 2048, True),
(1, 2, 1, 32768, 32768, 128, 2048, True),

# Inference
(1, 2, 1, 2, 256, 128, 2048, True),
(1, 2, 1, 2, 512, 128, 2048, True),
(1, 2, 1, 2, 1024, 128, 2048, True),
(1, 2, 1, 2, 2048, 128, 2048, True),
(1, 2, 1, 2, 4096, 128, 2048, True),
(1, 2, 1, 2, 8192, 128, 2048, True),

Copilot AI Jul 10, 2025

[nitpick] The manually unrolled configs list is quite large and repetitive. Consider generating these configurations programmatically using loops or list comprehensions to improve readability and reduce duplication.
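A minimal sketch of what that could look like for the two sweeps visible in the diff above. The field names (`batch`, `heads`, `kv_heads`, `q_len`, `kv_len`, `head_dim`, `window`, `is_causal`) and the `make_config` helper are hypothetical: they are inferred from which tuple positions change under each comment heading, not taken from the benchmark script, and the meaning of the constant `2048` is an assumption.

```python
# Assumed field order, inferred from the tuples in the diff (names are hypothetical):
# (batch, heads, kv_heads, q_len, kv_len, head_dim, window, is_causal)

def make_config(batch=1, heads=2, kv_heads=1, q_len=4096, kv_len=4096,
                head_dim=128, window=2048, is_causal=True):
    """Build one config tuple; the defaults mirror the values repeated in the diff."""
    return (batch, heads, kv_heads, q_len, kv_len, head_dim, window, is_causal)

configs = [
    # Vary sequence length: 256 through 32768, query and key lengths equal
    *[make_config(q_len=2**p, kv_len=2**p) for p in range(8, 16)],
    # Inference: query length 2 against key lengths 256 through 524288
    *[make_config(q_len=2, kv_len=2**p) for p in range(8, 20)],
]
```

Keeping each sweep as one comprehension leaves the section comments in place while making the swept range explicit. The batch size, head count, and head dimension sweeps could follow the same pattern by varying the corresponding keyword.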

Comment on lines +795 to +796
# Test non-causal
(1, 2, 1, 4096, 4096, 128, 2048, False),

Copilot AI Jul 10, 2025

[nitpick] Instead of hardcoding a single non-causal test tuple, you could extend your programmatic generation approach (e.g., include is_causal as a parameter) so adding or modifying test modes scales more cleanly.

Suggested change
# Test non-causal
(1, 2, 1, 4096, 4096, 128, 2048, False),
# Generate configurations for both causal and non-causal cases
*[(1, 2, 1, 4096, 4096, 128, 2048, is_causal) for is_causal in [True, False]],
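One thing to watch with this suggestion: the causal tuple `(1, 2, 1, 4096, 4096, 128, 2048, True)` already appears in the sequence-length sweep, so expanding over both modes would benchmark it twice. A hedged sketch of one way to drop repeats while keeping order follows; the `sweep` and `modes` lists are illustrative excerpts, not the script's actual variables.

```python
# Illustrative excerpt: the sweep and the suggested expansion both yield the causal 4096 case.
sweep = [(1, 2, 1, n, n, 128, 2048, True) for n in (2048, 4096, 8192)]
modes = [(1, 2, 1, 4096, 4096, 128, 2048, is_causal) for is_causal in [True, False]]

# dict.fromkeys keeps the first occurrence of each tuple in insertion order,
# so the repeated causal config is dropped and the rest stay where they were.
configs = list(dict.fromkeys(sweep + modes))
```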

@LoserCheems LoserCheems merged commit 09f3b82 into main Jul 10, 2025