Enables comprehensive benchmark configurations #66
Uncomments all test configurations to enable comprehensive performance benchmarking across multiple dimensions including sequence length, batch size, head count, and head dimension variations. Updates sequence lengths in batch size, head count, and head dimension tests from 1024 to 4096 for more realistic testing scenarios. Activates non-causal attention testing with updated configuration. Removes duplicate num_runs assignment to eliminate redundancy.
Pull Request Overview
This PR enables comprehensive performance benchmarking by uncommenting and updating all test configurations, adjusting sequence lengths for more realistic scenarios, activating non-causal attention tests, and removing a redundant num_runs assignment.
- Uncomments and expands the configs list to cover sequence length, inference, batch size, head count, and head dimension variations.
- Updates sequence lengths from 1024 to 4096 for batch size, head count, and head dimension tests.
- Activates the non-causal attention test with the updated configuration and removes the duplicate num_runs reassignment.
configs = [
    # # Vary sequence length
    # (1, 2, 1, 256, 256, 128, 2048, True),
    # (1, 2, 1, 512, 512, 128, 2048, True),
    # (1, 2, 1, 1024, 1024, 128, 2048, True),
    # (1, 2, 1, 2048, 2048, 128, 2048, True),
    # (1, 2, 1, 4096, 4096, 128, 2048, True),
    # (1, 2, 1, 8192, 8192, 128, 2048, True),
    # (1, 2, 1, 16384, 16384, 128, 2048, True),
    # (1, 2, 1, 32768, 32768, 128, 2048, True),

    # # Inference
    # (1, 2, 1, 2, 256, 128, 2048, True),
    # (1, 2, 1, 2, 512, 128, 2048, True),
    # (1, 2, 1, 2, 1024, 128, 2048, True),
    # (1, 2, 1, 2, 2048, 128, 2048, True),
    # (1, 2, 1, 2, 4096, 128, 2048, True),
    # (1, 2, 1, 2, 8192, 128, 2048, True),
    # (1, 2, 1, 2, 16384, 128, 2048, True),
    # (1, 2, 1, 2, 32768, 128, 2048, True),
    # (1, 2, 1, 2, 65536, 128, 2048, True),
    # (1, 2, 1, 2, 131072, 128, 2048, True),
    # (1, 2, 1, 2, 262144, 128, 2048, True),
    # (1, 2, 1, 2, 524288, 128, 2048, True),

    # # Vary batch size
    # (1, 2, 1, 1024, 1024, 32, 2048, True),
    # (2, 2, 1, 1024, 1024, 32, 2048, True),
    # (4, 2, 1, 1024, 1024, 32, 2048, True),
    # (8, 2, 1, 1024, 1024, 32, 2048, True),

    # # Vary head count
    # (1, 1, 1, 1024, 1024, 32, 2048, True),
    # (1, 2, 1, 1024, 1024, 32, 2048, True),
    # (1, 4, 1, 1024, 1024, 32, 2048, True),
    # (1, 8, 2, 1024, 1024, 32, 2048, True),

    # # Vary head dimension
    # (1, 2, 1, 1024, 1024, 32, 2048, True),
    # (1, 2, 1, 1024, 1024, 64, 2048, True),
    # (1, 2, 1, 1024, 1024, 96, 2048, True),
    # (1, 2, 1, 1024, 1024, 128, 2048, True),
    # (1, 2, 1, 1024, 1024, 192, 2048, True),
    # (1, 2, 1, 1024, 1024, 256, 2048, True),
    # Vary sequence length
    (1, 2, 1, 256, 256, 128, 2048, True),
    (1, 2, 1, 512, 512, 128, 2048, True),
    (1, 2, 1, 1024, 1024, 128, 2048, True),
    (1, 2, 1, 2048, 2048, 128, 2048, True),
    (1, 2, 1, 4096, 4096, 128, 2048, True),
    (1, 2, 1, 8192, 8192, 128, 2048, True),
    (1, 2, 1, 16384, 16384, 128, 2048, True),
    (1, 2, 1, 32768, 32768, 128, 2048, True),

    # Inference
    (1, 2, 1, 2, 256, 128, 2048, True),
    (1, 2, 1, 2, 512, 128, 2048, True),
    (1, 2, 1, 2, 1024, 128, 2048, True),
    (1, 2, 1, 2, 2048, 128, 2048, True),
    (1, 2, 1, 2, 4096, 128, 2048, True),
    (1, 2, 1, 2, 8192, 128, 2048, True),
Copilot AI, Jul 10, 2025
[nitpick] The manually unrolled configs list is quite large and repetitive. Consider generating these configurations programmatically using loops or list comprehensions to improve readability and reduce duplication.
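One way the suggestion above could look is sketched below. It is only an illustration, not the repository's code: the helper `make_config`, its keyword names, and the constant `SEVENTH_FIELD` are hypothetical, and the field meanings are inferred from the section comments in the diff (the seventh tuple field is never named in this excerpt, so it is kept as an opaque constant). The sketch rebuilds the sequence-length and inference sweeps from the list above.

```python
# Field order is an assumption inferred from the diff's section comments:
# (batch, heads, kv_heads, q_len, kv_len, head_dim, <unnamed field>, is_causal)
SEVENTH_FIELD = 2048  # meaning not stated in the excerpt; kept constant


def make_config(batch=1, heads=2, kv_heads=1, q_len=4096, kv_len=None,
                head_dim=128, is_causal=True):
    """Build one benchmark tuple; kv_len defaults to q_len."""
    kv_len = q_len if kv_len is None else kv_len
    return (batch, heads, kv_heads, q_len, kv_len, head_dim,
            SEVENTH_FIELD, is_causal)


configs = [
    # Vary sequence length: 256, 512, ..., 32768 (powers of two)
    *[make_config(q_len=256 << i) for i in range(8)],
    # Inference: query length 2, key length 256, 512, ..., 524288
    *[make_config(q_len=2, kv_len=256 << i) for i in range(12)],
]
```

The batch size, head count, and head dimension sweeps could be generated the same way by looping over the one field that varies in each section.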
    # Test non-causal
    (1, 2, 1, 4096, 4096, 128, 2048, False),
Copilot AI, Jul 10, 2025
[nitpick] Instead of hardcoding a single non-causal test tuple, you could extend your programmatic generation approach (e.g., include is_causal as a parameter) so adding or modifying test modes scales more cleanly.
    # Test non-causal
    (1, 2, 1, 4096, 4096, 128, 2048, False),
    # Generate configurations for both causal and non-causal cases
    *[(1, 2, 1, 4096, 4096, 128, 2048, is_causal) for is_causal in [True, False]],
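A point that may be worth checking with this suggestion: the causal tuple it generates, (1, 2, 1, 4096, 4096, 128, 2048, True), already appears in the sequence-length sweep, so that case would be benchmarked twice. The sketch below shows one possible way to combine the lists without the repeat. The names `seq_sweep` and `both_modes` are hypothetical, and the sweep is shortened to three lengths for illustration.

```python
# Shortened stand-in for the causal sequence-length sweep (illustrative only).
seq_sweep = [(1, 2, 1, n, n, 128, 2048, True) for n in (2048, 4096, 8192)]

# The reviewer's comprehension: one tuple per causal mode at length 4096.
both_modes = [(1, 2, 1, 4096, 4096, 128, 2048, is_causal)
              for is_causal in (True, False)]

# dict.fromkeys keeps the first occurrence of each tuple and preserves order,
# so the repeated causal 4096 entry is dropped.
configs = list(dict.fromkeys(seq_sweep + both_modes))
```

If running the same configuration twice is intended (for example as a warm-up comparison), the deduplication step can simply be left out.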