[Dev][AMD] Support LDS and Flash Attention for AMD Backend #247
Merged
This pull request updates the benchmarking scripts and the matrix multiplication and multi-head attention implementations, and extends mfma_macro_generator.py to support different thread binding layouts. The most important changes are the new benchmarking scripts, the thread binding layout support in mfma_macro_generator.py, and an updated 3rdparty/tvm submodule commit.

Benchmarking updates:
- benchmark/tilelang/benchmark.sh: added benchmarking commands for multiple matrix dimensions.
- benchmark/tilelang/benchmark_tilelang_matmul.py: new script that benchmarks matrix multiplication across a range of configurations.
- benchmark/tilelang/benchmark_tilelang_mha.py: new script that benchmarks multi-head attention across a range of configurations.
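For context, a minimal sketch of the kind of timing loop such a script typically wraps around a kernel. The shapes, dtype, and use of torch here are illustrative assumptions, not the actual script contents:

```python
# Illustrative matmul benchmark harness (assumed shapes/dtype; not the actual
# benchmark_tilelang_matmul.py). The "cuda" device also targets ROCm on HIP builds.
import torch

def bench_matmul(m, n, k, dtype=torch.float16, warmup=10, iters=100):
    a = torch.randn(m, k, dtype=dtype, device="cuda")
    b = torch.randn(k, n, dtype=dtype, device="cuda")
    for _ in range(warmup):
        torch.matmul(a, b)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matmul(a, b)
    end.record()
    torch.cuda.synchronize()
    ms = start.elapsed_time(end) / iters
    tflops = 2 * m * n * k / (ms * 1e-3) / 1e12
    print(f"{m}x{n}x{k}: {ms:.3f} ms, {tflops:.2f} TFLOPS")

for shape in [(1024, 1024, 1024), (4096, 4096, 4096)]:
    bench_matmul(*shape)
```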
Matrix multiplication and multi-head attention implementations:
- bitblas/tl/mfma_macro_generator.py: added support for different thread binding layouts by introducing an is_m_first flag and threading it through the relevant methods.
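To illustrate what such a flag controls, here is a hedged sketch of the two binding orders for a flat thread id laid out over an m-by-n tile. This is one plausible reading (m varying fastest vs. n varying fastest); the function name and tile sizes are assumptions, not the actual macro generator code:

```python
# Illustrative sketch of two thread binding layouts (not the actual
# mfma_macro_generator.py code; names and semantics are assumptions).

def thread_binding(thread_id: int, threads_m: int, threads_n: int, is_m_first: bool):
    """Map a flat thread id to (m, n) coordinates within a tile.

    is_m_first=True:  the m dimension varies fastest across consecutive threads,
    is_m_first=False: the n dimension varies fastest.
    """
    if is_m_first:
        return thread_id % threads_m, thread_id // threads_m
    return thread_id // threads_n, thread_id % threads_n

# With a 2x4 thread layout, thread 5 lands on different tile coordinates:
print(thread_binding(5, 2, 4, is_m_first=True))   # (1, 2)
print(thread_binding(5, 2, 4, is_m_first=False))  # (1, 1)
```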
Code simplification and cleanup:
- bitblas/tl/mfma_layout.py: removed an unused import and added new functions for the different thread binding layouts.
- bitblas/tl/utils.py: updated imports and modified the mfma_store_index_map function to use the new thread binding layout function.
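As background on what a store index map computes, the sketch below uses the commonly documented accumulator layout of AMD's mfma_f32_16x16x16 instruction, where a 64-lane wavefront holds a 16x16 tile with four values per lane. This is an assumption about the layout involved, not the actual mfma_store_index_map implementation:

```python
# Illustrative store index map for the mfma_f32_16x16x16 accumulator layout
# (an assumed layout for illustration; not the actual bitblas/tl/utils.py code).

def mfma_16x16_store_index_map(lane_id: int, vgpr_id: int):
    """Map (lane, register) in a 64-lane wavefront to (row, col) of the
    16x16 output tile: 16 lanes per row group, 4 rows per group."""
    row = (lane_id // 16) * 4 + vgpr_id
    col = lane_id % 16
    return row, col

# Each of the 64 lanes holds 4 accumulator values, covering all 256 elements.
covered = {mfma_16x16_store_index_map(l, r) for l in range(64) for r in range(4)}
assert len(covered) == 16 * 16
```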
Submodule update:
- 3rdparty/tvm: updated the submodule to a newer commit.