[Dev] Enhance Thread Sync Injector for Stream-K Implementation #166
This pull request includes several updates across multiple files to improve the installation process, update the documentation, and adjust configurations. The most important changes are updating the `3rdparty/tvm` submodule commit, revising the README and installation documentation, and refining a threshold in the `bitblas` analysis script.

Installation and Configuration Improvements:

* `install.sh`: Added a check that removes the `build` directory, if it exists, before creating a new one.
* `3rdparty/tvm`: Updated the submodule commit to the latest version.

Documentation Updates:

* `README.md`: Added instructions for installing the latest version of BitBLAS directly from the GitHub repository.
* `docs/Installation.md`: Added the same instructions for installing the latest BitBLAS version from GitHub.

Code Refinements:

* `bitblas/gpu/matmul_analysis.py`: Adjusted the `minimal_tensorize_threshold` based on the input data type.

Table Updates:

* `README.md`: Updated the `Accum_dtype` column in the support table to consistently use `FP32`.
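The description does not give the concrete values behind the dtype-dependent `minimal_tensorize_threshold` in `bitblas/gpu/matmul_analysis.py`. The following is a minimal Python sketch of the idea only, assuming a simple lookup table keyed by the input dtype string. The helpers `pick_minimal_tensorize_threshold` and `can_tensorize`, the table contents, and the rule "every loop extent must reach the threshold" are all hypothetical illustrations, not the BitBLAS implementation.

```python
# Hypothetical sketch: pick a minimal tensorization threshold from the input dtype.
# The numbers below are placeholders, NOT the values used in BitBLAS.

# Assumed mapping from input dtype to the smallest loop extent worth tensorizing.
THRESHOLD_BY_DTYPE = {
    "float16": 16,
    "bfloat16": 16,
    "int8": 32,  # assumption: 8-bit inputs need a larger extent
}
DEFAULT_THRESHOLD = 16  # assumed fallback for dtypes not listed above


def pick_minimal_tensorize_threshold(in_dtype: str) -> int:
    """Return the (hypothetical) minimal tensorize threshold for an input dtype."""
    return THRESHOLD_BY_DTYPE.get(in_dtype, DEFAULT_THRESHOLD)


def can_tensorize(extents, in_dtype: str) -> bool:
    """Hypothetical rule: tensorize only if every loop extent reaches the threshold."""
    threshold = pick_minimal_tensorize_threshold(in_dtype)
    return all(extent >= threshold for extent in extents)


if __name__ == "__main__":
    print(pick_minimal_tensorize_threshold("float16"))  # 16
    print(can_tensorize((128, 128, 16), "float16"))     # True
    print(can_tensorize((128, 128, 16), "int8"))        # False, since 16 < 32
```

Keeping the thresholds in a table rather than in nested conditionals makes it cheap to add entries for further input types; in the real code the decision may also depend on other factors not covered here.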