@LeiWang1999

This pull request includes several changes to the bitblas library, focusing on improving the matrix multiplication operations and adding new scheduling capabilities. The most important changes involve updates to propagation handling, scheduler conditions, and test configurations.

Propagation Handling:

Scheduler Conditions:

Scheduler Class:

Typing and Method Signatures:

Test Configurations:

The select_scheduler function in the dense/__init__.py module has been refactored to use a fine-grained interface. This change provides more flexibility and enables the implementation of high-performance kernels.

Update MatmulScheduler class in matmul_tensorcore.py

The MatmulScheduler class in the matmul_tensorcore.py module now calculates the number of threads from the block size and the warp size, so the thread count launched per block matches the warp configuration on NVIDIA GPUs.
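
A minimal sketch of the kind of calculation described above, assuming the block size is expressed as a grid of warps. The names `threads_per_block`, `block_row_warps` and `block_col_warps` are hypothetical and chosen for illustration; they are not taken from this PR's diff. The only fixed fact used is that a warp on NVIDIA GPUs has 32 threads.

```python
NVIDIA_WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def threads_per_block(block_row_warps: int,
                      block_col_warps: int,
                      warp_size: int = NVIDIA_WARP_SIZE) -> int:
    """Return the thread count for a block tiled as a grid of warps.

    Hypothetical helper: each warp owns one tile of the block, so the
    block needs warp_size threads for every warp in the grid.
    """
    if block_row_warps <= 0 or block_col_warps <= 0 or warp_size <= 0:
        raise ValueError("warp counts and warp size must be positive")
    return warp_size * block_row_warps * block_col_warps


if __name__ == "__main__":
    # A block split into 2 x 2 warps needs 4 warps of 32 threads each.
    print(threads_per_block(2, 2))
```

Deriving the thread count this way, rather than hard-coding it, keeps it a whole multiple of the warp size whenever the block tiling changes.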

Improve test_general_matmul_tilelang_kernel.py

The test_general_matmul_tilelang_kernel.py module now includes additional test cases and correctness assertions.
…_tilelang_kernel.py to use centered random values for input tensors
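
As an illustration of what "centered random values" can mean, the sketch below shifts uniform samples from [0, 1) to [-0.5, 0.5) so that the inputs average out near zero. It uses only the Python standard library and a hypothetical helper `centered_uniform`; the actual test code is not shown in this conversation and presumably operates on GPU tensors instead. One plausible motivation, stated here as an assumption, is that zero-mean inputs keep the accumulated dot products small, which makes low-precision comparisons against a reference less sensitive to rounding.

```python
import random


def centered_uniform(rows: int, cols: int, seed: int = 0) -> list[list[float]]:
    """Return a rows x cols matrix of samples drawn from [-0.5, 0.5).

    Hypothetical helper: subtracting 0.5 from a uniform [0, 1) sample
    centers the distribution on zero.
    """
    rng = random.Random(seed)  # seeded for reproducible test inputs
    return [[rng.random() - 0.5 for _ in range(cols)] for _ in range(rows)]


def matrix_mean(matrix: list[list[float]]) -> float:
    """Average of all entries in a non-empty matrix."""
    total = sum(sum(row) for row in matrix)
    count = sum(len(row) for row in matrix)
    return total / count


if __name__ == "__main__":
    a = centered_uniform(64, 64)
    # The mean of the centered samples should be close to zero.
    print(matrix_mean(a))
```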
@LeiWang1999 LeiWang1999 merged commit 988e782 into microsoft:main Oct 3, 2024