[Dev] Add support and test case for Ladder Weight only Transformation Matmul Operator #212

LeiWang1999 · 2024-10-02T16:21:41Z

This pull request includes several changes to the bitblas library, focusing on improving the matrix multiplication operations and adding new scheduling capabilities. The most important changes involve updates to propagation handling, scheduler conditions, and test configurations.

Propagation Handling:

bitblas/ops/general_matmul/__init__.py: Updated the propagation handling to use TransformKind.LDMatrixTransform for boolean propagation and added a TODO comment to check device compatibility for propagation.
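
The boolean handling described above can be sketched roughly as follows. This is an illustration, not the BitBLAS source: the `TransformKind` enum below is a local stand-in whose member names other than `LDMatrixTransform` and whose integer values are assumptions, `normalize_propagate` is a hypothetical helper name, and the mapping of `False` to "no transform" is assumed.

```python
from enum import IntEnum
from typing import Union


class TransformKind(IntEnum):
    # Local stand-in for the library's enum; names and values are assumed.
    NonTransform = 0
    InterWarpTransform = 1
    IntraWarpTransform = 2
    LDMatrixTransform = 3


def normalize_propagate(flag: Union[bool, int, TransformKind]) -> TransformKind:
    """Map a user-supplied propagation flag to a TransformKind (hypothetical helper)."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(flag, bool):
        return TransformKind.LDMatrixTransform if flag else TransformKind.NonTransform
    # Integers and enum members are passed through the enum constructor.
    return TransformKind(flag)
```

With this shape, callers can pass `propagate_b=True` as shorthand, while an explicit enum member still selects any other layout.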

Scheduler Conditions:

bitblas/ops/general_matmul/tilelang/dense/__init__.py: Split conditions for can_apply_fine_grain_scheduler and added a new can_apply_weight_propagation_scheduler function to handle specific propagation scenarios.
bitblas/ops/general_matmul/tilelang/dense/__init__.py: Updated error message in the scheduler to provide more context on unsupported configurations.
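
A rough sketch of how two separate predicates and a more descriptive error could fit together is shown below. Only the two predicate names and `select_scheduler` come from this PR; the `Config` fields, the specific conditions, the integer `3` standing for the LDMatrix transform, and the string return values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical stand-in for the operator configuration.
    trans_A: bool = False
    trans_B: bool = True
    propagate_a: int = 0  # 0 is assumed to mean "no layout transform"
    propagate_b: int = 0


def can_apply_fine_grain_scheduler(cfg: Config) -> bool:
    # Assumed conditions: NT layout and no propagation on either operand.
    conditions = [
        cfg.trans_A is False,
        cfg.trans_B is True,
        cfg.propagate_a == 0,
        cfg.propagate_b == 0,
    ]
    return all(conditions)


def can_apply_weight_propagation_scheduler(cfg: Config) -> bool:
    # Assumed conditions: same layout, but B carries the LDMatrix transform (3).
    conditions = [
        cfg.trans_A is False,
        cfg.trans_B is True,
        cfg.propagate_a == 0,
        cfg.propagate_b == 3,
    ]
    return all(conditions)


def select_scheduler(cfg: Config) -> str:
    if can_apply_weight_propagation_scheduler(cfg):
        return "weight_propagation"
    if can_apply_fine_grain_scheduler(cfg):
        return "fine_grain"
    # Include the offending configuration so the failure is easier to diagnose.
    raise ValueError(f"Unsupported configuration: {cfg}")
```

Listing each condition on its own line keeps the predicates easy to extend and makes it obvious which case a given configuration falls into.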

Scheduler Class:

bitblas/ops/general_matmul/tilelang/dense/matmul_tensorcore.py: Refactored MatmulWeightPropagationScheduler to inherit from MatmulFineGrainScheduler and removed redundant configuration methods.
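
The effect of this refactor can be illustrated with a toy hierarchy. The class and method names below are hypothetical stand-ins, not the real scheduler API; the point is only that configuration logic is defined once in the parent and the subclass overrides just what differs.

```python
from dataclasses import dataclass


@dataclass
class FineGrainSchedulerSketch:
    # Hypothetical stand-in for the parent scheduler.
    M: int
    N: int
    K: int
    block_row_warps: int = 1
    block_col_warps: int = 1

    def with_default_config(self):
        # Shared configuration logic lives in the base class only.
        self.block_row_warps = 2
        self.block_col_warps = 2
        return self

    def describe_weight_layout(self) -> str:
        return "plain B"


@dataclass
class WeightPropagationSchedulerSketch(FineGrainSchedulerSketch):
    # No configuration methods are redefined here; only the layout differs.
    def describe_weight_layout(self) -> str:
        return "pre-transformed B"
```

Because the subclass inherits `with_default_config`, a change to the default tiling needs to be made in one place only.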

Typing and Method Signatures:

bitblas/tl/base_hint.py: Added type hints and converted from_roller_hint to a class method.
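
One reason to make a constructor-style helper a class method is that subclasses then get instances of their own type without overriding it. A minimal sketch follows, assuming a dictionary-shaped hint; the class names and the `"block"` key are hypothetical, and only the method name `from_roller_hint` is taken from the PR.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class HintSketch:
    # Hypothetical stand-in for a hint base class.
    block_m: int
    block_n: int

    @classmethod
    def from_roller_hint(cls, hint: Dict[str, Any]) -> "HintSketch":
        # `cls` is whichever class the method was called on.
        block_m, block_n = hint["block"]
        return cls(block_m=block_m, block_n=block_n)


@dataclass
class TensorCoreHintSketch(HintSketch):
    # Inherits from_roller_hint unchanged.
    pass
```

Calling `TensorCoreHintSketch.from_roller_hint(...)` returns a `TensorCoreHintSketch`, which would not be the case if the helper hard-coded the base class.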

Test Configurations:

testing/python/operators/test_general_matmul_ops_backend_tl.py: Updated the matmul_finetune function and tests to include the propagate_b parameter for more flexible testing.


The select_scheduler function in the dense/__init__.py module has been refactored to use a fine-grained interface. This change provides more flexibility and enables the implementation of high-performance kernels.

Update MatmulScheduler class in matmul_tensorcore.py

The MatmulScheduler class in the matmul_tensorcore.py module has been updated to calculate the number of threads based on the block size and warp size. This ensures optimal GPU warp configuration for NVIDIA GPUs.

Improve test_general_matmul_tilelang_kernel.py

The test_general_matmul_tilelang_kernel.py module has been improved to include additional test cases and assertions for correctness.
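
As a rough illustration of deriving a thread count from warp tiling, the sketch below assumes the block is described by a number of warps along rows and columns. The function and parameter names are hypothetical; the warp size of 32 is the standard value on NVIDIA GPUs.

```python
def num_threads(block_row_warps: int, block_col_warps: int, warp_size: int = 32) -> int:
    """Threads per block for a tile covered by a grid of warps (illustrative only)."""
    # Each warp contributes warp_size threads; the block holds rows * cols warps.
    return warp_size * block_row_warps * block_col_warps


# Example: a 2 x 2 arrangement of warps gives 128 threads per block.
threads = num_threads(2, 2)
```

Computing the value from the tiling avoids hard-coding a thread count that can drift out of sync with the block configuration.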


LeiWang1999 added 30 commits September 28, 2024 07:43

Refactor tilelang dequantize module and add matmul_blocked_weight_only function

f3b1eb9

remove un-implemented code.

730d13e

Implement BaseScheduler to wrap some related items.

8047ee7

lint fix

64db065

test skip

cef04a8

Refactor tilelang dequantize module and add matmul_blocked_weight_only function

f1652e9

Merge branch 'main' of https://github.com/microsoft/BitBLAS into tl_ops_dynamic

4f6c545

test fix

c485b68

hardware tuning demo

ebe42a6

Merge branch 'main' of https://github.com/microsoft/BitBLAS into tl_ops_dynamic

88230ec

remove debug related items.

44246a1

implement tuner and cache fix

bb51e15

Merge branch 'main' of https://github.com/microsoft/BitBLAS into tl_ops_dynamic

f42a3b9

lint fix

de7ae18

test case fix.

ef40bd8

Adapt Tuning Space generation with Roller

85f0a5f

Merge branch 'main' of https://github.com/microsoft/BitBLAS into tl_ops_dynamic

e9f7db3

lint fix

9e31336

Refactor select_scheduler function for fine-grained interface

f1378d4

Refactor NotImplementedError message in BaseTLHint class

137cce3

Update submodule reference in 3rdparty/tvm

fc19fa2

Refactor matmul_finetune function to use topk=20 for hardware-aware finetuning

fe51bb1

Refactor submodule reference in 3rdparty/tvm

79878cb

lint fix

0fc7ab9

Refactor test_general_matmul_tilelang_impl.py and test_tilelang_gemm.py

255e925

Refactor MatmulConfig to enable weight propagation on supported devices

df47f63

Merge branch 'main' of https://github.com/microsoft/BitBLAS into tl_ops_dynamic

826255d

Refactor test_general_matmul_tilelang_impl.py and test_general_matmul_tilelang_kernel.py to use centered random values for input tensors

48dc94e

test fix

82f39d7

LeiWang1999 added 9 commits October 2, 2024 18:27

Merge branch 'main' of https://github.com/microsoft/BitBLAS into tl_ops_dynamic

02ef258

test fix

e753ef2

Refactor flash attention tests to use centered random values for input tensors

f6dd744

Refactor flash attention tests to use centered random values for input tensors

7417372

Refactor flash attention tests to skip test if flash_attn is not installed

145a850

lint fix

3384458

test fix

82f50ea

test fix

d2ed936

test fix

6c56273

LeiWang1999 merged commit 988e782 into microsoft:main Oct 3, 2024
