[Dev] Refactor the weight transformation to support upcoming stage3 transform #130

LeiWang1999 · 2024-08-05T11:22:43Z

PRs #110 and #114 enabled warp memory dequantization through the Ladder Transform Stage3. However, that transformation operates at 32-bit granularity, which makes it hard to apply when the compressed weights are 2-bit or 1-bit. The transformation should therefore be applied before weight compression.

This pull request introduces a new transform pipeline that uses the TIR version of weight compression (see PR #126) instead of the CPU NumPy simulated version.
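To illustrate why the ordering matters, here is a minimal NumPy sketch. It is not the TIR implementation from this PR; the function names `compress` and `decompress`, the packing layout (low bits first, `uint8` storage), and the random permutation standing in for the layout transform are all assumptions made for illustration. The point is that a permutation applied to individual elements before packing works the same way for 1-bit, 2-bit, and 4-bit weights, whereas a permutation applied to already-packed storage words can only move groups of elements together.

```python
import numpy as np


def compress(weight: np.ndarray, bits: int) -> np.ndarray:
    """Pack `bits`-wide unsigned values along the last axis into uint8 storage.

    Hypothetical helper: low-order bits hold the first element of each group.
    """
    elems_per_byte = 8 // bits
    n, k = weight.shape
    assert k % elems_per_byte == 0, "K must be a multiple of the packing factor"
    grouped = weight.astype(np.uint8).reshape(n, k // elems_per_byte, elems_per_byte)
    shifts = np.arange(elems_per_byte, dtype=np.uint8) * bits
    # Shift every element to its slot, then OR the slots of each group together.
    return np.bitwise_or.reduce(grouped << shifts, axis=-1).astype(np.uint8)


def decompress(packed: np.ndarray, bits: int) -> np.ndarray:
    """Inverse of `compress`: unpack uint8 storage back to one value per element."""
    elems_per_byte = 8 // bits
    shifts = np.arange(elems_per_byte, dtype=np.uint8) * bits
    mask = np.uint8((1 << bits) - 1)
    unpacked = (packed[..., None] >> shifts) & mask
    return unpacked.reshape(packed.shape[0], -1)


rng = np.random.default_rng(0)
for bits in (1, 2, 4):
    weight = rng.integers(0, 1 << bits, size=(4, 16))
    perm = rng.permutation(16)  # stand-in for an element-level layout transform
    # Transform first (on individual elements), compress afterwards.
    packed = compress(weight[:, perm], bits)
    assert np.array_equal(decompress(packed, bits), weight[:, perm])
```

Because the permutation runs on unpacked elements, the same code path covers every bit width; only the packing factor changes.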

This pull request includes several changes to the bitblas module, focusing on refactoring and enhancing functionality. The most important changes involve the removal of the matmul and matmul_dequantize modules, the introduction of a deprecation decorator, and updates to various matrix operations.

Refactoring and Removal:

Removed matmul and matmul_dequantize modules from bitblas/ops/__init__.py and bitblas/ops/matmul.py. This includes all associated classes and functions. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-b4614d98b88a14674bc57a6c3e018791f7585b8310cff91f9bb672d82ccc7f8cL4-R4), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-f5fc0fd9c3e7bf9bc75de88c0ff9f60fe03a9a6db36d3a6a88ac81107fc47d8fL1-L276))

New Features:

Added a deprecated decorator in bitblas/__init__.py to mark functions as deprecated and emit warnings when they are used. ([bitblas/__init__.pyR93-R114](https://github.com/microsoft/BitBLAS/pull/130/files#diff-c2248c838997d60356d36c7ad50e42b7bfcc09238719cc55cccb6b87e48472e7R93-R114))
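For readers unfamiliar with the pattern, a deprecation decorator can be sketched as follows. This is a generic illustration built on the standard library, not the code added in `bitblas/__init__.py`; the `reason` parameter, the message wording, and the `legacy_add` function are assumptions.

```python
import functools
import warnings


def deprecated(reason: str):
    """Return a decorator that warns whenever the wrapped function is called.

    Hypothetical signature; the decorator in this PR may differ.
    """

    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"Call to deprecated function {func.__name__} ({reason}).",
                category=DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecated("use the new operator instead")
def legacy_add(a, b):
    """Illustrative function standing in for a deprecated API."""
    return a + b
```

Calling `legacy_add(1, 2)` still returns the sum, but also emits a `DeprecationWarning` pointing at the call site. Note that Python hides `DeprecationWarning` by default outside `__main__` and test runners unless warning filters are adjusted.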

Matrix Operations Enhancements:

Added QuantCompress and QuantCompressConfig to bitblas/ops/general_matmul/__init__.py and integrated them into the dispatch_tir and _assign_weight_compress methods. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edR16), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edR296))
Modified transform_weight method in bitblas/ops/general_matmul/__init__.py to handle weight compression and transformation more efficiently. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edL455-L458), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edL467-R492))
Added forward and retrieve_output_shape methods to bitblas/ops/ladder_permutate/__init__.py and bitblas/ops/lop3_permutate/__init__.py to enhance tensor operations. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-240200715f9a3998fc8f27b583c31a1c2679ed5e7a941c41a7a1b21df23a9abdR61-R77), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-0f5a4f22da4cc57a720a7a5eb160ff863316113eb51d4c447e332b4ab2bb5114L45-R61))

Code Cleanup:

Removed imports and unused variables from various files to improve code clarity and maintainability. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-c2248c838997d60356d36c7ad50e42b7bfcc09238719cc55cccb6b87e48472e7L42-R45), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-240200715f9a3998fc8f27b583c31a1c2679ed5e7a941c41a7a1b21df23a9abdR8))

These changes collectively aim to streamline the codebase, improve functionality, and prepare for future updates.

LeiWang1999 added 30 commits July 5, 2024 08:54

Refactor BatchMatMulEmitter and BatchMatMulSelector for improved readability and maintainability

d8884e6

Refactor import statements for improved readability and maintainability

fc84173

Refactor import statements for improved readability and maintainability

02f64de

disable failure email for ci

397eee6

remove email notifications.

20f6ad1

move relax pass from testing to mlc_llm

b93c394

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into main

ba6a6df

Refactor scripts with se check_eual_ref_scripts_with_emitter function

257693a

Lint Fix

9bb7f49

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into main

39e7614

Refactor scripts with se check_eual_ref_scripts_with_emitter function

93eb5a5

bug fix in test

aa66a90

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into dev

ae14a53

lint fix.

79b08e4

test cuda i4 kernel

86fd036

Refactor copyright notice in i4matmul.hpp

6b73a21

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into dev

0ba90c1

Refactor BitBLASLinear test module for improved readability and maintainability

086d208

refactor test as version below python 3.9 cannot handle int32 overflow.

47a3abd

format lint for test

024b247

Refactor test_int4b_fp16_convert.py for improved readability and maintainability

bfedeaa

remove unused design file

e672a23

move tile device from package to base

21e5430

dummy impl for codegen

fd11940

Refactor file structure for ladder_permutate module

9ccfa85

Refactor backend class and fix typos in comments

7c7d73e

Deep refactor Lib related code.

47d5fc5

remove ci pull.

53dd0dd

LintFix

d58ac43

refactor builder for whl build

37cb07c

LeiWang1999 added 29 commits August 1, 2024 17:37

Merge block reduce for dequantze config.

d9830ba

fix codeql

e5a4485

chore: Update submodule reference to latest commit

a04282b

chore: Disable common subexpression elimination in TIR passes

314d3e9

Lint Fix

f7d33bb

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into dev

db633ed

4bit related lop3 updates.

201155a

lint fix

2b73662

gptq test fix

1a6a0fd

Fix for test

e84e3ef

lint fix

f0fbb55

lint fix

bf30688

typofix

9a360ba

QuantCompress Test

ee94536

chore: Refactor quant_compress_impl.py for readability and maintainability

930cd76

Enhance docs to update latest works.

8c24776

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into dev

c018e3c

Refactor weight executors in Matmul class for improved readability and maintainability

38f1713

Refactor weight executors in Matmul class for improved readability and maintainability

4a578ce

Refactor weight executors in Matmul class for improved readability and maintainability

4e7126b

Merge branch 'main' of https://github.com/Microsoft/BitBLAS into update_transform

de9fd2e

removed legacy operator

e405aa2

Refactor weight executors in Matmul class for improved readability and maintainability

5709db1

LintFix

2d90e7b

Fix GPTQ Repack with the latest weight transform

c2d2cfa

lint fix

ed6a0a1

bug fix for rescale dequantize

d23ab47

test fix

af16059

typo fix

ac316fd

LeiWang1999 merged commit 5d14d31 into microsoft:main Aug 5, 2024
