@LeiWang1999 LeiWang1999 commented Aug 5, 2024

PRs #110 and #114 enabled warp-memory dequantization through Ladder Transform Stage 3. However, that transformation operates at the 32-bit level, which makes it hard to apply when the compressed weights are 2-bit or 1-bit. We should apply the transformation before weight compression.

This pull request introduces a new transform pipeline that uses the TIR version of weight compression (see PR #126) instead of the CPU NumPy-simulated version.

This pull request includes several changes to the bitblas module, focusing on refactoring and enhancing functionality. The most important changes involve the removal of the matmul and matmul_dequantize modules, the introduction of a deprecation decorator, and updates to various matrix operations.

Refactoring and Removal:

  • Removed matmul and matmul_dequantize modules from bitblas/ops/__init__.py and bitblas/ops/matmul.py. This includes all associated classes and functions. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-b4614d98b88a14674bc57a6c3e018791f7585b8310cff91f9bb672d82ccc7f8cL4-R4), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-f5fc0fd9c3e7bf9bc75de88c0ff9f60fe03a9a6db36d3a6a88ac81107fc47d8fL1-L276))

New Features:

  • Added a deprecated decorator in bitblas/__init__.py to mark functions as deprecated and emit warnings when they are used. ([bitblas/__init__.pyR93-R114](https://github.com/microsoft/BitBLAS/pull/130/files#diff-c2248c838997d60356d36c7ad50e42b7bfcc09238719cc55cccb6b87e48472e7R93-R114))
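The actual decorator lives in bitblas/__init__.py; a minimal sketch of what such a deprecation decorator typically looks like (the `reason` parameter and `old_matmul` example here are illustrative, not the BitBLAS code) is:

```python
import functools
import warnings


def deprecated(reason: str):
    """Mark a function as deprecated; emit a DeprecationWarning on each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated: {reason}",
                category=DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not the wrapper
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical usage: flag an old entry point while keeping it callable.
@deprecated("use bitblas.Matmul instead")
def old_matmul(a, b):
    return a * b
```

Callers still get the old behavior, but each invocation surfaces a `DeprecationWarning` pointing at their own call site.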

Matrix Operations Enhancements:

  • Added QuantCompress and QuantCompressConfig to bitblas/ops/general_matmul/__init__.py and integrated them into the dispatch_tir and _assign_weight_compress methods. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edR16), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edR296))
  • Modified transform_weight method in bitblas/ops/general_matmul/__init__.py to handle weight compression and transformation more efficiently. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edL455-L458), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-74fe5dd2824cb03a0fb2b0a913a2fc5caeb9c08e5368c318cd32b3af7e6f52edL467-R492))
  • Added forward and retrieve_output_shape methods to bitblas/ops/ladder_permutate/__init__.py and bitblas/ops/lop3_permutate/__init__.py to enhance tensor operations. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-240200715f9a3998fc8f27b583c31a1c2679ed5e7a941c41a7a1b21df23a9abdR61-R77), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-0f5a4f22da4cc57a720a7a5eb160ff863316113eb51d4c447e332b4ab2bb5114L45-R61))
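The `forward` / `retrieve_output_shape` pairing lets callers query an op's output shape and pre-allocate the result before dispatch. A simplified sketch of that pattern, using NumPy as a stand-in for the real tensor types and with a trivial placeholder in place of the actual ladder/lop3 permutation kernel (the class and method bodies here are illustrative assumptions, not the BitBLAS implementation):

```python
import numpy as np


class PermutateOp:
    """Illustrative op exposing retrieve_output_shape alongside forward."""

    def __init__(self, m: int, n: int):
        self.m, self.n = m, n

    def retrieve_output_shape(self) -> tuple:
        # A permutation rearranges elements but preserves the total size,
        # so the output shape is known without running the kernel.
        return (self.m, self.n)

    def forward(self, inp: np.ndarray) -> np.ndarray:
        # Pre-allocate the output from the known shape, then run the kernel.
        out = np.empty(self.retrieve_output_shape(), dtype=inp.dtype)
        # Placeholder for the real permutation kernel: a plain reshape.
        out[...] = inp.reshape(self.m, self.n)
        return out
```

Separating shape retrieval from execution is what allows frameworks to allocate destination buffers (or plan memory) ahead of launching the kernel.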

Code Cleanup:

  • Removed imports and unused variables from various files to improve code clarity and maintainability. ([[1]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-c2248c838997d60356d36c7ad50e42b7bfcc09238719cc55cccb6b87e48472e7L42-R45), [[2]](https://github.com/microsoft/BitBLAS/pull/130/files#diff-240200715f9a3998fc8f27b583c31a1c2679ed5e7a941c41a7a1b21df23a9abdR8))

These changes collectively aim to streamline the codebase, improve functionality, and prepare for future updates.

@LeiWang1999 LeiWang1999 merged commit 5d14d31 into microsoft:main Aug 5, 2024
