microsoft · LeiWang1999 · Jul 23, 2024 · Jul 23, 2024
diff --git a/README.md b/README.md
@@ -9,7 +9,7 @@ Some of the key features of BitBLAS include:
     - $W_{wdtype}A_{adtype}$ mixed-precision matrix multiplication including FP16xINT4/2/1, INT8xINT4/2/1, etc. Please checkout [support matrix](#support-matrix) for detailed data types support.
     - Matrix multiplication like FP16xFP16 and INT8xINT8.
   - Auto-Tensorization for TensorCore-like hardware instructions.
-  - Implemented [integration](https://github.com/microsoft/BitBLAS/blob/main/integration/) to [PyTorch](https://pytorch.org/), [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ), [vLLM](https://github.com/vllm-project/vllm) and [BitNet-b1.58](https://huggingface.co/1bitLLM/bitnet_b1_58-3B) for LLM deployment. Please checkout [benchmark summary](#benchmark-summary) for detailed end2end LLM inference performance.
+  - Implemented [integration](https://github.com/microsoft/BitBLAS/blob/main/integration/) to [PyTorch](https://pytorch.org/), [GPTQModel](https://github.com/ModelCloud/GPTQModel), [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ), [vLLM](https://github.com/vllm-project/vllm) and [BitNet-b1.58](https://huggingface.co/1bitLLM/bitnet_b1_58-3B) for LLM deployment. Please checkout [benchmark summary](#benchmark-summary) for detailed end2end LLM inference performance.
   - BitBLAS first implemented $W_{INT2}A_{INT8}$ GEMV/GEMM in [BitNet-b1.58](https://arxiv.org/abs/2402.17764) with 8x/2x speedup over cuBLAS $W_{FP16}A_{FP16}$ on A100, please checkout [op_benchmark_a100_int2_scaling](https://github.com/microsoft/BitBLAS/blob/main/images/figures/op_benchmark_a100_int2_scaling.png) for detailed benchmark results. Please checkout [BitNet-b1.58 integration](https://github.com/microsoft/BitBLAS/blob/main/integration/BitNet) for the integration with the 3rdparty reproduced BitNet-b1.58 model.
   - Support customizing mixed-precision DNN operations for your specific scenarios via the flexible DSL (TIR Script).
 

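The $W_{wdtype}A_{adtype}$ notation in the README means the weight matrix is stored in a narrow type (e.g. INT4) while the activations stay in a wider one (e.g. FP16). The pure-Python sketch below illustrates only the reference semantics of an FP16xINT4 GEMV: weights packed two per byte, dequantized on the fly with one scale per group of columns, then multiplied by the activations. It is not BitBLAS's kernel, API, or memory layout. The low-nibble-first packing, the signed int4 range [-8, 7], the symmetric per-group scales, and every function name here are assumptions for illustration, and Python floats stand in for FP16.

```python
from typing import List, Sequence


def pack_int4(values: Sequence[int]) -> bytes:
    """Pack signed 4-bit ints in [-8, 7] two per byte, low nibble first (assumed layout)."""
    if len(values) % 2:
        raise ValueError("need an even number of int4 values")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in signed int4")
        # Masking to 4 bits keeps the two's-complement bit pattern of negatives.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed: bytes) -> List[int]:
    """Inverse of pack_int4: recover the signed value stored in each nibble."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibble patterns 8..15 encode -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values


def gemv_w4_afloat(
    packed_rows: Sequence[bytes],
    scales: Sequence[Sequence[float]],
    activations: Sequence[float],
    group_size: int,
) -> List[float]:
    """Reference y = W @ x, each row of W stored as packed int4 plus one scale per group."""
    outputs = []
    for row_bytes, row_scales in zip(packed_rows, scales):
        q = unpack_int4(row_bytes)
        if len(q) != len(activations):
            raise ValueError("row length does not match activation length")
        acc = 0.0
        for k, (w_q, a) in enumerate(zip(q, activations)):
            # Dequantize on the fly: w = q * scale of the group containing column k.
            acc += w_q * row_scales[k // group_size] * a
        outputs.append(acc)
    return outputs


if __name__ == "__main__":
    rows = [pack_int4([1, -2, 3, -8]), pack_int4([7, 0, -1, 2])]
    scales = [[0.5, 0.25], [1.0, 2.0]]  # one scale per group of 2 columns
    x = [1.0, 1.0, 2.0, 0.5]
    print(gemv_w4_afloat(rows, scales, x, group_size=2))  # prints [0.0, 5.0]
```

Only the packed bytes and the small scale table are read per row, which is where a low-bit weight format saves memory bandwidth. An optimized kernel would fuse the unpack and dequantize steps into the inner product in the same way, rather than materializing a full-precision copy of the weights.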
diff --git a/integration/GPTQModel/README.md b/integration/GPTQModel/README.md
@@ -0,0 +1,3 @@
+BitBLAS has been fully integrated into [GPTQModel](https://github.com/ModelCloud/GPTQModel) since v0.9.1.
+
+Please refer to the [sample code](https://github.com/ModelCloud/GPTQModel/blob/main/examples/inference/run_with_different_backends.py) for how to use `backend=BACKEND.BITBLAS` within GPTQModel.