
Block-wise 4b quantization matmul operator change #18172

Merged
chenfucn merged 16 commits into microsoft:main from chenfucn:cfu_blkq4 on Nov 3, 2023


@chenfucn
Contributor

Description

Replace the block-wise 4b quantization implementation.

Motivation and Context

#18101 provides an augmented block-wise 4b quantization interface and implementation. This PR uses that new implementation in the onnxruntime contrib ops.
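For readers new to the term, here is a minimal sketch of what block-wise 4-bit quantization generally involves: values are split into fixed-size blocks, each block gets its own scale, and each value is stored as a 4-bit code with two codes packed per byte. This is not the implementation from #18101 or from this PR. The `BlockQuantized4b` struct, the function names, the symmetric scheme with a fixed zero point of 8, and the low-nibble-first packing order are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration only; not an onnxruntime type.
struct BlockQuantized4b {
  std::size_t block_size = 0;
  std::vector<std::uint8_t> packed;  // two 4-bit codes per byte, low nibble first (assumed order)
  std::vector<float> scales;         // one scale per block
};

// Quantize `values` block by block. Assumes block_size is nonzero and even,
// so that a packed byte never straddles two blocks.
BlockQuantized4b QuantizeBlockwise4b(const std::vector<float>& values,
                                     std::size_t block_size) {
  BlockQuantized4b out;
  out.block_size = block_size;
  out.packed.assign((values.size() + 1) / 2, 0);

  for (std::size_t start = 0; start < values.size(); start += block_size) {
    const std::size_t end = std::min(start + block_size, values.size());

    // Per-block scale: map the largest magnitude in the block to +/-7.
    float max_abs = 0.0f;
    for (std::size_t i = start; i < end; ++i) {
      max_abs = std::max(max_abs, std::fabs(values[i]));
    }
    const float scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f;
    out.scales.push_back(scale);

    for (std::size_t i = start; i < end; ++i) {
      // Shift by the assumed zero point 8 so the code fits in [0, 15].
      int code = static_cast<int>(std::lround(values[i] / scale)) + 8;
      code = std::max(0, std::min(15, code));
      out.packed[i / 2] |= static_cast<std::uint8_t>(code << ((i % 2) * 4));
    }
  }
  return out;
}

// Recover approximate floats from the packed codes and per-block scales.
std::vector<float> DequantizeBlockwise4b(const BlockQuantized4b& q,
                                         std::size_t count) {
  std::vector<float> out(count);
  for (std::size_t i = 0; i < count; ++i) {
    const int code = (q.packed[i / 2] >> ((i % 2) * 4)) & 0x0F;
    out[i] = static_cast<float>(code - 8) * q.scales[i / q.block_size];
  }
  return out;
}
```

A matmul operator built on such a layout would typically dequantize the weight blocks, either up front or on the fly, before multiplying. Whether the contrib ops in this PR do so, and in what layout, is not stated in this conversation.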

@chenfucn chenfucn requested a review from a team as a code owner October 30, 2023 17:56
@chenfucn chenfucn changed the title from "Cfu blkq4" to "Block-wise 4b quantization matmul operator change" Oct 30, 2023
Comment thread onnxruntime/test/contrib_ops/matmul_4bits_test.cc
Comment thread onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc Outdated
Comment thread onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cc Outdated
Comment thread onnxruntime/contrib_ops/cpu/quantization/matmul_nbits.cc
Comment thread onnxruntime/test/contrib_ops/matmul_4bits_test.cc
chenfucn and others added 3 commits October 31, 2023 14:35
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Comment thread onnxruntime/contrib_ops/cuda/quantization/dequantize_blockwise.cu Outdated
Comment thread onnxruntime/contrib_ops/cuda/quantization/matmul_nbits.cc
Member

@yufenglee yufenglee left a comment


:shipit:

@chenfucn chenfucn merged commit 26b3964 into microsoft:main Nov 3, 2023
@chenfucn chenfucn deleted the cfu_blkq4 branch November 3, 2023 22:29
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024

4 participants