[MLAS] add q4 quantize and transpose kernel to support MatMulNBits QDQ fuse #21054

Merged
merged 25 commits into from
Jun 20, 2024

Contributor

@fajin-corp fajin-corp commented Jun 15, 2024

Description

  1. Added a kernel that quantizes the MatMul B tensor to q4 and stores the result in the same shape as the original tensor. Scales and zero points are calculated as well. The scales and zero points have the same shape.
  2. Added a kernel that transposes the q4 B tensor into the B tensor layout used by MatMulNBits. Scales and zero points are transposed as well.
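
For readers unfamiliar with blockwise q4, here is a minimal sketch of the idea behind item 1, not the MLAS kernel itself. It assumes asymmetric quantization to the range 0..15 with one scale and one zero point per block, and it keeps each 4-bit value in its own byte for readability. `BlockQ4`, `QuantizeBlockwiseQ4`, and `DequantizeQ4` are hypothetical names, and the real kernel's rounding, packing, and threading may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for illustration; not an MLAS type.
struct BlockQ4 {
    std::vector<std::uint8_t> values;       // one 4-bit value (0..15) per element, unpacked
    std::vector<float> scales;              // one scale per block
    std::vector<std::uint8_t> zero_points;  // one zero point (0..15) per block
};

// Quantizes a flat array in consecutive blocks of `block_size` elements.
// The last block may be shorter when the size is not a multiple of block_size.
BlockQ4 QuantizeBlockwiseQ4(const std::vector<float>& src, std::size_t block_size) {
    assert(block_size > 0);
    BlockQ4 out;
    out.values.resize(src.size());
    for (std::size_t begin = 0; begin < src.size(); begin += block_size) {
        const std::size_t end = std::min(begin + block_size, src.size());
        // Keep 0 inside the range so that 0.0f is exactly representable.
        float lo = 0.0f;
        float hi = 0.0f;
        for (std::size_t i = begin; i < end; ++i) {
            lo = std::min(lo, src[i]);
            hi = std::max(hi, src[i]);
        }
        // 15 steps span the block's range; fall back to 1 for an all-zero block.
        const float scale = (hi > lo) ? (hi - lo) / 15.0f : 1.0f;
        const float zp = std::min(15.0f, std::max(0.0f, std::round(-lo / scale)));
        for (std::size_t i = begin; i < end; ++i) {
            const float q = std::round(src[i] / scale) + zp;
            out.values[i] = static_cast<std::uint8_t>(std::min(15.0f, std::max(0.0f, q)));
        }
        out.scales.push_back(scale);
        out.zero_points.push_back(static_cast<std::uint8_t>(zp));
    }
    return out;
}

// Recovers an approximate float value for one element.
float DequantizeQ4(const BlockQ4& q, std::size_t index, std::size_t block_size) {
    const std::size_t block = index / block_size;
    const int centered = static_cast<int>(q.values[index]) - static_cast<int>(q.zero_points[block]);
    return static_cast<float>(centered) * q.scales[block];
}
```

With a block size of 64, a 4096-element row would produce 64 scales and 64 zero points, while a 4095-element row would still produce 64 blocks, the last one holding 63 elements.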

Benchmark
<1024 x 4096 input, 64 quant block, 8 threads>:

  • quantize: 23035923 ns
  • transpose: 718635 ns

<1024 x 4095 input, 64 quant block, 8 threads>:

  • quantize: 26759319 ns
  • transpose: 1279064 ns

Motivation and Context

The MatMulNBits tool chain currently only supports converting a MatMul op directly to a MatMulNBits op. MatMulNBits is not a standard ONNX op.
Therefore, the tool chain needs to support converting MatMul to Q/DQ format, with a later transform step that converts DQ + MatMul to MatMulNBits. The tensors stored in DQ are the quantized constants and will be stored in the MatMulNBits op.
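
As a rough illustration of the DQ + MatMul pattern that the transform step would fuse, the sketch below dequantizes a q4 B tensor on the fly and multiplies it with A. The layout is an assumption made for this example only: B is K x N row-major with one unpacked 4-bit value per byte, and scales and zero points are stored per column in blocks of `block_size` along K. `DequantizeThenMatMul` is a hypothetical name, not an ONNX Runtime API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference computation for Y = A x Dequantize(B_q).
// a:           M x K, row-major floats
// b_q:         K x N, row-major, values in 0..15 (unpacked, assumed layout)
// scales:      ceil(K / block_size) x N, row-major
// zero_points: same shape as scales
std::vector<float> DequantizeThenMatMul(
    const std::vector<float>& a, std::size_t m, std::size_t k,
    const std::vector<std::uint8_t>& b_q,
    const std::vector<float>& scales,
    const std::vector<std::uint8_t>& zero_points,
    std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t blocks = (k + block_size - 1) / block_size;
    assert(a.size() == m * k);
    assert(b_q.size() == k * n);
    assert(scales.size() == blocks * n);
    assert(zero_points.size() == blocks * n);

    std::vector<float> y(m * n, 0.0f);
    for (std::size_t row = 0; row < m; ++row) {
        for (std::size_t kk = 0; kk < k; ++kk) {
            const std::size_t block = kk / block_size;
            for (std::size_t col = 0; col < n; ++col) {
                // DQ step: (q - zero_point) * scale for this block and column.
                const int centered = static_cast<int>(b_q[kk * n + col]) -
                                     static_cast<int>(zero_points[block * n + col]);
                const float b = static_cast<float>(centered) * scales[block * n + col];
                // MatMul step: accumulate the product into the output.
                y[row * n + col] += a[row * k + kk] * b;
            }
        }
    }
    return y;
}
```

A fused op can produce the same result without materializing the dequantized B tensor, which is why the quantized constants from DQ can be carried over into it directly.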

@fajin-corp fajin-corp requested a review from a team as a code owner June 15, 2024 00:17
@fajin-corp fajin-corp force-pushed the fajin/qdqmatmulnbitskkernels branch from 14fcdb7 to ed9421a Compare June 17, 2024 17:39
@yufenglee
Member

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@yufenglee
Member

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models

@yufenglee
Member

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

Azure Pipelines successfully started running 10 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

onnxruntime/core/mlas/lib/q4_dq.cpp Fixed
onnxruntime/test/mlas/unittest/test_blockq4.cpp Dismissed
onnxruntime/core/util/qmath.h Dismissed
yufenglee previously approved these changes Jun 19, 2024
Member

@yufenglee yufenglee left a comment

:shipit:

@fajin-corp
Contributor Author

/azp run Windows CPU CI Pipeline,orttraining-linux-ci-pipeline

Azure Pipelines successfully started running 2 pipeline(s).

@fajin-corp fajin-corp merged commit 6817b01 into main Jun 20, 2024
100 checks passed
@fajin-corp fajin-corp deleted the fajin/qdqmatmulnbitskkernels branch June 20, 2024 00:15