Describe the issue
I get an error when loading the uint8-quantized version of onnx-community/whisper-tiny with onnxruntime 1.25. With onnxruntime 1.24 and earlier everything works fine, and I don't see any mention of a breaking change for this in the release notes.
To reproduce
Code:

```python
from huggingface_hub import hf_hub_download
import onnxruntime as rt

path = hf_hub_download("onnx-community/whisper-tiny", "onnx/decoder_model_merged_uint8.onnx")
model = rt.InferenceSession(path)
```
Error:

```
[ONNXRuntimeError] : 1 : FAIL : qdq_actions.cc:136 TransposeDQWeightsForMatMulNBits Missing required scale: model.decoder.embed_tokens.weight_merged_0_scale for node: model.decoder.embed_tokens.weight_transposed_DequantizeLinear
```
Urgency
This broke the use of popular Whisper models with my onnx-asr library.
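Until a fix lands, a minimal stopgap sketch (my own workaround, not an official one; `is_affected` is a hypothetical helper name) that a downstream library like onnx-asr could use to warn or pin on the affected runtime versions, based on the behavior reported above (1.24 and earlier load fine, 1.25.x fails):

```python
def is_affected(version: str) -> bool:
    """Return True if this onnxruntime version falls in the range that
    fails to load the uint8-quantized whisper-tiny decoder (>= 1.25,
    per this report)."""
    major, minor = (int(p) for p in version.split(".")[:2])
    return (major, minor) >= (1, 25)

# Example: check the installed runtime before creating the session.
# print(is_affected(rt.__version__))
print(is_affected("1.25.1"))  # True  (reported broken)
print(is_affected("1.24.0"))  # False (reported working)
```

Equivalently, pinning the dependency to `onnxruntime<1.25` avoids the error for now.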
Platform
Linux
OS Version
Debian 13
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.25.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response