Dynamic quantization of multilingual miniLM - output does not match float32 version. Onnxruntime 1.9.0  #9599

@jobergum

Description

Describe the bug
First thanks a lot for all your hard work on onnxruntime, I love it! To the issue at hand:

The output of the quantized version of the ONNX model does not look anything like that of the float32 ONNX model, and downstream accuracy suffers greatly. I suspect something about quantization is off for this particular model.

The model is a multilingual MiniLM model fine-tuned for sequence classification with a classification head. The model card is on the Hugging Face Hub.

When exported to ONNX in float32, the model works as expected, with scores and downstream accuracy very close to torch. After quantization, however, the output scores no longer match the float32 version, and downstream task accuracy drops significantly (to near-random behaviour).

Urgency
None

System information

  • OS: macOS
  • ONNX Runtime version: 1.9.0
  • Python version: 3.8.5

To Reproduce
The model can be downloaded from the internet; instructions are in the linked notebook, which also demonstrates the output score difference.

Expected behavior
I would expect the quantized output scores to be roughly the same as the float32 version's. I've successfully quantized subword-tokenized MiniLM models with classification heads with little to no accuracy drop compared to float32, but I'm having trouble with this model, so I'm reaching out for help 💯

Labels

quantization (issues related to quantization); stale (issues that have not been addressed in a while; categorized by a bot)
