ORT_ENABLE_ALL changes Constant DequantizeLinear Reshape output for int8 weights #28491

@ALinrunrun

Describe the issue

ONNX Runtime CPUExecutionProvider produces different results for a constant DequantizeLinear -> Reshape pattern when graph optimization is enabled.

The model contains an int8 constant tensor:

[12, -45, 7, -3]

with scale 0.02, followed by DequantizeLinear and Reshape.

With ORT_DISABLE_ALL, the output is correct:

[[0.24, -0.9], [0.14, -0.06]]

With ORT_ENABLE_ALL, the negative dequantized values become 0:

[[0.24, 0.0], [0.14, 0.0]]

Because the graph has no runtime inputs and consists only of constants, this suggests that a graph optimization, most likely a constant-folding path, changes the semantics of the model.
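As a cross-check that does not depend on ONNX Runtime, the expected values can be recomputed with NumPy from the DequantizeLinear definition, y = (x - zero_point) * scale, where zero_point defaults to 0 when it is omitted. The sketch below also shows that the ORT_ENABLE_ALL output reported here coincides with the reference clamped at zero. That is only an observation about this one tensor, not a diagnosis of the cause. The variable names are illustrative.

```python
import numpy as np

# Constants copied from the model in this report.
weights = np.array([12, -45, 7, -3], dtype=np.int8)
scale = np.float32(0.02)

# DequantizeLinear: y = (x - zero_point) * scale; zero_point is omitted, so 0.
reference = (weights.astype(np.float32) * scale).reshape(2, 2)
print("reference:", reference)

# Output reported in this issue for ORT_ENABLE_ALL.
reported_enable_all = np.array([[0.24, 0.0], [0.14, 0.0]], dtype=np.float32)

# Observation only: the reported values match the reference with the
# negative entries replaced by zero.
print(
    "matches clamp at zero:",
    np.allclose(np.maximum(reference, 0.0), reported_enable_all),
)
```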

To reproduce

import sys
import numpy as np
import onnxruntime as ort
from onnx import TensorProto, helper, numpy_helper

def make_model():
    weights = np.array([12, -45, 7, -3], dtype=np.int8)
    scale = np.float32(0.02)
    shape = np.array([2, 2], dtype=np.int64)

    nodes = [
        helper.make_node("Constant", [], ["wq"], value=numpy_helper.from_array(weights)),
        helper.make_node("Constant", [], ["scale"], value=numpy_helper.from_array(scale)),
        helper.make_node("Constant", [], ["shape"], value=numpy_helper.from_array(shape)),
        helper.make_node("DequantizeLinear", ["wq", "scale"], ["wf"]),
        helper.make_node("Reshape", ["wf", "shape"], ["y"]),
    ]

    graph = helper.make_graph(
        nodes,
        "g",
        [],
        [helper.make_tensor_value_info("y", TensorProto.FLOAT, [2, 2])],
    )

    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 21)])
    model.ir_version = 10
    return model.SerializeToString()

def run(level):
    options = ort.SessionOptions()
    options.graph_optimization_level = level
    sess = ort.InferenceSession(
        make_model(),
        sess_options=options,
        providers=["CPUExecutionProvider"],
    )
    return sess.run(None, {})[0]

disabled = run(ort.GraphOptimizationLevel.ORT_DISABLE_ALL)
enabled = run(ort.GraphOptimizationLevel.ORT_ENABLE_ALL)

print("disable_all:", disabled)
print("enable_all:", enabled)
print("max_abs_diff:", float(np.max(np.abs(disabled - enabled))))
print(f"PASS={np.allclose(disabled, enabled)}")

# Exit 0 when the outputs differ (the bug reproduced), 1 otherwise.
sys.exit(0 if not np.allclose(disabled, enabled) else 1)

Urgency

Expected output

disable_all: [[ 0.24 -0.9 ]
 [ 0.14 -0.06]]
enable_all: [[ 0.24 -0.9 ]
 [ 0.14 -0.06]]
max_abs_diff: 0.0
PASS=True

Actual output

disable_all: [[ 0.24 -0.9 ]
 [ 0.14 -0.06]]
enable_all: [[0.24 0.  ]
 [0.14 0.  ]]
max_abs_diff: 0.8999999761581421
PASS=False

Platform

Linux

OS Version

Linux-6.17.0-20-generic-x86_64-with-glibc2.39

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.25.1

ONNX Runtime API

Python

Architecture

X86

Execution Provider

Default CPU

Execution Provider Library Version

No response

Labels

quantization (issues related to quantization)
