output is different between onnx and model #20478

MichaelH717 · 2024-04-26T06:20:30Z

Describe the issue

When I convert a LayerNorm model to ONNX, the output of the ONNX model differs from the output of the PyTorch model.

The output is:

model: tensor([[ 1.7345, -0.6264, 0.3472, -1.2434, 0.8797, -1.0218, -0.8755, 1.1631,
-0.8686, 0.4889],
[ 1.1136, -1.0294, -0.0380, -0.4270, 1.4148, -0.4356, -1.0604, 1.6713,
-1.1861, -0.0328]])
onnx: [array([[ 1.7370378 , -0.6239623 , 0.34969315, -1.2410678 , 0.88223237,
-1.019367 , -0.87309 , 1.1656438 , -0.86623335, 0.49139884],
[ 1.1136355 , -1.0292952 , -0.03786705, -0.42686492, 1.4148507 ,
-0.43547106, -1.0602773 , 1.6713139 , -1.1859272 , -0.03270336]],
dtype=float32)]
Can anybody help resolve this issue?

To reproduce

Here is my simple Python test code:

import torch
import torch.nn as nn
import torch.onnx
import onnxruntime
import onnx


class SimpleModel(nn.Module):
    def __init__(self, num_features):
        super(SimpleModel, self).__init__()
        self.layer_norm = nn.LayerNorm(num_features)

    def forward(self, x):
        x = self.layer_norm(x)
        return x


def onnx_export(dummy_input, num_features):
    # Export a freshly initialized model to ONNX.
    model = SimpleModel(num_features)
    model.eval()

    torch.onnx.export(
        model,
        dummy_input,
        "model_with_layernorm.onnx",
        export_params=True,
        opset_version=16,
        do_constant_folding=True,
        input_names=['input'],
        output_names=['output'],
    )
    print("ONNX finish")


def result_test(dummy_input, num_features):
    # Run the same input through PyTorch and ONNX Runtime and print both outputs.
    sm = SimpleModel(num_features)
    sm.train()  # LayerNorm behaves the same in train and eval mode
    model_out = sm(dummy_input)
    onnx_path = '/home/mengyaohuang/python/model_with_layernorm.onnx'

    print("model: {}".format(model_out))
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if torch.cuda.is_available() else [
        'CPUExecutionProvider']
    input_map_sdc = {'input': dummy_input.numpy()}
    ort_session = onnxruntime.InferenceSession(onnx_path, providers=providers)
    output = ort_session.run(None, input_map_sdc)
    print("onnx: {}".format(output))


if __name__ == "__main__":
    num_features = 10
    dummy_input = torch.triu(torch.ones(2, num_features), diagonal=1)
    A = torch.tensor([[-10240.355, -15141.355, -14749.948, -3194.9736, -13981.226, -20323.963, -16821.863, -23410.441, -7674.426, -4421.628],
                      [-10240.355, -15141.355, -14749.948, -3194.9736, -13981.226, -20323.963, -16821.863, -23410.441, -7674.426, -4421.628]])
    dummy_input *= torch.tensor(100000)

    B = torch.tensor([[-9999.212, -10000.221, -9999.805, -10000.484, -9999.577, -10000.39, -10000.327, -9999.456, -10000.324, -9999.744],
                      [-9999.581, -10000.797, -10000.234, -10000.455, -9999.41, -10000.46, -10000.814, -9999.265, -10000.886, -10000.231]])

    # onnx_export(dummy_input, num_features)
    result_test(B, num_features)

Urgency

No response

Platform

Linux

OS Version

ubuntu20.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1

ONNX Runtime API

Python

Architecture

X86

Execution Provider

CUDA

Execution Provider Library Version

CUDA11.7

skottmckay · 2024-04-26T23:35:51Z

Are the predictions different? Or just that the floating point numbers vary slightly, which is expected.

#19449 (comment)
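
To illustrate why small floating point variations can be expected for an input like B from the reproduction, here is a minimal NumPy sketch. It is not the kernel used by PyTorch or ONNX Runtime: `layer_norm` is a hypothetical reference helper that normalizes over the last axis with `eps=1e-5` and no weight or bias. It evaluates the same input in float32 and float64 and prints the spacing between adjacent float32 values near 10000.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Hypothetical reference: normalize over the last axis, no weight/bias.
    mean = x.mean(axis=-1, keepdims=True)
    var = ((x - mean) ** 2).mean(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Input B from the reproduction script, stored as float32.
B = np.array(
    [[-9999.212, -10000.221, -9999.805, -10000.484, -9999.577,
      -10000.39, -10000.327, -9999.456, -10000.324, -9999.744],
     [-9999.581, -10000.797, -10000.234, -10000.455, -9999.41,
      -10000.46, -10000.814, -9999.265, -10000.886, -10000.231]],
    dtype=np.float32,
)

out32 = layer_norm(B)                     # all arithmetic in float32
out64 = layer_norm(B.astype(np.float64))  # same inputs, float64 arithmetic

spacing = np.spacing(np.float32(10000.0))  # gap to the next float32 value
max_err = np.abs(out32 - out64).max()

print("float32 spacing near 1e4:", spacing)
print("max |float32 - float64|:", max_err)
```

Each row of B spans less than 2 around -10000, where adjacent float32 values are roughly 0.001 apart. Rounding in the mean alone can therefore plausibly shift the normalized output by something on the order of 1e-3, and two float32 implementations that accumulate in a different order may disagree at that scale.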

MichaelH717 · 2024-04-27T03:50:45Z

Are the predictions different? Or just that the floating point numbers vary slightly, which is expected.

#19449 (comment)

The difference for input A is:
[[[2.9802322e-08 5.9604645e-08 5.9604645e-08 0.0000000e+00 2.9802322e-08
1.1920929e-07 5.9604645e-08 0.0000000e+00 0.0000000e+00 0.0000000e+00]
[2.0861626e-07 1.7881393e-07 1.7881393e-07 2.3841858e-07 1.6391277e-07
1.1920929e-07 1.7881393e-07 1.1920929e-07 2.3841858e-07 2.3841858e-07]]]

The difference for input B is:
[[[2.5670528e-03 2.4108291e-03 2.4752617e-03 2.3699999e-03 2.5104880e-03
2.3846626e-03 2.3943782e-03 2.5292635e-03 2.3947954e-03 2.4846196e-03]
[3.7431717e-05 1.3673306e-04 9.0770423e-05 1.0877848e-04 2.3484230e-05
1.0919571e-04 1.3816357e-04 1.1563301e-05 1.4400482e-04 9.0532005e-05]]]
I would like the difference to be less than 1e-5.
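
A difference like this can be checked against an explicit tolerance. The sketch below uses the PyTorch and ONNX Runtime outputs posted in the issue description, copied as printed (so the PyTorch values are rounded to four decimals). The thresholds `1e-5` and `5e-3` are illustrative choices only, not recommended values.

```python
import numpy as np

# Outputs as printed in the issue description.
model_out = np.array(
    [[1.7345, -0.6264, 0.3472, -1.2434, 0.8797,
      -1.0218, -0.8755, 1.1631, -0.8686, 0.4889],
     [1.1136, -1.0294, -0.0380, -0.4270, 1.4148,
      -0.4356, -1.0604, 1.6713, -1.1861, -0.0328]],
    dtype=np.float32,
)
onnx_out = np.array(
    [[1.7370378, -0.6239623, 0.34969315, -1.2410678, 0.88223237,
      -1.019367, -0.87309, 1.1656438, -0.86623335, 0.49139884],
     [1.1136355, -1.0292952, -0.03786705, -0.42686492, 1.4148507,
      -0.43547106, -1.0602773, 1.6713139, -1.1859272, -0.03270336]],
    dtype=np.float32,
)

# Element-wise absolute difference and its maximum.
abs_diff = np.abs(model_out - onnx_out)
max_diff = abs_diff.max()

# rtol=0.0 makes the check a pure absolute-tolerance comparison.
within_1e5 = np.allclose(model_out, onnx_out, rtol=0.0, atol=1e-5)
within_5e3 = np.allclose(model_out, onnx_out, rtol=0.0, atol=5e-3)

print("max abs diff:", max_diff)
print("within 1e-5:", within_1e5)
print("within 5e-3:", within_5e3)
```

Reporting the maximum absolute difference alongside the tolerance makes it easier to see whether a mismatch is at the scale of float32 rounding or points to a real discrepancy.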

github-actions bot added the ep:CUDA issues related to the CUDA execution provider label Apr 26, 2024
