
torch.Tensor.as_strided yields not the same results after conversion to ONNX with CPUExecutionProvider #13920

Closed
fxmarty opened this issue Dec 9, 2022 · 4 comments



fxmarty commented Dec 9, 2022

Describe the issue

When exporting a very simple PyTorch model that uses a tensor.as_strided() operation, no warning or error is raised during the export.

However, running the exported model with ONNX Runtime produces results that differ from PyTorch. It could be related to limited dynamic shape support.

To reproduce

Define the model:

import torch
import torch.nn as nn

class MyModel3(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor):
        a, b, c = x.size()

        strides_original = x.stride()

        shape = (a, b // 2, 4)
        stride = (strides_original[0] // 2, strides_original[1] - 3, 3)
        
        # stride = (x[0][0][0], 4, 3)  <-- if used instead, this will raise an error

        x_strided = x.as_strided(size=shape, stride=stride)

        return x_strided
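For context, as_strided reinterprets the tensor's existing storage: element [i, j, k] of the view reads the flat storage offset i*s0 + j*s1 + k*s2. A minimal sketch of this rule using NumPy (the helper name is mine, not from the issue; note that torch strides are in elements while NumPy strides are in bytes, hence the itemsize scaling):

```python
import numpy as np

# Hypothetical helper: emulate torch.Tensor.as_strided with NumPy.
# torch strides are in *elements*, NumPy strides are in *bytes*.
def as_strided_elements(x, size, stride):
    byte_strides = tuple(s * x.itemsize for s in stride)
    return np.lib.stride_tricks.as_strided(x, shape=size, strides=byte_strides)

x = np.arange(24, dtype=np.int64).reshape(2, 3, 4)  # element strides: (12, 4, 1)
y = as_strided_elements(x, size=(2, 2, 2), stride=(12, 4, 1))
# Element y[i, j, k] reads flat offset i*12 + j*4 + k*1 of x's storage:
print(y[1, 1, 1])  # -> 17 (offset 12 + 4 + 1)
```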

Export to ONNX:

model = MyModel3()
x = torch.randint(8, (50, 30, 15)) + 1
res = model(x)
print(res)

torch.onnx.export(
    model,
    (x,),
    "/home/fxmarty/asstrided_model.onnx",
    input_names=["x"],
    output_names=["x_out"],
    dynamic_axes={"x":  {0: "axis0", 1: "axis1", 2: "axis2"}},
    opset_version=14
)

No warning or error whatsoever is shown during the export.

Then, compare the inference between PyTorch and ONNX Runtime with CPUExecutionProvider:

import onnxruntime as ort
import numpy as np

session = ort.InferenceSession("/home/fxmarty/asstrided_model.onnx", providers=["CPUExecutionProvider"])

inp = {
    "x": np.random.randint(8, size=(45, 56, 29)) + 1,
}

res = session.run(None, inp)

model = MyModel3()
with torch.no_grad():
    res_pt = model(torch.tensor(inp["x"]))

res_ort = res[0]
res_pt_np = res_pt.numpy()

assert res_ort.shape == res_pt_np.shape

diff = np.max(np.abs(res_ort - res_pt_np))
print(f"[x] Maxdiff: {diff}")

Prints:

[x] Maxdiff: 0.9846013188362122

The same issue occurs when exporting with opset 15, 16 or 17.
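For what it's worth, the mismatch is consistent with the stride values being frozen at export time. A quick sketch (my assumed diagnosis, and the helper function is hypothetical) of how the export-time and inference-time stride values diverge between the two shapes used above:

```python
# Hypothetical sketch: compare the stride values the model should use at
# inference time with the ones frozen in at export time.
def contiguous_strides(shape):
    # torch element strides for a contiguous 3-D tensor of shape (a, b, c)
    a, b, c = shape
    return (b * c, c, 1)

s_export = contiguous_strides((50, 30, 15))   # shape passed to torch.onnx.export
s_runtime = contiguous_strides((45, 56, 29))  # shape fed to ONNX Runtime

# The model computes stride = (s0 // 2, s1 - 3, 3):
print((s_export[0] // 2, s_export[1] - 3, 3))    # (225, 12, 3), baked into the graph
print((s_runtime[0] // 2, s_runtime[1] - 3, 3))  # (812, 26, 3), what PyTorch uses
```

If the graph carries the export-time values as constants, every inference on a differently shaped input reads the wrong storage offsets, which matches the large max-diff observed.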


PyTorch version: 1.13.1

@justinchuby https://www.justinchuby.com/torch-onnx-op-matrix/ shows "Broken" support for as_strided — is that related to my issue? In pytorch/pytorch#80039 as_strided is marked as supported, so I'm not sure.


Thanks everyone!

Urgency

mediumish

Platform

Linux

OS Version

Linux 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.13.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@justinchuby

The support matrix shows an export error, whereas here we are seeing a runtime correctness issue, so the failure detected there may be different from yours:


It is still possible that the exported model is incorrect. Could you also share the exported onnx model here?


fxmarty commented Dec 9, 2022

Oh, I did not know we could click on there. Awesome!

Yes, you can find the model exported as in my original post here and a preview in netron here.

I suspect that strides_original = x.stride() is not raising an error while maybe it should. If I instead try to make the stride dynamic by using the shape values (a, b, c), or e.g. x[0][0][0] (see the updated comment in the original model for an example), then torch.onnx.export rightfully raises:

torch.onnx.errors.SymbolicValueError: Failed to export a node '%17 : Long(requires_grad=0, device=cpu) = onnx::Gather[axis=0](%16, %1), scope: __main__.MyModel3:: # /home/fxmarty/test_torchsript.py:48:0
' (in list node %21 : int[] = prim::ListConstruct(%17, %18, %20), scope: __main__.MyModel3::
) because it is not constant. Please try to make things (e.g. kernel sizes) static if possible.  [Caused by the value '21 defined in (%21 : int[] = prim::ListConstruct(%17, %18, %20), scope: __main__.MyModel3::
)' (type 'List[int]') in the TorchScript graph. The containing node has kind 'prim::ListConstruct'.] 

Edit: oh, yes, it seems the graph is wrong. There are hard-coded values in the Mul nodes, coming from the example input given to torch.onnx.export. Should I rather open an issue under pytorch? I think this is somewhat very minor so maybe not worth it. I did still spend quite a long time debugging this on a large model!

@justinchuby

Yes, please open an issue on torch and mention me. Thanks for doing the hard work to isolate the issue!


fxmarty commented Dec 10, 2022

Great, thanks!
