ONNX Model Producing Different Results Compared to Original PyTorch and JIT Traced Model

### 🐛 Describe the bug

I have a PyTorch model that I want to convert to ONNX format. I have managed to produce a traced version of the model with `torch.jit.trace`, which works correctly, but exporting it to ONNX produces a model that returns different (wrong) outputs.

The serialised traced model is attached in this [zip folder](https://github.com/pytorch/pytorch/files/12313364/postcvpr.zip). The original PyTorch model is a version of [HOOD](https://github.com/Dolorousrtur/HOOD), slightly modified to get tracing working (inputs/outputs turned into flat `tuple`s of `Tensor`s instead of `HeteroData` batches, among other small changes).

See https://gist.github.com/NathanielB123/75fa9229f49ccdf4af3628abf73c4d12 for the script I used to export to `onnx` and print a sample of the outputs.

The full terminal output after running this code is:
```
/home/nathaniel/miniconda3/envs/hood/lib/python3.10/site-packages/torch/onnx/utils.py:825: UserWarning: no signature found for <torch.ScriptMethod object at 0x7fb25f51bc40>, skipping _decide_input_format
  warnings.warn(f"{e}, skipping _decide_input_format")
============= Diagnostic Run torch.onnx.export version 2.0.1+cu117 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

0:
[0.11268847 0.11745856 1.171965  ]
[-0.09882738 -0.75298095  0.57629055]
500:
[ 0.36263114 -0.09879217  0.8864989 ]
[-0.21008204 -0.9361933   0.81566393]
1000:
[-0.6373062   0.28653488  1.4825789 ]
[-1.2088397  -0.39165828  1.3485816 ]
1500:
[-0.17704955  0.5167098   1.6783938 ]
[ 0.0593356  -0.71061885  1.5147011 ]
2000:
[-0.41957054  0.12701258  1.0791388 ]
[-0.35199475 -0.79554534  0.8017714 ]
2500:
[-0.5762653  -0.23102543  0.46685016]
[-1.1951463  -0.57939434  0.12812436]
3000:
[-0.01403345 -0.07792548  0.9337326 ]
[-0.79627764 -0.7208227   0.6344711 ]
3500:
[-0.6358214   0.34578276  1.105367  ]
[-0.8967222 -1.1001766  0.4269399]
4000:
[-0.20787726  0.01798273  1.1856971 ]
[-0.39938125 -0.7097303   0.6083367 ]
```
As you can see, the vectors output by the two models are very different at every sampled index, far beyond what floating-point rounding would explain.
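To put a number on the mismatch, here is a minimal sketch that compares the two vectors printed for index `0` above. The helper name `compare_outputs` and the tolerances (`rtol=1e-3`, `atol=1e-5`) are my own assumptions for illustration and are not part of the linked script.

```python
import numpy as np


def compare_outputs(reference, candidate, rtol=1e-3, atol=1e-5):
    """Return (max_abs_diff, all_close) for two same-shaped arrays.

    Hypothetical helper: the tolerances are illustrative assumptions,
    not values taken from the export script.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(
            f"shape mismatch: {reference.shape} vs {candidate.shape}"
        )
    # Largest element-wise absolute difference between the two outputs.
    max_abs_diff = float(np.max(np.abs(reference - candidate)))
    # True only if every element agrees within the given tolerances.
    all_close = bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))
    return max_abs_diff, all_close


# The two vectors printed for index 0 in the terminal output above.
first = [0.11268847, 0.11745856, 1.171965]
second = [-0.09882738, -0.75298095, 0.57629055]

max_abs_diff, all_close = compare_outputs(first, second)
print(f"max abs diff: {max_abs_diff:.4f}, within tolerance: {all_close}")
```

For this pair the largest element-wise difference is roughly 0.87, so the check fails by several orders of magnitude.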




### Versions

```
Collecting environment information...
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.1.0-8ubuntu1~22.04) 13.1.0
Clang version: Could not collect
CMake version: version 3.27.1
Libc version: glibc-2.35

Python version: 3.10.9 (main, Mar  8 2023, 10:47:38) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.90.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Quadro T2000 with Max-Q Design
Nvidia driver version: 531.41
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   39 bits physical, 48 bits virtual
Byte Order:                      Little Endian
CPU(s):                          12
On-line CPU(s) list:             0-11
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz
CPU family:                      6
Model:                           165
Thread(s) per core:              2
Core(s) per socket:              6
Socket(s):                       1
Stepping:                        2
BogoMIPS:                        5616.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves flush_l1d arch_capabilities
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       192 KiB (6 instances)
L1i cache:                       192 KiB (6 instances)
L2 cache:                        1.5 MiB (6 instances)
L3 cache:                        12 MiB (1 instance)
Vulnerability Itlb multihit:     KVM: Mitigation: VMX unsupported
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Retbleed:          Mitigation; Enhanced IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds:             Unknown: Dependent on hypervisor status
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.22.4
[pip3] pytorch3d==0.7.4
[pip3] torch==2.0.1
[pip3] torch-cluster==1.6.1
[pip3] torch-geometric==2.3.1
[pip3] torch-scatter==2.1.1
[pip3] torch-sparse==0.6.17
[pip3] torchaudio==2.0.2
[pip3] torchinfo==1.8.0
[pip3] torchvision==0.15.2
[pip3] triton==2.0.0
[conda] blas                      1.0                         mkl    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2023.1.0         h6d00ec8_46342  
[conda] mkl-service               2.4.0           py310h5eee18b_1  
[conda] mkl_fft                   1.3.6           py310h1128e8f_1  
[conda] mkl_random                1.2.2           py310h1128e8f_1  
[conda] mxnet-mkl                 1.6.0                    pypi_0    pypi
[conda] numpy                     1.22.4                   pypi_0    pypi
[conda] pytorch                   2.0.1           py3.10_cuda11.7_cudnn8.5.0_0    pytorch
[conda] pytorch-cuda              11.7                 h778d358_5    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] pytorch3d                 0.7.4           py310_cu117_pyt201    pytorch3d
[conda] torch                     2.0.1                    pypi_0    pypi
[conda] torch-cluster             1.6.1                    pypi_0    pypi
[conda] torch-geometric           2.3.1                    pypi_0    pypi
[conda] torch-scatter             2.1.1                    pypi_0    pypi
[conda] torch-sparse              0.6.17                   pypi_0    pypi
[conda] torchaudio                2.0.2               py310_cu117    pytorch
[conda] torchinfo                 1.8.0                    pypi_0    pypi
[conda] torchtriton               2.0.0                     py310    pytorch
[conda] torchvision               0.15.2              py310_cu117    pytorch
[conda] triton                    2.0.0                    pypi_0    pypi
```

ONNX Model Producing Different Results Compared to Original PyTorch and JIT Traced Model #106967
