Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ONNX] Fix an assertion failure involving Slice (#71965) #72989

Closed
wants to merge 2 commits into from

Conversation

BowenBao
Copy link
Collaborator

@BowenBao BowenBao commented Feb 17, 2022

Stack from ghstack:

Before this change, exporting a model to ONNX involving Slice crashes at axes[i] in line 153 if C++ assertions are enabled:

/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.

The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,

import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)

(extracted from onnxoptimizer tests)

Tested environment: Arch Linux x86_64 with pytorch and torchvisoin installed from the official repo and AUR, respectively.

Before this change, exporting a model to ONNX involving Slice crashes at `axes[i]` in line 153 if C++ assertions are enabled:
```
/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```
The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,
```Python
import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)
```
(extracted from [onnxoptimizer tests](https://github.com/onnx/optimizer/blob/master/onnxoptimizer/test/optimizer_test.py))

Tested environment: Arch Linux x86_64 with pytorch and torchvisoin installed from [the official repo](https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD) and [AUR](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-torchvision), respectively.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented Feb 17, 2022

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/5f27273b485af6fef46dc62e21b2365f5cc48da9/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default
Add ciflow labels to this PR to trigger more builds:

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
linux-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
linux-binary-manywheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
linux-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk ✅ triggered
linux-bionic-rocm4.5-py3.7 ciflow/all, ciflow/default, ciflow/linux, ciflow/rocm, ciflow/trunk ✅ triggered
linux-docs ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk ✅ triggered
linux-vulkan-bionic-py3.7-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk ✅ triggered
linux-xenial-py3.7-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
linux-xenial-py3.7-gcc7-no-ops ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
macos-arm64-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-arm64-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
macos-binary-conda ciflow/binaries, ciflow/binaries_conda, ciflow/default ✅ triggered
macos-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
macos-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win ✅ triggered
windows-binary-libtorch-cxx11-abi ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-libtorch-pre-cxx11 ciflow/binaries, ciflow/binaries_libtorch, ciflow/default ✅ triggered
windows-binary-wheel ciflow/binaries, ciflow/binaries_wheel, ciflow/default ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
docker-builds ciflow/all, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk 🚫 skipped
linux-docs-push ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled 🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops ciflow/all, ciflow/cuda, ciflow/linux, ciflow/trunk 🚫 skipped
macos-10-15-py3-arm64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
macos-11-py3-x86-64 ciflow/all, ciflow/macos, ciflow/trunk 🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
periodic-win-vs2019-cuda11.5-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk 🚫 skipped
pytorch-xla-linux-bionic-py3.7-clang8 ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk, ciflow/xla 🚫 skipped

@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Feb 17, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 5900947 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Feb 17, 2022
Before this change, exporting a model to ONNX involving Slice crashes at `axes[i]` in line 153 if C++ assertions are enabled:
```
/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```
The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,
```Python
import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)
```
(extracted from [onnxoptimizer tests](https://github.com/onnx/optimizer/blob/master/onnxoptimizer/test/optimizer_test.py))

Tested environment: Arch Linux x86_64 with pytorch and torchvisoin installed from [the official repo](https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD) and [AUR](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-torchvision), respectively.

[ghstack-poisoned]
Copy link
Collaborator

@garymm garymm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💮

@BowenBao
Copy link
Collaborator Author

@pytorchbot merge this

@github-actions
Copy link

Hey @BowenBao.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 21, 2022
Before this change, exporting a model to ONNX involving Slice crashes at `axes[i]` in line 153 if C++ assertions are enabled:
```
/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```
The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,
```Python
import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)
```
(extracted from [onnxoptimizer tests](https://github.com/onnx/optimizer/blob/master/onnxoptimizer/test/optimizer_test.py))

Tested environment: Arch Linux x86_64 with pytorch and torchvisoin installed from [the official repo](https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD) and [AUR](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-torchvision), respectively.

Pull Request resolved: pytorch/pytorch#72989
@facebook-github-bot facebook-github-bot deleted the gh/BowenBao/136/head branch February 22, 2022 15:17
facebook-github-bot pushed a commit that referenced this pull request Feb 22, 2022
Summary:
Before this change, exporting a model to ONNX involving Slice crashes at `axes[i]` in line 153 if C++ assertions are enabled:
```
/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```
The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,
```Python
import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)
```
(extracted from [onnxoptimizer tests](https://github.com/onnx/optimizer/blob/master/onnxoptimizer/test/optimizer_test.py))

Tested environment: Arch Linux x86_64 with pytorch and torchvisoin installed from [the official repo](https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD) and [AUR](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-torchvision), respectively.

Pull Request resolved: #72989

Test Plan: automation

Reviewed By: seemethere

Differential Revision: D34347596

Pulled By: seemethere

fbshipit-source-id: 3d33eee5654dfae23d12247f1e74e6d6bc9fdd9d
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
Before this change, exporting a model to ONNX involving Slice crashes at `axes[i]` in line 153 if C++ assertions are enabled:
```
/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```
The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,
```Python
import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)
```
(extracted from [onnxoptimizer tests](https://github.com/onnx/optimizer/blob/master/onnxoptimizer/test/optimizer_test.py))

Tested environment: Arch Linux x86_64 with pytorch and torchvisoin installed from [the official repo](https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD) and [AUR](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-torchvision), respectively.

Pull Request resolved: pytorch/pytorch#72989
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
Before this change, exporting a model to ONNX involving Slice crashes at `axes[i]` in line 153 if C++ assertions are enabled:
```
/usr/include/c++/11.1.0/bits/stl_vector.h:1045: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](std::vector<_Tp, _Alloc>::size_type) [with _Tp = long int; _Alloc = std::allocator<long int>; std::vector<_Tp, _Alloc>::reference = long int&; std::vector<_Tp, _Alloc>::size_type = long unsigned int]: Assertion '__n < this->size()' failed.
```
The relevant check is https://github.com/gcc-mirror/gcc/blob/releases/gcc-11.1.0/libstdc++-v3/include/bits/stl_vector.h#L1045, which checks the vector index.

The issue can be reproduced by exporting Mask R-CNN or similar ones. For example,
```Python
import io
import torch
import torchvision as tv

model = tv.models.detection.maskrcnn_resnet50_fpn(pretrained=False)
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
with io.BytesIO() as f:
    torch.onnx.export(model, x, f, opset_version=11)
```
(extracted from [onnxoptimizer tests](https://github.com/onnx/optimizer/blob/master/onnxoptimizer/test/optimizer_test.py))

Tested environment: Arch Linux x86_64 with pytorch and torchvisoin installed from [the official repo](https://github.com/archlinux/svntogit-community/blob/packages/python-pytorch/trunk/PKGBUILD) and [AUR](https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=python-torchvision), respectively.

Pull Request resolved: pytorch/pytorch#72989
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla signed oncall: jit Add this issue/PR to JIT oncall triage queue open source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants