Conversation

@ganler (Contributor) commented Jan 3, 2022

clip-v6 and clip-v9 only support float(16/32/64) inputs, so the prior implementation makes an exported clamp(int_min, int_max) fail in ONNXRuntime.

To fix it, use the "cast-clip-cast" pattern as a workaround.
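
For illustration, here is a minimal sketch of what the cast-clip-cast pattern looks like as an ONNX symbolic function. The helper name `clamp_with_cast` is hypothetical and the details are simplified, not the exact code this PR lands:

```python
# Hypothetical sketch of the "cast-clip-cast" workaround (not the exact PR code).
# Clip-6 only accepts float tensors, so integer inputs are cast to float,
# clipped, and cast back to their original dtype.
from torch.onnx import symbolic_helper

def clamp_with_cast(g, self, min_val, max_val):
    # Cast the (possibly integer) input up to float so Clip accepts it.
    as_float = g.op(
        "Cast", self, to_i=symbolic_helper.cast_pytorch_to_onnx["Float"])
    clipped = g.op(
        "Clip", as_float,
        min_f=symbolic_helper._parse_arg(min_val, "f"),
        max_f=symbolic_helper._parse_arg(max_val, "f"))
    # Cast the result back to the input's original scalar type.
    original = self.type().scalarType()  # e.g. "Int", "Long"
    return g.op(
        "Cast", clipped, to_i=symbolic_helper.cast_pytorch_to_onnx[original])
```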

cc: @peterbell10 @BowenBao

@pytorch-probot (bot) commented Jan 3, 2022

CI Flow Status

Ruleset - Version: v1
Ruleset - File: https://github.com/ganler/pytorch/blob/bc263ac8354296485666c1e11622e427b1f98024/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Triggered Workflows

| Workflow | Labels | Status |
| --- | --- | --- |
| linux-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk | ✅ triggered |
| linux-docs | ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-vulkan-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
Skipped Workflows

| Workflow | Labels | Status |
| --- | --- | --- |
| caffe2-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| docker-builds | ciflow/all, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-custom-ops | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-metal | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk | 🚫 skipped |
| linux-docs-push | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| macos-10-15-py3-arm64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-lite-interpreter-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-11-py3-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| parallelnative-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| periodic-win-vs2019-cuda11.5-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build | ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Jan 3, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit f7610b8 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@ganler (Contributor, Author) commented Jan 12, 2022

@thiagocrepaldi @BowenBao Could you kindly review this PR? :-)

@mruberry added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jan 12, 2022
@thiagocrepaldi added the onnx-needs-info label (needs information from the author / reporter before the ONNX team can take action) on Jan 24, 2022
@thiagocrepaldi (Collaborator) left a comment

Why doesn't Operators.md list opset 9 for clip? I expected either to see opset 9 listed or to have no symbolic for clip in symbolic_opset9.py.

@ganler Please clarify

@ganler (Contributor, Author) commented Jan 24, 2022

@thiagocrepaldi Thanks for the reply, and I am sorry for the lack of clarity.

You are right that Clip does not have a version 9. But I found that the clip operator was originally implemented in symbolic_opset9.py (`def clamp(g, self, min, max):`), which I thought represented operators that are new or changed in opset 9.

According to the docs, Clip has versions [13, 12, 11, 6, 1], where versions [11, 6, 1] do not support integer-typed clipping, yet PyTorch still exports the unsupported operator, leading to an error when the model is imported by ONNXRuntime.

Therefore, I was thinking we could still support this by applying the cast-clip-cast pattern.

Regarding the versions, you are right that perhaps we should not put the clip implementation in symbolic_opset9.py, as clip is not updated in opset 9. Could you kindly indicate whether we should have a new file symbolic_opset6.py or put it elsewhere? Thanks!

@thiagocrepaldi (Collaborator) replied, quoting the comment above:

Given the spec, does the behavior implemented in symbolic_opset9.py belong to opset 1, 6, 11, 12, or 13?

Whatever is exported by PyTorch has to fit the ONNX spec. If that is not the case, then we have a bug in PyTorch, as opposed to the ONNX symbolic.

@ganler (Contributor, Author) commented Jan 24, 2022

> Given the spec, does the behavior implemented in symbolic_opset9.py belong to opset 1, 6, 11, 12, or 13?

@thiagocrepaldi

IMHO, Clip in symbolic_opset9.py intends to implement Clip opset 6, because in opset 6 "min" and "max" are floats (in later versions, say opset 11, the parameter type becomes "T", i.e., the data type of the input tensors), and Clip in symbolic_opset9.py expects "min" and "max" to be parsed as "f" by _parse_arg:

`return g.op("Clip", self, min_f=_parse_arg(min, "f"))`

But Clip in opset < 12 (including opset 6) does not support int, which requires a trick like cast-float-clip-cast to implement.

> Whatever is exported by PyTorch has to fit the ONNX spec. If that is not the case, then we have a bug in PyTorch, as opposed to the ONNX symbolic.

Agreed. So I would suggest:

  1. Support clip<int> in opset < 12 using the cast-float-clip-cast pattern;
  2. Move the implementation to a new file called symbolic_opset6.py or so.

I am happy to implement that in this PR if it looks good to you. :-)

@thiagocrepaldi (Collaborator) replied, quoting the comment above:

Looks sensible to me. Let me check with a colleague.

@BowenBao how do you feel about moving the clip-9 code to clip-6? Looking at the ONNX spec, there is no such Clip-9, and Clip-6 does support float for min/max.

@ganler (Contributor, Author) commented Jan 26, 2022

@thiagocrepaldi Thanks for the review. Comments should be resolved! Could you also take a look at #70571 ?

pytorchmergebot pushed a commit ("… Models") that referenced this pull request on Apr 11, 2022:

There are a few ONNX operators that do not support non-float (e.g., integer) inputs in early versions. For example, Clip does not support non-float types until [opset 12](https://github.com/onnx/onnx/blob/main/docs/Changelog.md#type-constraints-280); that is, older versions like [opset 6](https://github.com/onnx/onnx/blob/main/docs/Changelog.md#type-constraints-107) cannot deal with integer types.

I initially found such a bug in Clip (#70584), but later found more:
1. Clip < 12;
2. Min/Max < 12;
3. ReLU < 14;
4. Pad < 11;

In PyTorch, if we export Max-11 with integer inputs, the export will actually succeed; however, it fails when imported by other frameworks like ONNXRuntime.

```python
import torch

class Net(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, x: torch.Tensor):
        return torch.max(x, x + 1)

net = Net()
onnx_model = 'test.onnx'

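# Export with opset 11: the export itself succeeds even though Max-11
# does not accept int32 inputs, so the resulting model is invalid.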
torch.onnx.export(net, (torch.zeros((3, 3), dtype=torch.int32),),
                  onnx_model, verbose=True, opset_version=11)
```
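
The invalid model only fails once another framework tries to load it. A minimal sketch of how the failure surfaces, assuming `onnxruntime` is installed and the export above produced `test.onnx`:

```python
import onnxruntime

# Creating an inference session validates the graph; for the model exported
# above, ONNXRuntime rejects Max-11 with int32 inputs at this point.
try:
    session = onnxruntime.InferenceSession(
        'test.onnx', providers=['CPUExecutionProvider'])
except Exception as e:
    print(f'Failed to load model: {e}')
```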

This is unexpected behavior, as we want to ensure that every model exported by PyTorch is valid (#70584 (comment)). Theoretically, we could simply forbid such cases (e.g., `Clip<int>` < 12, `ReLU<int>` < 14). But we can instead enhance the compatibility and flexibility of PyTorch by casting the inputs of those operators to float tensors, which lets the float version of the operator run, and then casting the result back to the original types.

This PR implements the second approach to achieve better compatibility in PyTorch.
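
As a sketch of that second approach, the common logic can be factored into a wrapper that casts to float, applies the op, and casts back. The helper name and structure below are assumptions for illustration, not necessarily the code this PR adds:

```python
# Hypothetical generic "cast, op, cast back" helper for ONNX symbolics whose
# older opsets only accept float inputs (Clip/Min/Max < 12, ReLU < 14, ...).
from torch.onnx import symbolic_helper

FLOAT_TYPES = ("Float", "Double", "Half")

def op_with_float_cast(g, op_name, *inputs, **attrs):
    original = inputs[0].type().scalarType()  # e.g. "Int", "Long"
    if original in FLOAT_TYPES:
        return g.op(op_name, *inputs, **attrs)  # no workaround needed
    # Cast every input up to float, run the float op, cast the result back.
    to_float = symbolic_helper.cast_pytorch_to_onnx["Float"]
    casted = [g.op("Cast", i, to_i=to_float) for i in inputs]
    out = g.op(op_name, *casted, **attrs)
    return g.op("Cast", out,
                to_i=symbolic_helper.cast_pytorch_to_onnx[original])
```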

@garymm  @thiagocrepaldi

Pull Request resolved: #72401
Approved by: https://github.com/garymm, https://github.com/thiagocrepaldi
facebook-github-bot pushed a commit ("… Models (#72401)") that referenced this pull request on Apr 13, 2022:

Summary: same as the commit message above.

Pull Request resolved: #72401
Approved by: https://github.com/garymm, https://github.com/thiagocrepaldi

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/c1f0e6e763b6720789e54cb83477bb353139a12f

Reviewed By: mehtanirav

Differential Revision: D35582765

fbshipit-source-id: 10bf99c429f0ea7eaaa37c667211a5db2e0b3c09
@github-actions (bot)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the Stale label on May 22, 2022
@ganler closed this on May 22, 2022
@titaiwangms removed the onnx-needs-info label on Jun 27, 2022
@titaiwangms (Collaborator)

cc @thiagocrepaldi

@thiagocrepaldi (Collaborator)

@ganler do you mind rebasing this?

@facebook-github-bot (Contributor)

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla

CLA Not Signed

@kit1980 (Contributor) commented Nov 22, 2022

/easycla

@github-actions (bot)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the Stale label on Jan 21, 2023
@github-actions bot closed this on Feb 20, 2023

Labels

cla signed, open source, Stale, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants