Conversation

@ganler (Contributor) commented Jan 3, 2022

clip-v6 and clip-v9 only support float(16/32/64) inputs, so the prior implementation makes an exported clamp(int_min, int_max) fail in ONNXRuntime.

To fix it, use the "cast-clip-cast" pattern as a workaround.
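
For illustration, here is a minimal sketch of what the cast-clip-cast pattern looks like as an ONNX symbolic function. The helper name `clamp_with_cast` is hypothetical and the details are simplified, not the exact code this PR lands:

```python
# Hypothetical sketch of the "cast-clip-cast" workaround (not the exact PR code).
# Clip-6 only accepts float tensors, so integer inputs are cast to float,
# clipped, and cast back to their original dtype.
from torch.onnx import symbolic_helper

def clamp_with_cast(g, self, min_val, max_val):
    # Cast the (possibly integer) input up to float so Clip accepts it.
    as_float = g.op(
        "Cast", self, to_i=symbolic_helper.cast_pytorch_to_onnx["Float"])
    clipped = g.op(
        "Clip", as_float,
        min_f=symbolic_helper._parse_arg(min_val, "f"),
        max_f=symbolic_helper._parse_arg(max_val, "f"))
    # Cast the result back to the input's original scalar type.
    original = self.type().scalarType()  # e.g. "Int", "Long"
    return g.op(
        "Cast", clipped, to_i=symbolic_helper.cast_pytorch_to_onnx[original])
```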

cc: @peterbell10 @BowenBao

@pytorch-probot (bot) commented Jan 3, 2022

CI Flow Status

Ruleset - Version: v1
Ruleset - File: https://github.com/ganler/pytorch/blob/bc263ac8354296485666c1e11622e427b1f98024/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Triggered Workflows

| Workflow | Labels | Status |
| --- | --- | --- |
| linux-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/trunk | ✅ triggered |
| linux-docs | ciflow/all, ciflow/cpu, ciflow/default, ciflow/docs, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-vulkan-bionic-py3.7-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-cuda11.3-py3.7-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-build | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3-clang5-mobile-custom-build-static | ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| linux-xenial-py3.7-gcc7 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit | ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/trunk | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/trunk, ciflow/win | ✅ triggered |
Skipped Workflows

| Workflow | Labels | Status |
| --- | --- | --- |
| caffe2-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| docker-builds | ciflow/all, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-custom-ops | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-arm64-metal | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64 | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-coreml | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| ios-12-5-1-x86-64-full-jit | ciflow/all, ciflow/ios, ciflow/macos, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda10.2-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| libtorch-linux-xenial-cuda11.3-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/trunk | 🚫 skipped |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow, ciflow/trunk | 🚫 skipped |
| linux-docs-push | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| macos-10-15-py3-arm64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-10-15-py3-lite-interpreter-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| macos-11-py3-x86-64 | ciflow/all, ciflow/macos, ciflow/trunk | 🚫 skipped |
| parallelnative-linux-xenial-py3.7-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |
| periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-bionic-cuda11.5-py3.7-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | 🚫 skipped |
| periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | 🚫 skipped |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| periodic-win-vs2019-cuda11.5-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | 🚫 skipped |
| pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build | ciflow/all, ciflow/android, ciflow/cpu, ciflow/linux, ciflow/trunk | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

@facebook-github-bot (Contributor) commented Jan 3, 2022

🔗 Helpful links

💊 CI failures summary and remediations

As of commit f7610b8 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@ganler (Contributor, Author) commented Jan 12, 2022

@thiagocrepaldi @BowenBao Could you kindly review this PR? :-)

@mruberry added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Jan 12, 2022
@thiagocrepaldi added the onnx-needs-info label (needs information from the author / reporter before the ONNX team can take action) on Jan 24, 2022
@thiagocrepaldi (Collaborator) left a comment

Why doesn't Operators.md list opset 9 for clip? I expected either to see opset 9 listed or to have no symbolic for clip in symbolic_opset9.py.

@ganler Please clarify

@ganler (Contributor, Author) commented Jan 24, 2022

@thiagocrepaldi Thanks for the reply, and I am sorry for the lack of clarity.

You are right that Clip does not have a version 9. But I found that the clip operator was originally implemented in symbolic_opset9.py (`def clamp(g, self, min, max):`), which I thought represented operators that are new or changed in opset 9.

According to the docs, Clip has versions [13, 12, 11, 6, 1], where versions [11, 6, 1] do not support integer-typed clipping, yet PyTorch still exports the unsupported operator, leading to an error when the model is imported by ONNXRuntime.

Therefore, I was thinking we could still support this by applying the cast-clip-cast pattern.

Regarding the versions, you are right that perhaps we should not put the clip implementation in symbolic_opset9.py, as clip is not updated in opset 9. Could you kindly indicate whether we should have a new file symbolic_opset6.py or put it elsewhere? Thanks!

@thiagocrepaldi (Collaborator) replied, quoting the comment above:

Given the spec, does the behavior implemented in symbolic_opset9.py belong to opset 1, 6, 11, 12, or 13?

Whatever is exported by PyTorch has to fit the ONNX spec. If that is not the case, then we have a bug in PyTorch, as opposed to the ONNX symbolic.

@ganler (Contributor, Author) commented Jan 24, 2022

> Given the spec, does the behavior implemented in symbolic_opset9.py belong to opset 1, 6, 11, 12, or 13?

@thiagocrepaldi

IMHO, Clip in symbolic_opset9.py intends to implement Clip opset 6, because in opset 6 "min" and "max" are floats (in later versions, say opset 11, the parameter type becomes "T", i.e., the data type of the input tensors), and Clip in symbolic_opset9.py expects "min" and "max" to be parsed as "f" by _parse_arg:

`return g.op("Clip", self, min_f=_parse_arg(min, "f"))`

But Clip in opset < 12 (including opset 6) does not support int, which requires a trick like cast-float-clip-cast to implement.

> Whatever is exported by PyTorch has to fit the ONNX spec. If that is not the case, then we have a bug in PyTorch, as opposed to the ONNX symbolic.

Agreed. So I would suggest:

  1. Support clip<int> in opset < 12 using the cast-float-clip-cast pattern;
  2. Move the implementation to a new file called symbolic_opset6.py or so.

I am happy to implement that in this PR if it looks good to you. :-)

@thiagocrepaldi (Collaborator) replied, quoting the comment above:

Looks sensible to me. Let me check with a colleague.

@BowenBao how do you feel about moving the clip-9 code to clip-6? Looking at the ONNX spec, there is no such Clip-9, and Clip-6 does support float for min/max.

@ganler (Contributor, Author) commented Jan 26, 2022

@thiagocrepaldi Thanks for the review. Comments should be resolved! Could you also take a look at #70571 ?

pytorchmergebot pushed a commit ("… Models") that referenced this pull request on Apr 11, 2022:

There are a few ONNX operators that do not support non-float (e.g., integer) inputs in early versions. For example, Clip does not support non-float types until [opset 12](https://github.com/onnx/onnx/blob/main/docs/Changelog.md#type-constraints-280); that is, older versions like [opset 6](https://github.com/onnx/onnx/blob/main/docs/Changelog.md#type-constraints-107) cannot deal with integer types.

I initially found such a bug in Clip (#70584), but later found more:
1. Clip < 12;
2. Min/Max < 12;
3. ReLU < 14;
4. Pad < 11;

In PyTorch, if we export Max-11 with integer inputs, the export will actually succeed; however, it fails when imported by other frameworks like ONNXRuntime.

```python
import torch

class Net(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()

    def forward(self, x: torch.Tensor):
        return torch.max(x, x + 1)

net = Net()
onnx_model = 'test.onnx'

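# Export with opset 11: the export itself succeeds even though Max-11
# does not accept int32 inputs, so the resulting model is invalid.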
torch.onnx.export(net, (torch.zeros((3, 3), dtype=torch.int32),),
                  onnx_model, verbose=True, opset_version=11)
```
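
The invalid model only fails once another framework tries to load it. A minimal sketch of how the failure surfaces, assuming `onnxruntime` is installed and the export above produced `test.onnx`:

```python
import onnxruntime

# Creating an inference session validates the graph; for the model exported
# above, ONNXRuntime rejects Max-11 with int32 inputs at this point.
try:
    session = onnxruntime.InferenceSession(
        'test.onnx', providers=['CPUExecutionProvider'])
except Exception as e:
    print(f'Failed to load model: {e}')
```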

This is unexpected behavior, as we want to ensure that every model exported by PyTorch is valid (#70584 (comment)). Theoretically, we could simply forbid such cases (e.g., `Clip<int>` < 12, `ReLU<int>` < 14). But we can instead enhance the compatibility and flexibility of PyTorch by casting the inputs of those operators to float tensors, which lets the float version of the operator run, and then casting the result back to the original types.

This PR implements the second approach to achieve better compatibility in PyTorch.
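
As a sketch of that second approach, the common logic can be factored into a wrapper that casts to float, applies the op, and casts back. The helper name and structure below are assumptions for illustration, not necessarily the code this PR adds:

```python
# Hypothetical generic "cast, op, cast back" helper for ONNX symbolics whose
# older opsets only accept float inputs (Clip/Min/Max < 12, ReLU < 14, ...).
from torch.onnx import symbolic_helper

FLOAT_TYPES = ("Float", "Double", "Half")

def op_with_float_cast(g, op_name, *inputs, **attrs):
    original = inputs[0].type().scalarType()  # e.g. "Int", "Long"
    if original in FLOAT_TYPES:
        return g.op(op_name, *inputs, **attrs)  # no workaround needed
    # Cast every input up to float, run the float op, cast the result back.
    to_float = symbolic_helper.cast_pytorch_to_onnx["Float"]
    casted = [g.op("Cast", i, to_i=to_float) for i in inputs]
    out = g.op(op_name, *casted, **attrs)
    return g.op("Cast", out,
                to_i=symbolic_helper.cast_pytorch_to_onnx[original])
```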

@garymm  @thiagocrepaldi

Pull Request resolved: #72401
Approved by: https://github.com/garymm, https://github.com/thiagocrepaldi
facebook-github-bot pushed a commit ("… Models (#72401)") that referenced this pull request on Apr 13, 2022:

Summary: same as the commit message above.

Pull Request resolved: #72401
Approved by: https://github.com/garymm, https://github.com/thiagocrepaldi

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/c1f0e6e763b6720789e54cb83477bb353139a12f

Reviewed By: mehtanirav

Differential Revision: D35582765

fbshipit-source-id: 10bf99c429f0ea7eaaa37c667211a5db2e0b3c09
@github-actions (bot)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the Stale label on May 22, 2022
@ganler closed this on May 22, 2022
@titaiwangms removed the onnx-needs-info label on Jun 27, 2022
@titaiwangms (Collaborator)

cc @thiagocrepaldi

@thiagocrepaldi (Collaborator)

@ganler do you mind rebasing this?

@facebook-github-bot (Contributor)

/easycla

As part of the transition to the PyTorch Foundation, this project now requires contributions be covered under the new CLA. See #85559 for additional details.

This comment will trigger a new check of this PR. If you are already covered, you will simply see a new "EasyCLA" check that passes. If you are not covered, a bot will leave a new comment with a link to sign.

@linux-foundation-easycla

CLA Not Signed

@kit1980 (Contributor) commented Nov 22, 2022

/easycla

@github-actions (bot)

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions bot added the Stale label on Jan 21, 2023
@github-actions bot closed this on Feb 20, 2023

Labels

cla signed, open source, Stale, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants