[ONNX] Add dtype check in onnx verification #79263

Closed
wants to merge 15 commits

Conversation

@qqaatw (Collaborator) commented Jun 10, 2022

Currently we don't have a dtype check when verifying the consistency between PyTorch and ONNX outputs. As a result, several dtype inconsistencies were found and reported: #77842 #77845

This is a POC.
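For illustration, a minimal sketch of the kind of check being added to the verification path (the helper name and signature here are hypothetical, not the actual diff):

```python
# Hypothetical sketch of a dtype-aware output comparison (illustrative only).
import numpy as np
import torch

def assert_outputs_match(torch_out: torch.Tensor, ort_out: np.ndarray,
                         rtol: float = 1e-3, atol: float = 1e-7) -> None:
    expected = torch_out.detach().cpu().numpy()
    # The new behavior: fail fast on a dtype mismatch instead of letting
    # the numeric comparison silently coerce across dtypes.
    if expected.dtype != ort_out.dtype:
        raise AssertionError(
            f"dtype mismatch: PyTorch {expected.dtype} vs ONNX Runtime {ort_out.dtype}")
    np.testing.assert_allclose(expected, ort_out, rtol=rtol, atol=atol)
```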

Failed workflows: see the full list in the merge summary at the end of this thread.

@facebook-github-bot (Contributor) commented Jun 10, 2022

✅ No Failures (0 Pending)

As of commit ab98bce (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@gchanan added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 10, 2022
@justinchuby added the module: onnx label (Related to torch.onnx) Jun 10, 2022
@justinchuby (Collaborator)

cc @BowenBao

@justinchuby added the module: tests (Issues related to tests (not the torch.testing module)), topic: bug fixes (topic category), and release notes: onnx (torch.onnx related changes that should show up in the release notes) labels Jun 16, 2022
@BowenBao (Collaborator)

@qqaatw This is awesome! Thank you so much for the contribution!

A suggestion would be to add a new skip decorator that disables dtype checks for these tests, then remove the skips in later follow-up PRs that fix the individual tests. This lets CI tell us whether each issue is properly fixed.

If you are interested, you can use ghstack to submit the follow-up individual fixes on top of this PR.
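For reference, such a skip decorator might look roughly like this (a sketch with assumed names; the flag consumed by the verification helper is hypothetical):

```python
import functools

def skip_dtype_checking(func):
    """Illustrative sketch: run the test with dtype verification disabled."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        self.check_dtype = False  # assumed attribute read by the ONNX verification step
        return func(self, *args, **kwargs)
    return wrapper
```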

pytorchmergebot pushed a commit that referenced this pull request Jun 22, 2022
Part of #79263

This PR fixes the following matters:

1. Before this fix, the reduced output had `[1]` shape when `dim = None` and `keepdim = False`. Now the output is reduced to `[]` shape, which matches PyTorch's behavior.
2. Before this fix, the output was always cast to `Long`. Now the output is cast to the input's dtype.

Pull Request resolved: #79506
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
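For reference, the eager-mode behavior the export now matches (a quick illustration):

```python
import torch

x = torch.randn(3, 4)
out = torch.linalg.norm(x)   # dim=None, keepdim=False
print(out.shape, out.dtype)  # torch.Size([]) torch.float32 -- scalar, input's dtype
```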
pytorchmergebot pushed a commit that referenced this pull request Jun 22, 2022
Part of #79263

Before: when `dim == None` and `keepdim == False`, the reduced output has `[1]` shape.
After: the output is squeezed so that its shape is `[]`, matching PyTorch's behavior.

Pull Request resolved: #79371
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao
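Eager-mode reference for the behavior being matched:

```python
import torch

x = torch.zeros(2, 3, dtype=torch.bool)
print(torch.all(x).shape)  # torch.Size([]) -- fully reduced when dim is omitted
print(torch.any(x).shape)  # torch.Size([])
```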
pytorchmergebot pushed a commit that referenced this pull request Jun 22, 2022
Part of #79263

Before: The output has `[1]` shape when the input is a scalar.
After: The output has `[]` shape, matching PyTorch's behavior.

The original comment in the code states `torch allows scalar self, and ONNX is ambiguous about whether this is allowed`. In fact, ONNX never clearly specifies whether scalar inputs are allowed for each operator; at least in this case, a scalar input appears to be allowed.

Pull Request resolved: #79846
Approved by: https://github.com/BowenBao
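A quick eager-mode illustration of the scalar case the export now preserves:

```python
import torch

m = torch.nn.PReLU()
y = m(torch.tensor(-2.0))  # scalar ("self") input is allowed in eager mode
print(y.shape)             # torch.Size([]) -- scalar in, scalar out
```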
@qqaatw (Collaborator, Author) commented Jun 22, 2022

> @qqaatw This is awesome! Thank you so much for the contribution!
>
> A suggestion would be to add a new skip decorator that disables dtype checks for these tests, then remove the skips in later follow-up PRs that fix the individual tests. This lets CI tell us whether each issue is properly fixed.
>
> If you are interested, you can use ghstack to submit the follow-up individual fixes on top of this PR.

Thank you, good suggestion indeed!

ghstack requires write access to the repository, which I don't currently have.

@justinchuby linked an issue Jun 28, 2022 that may be closed by this pull request
pytorchmergebot pushed a commit that referenced this pull request Jun 28, 2022
Part of #79263

Before: when the input to the two functions (`hardshrink`/`softshrink`) has `[]` shape, the output has `[1]` shape.
After: the output shape is now `[]`, matching PyTorch's behavior.

Pull Request resolved: #79695
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
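Eager-mode reference for the two functions (illustrative):

```python
import torch
import torch.nn.functional as F

print(torch.hardshrink(torch.tensor(0.2)).shape)  # torch.Size([])
print(F.softshrink(torch.tensor(0.2)).shape)      # torch.Size([]) -- shape preserved
```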
facebook-github-bot pushed a commit that referenced this pull request Jun 30, 2022
Summary:
Part of #79263

Before: the boolean output of the logical ops was cast back to the input's dtype.
After: it is no longer cast back (see the eager-mode illustration after this commit message).

Pull Request resolved: #79339
Approved by: https://github.com/BowenBao

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/7a8d6c9b1dffdd01c9a95392c7ba1102a1566192

Reviewed By: b0noI

Differential Revision: D37509641

fbshipit-source-id: 89419ce977d48c01b4109d4e3075e603ccc8fc14
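A quick eager-mode illustration of the dtype behavior referenced above (the logical ops return `bool` regardless of input dtype):

```python
import torch

a = torch.tensor([1, 0, 1], dtype=torch.int32)
b = torch.tensor([1, 1, 0], dtype=torch.int32)
print(torch.logical_and(a, b).dtype)  # torch.bool -- no longer cast back to int32
print(torch.logical_xor(a, b).dtype)  # torch.bool
```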
facebook-github-bot pushed a commit that referenced this pull request Jun 30, 2022

Summary:
Part of #79263

Before: when the input to the two functions (`hardshrink`/`softshrink`) has `[]` shape, the output has `[1]` shape.
After: the output shape is now `[]`, matching PyTorch's behavior.

Pull Request resolved: #79695
Approved by: https://github.com/justinchuby, https://github.com/BowenBao

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/3dec9fd09fbe5229e5737f9b54037454dc40802e

Reviewed By: b0noI

Differential Revision: D37509863

fbshipit-source-id: d3148da4b8bdb7ff8e28d623192d6dba97075dcb
@justinchuby (Collaborator)

Could you rebase?

@qqaatw (Collaborator, Author) commented Jul 6, 2022

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator)

Rebase failed: `git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/master pull/79263/head` returned non-zero exit code 1:

Rebasing (1/8)
Rebasing (2/8)
Rebasing (3/8)
Auto-merging test/onnx/test_pytorch_common.py
Auto-merging test/onnx/test_pytorch_onnx_onnxruntime.py
CONFLICT (content): Merge conflict in test/onnx/test_pytorch_onnx_onnxruntime.py
Auto-merging torch/onnx/verification.py
error: could not apply 8ce6eac7a1... Add decorators to skip checks
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
Could not apply 8ce6eac7a1... Add decorators to skip checks

Raised by https://github.com/pytorch/pytorch/actions/runs/2621356399

@qqaatw (Collaborator, Author) commented Jul 6, 2022

> Could you rebase?

Maybe after the currently open PRs are merged? There are two remaining.

justinchuby pushed a commit to justinchuby/pytorch that referenced this pull request Jul 27, 2022
Part of pytorch#79263

This PR fixes the following matters:

1. Before this fix, the reduced output had `[1]` shape when `dim = None` and `keepdim = False`. Now the output is reduced to `[]` shape, which matches PyTorch's behavior.
2. Before this fix, the output was always cast to `Long`. Now the output is cast to the input's dtype.

Pull Request resolved: pytorch#79506
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
justinchuby pushed a commit to justinchuby/pytorch that referenced this pull request Jul 27, 2022
Part of pytorch#79263

Before: when `dim == None` and `keepdim == False`, the reduced output has `[1]` shape.
After: the output is squeezed so that its shape is `[]`, matching PyTorch's behavior.

Pull Request resolved: pytorch#79371
Approved by: https://github.com/AllenTiTaiWang, https://github.com/BowenBao
justinchuby pushed a commit to justinchuby/pytorch that referenced this pull request Jul 27, 2022
Part of pytorch#79263

Before: The output has `[1]` shape when the input is a scalar.
After: The output has `[]` shape, matching PyTorch's behavior.

The original comment in the code states `torch allows scalar self, and ONNX is ambiguous about whether this is allowed`. In fact, ONNX never clearly specifies whether scalar inputs are allowed for each operator; at least in this case, a scalar input appears to be allowed.

Pull Request resolved: pytorch#79846
Approved by: https://github.com/BowenBao
justinchuby pushed a commit to justinchuby/pytorch that referenced this pull request Jul 27, 2022
Part of pytorch#79263

Before: the boolean output of the logical ops was cast back to the input's dtype.
After: it is no longer cast back.
Pull Request resolved: pytorch#79339
Approved by: https://github.com/BowenBao
justinchuby pushed a commit to justinchuby/pytorch that referenced this pull request Jul 27, 2022
Part of pytorch#79263

Before: when the input to the two functions (`hardshrink`/`softshrink`) has `[]` shape, the output has `[1]` shape.
After: the output shape is now `[]`, matching PyTorch's behavior.

Pull Request resolved: pytorch#79695
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
pytorchmergebot pushed a commit that referenced this pull request Aug 5, 2022
Part of #79263

The `keepdim` argument is theoretically ignored when `dim` is not specified (see the [docs](https://pytorch.org/docs/stable/generated/torch.argmin.html)).

Unfortunately, the PyTorch implementation still seems to take it into account, resulting in a not-fully-reduced tensor, which is undefined behavior. Thus, I added a `dim` argument to the tests to make the outputs of PyTorch and ONNX Runtime consistent.

Pull Request resolved: #79503
Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/BowenBao
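Illustration of why passing an explicit `dim` makes the semantics well-defined:

```python
import torch

x = torch.randn(2, 3)
print(torch.argmin(x).shape)                       # torch.Size([]) -- fully reduced
print(torch.argmin(x, dim=1, keepdim=True).shape)  # torch.Size([2, 1]) -- well-defined
```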
facebook-github-bot pushed a commit that referenced this pull request Aug 7, 2022
Summary:
Part of #79263

The `keepdim` argument is theoretically ignored when `dim` is not specified (see the [docs](https://pytorch.org/docs/stable/generated/torch.argmin.html)).

Unfortunately, the PyTorch implementation still seems to take it into account, resulting in a not-fully-reduced tensor, which is undefined behavior. Thus, I added a `dim` argument to the tests to make the outputs of PyTorch and ONNX Runtime consistent.

Pull Request resolved: #79503
Approved by: https://github.com/justinchuby, https://github.com/AllenTiTaiWang, https://github.com/BowenBao

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/6bdf89b0c71a827be2a0969afa33f151e87a32f3

Reviewed By: kit1980

Differential Revision: D38478785

fbshipit-source-id: cd9493d6d088c4e32eb11bd19924efc0b6e6335c
pytorchmergebot pushed a commit that referenced this pull request Aug 9, 2022
Part of #79263

Previously, quantized PyTorch tensors were all cast to the dtypes that comply with ONNX's definition, i.e. `scale` was cast to `double` and `zero_point` to `int64`. These casts led to inconsistent dtypes when comparing PyTorch's outputs with ONNX Runtime's outputs.

Now, a `cast_onnx_accepted` argument is added to the `unpack_quantized_tensor` function. When making example inputs for ONNX, we cast them to the ONNX-compliant dtypes; otherwise, they are cast to PyTorch's default dtypes for quantization.

Pull Request resolved: #79690
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
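A rough sketch of the idea (a hypothetical simplification, not the actual helper in PyTorch's ONNX test utilities):

```python
import torch

def unpack_quantized_tensor(tensor, cast_onnx_accepted=True):
    # Illustrative sketch: split a quantized tensor into its components.
    values = tensor.int_repr()
    scale = torch.tensor(tensor.q_scale())            # q_scale() returns a Python float
    zero_point = torch.tensor(tensor.q_zero_point())
    if cast_onnx_accepted:
        # dtypes that comply with ONNX's QuantizeLinear/DequantizeLinear definition
        scale = scale.to(torch.double)
        zero_point = zero_point.to(torch.int64)
    return values, scale, zero_point

q = torch.quantize_per_tensor(torch.randn(2, 2), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
values, scale, zp = unpack_quantized_tensor(q)
print(scale.dtype, zp.dtype)  # torch.float64 torch.int64
```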
@qqaatw (Collaborator, Author) commented Aug 10, 2022

@pytorchbot merge -g

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a merge job. Check the current status here

facebook-github-bot pushed a commit that referenced this pull request Aug 10, 2022
Summary:
Part of #79263

Previously, quantized PyTorch tensors were all cast to the dtypes that comply with ONNX's definition, i.e. `scale` was cast to `double` and `zero_point` to `int64`. These casts led to inconsistent dtypes when comparing PyTorch's outputs with ONNX Runtime's outputs.

Now, a `cast_onnx_accepted` argument is added to the `unpack_quantized_tensor` function. When making example inputs for ONNX, we cast them to the ONNX-compliant dtypes; otherwise, they are cast to PyTorch's default dtypes for quantization.

Pull Request resolved: #79690
Approved by: https://github.com/justinchuby, https://github.com/BowenBao

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/9b4dc56c83ca28df3d34b650f4c0249de45d7601

Reviewed By: seemethere

Differential Revision: D38585476

fbshipit-source-id: a2015092f3f034e30c6134aee6230c57e959636f
facebook-github-bot pushed a commit that referenced this pull request Aug 10, 2022
Summary:
Currently we don't have a dtype check when verifying the consistency between PyTorch and ONNX outputs. As a result, several dtype inconsistencies were found and reported: #77842 #77845

This is a POC.

Failed workflows:
- linux-xenial-py3.7-clang7-onnx / test (default, 2, 2, linux.2xlarge)
  - inconsistent shape
    - TestONNXRuntime_opset10.test_all (#79371)
    - TestONNXRuntime_opset10.test_any (#79371)
    - TestONNXRuntime_opset10.test_argmin_argmax (#79503)
    - TestONNXRuntime_opset10.test_hardshrink (#79695)
    - TestONNXRuntime_opset10.test_linalg_norm (#79506)
    - TestONNXRuntime_opset10.test_linalg_vector_norm (#79506)
    - TestONNXRuntime_opset10.test_prelu_scalar (#79846)
    - TestONNXRuntime_opset10.test_softshrink (#79695)
    - TestONNXRuntime_opset10.test_sum_empty_tensor (skipped)
    - TestONNXRuntime_opset10.test_tolist (skipped)
  - inconsistent dtype
    - test_arithmetic_prim_bool (skipped)
    - test_arithmeticOps_with_low_precision (skipped)
    - test_arithmetic_prim_float (skipped)
    - test_logical_and (#79339)
    - test_logical_or (#79339)
    - test_logical_xor (#79339)
    - test_pow (skipped)
    - test_primitive_input_floating (skipped)
    - test_quantize_per_tensor (#79690)
    - test_quantized_adaptive_avg_pool2d (#79690)
    - test_quantized_arithmetic (#79690)
    - test_quantized_arithmetic_qfunctional (#79690)
    - test_quantized_conv2d (#79690)
    - test_quantized_conv2d_relu (#79690)
    - test_quantized_flatten (#79690)
    - test_quantized_hardsigmoid (#79690)
    - test_quantized_hardswish (#79690)
    - test_quantized_linear (#79690)
    - test_quantized_sigmoid (#79690)
    - test_item (skipped)
    - test_full_like_value (skipped)
    - TestONNXRuntime_opset7.test_div_rounding_mode (skipped)
    - TestONNXRuntime_opset8.test_div_rounding_mode (skipped)
    - TestONNXRuntime_opset9.test_div_rounding_mode (skipped)
    - TestONNXRuntime_opset9_IRv4.test_div_rounding_mode (skipped)
    - test_outer (skipped)
    - test_symbolic_shape_inference_arange_2 (skipped)

Pull Request resolved: #79263
Approved by: https://github.com/justinchuby, https://github.com/BowenBao

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/d9a7e93aaf3166e639ea413123bd6c38b9144adc

Reviewed By: seemethere

Differential Revision: D38585848

fbshipit-source-id: 9da98581ceec51142ae31d3f8a06f9f296a16b23
Labels
- cla signed
- Merged
- module: onnx (Related to torch.onnx)
- module: tests (Issues related to tests (not the torch.testing module))
- open source
- release notes: onnx (torch.onnx related changes that should show up in the release notes)
- topic: bug fixes (topic category)
- triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

[ONNX] Some functions do not preserve shape for scalars
9 participants