
[ONNX] Ignore print(Tensor) during tracing #86223

Closed

Conversation

thiagocrepaldi
Collaborator

@thiagocrepaldi thiagocrepaldi commented Oct 4, 2022

Fixes #73619
Fixes microsoft/onnxruntime#11812

This PR adds new symbolics: aten::_conj, aten::conj_physical, aten::resolve_conj, and aten::resolve_neg.
While the last two are always no-ops by definition (they do not change the nodes), the first two raise an exception, as they are not supported by ONNX yet.

@pytorch-bot

pytorch-bot bot commented Oct 4, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86223

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0b7850a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: onnx torch.onnx related changes that should show up in the release notes label Oct 4, 2022
@linux-foundation-easycla

linux-foundation-easycla bot commented Oct 4, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@justinchuby
Collaborator

justinchuby commented Oct 5, 2022

I think Bowen has a PR #86180 (some race condition happening 😅 )

Collaborator

@justinchuby justinchuby left a comment


comment in line

@thiagocrepaldi thiagocrepaldi reopened this Oct 6, 2022
@bdhirsh bdhirsh requested a review from garymm October 6, 2022 19:45
@bdhirsh bdhirsh added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Oct 6, 2022
@garymm
Collaborator

garymm commented Oct 6, 2022

I am not working on PyTorch these days, please don't ask me to review.

@garymm garymm removed their request for review October 6, 2022 19:48
@BowenBao
Collaborator

Closing #86180 since it's a duplicate; let's focus on this one.

For aten::resolve_conj and aten::resolve_neg, pasting the previous discussion here:

Should we handle the case when the inputs are complex and we actually want to take the conjugate? I don't remember if this is supported yet. If not, maybe a TODO comment?

These two ops are also no-ops for ONNX complex. They do not take the conjugate, but rather materialize it, since torch stores an additional conjugate bit. They only adjust the internal representation, not the "value", so from the ONNX perspective these ops do nothing. Adding to the comment.

  • Do aten::_neg and aten::neg_physical exist, similar to aten::_conj and aten::conj_physical?
  • aten::resolve_neg and aten::resolve_conj do not alter the value, hence even for ONNX complex they can be no-ops, unless ONNX also uses a conj/neg bit and wishes to expose this implementation detail through ONNX operators.
  • Please update the PR description with the newly supported ops.
  • Please consider adding the snippet below as a test case too; it's the root cause of why these complex ops are called in print.
                # 'tolist' has side effect calling 'resolve_conj' and 'resolve_neg'.
                # Annotation added to pass torch script.
                _: List[float] = x.tolist()
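To illustrate the tolist() side effect without pulling in torch, here is a pure-Python stand-in (FakeTensor and its methods are hypothetical, for illustration only): in torch, Tensor.tolist() resolves the conj/neg bits internally, which is why these ops appear in graphs traced from models that merely print a tensor.

```python
class FakeTensor:
    """Hypothetical stand-in mimicking torch's conj/neg-bit materialization."""

    def __init__(self, data):
        self.data = data
        self.calls = []  # record which resolve ops were triggered

    def resolve_conj(self):
        self.calls.append("resolve_conj")
        return self

    def resolve_neg(self):
        self.calls.append("resolve_neg")
        return self

    def tolist(self):
        # Mimic the side effect: materialize the conj/neg bits first.
        return self.resolve_conj().resolve_neg().data


t = FakeTensor([1.0, 2.0])
assert t.tolist() == [1.0, 2.0]
assert t.calls == ["resolve_conj", "resolve_neg"]
```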

@thiagocrepaldi
Collaborator Author


I believe all your comments were addressed. See what you think

Comment on lines +12505 to +12518
self.run_test(
    m,
    x,
)
Collaborator


Suggested change
self.run_test(
    m,
    x,
)
self.run_test(PrintTensorOnMyModel(), x)

for simplicity and consistency

Collaborator Author


Actually lintrunner f did this. I just let the linter do the linting for me :)

Collaborator


That's because there is a trailing comma after `x`. Try removing the comma and running it again?

_: List[float] = x.tolist()
return x_firsts

m = PrintTensorOnMyModel().eval()
Collaborator


Suggested change
m = PrintTensorOnMyModel().eval()

Collaborator Author


m is needed, otherwise the model is not instantiated

Collaborator


For simplicity and consistency with other tests, I would just call PrintTensorOnMyModel() in run test (comment above).

Collaborator Author


Each test has its own specifics; expecting "consistency" across their implementations is subjective. The reasoning for keeping the eval() call was that it came from the bug report.

Collaborator


Thanks for the context! If a test needs different treatment, I recommend adding a comment explaining why, to help future readers.

FYI I ran lintrunner locally on the file and got this result

[screenshot of lintrunner output]

Just for future reference: this is a rather minor detail that we can always fix in some other change in the future. The trick is to remove the trailing comma and let black add it back when it sees the need.
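To illustrate the trick (run_test here is a stand-in function, not the real test helper, and the exact reflow depends on the black version): the two call styles below are semantically identical, and black's "magic trailing comma" only decides whether the call stays exploded across lines.

```python
def run_test(model, inputs):
    # Stand-in for the real test helper; just returns its arguments.
    return (model, inputs)


# With a trailing comma, black keeps the call exploded across lines:
exploded = run_test(
    "m",
    "x",
)
# Without the trailing comma, black collapses it when it fits on one line:
collapsed = run_test("m", "x")
assert exploded == collapsed  # formatting only; the calls are identical
```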

torch/onnx/symbolic_opset9.py Outdated Show resolved Hide resolved
torch/onnx/symbolic_opset9.py Show resolved Hide resolved
torch/onnx/symbolic_opset9.py Outdated Show resolved Hide resolved
torch/onnx/symbolic_opset9.py Outdated Show resolved Hide resolved
torch/onnx/symbolic_opset9.py Outdated Show resolved Hide resolved
Comment on lines +12498 to +12513
if input_dtype == torch.cfloat:
    with self.assertRaises(RuntimeError):
        self.run_test(
            m,
            x,
        )
Collaborator


I would expect no errors now? Maybe have separate tests for _conj and conj_physical?

Collaborator Author


As mentioned on a previous review, an exception is also raised for complex when shape inference is enabled (and it is enabled by default).

Collaborator Author


No need to create a test that would execute the exact same code as the existing ones; they share the implementation.

Collaborator


I wonder why it errors, is it because of a bug in shape inference? (keep open for reference)

Collaborator Author


Several JIT passes do not expect complex numbers. For optimization-only passes, it would be safe to just return from the pass without making any change, preventing the failure. On the other hand, we are not interested in having ONNX graphs without shape inference (IIRC it is not part of the public API). We only disable it for specific debugging scenarios.

Collaborator

@BowenBao BowenBao left a comment


LGTM

@BowenBao
Collaborator

Not sure why many seemingly unrelated jobs were failing; I'll try a rebase.

@BowenBao
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased thiagofc/fix-resolve_conj onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout thiagofc/fix-resolve_conj && git pull --rebase)

@BowenBao
Collaborator

BowenBao commented Oct 13, 2022

@thiagocrepaldi the CI errors are legit: the names of the new functions in symbolic_opsetx.py need to be added to __all__.
I noticed it's not the symbolic function ones, but this:

2022-10-13T18:01:42.1984484Z # torch.onnx.symbolic_helper.is_complex_value:
2022-10-13T18:01:42.1985010Z   - Is NOT public: it is not inside the module's (`torch.onnx.symbolic_helper`) `__all__`
2022-10-13T18:01:42.1985728Z   - Does look public: it does look public because it follows the rules from the doc above (does not start with `_` and has a proper `__module__`).
2022-10-13T18:01:42.1986309Z   - You can do either of these two things to fix this problem:
2022-10-13T18:01:42.1986844Z     - To make it public: add it from the modules's (`torch.onnx.symbolic_helper`) `__all__`
2022-10-13T18:01:42.1987360Z     - To make it NOT look public: make its name start with `_`

@thiagocrepaldi
Collaborator Author


The symbolics were already in __all__.

Adding is_complex_value to the symbolic helper's __all__ too
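The lint rule can be illustrated with a tiny throwaway module (the module name and function bodies below are made up for this sketch): a name without a leading underscore is only picked up by `from module import *` if it appears in `__all__`.

```python
import sys
import types

# Build a throwaway module that mimics the lint rule's two cases:
# one public name listed in __all__, one underscore-prefixed private name.
mod = types.ModuleType("fake_symbolic_helper")
exec(
    "def is_complex_value(v):\n"
    "    return isinstance(v, complex)\n"
    "def _private_helper(v):\n"
    "    return v\n"
    "__all__ = ['is_complex_value']\n",
    mod.__dict__,
)
sys.modules["fake_symbolic_helper"] = mod

ns = {}
exec("from fake_symbolic_helper import *", ns)
assert "is_complex_value" in ns     # listed in __all__, so it is exported
assert "_private_helper" not in ns  # underscore-prefixed, not exported
```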

@pytorch pytorch deleted a comment from pytorch-bot bot Oct 17, 2022
@thiagocrepaldi
Collaborator Author

@pytorchbot merge -f "Ignoring unrelated 'Build should have OpenMP enabled, but torch.backends.openmp.is_available() is False'"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

thiagocrepaldi pushed a commit that referenced this pull request Oct 17, 2022
Fixes #73619
Fixes microsoft/onnxruntime#11812

This PR adds new symbolics: `aten::_conj`, `aten::conj_physical`, `aten::resolve_conj`, and `aten::resolve_neg`.
While the last two are always no-ops by definition (they do not change the nodes), the first two raise an exception, as they are not supported by ONNX yet.
Pull Request resolved: #86223
Approved by: https://github.com/justinchuby, https://github.com/BowenBao
@github-actions

Hey @thiagocrepaldi.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@thiagocrepaldi thiagocrepaldi added module: onnx Related to torch.onnx topic: bug fixes topic category labels Oct 17, 2022
pytorchmergebot pushed a commit that referenced this pull request Oct 18, 2022
malfet pushed a commit that referenced this pull request Oct 18, 2022
@thiagocrepaldi thiagocrepaldi deleted the thiagofc/fix-resolve_conj branch May 4, 2023 17:17