Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ONNX] Remove Aten ops from ONNX export #37239

Closed
wants to merge 51 commits into from

Conversation

@neginraoof
Copy link
Collaborator

@neginraoof neginraoof commented Apr 24, 2020

This PR adds a new operator export type to exporter: ONNX_FALLTHROUGH
This new type allows ops that are not supported to pass through.
This PR also removes all aten ops in ONNX operator export type mode.

@dr-ci
Copy link

@dr-ci dr-ci bot commented Apr 24, 2020

💊 CI failures summary and remediations

As of commit 04c1cb8 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) <confirmed not flaky by 5 failures>

May 29 18:48:33 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:22: error: expected ‘;’ before ‘}’ token\n 2 | int main() { return 0 }\n | ^~\n | ;\n" }
May 29 18:48:32     raise RuntimeError(message) 
May 29 18:48:32 RuntimeError: test_cuda failed! 
May 29 18:48:33  
May 29 18:48:33 real	5m13.298s 
May 29 18:48:33 user	5m19.383s 
May 29 18:48:33 sys	0m17.898s 
May 29 18:48:33 + cleanup 
May 29 18:48:33 + retcode=1 
May 29 18:48:33 + set +x 
May 29 18:48:33 =================== sccache compilation log =================== 
May 29 18:48:33 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp: In function ‘int main()’:\n/var/lib/jenkins/.cache/torch_extensions/test_compilation_error_formatting/main.cpp:2:22: error: expected ‘;’ before ‘}’ token\n    2 | int main() { return 0 }\n      |                      ^~\n      |                      ;\n" } 
May 29 18:48:33  
May 29 18:48:33 =========== If your build fails, please take a look at the log above for possible reasons =========== 
May 29 18:48:33 Compile requests                 58 
May 29 18:48:33 Compile requests executed        33 
May 29 18:48:33 Cache hits                       25 
May 29 18:48:33 Cache misses                      7 
May 29 18:48:33 Cache timeouts                    0 
May 29 18:48:33 Cache read errors                 0 
May 29 18:48:33 Forced recaches                   0 
May 29 18:48:33 Cache write errors                0 

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 130 times.

Loading

@vadimkantorov
Copy link
Contributor

@vadimkantorov vadimkantorov commented Apr 24, 2020

There are a lot of almost duplicate error messages 'Unsupported: ONNX export of cumsum in ' 'opset 9. Please try higher opset versions.'. Should a function / format string be put in a variable? (e.g. error_opset9_fmt = 'Unsupported: ONNX export of {} in opset 9. Please try higher opset versions.'

Loading

@neginraoof
Copy link
Collaborator Author

@neginraoof neginraoof commented Apr 24, 2020

@vadimkantorov Yeah makes sense. This is mainly WIP now to check CI failures due to removal of ATen ops in ONNX. I'll refactor and clean up export and symbolic conversions a bit.

Loading

@neginraoof neginraoof changed the title [ONNX] Remove Aten ops from ONNX models [WIP][ONNX] Remove Aten ops from ONNX models Apr 24, 2020
@neginraoof neginraoof changed the title [WIP][ONNX] Remove Aten ops from ONNX models [WIP][ONNX] Remove Aten ops from ONNX export Apr 28, 2020
@neginraoof neginraoof requested a review from apaszke as a code owner Apr 28, 2020
@neginraoof
Copy link
Collaborator Author

@neginraoof neginraoof commented Apr 29, 2020

@houseroad I'm seeing ONNX checker failures. Would it be possible to update ONNX submodule?

Loading

@neginraoof neginraoof changed the title [WIP][ONNX] Remove Aten ops from ONNX export [ONNX] Remove Aten ops from ONNX export May 1, 2020
torch/onnx/__init__.py Show resolved Hide resolved
Loading
torch/onnx/__init__.py Outdated Show resolved Hide resolved
Loading
torch/onnx/symbolic_opset9.py Outdated Show resolved Hide resolved
Loading
torch/onnx/symbolic_opset9.py Outdated Show resolved Hide resolved
Loading
torch/onnx/utils.py Show resolved Hide resolved
Loading
torch/onnx/__init__.py Outdated Show resolved Hide resolved
Loading
torch/onnx/__init__.py Outdated Show resolved Hide resolved
Loading
@neginraoof
Copy link
Collaborator Author

@neginraoof neginraoof commented May 19, 2020

cc @houseroad
Please review this PR. Thanks!

Loading

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Loading

@neginraoof
Copy link
Collaborator Author

@neginraoof neginraoof commented May 27, 2020

@houseroad Thanks for looking into this. Could you please give me more details about the internal failures?

Loading

Copy link
Member

@houseroad houseroad left a comment

Could you rebase the PR to resolve the conflict?

Loading

is exported as:
graph(%x.1 : Long(1:1)):
%1 : Tensor = onnx::ReduceSum[keepdims=0](%x.1)
%y.1 : Long() = prim::ListConstruct(%1)
Copy link
Member

@houseroad houseroad May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the comments, I am still not clear how ONNX_FALLTHROUGH works, could you also provide an ONNX graph as well?

Loading

Copy link
Collaborator Author

@neginraoof neginraoof May 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example graph is at line 103:

Example graph:
                graph(%x.1 : Long(1:1)):
                  %1 : None = prim::Constant()
                  %2 : Tensor = aten::sum(%x.1, %1)
                  %y.1 : Tensor[] = prim::ListConstruct(%2)
                  return (%y.1)
is exported as:
                graph(%x.1 : Long(1:1)):
                  %1 : Tensor = onnx::ReduceSum[keepdims=0](%x.1)
                  %y.1 : Long() = prim::ListConstruct(%1)
                return (%y.1)

This mode is converting the registered onnx ops into ONNX domain, and fallthrough the ops not registered.
In this case, prim:ListConstruct is not supported, and so it's exported as is.
With ONNX_ATEN_FALLBACK mode, only unsupported aten ops are exported. The new ONNX_FALLTHROUGH mode exports any unsupported op.

Loading

Copy link
Collaborator Author

@neginraoof neginraoof May 28, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loading

Copy link
Member

@houseroad houseroad May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think you should add the above explanation to the comments as well, such as in the example, which op is not supported by onnx, and how it falls through, and the major difference between ATEN_FALLBACK and ONNX_FALLTHROUGH

Loading

Copy link
Collaborator Author

@neginraoof neginraoof May 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll update it.

Loading

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Loading

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Loading

@facebook-github-bot
Copy link
Contributor

@facebook-github-bot facebook-github-bot commented May 30, 2020

@houseroad merged this pull request in b7b99ab.

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants