
[ONNX] disable size optimizations #35401

Closed
wants to merge 18 commits

Conversation

eellison
Contributor

Seeing which tests fail in the CI.

@eellison eellison requested a review from BowenBao March 25, 2020 19:14
@eellison eellison requested a review from apaszke as a code owner March 25, 2020 19:14
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 25, 2020
@dr-ci

dr-ci bot commented Mar 25, 2020

💊 CircleCI build failures summary and remediations

As of commit 96af1f3 (more details on the Dr. CI page):


  • 2/6 failures introduced in this PR

  • 4/6 broken upstream at merge base bf24753 on Mar 26 from 11:29am to 1:36pm PDT (4 commits; bf24753 - 4d39aee)

Please rebase on the viable/strict branch:

    Since your merge base is older than viable/strict, run these commands:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    

    Check out the recency history of this "viable master" tracking branch.


🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "Build" (full log | pattern match details) <confirmed not flaky by 2 failures>

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_definitions.py 
Auto-merging .circleci/cimodel/data/caffe2_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_data.py 
Auto-merging .circleci/cimodel/data/caffe2_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/README.md 
Auto-merging .circleci/README.md 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_xla_linux_xenial_py3_6_clang7_test (2/2)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 31 19:44:48 caused by: Connection refused (os error 111)
Mar 31 19:44:48 +++ eval 'extract_trap_cmd ' 
Mar 31 19:44:48 ++++ extract_trap_cmd 
Mar 31 19:44:48 ++++ printf '%s\n' '' 
Mar 31 19:44:48 +++ printf '%s\n' cleanup 
Mar 31 19:44:48 ++ trap -- ' 
Mar 31 19:44:48 cleanup' EXIT 
Mar 31 19:44:48 ++ which sccache 
Mar 31 19:44:48 ++ sccache --stop-server 
Mar 31 19:44:48 Stopping sccache server... 
Mar 31 19:44:48 error: couldn't connect to server 
Mar 31 19:44:48 caused by: Connection refused (os error 111) 
Mar 31 19:44:48 ++ true 
Mar 31 19:44:48 ++ rm /var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_IDLE_TIMEOUT=1200 
Mar 31 19:44:48 ++ RUST_LOG=sccache::server=error 
Mar 31 19:44:48 ++ sccache --start-server 
Mar 31 19:44:48 Starting sccache server... 
Mar 31 19:44:48 ++ sccache --zero-stats 
Mar 31 19:44:48 Compile requests                 0 
Mar 31 19:44:48 Compile requests executed        0 

1 job timed out:

  • pytorch_linux_xenial_py3_clang5_asan_test

🚧 3 upstream failures:

These were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 28 times.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison eellison closed this Mar 26, 2020
@eellison eellison reopened this Mar 26, 2020
@BowenBao
Collaborator

BowenBao commented Mar 27, 2020

@eellison the failure in the ONNX test test_dim reveals an issue with the JIT rather than with ONNX export. I have created a small repro for you to look at.

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())

input_1 = torch.arange(6).view(2, 3)
print(m(input_1))
"""
outputs tensor([[ 0,  4,  8],
        [12, 16, 20]])
"""

input_2 = torch.arange(6).view(1, 2, 3)
print(m(input_2))
"""
outputs tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

but should be tensor([[[ 0,  6, 12],
         [18, 24, 30]]])
The correct result can also be produced if the DimModel()(input_1) call is commented out.
"""

edit: cc @houseroad
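
That asymmetry (the 3-D result becomes correct once the 2-D call is removed) suggests the scripted graph is being specialized to the rank of the first input it sees. As a minimal sketch, assuming the scripted module's graph can be printed via its `.graph` attribute, one way to check whether `dim()` survives scripting as a runtime op rather than a folded constant:

```python
import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
# If aten::dim appears in the graph, dim() is still computed at runtime
# and has not been constant-folded to the rank of a previously seen input.
print(m.graph)
```

If a later optimization pass replaces the `aten::dim` node with a constant taken from a profiled input, any input of a different rank would then be multiplied by the wrong factor, which matches the behavior in the repro above.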

@eellison
Contributor Author

@BowenBao when I check the results against eager I don't get any difference:

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
self.assertEqual(eager(input_1), m(input_1))

input_2 = torch.arange(6).view(1, 2, 3)
self.assertEqual(eager(input_2), m(input_2))

@BowenBao
Collaborator

@eellison what is self in the snippet above? I changed self.assertEqual to print and got the following:

tensor([[ 0,  4,  8],
        [12, 16, 20]]) tensor([[ 0,  4,  8],
        [12, 16, 20]])
tensor([[[ 0,  6, 12],
         [18, 24, 30]]]) tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

Could you verify that you are using the PyTorch build from this PR?

@eellison
Contributor Author

Yep, I ran it again on 3a9fc1265dad403479a35f405452853ae0ae6ed8, no failure.

import torch
class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
print(eager(input_1), m(input_1))
# tensor([[ 0,  4,  8],
#        [12, 16, 20]]) tensor([[ 0,  4,  8],
#        [12, 16, 20]])

input_2 = torch.arange(6).view(1, 2, 3)
print(eager(input_2), m(input_2))
# tensor([[[ 0,  6, 12],
#         [18, 24, 30]]]) tensor([[[ 0,  6, 12],
#        [18, 24, 30]]])

@BowenBao
Collaborator

@eellison that's strange; this is the repro I fetched from the CI failure.

Mar 27 00:10:14     def test_dim(self):
Mar 27 00:10:14         class DimModel(torch.jit.ScriptModule):
Mar 27 00:10:14             @torch.jit.script_method
Mar 27 00:10:14             def forward(self, input):
Mar 27 00:10:14                 out = input * 2
Mar 27 00:10:14                 out *= out.dim()
Mar 27 00:10:14                 return out
Mar 27 00:10:14         empty_input = torch.randn(0, requires_grad=True)
Mar 27 00:10:14         multi_dim_input = torch.randn(1, 2, 3, requires_grad=True)
Mar 27 00:10:14         self.run_test(DimModel(), empty_input)
Mar 27 00:10:14 >       self.run_test(DimModel(), multi_dim_input)
...
Mar 27 00:10:14 >   [np.testing.assert_allclose(out, ort_out, rtol=rtol, atol=atol) for out, ort_out in zip(outputs, ort_outs)]
Mar 27 00:10:14 E   AssertionError: 
Mar 27 00:10:14 E   Not equal to tolerance rtol=0.001, atol=1e-07
Mar 27 00:10:14 E   Mismatch: 100%
Mar 27 00:10:14 E   Max absolute difference: 8.715158
Mar 27 00:10:14 E   Max relative difference: 0.6666667
Mar 27 00:10:14 E    x: array([[[ 3.081992, -0.586858, -4.357579],
Mar 27 00:10:14 E           [ 1.136863, -2.169045, -2.797191]]], dtype=float32)
Mar 27 00:10:14 E    y: array([[[  9.245976,  -1.760573, -13.072737],
Mar 27 00:10:14 E           [  3.410588,  -6.507134,  -8.391573]]], dtype=float32)

@eellison eellison closed this Apr 6, 2020
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
Reviving this PR #35401 eellison. I believe after the profiled graph executor fix the test failures are handled.
Pull Request resolved: #36243

Differential Revision: D20950623

Pulled By: eellison

fbshipit-source-id: 5fbee426d1a098d84d5938540d45ce00828299be
ashishfarmer pushed a commit to ashishfarmer/pytorch that referenced this pull request Apr 13, 2020
Summary:
Reviving this PR pytorch#35401 eellison. I believe after the profiled graph executor fix the test failures are handled.
Pull Request resolved: pytorch#36243

Differential Revision: D20950623

Pulled By: eellison

fbshipit-source-id: 5fbee426d1a098d84d5938540d45ce00828299be