
[ONNX] disable size optimizations #35401

Closed
wants to merge 18 commits

Conversation

eellison
Contributor

Seeing which tests fail in the CI.

@eellison eellison requested a review from BowenBao March 25, 2020 19:14
@eellison eellison requested a review from apaszke as a code owner March 25, 2020 19:14
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 25, 2020
@dr-ci

dr-ci bot commented Mar 25, 2020

💊 CircleCI build failures summary and remediations

As of commit 96af1f3 (more details on the Dr. CI page):


  • 2/6 failures introduced in this PR

  • 4/6 broken upstream at merge base bf24753 on Mar 26 from 11:29am to 1:36pm PDT (4 commits; bf24753 - 4d39aee)

Please rebase on the viable/strict branch:

    Since your merge base is older than viable/strict, run these commands:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    

    Check out the recency history of this "viable master" tracking branch.


🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "Build" (full log | pattern match details) <confirmed not flaky by 2 failures>

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_definitions.py 
Auto-merging .circleci/cimodel/data/caffe2_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_data.py 
Auto-merging .circleci/cimodel/data/caffe2_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/README.md 
Auto-merging .circleci/README.md 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_xla_linux_xenial_py3_6_clang7_test (2/2)

Step: "Test" (full log | pattern match details) <confirmed not flaky by 2 failures>

Mar 31 19:44:48 caused by: Connection refused (os error 111)
Mar 31 19:44:48 +++ eval 'extract_trap_cmd ' 
Mar 31 19:44:48 ++++ extract_trap_cmd 
Mar 31 19:44:48 ++++ printf '%s\n' '' 
Mar 31 19:44:48 +++ printf '%s\n' cleanup 
Mar 31 19:44:48 ++ trap -- ' 
Mar 31 19:44:48 cleanup' EXIT 
Mar 31 19:44:48 ++ which sccache 
Mar 31 19:44:48 ++ sccache --stop-server 
Mar 31 19:44:48 Stopping sccache server... 
Mar 31 19:44:48 error: couldn't connect to server 
Mar 31 19:44:48 caused by: Connection refused (os error 111) 
Mar 31 19:44:48 ++ true 
Mar 31 19:44:48 ++ rm /var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_IDLE_TIMEOUT=1200 
Mar 31 19:44:48 ++ RUST_LOG=sccache::server=error 
Mar 31 19:44:48 ++ sccache --start-server 
Mar 31 19:44:48 Starting sccache server... 
Mar 31 19:44:48 ++ sccache --zero-stats 
Mar 31 19:44:48 Compile requests                 0 
Mar 31 19:44:48 Compile requests executed        0 

1 job timed out:

  • pytorch_linux_xenial_py3_clang5_asan_test

🚧 3 upstream failures:

These were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 28 times.

Contributor

@facebook-github-bot facebook-github-bot left a comment

@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison eellison closed this Mar 26, 2020
@eellison eellison reopened this Mar 26, 2020
@BowenBao
Collaborator

BowenBao commented Mar 27, 2020

@eellison the failure in the ONNX test test_dim reveals an issue with the JIT rather than with ONNX export. I have created a small repro for you to look at.

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())

input_1 = torch.arange(6).view(2, 3)
print(m(input_1))
"""
outputs tensor([[ 0,  4,  8],
        [12, 16, 20]])
"""

input_2 = torch.arange(6).view(1, 2, 3)
print(m(input_2))
"""
outputs tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

but should be tensor([[[ 0,  6, 12],
         [18, 24, 30]]])
The correct result can also be produced if the DimModel()(input_1) call is commented out.
"""

edit: cc @houseroad
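
That asymmetry (the 3-D result becomes correct once the 2-D call is removed) suggests the scripted graph is being specialized to the rank of the first input it sees. As a minimal sketch, assuming the scripted module's graph can be printed via its `.graph` attribute, one way to check whether `dim()` survives scripting as a runtime op rather than a folded constant:

```python
import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
# If aten::dim appears in the graph, dim() is still computed at runtime
# and has not been constant-folded to the rank of a previously seen input.
print(m.graph)
```

If a later optimization pass replaces the `aten::dim` node with a constant taken from a profiled input, any input of a different rank would then be multiplied by the wrong factor, which matches the behavior in the repro above.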

@eellison
Contributor Author

@BowenBao when I check the results against eager I don't get any difference:

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
self.assertEqual(eager(input_1), m(input_1))

input_2 = torch.arange(6).view(1, 2, 3)
self.assertEqual(eager(input_2), m(input_2))

@BowenBao
Collaborator

@eellison what is self in the snippet above? I changed self.assertEqual to print and got the following:

tensor([[ 0,  4,  8],
        [12, 16, 20]]) tensor([[ 0,  4,  8],
        [12, 16, 20]])
tensor([[[ 0,  6, 12],
         [18, 24, 30]]]) tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

Could you verify that you are using the PyTorch build from this PR?

@eellison
Contributor Author

Yep, I ran it again on 3a9fc1265dad403479a35f405452853ae0ae6ed8, no failure.

import torch
class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
print(eager(input_1), m(input_1))
# tensor([[ 0,  4,  8],
#        [12, 16, 20]]) tensor([[ 0,  4,  8],
#        [12, 16, 20]])

input_2 = torch.arange(6).view(1, 2, 3)
print(eager(input_2), m(input_2))
# tensor([[[ 0,  6, 12],
#         [18, 24, 30]]]) tensor([[[ 0,  6, 12],
#        [18, 24, 30]]])

@BowenBao
Collaborator

@eellison that's strange; this is the repro I fetched from the CI failure.

Mar 27 00:10:14     def test_dim(self):
Mar 27 00:10:14         class DimModel(torch.jit.ScriptModule):
Mar 27 00:10:14             @torch.jit.script_method
Mar 27 00:10:14             def forward(self, input):
Mar 27 00:10:14                 out = input * 2
Mar 27 00:10:14                 out *= out.dim()
Mar 27 00:10:14                 return out
Mar 27 00:10:14         empty_input = torch.randn(0, requires_grad=True)
Mar 27 00:10:14         multi_dim_input = torch.randn(1, 2, 3, requires_grad=True)
Mar 27 00:10:14         self.run_test(DimModel(), empty_input)
Mar 27 00:10:14 >       self.run_test(DimModel(), multi_dim_input)
...
Mar 27 00:10:14 >   [np.testing.assert_allclose(out, ort_out, rtol=rtol, atol=atol) for out, ort_out in zip(outputs, ort_outs)]
Mar 27 00:10:14 E   AssertionError: 
Mar 27 00:10:14 E   Not equal to tolerance rtol=0.001, atol=1e-07
Mar 27 00:10:14 E   Mismatch: 100%
Mar 27 00:10:14 E   Max absolute difference: 8.715158
Mar 27 00:10:14 E   Max relative difference: 0.6666667
Mar 27 00:10:14 E    x: array([[[ 3.081992, -0.586858, -4.357579],
Mar 27 00:10:14 E           [ 1.136863, -2.169045, -2.797191]]], dtype=float32)
Mar 27 00:10:14 E    y: array([[[  9.245976,  -1.760573, -13.072737],
Mar 27 00:10:14 E           [  3.410588,  -6.507134,  -8.391573]]], dtype=float32)

@eellison eellison closed this Apr 6, 2020
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
Reviving this PR #35401 eellison. I believe after the profiled graph executor fix the test failures are handled.
Pull Request resolved: #36243

Differential Revision: D20950623

Pulled By: eellison

fbshipit-source-id: 5fbee426d1a098d84d5938540d45ce00828299be
ashishfarmer pushed a commit to ashishfarmer/pytorch that referenced this pull request Apr 13, 2020
Summary:
Reviving this PR pytorch#35401 eellison. I believe after the profiled graph executor fix the test failures are handled.
Pull Request resolved: pytorch#36243

Differential Revision: D20950623

Pulled By: eellison

fbshipit-source-id: 5fbee426d1a098d84d5938540d45ce00828299be