Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[quant] Fix dimension for output of batchnorm 1d #59264

Closed
wants to merge 4 commits into from

Conversation

jerryzh168
Copy link
Contributor

@jerryzh168 jerryzh168 commented Jun 1, 2021

Stack from ghstack:

Summary:
fixes: #59200
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D28810597

Summary:
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@facebook-github-bot
Copy link
Contributor

facebook-github-bot commented Jun 1, 2021

💊 CI failures summary and remediations

As of commit 6070d79 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_6_clang9_noarch_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jun 08 02:33:26 test_udf_remote_message_delay...yUniqueId(created_on=0, local_id=0) to be created.
Jun 08 02:32:39 frame #13: c10::ThreadPool::main_loop(unsigned long) + 0x17a (0x7ff3ef6d0e5a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 08 02:32:39 frame #14: <unknown function> + 0xc819d (0x7ff3ef5e619d in /opt/conda/lib/libstdc++.so.6)
Jun 08 02:32:39 frame #15: <unknown function> + 0x76db (0x7ff40d2786db in /lib/x86_64-linux-gnu/libpthread.so.0)
Jun 08 02:32:39 frame #16: clone + 0x3f (0x7ff40cfa171f in /lib/x86_64-linux-gnu/libc.so.6)
Jun 08 02:32:39 
Jun 08 02:32:40 ok (5.322s)
Jun 08 02:32:56   test_rpc_builtin_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (16.412s)
Jun 08 02:33:07   test_rpc_script_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (10.695s)
Jun 08 02:33:12   test_rref_to_here_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (5.174s)
Jun 08 02:33:21   test_udf_remote_message_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.795s)
Jun 08 02:33:26   test_udf_remote_message_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:552] Received error while processing request type 261: falseINTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Jun 08 02:33:26 Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):
Jun 08 02:33:26 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x7d (0x7fa7fa6df08d in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 08 02:33:26 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xde (0x7fa7fa6dd7ae in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 08 02:33:26 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x3b (0x7fa7fa6dd9fb in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Jun 08 02:33:26 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0x664 (0x7fa7fee5c094 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 08 02:33:26 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 0x59 (0x7fa7fee44b09 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 08 02:33:26 frame #5: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0xa7 (0x7fa808145e87 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jun 08 02:33:26 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x1d7 (0x7fa7fee43407 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Jun 08 02:33:26 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x41 (0x7fa808146e31 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Jun 08 02:33:26 frame #8: <unknown function> + 0x4538e28 (0x7fa7fee4be28 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

1 failure not recognized by patterns:

Job Step Action
GitHub Actions Windows CI (pytorch-win-vs2019-cpu-py3) / render_test_results Unknown 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

jerryzh168 added a commit that referenced this pull request Jun 1, 2021
Summary:
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 22b6872bbba6575c45ac3882ef7355efd23b8583
Pull Request resolved: #59264
@jerryzh168
Copy link
Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jerryzh168 jerryzh168 requested review from vkuzo and supriyar and removed request for vkuzo June 1, 2021 22:05
@pmeier
Copy link
Collaborator

pmeier commented Jun 2, 2021

@jerryzh168 You probably need to disable the 1d case in

def test_batch_norm_relu(self):

and

def test_batch_norm(self):

They only pass now because there is a bug in the underlying comparison mechanism. Try put a assert qx.shape == qy.shape in there and see for yourself. After #59067 has landed, the tests will (correctly) fail if not fixed.

@jerryzh168
Copy link
Contributor Author

@jerryzh168 You probably need to disable the 1d case in

def test_batch_norm_relu(self):

and

def test_batch_norm(self):

They only pass now because there is a bug in the underlying comparison mechanism. Try put a assert qx.shape == qy.shape in there and see for yourself. After #59067 has landed, the tests will (correctly) fail if not fixed.

makes sense, this PR would fix the issue though, if we land this PR before yours it would not fail I think?

@pmeier
Copy link
Collaborator

pmeier commented Jun 3, 2021

If I'm not mistaken you only need to remove the 2 from

max_sides = (2, 3, 4, 5)

as well as the ==2 check here

if len(X.shape) == 2 or len(X.shape) == 3:

@jerryzh168
Copy link
Contributor Author

jerryzh168 commented Jun 3, 2021

If I'm not mistaken you only need to remove the 2 from

max_sides = (2, 3, 4, 5)

as well as the ==2 check here

if len(X.shape) == 2 or len(X.shape) == 3:

are you saying I don't need to add an extra test for batchnorm 1d? I can remove that
The goal for the test is to make sure batchnorm 1d outputs a Tensor with correct shape

@pmeier
Copy link
Collaborator

pmeier commented Jun 3, 2021

Oh sorry, I misunderstood the intent of the PR. Ignore my earlier comment. Before you land this PR though, could you locally put in assert qx.shape == qy.shape to make sure this is actually passing?

@jerryzh168
Copy link
Contributor Author

Oh sorry, I misunderstood the intent of the PR. Ignore my earlier comment. Before you land this PR though, could you locally put in assert qx.shape == qy.shape to make sure this is actually passing?

yes, will do that

Summary:
fixes: #59200
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D28810597](https://our.internmc.facebook.com/intern/diff/D28810597)

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Jun 3, 2021
Summary:
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: dca48d44bffbb2cb465e1ed53030e96b509ee0fb
Pull Request resolved: #59264
@jerryzh168
Copy link
Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jerryzh168 jerryzh168 requested a review from pmeier June 4, 2021 00:34
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! If it passes locally with the suggested check, it should be fine.

@jerryzh168
Copy link
Contributor Author

LGTM! If it passes locally with the suggested check, it should be fine.

yes it did pass the shape check locally

Summary:
fixes: #59200
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D28810597](https://our.internmc.facebook.com/intern/diff/D28810597)

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Jun 7, 2021
Summary:
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 111c8fdf6a50f6b11ffb2301a2686345153fb37a
Pull Request resolved: #59264
Summary:
fixes: #59200
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: [D28810597](https://our.internmc.facebook.com/intern/diff/D28810597)

[ghstack-poisoned]
jerryzh168 added a commit that referenced this pull request Jun 8, 2021
Summary:
Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: a6678302fae0de0a131359832ec0621585277694
Pull Request resolved: #59264
@jerryzh168
Copy link
Contributor Author

@jerryzh168 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in 9de0c21.

deniskokarev pushed a commit to deniskokarev/pytorch that referenced this pull request Jun 9, 2021
Summary:
Pull Request resolved: pytorch#59264

Previously batchnorm 1d did unsqueeze twice but only squeeze once before return when the dimension
for input Tensor is 2, this PR adds an extra squeeze

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D28810597

fbshipit-source-id: 879873bbf39ed3607762684694f6e81b423740c2
@facebook-github-bot facebook-github-bot deleted the gh/jerryzh168/611/head branch June 12, 2021 14:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants