
Foreach Test Refactor: Pointwise, Min/Max-imum #61327


Closed
wants to merge 5 commits

Conversation

crcrpar
Collaborator

@crcrpar crcrpar commented Jul 7, 2021

  • rewrite pointwise unittests using ops decorator
  • rewrite minimum&maximum unittests using ops decorator
  • enable minimum/maximum fastpath for BFloat16
  • remove _test_data method

#58833

cc: @ptrblck @ngimel
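For context, the `ops`-decorator pattern these rewrites adopt parametrizes a single test body over an op database. Below is a minimal pure-Python sketch of the idea; the `ForeachOpInfo` class, the db contents, and the dtype list are illustrative stand-ins, not PyTorch's real OpInfo machinery:

```python
import itertools

class ForeachOpInfo:
    """Illustrative stand-in for an OpInfo entry: pairs a foreach op
    name with the native reference op used to check its results."""
    def __init__(self, name, ref):
        self.name = name
        self.ref = ref

foreach_pointwise_op_db = [
    ForeachOpInfo("_foreach_addcmul", ref="addcmul"),
    ForeachOpInfo("_foreach_addcdiv", ref="addcdiv"),
]

def ops(op_db, dtypes=("float32", "float64")):
    """Mimics the @ops decorator: expands one test body into one
    invocation per (op, dtype) combination."""
    def decorator(test_fn):
        def run_all(*args):
            return [test_fn(*args, op=op, dtype=dtype)
                    for op, dtype in itertools.product(op_db, dtypes)]
        return run_all
    return decorator

@ops(foreach_pointwise_op_db)
def test_pointwise(op, dtype):
    # A real test would build tensor lists and compare the foreach
    # result against op.ref applied elementwise.
    return f"{op.name}/{dtype}"
```

In the real suite the decorator also injects device handling and skip logic; this sketch only shows the expansion mechanics.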

@facebook-github-bot
Contributor

facebook-github-bot commented Jul 7, 2021

💊 CI failures summary and remediations

As of commit 48b2cd4 (more details on the Dr. CI page and at hud.pytorch.org/pr/61327):


  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (1/4)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-win.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-win.yml
CONFLICT (add/add): Merge conflict in .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Auto-merging .azure_pipelines/job_templates/build-verify-publish-template-unix.yml
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test1 (2/4)

Step: "Run tests"

Jul 20 05:34:12 SUMMARY: UndefinedBehaviorSanit.../jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in
Jul 20 05:34:12     #9 0x55ec1e80b8f2 in PyEval_EvalCode /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/ceval.c:731
Jul 20 05:34:12     #10 0x55ec1e873cd5 in run_mod /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:1025
Jul 20 05:34:12     #11 0x55ec1e875d5d in PyRun_StringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:949
Jul 20 05:34:12     #12 0x55ec1e875dbb in PyRun_SimpleStringFlags /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Python/pythonrun.c:445
Jul 20 05:34:12     #13 0x55ec1e876926 in run_command /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:301
Jul 20 05:34:12     #14 0x55ec1e876926 in Py_Main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Modules/main.c:749
Jul 20 05:34:12     #15 0x55ec1e7b0196 in main /home/builder/ktietz/cos6/ci_cos6/python_1622833237666/work/Programs/python.c:69
Jul 20 05:34:12     #16 0x7f5e0366583f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
Jul 20 05:34:12     #17 0x55ec1e84033d in _start (/opt/conda/bin/python3.6+0x1a733d)
Jul 20 05:34:12 
Jul 20 05:34:12 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
Jul 20 05:34:12 + retcode=1
Jul 20 05:34:12 + set -e
Jul 20 05:34:12 + return 1
Jul 20 05:34:12 + [[ pytorch-linux-xenial-py3-clang5-asan-test1 == *-NO_AVX-* ]]
Jul 20 05:34:12 + [[ pytorch-linux-xenial-py3-clang5-asan-test1 == *-NO_AVX2-* ]]
Jul 20 05:34:12 + '[' -n https://github.com/pytorch/pytorch/pull/61327 ']'
Jul 20 05:34:12 + [[ pytorch-linux-xenial-py3-clang5-asan-test1 != *coverage* ]]
Jul 20 05:34:12 ++ mktemp
Jul 20 05:34:12 + DETERMINE_FROM=/tmp/tmp.n3vARtDqED
Jul 20 05:34:12 + file_diff_from_base /tmp/tmp.n3vARtDqED

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (3/4)

Step: "(Optional) Merge target branch"

(same add/add merge conflicts in the .circleci and .azure_pipelines config files as in build 1/4 above)


Exited with code exit status 1

See CircleCI build pytorch_macos_10_13_py3_test (4/4)

Step: "Test"

Jul 20 06:40:55 test_remote_message_script_de...yUniqueId(created_on=0, local_id=0) to be created.
Jul 20 06:40:22 frame #12: std::__1::__function::__func<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork>, std::__1::allocator<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork> >, void ()>::operator()() + 42 (0x112a3099a in libtorch_cpu.dylib)
Jul 20 06:40:22 frame #13: c10::ThreadPool::main_loop(unsigned long) + 569 (0x108ad0389 in libc10.dylib)
Jul 20 06:40:22 frame #14: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, c10::ThreadPool::ThreadPool(int, int, std::__1::function<void ()>)::$_0> >(void*) + 67 (0x108ad0a33 in libc10.dylib)
Jul 20 06:40:22 frame #15: _pthread_start + 148 (0x7fff69998109 in libsystem_pthread.dylib)
Jul 20 06:40:22 frame #16: thread_start + 15 (0x7fff69993b8b in libsystem_pthread.dylib)
Jul 20 06:40:22 
Jul 20 06:40:22 ok (5.585s)
Jul 20 06:40:32   test_remote_message_dropped_pickle (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (9.664s)
Jul 20 06:40:42   test_remote_message_dropped_pickle_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (9.703s)
Jul 20 06:40:50   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.494s)
Jul 20 06:40:55   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:555] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rref_context.cpp":390, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Jul 20 06:40:55 Exception raised from getOwnerRRef at ../torch/csrc/distributed/rpc/rref_context.cpp:390 (most recent call first):
Jul 20 06:40:55 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x10c86c6d2 in libc10.dylib)
Jul 20 06:40:55 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 106 (0x10c86ae4a in libc10.dylib)
Jul 20 06:40:55 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 64 (0x10c86b080 in libc10.dylib)
Jul 20 06:40:55 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 1620 (0x119806fc4 in libtorch_cpu.dylib)
Jul 20 06:40:55 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 86 (0x1197f2556 in libtorch_cpu.dylib)
Jul 20 06:40:55 frame #5: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 376 (0x10be75268 in libtorch_python.dylib)
Jul 20 06:40:55 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 437 (0x1197f11a5 in libtorch_cpu.dylib)
Jul 20 06:40:55 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 74 (0x10be75fda in libtorch_python.dylib)
Jul 20 06:40:55 frame #8: c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> > c10::ivalue::Future::thenAsync<torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1>(torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1, std::__1::shared_ptr<c10::Type>)::'lambda'(c10::ivalue::Future&)::operator()(c10::ivalue::Future&) + 223 (0x1197f8d7f in libtorch_cpu.dylib)

Preview docs built from this PR

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@crcrpar crcrpar mentioned this pull request Jul 7, 2021
@ezyang ezyang requested a review from ngimel July 12, 2021 13:05
@ezyang ezyang added the triaged label Jul 12, 2021
```python
    opinfo.sample_inputs(device, dtype, N, noncontiguous=not is_fastpath),
    opinfo.sample_inputs(device, dtype, N, noncontiguous=not is_fastpath),
]
self._pointwise_test(dtype, op, ref, inputs, is_fastpath, is_inplace=False, values=None)
```
Collaborator


are there any tests where `values` is not None?

Collaborator Author


Oh, I completely forgot about it. I'm adding tests with `values`.

```python
def test_minmax_inf_nan(self, device, dtype, op):
    inputs = (
        [
            torch.tensor([inf], device=device, dtype=dtype),
```
Collaborator


can you please use `float('inf')` etc. here, and remove the `torch._six` imports? Also, how is this test different from the previous one?
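The substitution being asked for here is pure stdlib. A quick sketch of the replacement constants, assuming `torch._six` was only supplying `inf`/`nan` in these tests:

```python
import math

# Stdlib replacements for the constants previously imported from
# torch._six (a Python 2/3 compatibility shim that was later removed):
inf = float('inf')
nan = float('nan')

assert inf == math.inf          # same value as math's constant
assert -inf < 0 < inf           # behaves as the usual infinities
assert math.isnan(nan)          # nan != nan, so use math.isnan for checks
```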

Collaborator Author


> can you please use `float('inf')` etc here, and remove `torch._six` imports?

Does merging test_minmax_inf_nan and test_minmax_float_inf_nan sound good to you?

> Also, how is this test different from the previous one?

The only difference is the use of the `@ops` decorator.

Collaborator


I must be missing something; both tests have `@ops(foreach_minmax_op_db)`. But nm, combining these 2 tests is good.

```python
try:
    actual = foreach_op(tensors1, tensors2, tensors3)
except RuntimeError as e:
    with self.assertRaisesRegex(type(e), re.escape(str(e))):
```
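The pattern above captures whatever error the first call raises and then asserts that a second run fails with the same type and message; `re.escape` matters because the messages often contain regex metacharacters like brackets. A self-contained sketch of the idiom, with a toy op standing in for the foreach kernel:

```python
import re
import unittest

def foreach_op(tensors):
    # Toy stand-in for a foreach kernel: rejects an empty tensor list,
    # much as real kernels raise for unsupported dtype/device combos.
    if not tensors:
        raise RuntimeError("expected a non-empty list of [Tensors]")
    return [t + 1 for t in tensors]

class ErrorParityTest(unittest.TestCase):
    def test_same_error(self):
        try:
            actual = foreach_op([])
        except RuntimeError as e:
            # Assert a rerun fails with the identical type and message;
            # re.escape keeps the brackets in the message from being
            # parsed as regex metacharacters.
            with self.assertRaisesRegex(type(e), re.escape(str(e))):
                foreach_op([])
```

Running `ErrorParityTest("test_same_error").test_same_error()` passes, since both calls raise the same RuntimeError.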
Collaborator


when do errors happen here? It seems like inputs corresponding to the same op are always on the same device, e.g. the below translates into

`[native_op(*_cuda_tensors), native_op(*_cpu_tensors)]`

which should work. Also, no harm in writing it like this; it's a bit cleaner than a zip of previously zipped tensors.
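In pure-Python terms, the cleanup suggested here groups each device's operands once and star-unpacks them into the reference op, instead of zipping already-zipped lists. Plain numbers stand in for tensors, and `native_op` is an illustrative stand-in for an op like addcmul:

```python
def native_op(a, b, c):
    # Toy reference op standing in for a native pointwise op
    # (computes a + b * c, like addcmul with value=1).
    return a + b * c

# Operands grouped per "device" up front (floats standing in for
# CUDA and CPU tensor lists)...
_cuda_tensors = (1.0, 2.0, 3.0)
_cpu_tensors = (4.0, 5.0, 6.0)

# ...so expected results are one star-unpack per group, rather than
# a zip over three parallel per-device lists.
expected = [native_op(*_cuda_tensors), native_op(*_cpu_tensors)]
```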

Collaborator Author


> when do errors happen here?

CPU kernels don't support bfloat16 and half, but `@ops(foreach_pointwise_op_db)` creates test cases for these two dtypes. Would using the `@dtypes` decorator and limiting the dtypes to, e.g., `all_fp_dtypes` sound good?

> It seems like inputs corresponding to the same op are always on the same device, ...

that sounds way better, thank you.

Collaborator


Yeah, if the purpose of this test is to test behavior on different devices, it's better to do it for a single dtype (e.g. float). But you probably want to actually test that inputs on different devices throw an error, and right now you are not doing that.

Collaborator

@ngimel ngimel left a comment


Nice work, @crcrpar!

```python
    complex(1.0 - random.random(), 1.0 - random.random()),
)
for N, scalar in itertools.product(N_values, scalars):
```
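The refactor being praised is a plain `itertools.product` over sizes and scalar types. A minimal self-contained sketch, with illustrative values for `N_values`:

```python
import itertools
import random

# Illustrative tensor-list sizes and one scalar per Python numeric
# type (int, float, bool, complex), echoing the snippet above.
N_values = [20, 23]
scalars = (
    random.randint(1, 10),
    1.0 - random.random(),
    True,
    complex(1.0 - random.random(), 1.0 - random.random()),
)

# Every (N, scalar) combination becomes one test case.
cases = list(itertools.product(N_values, scalars))
```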
Collaborator


nice refactor

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in 8a2063e.

@crcrpar crcrpar deleted the fe/test-refactor branch August 18, 2021 22:12
Labels: cla signed, Merged, open source, triaged