
Conversation

jeffdaily
Collaborator

Fixes current ROCm CI test2 brokenness until tensorpipe is fully supported by ROCm.
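For context, a minimal sketch of how a ROCm-conditional test skip could look. This is illustrative only and not the actual change merged in be9be9b; the `is_rocm_build` helper and the placeholder test class are hypothetical. The only assumed PyTorch fact is that `torch.version.hip` is non-None on ROCm builds.

```python
# Hypothetical sketch: skip tensorpipe-backed RPC tests on ROCm builds.
# Only torch.version.hip is taken from PyTorch; the rest is illustrative
# and does not reflect the actual change in this PR.
import unittest

import torch


def is_rocm_build() -> bool:
    # ROCm builds expose a HIP version string; CUDA/CPU builds report None.
    return torch.version.hip is not None


@unittest.skipIf(is_rocm_build(), "tensorpipe is not yet supported on ROCm")
class TensorPipeRpcSmokeTest(unittest.TestCase):
    def test_placeholder(self):
        # A real test would exercise torch.distributed.rpc with the
        # tensorpipe backend; on ROCm the whole class is skipped for now.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```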

@facebook-github-bot
Contributor

facebook-github-bot commented Aug 18, 2021

💊 CI failures summary and remediations

As of commit 3110303 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang7_asan_test1 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Aug 18 23:04:34 test_remote_message_script_de...yUniqueId(created_on=0, local_id=0) to be created.
Aug 18 23:03:49 frame #13: <unknown function> + 0x198f3c70 (0x7f4495b9dc70 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 18 23:03:49 frame #14: c10::ThreadPool::main_loop(unsigned long) + 0x7f1 (0x7f4472d97b91 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 18 23:03:49 frame #15: <unknown function> + 0xb8c80 (0x7f44bf47fc80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
Aug 18 23:03:49 frame #16: <unknown function> + 0x76ba (0x7f44bfb1a6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
Aug 18 23:03:49 frame #17: clone + 0x6d (0x7f44bf85051d in /lib/x86_64-linux-gnu/libc.so.6)
Aug 18 23:03:49 
Aug 18 23:03:50 ok (8.513s)
Aug 18 23:04:02   test_remote_message_dropped_pickle (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (12.519s)
Aug 18 23:04:15   test_remote_message_dropped_pickle_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (12.526s)
Aug 18 23:04:26   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (11.625s)
Aug 18 23:04:34   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:559] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Aug 18 23:04:34 Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):
Aug 18 23:04:34 frame #0: <unknown function> + 0x1a231c (0x7f5ef522531c in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 18 23:04:34 frame #1: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x6d (0x7f5f16ca9fcd in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 18 23:04:34 frame #2: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x160 (0x7f5ef5223800 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 18 23:04:34 frame #3: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x18a (0x7f5ef521e66a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 18 23:04:34 frame #4: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x115 (0x7f5ef521ed75 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 18 23:04:34 frame #5: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0xd62 (0x7f5f17f490e2 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 18 23:04:34 frame #6: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 0x223 (0x7f5f17f0de33 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 18 23:04:34 frame #7: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x8e3 (0x7f5f3a6ec613 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Aug 18 23:04:34 frame #8: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x78d (0x7f5f17f0afbd in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

1 job timed out:

  • pytorch_linux_xenial_py3_clang7_asan_test1

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.


@github-actions github-actions bot added the module: rocm AMD GPU support for Pytorch label Aug 18, 2021
@jeffdaily jeffdaily requested a review from walterddr August 18, 2021 19:51
@facebook-github-bot
Contributor

@walterddr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@walterddr merged this pull request in be9be9b.
