r-barnes
Contributor

Differential Revision: D29716659

fbshipit-source-id: d2967212d00ddfb8dce2892cfe2ac45241296ace
@facebook-github-bot added the "oncall: jit" and "cla signed" labels Jul 15, 2021
@facebook-github-bot
Contributor

facebook-github-bot commented Jul 15, 2021

💊 CI failures summary and remediations

As of commit 7e70b32 (more details on the Dr. CI page and at hud.pytorch.org/pr/61735):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_macos_10_13_py3_test (1/1)

Step: "Test"

Jul 16 01:47:03 frame #12: std::__1::__function::__func<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork>, std::__1::allocator<std::__1::__bind<torch::distributed::rpc::ProcessGroupAgent::enqueueRecv(torch::distributed::rpc::RecvWork)::$_6, torch::distributed::rpc::RecvWork> >, void ()>::operator()() + 42 (0x11fd04cba in libtorch_cpu.dylib)
Jul 16 01:47:03 frame #13: c10::ThreadPool::main_loop(unsigned long) + 569 (0x11a328369 in libc10.dylib)
Jul 16 01:47:03 frame #14: void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, c10::ThreadPool::ThreadPool(int, int, std::__1::function<void ()>)::$_0> >(void*) + 67 (0x11a328a13 in libc10.dylib)
Jul 16 01:47:03 frame #15: _pthread_start + 148 (0x7fff6ed8d109 in libsystem_pthread.dylib)
Jul 16 01:47:03 frame #16: thread_start + 15 (0x7fff6ed88b8b in libsystem_pthread.dylib)
Jul 16 01:47:03 
Jul 16 01:47:03 ok (4.259s)
Jul 16 01:47:11   test_remote_message_dropped_pickle (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.262s)
Jul 16 01:47:20   test_remote_message_dropped_pickle_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (8.295s)
Jul 16 01:47:27   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (7.082s)
Jul 16 01:47:31   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:555] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "../torch/csrc/distributed/rpc/rref_context.cpp":390, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Jul 16 01:47:31 Exception raised from getOwnerRRef at ../torch/csrc/distributed/rpc/rref_context.cpp:390 (most recent call first):
Jul 16 01:47:31 frame #0: c10::Error::Error(c10::SourceLocation, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >) + 98 (0x1172136b2 in libc10.dylib)
Jul 16 01:47:31 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 106 (0x117211e2a in libc10.dylib)
Jul 16 01:47:31 frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 64 (0x117212060 in libc10.dylib)
Jul 16 01:47:31 frame #3: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 1711 (0x11cb381bf in libtorch_cpu.dylib)
Jul 16 01:47:31 frame #4: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 86 (0x11cb22a16 in libtorch_cpu.dylib)
Jul 16 01:47:31 frame #5: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 376 (0x11680ff98 in libtorch_python.dylib)
Jul 16 01:47:31 frame #6: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 437 (0x11cb21665 in libtorch_cpu.dylib)
Jul 16 01:47:31 frame #7: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const + 74 (0x116810d0a in libtorch_python.dylib)
Jul 16 01:47:31 frame #8: c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> > c10::ivalue::Future::thenAsync<torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1>(torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::__1::vector<c10::Stream, std::__1::allocator<c10::Stream> >) const::$_1, std::__1::shared_ptr<c10::Type>)::'lambda'(c10::ivalue::Future&)::operator()(c10::ivalue::Future&) + 223 (0x11cb2932f in libtorch_cpu.dylib)

This comment was automatically generated by Dr. CI.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D29716659

@albanD removed their request for review July 15, 2021 23:50

@malfet malfet left a comment

There is no such thing as tp_vectorcall_offset

0, /* tp_itemsize */
nullptr, /* tp_dealloc */
// NOLINTNEXTLINE(modernize-use-nullptr)
0, /* tp_vectorcall_offset */

Item after tp_dealloc is tp_print, isn't it?
https://github.com/python/cpython/blob/0a0a135bae2692d069b18d2d590397fbe0a0d39a/Include/object.h#L353-L354

Suggested change
0, /* tp_vectorcall_offset */
nullptr, /* tp_print */

@malfet
Contributor

malfet commented Jul 16, 2021

Actually, see https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_vectorcall_offset
This slot is a pointer (tp_print) on older versions of Python, but a Py_ssize_t on Python 3.8+.
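
To illustrate why a literal 0 is used for this slot, here is a minimal sketch, not the code from this PR. It assumes a heavily trimmed mock of the type struct: `DemoTypeObject`, `DEMO_PY_VERSION_HEX`, `demo_type`, and `demo_third_slot_is_integer` are hypothetical names, and the member types only approximate the real `PyTypeObject` declarations from `<Python.h>`.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for PY_VERSION_HEX from <Python.h>; define it as
// 0x03070000 on the command line to exercise the pre-3.8 layout.
#ifndef DEMO_PY_VERSION_HEX
#define DEMO_PY_VERSION_HEX 0x03080000
#endif

// Hypothetical, trimmed mock of PyTypeObject: only the three slots discussed
// in the review are modelled, and the pointer signatures are simplified.
struct DemoTypeObject {
  std::ptrdiff_t tp_itemsize;
  void (*tp_dealloc)(void*);
#if DEMO_PY_VERSION_HEX >= 0x03080000
  std::ptrdiff_t tp_vectorcall_offset;  // integer offset on 3.8+
#else
  int (*tp_print)(void*, void*, int);   // function pointer before 3.8
#endif
};

// A literal 0 is both a valid integer and a null pointer constant, so the same
// initializer compiles against either layout. nullptr would not convert to the
// integer slot, which is why the lint suppression sits above the 0.
static DemoTypeObject demo_type = {
    0,        /* tp_itemsize */
    nullptr,  /* tp_dealloc */
    // NOLINTNEXTLINE(modernize-use-nullptr)
    0,        /* tp_vectorcall_offset (tp_print before 3.8) */
};

// Reports whether the third slot is an integer in the selected layout.
constexpr bool demo_third_slot_is_integer() {
#if DEMO_PY_VERSION_HEX >= 0x03080000
  return std::is_integral<decltype(DemoTypeObject::tp_vectorcall_offset)>::value;
#else
  return std::is_integral<decltype(DemoTypeObject::tp_print)>::value;
#endif
}
```

With the default macro value the third slot is an integer, so replacing the 0 with nullptr would fail to compile; with the pre-3.8 value the same 0 initializes the function pointer to null.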

@facebook-github-bot
Contributor

This pull request has been merged in 349f2f7.

Labels

cla signed, fb-exported, Merged, oncall: jit

3 participants