segmentation fault caused by polyglot/joins.rb #2607

Closed
larkost opened this issue Jun 25, 2014 · 7 comments
@larkost
Collaborator

larkost commented Jun 25, 2014

The polyglot/joins.rb test is failing consistently for me on newton.

./test-runner -i rb polyglot/joins.rb

I am running from the larkost/2531-python-3 branch, but its latest merge point with next is recent (4f55510).
Here is the output I am seeing:

== Running polyglot/joins.rb (ruby)
RethinkDB server failed: Error: rethinkdb process 8000 failed with error code -5
info: Our machine ID: fca34a0f-b1c2-45b6-a381-1510f8f27423
info: Created directory '/home/ssd2/larkost/rethinkdb/test/rql_test/run/server_33692/rdb' and a metadata file inside it.
info: Running rethinkdb 1.13.0-335-g04df13-dirty (debug) (GCC 4.6.3)...
info: Running on Linux 3.2.0-61-generic x86_64
info: Using cache size of 1024 MB
warn: Requested cache size is larger than available memory.
info: Loading data from directory /home/ssd2/larkost/rethinkdb/test/rql_test/run/server_33692/rdb
info: Our machine ID is fca34a0f-b1c2-45b6-a381-1510f8f27423
info: Listening for intracluster connections on port 19595
info: Listening for client driver connections on port 33692
info: Listening for administrative HTTP connections on port 35832
info: Listening on addresses: 127.0.0.1, 127.0.1.1, ::1
info: To fully expose RethinkDB on the network, bind to all addresses
info: by running rethinkdb with the `--bind all` command line option.
info: Server ready
Version: rethinkdb 1.13.0-335-g04df13-dirty (debug) (GCC 4.6.3)
error: Error in src/arch/runtime/thread_pool.cc at line 343:
error: Segmentation fault from reading the address 0xffffffffffffffe8.
error: Backtrace:
error: Wed Jun 25 16:18:48 2014

       1: rethinkdb_backtrace(void**, int) at rethinkdb_backtrace.cc:101
       2: backtrace_t::backtrace_t() at backtrace.cc:202
       3: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at backtrace.cc:282
       4: format_backtrace(bool) at backtrace.cc:197
       5: report_fatal_error(char const*, int, char const*, ...) at errors.cc:83
       6: linux_thread_pool_t::sigsegv_handler(int, siginfo*, void*) at thread_pool.cc:343
       7: /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f71903a2cb0] at 0x7f71903a2cb0 (/lib/x86_64-linux-gnu/libpthread.so.0)
       8: std::string::compare(std::string const&) const at 0x7f7190b6068c (/usr/lib/x86_64-linux-gnu/libstdc++.so.6)
       9: bool std::operator< <char, std::char_traits<char>, std::allocator<char> >(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at basic_string.h:2512
       10: std::less<std::string>::operator()(std::string const&, std::string const&) const at stl_function.h:236
       11: std::_Rb_tree<std::string, std::pair<std::string const, ql::wire_func_t>, std::_Select1st<std::pair<std::string const, ql::wire_func_t> >, std::less<std::string>, std::allocator<std::pair<std::string const, ql::wire_func_t> > >::_M_lower_bound(std::_Rb_tree_node<std::pair<std::string const, ql::wire_func_t> > const*, std::_Rb_tree_node<std::pair<std::string const, ql::wire_func_t> > const*, std::string const&) const at stl_tree.h:1106
       12: std::_Rb_tree<std::string, std::pair<std::string const, ql::wire_func_t>, std::_Select1st<std::pair<std::string const, ql::wire_func_t> >, std::less<std::string>, std::allocator<std::pair<std::string const, ql::wire_func_t> > >::find(std::string const&) const at stl_tree.h:1549
       13: std::map<std::string, ql::wire_func_t, std::less<std::string>, std::allocator<std::pair<std::string const, ql::wire_func_t> > >::count(std::string const&) const at stl_map.h:769
       14: ql::global_optargs_t::get_optarg(ql::env_t*, std::string const&) at env.cc:92
       15: ql::batchspec_t::user(ql::batch_type_t, ql::env_t*) at batching.cc:86
       16: ql::concatmap_trans_t::lst_transform(std::vector<counted_t<ql::datum_t const>, std::allocator<counted_t<ql::datum_t const> > >*) at shards.cc:753
       17: ql::ungrouped_op_t::operator()(std::map<counted_t<ql::datum_t const>, std::vector<counted_t<ql::datum_t const>, std::allocator<counted_t<ql::datum_t const> > >, std::less<counted_t<ql::datum_t const> >, std::allocator<std::pair<counted_t<ql::datum_t const> const, std::vector<counted_t<ql::datum_t const>, std::allocator<counted_t<ql::datum_t const> > > > > >*, counted_t<ql::datum_t const> const&) at shards.cc:577
       18: ql::eager_datum_stream_t::next_grouped_batch(ql::env_t*, ql::batchspec_t const&, std::map<counted_t<ql::datum_t const>, std::vector<counted_t<ql::datum_t const>, std::allocator<counted_t<ql::datum_t const> > >, std::less<counted_t<ql::datum_t const> >, std::allocator<std::pair<counted_t<ql::datum_t const> const, std::vector<counted_t<ql::datum_t const>, std::allocator<counted_t<ql::datum_t const> > > > > >*) at datum_stream.cc:624
       19: ql::eager_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:654
       20: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:574
       21: ql::zip_datum_stream_t::next_raw_batch(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:880
       22: ql::eager_datum_stream_t::next_grouped_batch(ql::env_t*, ql::batchspec_t const&, std::map<counted_t<ql::datum_t const>, std::vector<counted_t<ql::datum_t const>, std::allocator<counted_t<ql::datum_t const> > >, std::less<counted_t<ql::datum_t const> >, std::allocator<std::pair<counted_t<ql::datum_t const> const, std::vector<counted_t<ql::datum_t const>, std::allocator<counted_t<ql::datum_t const> > > > > >*) at datum_stream.cc:619
       23: ql::eager_datum_stream_t::next_batch_impl(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:654
       24: ql::datum_stream_t::next_batch(ql::env_t*, ql::batchspec_t const&) at datum_stream.cc:574
       25: ql::stream_cache_t::serve(long, Response*, signal_t*) at stream_cache.cc:54
       26: ql::run(ql::protob_t<Query>, rdb_context_t*, signal_t*, ql::stream_cache_t*, Response*) at term.cc:278
       27: rdb_query_server_t::run_query(ql::protob_t<Query> const&, Response*, client_context_t*) at query_server.cc:71
       28: void query_server_t::connection_loop<json_protocol_t>(linux_tcp_conn_t*, client_context_t*) at protob.cc:318
       29: query_server_t::handle_conn(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t) at protob.cc:282
       30: std::_Mem_fn<void (query_server_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)>::operator()(query_server_t*, scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t) const at functional:551
       31: void std::_Bind<std::_Mem_fn<void (query_server_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (query_server_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)>::__call<void, scoped_ptr_t<linux_tcp_conn_descriptor_t>&, 0, 1, 2>(std::tuple<scoped_ptr_t<linux_tcp_conn_descriptor_t>&>&&, std::_Index_tuple<0, 1, 2>) at functional:1146
       32: void std::_Bind<std::_Mem_fn<void (query_server_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (query_server_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)>::operator()<scoped_ptr_t<linux_tcp_conn_descriptor_t>&, void>(scoped_ptr_t<linux_tcp_conn_descriptor_t>&&&) at functional:1206
       33: std::_Function_handler<void (scoped_ptr_t<linux_tcp_conn_descriptor_t>&), std::_Bind<std::_Mem_fn<void (query_server_t::*)(scoped_ptr_t<linux_tcp_conn_descriptor_t> const&, auto_drainer_t::lock_t)> (query_server_t*, std::_Placeholder<1>, auto_drainer_t::lock_t)> >::_M_invoke(std::_Any_data const&, scoped_ptr_t<linux_tcp_conn_descriptor_t>&) at functional:1780
       34: std::function<void (scoped_ptr_t<linux_tcp_conn_descriptor_t>&)>::operator()(scoped_ptr_t<linux_tcp_conn_descriptor_t>&) const at functional:2162
       35: linux_nonthrowing_tcp_listener_t::handle(int) at network.cc:939
       36: std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)>::operator()(linux_nonthrowing_tcp_listener_t*, int) const at functional:551
       37: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)>::__call<void, , 0, 1>(std::tuple<>&&, std::_Index_tuple<0, 1>) at functional:1147
       38: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)>::operator()<, void>() at functional:1206
       39: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)> >::run_action() at callable_action.hpp:28
       40: callable_action_wrapper_t::run() at runtime_utils.cc:43
       41: coro_t::run() at coroutines.cc:199
       42: void coro_t::spawn_now_dangerously<std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)> >(std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(int)> (linux_nonthrowing_tcp_listener_t*, int)> const&) at coroutines.hpp:52
       43: linux_nonthrowing_tcp_listener_t::accept_loop(auto_drainer_t::lock_t) at network.cc:907
       44: std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(auto_drainer_t::lock_t)>::operator()(linux_nonthrowing_tcp_listener_t*, auto_drainer_t::lock_t) const at functional:551
       45: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(auto_drainer_t::lock_t)> (linux_nonthrowing_tcp_listener_t*, auto_drainer_t::lock_t)>::__call<void, , 0, 1>(std::tuple<>&&, std::_Index_tuple<0, 1>) at functional:1146
       46: void std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(auto_drainer_t::lock_t)> (linux_nonthrowing_tcp_listener_t*, auto_drainer_t::lock_t)>::operator()<, void>() at functional:1206
       47: callable_action_instance_t<std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(auto_drainer_t::lock_t)> (linux_nonthrowing_tcp_listener_t*, auto_drainer_t::lock_t)> >::run_action() at callable_action.hpp:28
       48: callable_action_wrapper_t::run() at runtime_utils.cc:43
       49: coro_t::run() at coroutines.cc:199
       50: coro_t* coro_t::spawn_sometime<std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(auto_drainer_t::lock_t)> (linux_nonthrowing_tcp_listener_t*, auto_drainer_t::lock_t)> >(std::_Bind<std::_Mem_fn<void (linux_nonthrowing_tcp_listener_t::*)(auto_drainer_t::lock_t)> (linux_nonthrowing_tcp_listener_t*, auto_drainer_t::lock_t)> const&) at coroutines.hpp:58
       51: linux_nonthrowing_tcp_listener_t::begin_listening() at network.cc:686
       52: linux_tcp_listener_t::linux_tcp_listener_t(std::set<ip_address_t, std::less<ip_address_t>, std::allocator<ip_address_t> > const&, int, std::function<void (scoped_ptr_t<linux_tcp_conn_descriptor_t>&)> const&) at network.cc:972
       53: query_server_t::query_server_t(rdb_context_t*, std::set<ip_address_t, std::less<ip_address_t>, std::allocator<ip_address_t> > const&, int, query_handler_t*, boost::shared_ptr<semilattice_readwrite_view_t<auth_semilattice_metadata_t> >) at protob.cc:178
       54: rdb_query_server_t::rdb_query_server_t(std::set<ip_address_t, std::less<ip_address_t>, std::allocator<ip_address_t> > const&, int, rdb_context_t*) at query_server.cc:17
       55: do_serve(io_backender_t*, bool, base_path_t const&, metadata_persistence::cluster_persistent_file_t*, metadata_persistence::auth_persistent_file_t*, unsigned long, serve_info_t const&, os_signal_cond_t*) at serve.cc:307
       56: serve(io_backender_t*, base_path_t const&, metadata_persistence::cluster_persistent_file_t*, metadata_persistence::auth_persistent_file_t*, unsigned long, serve_info_t const&, os_signal_cond_t*) at serve.cc:422
       57: run_rethinkdb_serve(base_path_t const&, serve_info_t*, file_direct_io_mode_t, int, unsigned long, uuid_u const*, cluster_semilattice_metadata_t const*, directory_lock_t*, bool*) at command_line.cc:815
       58: void std::_Bind<void (*(base_path_t, serve_info_t*, file_direct_io_mode_t, int, unsigned long, uuid_u*, cluster_semilattice_metadata_t*, directory_lock_t*, bool*))(base_path_t const&, serve_info_t*, file_direct_io_mode_t, int, unsigned long, uuid_u const*, cluster_semilattice_metadata_t const*, directory_lock_t*, bool*)>::__call<void, , 0, 1, 2, 3, 4, 5, 6, 7, 8>(std::tuple<>&&, std::_Index_tuple<0, 1, 2, 3, 4, 5, 6, 7, 8>) at functional:1147
       59: void std::_Bind<void (*(base_path_t, serve_info_t*, file_direct_io_mode_t, int, unsigned long, uuid_u*, cluster_semilattice_metadata_t*, directory_lock_t*, bool*))(base_path_t const&, serve_info_t*, file_direct_io_mode_t, int, unsigned long, uuid_u const*, cluster_semilattice_metadata_t const*, directory_lock_t*, bool*)>::operator()<, void>() at functional:1206
       60: std::_Function_handler<void (), std::_Bind<void (*(base_path_t, serve_info_t*, file_direct_io_mode_t, int, unsigned long, uuid_u*, cluster_semilattice_metadata_t*, directory_lock_t*, bool*))(base_path_t const&, serve_info_t*, file_direct_io_mode_t, int, unsigned long, uuid_u const*, cluster_semilattice_metadata_t const*, directory_lock_t*, bool*)> >::_M_invoke(std::_Any_data const&) at functional:1780
       61: std::function<void ()>::operator()() const at functional:2162
       62: starter_t::run_wrapper(std::function<void ()> const&) at runtime.cc:61
       63: std::_Mem_fn<void (starter_t::*)(std::function<void ()> const&)>::operator()(starter_t*, std::function<void ()> const&) const at functional:551
       64: void std::_Bind<std::_Mem_fn<void (starter_t::*)(std::function<void ()> const&)> (starter_t*, std::function<void ()>)>::__call<void, , 0, 1>(std::tuple<>&&, std::_Index_tuple<0, 1>) at functional:1147
       65: void std::_Bind<std::_Mem_fn<void (starter_t::*)(std::function<void ()> const&)> (starter_t*, std::function<void ()>)>::operator()<, void>() at functional:1206
       66: std::_Function_handler<void (), std::_Bind<std::_Mem_fn<void (starter_t::*)(std::function<void ()> const&)> (starter_t*, std::function<void ()>)> >::_M_invoke(std::_Any_data const&) at functional:1780
       67: std::function<void ()>::operator()() const at functional:2162
       68: callable_action_instance_t<std::function<void ()> >::run_action() at callable_action.hpp:28
       69: callable_action_wrapper_t::run() at runtime_utils.cc:43
       70: coro_t::run() at coroutines.cc:199
       71: coro_t* coro_t::spawn_sometime<std::function<void ()> >(std::function<void ()> const&) at coroutines.hpp:58
       72: starter_t::on_thread_switch() at runtime.cc:57
       73: linux_message_hub_t::on_event(int) at message_hub.cc:154
       74: epoll_event_queue_t::run() at epoll.cc:115
       75: linux_thread_pool_t::start_thread(void*) at thread_pool.cc:160
       76: /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f719039ae9a] at 0x7f719039ae9a (/lib/x86_64-linux-gnu/libpthread.so.0)
       77: clone+0x6d at 0x7f71900c73fd (/lib/x86_64-linux-gnu/libc.so.6)
error: Exiting.
== Failed polyglot/joins.rb test with result code 1 (/home/ssd2/larkost/rethinkdb/test/rql_test/src/joins.yaml) Output:
Failed 6 tests
polyglot/joins.rb/31-CPP: Error: Connection closed by server! when comparing #<RethinkDB::Cursor:8241660 (exhausted): r.table("messages").orderby({"index" => "id"}).eq_join(
  "sender_id",
  r.table("senders")
).without({"right" => {"id" => true}}).zip.eq_join(
  "receiver_id",
  r.table("receivers")
).without({"right" => {"id" => true}}).zip> and [{:receiver_id=>1,
  :sender=>"Sender One",
  :receiver=>"Receiver One",
  :msg=>"Message One",
  :sender_id=>1,
  :id=>10},
 {:receiver_id=>1,
  :sender=>"Sender One",
  :receiver=>"Receiver One",
  :msg=>"Message Two",
  :sender_id=>1,
  :id=>20},
 {:receiver_id=>1,
  :sender=>"Sender One",
  :receiver=>"Receiver One",
  :msg=>"Message Three",
  :sender_id=>1,
  :id=>30}]
TEST FAILURE: polyglot/joins.rb/32-CPP
TEST BODY: r.db('test').table_drop('test1')
    VALUE: <RqlRuntimeError "Error: Connection Closed.">
    EXPECTED: {:dropped=>1}


TEST FAILURE: polyglot/joins.rb/33-CPP
TEST BODY: r.db('test').table_drop('test2')
    VALUE: <RqlRuntimeError "Error: Connection Closed.">
    EXPECTED: {:dropped=>1}


TEST FAILURE: polyglot/joins.rb/34-CPP
TEST BODY: r.db('test').table_drop('test3')
    VALUE: <RqlRuntimeError "Error: Connection Closed.">
    EXPECTED: {:dropped=>1}


TEST FAILURE: polyglot/joins.rb/35-CPP
TEST BODY: r.db('test').table_drop('senders')
    VALUE: <RqlRuntimeError "Error: Connection Closed.">
    EXPECTED: {:dropped=>1}


TEST FAILURE: polyglot/joins.rb/36-CPP
TEST BODY: r.db('test').table_drop('messages')
    VALUE: <RqlRuntimeError "Error: Connection Closed.">
    EXPECTED: {:dropped=>1}


TEST FAILURE: polyglot/joins.rb/37-CPP
TEST BODY: r.db('test').table_drop('receivers')
    VALUE: <RqlRuntimeError "Error: Connection Closed.">
    EXPECTED: {:dropped=>1}


Ruby: 26 of 33 tests passed. 7 tests failed.
@Tryneus
Member

Tryneus commented Jun 25, 2014

Looking into this.

@Tryneus Tryneus assigned Tryneus and unassigned mlucy Jun 25, 2014
@Tryneus Tryneus added this to the 1.13.x milestone Jun 25, 2014
@mlucy
Member

mlucy commented Jun 25, 2014

So, I actually just finished looking into this: git bisect blames b1948d3 .

@Tryneus Tryneus modified the milestones: 1.14, 1.13.x Jun 25, 2014
@mlucy mlucy assigned mlucy and Tryneus and unassigned Tryneus and mlucy Jun 26, 2014
@Tryneus
Member

Tryneus commented Jun 26, 2014

OK, pretty sure we tracked this down. As you can see from the backtrace and the commit linked to by @mlucy, this is tied to the stream cache and the recent env_t changes by @srh. I would not be surprised if this bug already existed and was only exposed by the above changes, but I'm not going to spend the time looking for proof.

This appears to be caused by the concatmap_trans_t containing a pointer to the env_t that is stored at construction. The lifetime of the concatmap_trans_t, however, extends past the original env_t (which existed on the stack in ql::run(...)). Later, when we get more data from the datum stream through the stream cache, a new env_t is constructed on the stack, but nothing exists to update this pointer.

At this point, the concatmap_trans_t will use whatever data exists on the stack wherever the old env_t was, which results in undefined behavior, memory corruption, and crashes.

@srh, this seems tied in with the work you've already done on refactoring env_t usage. Do you want this?

@srh srh assigned srh and unassigned Tryneus Jun 26, 2014
@srh
Contributor

srh commented Jun 26, 2014

All such env_t * fields were eliminated back when env_t was refactored a long time ago for the purpose of concurrent query evaluation, but now they're back. Putting such evaluation-context parameters into fields because it's "helpful" is an intrinsic code smell. @mlucy should take this issue.

@mlucy mlucy assigned mlucy and unassigned srh Jun 26, 2014
@mlucy
Member

mlucy commented Jun 26, 2014

I can take this; I have nothing else on my plate until the ReQL discussion period is over.

@Tryneus
Member

Tryneus commented Jun 26, 2014

I have a fix up in review, actually, just finishing up testing.

@Tryneus
Member

Tryneus commented Jun 26, 2014

Fix has been approved and merged to next in commit 30bbbb3. Will be in release 1.14. Review was 1709.

@Tryneus Tryneus closed this as completed Jun 26, 2014