Crash during table fuzzing, Guarantee failed: [last >= first - 1] #4979

Closed
Tryneus opened this Issue Oct 22, 2015 · 7 comments

Comments

Projects
None yet
3 participants
@Tryneus
Member

Tryneus commented Oct 22, 2015

This was running on v2.1.x (a007331), using the invocation: table_fuzzer.py --servers 7 --threads 48 --kill. Unfortunately the core file was overwritten by another crash (Branch history is missing pieces).

error: Error in ../src/clustering/generic/raft_core.tcc at line 1040:
error: Guarantee failed: [last >= first - 1]
error: Backtrace:
2015-10-21T18:49:47.365139506 39.897259s error: Wed Oct 21 18:49:46 2015

1 [0x1bb88ac]: rethinkdb_backtrace(void**, int) at rethinkdb_backtrace.cc:101
2 [0x17dd771]: backtrace_t::backtrace_t() at backtrace.cc:203
3 [0x17ddddf]: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at backtrace.cc:283
4 [0x17dd6bf]: format_backtrace(bool) at backtrace.cc:198
5 [0x17d15a2]: report_fatal_error(char const*, int, char const*, ...) at errors.cc:83
6 [0x1d9c36f]: raft_member_t<table_raft_state_t>::apply_log_entries(raft_member_t<table_raft_state_t>::state_and_config_t*, raft_log_t<table_raft_state_t> const&, unsigned long, unsigned long) at raft_core.tcc:1040
7 [0x1da8488]: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x1da8488] at 0x1da8488 ()
8 [0x1de317f]: std::_Function_handler<bool (raft_member_t<table_raft_state_t>::state_and_config_t*), raft_member_t<table_raft_state_t>::on_append_entries_rpc(raft_rpc_request_t<table_raft_state_t>::append_entries_t const&, raft_rpc_reply_t::append_entries_t*)::{lambda(raft_member_t<table_raft_state_t>::state_and_config_t*)#1}>::_M_invoke(std::_Any_data const&, raft_member_t<table_raft_state_t>::state_and_config_t*) at functional:1765
9 [0x1d9c60f]: std::function<bool (raft_member_t<table_raft_state_t>::state_and_config_t*)>::operator()(raft_member_t<table_raft_state_t>::state_and_config_t*) const at functional:2162
10 [0x1d9b8da]: watchable_variable_t<raft_member_t<table_raft_state_t>::state_and_config_t>::apply_atomic_op(std::function<bool (raft_member_t<table_raft_state_t>::state_and_config_t*)> const&) at watchable.hpp:221
11 [0x1dcaa99]: raft_member_t<table_raft_state_t>::on_append_entries_rpc(raft_rpc_request_t<table_raft_state_t>::append_entries_t const&, raft_rpc_reply_t::append_entries_t*) at raft_core.tcc:677
12 [0x1dbe475]: raft_member_t<table_raft_state_t>::on_rpc(raft_rpc_request_t<table_raft_state_t> const&, raft_rpc_reply_t*) at raft_core.tcc:302
13 [0x1db391c]: raft_networked_member_t<table_raft_state_t>::on_rpc(signal_t*, raft_rpc_request_t<table_raft_state_t> const&, mailbox_addr_t<void (raft_rpc_reply_t)> const&) at raft_network.tcc:81
14 [0x1de49cf]: std::_Mem_fn<void (raft_networked_member_t<table_raft_state_t>::*)(signal_t*, raft_rpc_request_t<table_raft_state_t> const&, mailbox_addr_t<void (raft_rpc_reply_t)> const&)>::operator()(raft_networked_member_t<table_raft_state_t>*, signal_t*, raft_rpc_request_t<table_raft_state_t> const&, mailbox_addr_t<void (raft_rpc_reply_t)> const&) const at functional:551
15 [0x1dd86ea]: void std::_Bind<std::_Mem_fn<void (raft_networked_member_t<table_raft_state_t>::*)(signal_t*, raft_rpc_request_t<table_raft_state_t> const&, mailbox_addr_t<void (raft_rpc_reply_t)> const&)> (raft_networked_member_t<table_raft_state_t>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>)>::__call<void, signal_t*&&, raft_rpc_request_t<table_raft_state_t>&&, mailbox_addr_t<void (raft_rpc_reply_t)>&&, 0, 1, 2, 3>(std::tuple<signal_t*&&, raft_rpc_request_t<table_raft_state_t>&&, mailbox_addr_t<void (raft_rpc_reply_t)>&&>&&, std::_Index_tuple<0, 1, 2, 3>) at functional:1147
16 [0x1dcbad9]: void std::_Bind<std::_Mem_fn<void (raft_networked_member_t<table_raft_state_t>::*)(signal_t*, raft_rpc_request_t<table_raft_state_t> const&, mailbox_addr_t<void (raft_rpc_reply_t)> const&)> (raft_networked_member_t<table_raft_state_t>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>)>::operator()<signal_t*, raft_rpc_request_t<table_raft_state_t>, mailbox_addr_t<void (raft_rpc_reply_t)>, void>(signal_t*&&, raft_rpc_request_t<table_raft_state_t>&&, mailbox_addr_t<void (raft_rpc_reply_t)>&&) at functional:1206
17 [0x1dbf3e5]: std::_Function_handler<void (signal_t*, raft_rpc_request_t<table_raft_state_t>, mailbox_addr_t<void (raft_rpc_reply_t)>), std::_Bind<std::_Mem_fn<void (raft_networked_member_t<table_raft_state_t>::*)(signal_t*, raft_rpc_request_t<table_raft_state_t> const&, mailbox_addr_t<void (raft_rpc_reply_t)> const&)> (raft_networked_member_t<table_raft_state_t>*, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>)> >::_M_invoke(std::_Any_data const&, signal_t*, raft_rpc_request_t<table_raft_state_t>, mailbox_addr_t<void (raft_rpc_reply_t)>) at functional:1780
18 [0x1e0c956]: std::function<void (signal_t*, raft_rpc_request_t<table_raft_state_t>, mailbox_addr_t<void (raft_rpc_reply_t)>)>::operator()(signal_t*, raft_rpc_request_t<table_raft_state_t>, mailbox_addr_t<void (raft_rpc_reply_t)>) const at functional:2161
19 [0x1e0a679]: mailbox_t<void (raft_rpc_request_t<table_raft_state_t>, mailbox_addr_t<void (raft_rpc_reply_t)>)>::read_impl_t::read(read_stream_t*, signal_t*) at typed.hpp:245
20 [0x1783ae7]: mailbox_manager_t::mailbox_read_coroutine(connectivity_cluster_t::connection_t*, auto_drainer_t::lock_t, threadnum_t, unsigned long, std::vector<char, std::allocator<char> >*, long, mailbox_manager_t::force_yield_t) at mailbox.cc:277
21 [0x1783957]: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x1783957] at 0x1783957 ()
22 [0x1784492]: /home/ssd2/grey/rethinkdb/build/debug/rethinkdb() [0x1784492] at 0x1784492 ()
23 [0x15ffa04]: callable_action_wrapper_t::run() at runtime_utils.cc:43
24 [0x1607c0b]: coro_t::run() at coroutines.cc:207

@Tryneus Tryneus added this to the 2.1.x milestone Oct 22, 2015

@Tryneus

This comment has been minimized.

Show comment
Hide comment
@Tryneus

Tryneus Oct 22, 2015

Member

Files are available on newton in ~grey/issue_4979/ (but again, the core file is for a different crash).

Member

Tryneus commented Oct 22, 2015

Files are available on newton in ~grey/issue_4979/ (but again, the core file is for a different crash).

@Tryneus

This comment has been minimized.

Show comment
Hide comment
@Tryneus

Tryneus Oct 23, 2015

Member

I ran into this again (on next this time). Following this crash, further attempts to relaunch the server result in a different crash. Core files suspiciously missing. Data files available on newton in ~grey/issue_4979_2/.

error: Error in ../src/clustering/generic/raft_core.hpp at line 271:
error: Guarantee failed: [index <= get_latest_index()] the log doesn't go forward this far
error: Backtrace:
error: Thu Oct 22 18:09:45 2015

       1 [0x1d22d28]: rethinkdb_backtrace(void**, int) at rethinkdb_backtrace.cc:101
       2 [0x190af01]: backtrace_t::backtrace_t() at backtrace.cc:203
       3 [0x190b56f]: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at backtrace.cc:283
       4 [0x190ae4f]: format_backtrace(bool) at backtrace.cc:198
       5 [0x18fed0e]: report_fatal_error(char const*, int, char const*, ...) at errors.cc:83
       6 [0x1e38ded]: raft_log_t<table_raft_state_t>::get_entry_ref(unsigned long) const at raft_core.hpp:271
       7 [0x1f216e3]: raft_member_t<table_raft_state_t>::apply_log_entries(raft_member_t<table_raft_state_t>::state_and_config_t*, raft_log_t<table_raft_state_t> const&, unsigned long, unsigned long) at raft_core.tcc:1043
       8 [0x1f2b044]: /home/grey/rethinkdb/build/debug/rethinkdb() [0x1f2b044] at 0x1f2b044 ()
       9 [0x1f5250b]: std::_Function_handler<bool (raft_member_t<table_raft_state_t>::state_and_config_t*), raft_member_t<table_raft_state_t>::state_and_config_t(raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, raft_network_interface_t<table_raft_state_t>*, std::string const&)::{lambda(raft_member_t<table_raft_state_t>::state_and_config_t*)#3}>::_M_invoke(std::_Any_data const&, raft_member_t<table_raft_state_t>::state_and_config_t*) at functional:1765
       10 [0x1f21937]: std::function<bool (raft_member_t<table_raft_state_t>::state_and_config_t*)>::operator()(raft_member_t<table_raft_state_t>::state_and_config_t*) const at functional:2162
       11 [0x1f20c02]: watchable_variable_t<raft_member_t<table_raft_state_t>::state_and_config_t>::apply_atomic_op(std::function<bool (raft_member_t<table_raft_state_t>::state_and_config_t*)> const&) at watchable.hpp:221
       12 [0x1f39928]: raft_member_t<table_raft_state_t>::raft_member_t(raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, raft_network_interface_t<table_raft_state_t>*, std::string const&) at raft_core.tcc:108
       13 [0x1f3419f]: raft_networked_member_t<table_raft_state_t>::raft_networked_member_t(raft_member_id_t const&, mailbox_manager_t*, watchable_map_t<raft_member_id_t, raft_business_card_t<table_raft_state_t> >*, raft_storage_interface_t<table_raft_state_t>*, std::string const&) at raft_network.tcc:24
       14 [0x1f27518]: table_manager_t::table_manager_t(uuid_u const&, mailbox_manager_t*, server_config_client_t*, watchable_map_t<std::pair<peer_id_t, uuid_u>, table_manager_bcard_t>*, backfill_throttler_t*, watchable_map_t<std::pair<uuid_u, uuid_u>, empty_value_t>*, base_path_t const&, io_backender_t*, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, multistore_ptr_t*, perfmon_collection_t*) at table_manager.cc:67
       15 [0x1fcfb94]: multi_table_manager_t::active_table_t::active_table_t(multi_table_manager_t*, multi_table_manager_t::table_t*, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, multistore_ptr_t*, perfmon_collection_t*) at multi_table_manager.cc:136
       16 [0x1fd7289]: scoped_ptr_t<multi_table_manager_t::active_table_t> make_scoped<multi_table_manager_t::active_table_t, multi_table_manager_t*, multi_table_manager_t::table_t*&, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*&, multistore_ptr_t*, perfmon_collection_t*>(multi_table_manager_t*&&, multi_table_manager_t::table_t*&, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*&, multistore_ptr_t*&&, perfmon_collection_t*&&) at scoped.hpp:122
       17 [0x1fcec58]: /home/grey/rethinkdb/build/debug/rethinkdb() [0x1fcec58] at 0x1fcec58 ()
       18 [0x1fd3d17]: /home/grey/rethinkdb/build/debug/rethinkdb() [0x1fd3d17] at 0x1fd3d17 ()
       19 [0x1e44de1]: std::function<void (uuid_u const&, table_active_persistent_state_t const&, raft_storage_interface_t<table_raft_state_t>*, metadata_file_t::read_txn_t*)>::operator()(uuid_u const&, table_active_persistent_state_t const&, raft_storage_interface_t<table_raft_state_t>*, metadata_file_t::read_txn_t*) const at functional:2162
       20 [0x1e41644]: real_table_persistence_interface_t::read_all_metadata(std::function<void (uuid_u const&, table_active_persistent_state_t const&, raft_storage_interface_t<table_raft_state_t>*, metadata_file_t::read_txn_t*)> const&, std::function<void (uuid_u const&, table_inactive_persistent_state_t const&, metadata_file_t::read_txn_t*)> const&, signal_t*) at table_interface.cc:203
       21 [0x1fcf173]: multi_table_manager_t::multi_table_manager_t(uuid_u const&, mailbox_manager_t*, server_config_client_t*, watchable_map_t<peer_id_t, multi_table_manager_bcard_t>*, watchable_map_t<std::pair<peer_id_t, uuid_u>, table_manager_bcard_t>*, watchable_map_t<std::pair<uuid_u, uuid_u>, empty_value_t>*, table_persistence_interface_t*, base_path_t const&, io_backender_t*, perfmon_collection_repo_t*) at multi_table_manager.cc:66
       22 [0x1d4a1ce]: do_serve(io_backender_t*, bool, base_path_t const&, metadata_file_t*, serve_info_t const&, os_signal_cond_t*) at serve.cc:295
Member

Tryneus commented Oct 23, 2015

I ran into this again (on next this time). Following this crash, further attempts to relaunch the server result in a different crash. Core files suspiciously missing. Data files available on newton in ~grey/issue_4979_2/.

error: Error in ../src/clustering/generic/raft_core.hpp at line 271:
error: Guarantee failed: [index <= get_latest_index()] the log doesn't go forward this far
error: Backtrace:
error: Thu Oct 22 18:09:45 2015

       1 [0x1d22d28]: rethinkdb_backtrace(void**, int) at rethinkdb_backtrace.cc:101
       2 [0x190af01]: backtrace_t::backtrace_t() at backtrace.cc:203
       3 [0x190b56f]: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at backtrace.cc:283
       4 [0x190ae4f]: format_backtrace(bool) at backtrace.cc:198
       5 [0x18fed0e]: report_fatal_error(char const*, int, char const*, ...) at errors.cc:83
       6 [0x1e38ded]: raft_log_t<table_raft_state_t>::get_entry_ref(unsigned long) const at raft_core.hpp:271
       7 [0x1f216e3]: raft_member_t<table_raft_state_t>::apply_log_entries(raft_member_t<table_raft_state_t>::state_and_config_t*, raft_log_t<table_raft_state_t> const&, unsigned long, unsigned long) at raft_core.tcc:1043
       8 [0x1f2b044]: /home/grey/rethinkdb/build/debug/rethinkdb() [0x1f2b044] at 0x1f2b044 ()
       9 [0x1f5250b]: std::_Function_handler<bool (raft_member_t<table_raft_state_t>::state_and_config_t*), raft_member_t<table_raft_state_t>::state_and_config_t(raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, raft_network_interface_t<table_raft_state_t>*, std::string const&)::{lambda(raft_member_t<table_raft_state_t>::state_and_config_t*)#3}>::_M_invoke(std::_Any_data const&, raft_member_t<table_raft_state_t>::state_and_config_t*) at functional:1765
       10 [0x1f21937]: std::function<bool (raft_member_t<table_raft_state_t>::state_and_config_t*)>::operator()(raft_member_t<table_raft_state_t>::state_and_config_t*) const at functional:2162
       11 [0x1f20c02]: watchable_variable_t<raft_member_t<table_raft_state_t>::state_and_config_t>::apply_atomic_op(std::function<bool (raft_member_t<table_raft_state_t>::state_and_config_t*)> const&) at watchable.hpp:221
       12 [0x1f39928]: raft_member_t<table_raft_state_t>::raft_member_t(raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, raft_network_interface_t<table_raft_state_t>*, std::string const&) at raft_core.tcc:108
       13 [0x1f3419f]: raft_networked_member_t<table_raft_state_t>::raft_networked_member_t(raft_member_id_t const&, mailbox_manager_t*, watchable_map_t<raft_member_id_t, raft_business_card_t<table_raft_state_t> >*, raft_storage_interface_t<table_raft_state_t>*, std::string const&) at raft_network.tcc:24
       14 [0x1f27518]: table_manager_t::table_manager_t(uuid_u const&, mailbox_manager_t*, server_config_client_t*, watchable_map_t<std::pair<peer_id_t, uuid_u>, table_manager_bcard_t>*, backfill_throttler_t*, watchable_map_t<std::pair<uuid_u, uuid_u>, empty_value_t>*, base_path_t const&, io_backender_t*, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, multistore_ptr_t*, perfmon_collection_t*) at table_manager.cc:67
       15 [0x1fcfb94]: multi_table_manager_t::active_table_t::active_table_t(multi_table_manager_t*, multi_table_manager_t::table_t*, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*, multistore_ptr_t*, perfmon_collection_t*) at multi_table_manager.cc:136
       16 [0x1fd7289]: scoped_ptr_t<multi_table_manager_t::active_table_t> make_scoped<multi_table_manager_t::active_table_t, multi_table_manager_t*, multi_table_manager_t::table_t*&, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*&, multistore_ptr_t*, perfmon_collection_t*>(multi_table_manager_t*&&, multi_table_manager_t::table_t*&, uuid_u const&, multi_table_manager_timestamp_t::epoch_t const&, raft_member_id_t const&, raft_storage_interface_t<table_raft_state_t>*&, multistore_ptr_t*&&, perfmon_collection_t*&&) at scoped.hpp:122
       17 [0x1fcec58]: /home/grey/rethinkdb/build/debug/rethinkdb() [0x1fcec58] at 0x1fcec58 ()
       18 [0x1fd3d17]: /home/grey/rethinkdb/build/debug/rethinkdb() [0x1fd3d17] at 0x1fd3d17 ()
       19 [0x1e44de1]: std::function<void (uuid_u const&, table_active_persistent_state_t const&, raft_storage_interface_t<table_raft_state_t>*, metadata_file_t::read_txn_t*)>::operator()(uuid_u const&, table_active_persistent_state_t const&, raft_storage_interface_t<table_raft_state_t>*, metadata_file_t::read_txn_t*) const at functional:2162
       20 [0x1e41644]: real_table_persistence_interface_t::read_all_metadata(std::function<void (uuid_u const&, table_active_persistent_state_t const&, raft_storage_interface_t<table_raft_state_t>*, metadata_file_t::read_txn_t*)> const&, std::function<void (uuid_u const&, table_inactive_persistent_state_t const&, metadata_file_t::read_txn_t*)> const&, signal_t*) at table_interface.cc:203
       21 [0x1fcf173]: multi_table_manager_t::multi_table_manager_t(uuid_u const&, mailbox_manager_t*, server_config_client_t*, watchable_map_t<peer_id_t, multi_table_manager_bcard_t>*, watchable_map_t<std::pair<peer_id_t, uuid_u>, table_manager_bcard_t>*, watchable_map_t<std::pair<uuid_u, uuid_u>, empty_value_t>*, table_persistence_interface_t*, base_path_t const&, io_backender_t*, perfmon_collection_repo_t*) at multi_table_manager.cc:66
       22 [0x1d4a1ce]: do_serve(io_backender_t*, bool, base_path_t const&, metadata_file_t*, serve_info_t const&, os_signal_cond_t*) at serve.cc:295
@VeXocide

This comment has been minimized.

Show comment
Hide comment
@VeXocide

VeXocide Oct 26, 2015

Member

In CR 3299 by @danielmewes.

Member

VeXocide commented Oct 26, 2015

In CR 3299 by @danielmewes.

@VeXocide

This comment has been minimized.

Show comment
Hide comment
@VeXocide

VeXocide Oct 29, 2015

Member

Merged into next and v2.1.x via commits 9f12645 and 944b402 respectively.

Member

VeXocide commented Oct 29, 2015

Merged into next and v2.1.x via commits 9f12645 and 944b402 respectively.

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Nov 7, 2015

Member

Unfortunately I just saw this again on next while running table_fuzzer.py --servers 7 --threads 48 --changefeeds --kill

Member

danielmewes commented Nov 7, 2015

Unfortunately I just saw this again on next while running table_fuzzer.py --servers 7 --threads 48 --changefeeds --kill

@danielmewes danielmewes reopened this Nov 7, 2015

@danielmewes danielmewes modified the milestones: 2.2-polish, 2.2, 2.2.x Nov 9, 2015

@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Jan 21, 2016

Member

@aphyr confirmed that the [index <= get_latest_index()] crash can still happen on RethinkDB 2.2.3.

We need to figure out what's going on.

Member

danielmewes commented Jan 21, 2016

@aphyr confirmed that the [index <= get_latest_index()] crash can still happen on RethinkDB 2.2.3.

We need to figure out what's going on.

@danielmewes danielmewes added the pr:high label Jan 21, 2016

danielmewes added a commit that referenced this issue Jan 28, 2016

Fixes bugs in the clustering layer.
This also adds some new comments and guarantees to the Raft code,
and makes a few thinks more obviously correct.

Fixes #5289.
Fixes #4979.
Fixes #4949.
Fixes #4866. (probably)

danielmewes added a commit that referenced this issue Jan 28, 2016

Fixes bugs in the clustering layer.
This also adds some new comments and guarantees to the Raft code,
and makes a few thinks more obviously correct.

Fixes #5289.
Fixes #4979.
Fixes #4949.
Fixes #4866. (probably)
@danielmewes

This comment has been minimized.

Show comment
Hide comment
@danielmewes

danielmewes Jan 28, 2016

Member

This got fixed by the fix for #5289 (comment)

Member

danielmewes commented Jan 28, 2016

This got fixed by the fix for #5289 (comment)

@danielmewes danielmewes modified the milestones: 2.2.x, 2.2.4 Jan 29, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment