repair_kill_1_test: Assertion `!_stopping' failed in get_rpc_client #4767

Closed
bhalevy opened this issue Jul 29, 2019 · 3 comments

Comments

@bhalevy (Contributor) commented Jul 29, 2019

Scylla version f215286
Seen in dtest-release/190/artifact/logs-release.2/1564316195593_repair_additional_test.RepairAdditionalTest.repair_kill_1_test/node1.log:

INFO  2019-07-28 12:16:05,495 [shard 1] repair - Repair 358 out of 513 ranges, id=1, shard=1, keyspace=ks, table={cf}, range=(3391303419384142334, 3418976342153246664]
INFO  2019-07-28 12:16:05,495 [shard 1] repair - Repair 359 out of 513 ranges, id=1, shard=1, keyspace=ks, table={cf}, range=(3418976342153246664, 3458424664894510817]
INFO  2019-07-28 12:16:05,498 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO  2019-07-28 12:16:05,500 [shard 1] repair - Got error in row level repair: seastar::rpc::closed_error (connection is closed)
INFO  2019-07-28 12:16:05,502 [shard 1] repair - Repair 360 out of 513 ranges, id=1, shard=1, keyspace=ks, table={cf}, range=(3458424664894510817, 3469076849114652208]
INFO  2019-07-28 12:16:05,502 [shard 1] repair - Got error in row level repair: seastar::rpc::closed_error (connection is closed)
scylla: message/messaging_service.cc:549: seastar::shared_ptr<netw::messaging_service::rpc_protocol_client_wrapper> netw::messaging_service::get_rpc_client(netw::messaging_verb, netw::messaging_service::msg_addr): Assertion `!_stopping' failed.
Aborting on shard 1.
Backtrace:
  0x0000000002fe3e52
  0x0000000002eeaa95
  0x0000000002eead95
  0x0000000002eeae43
  0x00007f12586a902f
  /jenkins/workspace/scylla-master/dtest-release/scylla-dtest/../scylla/dynamic_libs/libc.so.6+0x000000000003853e
  /jenkins/workspace/scylla-master/dtest-release/scylla-dtest/../scylla/dynamic_libs/libc.so.6+0x0000000000022894
  /jenkins/workspace/scylla-master/dtest-release/scylla-dtest/../scylla/dynamic_libs/libc.so.6+0x0000000000022768
  /jenkins/workspace/scylla-master/dtest-release/scylla-dtest/../scylla/dynamic_libs/libc.so.6+0x00000000000309f5
  0x0000000001f4c43e
  0x0000000001f51ecb
  0x000000000268dbad
  0x000000000277490a
  0x0000000002776dfa
  0x0000000002784889
  0x00000000027876c9
  0x0000000000757d51
$ addr2line -Cfpi -e logs-release.2/scylla 
  0x0000000001f4c43e
  0x0000000001f51ecb
  0x000000000268dbad
  0x000000000277490a
  0x0000000002776dfa
  0x0000000002784889
  0x00000000027876c9
  0x0000000000757d51
netw::messaging_service::get_rpc_client(netw::messaging_verb, netw::msg_addr) at /jenkins/workspace/scylla-master/dtest-release/scylla/message/messaging_service.cc:549 (discriminator 1)
netw::messaging_service::make_sink_and_source_for_repair_get_row_diff_with_rpc_stream(unsigned int, netw::msg_addr) at /jenkins/workspace/scylla-master/dtest-release/scylla/message/messaging_service.cc:715
repair_meta::repair_meta(seastar::sharded<database>&, table&, seastar::lw_shared_ptr<schema const>, nonwrapping_range<dht::token>, row_level_diff_detect_algorithm, unsigned long, unsigned long, seastar::bool_class<repair_master_tag>, unsigned int, shard_config, unsigned long)::{lambda(unsigned int, netw::msg_addr)#2}::operator()(unsigned int, netw::msg_addr) const at /jenkins/workspace/scylla-master/dtest-release/scylla/repair/row_level.cc:675
 (inlined by) std::_Function_handler<seastar::future<seastar::rpc::sink<repair_hash_with_cmd>, seastar::rpc::source<repair_row_on_wire_with_cmd> > (unsigned int, netw::msg_addr), repair_meta::repair_meta(seastar::sharded<database>&, table&, seastar::lw_shared_ptr<schema const>, nonwrapping_range<dht::token>, row_level_diff_detect_algorithm, unsigned long, unsigned long, seastar::bool_class<repair_master_tag>, unsigned int, shard_config, unsigned long)::{lambda(unsigned int, netw::msg_addr)#2}>::_M_invoke(std::_Any_data const&, unsigned int&&, netw::msg_addr&&) at /usr/include/c++/8/bits/std_function.h:283
repair_meta::get_row_diff_with_rpc_stream(std::unordered_set<repair_hash, std::hash<repair_hash>, std::equal_to<repair_hash>, std::allocator<repair_hash> >, seastar::bool_class<needs_all_rows_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, gms::inet_address, unsigned int) at /usr/include/c++/8/bits/std_function.h:687
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release/scylla/repair/row_level.cc:101
 (inlined by) repair_meta::get_row_diff_with_rpc_stream(std::unordered_set<repair_hash, std::hash<repair_hash>, std::equal_to<repair_hash>, std::allocator<repair_hash> >, seastar::bool_class<needs_all_rows_tag>, seastar::bool_class<update_peer_row_hash_sets_tag>, gms::inet_address, unsigned int) at /jenkins/workspace/scylla-master/dtest-release/scylla/repair/row_level.cc:1611
row_level_repair::get_missing_rows_from_follower_nodes(repair_meta&) at /jenkins/workspace/scylla-master/dtest-release/scylla/repair/row_level.cc:2372
row_level_repair::run()::{lambda()#1}::operator()() const at /jenkins/workspace/scylla-master/dtest-release/scylla/repair/row_level.cc:2464
seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::async<row_level_repair::run()::{lambda()#1}>(seastar::thread_attributes, std::decay&&, (std::decay<row_level_repair::run()::{lambda()#1}>::type&&)...)::{lambda(seastar::async<{lambda()#1}>(seastar::futurize<std::result_of<std::decay<auto:1>::type ()>::type>::type, seastar::thread_attributes, std::decay<auto:1>::type&&)::work&)#1}::operator()(seastar::async<{lambda()#1}>(seastar::futurize<std::result_of<std::decay<{lambda()#1}>::type ()>::type>::type, seastar::thread_attributes, std::decay<{lambda()#1}>::type&&)::work)::{lambda()#1}>::call(seastar::noncopyable_function<void ()> const*) at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/apply.hh:35
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/apply.hh:43
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/future.hh:1367
 (inlined by) ?? at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/future.hh:1401
 (inlined by) seastar::async<row_level_repair::run()::{lambda()#1}>(seastar::thread_attributes, std::decay&&, (std::decay<row_level_repair::run()::{lambda()#1}>::type&&)...)::{lambda(seastar::async<{lambda()#1}>(seastar::futurize<std::result_of<std::decay<auto:1>::type ()>::type>::type, seastar::thread_attributes, std::decay<auto:1>::type&&)::work&)#1}::operator()(seastar::async<{lambda()#1}>(seastar::futurize<std::result_of<std::decay<{lambda()#1}>::type ()>::type>::type, seastar::thread_attributes, std::decay<{lambda()#1}>::type&&)::work)::{lambda()#1}::operator()() const at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/core/thread.hh:324
 (inlined by) seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::async<row_level_repair::run()::{lambda()#1}>(seastar::thread_attributes, std::decay&&, (std::decay<row_level_repair::run()::{lambda()#1}>::type&&)...)::{lambda(seastar::async<{lambda()#1}>(seastar::futurize<std::result_of<std::decay<auto:1>::type ()>::type>::type, seastar::thread_attributes, std::decay<auto:1>::type&&)::work&)#1}::operator()(seastar::async<{lambda()#1}>(seastar::futurize<std::result_of<std::decay<{lambda()#1}>::type ()>::type>::type, seastar::thread_attributes, std::decay<{lambda()#1}>::type&&)::work)::{lambda()#1}>::call(seastar::noncopyable_function<void ()> const*) at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/include/seastar/util/noncopyable_function.hh:71
seastar::thread_context::main() at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/build/release/../../include/seastar/util/noncopyable_function.hh:145
 (inlined by) seastar::thread_context::main() at /jenkins/workspace/scylla-master/dtest-release/scylla/seastar/build/release/../../src/core/thread.cc:317

Coredump: node1-reactor-1.41791.1564316165.core.gz
Binary: scylla

I speculate that this could be related to 44b5878 (Fix possible stalls in row level repair) that was recently introduced.

bhalevy added the dtest label Jul 29, 2019
@asias (Contributor) commented Jul 30, 2019

Commit 44b5878 (Fix possible stalls in row level repair) is the commit that actually enabled the use of RPC streaming in row-level repair.

The problem is:

  • The messaging service is shut down, which sets the _stopping flag.
  • A user of the messaging service then calls messaging_service::get_rpc_client, which executes assert(!_stopping) and aborts.
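
The sequence in the two bullets above can be modelled with a small stand-in class. This is a minimal sketch, not the Scylla code: `mock_messaging_service` and `reproduce_failure` are hypothetical names, verbs are plain integers, and the precondition throws instead of calling assert() so that the violation can be observed without aborting the process.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for netw::messaging_service (not the real class).
class mock_messaging_service {
    bool _stopping = false;
public:
    // Shutdown path: sets the flag, as described in the first bullet.
    void stop() { _stopping = true; }
    bool is_stopping() const { return _stopping; }

    // The real get_rpc_client() does assert(!_stopping). This sketch throws
    // instead, so the broken precondition is visible without a crash.
    int get_rpc_client(int verb) {
        if (_stopping) {
            throw std::logic_error("get_rpc_client called while stopping");
        }
        return verb; // placeholder for the client handle
    }
};

// Replays the order seen in the log: the service stops first, then a
// repair request still asks for an RPC client.
// Returns true if the precondition was violated.
bool reproduce_failure() {
    mock_messaging_service ms;
    ms.stop();
    try {
        ms.get_rpc_client(1);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In the real binary the same ordering ends in the "Assertion `!_stopping' failed" abort shown in node1.log, because nothing between the repair code and get_rpc_client checks the flag first.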
asias added a commit to asias/scylla that referenced this issue Jul 30, 2019
…pc_client

get_rpc_client assumes the messaging_service is not stopped. We should check
is_stopping() before we call get_rpc_client.

We already perform this check in existing code, e.g., send_message and friends.
Do the same check in the newly introduced
make_sink_and_source_for_stream_mutation_fragments() and friends for row
level repair.

Fixes: scylladb#4767
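
The guard described in this commit message can be sketched as follows. This is an illustration under assumptions, not the actual patch: `guarded_messaging_service`, `closed_error`, and `make_sink_and_source` are hypothetical stand-ins, and the sketch throws a plain exception where the real code would presumably report the error through a Seastar future.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical error type standing in for seastar::rpc::closed_error.
struct closed_error : std::runtime_error {
    closed_error() : std::runtime_error("connection is closed") {}
};

// Hypothetical stand-in for netw::messaging_service (not the real class).
class guarded_messaging_service {
    bool _stopping = false;
public:
    void stop() { _stopping = true; }
    bool is_stopping() const { return _stopping; }

    // Keeps the original precondition: must not be called after stop().
    int get_rpc_client(int verb) {
        assert(!_stopping);
        return verb; // placeholder for the client handle
    }

    // Stand-in for the make_sink_and_source_* helpers: check is_stopping()
    // first, and fail with a recoverable error instead of reaching the assert.
    int make_sink_and_source(int verb) {
        if (is_stopping()) {
            throw closed_error();
        }
        return get_rpc_client(verb);
    }
};

// After stop(), the guarded entry point reports closed_error to the caller
// rather than aborting the process.
bool guarded_call_after_stop_throws_closed_error() {
    guarded_messaging_service ms;
    ms.stop();
    try {
        ms.make_sink_and_source(7);
    } catch (const closed_error&) {
        return true;
    }
    return false;
}
```

Turning the shutdown race into an ordinary error fits the log above, where repair already handles seastar::rpc::closed_error by printing "Got error in row level repair" and continuing.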
@asias (Contributor) commented Jul 30, 2019

PR sent for this issue: #4772

tgrabiec added a commit that referenced this issue Jul 30, 2019
asias added a commit to asias/scylla that referenced this issue Aug 7, 2019
(cherry picked from commit 5d3e4d7)

Note: only the change for make_sink_and_source_for_stream_mutation_fragments is backported.
tgrabiec added a commit that referenced this issue Aug 8, 2019
(cherry picked from commit 5d3e4d7)

Note: only the change for make_sink_and_source_for_stream_mutation_fragments is backported.
Message-Id: <06079d4e48ea81ba567a2f45be2ab3a51f042e28.1565189319.git.asias@scylladb.com>
@avikivity (Contributor) commented Aug 13, 2019

Backports already done, removing label.
