longevity 1tb: Reactor stalled for 8000 ms on shard x, nemesises took very long time #4005

Closed
amoskong opened this Issue Dec 20, 2018 · 10 comments

amoskong commented Dec 20, 2018

Installation details
Scylla version (or git commit hash): 3.0.rc2-0.20181213.6b011fbe0
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0441d73b099ec2c23

$ grep -v "compaction -" job.log | grep -i "reactor stall" | grep -v "2000 ms" | grep -v "4000 ms"|wc -l
893
Backtrace:
Dec 14 20:20:14 info   | Reactor stalled for 8000 ms on shard 7.
Dec 14 20:20:14 info   | Backtrace:
Dec 14 20:20:14 info   | 0x00000000006a7692
Dec 14 20:20:14 info   | 0x00000000005bedbc
Dec 14 20:20:14 info   | 0x00000000005bf276
Dec 14 20:20:14 info   | /lib64/libpthread.so.0+0x000000000000f5cf
Dec 14 20:20:14 info   | 0x00000000022d6517
Dec 14 20:20:14 info   | 0x0000000002f3e2cb
Dec 14 20:20:14 info   | 0x00000000019a29e5
Dec 14 20:20:14 info   | 0x000000000183dddc
Dec 14 20:20:14 info   | 0x000000000386f59b
Dec 14 20:20:14 info   | 0x000000000387569f
Dec 14 20:20:14 info   | 0x000000000387f214
Dec 14 20:20:14 info   | 0x00000000037c5363
Dec 14 20:20:14 info   | 0x00000000037fc9f6
Dec 14 20:20:14 info   | 0x00000000037fe5a0
Dec 14 20:20:14 info   | 0x00000000005a016e
Dec 14 20:20:14 info   | 0x00000000005a0331
Dec 14 20:20:14 info   | 0x000000000066e9c7
Dec 14 20:20:14 info   | 0x000000000067313a
Dec 14 20:20:14 info   | 0x00000000007561cd
Dec 14 20:20:14 info   | /lib64/libpthread.so.0+0x0000000000007dd4
Dec 14 20:20:14 info   | /lib64/libc.so.6+0x00000000000fdeac
Dec 14 20:20:14 info   |  [shard 7] stream_session - [Stream #876c09e1-ffdd-11e8-9311-000000000009] Start sending ks=keyspace1, cf=standard1, estimated_partitions=648, with new rpc streaming
Dec 14 20:20:14 err    |  [shard 7] storage_proxy - Exception when communicating with 172.16.99.155: seastar::semaphore_timed_out (Semaphore timedout)
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at /usr/src/debug/scylla-3.0.rc2/seastar/util/backtrace.hh:56
seastar::backtrace_buffer::append_backtrace() at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.cc:410
 (inlined by) print_with_backtrace at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.cc:431
seastar::reactor::block_notifier(int) at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.cc:749
_L_unlock_13 at funlockfile.c:?
utils::murmur_hash::hash3_x64_128(std::experimental::fundamentals_v1::basic_string_view<signed char, std::char_traits<signed char> >, unsigned long, std::array<unsigned long, 2ul>&) at /usr/src/debug/scylla-3.0.rc2/utils/murmur_hash.cc:217
dht::murmur3_partitioner::get_token(std::experimental::fundamentals_v1::basic_string_view<signed char, std::char_traits<signed char> >) at /usr/src/debug/scylla-3.0.rc2/dht/murmur3_partitioner.cc:75
 (inlined by) dht::murmur3_partitioner::get_token(sstables::key_view const&) at /usr/src/debug/scylla-3.0.rc2/dht/murmur3_partitioner.cc:92
int sstables::binary_search<utils::chunked_vector<sstables::summary_entry, 131072ul> >(utils::chunked_vector<sstables::summary_entry, 131072ul> const&, sstables::key const&, dht::token const&) at /usr/src/debug/scylla-3.0.rc2/sstables/binary_search.hh:64
operator() at /usr/src/debug/scylla-3.0.rc2/sstables/sstables.cc:4253
 (inlined by) sstables::sstable::get_sample_indexes_for_range(nonwrapping_range<dht::token> const&) at /usr/src/debug/scylla-3.0.rc2/sstables/sstables.cc:4257
 (inlined by) sstables::sstable::estimated_keys_for_range(nonwrapping_range<dht::token> const&) at /usr/src/debug/scylla-3.0.rc2/sstables/sstables.cc:4290
_ZZN9streaming9send_info19estimate_partitionsEvENKUlmOT_E_clIRKN7seastar13lw_shared_ptrIN8sstables7sstableEEEEEDamS2_ at /usr/src/debug/scylla-3.0.rc2/streaming/stream_transfer_task.cc:112
 (inlined by) _ZSt10accumulateINSt8__detail20_Node_const_iteratorIN7seastar13lw_shared_ptrIN8sstables7sstableEEELb1ELb1EEEmZN9streaming9send_info19estimate_partitionsEvEUlmOT_E_ET0_SA_SA_SD_T1_ at /opt/scylladb/include/c++/7/bits/stl_numeric.h:154
 (inlined by) _ZN5boost10accumulateISt13unordered_setIN7seastar13lw_shared_ptrIN8sstables7sstableEEESt4hashIS6_ESt8equal_toIS6_ESaIS6_EEmZN9streaming9send_info19estimate_partitionsEvEUlmOT_E_EET0_RKSF_SI_T1_ at /opt/scylladb/include/boost/range/numeric.hpp:54
 (inlined by) streaming::send_info::estimate_partitions() at /usr/src/debug/scylla-3.0.rc2/streaming/stream_transfer_task.cc:111
 (inlined by) streaming::send_mutation_fragments(seastar::lw_shared_ptr<streaming::send_info>) at /usr/src/debug/scylla-3.0.rc2/streaming/stream_transfer_task.cc:159
operator() at /usr/src/debug/scylla-3.0.rc2/streaming/stream_transfer_task.cc:213
 (inlined by) operator() at /usr/src/debug/scylla-3.0.rc2/seastar/core/sharded.hh:515
 (inlined by) apply<seastar::sharded<T>::invoke_on_all(Func&&)::<lambda(unsigned int)> [with Func = streaming::stream_transfer_task::execute()::<lambda(database&)>; Service = database]::<lambda()>&> at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:1399
 (inlined by) submit_to<seastar::sharded<T>::invoke_on_all(Func&&)::<lambda(unsigned int)> [with Func = streaming::stream_transfer_task::execute()::<lambda(database&)>; Service = database]::<lambda()> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.hh:1247
 (inlined by) operator() at /usr/src/debug/scylla-3.0.rc2/seastar/core/sharded.hh:517
 (inlined by) apply<seastar::sharded<T>::invoke_on_all(Func&&) [with Func = streaming::stream_transfer_task::execute()::<lambda(database&)>; Service = database]::<lambda(unsigned int)>, unsigned int> at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:1399
 (inlined by) futurize_apply<seastar::sharded<T>::invoke_on_all(Func&&) [with Func = streaming::stream_transfer_task::execute()::<lambda(database&)>; Service = database]::<lambda(unsigned int)>, unsigned int> at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:1471
 (inlined by) parallel_for_each<boost::range_detail::integer_iterator<unsigned int>, seastar::sharded<T>::invoke_on_all(Func&&) [with Func = streaming::stream_transfer_task::execute()::<lambda(database&)>; Service = database]::<lambda(unsigned int)> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/future-util.hh:129
parallel_for_each<boost::integer_range<unsigned int>, seastar::sharded<T>::invoke_on_all(Func&&) [with Func = streaming::stream_transfer_task::execute()::<lambda(database&)>; Service = database]::<lambda(unsigned int)> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/future-util.hh:175
 (inlined by) invoke_on_all<streaming::stream_transfer_task::execute()::<lambda(database&)> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/sharded.hh:517
 (inlined by) streaming::stream_transfer_task::execute() at /usr/src/debug/scylla-3.0.rc2/streaming/stream_transfer_task.cc:210
operator()<std::pair<const utils::UUID, streaming::stream_transfer_task> > at /usr/src/debug/scylla-3.0.rc2/streaming/stream_session.cc:554
 (inlined by) apply<streaming::stream_session::start_streaming_files()::<lambda(auto:150&)>&, std::pair<const utils::UUID, streaming::stream_transfer_task>&> at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:1356
 (inlined by) apply<streaming::stream_session::start_streaming_files()::<lambda(auto:150&)>&, std::pair<const utils::UUID, streaming::stream_transfer_task>&> at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:1382
 (inlined by) do_for_each<std::_Rb_tree_iterator<std::pair<const utils::UUID, streaming::stream_transfer_task> >, streaming::stream_session::start_streaming_files()::<lambda(auto:150&)> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/future-util.hh:563
streaming::stream_session::start_streaming_files() at /usr/src/debug/scylla-3.0.rc2/streaming/stream_session.cc:552
operator() at /usr/src/debug/scylla-3.0.rc2/streaming/stream_session.cc:396
 (inlined by) apply at /usr/src/debug/scylla-3.0.rc2/seastar/core/apply.hh:35
 (inlined by) apply<streaming::stream_session::on_initialization_complete()::<lambda()> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/apply.hh:43
 (inlined by) apply_tuple<streaming::stream_session::on_initialization_complete()::<lambda()> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:1343
 (inlined by) apply<streaming::stream_session::on_initialization_complete()::<lambda()> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:1377
 (inlined by) operator()<seastar::future_state<> > at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:963
 (inlined by) run_and_dispose at /usr/src/debug/scylla-3.0.rc2/seastar/core/future.hh:414
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.cc:2694
seastar::reactor::run_some_tasks() at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.cc:3117
seastar::reactor::run_some_tasks() at /usr/src/debug/scylla-3.0.rc2/seastar/util/log.hh:313
 (inlined by) seastar::reactor::run() at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.cc:3264
operator() at /usr/src/debug/scylla-3.0.rc2/seastar/core/reactor.cc:4333
std::function<void ()>::operator()() const at /opt/scylladb/include/c++/7/bits/std_function.h:706
 (inlined by) seastar::posix_thread::start_routine(void*) at /usr/src/debug/scylla-3.0.rc2/seastar/core/posix.cc:52
start_thread at pthread_create.c:?
__clone at ??:?

asias commented Dec 20, 2018

The log shows a stall in streaming::send_info::estimate_partitions(). We can futurize it.
Which nemeses took a long time? What operation triggered the streaming?
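For context, a minimal sketch of what "futurizing" that loop could look like (an illustration of the general Seastar pattern, not the actual patch; the member names _sstables and _ranges are assumptions, while estimated_keys_for_range() is the call visible in the backtrace above). Instead of a single synchronous boost::accumulate over all sstables, the iteration goes through seastar::do_for_each, which can yield to the reactor between elements:

```cpp
#include <seastar/core/future.hh>
#include <seastar/core/future-util.hh>  // seastar::do_with, seastar::do_for_each

// Sketch only: assumes send_info holds the sstable set (_sstables) and the
// token ranges being streamed (_ranges).
seastar::future<int64_t> estimate_partitions() {
    return seastar::do_with(int64_t(0), [this] (int64_t& estimate) {
        // do_for_each checks for preemption between elements, so a large
        // sstable set no longer monopolizes the shard in one task.
        return seastar::do_for_each(_sstables, [this, &estimate] (auto& sst) {
            for (const auto& range : _ranges) {
                estimate += sst->estimated_keys_for_range(range);
            }
        }).then([&estimate] {
            return estimate;
        });
    });
}
```

The actual fix is the patch referenced below ("streaming: Futurize estimate_partitions").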

asias commented Dec 20, 2018

Patch on list: [PATCH scylla] streaming: Futurize estimate_partitions

amoskong commented Dec 20, 2018

  • terminate_and_replace_node nemesis took almost 2 days:
```
2018-12-14 16:56:03,713 nemesis          L0439 INFO | sdcm.nemesis.ChaosMonkey: >>>>>>>>>>>>>Started random_disrupt_method terminate_and_replace_node
2018-12-16 15:15:05,396 nemesis          L0447 INFO | sdcm.nemesis.ChaosMonkey: <<<<<<<<<<<<<Finished random_disrupt_method terminate_and_replace_node
```
  • destroy_data_then_rebuild took about 5 hours:
```
2018-12-16 17:30:32,980 nemesis          L0439 INFO | sdcm.nemesis.ChaosMonkey: >>>>>>>>>>>>>Started random_disrupt_method destroy_data_then_rebuild
2018-12-16 22:29:37,348 nemesis          L0447 INFO | sdcm.nemesis.ChaosMonkey: <<<<<<<<<<<<<Finished random_disrupt_method destroy_data_then_rebuild
```
  • another terminate_and_replace_node took about 6 hours:
```
2018-12-17 04:38:57,088 nemesis          L0439 INFO | sdcm.nemesis.ChaosMonkey: >>>>>>>>>>>>>Started random_disrupt_method terminate_and_replace_node
2018-12-17 10:33:47,230 nemesis          L0447 INFO | sdcm.nemesis.ChaosMonkey: <<<<<<<<<<<<<Finished random_disrupt_method terminate_and_replace_node
```
amoskong commented Dec 20, 2018

In another, older 1 TB longevity run, the nemeses were also very slow; the job log was lost after renaming.

There were only some streaming errors, during the terminate_and_replace_node nemesis.

1) destroy_data_then_repair
start: 2018-12-05 20:15:28
finish: 2018-12-07 10:49:04
duration: 138815 s ≈ 38.6 hours
Result: PASS

2) terminate_and_replace_node
start: 2018-12-07 11:21:31
finish: 2018-12-09 17:36:57
duration: 195325 s ≈ 54.3 hours
Result: failed while repairing the new node. There were some streaming errors; the max latency was about 30 s.

Grafana snapshot:

[screenshot: Grafana "Scylla Per Server Metrics" dashboard]

@slivne slivne added the bug label Dec 20, 2018

@slivne slivne added this to the 3.0 milestone Dec 20, 2018

asias commented Dec 20, 2018

@amoskong Do we have this issue in 3.0.rc1?

asias commented Dec 20, 2018

@amoskong Do you still have the disk IO Grafana dashboard snapshot?

asias commented Dec 21, 2018

@roydahan Do you think this longevity test is a victim of the NVMe IRQ issue?


avikivity added a commit that referenced this issue Jan 22, 2019

streaming: Futurize estimate_partitions
The loop can take a long time if the number of sstables and/or ranges
are large. To fix, futurize the loop.

Fixes: #4005

Message-Id: <3b05cb84f3f57cc566702142c6365a04b075018e.1545290730.git.asias@scylladb.com>
(cherry picked from commit bcba6b4)