Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

partitioned_sstable_set::insert might stall when called by table::make_reader_v2_excluding_sstables #14244

Closed
bhalevy opened this issue Jun 14, 2023 · 8 comments · Fixed by #14364
Assignees
Milestone

Comments

@bhalevy
Copy link
Member

bhalevy commented Jun 14, 2023

As seen in e.g. #14162,
table::make_reader_v2_excluding_sstables may call partitioned_sstable_set::insert many times,
for all non-excluded sstables, and that might stall in

_leveled_sstables.add({make_interval(*sst), value_set({sst})});

when inserting to boost interval set.

For example, see #14162 (comment)

2023-05-28T13:32:28+00:00 longevity-schemachanges-3h-2023-1-db-node-e42548b2-1     !INFO | scylla[5199]: Reactor stalled for 32 ms on shard 1. Backtrace: 0x524ce63 0x524c250 0x524d630 0x3cb1f 0x140f391 0x1f7bc56 0x3805aa7 0x21e5945 0x21ed1ed 0x21eb362 0x21dfa3a 0x21dd486 0x21c79f9 0x21bed93 0x1a133a6 0x21c6101 0x21d36d7 0x21bed16 0x19a799c 0x1a134ff 0x1944d84 0x19abb2c 0x19ad105 0x369a632 0x364c4e2 0x36ace23 0x36ab3bf 0x36a8919 0x36a7d75 0x54d5ef6
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
compound_type<(allow_prefixes)0>::iterator::read_current() at ././compound.hh:159
 (inlined by) iterator at ././compound.hh:176
compound_type<(allow_prefixes)0>::begin(managed_bytes_basic_view<(mutable_view)0>) at ././compound.hh:194
 (inlined by) legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes_basic_view<(mutable_view)0>) const at ././compound_compat.hh:154
 (inlined by) partition_key_view::legacy_tri_compare(schema const&, partition_key_view) const at ./keys.cc:77
partition_key::legacy_tri_compare(schema const&, partition_key const&) const at ././keys.hh:736
 (inlined by) dht::ring_position_tri_compare(schema const&, dht::ring_position_view, dht::ring_position_view) at ./dht/i_partitioner.cc:292
tri_compare(compatible_ring_position_or_view const&, compatible_ring_position_or_view const&) at ././compatible_ring_position.hh:35
 (inlined by) operator<(compatible_ring_position_or_view const&, compatible_ring_position_or_view const&) at ././compatible_ring_position.hh:38
 (inlined by) std::less<compatible_ring_position_or_view>::operator()(compatible_ring_position_or_view const&, compatible_ring_position_or_view const&) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_function.h:408
 (inlined by) boost::enable_if<boost::icl::is_interval<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >, bool>::type boost::icl::domain_equal<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >(boost::icl::interval_traits<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >::domain_type const&, boost::icl::interval_traits<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >::domain_type const&) at /usr/include/boost/icl/concept/interval.hpp:62
 (inlined by) boost::enable_if<boost::icl::is_continuous_interval<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >, bool>::type boost::icl::touches<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >(boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const&, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const&) at /usr/include/boost/icl/concept/interval.hpp:955
 (inlined by) bool boost::icl::segmental::joinable<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator> >(boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator> const&, boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::iterator&, boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::iterator&) at /usr/include/boost/icl/detail/interval_set_algo.hpp:229
 (inlined by) boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::iterator boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator> >(boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>&, boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::iterator&) at /usr/include/boost/icl/detail/interval_set_algo.hpp:283
void boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::handle_left_combined<boost::icl::inplace_plus<std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > >(std::_Rb_tree_iterator<std::pair<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > >) at /usr/include/boost/icl/interval_map.hpp:171
void boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>, compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::add_segment<boost::icl::inplace_plus<std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > >(boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const&, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > const&, std::_Rb_tree_iterator<std::pair<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > >&) at /usr/include/boost/icl/interval_base_map.hpp:858
void boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>, compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::add_main<boost::icl::inplace_plus<std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > >(boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>&, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > const&, std::_Rb_tree_iterator<std::pair<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > >&, std::_Rb_tree_iterator<std::pair<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > > const&) at /usr/include/boost/icl/interval_base_map.hpp:872
std::_Rb_tree_iterator<std::pair<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> const, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > > boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>, compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::_add<boost::icl::inplace_plus<std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > >(std::pair<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > const&) at /usr/include/boost/icl/interval_base_map.hpp:968
boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>, compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, boost::icl::partial_absorber, std::less, boost::icl::inplace_plus, boost::icl::inter_section, boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::allocator>::add(std::pair<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less>, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sstable> >, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > > > const&) at /usr/include/boost/icl/interval_base_map.hpp:316
 (inlined by) sstables::partitioned_sstable_set::insert(seastar::lw_shared_ptr<sstables::sstable>) at ./sstables/sstable_set.cc:335
sstables::sstable_set::insert(seastar::lw_shared_ptr<sstables::sstable>) at ./sstables/sstable_set.cc:170
operator() at ./replica/table.cc:2463
 (inlined by) void std::__invoke_impl<void, replica::table::make_reader_v2_excluding_sstables(seastar::lw_shared_ptr<schema const>, reader_permit, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const::$_48&, seastar::lw_shared_ptr<sstables::sstable> const&>(std::__invoke_other, replica::table::make_reader_v2_excluding_sstables(seastar::lw_shared_ptr<schema const>, reader_permit, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const::$_48&, seastar::lw_shared_ptr<sstables::sstable> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, replica::table::make_reader_v2_excluding_sstables(seastar::lw_shared_ptr<schema const>, reader_permit, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const::$_48&, seastar::lw_shared_ptr<sstables::sstable> const&>, void>::type std::__invoke_r<void, replica::table::make_reader_v2_excluding_sstables(seastar::lw_shared_ptr<schema const>, reader_permit, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const::$_48&, seastar::lw_shared_ptr<sstables::sstable> const&>(replica::table::make_reader_v2_excluding_sstables(seastar::lw_shared_ptr<schema const>, reader_permit, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const::$_48&, seastar::lw_shared_ptr<sstables::sstable> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (seastar::lw_shared_ptr<sstables::sstable> const&), replica::table::make_reader_v2_excluding_sstables(seastar::lw_shared_ptr<schema const>, reader_permit, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const::$_48>::_M_invoke(std::_Any_data const&, seastar::lw_shared_ptr<sstables::sstable> const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void (seastar::lw_shared_ptr<sstables::sstable> const&)>::operator()(seastar::lw_shared_ptr<sstables::sstable> const&) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) sstables::partitioned_sstable_set::for_each_sstable(std::function<void (seastar::lw_shared_ptr<sstables::sstable> const&)>) const at ./sstables/sstable_set.cc:315
sstables::sstable_set::for_each_sstable(std::function<void (seastar::lw_shared_ptr<sstables::sstable> const&)>) const at ./sstables/sstable_set.cc:165
 (inlined by) sstables::compound_sstable_set::for_each_sstable(std::function<void (seastar::lw_shared_ptr<sstables::sstable> const&)>) const at ./sstables/sstable_set.cc:1012
sstables::sstable_set::for_each_sstable(std::function<void (seastar::lw_shared_ptr<sstables::sstable> const&)>) const at ./sstables/sstable_set.cc:165
replica::table::make_reader_v2_excluding_sstables(seastar::lw_shared_ptr<schema const>, reader_permit, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const at ./replica/table.cc:2459
operator() at ./replica/table.cc:2613
 (inlined by) flat_mutation_reader_v2 std::__invoke_impl<flat_mutation_reader_v2, replica::table::as_mutation_source_excluding(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&) const::$_50&, seastar::lw_shared_ptr<schema const>, reader_permit, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag> >(std::__invoke_other, replica::table::as_mutation_source_excluding(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&) const::$_50&, seastar::lw_shared_ptr<schema const>&&, reader_permit&&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr&&, seastar::bool_class<streamed_mutation::forwarding_tag>&&, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<flat_mutation_reader_v2, replica::table::as_mutation_source_excluding(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&) const::$_50&, seastar::lw_shared_ptr<schema const>, reader_permit, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag> >, flat_mutation_reader_v2>::type std::__invoke_r<flat_mutation_reader_v2, replica::table::as_mutation_source_excluding(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&) const::$_50&, seastar::lw_shared_ptr<schema const>, reader_permit, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag> >(replica::table::as_mutation_source_excluding(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&) const::$_50&, seastar::lw_shared_ptr<schema const>&&, reader_permit&&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr&&, seastar::bool_class<streamed_mutation::forwarding_tag>&&, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:114
 (inlined by) std::_Function_handler<flat_mutation_reader_v2 (seastar::lw_shared_ptr<schema const>, reader_permit, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>), replica::table::as_mutation_source_excluding(std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&) const::$_50>::_M_invoke(std::_Any_data const&, seastar::lw_shared_ptr<schema const>&&, reader_permit&&, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr&&, seastar::bool_class<streamed_mutation::forwarding_tag>&&, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<flat_mutation_reader_v2 (seastar::lw_shared_ptr<schema const>, reader_permit, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>)>::operator()(seastar::lw_shared_ptr<schema const>, reader_permit, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) mutation_source::make_reader_v2(seastar::lw_shared_ptr<schema const>, reader_permit, nonwrapping_interval<dht::ring_position> const&, query::partition_slice const&, seastar::io_priority_class const&, tracing::trace_state_ptr, seastar::bool_class<streamed_mutation::forwarding_tag>, seastar::bool_class<mutation_reader::partition_range_forwarding_tag>) const at ././readers/mutation_source.hh:163
replica::table::do_push_view_replica_updates(seastar::lw_shared_ptr<schema const>, mutation, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, mutation_source, tracing::trace_state_ptr, reader_concurrency_semaphore&, seastar::io_priority_class const&, enum_set<super_enum<query::partition_slice::option, (query::partition_slice::option)0, (query::partition_slice::option)1, (query::partition_slice::option)2, (query::partition_slice::option)3, (query::partition_slice::option)4, (query::partition_slice::option)5, (query::partition_slice::option)6, (query::partition_slice::option)7, (query::partition_slice::option)8, (query::partition_slice::option)9, (query::partition_slice::option)10, (query::partition_slice::option)11, (query::partition_slice::option)12> >) const at ./replica/table.cc:2573
replica::table::stream_view_replica_updates(seastar::lw_shared_ptr<schema const> const&, mutation&&, std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >&) const at ./replica/table.cc:2592
operator() at ./db/view/view.cc:2599
 (inlined by) seastar::noncopyable_function<seastar::future<row_locker::lock_holder> (mutation)>::direct_vtable_for<db::view::view_updating_consumer::view_updating_consumer(seastar::lw_shared_ptr<schema const>, reader_permit, replica::table&, std::vector<seastar::lw_shared_ptr<sstables::sstable>, std::allocator<seastar::lw_shared_ptr<sstables::sstable> > >, seastar::abort_source const&, evictable_reader_handle_v2&)::$_31>::call(seastar::noncopyable_function<seastar::future<row_locker::lock_holder> (mutation)> const*, mutation) at ././seastar/include/seastar/util/noncopyable_function.hh:124
seastar::noncopyable_function<seastar::future<row_locker::lock_holder> (mutation)>::operator()(mutation) const at ././seastar/include/seastar/util/noncopyable_function.hh:210
 (inlined by) db::view::view_updating_consumer::do_flush_buffer() at ./db/view/view.cc:2567
db::view::view_updating_consumer::consume_end_of_stream() at ././db/view/view_updating_consumer.hh:125
 (inlined by) auto flat_mutation_reader_v2::impl::consume_in_thread<db::view::view_updating_consumer, flat_mutation_reader_v2::filter>(db::view::view_updating_consumer, flat_mutation_reader_v2::filter) at ././readers/flat_mutation_reader_v2.hh:339
auto flat_mutation_reader_v2::consume_in_thread<db::view::view_updating_consumer, flat_mutation_reader_v2::filter>(db::view::view_updating_consumer, flat_mutation_reader_v2::filter) at ././readers/flat_mutation_reader_v2.hh:477
operator() at ./db/view/view_update_generator.cc:163
 (inlined by) void std::__invoke_impl<void, db::view::view_update_generator::start()::$_0>(std::__invoke_other, db::view::view_update_generator::start()::$_0&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
std::__invoke_result<db::view::view_update_generator::start()::$_0>::type std::__invoke<db::view::view_update_generator::start()::$_0>(db::view::view_update_generator::start()::$_0&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
 (inlined by) decltype(auto) std::__apply_impl<db::view::view_update_generator::start()::$_0, std::tuple<>>(db::view::view_update_generator::start()::$_0&&, std::tuple<>&&, std::integer_sequence<unsigned long>) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1852
 (inlined by) decltype(auto) std::apply<db::view::view_update_generator::start()::$_0, std::tuple<> >(db::view::view_update_generator::start()::$_0&&, std::tuple<>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/tuple:1863
 (inlined by) seastar::future<void> seastar::futurize<void>::apply<db::view::view_update_generator::start()::$_0>(db::view::view_update_generator::start()::$_0&&, std::tuple<>&&) at ././seastar/include/seastar/core/future.hh:2111
 (inlined by) operator() at ././seastar/include/seastar/core/thread.hh:258
 (inlined by) seastar::noncopyable_function<void ()>::direct_vtable_for<seastar::async<db::view::view_update_generator::start()::$_0>(seastar::thread_attributes, db::view::view_update_generator::start()::$_0&&)::{lambda()#1}>::call(seastar::noncopyable_function<void ()> const*) at ././seastar/include/seastar/util/noncopyable_function.hh:124
seastar::noncopyable_function<void ()>::operator()() const at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:210
 (inlined by) seastar::thread_context::main() at ./build/release/seastar/./seastar/src/core/thread.cc:299
?? ??:0
@bhalevy bhalevy self-assigned this Jun 14, 2023
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 14, 2023
Used by table::make_reader_v2_excluding_sstables to
create a sstable_set based on the current one
that includes all non-excluded sstables.

It checks if the majority of the sstables is included
or excluded and picks one of 2 strategies:
either cloning the existing set and erasing the
excluded sstables out of it, if they are the minority,
or making a new set and inserting to it all the non-excluded
sstables if the included sstables are the minority.

We do that especially for partitioned_sstable_set
since inserting to or erasing from its _leveled_sstables interval_map
is expensive and we want to avoid that as much as possible
to prevent reactor stalls.

Refs scylladb#14244

The reason this change doesn't fix the issue entirely is
that it will deal with a typical case where the number of excluded sstables
(that are staging and processed by the view update generator)
is relatively small.

But when we have many of them, say about about half of the
total number of sstables, calling partitioned_sstable_set::insert or erase
enough times without preemption might still stall.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 14, 2023
Used by table::make_reader_v2_excluding_sstables to
create a sstable_set based on the current one
that includes all non-excluded sstables.

It checks if the majority of the sstables is included
or excluded and picks one of 2 strategies:
either cloning the existing set and erasing the
excluded sstables out of it, if they are the minority,
or making a new set and inserting to it all the non-excluded
sstables if the included sstables are the minority.

We do that especially for partitioned_sstable_set
since inserting to or erasing from its _leveled_sstables interval_map
is expensive and we want to avoid that as much as possible
to prevent reactor stalls.

Refs scylladb#14244

The reason this change doesn't fix the issue entirely is
that it will deal with a typical case where the number of excluded sstables
(that are staging and processed by the view update generator)
is relatively small.

But when we have many of them, say about about half of the
total number of sstables, calling partitioned_sstable_set::insert or erase
enough times without preemption might still stall.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 14, 2023
Used by table::make_reader_v2_excluding_sstables to
create a sstable_set based on the current one
that includes all non-excluded sstables.

It checks if the majority of the sstables is included
or excluded and picks one of 2 strategies:
either cloning the existing set and erasing the
excluded sstables out of it, if they are the minority,
or making a new set and inserting to it all the non-excluded
sstables if the included sstables are the minority.

We do that especially for partitioned_sstable_set
since inserting to or erasing from its _leveled_sstables interval_map
is expensive and we want to avoid that as much as possible
to prevent reactor stalls.

Refs scylladb#14244

The reason this change doesn't fix the issue entirely is
that it will deal with a typical case where the number of excluded sstables
(that are staging and processed by the view update generator)
is relatively small.

But when we have many of them, say about about half of the
total number of sstables, calling partitioned_sstable_set::insert or erase
enough times without preemption might still stall.

Refs scylladb#14089

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 15, 2023
Used by table::make_reader_v2_excluding_sstables to
create a sstable_set based on the current one
that includes all non-excluded sstables.

It checks if the majority of the sstables is included
or excluded and picks one of 2 strategies:
either cloning the existing set and erasing the
excluded sstables out of it, if they are the minority,
or making a new set and inserting to it all the non-excluded
sstables if the included sstables are the minority.

We do that especially for partitioned_sstable_set
since inserting to or erasing from its _leveled_sstables interval_map
is expensive and we want to avoid that as much as possible
to prevent reactor stalls.

Refs scylladb#14244

The reason this change doesn't fix the issue entirely is
that it will deal with a typical case where the number of excluded sstables
(that are staging and processed by the view update generator)
is relatively small.

But when we have many of them, say about about half of the
total number of sstables, calling partitioned_sstable_set::insert or erase
enough times without preemption might still stall.

Refs scylladb#14089

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
bhalevy added a commit to bhalevy/scylla that referenced this issue Jun 20, 2023
Used by table::make_reader_v2_excluding_sstables to
create a sstable_set based on the current one
that includes all non-excluded sstables.

It checks if the majority of the sstables is included
or excluded and picks one of 2 strategies:
either cloning the existing set and erasing the
excluded sstables out of it, if they are the minority,
or making a new set and inserting to it all the non-excluded
sstables if the included sstables are the minority.

We do that especially for partitioned_sstable_set
since inserting to or erasing from its _leveled_sstables interval_map
is expensive and we want to avoid that as much as possible
to prevent reactor stalls.

Refs scylladb#14244

The reason this change doesn't fix the issue entirely is
that it will deal with a typical case where the number of excluded sstables
(that are staging and processed by the view update generator)
is relatively small.

But when we have many of them, say about about half of the
total number of sstables, calling partitioned_sstable_set::insert or erase
enough times without preemption might still stall.

Refs scylladb#14089

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jun 25, 2023
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()scylladb#1}::operator()() const::{lambda()scylladb#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

With this patch, view building was made 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs scylladb#14089.
Fixes scylladb#14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jun 25, 2023
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()scylladb#1}::operator()() const::{lambda()scylladb#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

With this patch, view building was made 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs scylladb#14089.
Fixes scylladb#14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jun 27, 2023
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()scylladb#1}::operator()() const::{lambda()scylladb#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, and
could happen if the sstable wasn't included in the current
batch for view building.

With this improvement, view building was measured to be 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs scylladb#14089.
Fixes scylladb#14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
denesb added a commit that referenced this issue Jun 27, 2023
…g' from Raphael "Raph" Carvalho

View building from staging creates a reader from scratch (memtable
\+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
```
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1 const::{lambda()#1
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

```
That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.

With this improvement, view building was measured to be 3x faster.

from
`INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s`

to
`INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s`

Refs #14089.
Fixes #14244.

Closes #14364

* github.com:scylladb/scylladb:
  table: Optimize creation of reader excluding staging for view building
  view_update_generator: Dump throughput and duration for view update from staging
  utils: Extract pretty printers into a header
@mykaul
Copy link
Contributor

mykaul commented Jun 28, 2023

@avikivity , @raphaelsc - who's backporting it?

@bhalevy
Copy link
Member Author

bhalevy commented Jun 29, 2023

It should brew in master for a little while to prove stable and not causing any regressions before being backported.
In the past, we usually backported patches only after being released so the bar was much higher.
I'm not sure that the release branches next promotion tests are sufficient to catch regressions caused by backported patches. It would be good to pass 5.3 rc testing cycle before backporting to stable versions.

@denesb
Copy link
Contributor

denesb commented Jun 29, 2023

It always depended on the severity of the bug fixed. Bugs that cause crashes or sever correctness issues were always immediately backported. Performance bugs are not backported immediately if at all. Another factor is the complexity of the fix.

raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 2, 2023
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()scylladb#1}::operator()() const::{lambda()scylladb#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.

With this improvement, view building was measured to be 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs scylladb#14089.
Fixes scylladb#14244.
Refs scylladb#2975.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
@DoronArazii DoronArazii added this to the 5.4 milestone Jul 4, 2023
nyh pushed a commit that referenced this issue Jul 6, 2023
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.

With this improvement, view building was measured to be 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs #14089.
Fixes #14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14476
@DoronArazii DoronArazii added Requires-Backport-to-5.3 backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed labels Jul 13, 2023
@DoronArazii
Copy link

I saw only backport to 5.1 - added more labels until someone will check if it's necessary.

@bhalevy
Copy link
Member Author

bhalevy commented Jul 14, 2023

@raphaelsc ^^

raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 20, 2023
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()scylladb#1}::operator()() const::{lambda()scylladb#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.

With this improvement, view building was measured to be 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs scylladb#14089.
Fixes scylladb#14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 1d8cb32)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
denesb pushed a commit that referenced this issue Jul 20, 2023
View building from staging creates a reader from scratch (memtable
+ sstables - staging) for every partition, in order to calculate
the diff between new staging data and data in base sstable set,
and then pushes the result into the view replicas.

perf shows that the reader creation is very expensive:
+   12.15%    10.75%  reactor-3        scylla             [.] lexicographical_tri_compare<compound_type<(allow_prefixes)0>::iterator, compound_type<(allow_prefixes)0>::iterator, legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()(managed_bytes_basic_view<(mutable_view)0>, managed_bytes
+   10.01%     9.99%  reactor-3        scylla             [.] boost::icl::is_empty<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    8.95%     8.94%  reactor-3        scylla             [.] legacy_compound_view<compound_type<(allow_prefixes)0> >::tri_comparator::operator()
+    7.29%     7.28%  reactor-3        scylla             [.] dht::ring_position_tri_compare
+    6.28%     6.27%  reactor-3        scylla             [.] dht::tri_compare
+    4.11%     3.52%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    4.09%     4.07%  reactor-3        scylla             [.] sstables::index_consume_entry_context<sstables::index_consumer>::process_state
+    3.46%     0.93%  reactor-3        scylla             [.] sstables::sstable_run::will_introduce_overlapping
+    2.53%     2.53%  reactor-3        libstdc++.so.6     [.] std::_Rb_tree_increment
+    2.45%     2.45%  reactor-3        scylla             [.] boost::icl::non_empty::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.14%     2.13%  reactor-3        scylla             [.] boost::icl::exclusive_less<boost::icl::continuous_interval<compatible_ring_position_or_view, std::less> >
+    2.07%     2.07%  reactor-3        scylla             [.] logalloc::region_impl::free
+    2.06%     1.91%  reactor-3        scylla             [.] sstables::index_consumer::consume_entry(sstables::parsed_partition_index_entry&&)::{lambda()#1}::operator()() const::{lambda()#1}::operator()
+    2.04%     2.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst+    1.87%     0.00%  reactor-3        [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
+    1.86%     0.00%  reactor-3        [kernel.kallsyms]  [k] do_syscall_64
+    1.39%     1.38%  reactor-3        libc.so.6          [.] __memcmp_avx2_movbe
+    1.37%     0.92%  reactor-3        scylla             [.] boost::icl::segmental::join_left<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::
+    1.34%     1.33%  reactor-3        scylla             [.] logalloc::region_impl::alloc_small
+    1.33%     1.33%  reactor-3        scylla             [.] seastar::memory::small_pool::add_more_objects
+    1.30%     0.35%  reactor-3        scylla             [.] seastar::reactor::do_run
+    1.29%     1.29%  reactor-3        scylla             [.] seastar::memory::allocate
+    1.19%     0.05%  reactor-3        libc.so.6          [.] syscall
+    1.16%     1.04%  reactor-3        scylla             [.] boost::icl::interval_base_map<boost::icl::interval_map<compatible_ring_position_or_view, std::unordered_set<seastar::lw_shared_ptr<sstables::sstable>, std::hash<seastar::lw_shared_ptr<sstables::sstable> >, std::equal_to<seastar::lw_shared_ptr<sstables::sst
+    1.07%     0.79%  reactor-3        scylla             [.] sstables::partitioned_sstable_set::insert

That shows some significant amount of work for inserting sstables
into the interval map and maintaining the sstable run (which sorts
fragments by first key and checks for overlapping).

The interval map is known for having issues with L0 sstables, as
it will have to be replicated almost to every single interval
stored by the map, causing terrible space and time complexity.
With enough L0 sstables, it can fall into quadratic behavior.

This overhead is fixed by not building a new fresh sstable set
when recreating the reader, but rather supplying a predicate
to sstable set that will filter out staging sstables when
creating either a single-key or range scan reader.

This could have another benefit over today's approach which
may incorrectly consider a staging sstable as non-staging, if
the staging sst wasn't included in the current batch for view
building.

With this improvement, view building was measured to be 3x faster.

from
INFO  2023-06-16 12:36:40,014 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 963957ms = 50kB/s

to
INFO  2023-06-16 14:47:12,129 [shard 0] view_update_generator - Processed keyspace1.standard1: 5 sstables in 319899ms = 150kB/s

Refs #14089.
Fixes #14244.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 1d8cb32)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14764
@denesb
Copy link
Contributor

denesb commented Jul 20, 2023

I saw only backport to 5.1 - added more labels until someone will check if it's necessary.

I don't understand how we have only a 5.1. backport...

In any case I just queued the 5.2 backport.

@denesb denesb removed the backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed label Jul 20, 2023
@DoronArazii
Copy link

@denesb for removing the "backport candidate" label, I guess we will need it also in 5.3

@denesb
Copy link
Contributor

denesb commented Jul 24, 2023

@denesb for removing the "backport candidate" label, I guess we will need it also in 5.3

5.3 is cancelled, removing the label.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants