Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

assertion failure: ./utils/intrusive_btree.hh:695: intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12, 20, intrusive_b::key_search::linear, intrusive_b::with_debug::no>::iterator_base<false>::iterator_base(intrusive_b::tree::iterator_base::key_hook_ptr, intrusive_b::key_index) [Key = rows_entry, Hook = &rows_entry::_link, Compare = rows_entry::tri_compare, NodeSize = 12, LinearThreshold = 20, Search = intrusive_b::key_search::linear, Debug = intrusive_b::with_debug::no, Const = false]: Assertion `h->attached()' failed. #9728

Closed
jmaixl opened this issue Dec 3, 2021 · 14 comments
Assignees
Milestone

Comments

@jmaixl
Copy link

jmaixl commented Dec 3, 2021

Installation details
Scylla version (or git commit hash): 4.5.1-0.20211024.4c0eac049
Cluster size: 3
OS (RHEL/CentOS/Ubuntu/AWS AMI): Centos 7.9.2009

Hardware details (for performance issues)
Platform (physical/VM/cloud instance type/docker): AWS c5ad.4xlarge

Report UUID containing troubleshooting files: fdfa9238-7a6f-4332-8b79-ca7d4f987a13

I ran a write test against a 3 node cluster, inserting 30m rows at 25k rows/second.

The table has 1 partition key column and 3 clustering columns, with each partition containing 10k rows. The table uses LeveledCompactionStrategy.

What happens is that during some runs of this test, 1 or 2 of the nodes will suddenly go down and restart. On the down/restarted node, I see this in the log:

scylla: ./utils/intrusive_btree.hh:695: intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12, 20, intrusive_b::key_search::linear, intrusive_b::with_debug::no>::iterator_base<false>::iterator_base(intrusive_b::tree::iterator_base::key_hook_ptr, intrusive_b::key_index) [Key = rows_entry, Hook = &rows_entry::_link, Compare = rows_entry::tri_compare, NodeSize = 12, LinearThreshold = 20, Search = intrusive_b::key_search::linear, Debug = intrusive_b::with_debug::no, Const = false]: Assertion `h->attached()' failed.
scylla[23605]: Aborting on shard 12.
scylla[23605]: Backtrace:
scylla[23605]: 0x3f75e08
scylla[23605]: 0x3fa7f52
scylla[23605]: 0x7f0e4dcf81df
scylla[23605]: /opt/scylladb/libreloc/libc.so.6+0x3d9d4
scylla[23605]: /opt/scylladb/libreloc/libc.so.6+0x268a3
scylla[23605]: /opt/scylladb/libreloc/libc.so.6+0x26788
scylla[23605]: /opt/scylladb/libreloc/libc.so.6+0x36025
scylla[23605]: 0x1228b10
scylla[23605]: 0x125fd9b
scylla[23605]: 0x12d5aa3
scylla[23605]: 0x12c9954
scylla[23605]: 0x130d136
scylla[23605]: 0x3f88c4f
scylla[23605]: 0x3f89e37
scylla[23605]: 0x3fa8488
scylla[23605]: 0x3f5438a
scylla[23605]: /opt/scylladb/libreloc/libpthread.so.0+0x93f8
scylla[23605]: /opt/scylladb/libreloc/libc.so.6+0x101902
systemd[1]: scylla-server.service: main process exited, code=killed, status=6/ABRT
systemd[1]: Unit scylla-server.service entered failed state.
systemd[1]: scylla-server.service failed.
systemd[1]: scylla-server.service holdoff time over, scheduling restart.
systemd[1]: Stopped Scylla Server.
systemd[1]: Starting Scylla Server...

Any idea of what this means?

@avikivity avikivity changed the title Scylla assertion/performance problem causing nodes to be restarted assertion failure: ./utils/intrusive_btree.hh:695: intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12, 20, intrusive_b::key_search::linear, intrusive_b::with_debug::no>::iterator_base<false>::iterator_base(intrusive_b::tree::iterator_base::key_hook_ptr, intrusive_b::key_index) [Key = rows_entry, Hook = &rows_entry::_link, Compare = rows_entry::tri_compare, NodeSize = 12, LinearThreshold = 20, Search = intrusive_b::key_search::linear, Debug = intrusive_b::with_debug::no, Const = false]: Assertion `h->attached()' failed. Dec 5, 2021
@xemul
Copy link
Contributor

xemul commented Dec 6, 2021

Decoded (hopefully correctly)

void seastar::backtrace<seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace() at ./build/release/seastar/./seastar/src/core/reactor.cc:758
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:788

23605[Backtrace #1]
seastar::print_with_backtrace(char const*, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:800
 (inlined by) seastar::sigabrt_action() at ./build/release/seastar/./seastar/src/core/reactor.cc:3597
 (inlined by) operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:3579
 (inlined by) __invoke at ./build/release/seastar/./seastar/src/core/reactor.cc:3575

23605[Backtrace #2]
?? ??:0

6[Backtrace #3]
?? ??:0

6[Backtrace #4]
?? ??:0

6[Backtrace #5]
?? ??:0

6[Backtrace #6]
?? ??:0

23605[Backtrace #7]
addr2line: DWARF error: could not find variable specification at offset 7811c
iterator_base at ././utils/intrusive_btree.hh:695
 (inlined by) iterator at ././utils/intrusive_btree.hh:878
 (inlined by) rows_entry::on_evicted(cache_tracker&) at ./row_cache.cc:1200

23605[Backtrace #8]
operator() at ./row_cache.cc:80
 (inlined by) decltype(auto) with_allocator<cache_tracker::cache_tracker(mutation_application_stats&)::$_0::operator()() const::{lambda()#1}>(allocation_strategy&, cache_tracker::cache_tracker(mutation_application_stats&)::$_0::operator()() const::{lambda()#1}&&) at ././utils/allocation_strategy.hh:311
 (inlined by) operator() at ./row_cache.cc:65
 (inlined by) seastar::memory::reclaiming_result std::__invoke_impl<seastar::memory::reclaiming_result, cache_tracker::cache_tracker(mutation_application_stats&)::$_0&>(std::__invoke_other, cache_tracker::cache_tracker(mutation_application_stats&)::$_0&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60
 (inlined by) std::enable_if<is_invocable_r_v<seastar::memory::reclaiming_result, cache_tracker::cache_tracker(mutation_application_stats&)::$_0&>, seastar::memory::reclaiming_result>::type std::__invoke_r<seastar::memory::reclaiming_result, cache_tracker::cache_tracker(mutation_application_stats&)::$_0&>(cache_tracker::cache_tracker(mutation_application_stats&)::$_0&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:113
 (inlined by) std::_Function_handler<seastar::memory::reclaiming_result (), cache_tracker::cache_tracker(mutation_application_stats&)::$_0>::_M_invoke(std::_Any_data const&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:291

23605[Backtrace #9]
addr2line: DWARF error: could not find variable specification at offset 6a5d9
addr2line: DWARF error: could not find variable specification at offset 6a7b2
std::function<seastar::memory::reclaiming_result ()>::operator()() const at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:622
 (inlined by) logalloc::region_impl::evict_some() at ./utils/logalloc.cc:1705
 (inlined by) logalloc::reclaim_from_evictable(logalloc::region_impl&, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:1926
 (inlined by) logalloc::tracker::impl::compact_and_evict_locked(unsigned long, unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2121

23605[Backtrace #10]
logalloc::tracker::impl::reclaim(unsigned long, seastar::bool_class<is_preemptible_tag>) at ./utils/logalloc.cc:2035

23605[Backtrace #11]
seastar::noncopyable_function<void (unsigned long)>::operator()(unsigned long) const at ././seastar/include/seastar/util/noncopyable_function.hh:209
 (inlined by) logalloc::background_reclaimer::main_loop() at ./utils/logalloc.cc:406

23605[Backtrace #12]
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2263
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2672

23605[Backtrace #13]
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2831

23605[Backtrace #14]
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4025
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&>(std::__invoke_other, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:60
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&>, void>::type std::__invoke_r<void, seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&>(seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/invoke.h:110
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(boost::program_options::variables_map, seastar::reactor_config)::$_97>::_M_invoke(std::_Any_data const&) at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:291

23605[Backtrace #15]
std::function<void ()>::operator()() const at /usr/lib/gcc/x86_64-redhat-linux/10/../../../../include/c++/10/bits/std_function.h:622
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:60

0[Backtrace #16]
?? ??:0

6[Backtrace #17]
?? ??:0


@xemul
Copy link
Contributor

xemul commented Dec 6, 2021

IOW -- there's a rows_entry on the LRU, but not in the cache_entry's tree

@jmaixl
Copy link
Author

jmaixl commented Dec 6, 2021

Is this some race condition? It seems odd I ran into this during a pretty simple write test, and ran into it not 100% of the time.

@xemul
Copy link
Contributor

xemul commented Dec 8, 2021

That's really strange. It looks more like mishandling of exception or misordered insertion/removal of a entry wrt yields.

@avikivity
Copy link
Member

@jmaixl can you share the schema? maybe it will give us a clue.

xemul added a commit to xemul/scylla that referenced this issue Dec 8, 2021
The B-tree's insert_before() is throwing operation, its caller
must account for that. When the rows_entry's collection was
switched on B-tree all the risky places were fixed by ee9e104,
but few places went under the radar.

In the cache_flat_mutation_reader there's a place where a C-pointer
is inserted into the tree, thus potentially leaking the entry.

In the partition_snapshot_row_cursor there are two places that not
only leak the entry, but also leave it in the LRU list. The latter
it quite nasty, because those entry can be evicted, eviction code
tries to get rows_entry iterator from "this", but the hook happens
to be unattached (because insertion threw) and fails the assert.

fixes: scylladb#9728

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
tgrabiec added a commit that referenced this issue Dec 8, 2021
There are several places that (still) use throwing b-tree .insert_before()
method and don't manage the inserted object lifetime. Some of those places
also leave the leaked rows_entry on the LRU delaying the assertion failure
by the time those entries get evicted (#9728)

To prevent such surprises in the future, the set removes the non-safe
inserters from the B-tree code. Actually most of this set is that removal
plus preparations for reviewability.

* xemul/br-rows-insertion-exception-safety:
  btree: Earnestly discourage from insertion of plain references
  row-cache: Handle exception (un)safety of rows_entry insertion
  partition_snapshot_row_cursor: Shuffle ensure_result creation
  mutation_partition: Use B-tree insertion sugar
  tests: Make B-tree tests use unique-ptrs for insertion
@jmaixl
Copy link
Author

jmaixl commented Dec 8, 2021

@jmaixl can you share the schema? maybe it will give us a clue.

Here is the schema:

    - create-keyspace: |
        CREATE KEYSPACE IF NOT EXISTS keyspace
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'};

    - create-table: |
        CREATE TABLE IF NOT EXISTS keyspace.table (
          a bigint,
          b bigint,
          c timestamp,
          d bigint,
          e boolean,
          f timestamp,
          g boolean,
          h text,
          i int,
          j boolean,
          k timestamp,
          l bigint,
          m bigint,
          n boolean,
          o int,
          p timestamp,
          q int,
          r int,
          s bigint,
          t int,
          u int,
          v bigint,
          PRIMARY KEY (a, b, c, d)
        ) WITH CLUSTERING ORDER BY (b ASC, c DESC, d DESC)
          AND compaction = {'class': 'LeveledCompactionStrategy'};

About 8 non-primary key columns are left unset if that makes a difference.

tgrabiec added a commit that referenced this issue Dec 10, 2021
There are several places that (still) use throwing b-tree .insert_before()
method and don't manage the inserted object lifetime. Some of those places
also leave the leaked rows_entry on the LRU delaying the assertion failure
by the time those entries get evicted (#9728)

To prevent such surprises in the future, the set removes the non-safe
inserters from the B-tree code. Actually most of this set is that removal
plus preparations for reviewability.

* xemul/br-rows-insertion-exception-safety-2:
  btree: Earnestly discourage from insertion of plain references
  row-cache: Handle exception (un)safety of rows_entry insertion
  partition_snapshot_row_cursor: Shuffle ensure_result creation
  mutation_partition: Use B-tree insertion sugar
  tests: Make B-tree tests use unique-ptrs for insertion
@avikivity
Copy link
Member

@xemul / @tgrabiec I'd like to backport this and need help in evaluation.

How risky is the fix compared to the original problem? I'd guess the backport is safe but did not review it in depth.

Should I backport the whole thing, or just selected commits?

How far back does it go?

@xemul
Copy link
Contributor

xemul commented Dec 13, 2021

@avikivity ,

The fix should be safe, it replaces row pointer with unique_ptr<> in some places and that's it.

You only need ee10363, but for it to apply smoothly you may want to have the 9fd8db3 too.

Only 4.5 needs it.

@jmaixl
Copy link
Author

jmaixl commented Dec 13, 2021

Just wondering, how soon should I expect this fix to be backported to 4.5 and released? (or perhaps in general how often do these bugfix releases come out or is there a timeline available somewhere?)

@avikivity
Copy link
Member

About a week.

avikivity pushed a commit that referenced this issue Dec 14, 2021
Both places get the C-pointer on the freshly allocated rows_entry,
insert it where needed and return back the dereferenced pointer.

The C-pointer is going to become smart-pointer that would go out
of scope before return. This change prepares for that by constructing
the ensure_result from the iterator, that's returned from insertion
of the entry.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 9fd8db3)

Ref #9728
avikivity pushed a commit that referenced this issue Dec 14, 2021
The B-tree's insert_before() is throwing operation, its caller
must account for that. When the rows_entry's collection was
switched on B-tree all the risky places were fixed by ee9e104,
but few places went under the radar.

In the cache_flat_mutation_reader there's a place where a C-pointer
is inserted into the tree, thus potentially leaking the entry.

In the partition_snapshot_row_cursor there are two places that not
only leak the entry, but also leave it in the LRU list. The latter
it quite nasty, because those entry can be evicted, eviction code
tries to get rows_entry iterator from "this", but the hook happens
to be unattached (because insertion threw) and fails the assert.

fixes: #9728

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit ee10363)
avikivity pushed a commit that referenced this issue Dec 14, 2021
Both places get the C-pointer on the freshly allocated rows_entry,
insert it where needed and return back the dereferenced pointer.

The C-pointer is going to become smart-pointer that would go out
of scope before return. This change prepares for that by constructing
the ensure_result from the iterator, that's returned from insertion
of the entry.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 9fd8db3)

Ref #9728
avikivity pushed a commit that referenced this issue Dec 14, 2021
The B-tree's insert_before() is throwing operation, its caller
must account for that. When the rows_entry's collection was
switched on B-tree all the risky places were fixed by ee9e104,
but few places went under the radar.

In the cache_flat_mutation_reader there's a place where a C-pointer
is inserted into the tree, thus potentially leaking the entry.

In the partition_snapshot_row_cursor there are two places that not
only leak the entry, but also leave it in the LRU list. The latter
it quite nasty, because those entry can be evicted, eviction code
tries to get rows_entry iterator from "this", but the hook happens
to be unattached (because insertion threw) and fails the assert.

fixes: #9728

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit ee10363)
avikivity pushed a commit that referenced this issue Dec 14, 2021
Both places get the C-pointer on the freshly allocated rows_entry,
insert it where needed and return back the dereferenced pointer.

The C-pointer is going to become smart-pointer that would go out
of scope before return. This change prepares for that by constructing
the ensure_result from the iterator, that's returned from insertion
of the entry.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit 9fd8db3)

Ref #9728
avikivity pushed a commit that referenced this issue Dec 14, 2021
The B-tree's insert_before() is throwing operation, its caller
must account for that. When the rows_entry's collection was
switched on B-tree all the risky places were fixed by ee9e104,
but few places went under the radar.

In the cache_flat_mutation_reader there's a place where a C-pointer
is inserted into the tree, thus potentially leaking the entry.

In the partition_snapshot_row_cursor there are two places that not
only leak the entry, but also leave it in the LRU list. The latter
it quite nasty, because those entry can be evicted, eviction code
tries to get rows_entry iterator from "this", but the hook happens
to be unattached (because insertion threw) and fails the assert.

fixes: #9728

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit ee10363)
@avikivity
Copy link
Member

Backported to 4.4, 4.5, 4.6.

@avikivity
Copy link
Member

I un-backported from 4.4 (fails to build). @xemul please do the backport, you can start from 4.5 where I resolved some conflicts.

@slivne slivne added this to the 4.7 milestone Dec 14, 2021
@xemul
Copy link
Contributor

xemul commented Dec 14, 2021

@avikivity , 4.4 doesn't need it.

@avikivity
Copy link
Member

Even better!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants