Schema of partition entries in cache and memtable is updated atomically, causing high latency and possible OOM on large partitions #2577

tgrabiec · 2017-07-12T08:19:11Z

After schema change, entries in memtables and cache are still using the old schema. They're upgraded lazily (but atomically) on access. Schema is currently tracked per-partition. This may badly impact latency if partitions are large.

avikivity · 2020-04-22T08:58:15Z

@tgrabiec can you sketch a solution here? since @xemul is doing some cache work, we may trick him into fixing it

slivne · 2020-05-10T16:11:05Z

@tgrabiec ping

avikivity · 2020-05-11T11:34:55Z

Adding keywords for issue search: upgrade upgrade_entry

avikivity · 2020-05-11T11:36:22Z

#6425 outlines a simpler fix that has narrower scope (still this issue shoud be fixed)

tgrabiec · 2020-05-12T10:38:06Z

Currently, each partition entry has a chain of incremental versions (mutations). New versions are created when we try to add mutation on top of a snapshot. Snapshots are pointers to versions. When a snapshot is released, we try to merge versions on the spot. If merging is preempted, it will move to background. See mutation_cleaner::merge_and_destroy(partition_snapshot&). Version merging happens in partition_snapshot::merge_partition_versions().

We could reuse this mechanism for upgrading schema. We'd maintain an invariant that each partition version conforms to a single schema version. We will store schema_ptr per partition version rather than per partition entry. Version merging would check schemas and change behavior when they don't match. If schemas match we use the old behavior of merging newer to older. When schemas are different, we assume that the newer version has a newer schema and merge older to newer, upgrading the schema on-the-fly.

Snapshot readers (partition_snapshot_row_cursor) would have to be adjusted to take into account that versions may have different schemas and upgrade to snapshot's schema on-the-fly.
Applying a mutation to the data store (memtable or cache) would work like this. When the incoming mutation has the same schema as the current version, we apply the same way as today. If schemas are different, we create a snapshot and put the incoming mutation into a new version. Then we drop the snapshot. This will trigger version merging. It will complete on-the-spot for small partitions or will defer to mutation_cleaner.

When a read sees that the latest version is not at the data source's current schema, it should force the upgrade by creating an empty version conforming to the data source's current schema and let the same version merging mechanism do the upgrade. This way readers will not keep upgrading on-the-fly indefinitely.

tgrabiec · 2020-05-12T10:55:23Z

The above solution means that we will pay the cost of moving row entries between row trees during schema upgrade. Maybe that's acceptable. If we wanted to avoid that, we could improve the solution to allow a single partition version to be divided into two regions corresponding to the old and the new schema. We'd have:

class partition_version {
   position_in_partition _schema_pos;
   schema_ptr _schema; // Applies for entries with position() < _schema_pos or when !_old_schema
   schema_ptr _old_schema; // Applies for entries with position() >= _schema_pos
};

partition_snapshot_row_cursor would have to be told to recognize this.

Then version merging would first upgrade the old version to the new version, and then merge the two versions like it does today.

kostja · 2020-05-16T20:01:05Z

Why not simply trigger a dump after a schema change?

avikivity · 2020-05-16T20:06:25Z

A dump of what?

kostja · 2020-05-16T20:15:59Z

Of the memtable with old schema_ptr.

mykaul · 2022-12-08T09:01:24Z

Progress update: I have a working1 patch at https://github.com/michoecho/scylla/commits/gentle_schema_upgrade, but it's not prepared for review yet. It also has multiple local improvement opportunities (e.g. naming, using a move instead of a copy, etc.), and it needs more tests, but it's generally shaped up.

I'll probably wait with finalizing it until mutation_partition_v2 is done, as @tgrabiec said, because some conflicts are bound to occur, e.g. in the version merging algorithm.

Footnotes

Doing the desired thing and passing existing tests; I don't claim it to be correct yet. leftwards_arrow_with_hook

Somewhat optimistically setting it to 5.3 (assuming mutation partition will get in to either 5.2 or 5.3).

Preceding commits in this patch series have extended the MVCC mechanism to allow for versions with different schemas in the same entry/snapshot, with on-the-fly and background schema upgrades to the most recent version in the chain. Given that, we can perform gentle schema upgrades by simply adding an empty version with the target schema to the front of the entry. This patch is intended to be the first and only behaviour-changing patch in the series. Previous patches added code paths for multi-schema snapshots, but never exercised them, because before this patch two different schemas within a single MVCC chain never happened. This patch makes it happen and thus exercises all the code in the series up until now. Fixes scylladb#2577

After a schema change, memtable and cache have to be upgraded to the new schema. Currently, they are upgraded (on the first access after a schema change) atomically, i.e. all rows of the entry are upgraded with one non-preemptible call. This is a one of the last vestiges of the times when partition were treated atomically, and it is a well known source of numerous large stalls. This series makes schema upgrades gentle (preemptible). This is done by co-opting the existing MVCC machinery. Before the series, all partition_versions in the partition_entry chain have the same schema, and an entry upgrade replaces the entire chain with a single squashed and upgraded version. After the series, each partition_version has its own schema. A partition entry upgrade happens simply by adding an empty version with the new schema to the head of the chain. Row entries are upgraded to the current schema on-the-fly by the cursor during reads, and by the MVCC version merge ongoing in the background after the upgrade. The series: 1. Does some code cleanup in the mutation_partition area. 2. Adds a schema field to partition_version and removes it from its containers (partition_snapshot, cache_entry, memtable_entry). 3. Adds upgrading variants of constructors and apply() for `row` and its wrappers. 4. Prepares partition_snapshot_row_cursor, mutation_partition_v2::apply_monotonically and partition_snapshot::merge_partition_versions for dealing with heterogeneous version chains. 5. Modifies partition_entry::upgrade to perform upgrades by extending the version chain with a new schema instead of squashing it to a single upgraded version. Fixes #2577 Closes #13761 * github.com:scylladb/scylladb: test: mvcc_test: add a test for gentle schema upgrades partition_version: make partition_entry::upgrade() gentle partition_version: handle multi-schema snapshots in merge_partition_versions mutation_partition_v2: handle schema upgrades in apply_monotonically() partition_version: remove the unused "from" argument in partition_entry::upgrade() row_cache_test: prepare test_eviction_after_schema_change for gentle schema upgrades partition_version: handle multi-schema entries in partition_entry::squashed partition_snapshot_row_cursor: handle multi-schema snapshots partiton_version: prepare partition_snapshot::squashed() for multi-schema snapshots partition_version: prepare partition_snapshot::static_row() for multi-schema snapshots partition_version: add a logalloc::region argument to partition_entry::upgrade() memtable: propagate the region to memtable_entry::upgrade_schema() mutation_partition: add an upgrading variant of lazy_row::apply() mutation_partition: add an upgrading variant of rows_entry::rows_entry mutation_partition: switch an apply() call to apply_monotonically() mutation_partition: add an upgrading variant of rows_entry::apply_monotonically() mutation_fragment: add an upgrading variant of clustering_row::apply() mutation_partition: add an upgrading variant of row::row partition_version: remove _schema from partition_entry::operator<< partition_version: remove the schema argument from partition_entry::read() memtable: remove _schema from memtable_entry row_cache: remove _schema from cache_entry partition_version: remove the _schema field from partition_snapshot partition_version: add a _schema field to partition_version mutation_partition: change schema_ptr to schema& in mutation_partition::difference mutation_partition: change schema_ptr to schema& in mutation_partition constructor mutation_partition_v2: change schema_ptr to schema& in mutation_partition_v2 constructor mutation_partition: add upgrading variants of row::apply() partition_version: update the comment to apply_to_incomplete() mutation_partition_v2: clean up variants of apply() mutation_partition: remove apply_weak() mutation_partition_v2: remove a misleading comment in apply_monotonically() row_cache_test: add schema changes to test_concurrent_reads_and_eviction mutation_partition: fix mixed-schema apply()

avikivity · 2023-05-25T22:20:22Z

Party!!!!!!!eleven

michoecho · 2023-07-17T14:10:39Z

I just wanted to point out that the solution presented by #13761 relies on the assumption that reads with on-the-fly upgrades aren't that much more expensive than regular reads.

Until the incremental upgrade is done, reads will carry the additional cost of on-the-fly upgrades. If it turns out that this cost is too high, the cluster can become overloaded after the schema change, and the cure will become worse than the disease.

AFAIK the implementation of upgrades isn't very efficient. I strongly think we should do some performance testing of schema changes under load to rule out the possibility that after the change upgrades are indeed low-latency, but have an unacceptable throughput cost in exchange.

We should do that before we branch the next release.

juliayakovlev · 2024-02-15T12:25:11Z

Reproduced with 2023.1.5

2024-02-14 07:48:43.459 <2024-02-14 07:48:14.446>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=dcd8f0b9-8709-4141-9fa6-4a097b295e94: type=REACTOR_STALLED regex=Reactor stalled line_number=53017 node=longevity-large-partitions-200k-pks-db-node-682943fe-0-4
2024-02-14T07:48:14.446+00:00 longevity-large-partitions-200k-pks-db-node-682943fe-0-4     !INFO | scylla[9927]: Reactor stalled for 878 ms on shard 4. Backtrace: 0x54674f3 0x5466950 0x5467cc0 0x3cb5f 0x7fd00c0ecf9a 0x1bdb768 0x1e65e48 0x1e32288 0x1e46a4a 0x1e46bc6 0x1e46bc6 0x1e46bc6 0x1e46bc6 0x1de6e82 0x1ca30ae 0x1d30ec5 0x1d28801 0x1d30a86 0x5477dd4 0x5479057 0x5499ec1 0x544a33a 0x8b19c 0x10cc5f
?? ??:0
?? ??:0

void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
?? ??:0
managed_bytes at ././utils/managed_bytes.hh:223
 (inlined by) atomic_cell_or_collection::copy(abstract_type const&) const at ./atomic_cell.cc:128
operator() at ./mutation_partition.cc:1584
 (inlined by) std::__exception_ptr::exception_ptr compact_radix_tree::tree<cell_and_hash, unsigned int>::copy_slots<compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const::{lambda(unsigned int)#1}, row::row(schema const&, column_kind, row const&)::$_14&, compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> > >(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, cell_and_hash const*, unsigned int, unsigned int, compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >&, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const::{lambda(unsigned int)#1}&&, row::row(schema const&, column_kind, row const&)::$_14&) at ././utils/compact-radix-tree.hh:1397
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:1284
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >(compact_radix_tree::variadic_union<compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> > const&, row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:820
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::node_base<cell_and_hash, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)1, 4u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)2, 8u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::indirect_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)3, 16u>, compact_radix_tree::tree<cell_and_hash, unsigned int>::direct_layout<cell_and_hash, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)6, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)0, 0u, (compact_radix_tree::tree<cell_and_hash, unsigned int>::layout)4, 32u> >::clone<compact_radix_tree::tree<cell_and_hash, unsigned int>::leaf_node, row::row(schema const&, column_kind, row const&)::$_14&>(row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:828
 (inlined by) std::pair<compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head*, std::__exception_ptr::exception_ptr> compact_radix_tree::tree<cell_and_hash, unsigned int>::node_head::clone<row::row(schema const&, column_kind, row const&)::$_14&>(row::row(schema const&, column_kind, row const&)::$_14&, unsigned int) const at ././utils/compact-radix-tree.hh:486
void compact_radix_tree::tree<cell_and_hash, unsigned int>::clone_from<row::row(schema const&, column_kind, row const&)::$_14&>(compact_radix_tree::tree<cell_and_hash, unsigned int> const&, row::row(schema const&, column_kind, row const&)::$_14&) at ././utils/compact-radix-tree.hh:1853
 (inlined by) row at ./mutation_partition.cc:1587
 (inlined by) deletable_row at ././mutation_partition.hh:822
rows_entry at ././mutation_partition.hh:946
 (inlined by) rows_entry* allocation_strategy::construct<rows_entry, schema const&, rows_entry const&>(schema const&, rows_entry const&) at ././utils/allocation_strategy.hh:155
 (inlined by) operator() at ./mutation_partition.cc:152
 (inlined by) intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2084
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>* intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&>(current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&, intrusive_b::node<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>*, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&) const at ././utils/intrusive_btree.hh:2095
void intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0>::clone_from<mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}>(intrusive_b::tree<rows_entry, &rows_entry::_link, rows_entry::tri_compare, 12ul, 20ul, (intrusive_b::key_search)0, (intrusive_b::with_debug)0> const&, mutation_partition::mutation_partition(schema const&, mutation_partition const&)::$_0&, current_deleter<rows_entry>()::{lambda(rows_entry*)#1}&&) at ././utils/intrusive_btree.hh:638
 (inlined by) mutation_partition at ./mutation_partition.cc:154
partition_entry::squashed(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<schema const>) at ./partition_version.cc:501
 (inlined by) partition_entry::upgrade(seastar::lw_shared_ptr<schema const>, seastar::lw_shared_ptr<schema const>, mutation_cleaner&, cache_tracker*) at ./partition_version.cc:517
operator() at ./row_cache.cc:1324
 (inlined by) decltype(auto) with_allocator<row_cache::upgrade_entry(cache_entry&)::$_28>(allocation_strategy&, row_cache::upgrade_entry(cache_entry&)::$_28&&) at ././utils/allocation_strategy.hh:313
 (inlined by) row_cache::upgrade_entry(cache_entry&) at ./row_cache.cc:1323
 (inlined by) scanning_and_populating_reader::read_from_entry(cache_entry&) at ./row_cache.cc:569
 (inlined by) operator() at ./row_cache.cc:597
decltype(auto) logalloc::allocating_section::with_reclaiming_disabled<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&) at ././utils/logalloc.hh:499
 (inlined by) operator() at ././utils/logalloc.hh:521
 (inlined by) decltype(auto) logalloc::allocating_section::with_reserve<logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&)::{lambda()#1}>(logalloc::region, logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&)::{lambda()#1}) at ././utils/logalloc.hh:470
 (inlined by) decltype(auto) logalloc::allocating_section::operator()<scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}>(logalloc::region&, scanning_and_populating_reader::do_read_from_primary()::{lambda()#1}&&) at ././utils/logalloc.hh:520
 (inlined by) scanning_and_populating_reader::do_read_from_primary() at ./row_cache.cc:580
 (inlined by) scanning_and_populating_reader::read_from_primary() at ./row_cache.cc:626
operator() at ./row_cache.cc:641
 (inlined by) seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > std::__invoke_impl<seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >, scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(std::__invoke_other, scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2>&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::__invoke_result<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >::type std::__invoke<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(std::__invoke_result&&, (scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:96
 (inlined by) std::invoke_result<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >::type std::invoke<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(std::invoke_result&&, (scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&)...) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/functional:110
 (inlined by) auto seastar::internal::future_invoke<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2> >(scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::optimized_optional<flat_mutation_reader_v2>&&) at ././seastar/include/seastar/core/future.hh:1223
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1594
 (inlined by) void seastar::futurize<seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > >::satisfy_with_result_of<seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >::then_impl_nrvo<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > >(scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, {lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >&&)#1}::operator()(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >) const::{lambda()#1}>(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >::then_impl_nrvo<scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> > >(scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, {lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >&&)#1}::operator()(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >) const::{lambda()#1}) at ././seastar/include/seastar/core/future.hh:2132
 (inlined by) operator() at ././seastar/include/seastar/core/future.hh:1587
 (inlined by) seastar::continuation<seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >, scanning_and_populating_reader::read_from_secondary()::{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future<seastar::optimized_optional<flat_mutation_reader_v2> >::then_impl_nrvo<{lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}, seastar::future>({lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&&)::{lambda(seastar::internal::promise_base_with_type<seastar::optimized_optional<flat_mutation_reader_v2> >&&, {lambda(seastar::optimized_optional<flat_mutation_reader_v2>&&)#1}&, seastar::future_state<seastar::optimized_optional<flat_mutation_reader_v2> >&&)#1}, seastar::optimized_optional<flat_mutation_reader_v2> >::run_and_dispose() at ././seastar/include/seastar/core/future.hh:781
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4331
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0
?? ??:0

Packages

Scylla version: 2023.1.5-20240213.08fd6aec7a43 with build-id 448979e99e198eeab4a3b0e1b929397d337d2724

Kernel Version: 5.15.0-1051-gcp

Issue description

This issue is a regression.
It is unknown if this issue is a regression.

Describe your issue in detail and steps it took to produce it.

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 5 nodes (n2-highmem-16)

Scylla Nodes used in this run:

longevity-large-partitions-200k-pks-db-node-682943fe-0-5 (35.196.25.245 | 10.142.1.50) (shards: 14)
longevity-large-partitions-200k-pks-db-node-682943fe-0-4 (34.148.42.240 | 10.142.1.49) (shards: 14)
longevity-large-partitions-200k-pks-db-node-682943fe-0-3 (34.23.151.147 | 10.142.1.47) (shards: 14)
longevity-large-partitions-200k-pks-db-node-682943fe-0-2 (35.196.193.64 | 10.142.1.45) (shards: 14)
longevity-large-partitions-200k-pks-db-node-682943fe-0-1 (34.73.78.45 | 10.142.1.28) (shards: 14)

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/1433372650157216341 (gce: undefined_region)

Test: longevity-large-partition-200k-pks-4days-gce-test
Test id: 682943fe-290b-40b7-b835-3003ed6c3c85
Test name: enterprise-2023.1/longevity/longevity-large-partition-200k-pks-4days-gce-test
Test config file(s):

longevity-large-partition-200k_pks-4days.yaml

Logs and commands

Restore Monitor Stack command: $ hydra investigate show-monitor 682943fe-290b-40b7-b835-3003ed6c3c85
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 682943fe-290b-40b7-b835-3003ed6c3c85

Logs:

db-cluster-682943fe.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/db-cluster-682943fe.tar.gz
sct-runner-events-682943fe.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/sct-runner-events-682943fe.tar.gz
sct-682943fe.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/sct-682943fe.log.tar.gz
loader-set-682943fe.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/loader-set-682943fe.tar.gz
monitor-set-682943fe.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/682943fe-290b-40b7-b835-3003ed6c3c85/20240214_120511/monitor-set-682943fe.tar.gz

Jenkins job URL
Argus

juliayakovlev · 2024-03-11T09:19:15Z

Reproduced with 2023.1.6

Packages

Scylla version: 2023.1.6-20240306.ee8c8089d9c4 with build-id ba16490fa8be728988abec09fdc65e8f55710317

Kernel Version: 5.15.0-1053-gcp

Issue description

This issue is a regression.
It is unknown if this issue is a regression.

Describe your issue in detail and steps it took to produce it.

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 5 nodes (n2-highmem-16)

Scylla Nodes used in this run:

longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-8 (35.227.106.216 | 10.142.0.37) (shards: 14)
longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-7 (35.229.105.213 | 10.142.0.36) (shards: 14)
longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-6 (35.227.116.225 | 10.142.0.34) (shards: 14)
longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-5 (35.190.159.204 | 10.142.0.127) (shards: 14)
longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-4 (35.231.229.220 | 10.142.0.124) (shards: 14)
longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-3 (35.196.178.196 | 10.142.0.121) (shards: 14)
longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-2 (34.74.144.6 | 10.142.0.7) (shards: 14)
longevity-large-partitions-200k-pks-db-node-9e6b76ad-0-1 (34.75.42.38 | 10.142.0.118) (shards: 14)

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/1393419829280999616 (gce: undefined_region)

Test: longevity-large-partition-200k-pks-4days-gce-test
Test id: 9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2
Test name: enterprise-2023.1/longevity/longevity-large-partition-200k-pks-4days-gce-test
Test config file(s):

longevity-large-partition-200k_pks-4days.yaml

Logs and commands

Restore Monitor Stack command: $ hydra investigate show-monitor 9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2

Logs:

db-cluster-9e6b76ad.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/db-cluster-9e6b76ad.tar.gz
sct-runner-events-9e6b76ad.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/sct-runner-events-9e6b76ad.tar.gz
sct-9e6b76ad.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/sct-9e6b76ad.log.tar.gz
loader-set-9e6b76ad.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/loader-set-9e6b76ad.tar.gz
monitor-set-9e6b76ad.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/9e6b76ad-3b33-4a0a-b9c8-f40ea4bd3bc2/20240308_165203/monitor-set-9e6b76ad.tar.gz

Jenkins job URL
Argus

soyacz · 2024-05-05T12:54:59Z

reproduced in 2023.1.8

Packages

Scylla version: 2023.1.8-20240502.c7683a2891c6 with build-id d7cbb560ad3a581b6eccbe170de0ca61fb618a19

Kernel Version: 5.15.0-1058-gcp

Issue description

This issue is a regression.
It is unknown if this issue is a regression.

Describe your issue in detail and steps it took to produce it.

Impact

Describe the impact this issue causes to the user.

How frequently does it reproduce?

Describe the frequency with how this issue can be reproduced.

Installation details

Cluster size: 5 nodes (n2-highmem-16)

Scylla Nodes used in this run:

longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-8 (34.73.184.189 | 10.142.0.2) (shards: 14)
longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-7 (34.139.249.79 | 10.142.0.8) (shards: 14)
longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-6 (35.231.109.73 | 10.142.0.4) (shards: 14)
longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-5 (34.148.7.198 | 10.142.1.30) (shards: 14)
longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-4 (35.237.174.135 | 10.142.1.29) (shards: 14)
longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-3 (35.237.104.54 | 10.142.0.124) (shards: 14)
longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-2 (34.73.238.124 | 10.142.0.99) (shards: 14)
longevity-large-partitions-200k-pks-db-node-1e8e61cb-0-1 (34.148.168.231 | 10.142.0.97) (shards: 14)

OS / Image: https://www.googleapis.com/compute/v1/projects/scylla-images/global/images/254008101309101179 (gce: undefined_region)

Test: longevity-large-partition-200k-pks-4days-gce-test
Test id: 1e8e61cb-f19b-4ead-a0f8-0651cfa25bea
Test name: enterprise-2023.1/longevity/longevity-large-partition-200k-pks-4days-gce-test
Test config file(s):

longevity-large-partition-200k_pks-4days.yaml

Logs and commands

Restore Monitor Stack command: $ hydra investigate show-monitor 1e8e61cb-f19b-4ead-a0f8-0651cfa25bea
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 1e8e61cb-f19b-4ead-a0f8-0651cfa25bea

Logs:

db-cluster-1e8e61cb.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/db-cluster-1e8e61cb.tar.gz
sct-runner-events-1e8e61cb.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/sct-runner-events-1e8e61cb.tar.gz
sct-1e8e61cb.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/sct-1e8e61cb.log.tar.gz
loader-set-1e8e61cb.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/loader-set-1e8e61cb.tar.gz
monitor-set-1e8e61cb.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/1e8e61cb-f19b-4ead-a0f8-0651cfa25bea/20240503_164857/monitor-set-1e8e61cb.tar.gz

Jenkins job URL
Argus

tgrabiec added scale/large partitions symptom/performance Issues causing performance problems labels Jul 12, 2017

slivne modified the milestone: 2.x Jul 24, 2017

frank8989 mentioned this issue Oct 26, 2017

Scylla generated coredump when fail to allocate_segment #2924

Closed

slivne modified the milestones: 2.x, 2.3 Feb 1, 2018

slivne modified the milestones: 2.3, 2.4 Jul 12, 2018

tzach modified the milestones: 3.0, 3.x Sep 13, 2018

avikivity mentioned this issue Dec 2, 2018

Reactor stalled for 1622 ms during column definition #3951

Closed

slivne mentioned this issue Jul 7, 2019

Version 3.1: Reactor stalls during modifying table (large partition test) #4626

Closed

slivne mentioned this issue Jul 18, 2019

Segmentation fault: oversized allocation: 694565187584 bytes #4736

Open

bhalevy mentioned this issue Oct 3, 2019

Large Partition 4 Days - Reactor stalls during modify_table #5137

Closed

1 task

slivne added the Eng-2 label Oct 27, 2019

slivne mentioned this issue Dec 24, 2019

ModifyTablePropertiesCompression caused many stalls with more than 2 seconds on longevity-large-partition-4days-test #5513

Closed

slivne assigned bhalevy May 10, 2020

slivne modified the milestones: 4.x, 4.3 May 10, 2020

avikivity mentioned this issue May 11, 2020

CDC: reactor stalled up to 17000 ms after change cdc log table property #6098

Closed

avikivity mentioned this issue Dec 24, 2022

Significant fall down of operations per seconds during schema altering #7632

Closed

mykaul added the symptom/latency label Mar 2, 2023

michoecho mentioned this issue Mar 14, 2023

mvcc: make schema upgrades gentle, old version of the PR #13104

Closed

avikivity mentioned this issue Mar 23, 2023

1171 reactor stalls messages with >800ms in ics-longevity-toggle-strategy-large-partitions-24h-test #13296

Closed

bhalevy mentioned this issue Apr 16, 2023

logalloc::region_impl::object_descriptor::encode stalls (More than 100ms reactor stalls from memory allocation) - kernel issue? #8828

Closed

mykaul added Eng-1 and removed Eng-2 labels Apr 18, 2023

michoecho mentioned this issue May 4, 2023

mvcc: make schema upgrades gentle #13761

Merged

This was referenced May 17, 2023

Reactor stalled for 889 ms on shard 0 #13918

Closed

Coredump occurred after MV creation and during a member-node stop process #14006

Closed

scylladb-promoter closed this as completed in 80c8a6d May 25, 2023

scylladb-promoter added the Backport candidate label May 25, 2023

mykaul removed the Backport candidate label May 26, 2023

mykaul modified the milestones: 5.3, 5.4 Oct 19, 2023

avikivity mentioned this issue May 2, 2024

Unstall merge topology snapshot #18290

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Schema of partition entries in cache and memtable is updated atomically, causing high latency and possible OOM on large partitions #2577

Schema of partition entries in cache and memtable is updated atomically, causing high latency and possible OOM on large partitions #2577

tgrabiec commented Jul 12, 2017

avikivity commented Apr 22, 2020

slivne commented May 10, 2020

avikivity commented May 11, 2020

avikivity commented May 11, 2020

tgrabiec commented May 12, 2020

tgrabiec commented May 12, 2020

kostja commented May 16, 2020

avikivity commented May 16, 2020

kostja commented May 16, 2020

mykaul commented Dec 8, 2022

Footnotes

avikivity commented May 25, 2023

michoecho commented Jul 17, 2023

juliayakovlev commented Feb 15, 2024 •

edited

Logs:

juliayakovlev commented Mar 11, 2024

Logs:

soyacz commented May 5, 2024

Logs:

Schema of partition entries in cache and memtable is updated atomically, causing high latency and possible OOM on large partitions #2577

Schema of partition entries in cache and memtable is updated atomically, causing high latency and possible OOM on large partitions #2577

Comments

tgrabiec commented Jul 12, 2017

avikivity commented Apr 22, 2020

slivne commented May 10, 2020

avikivity commented May 11, 2020

avikivity commented May 11, 2020

tgrabiec commented May 12, 2020

tgrabiec commented May 12, 2020

kostja commented May 16, 2020

avikivity commented May 16, 2020

kostja commented May 16, 2020

mykaul commented Dec 8, 2022

Footnotes

avikivity commented May 25, 2023

michoecho commented Jul 17, 2023

juliayakovlev commented Feb 15, 2024 • edited

Packages

Issue description

Impact

How frequently does it reproduce?

Installation details

Logs:

juliayakovlev commented Mar 11, 2024

Packages

Issue description

Impact

How frequently does it reproduce?

Installation details

Logs:

soyacz commented May 5, 2024

Packages

Issue description

Impact

How frequently does it reproduce?

Installation details

Logs:

juliayakovlev commented Feb 15, 2024 •

edited