You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Node3 was decommissioned with Decommission_seed_node nemesis and new node was added node11. But during cleanup
on node11 issue: scylladb/scylladb#14004 happened with network exception. In several nemesis, node11 was choosen for new nemesis disrupt_restart_with_resharding. First steps of nemesis are stop scylla update scylla.yaml and start scylla. And according to sct.log, command was executed succesfully on node11.
But this command is not logged and scylla was not restarted :
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 3:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:strm] migration_manager - Gossiping my schema version 82c3b60c-101e-11ef-df50-1b54cd8053fe
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:strm] schema_tables - Schema version changed to 82c3b60c-101e-11ef-df50-1b54cd8053fe
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:strm] schema_tables - Altering scylla_bench.test id=0ef48140-0fef-11ef-a1d9-169281dba5bc version=83986280-101e-11ef-bbd9-8728581320a2
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 5:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 1:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 6:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 3:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 2:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 4:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:strm] migration_manager - Gossiping my schema version 83987c20-101e-11ef-930b-cceb725bebf8
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:strm] schema_tables - Schema version changed to 83987c20-101e-11ef-930b-cceb725bebf8
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:comp] compaction - [Compact keyspace1.standard1 7d2d5180-101e-11ef-b3fd-aed309b4fe72] Compacted 2 sstables to [/var/lib/scylla/data/keyspace1/standard1-fe44ce900fee11efa9cc2e9ed5cf889a/me-3gg2_0ejw_47j8w2qjd05txmp7xu-big-Data.db:level=0]. 1GB to 1GB (~86% of original) in 10905ms = 135MB/s. ~1052672 total partitions merged to 905165.
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:comp] compaction - [Compact keyspace1.standard1 83ae8290-101e-11ef-b3fd-aed309b4fe72] Compacting [/var/lib/scylla/data/keyspace1/standard1-fe44ce900fee11efa9cc2e9ed5cf889a/me-3gg2_0ejw_47j8w2qjd05txmp7xu-big-Data.db:level=0:origin=compaction,/var/lib/scylla/data/keyspace1/standard1-fe44ce900fee11efa9cc2e9ed5cf889a/me-3gg2_0eeq_4jbmo2qjd05txmp7xu-big-Data.db:level=0:origin=cleanup]
2024-05-12T05:22:23.581+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 6:strm] compaction_manager - Starting off-strategy compaction for keyspace1.standard1 compaction_group=0, 469 candidates were found
2024-05-12T05:23:23.582+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 2:strm] compaction_manager - Starting off-strategy compaction for keyspace1.standard1 compaction_group=0, 486 candidates were found
2024-05-12T05:24:23.585+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 5:strm] compaction - [Reshape keyspace1.standard1 8a7ce940-101e-11ef-a596-aed409b4fe72] Reshaped 484 sstables to [/var/lib/scylla/data/keyspace1/standard1-fe44ce900fee11efa9cc2e9ed5cf889a/me-3gg2_0ekj_07xhs2inife2kgzt6a-big-Data.db:level=0]. 5MB to 2MB (~48% of original) in 153ms = 34MB/s. ~61952 total partitions merged to 1720.
2024-05-12T05:24:23.585+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 5:strm] compaction_manager - Done with off-strategy compaction for keyspace1.standard1 compaction_group=0
2024-05-12T05:25:23.587+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 0:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:25:23.587+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 5:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:25:23.587+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 6:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:25:23.587+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 2:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
2024-05-12T05:25:23.587+00:00 longevity-parallel-topology-schema--db-node-2a3bb187-11 !INFO | scylla[6772]: [shard 3:strm] query_processor - Column definitions for scylla_bench.test changed, invalidating related prepared statements
The possible reason is node11 was lost connection due to issue: scylladb/scylladb#14004 during previous disruptive nemesis and after that ssh connection was not reconnected properly.
Run disrupt_restart_with_resharding on new node after step1
Expected behavior: [What you expected to happen]
disrupt_restart_with_resharding should pass
Actual behavior: [What actually happened]
Failed with error: Exception: Resharding has not been started (murmur3_partitioner_ignore_msb_bits=15) Check the log for the details
because command for scylla restart was not exectued on node.
Impact
Nemesis disrupt_restart_with_resharding failed because scylla was not restart
How frequently does it reproduce?
rarely
Installation details
SCT Version: master:latest
Scylla version (or git commit hash): Scylla version 5.5.0~dev-0.20240510.28791aa2c1d3
The issue is due to the nemesis starting scylla and then immediately checking logs - there are currently two problems with that:
start_scylla doesn't expose verify_up_timeout parameter of start_scylla_server, which leads to lower timeout than expected
the log follower tries to check for log lines without any retries and timeouts, so if resharding starts slightly later (in this case it started 5 minutes after the nemesis failed, fyi the nemesis ran for a total of 2 minutes - which seems kinda low)
Issue description
Node3 was decommissioned with
Decommission_seed_node
nemesis and new node was added node11. But during cleanupon node11 issue: scylladb/scylladb#14004 happened with network exception. In several nemesis, node11 was choosen for new nemesis
disrupt_restart_with_resharding
. First steps of nemesis are stop scylla update scylla.yaml and start scylla. And according to sct.log, command was executed succesfully on node11.Sct reported that command was executed on node11:
But this command is not logged and scylla was not restarted :
The possible reason is node11 was lost connection due to issue: scylladb/scylladb#14004 during previous disruptive nemesis and after that ssh connection was not reconnected properly.
Steps to Reproduce
Expected behavior: [What you expected to happen]
disrupt_restart_with_resharding should pass
Actual behavior: [What actually happened]
Failed with error:
Exception: Resharding has not been started (murmur3_partitioner_ignore_msb_bits=15) Check the log for the details
because command for scylla restart was not exectued on node.
Impact
Nemesis
disrupt_restart_with_resharding
failed because scylla was not restartHow frequently does it reproduce?
rarely
Installation details
SCT Version: master:latest
Scylla version (or git commit hash): Scylla version 5.5.0~dev-0.20240510.28791aa2c1d3
Logs
The text was updated successfully, but these errors were encountered: