The "nodetool repair -pr system_auth" command takes 22 minutes (instead of 3 seconds) #15159

yarongilor · 2023-08-24T16:23:01Z

Issue description

This issue is a regression.
It is unknown if this issue is a regression.

At test start, a nodetool repair for system_auth table is executed on nodes.
It usually takes about 3 seconds. In several recent runs, it took more than 22 minutes.
SCT has a 10 minutes timeout so the test failed like:

< t:2023-08-12 03:25:19,693 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-12 03:25:19,694 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-12 03:25:19,698 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-12 03:25:19,703 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-12 03:45:20,120 f:remote_libssh_cmd_runner.py l:66   c:RemoteLibSSH2CmdRunner p:ERROR > Command: "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "
< t:2023-08-12 03:45:20,121 f:decorators.py   l:72   c:sdcm.utils.decorators p:DEBUG > '_run': failed with 'RetryableNetworkException('Command did not complete within 1200 seconds!\n\nCommand: "/usr/bin/nodetool -u cassandra -pw \'cassandra\'  repair -pr system_auth "\n\nStdout:\n\n[2023-08-12 03:25:20,660] Starting repair command #5, repairing 1 ranges for keyspace system_auth (parallelism=SEQUENTIAL, full=true)\n\nStderr:\n\n\n\n')', retrying [#0]
< t:2023-08-12 03:45:20,128 f:remote_libssh_cmd_runner.py l:66   c:RemoteLibSSH2CmdRunner p:ERROR > Command: "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "
< t:2023-08-12 03:45:20,128 f:decorators.py   l:72   c:sdcm.utils.decorators p:DEBUG > '_run': failed with 'RetryableNetworkException('Command did not complete within 1200 seconds!\n\nCommand: "/usr/bin/nodetool -u cassandra -pw \'cassandra\'  repair -pr system_auth "\n\nStdout:\n\n[2023-08-12 03:25:20,847] Starting repair command #1, repairing 1 ranges for keyspace system_auth (parallelism=SEQUENTIAL, full=true)\n\nStderr:\n\n\n\n')', retrying [#0]
< t:2023-08-12 03:45:20,176 f:remote_libssh_cmd_runner.py l:66   c:RemoteLibSSH2CmdRunner p:ERROR > Command: "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "
< t:2023-08-12 03:45:20,176 f:decorators.py   l:72   c:sdcm.utils.decorators p:DEBUG > '_run': failed with 'RetryableNetworkException('Command did not complete within 1200 seconds!\n\nCommand: "/usr/bin/nodetool -u cassandra -pw \'cassandra\'  repair -pr system_auth "\n\nStdout:\n\n[2023-08-12 03:25:20,720] Starting repair command #5, repairing 1 ranges for keyspace system_auth (parallelism=SEQUENTIAL, full=true)\n\nStderr:\n\n\n\n')', retrying [#0]
< t:2023-08-12 03:45:20,314 f:remote_libssh_cmd_runner.py l:66   c:RemoteLibSSH2CmdRunner p:ERROR > Command: "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "
< t:2023-08-12 03:45:20,314 f:decorators.py   l:72   c:sdcm.utils.decorators p:DEBUG > '_run': failed with 'RetryableNetworkException('Command did not complete within 1200 seconds!\n\nCommand: "/usr/bin/nodetool -u cassandra -pw \'cassandra\'  repair -pr system_auth "\n\nStdout:\n\n[2023-08-12 03:25:20,604] Starting repair command #5, repairing 1 ranges for keyspace system_auth (parallelism=SEQUENTIAL, full=true)\n\nStderr:\n\n\n\n')', retrying [#0]

Node setup failed for timeout during repair of system-auth:

< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > Exception in setUp. Will call tearDown < t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > Exception in 
setUp. Will call tearDown
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > Traceback (most recent call last):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 164, in wrapper
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     return method(*args, **kwargs)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 119, in inner
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     res = func(*args, **kwargs)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 825, in setUp
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     self.set_system_auth_rf(db_cluster=db_cluster)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 912, in set_system_auth_rf
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     parallel_objects.run(_nodetool_repair)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 444, in run
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     raise ParallelObjectException(results=results)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > sdcm.utils.common.ParallelObjectException: Node longevity-50gb-12h-master-db-node-18cdaa5c-1 [52.48.20.193 | 10.4.10.40] (seed: True):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >  Traceback (most recent call last):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 428, in run
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     result = future.result(time_out)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 460, in result
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     raise TimeoutError()
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > concurrent.futures._base.TimeoutError
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > Node longevity-50gb-12h-master-db-node-18cdaa5c-2 [54.194.113.110 | 10.4.8.155] (seed: True):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >  Traceback (most recent call last):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 428, in run
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     result = future.result(time_out)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 460, in result
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     raise TimeoutError()
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > concurrent.futures._base.TimeoutError
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > Node longevity-50gb-12h-master-db-node-18cdaa5c-3 [34.244.252.187 | 10.4.8.141] (seed: True):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >  Traceback (most recent call last):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 428, in run
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     result = future.result(time_out)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 460, in result
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     raise TimeoutError()
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > concurrent.futures._base.TimeoutError
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > Node longevity-50gb-12h-master-db-node-18cdaa5c-4 [34.242.216.29 | 10.4.11.104] (seed: True):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >  Traceback (most recent call last):
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/common.py", line 428, in run
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     result = future.result(time_out)
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >   File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 460, in result
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR >     raise TimeoutError()
< t:2023-08-12 03:50:19,698 f:tester.py       l:171  c:sdcm.tester          p:ERROR > concurrent.futures._base.TimeoutError
< t:2023-08-12 03:50:19,703 f:tester.py       l:229  c:silence              p:DEBUG > Finished 'Sending test end event'. No errors were silenced.

This repar is expected to take ~ 3 seconds, as seen in the previous run:

< t:2023-08-06 08:34:17,539 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-06 08:34:17,541 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-06 08:34:17,542 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-06 08:34:17,548 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth "...
< t:2023-08-06 08:34:21,066 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth " finished with status 0
< t:2023-08-06 08:34:21,067 f:cluster.py      l:2623 c:sdcm.cluster_aws     p:DEBUG > Node longevity-50gb-12h-master-db-node-d0849e72-4 [54.171.224.8 | 10.4.9.53] (seed: True): Command '/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth ' duration -> 3.5151809980000053 s
< t:2023-08-06 08:34:21,073 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth " finished with status 0
< t:2023-08-06 08:34:21,073 f:cluster.py      l:2623 c:sdcm.cluster_aws     p:DEBUG > Node longevity-50gb-12h-master-db-node-d0849e72-3 [34.242.226.85 | 10.4.8.252] (seed: True): Command '/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth ' duration -> 3.5238472599999113 s
< t:2023-08-06 08:34:21,097 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth " finished with status 0
< t:2023-08-06 08:34:21,097 f:cluster.py      l:2623 c:sdcm.cluster_aws     p:DEBUG > Node longevity-50gb-12h-master-db-node-d0849e72-2 [54.78.180.130 | 10.4.8.14] (seed: True): Command '/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth ' duration -> 3.5486178639998798 s
< t:2023-08-06 08:34:21,101 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth " finished with status 0
< t:2023-08-06 08:34:21,101 f:cluster.py      l:2623 c:sdcm.cluster_aws     p:DEBUG > Node longevity-50gb-12h-master-db-node-d0849e72-1 [34.246.162.61 | 10.4.9.155] (seed: True): Command '/usr/bin/nodetool -u cassandra -pw 'cassandra'  repair -pr system_auth ' duration -> 3.5554634939999232 s

Impact

The basic impact is currently a very long duration for repair to complete (no sure how it is relevant to customer).

How frequently does it reproduce?

reproduced in several recent runs.

Installation details

Cluster size: 4 nodes (i4i.4xlarge)

Scylla Nodes used in this run:

longevity-50gb-12h-master-db-node-18cdaa5c-4 (34.242.216.29 | 10.4.11.104) (shards: 8)
longevity-50gb-12h-master-db-node-18cdaa5c-3 (34.244.252.187 | 10.4.8.141) (shards: 13)
longevity-50gb-12h-master-db-node-18cdaa5c-2 (54.194.113.110 | 10.4.8.155) (shards: 12)
longevity-50gb-12h-master-db-node-18cdaa5c-1 (52.48.20.193 | 10.4.10.40) (shards: 14)

OS / Image: ami-037a9a5f6d0cf7444 (aws: undefined_region)

Test: longevity-150gb-asymmetric-cluster-12h-test
Test id: 18cdaa5c-d2d0-49cf-aebf-273830fac739
Test name: scylla-master/longevity/longevity-150gb-asymmetric-cluster-12h-test
Test config file(s):

longevity-150GB-12h-autorization-LimitedMonkey.yaml

Logs and commands

Restore Monitor Stack command: $ hydra investigate show-monitor 18cdaa5c-d2d0-49cf-aebf-273830fac739
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 18cdaa5c-d2d0-49cf-aebf-273830fac739

Logs:

db-cluster-18cdaa5c.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/18cdaa5c-d2d0-49cf-aebf-273830fac739/20230812_035302/db-cluster-18cdaa5c.tar.gz
sct-runner-events-18cdaa5c.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/18cdaa5c-d2d0-49cf-aebf-273830fac739/20230812_035302/sct-runner-events-18cdaa5c.tar.gz
sct-18cdaa5c.log.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/18cdaa5c-d2d0-49cf-aebf-273830fac739/20230812_035302/sct-18cdaa5c.log.tar.gz
loader-set-18cdaa5c.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/18cdaa5c-d2d0-49cf-aebf-273830fac739/20230812_035302/loader-set-18cdaa5c.tar.gz
monitor-set-18cdaa5c.tar.gz - https://cloudius-jenkins-test.s3.amazonaws.com/18cdaa5c-d2d0-49cf-aebf-273830fac739/20230812_035302/monitor-set-18cdaa5c.tar.gz

Jenkins job URL
Argus

The text was updated successfully, but these errors were encountered:

yarongilor · 2023-08-24T16:23:10Z

Run 12 had the following log output:

2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: starting user-requested repair for keyspace system_auth, repair id 1, options {{trace -> false}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}, {primaryRange -> true}}
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - primary-range repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Skipped sending repair_flush_hints_batchlog to nodes={10.4.11.104, 10.4.8.155, 10.4.8.141, 10.4.10.40}
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  3] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  2] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  6] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard 13] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard 12] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard 11] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  8] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  7] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard 10] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  5] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  9] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  4] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  1] repair - repair[9e45e542-2f21-4093-9e49-b2e7c7fd3d3f]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-12T03:25:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7688]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:25:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7688]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:25:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7688]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:26:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7693]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:26:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7693]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:26:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7693]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:26:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7698]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:26:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7698]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:26:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7698]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:27:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7706]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:27:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7706]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:27:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7706]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:27:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7711]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:27:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7711]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:27:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7711]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:28:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7716]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:28:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7716]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:28:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7716]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:28:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7721]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:28:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7721]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:28:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7721]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:29:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7730]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:29:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7730]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:29:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7730]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:29:24+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | amazon-ssm-agent[512]: 2023-08-12 03:29:24 WARN EC2RoleProvider Failed to connect to Systems Manager with instance profile role credentials. Err: retrieved credentials failed to report to ssm. RequestId: 8ed3a0d2-55b1-4234-ac1f-9601fd07cfa5 Error: AccessDeniedException: User: arn:aws:sts::797456418907:assumed-role/qa-scylla-manager-backup-role/i-043d9d1e169e73fb9 is not authorized to perform: ssm:UpdateInstanceInformation on resource: arn:aws:ec2:eu-west-1:797456418907:instance/i-043d9d1e169e73fb9 because no identity-based policy allows the ssm:UpdateInstanceInformation action
2023-08-12T03:29:24+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | amazon-ssm-agent[512]:       status code: 400, request id: 8ed3a0d2-55b1-4234-ac1f-9601fd07cfa5
2023-08-12T03:29:24+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | amazon-ssm-agent[512]: 2023-08-12 03:29:24 ERROR EC2RoleProvider Failed to connect to Systems Manager with SSM role credentials. error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 797456418907
2023-08-12T03:29:24+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | amazon-ssm-agent[512]:       status code: 400, request id: cabc9567-2498-426c-80e5-07ae53f6dc8b
2023-08-12T03:29:24+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | amazon-ssm-agent[512]: 2023-08-12 03:29:24 ERROR [CredentialRefresher] Retrieve credentials produced error: no valid credentials could be retrieved for ec2 identity
2023-08-12T03:29:24+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | amazon-ssm-agent[512]: 2023-08-12 03:29:24 INFO [CredentialRefresher] Sleeping for 17m35s before retrying retrieve credentials
2023-08-12T03:29:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7737]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:29:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7737]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:29:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7737]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:30:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7744]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:30:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7744]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:30:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7744]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:30:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7749]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:30:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7749]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:30:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7749]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:31:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7754]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:31:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7754]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:31:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7754]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:31:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7759]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:31:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7759]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:31:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7759]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:32:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7764]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:32:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7764]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:32:08+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7764]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:32:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7769]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:32:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7769]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:32:38+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7769]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:33:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7774]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:33:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7774]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:33:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7774]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:33:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7779]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:33:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7779]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:33:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7779]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:34:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7784]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:34:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7784]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:34:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7784]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:34:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7789]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:34:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7789]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:34:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7789]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:35:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7795]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:35:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7795]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:35:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7795]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:35:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7801]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:35:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7801]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:35:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7801]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:36:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7806]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:36:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7806]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:36:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7806]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:36:20+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sshd[4725]: Timeout, client not responding from user scyllaadm 10.4.11.37 port 35718
2023-08-12T03:36:20+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sshd[4702]: pam_unix(sshd:session): session closed for user scyllaadm
2023-08-12T03:36:20+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | systemd-logind[536]: Session 1 logged out. Waiting for processes to exit.
2023-08-12T03:36:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7813]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:36:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7813]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:36:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7813]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:37:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7818]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:37:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7818]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:37:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7818]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:37:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7823]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:37:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7823]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:37:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7823]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:38:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7828]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:38:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7828]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:38:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7828]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:38:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7833]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short
2023-08-12T03:38:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7833]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
2023-08-12T03:38:39+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | sudo[7833]: pam_unix(sudo:session): session closed for user root
2023-08-12T03:39:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1   !NOTICE | sudo[7839]: scyllaadm : PWD=/home/scyllaadm ; USER=root ; COMMAND=/usr/bin/coredumpctl -q --json=short

The successful run 13 had the following log output:

2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: starting user-requested repair for keyspace system_auth, repair id 1, options {{trace -> false}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}, {primaryRange -> true}}
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - primary-range repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Skipped sending repair_flush_hints_batchlog to nodes={10.4.8.56, 10.4.8.43, 10.4.8.128, 10.4.11.62}
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 1] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 2] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 5] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 9] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 6] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 7] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 8] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 3] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 4] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 5] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 2] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 9] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 4] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 8] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 7] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 6] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 3] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 1] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 5] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:45+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 2] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 9] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 8] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 4] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 7] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 6] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 3] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 1] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 5] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 3 out of 4 tables in keyspace=system_auth, table=role_members, table_id=0ecdaa87-f8fb-3e60-88d1-74fb36fe5c0d, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 2] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 9] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 8] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 4] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 7] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 6] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 3] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 1] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 5] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.0556295 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 5] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 2] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.0950493 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 2] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 9] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.1270905 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 9] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 8] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.14056 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 8] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 4] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.1680366 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 4] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 7] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.1875298 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 7] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 6] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.1947986 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 6] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: Started to repair 4 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f, repair_reason=repair
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 3] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.2394117 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 3] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 1] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.3180208 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:46+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 1] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth
2023-08-17T12:03:47+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_permissions, role_members, role_attributes}, ranges_nr=256, round_nr=1024, round_nr_fast_path_already_synced=1024, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=15360, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.5279031 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_nr={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}}, row_from_disk_bytes_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} MiB/s, row_from_disk_rows_per_sec={{10.4.8.128, 0}, {10.4.8.43, 0}, {10.4.8.56, 0}, {10.4.11.62, 0}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={}
2023-08-17T12:03:47+00:00 longevity-50gb-12h-master-db-node-b9965f1b-1     !INFO | scylla[5517]:  [shard 0] repair - repair[c0593a4d-2555-46f8-a948-f4b04ba8d083]: completed successfully, keyspace=system_auth

mykaul · 2023-08-27T06:34:18Z

@yarongilor - what version is that?

yarongilor · 2023-08-27T07:15:49Z

@yarongilor - what version is that?

thanks @mykaul , it's:

ScyllaDB version with details: 5.4.0~dev-0.20230809.108e510a2391 with build-id 1c11038a8e78bc420070c7676c409e5fbe27d901

p.s. - SCT /Argus might experienced an issue retrieving the version because of this "too early" failure.

Run Details

Release: scylla-master
Group: Long Weekly Longevities - Tier1
Test: longevity-150gb-asymmetric-cluster-12h-test
Id: 18cdaa5c-d2d0-49cf-aebf-273830fac739
Start time: 2023-08-12 03:06:43
End time: 2023-08-12 03:52:50
Duration: 46 minutes
Started by: timer
Build job: [scylla-master/longevity/longevity-150gb-asymmetric-cluster-12h-test](https://jenkins.scylladb.com/job/scylla-master/job/longevity/job/longevity-150gb-asymmetric-cluster-12h-test/12/)
System Information
Backend: aws
Region: eu-west-1
Image id: ami-037a9a5f6d0cf7444
Instance type: i4i.4xlarge
Node amount: 4

ptrsmrn · 2023-08-28T19:14:42Z

I see some recurring error in messages.log (And other log files as well). It appears for the failed 12 run and not for successful 13 run (basing on the snippets provided in this issue), but it probably doesn't matter:

2023-08-12T03:47:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | amazon-ssm-agent[514]: 2023-08-12 03:47:09 WARN EC2RoleProvider Failed to connect to Systems Manager with instance profile role credentials. Err: retrieved credentials failed to report to ssm. RequestId: 0b4411b6-0128-419a-b63c-b3ee133b5c3d Error: AccessDeniedException: User: arn:aws:sts::797456418907:assumed-role/qa-scylla-manager-backup-role/i-0abe28a20013c1b5b is not authorized to perform: ssm:UpdateInstanceInformation on resource: arn:aws:ec2:eu-west-1:797456418907:instance/i-0abe28a20013c1b5b because no identity-based policy allows the ssm:UpdateInstanceInformation action
2023-08-12T03:47:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | amazon-ssm-agent[514]: 	status code: 400, request id: 0b4411b6-0128-419a-b63c-b3ee133b5c3d
2023-08-12T03:47:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | amazon-ssm-agent[514]: 2023-08-12 03:47:09 ERROR EC2RoleProvider Failed to connect to Systems Manager with SSM role credentials. error calling RequestManagedInstanceRoleToken: AccessDeniedException: Systems Manager's instance management role is not configured for account: 797456418907
2023-08-12T03:47:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | amazon-ssm-agent[514]: 	status code: 400, request id: c50aaae3-5527-4c9d-9e8a-fbeb1c864d86
2023-08-12T03:47:09+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | amazon-ssm-agent[514]: 2023-08-12 03:47:09 ERROR [CredentialRefresher] Retrieve credentials produced error: no valid credentials could be retrieved for ec2 identity

ptrsmrn · 2023-08-28T19:23:45Z

There's also some exception in node 1, which doesn't look good, but maybe it's OK after all:

2023-08-12T03:25:18+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] cql_server - exception while processing connection: std::_Nested_exception<std::system_error> (error GnuTLS:-53, Error in the push function.): std::system_error (error system:32, sendmsg: Broken pipe)

ptrsmrn · 2023-08-28T19:27:22Z

Interesting, because at both 2023-08-12T03:25:21 and 2023-08-12T03:45:26 a repair for node 1 started on all 14 shards, but it seems like it only started and got stuck (?) I only see the following logs and nothing more:

2023-08-12T03:45:26+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - repair[e2c95e1d-2446-4621-84d1-5adb55aa2b88]: starting user-requested repair for keyspace system_auth, repair id 2, options {{trace -> false}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}, {primaryRange -> true}}
2023-08-12T03:45:26+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - primary-range repair
2023-08-12T03:45:26+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - repair[e2c95e1d-2446-4621-84d1-5adb55aa2b88]: Skipped sending repair_flush_hints_batchlog to nodes={10.4.11.104, 10.4.8.155, 10.4.8.141, 10.4.10.40}
2023-08-12T03:45:26+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-1     !INFO | scylla[5498]:  [shard  0] repair - repair[e2c95e1d-2446-4621-84d1-5adb55aa2b88]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_attributes, table_id=6b8c7359-a843-33f2-a1d8-5dc6a187436f,
...
(last message is repeated for all shards, i.e. 14 times in total)

That being said I see that repair attempts happening before 2023-08-12T03:25:21 were successful.
Last successful repair happened at:

2023-08-12T03:24:02+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | scylla[5596]:  [shard 0] repair - bootstrap_with_repair: finished with keyspace=system_auth, nr_ranges=256

The next one ended on 2 out of 4:

2023-08-12T03:25:21+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | scylla[5596]:  [shard 0] repair - repair[da02aa4f-92b6-4937-b53a-71eae835dc9f]: Started to repair 2 out of 4 tables in keyspace=system_auth, table=roles, table_id=5bc52802-de25-35ed-aeab-188eecebb090, repair_reason=repair

And the last one ended on 1 out of 4:

2023-08-12T03:45:26+00:00 longevity-50gb-12h-master-db-node-18cdaa5c-4     !INFO | scylla[5596]:  [shard 7] repair - repair[80e5b71b-49b4-41a2-8f1d-d04ef717cf38]: Started to repair 1 out of 4 tables in keyspace=system_auth, table=role_permissions, table_id=3afbe79f-2194-31a7-add7-f5ab90d8ec9c, repair_reason=repair

ptrsmrn · 2023-08-28T19:55:56Z

I don't see anything else in the log files. More details could help understand why repair gets stuck.
@asias could you please advise?

ptrsmrn · 2023-08-30T19:03:02Z

@asias ping

asias · 2023-08-31T06:14:40Z

This repar is expected to take ~ 3 seconds, as seen in the previous run:

Was the previous run using the exact same SCT version and VM types (number of shards)?

If SCT version was updated, check if it contains the SCT change that modifies how we repair system_auth table and the RF of the system_auth table (e.g, simple strategy to network strategy, RF of the system_auth). I remember there was a bug in SCT in this area and it was fixed.

Note the number of shards was different on the nodes. This is going to make the mutation reader eviction issue more visible.

Another thing is that, I noticed, at 3:45 a secondrepair system_auth was started before the previous one started at 3:25 had finished. This looks like a SCT issue. There is no point starting the second repair before the first repair of system_auth finishes. Please fix it.

Aug 12 03:25:20 longevity-50gb-12h-master-db-node-18cdaa5c-2 scylla[5518]:  [shard  0] repair - repair[ab1ccf55-3a3e-4adf-90cc-b5cce8aace7e]: starting user-requested repair for keyspace system_auth, repair id 5, options {{trace -> false}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}, {primaryRange -> true}}

Aug 12 03:45:25 longevity-50gb-12h-master-db-node-18cdaa5c-2 scylla[5518]:  [shard  0] repair - repair[2ab42a79-40d0-4e3c-a064-cc71c1fe2ad9]: starting user-requested repair for keyspace system_auth, repair id 6, options {{trace -> false}, {jobThreads -> 1}, {incremental -> false}, {parallelism -> parallel}, {primaryRange -> true}}

Also, please increase the timeout in SCT, it is important to know if eventually the system_auth finishes or it got stuck.

mykaul · 2023-08-31T07:05:45Z

If SCT version was updated, check if it contains the SCT change that modifies how we repair system_auth table and the RF of the system_auth table (e.g, simple strategy to network strategy, RF of the system_auth). I remember there was a bug in SCT in this area and it was fixed.

Does it matter - what can justify 22m to repair it?

asias · 2023-08-31T07:44:32Z

If SCT version was updated, check if it contains the SCT change that modifies how we repair system_auth table and the RF of the system_auth table (e.g, simple strategy to network strategy, RF of the system_auth). I remember there was a bug in SCT in this area and it was fixed.

Does it matter - what can justify 22m to repair it?

It matters in the sense we want to know what was changed and what contributed to longer repair time. If the repair indeed finished eventually, the main factor is supposed to be the diff-shard config caused mutation reader eviction problem. Another factor is the network latency between nodes and high RF (not in this test, it only has 4 nodes).

roydahan · 2023-09-11T11:31:37Z

Is anyone looking at this issue?
@ptrsmrn / @asias ?

betsch85 · 2023-09-27T08:14:35Z

I found this thread by googling "Systems Manager's instance management role is not configured for account", weirdly, every time this message appears in the syslog, everything starts to take forever, so be aware that it might be caused by amazon-ssm-agent, not scylladb.

ignore the above

mykaul · 2023-09-27T08:29:07Z

@asias - can you please take a look at this?

asias · 2023-09-27T08:35:31Z

Did you mean the system manager config thing? I am not sure what it is about.

mykaul · 2023-09-27T08:36:45Z

I found this thread by googling "Systems Manager's instance management role is not configured for account", weirdly, every time this message appears in the syslog, everything starts to take forever, so be aware that it might be caused by amazon-ssm-agent, not scylladb.

I don't see the direct correlation here. If it were, it should have impacted all repairs.

ptrsmrn · 2023-10-05T22:05:31Z

@asias @yarongilor do you think we could try to reproduce this issue and collect more info/logs?

mykaul · 2023-10-12T11:51:39Z

@ptrsmrn - not sure if it's materially different than https://github.com/scylladb/scylla-enterprise/issues/3103#issuecomment-1758913004

ptrsmrn · 2023-10-14T20:24:05Z

@ptrsmrn - not sure if it's materially different than scylladb/scylla-enterprise#3103 (comment)

The root cause could be the same, but @asias pointed out a couple of things in his comment (#15159 (comment)), which have not been resolved and are something extra on top of https://github.com/scylladb/scylla-enterprise/issues/3103, so these issues don't seem to be 1:1.
@yarongilor could you please answer questions/resolve issues @asias mentioned in #15159 (comment) ?

*) Problem: We have seen in the field it takes longer than expected to repair system tables like system_auth which has a tiny amount of data but is replicated to all nodes in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The main reason for the slowness is that even if the amount of data is small, repair has to walk though all the token ranges, that is num_tokens * number_of_nodes_in_the_cluster. The overhead of the repair protocol for each token range dominates due to the small amount of data per token range. Another reason is the high network latency between DCs makes the RPC calls used to repair consume more time. *) Solution: To solve this problem, a small table optimization for repair is introduced in this patch. A new repair option is added to turn on this optimization. - No token range to repair is needed by the user. It will repair all token ranges automatically. - Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster. - It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node. *) Performance: 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 3} Before: ``` repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1537, round_nr=4612, round_nr_fast_path_already_synced=4611, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds, tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2, 0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848}, {127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s, tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178}, {127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1}, {127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286}, {127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286}, {127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026}, {127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X So, a more than 1000X difference is observed for this common usage of system table repair procedure. Fixes scylladb#15159

*) Problem: We have seen in the field it takes longer than expected to repair system tables like system_auth which has a tiny amount of data but is replicated to all nodes in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The main reason for the slowness is that even if the amount of data is small, repair has to walk though all the token ranges, that is num_tokens * number_of_nodes_in_the_cluster. The overhead of the repair protocol for each token range dominates due to the small amount of data per token range. Another reason is the high network latency between DCs makes the RPC calls used to repair consume more time. *) Solution: To solve this problem, a small table optimization for repair is introduced in this patch. A new repair option is added to turn on this optimization. - No token range to repair is needed by the user. It will repair all token ranges automatically. - Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster. - It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node. *) Performance: 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Before: ``` repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1537, round_nr=4612, round_nr_fast_path_already_synced=4611, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds, tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2, 0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848}, {127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s, tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178}, {127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1}, {127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286}, {127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286}, {127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026}, {127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X Same test as above except 5ms delay is added to simulate multiple dc network latency: The time to repair is reduced from 333s to 0.2s. 333.26758 s / 0.22625381s = 1472.98 So, a more than 1000X difference is observed for this common usage of system table repair procedure. Fixes scylladb#15159

*) Problem: We have seen in the field it takes longer than expected to repair system tables like system_auth which has a tiny amount of data but is replicated to all nodes in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The main reason for the slowness is that even if the amount of data is small, repair has to walk though all the token ranges, that is num_tokens * number_of_nodes_in_the_cluster. The overhead of the repair protocol for each token range dominates due to the small amount of data per token range. Another reason is the high network latency between DCs makes the RPC calls used to repair consume more time. *) Solution: To solve this problem, a small table optimization for repair is introduced in this patch. A new repair option is added to turn on this optimization. - No token range to repair is needed by the user. It will repair all token ranges automatically. - Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster. - It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node. *) Performance: 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Before: ``` repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1537, round_nr=4612, round_nr_fast_path_already_synced=4611, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds, tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2, 0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848}, {127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s, tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178}, {127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1}, {127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286}, {127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286}, {127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026}, {127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X Same test as above except 5ms delay is added to simulate multiple dc network latency: The time to repair is reduced from 333s to 0.2s. 333.26758 s / 0.22625381s = 1472.98 So, a more than 1000X difference is observed for this common usage of system table repair procedure. Fixes scylladb#16011 Refs scylladb#15159

*) Problem: We have seen in the field it takes longer than expected to repair system tables like system_auth which has a tiny amount of data but is replicated to all nodes in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The main reason for the slowness is that even if the amount of data is small, repair has to walk though all the token ranges, that is num_tokens * number_of_nodes_in_the_cluster. The overhead of the repair protocol for each token range dominates due to the small amount of data per token range. Another reason is the high network latency between DCs makes the RPC calls used to repair consume more time. *) Solution: To solve this problem, a small table optimization for repair is introduced in this patch. A new repair option is added to turn on this optimization. - No token range to repair is needed by the user. It will repair all token ranges automatically. - Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster. - It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node. *) Performance: 1) 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Before: ``` repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1537, round_nr=4612, round_nr_fast_path_already_synced=4611, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds, tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2, 0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848}, {127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s, tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178}, {127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1}, {127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286}, {127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286}, {127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026}, {127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X 2) 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Same test as above except 5ms delay is added to simulate multiple dc network latency: The time to repair is reduced from 333s to 0.2s. 333.26758 s / 0.22625381s = 1472.98 3) 3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3} , 10 ms network latency Before: ``` repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats: repair_reason=repair, keyspace=system_auth, tables={role_attributes, role_members, roles}, ranges_nr=2305, round_nr=6916, round_nr_fast_path_already_synced=6915, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015 seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0, row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2, 1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}} MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385}, {127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1}, {127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1}, {127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats: repair_reason=repair, keyspace=system_auth, tables={role_attributes, role_members, roles}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3, 178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178}, {127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}}, row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1}, {127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1}, {127.0.57.8, 1}, {127.0.57.9, 1}}, row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2, 0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793}, {127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7, 0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}} MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634}, {127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4, 2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634}, {127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9, 2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to repair is reduced from 986s (16 minutes) to 1.6s *) Summary So, a more than 1000X difference is observed for this common usage of system table repair procedure. Fixes scylladb#16011 Refs scylladb#15159

repair: Introduce small table optimization *) Problem: We have seen in the field it takes longer than expected to repair system tables like system_auth which has a tiny amount of data but is replicated to all nodes in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The main reason for the slowness is that even if the amount of data is small, repair has to walk though all the token ranges, that is num_tokens * number_of_nodes_in_the_cluster. The overhead of the repair protocol for each token range dominates due to the small amount of data per token range. Another reason is the high network latency between DCs makes the RPC calls used to repair consume more time. *) Solution: To solve this problem, a small table optimization for repair is introduced in this patch. A new repair option is added to turn on this optimization. - No token range to repair is needed by the user. It will repair all token ranges automatically. - Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster. - It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node. *) Performance: 1) 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Before: ``` repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1537, round_nr=4612, round_nr_fast_path_already_synced=4611, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds, tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2, 0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848}, {127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s, tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178}, {127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1}, {127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286}, {127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286}, {127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026}, {127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X 2) 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Same test as above except 5ms delay is added to simulate multiple dc network latency: The time to repair is reduced from 333s to 0.2s. 333.26758 s / 0.22625381s = 1472.98 3) 3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3} , 10 ms network latency Before: ``` repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats: repair_reason=repair, keyspace=system_auth, tables={role_attributes, role_members, roles}, ranges_nr=2305, round_nr=6916, round_nr_fast_path_already_synced=6915, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015 seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0, row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2, 1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}} MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385}, {127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1}, {127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1}, {127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats: repair_reason=repair, keyspace=system_auth, tables={role_attributes, role_members, roles}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3, 178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178}, {127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}}, row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1}, {127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1}, {127.0.57.8, 1}, {127.0.57.9, 1}}, row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2, 0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793}, {127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7, 0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}} MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634}, {127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4, 2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634}, {127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9, 2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to repair is reduced from 986s (16 minutes) to 1.6s *) Summary So, a more than 1000X difference is observed for this common usage of system table repair procedure. Fixes #16011 Refs #15159 Closes #15974 * github.com:scylladb/scylladb: repair: Introduce small table optimization repair: Convert put_row_diff_with_rpc_stream to use coroutine

mykaul · 2024-02-01T14:06:10Z

Closing for the time being - the small table optimization is in 6.0.

yarongilor mentioned this issue Aug 24, 2023

Node setup failed for timeout during repair of system-auth scylladb/scylla-cluster-tests#6511

Closed

2 tasks

mykaul assigned ptrsmrn Aug 27, 2023

mykaul added status/regression triage/master Looking for assignee and removed status/regression labels Aug 27, 2023

DoronArazii added the area/repair label Aug 29, 2023

DoronArazii assigned asias Aug 29, 2023

DoronArazii removed the triage/master Looking for assignee label Aug 31, 2023

asias mentioned this issue Nov 7, 2023

repair: Introduce small table optimization #15974

Merged

mykaul added this to the 6.0 milestone Nov 7, 2023

mykaul added P1 Urgent Field-Tier1 Urgent issues as per FieldEngineering request labels Nov 7, 2023

mykaul added Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed labels Nov 8, 2023

mykaul removed backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed Backport candidate backport/5.4 Issues that should be backported to 5.4 branch once they'll be fixed labels Jan 29, 2024

mykaul closed this as completed Feb 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The "nodetool repair -pr system_auth" command takes 22 minutes (instead of 3 seconds) #15159

The "nodetool repair -pr system_auth" command takes 22 minutes (instead of 3 seconds) #15159

yarongilor commented Aug 24, 2023

Logs:

yarongilor commented Aug 24, 2023

mykaul commented Aug 27, 2023

yarongilor commented Aug 27, 2023 •

edited

ptrsmrn commented Aug 28, 2023 •

edited

ptrsmrn commented Aug 28, 2023

ptrsmrn commented Aug 28, 2023 •

edited

ptrsmrn commented Aug 28, 2023

ptrsmrn commented Aug 30, 2023

asias commented Aug 31, 2023

mykaul commented Aug 31, 2023

asias commented Aug 31, 2023

roydahan commented Sep 11, 2023

betsch85 commented Sep 27, 2023 •

edited

mykaul commented Sep 27, 2023

asias commented Sep 27, 2023

mykaul commented Sep 27, 2023

ptrsmrn commented Oct 5, 2023

mykaul commented Oct 12, 2023

ptrsmrn commented Oct 14, 2023

mykaul commented Feb 1, 2024

The "nodetool repair -pr system_auth" command takes 22 minutes (instead of 3 seconds) #15159

The "nodetool repair -pr system_auth" command takes 22 minutes (instead of 3 seconds) #15159

Comments

yarongilor commented Aug 24, 2023

Issue description

Impact

How frequently does it reproduce?

Installation details

Logs:

yarongilor commented Aug 24, 2023

mykaul commented Aug 27, 2023

yarongilor commented Aug 27, 2023 • edited

ptrsmrn commented Aug 28, 2023 • edited

ptrsmrn commented Aug 28, 2023

ptrsmrn commented Aug 28, 2023 • edited

ptrsmrn commented Aug 28, 2023

ptrsmrn commented Aug 30, 2023

asias commented Aug 31, 2023

mykaul commented Aug 31, 2023

asias commented Aug 31, 2023

roydahan commented Sep 11, 2023

betsch85 commented Sep 27, 2023 • edited

mykaul commented Sep 27, 2023

asias commented Sep 27, 2023

mykaul commented Sep 27, 2023

ptrsmrn commented Oct 5, 2023

mykaul commented Oct 12, 2023

ptrsmrn commented Oct 14, 2023

mykaul commented Feb 1, 2024

yarongilor commented Aug 27, 2023 •

edited

ptrsmrn commented Aug 28, 2023 •

edited

ptrsmrn commented Aug 28, 2023 •

edited

betsch85 commented Sep 27, 2023 •

edited