Repairing a cluster after a restore causes severe reactor stalls throughout the cluster (due to expensive logging within do_repair_ranges() without yield) #14330

Closed
ShlomiBalalis opened this issue Jun 21, 2023 · 17 comments

@ShlomiBalalis

scylla version: 5.2.1-0.20230508.f1c45553bc29 with build-id 88ac66b1719cc7c5b7e982aa34ba5dc95909b84a
Client version: 3.1.1-0.20230612.401edeb8
Server version: 3.1.1-0.20230612.401edeb8

First, we ran a simple backup task:

< t:2023-06-21 02:28:41,156 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool backup -c ae6a4daa-956e-4b96-b66f-9c4766907416 --keyspace keyspace1  --location azure:manager-backup-tests-us-east1 " finished with status 0
< t:2023-06-21 02:28:41,156 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: backup/6507886b-495d-4627-a3a4-3db78e7a5459
...
< t:2023-06-21 02:29:50,205 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool  -c ae6a4daa-956e-4b96-b66f-9c4766907416 progress backup/6507886b-495d-4627-a3a4-3db78e7a5459" finished with status 0
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: Run:               55c90c0a-0fdb-11ee-b1a3-000d3a4dc75e
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Status:           DONE
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Start time:       21 Jun 23 02:28:40 UTC
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > End time: 21 Jun 23 02:28:47 UTC
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Duration: 6s
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Progress: 100%
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Snapshot Tag:     sm_20230621022841UTC
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Datacenters:      
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG >   - eastus
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > 
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╭──────────┬──────────┬──────────┬──────────┬──────────────┬────────╮
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ Host     │ Progress │     Size │  Success │ Deduplicated │ Failed │
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ├──────────┼──────────┼──────────┼──────────┼──────────────┼────────┤
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ 10.0.0.5 │     100% │ 944.611M │ 944.611M │     944.611M │      0 │
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ 10.0.0.6 │     100% │ 944.613M │ 944.613M │     944.613M │      0 │
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ 10.0.0.7 │     100% │ 944.614M │ 944.614M │     944.614M │      0 │
< t:2023-06-21 02:29:50,205 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╰──────────┴──────────┴──────────┴──────────┴──────────────┴────────╯

Afterwards, we truncated the target table (keyspace1.standard1) and restored it from the backup:

< t:2023-06-21 02:29:53,755 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "cqlsh --no-color   --request-timeout=120 --connect-timeout=60  -e "TRUNCATE keyspace1.standard1" 10.0.0.5 9042" finished with status 0
...
< t:2023-06-21 02:29:58,036 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool restore -c ae6a4daa-956e-4b96-b66f-9c4766907416 --restore-tables --location azure:manager-backup-tests-us-east1  --snapshot-tag sm_20230621022841UTC" finished with status 0
< t:2023-06-21 02:29:58,037 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: restore/38a7e5f6-8652-4bfd-8b38-067bf4b7ae80
...
< t:2023-06-21 02:31:12,187 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool  -c ae6a4daa-956e-4b96-b66f-9c4766907416 progress restore/38a7e5f6-8652-4bfd-8b38-067bf4b7ae80" finished with status 0
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: Run:               839bf71f-0fdb-11ee-b1a4-000d3a4dc75e
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Status:           DONE - tombstone_gc mode reset and repair required (see restore docs)
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Start time:       21 Jun 23 02:29:57 UTC
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > End time: 21 Jun 23 02:31:07 UTC
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Duration: 1m10s
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Progress: 100% | 100%
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Snapshot Tag:     sm_20230621022841UTC
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > 
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╭───────────┬─────────────┬────────┬─────────┬────────────┬────────╮
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ Keyspace  │    Progress │   Size │ Success │ Downloaded │ Failed │
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ├───────────┼─────────────┼────────┼─────────┼────────────┼────────┤
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ keyspace1 │ 100% | 100% │ 2.766G │  2.766G │     2.766G │      0 │
< t:2023-06-21 02:31:12,187 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╰───────────┴─────────────┴────────┴─────────┴────────────┴────────╯

Finally, as required by the restore procedure, we repaired the cluster once the restore task had finished. During the repair, however, every node in the cluster suffered multiple severe reactor stalls:

< t:2023-06-21 02:31:18,574 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool  repair "...
< t:2023-06-21 02:32:13,843 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool  repair " finished with status 0
< t:2023-06-21 02:32:13,843 f:cluster.py      l:2572 c:sdcm.cluster_azure   p:DEBUG > Node manager-regression-manager--db-node-0d2064b9-eastus-1 [4.236.176.91 | 10.0.0.5] (seed: True): Command '/usr/bin/nodetool  repair ' duration -> 55.26881028799926 s
< t:2023-06-21 02:32:13,845 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool  repair "...
< t:2023-06-21 02:32:51,780 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool  repair " finished with status 0
< t:2023-06-21 02:32:51,780 f:cluster.py      l:2572 c:sdcm.cluster_azure   p:DEBUG > Node manager-regression-manager--db-node-0d2064b9-eastus-2 [4.236.176.117 | 10.0.0.6] (seed: False): Command '/usr/bin/nodetool  repair ' duration -> 37.93153537800026 s
< t:2023-06-21 02:32:51,781 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool  repair "...
< t:2023-06-21 02:33:38,466 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool  repair " finished with status 0
< t:2023-06-21 02:33:38,467 f:cluster.py      l:2572 c:sdcm.cluster_azure   p:DEBUG > Node manager-regression-manager--db-node-0d2064b9-eastus-3 [4.236.176.206 | 10.0.0.7] (seed: False): Command '/usr/bin/nodetool  repair ' duration -> 46.68516009799987 s
2023-06-21 02:31:25.630 <2023-06-21 02:31:20.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=60993 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:31:20+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 733 ms on shard 0. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0xad053 0x7fb2b869875e 0x7fb2b868edee 0x7fb2b868ef23 0x33f4a34 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b2fa58 0x3b26bcb 0x3b0fcba 0x5266e56
2023-06-21 02:33:01.596 <2023-06-21 02:31:32.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=81230 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:31:32+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 1647 ms on shard 3. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x264cf 0x5debd 0x8035b 0x1040e7 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
2023-06-21 02:33:01.115 <2023-06-21 02:31:31.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=80944 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:31:31+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 857 ms on shard 6. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x10de9b 0x104260 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:01.564 <2023-06-21 02:31:32.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=81219 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:31:32+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 1632 ms on shard 6. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x87f97 0x1044eb 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:01.566 <2023-06-21 02:31:32.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=81221 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:31:32+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 1640 ms on shard 4. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x87eea 0x1044ab 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:07.895 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=75974 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 699 ms on shard 4. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x87f97 0x1044eb 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:07.920 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76008 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 799 ms on shard 6. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x52d14 0x5e3a4 0x79d5a 0x5a174 0x3737cfa 0x20c2fd4 0x20c2d9d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:07.934 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76019 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 823 ms on shard 3. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x5427f6f
?? ??:0
2023-06-21 02:33:07.935 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76021 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 826 ms on shard 5. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x115b1df 0x115b2cf 0x7fb4f9c52a0c 0x33f4ad7 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:07.936 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76024 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 831 ms on shard 2. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x115ef1f 0x115bf69 0x33f4bb3 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:07.949 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76047 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 897 ms on shard 1. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0xa21df 0x7fb4f9c522a4 0x33f4afc 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:08.022 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76122 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 1120 ms on shard 4. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x10de9b 0x104260 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:08.163 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76287 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 1619 ms on shard 6. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x84f76 0x5f6ce 0x8035b 0x1040e7 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:08.175 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76295 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 1629 ms on shard 3. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x10de9b 0x104260 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:08.178 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76303 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 1646 ms on shard 5. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x7fb4f9c522dd 0x33f4afc 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:08.179 <2023-06-21 02:32:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76305 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 1646 ms on shard 2. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x7fb4f9a7dd94 0x115bf95 0x36b2411 0x36b217d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:08.200 <2023-06-21 02:32:39.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76329 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:39+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 1709 ms on shard 1. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x10de9b 0x104260 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:33:08.260 <2023-06-21 02:32:39.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=76373 node=manager-regression-manager--db-node-0d2064b9-eastus-2
2023-06-21T02:32:39+00:00 manager-regression-manager--db-node-eastus-2     !INFO | scylla[7693]: Reactor stalled for 1801 ms on shard 4. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x115ef29 0x115bf69 0x33f4bb3 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.581 <2023-06-21 02:33:23.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78000 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:23+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 881 ms on shard 5. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x84f60 0x5f6ce 0x8035b 0x1040e7 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.582 <2023-06-21 02:33:23.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78004 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:23+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 892 ms on shard 6. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x115ef19 0x115bf69 0x33f4bb3 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.583 <2023-06-21 02:33:23.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78006 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:23+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 894 ms on shard 1. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0xce2c1 0xcda26 0xcf62b 0x10400b 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.585 <2023-06-21 02:33:23.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78011 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:23+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 898 ms on shard 4. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x116b4a4 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.758 <2023-06-21 02:33:24.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78272 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:24+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 1683 ms on shard 5. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x87eea 0x1044ab 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.762 <2023-06-21 02:33:24.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78280 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:24+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 1698 ms on shard 6. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x9f60f 0x7f19d3a9b7d3 0x7f19d3a91de2 0x7f19d3a91f23 0x3b161b1 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.765 <2023-06-21 02:33:24.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78287 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:24+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 1705 ms on shard 1. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x87eea 0x1044ab 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:06.767 <2023-06-21 02:33:24.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=78290 node=manager-regression-manager--db-node-0d2064b9-eastus-3
2023-06-21T02:33:24+00:00 manager-regression-manager--db-node-eastus-3     !INFO | scylla[7787]: Reactor stalled for 1712 ms on shard 4. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x116af7f 0x3b1b5f2 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:09.451 <2023-06-21 02:32:04.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=119093 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:32:04+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 844 ms on shard 2. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:09.454 <2023-06-21 02:32:04.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=119101 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:32:04+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 863 ms on shard 1. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x87f97 0x1044eb 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:09.609 <2023-06-21 02:32:05.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=119362 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:32:05+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 1601 ms on shard 2. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x87eea 0x1044ab 0x104669 0x538520a 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
?? ??:0
2023-06-21 02:34:09.614 <2023-06-21 02:32:05.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=0012f9ce-53bd-4348-bc5b-0ba257da687e: type=REACTOR_STALLED regex=Reactor stalled line_number=119373 node=manager-regression-manager--db-node-0d2064b9-eastus-1
2023-06-21T02:32:05+00:00 manager-regression-manager--db-node-eastus-1     !INFO | scylla[7666]: Reactor stalled for 1629 ms on shard 1. Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x115ef1f 0x115bf69 0x33f4bb3 0x33f490d 0x116b49c 0x3b1b605 0x53851cf 0x3af67a9 0x3b1e1e2 0x3af8622 0x3af9f88 0x401579c 0x3b59ad1 0x3b1591a 0x4fee614 0x4fef897 0x5010681 0x4fc097a 0x8b12c 0x10cbbf
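
To see at a glance how the stalls are distributed, the journal lines above can be summarized per node. The following is only a minimal sketch, assuming the log lines keep the exact "Reactor stalled for N ms on shard M" shape quoted in this report. The names `STALL_RE` and `summarize_stalls` are hypothetical helpers, not part of Scylla or SCT, and the sample lines are shortened copies of lines from this issue.

```python
import re
from collections import defaultdict

# Matches the Scylla journal lines quoted in this issue, for example:
# "<timestamp> <node>  !INFO | scylla[7666]: Reactor stalled for 733 ms on shard 0. Backtrace: ..."
# The "DatabaseLogEvent" wrapper lines do not contain "stalled for N ms", so they are skipped.
STALL_RE = re.compile(
    r"(?P<node>\S+)\s+!INFO \| scylla\[\d+\]: "
    r"Reactor stalled for (?P<ms>\d+) ms on shard (?P<shard>\d+)\."
)


def summarize_stalls(lines):
    """Return {node: (stall_count, longest_stall_ms)} for the given log lines."""
    durations = defaultdict(list)
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            durations[match.group("node")].append(int(match.group("ms")))
    return {node: (len(ms), max(ms)) for node, ms in durations.items()}


# Shortened copies of three lines from this report (backtraces truncated).
sample = [
    "2023-06-21T02:31:20+00:00 manager-regression-manager--db-node-eastus-1     "
    "!INFO | scylla[7666]: Reactor stalled for 733 ms on shard 0. Backtrace: 0x4fddd33",
    "2023-06-21T02:31:32+00:00 manager-regression-manager--db-node-eastus-1     "
    "!INFO | scylla[7666]: Reactor stalled for 1647 ms on shard 3. Backtrace: 0x4fddd33",
    "2023-06-21T02:32:38+00:00 manager-regression-manager--db-node-eastus-2     "
    "!INFO | scylla[7693]: Reactor stalled for 699 ms on shard 4. Backtrace: 0x4fddd33",
]

summary = summarize_stalls(sample)
for node, (count, longest) in sorted(summary.items()):
    print(f"{node}: {count} stalls, longest {longest} ms")
```

Feeding the full node logs from the archives below into `summarize_stalls` would give the per-node totals; the shard number is captured by the regex as well, should a per-shard breakdown be needed.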

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/0d2064b9-8c10-4821-ae86-be6b71d050af/20230621_031502/db-cluster-0d2064b9.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/0d2064b9-8c10-4821-ae86-be6b71d050af/20230621_031502/loader-set-0d2064b9.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/0d2064b9-8c10-4821-ae86-be6b71d050af/20230621_031502/monitor-set-0d2064b9.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/0d2064b9-8c10-4821-ae86-be6b71d050af/20230621_031502/sct-runner-0d2064b9.tar.gz
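
The backtraces quoted earlier are raw addresses, so they need to be decoded against the matching Scylla binary before they say anything useful. As a rough sketch only: the helper below pulls the addresses out of one stall line and builds a GNU `addr2line` command line for them. The function names and the binary path `/path/to/scylla` are placeholders I made up for illustration. The sketch assumes an unstripped binary whose build-id matches the one reported at the top of this issue, and it does not run the command. Addresses that belong to shared libraries (for example the `0x7fb2...` frames) are not expected to resolve against the main executable this way. Seastar's own `seastar-addr2line` script is probably the more convenient tool in practice.

```python
import re

ADDR_RE = re.compile(r"0x[0-9a-fA-F]+")


def backtrace_addresses(stall_line):
    """Return the hex addresses that follow 'Backtrace:' in one stall line."""
    _, sep, tail = stall_line.partition("Backtrace:")
    return ADDR_RE.findall(tail) if sep else []


def addr2line_argv(binary_path, addresses):
    """Build (but do not execute) a GNU addr2line command line.

    -C demangles C++ names, -f prints function names, -i expands inlined
    frames, -p prints each frame on one line, -e names the executable.
    """
    return ["addr2line", "-C", "-f", "-i", "-p", "-e", binary_path, *addresses]


# Shortened copy of one stall line from this report (only the first five frames).
line = (
    "scylla[7666]: Reactor stalled for 844 ms on shard 2. "
    "Backtrace: 0x4fddd33 0x4fdd120 0x4fde500 0x3cb1f 0x3af67a9"
)

# "/path/to/scylla" is a placeholder, not a real location.
argv = addr2line_argv("/path/to/scylla", backtrace_addresses(line))
print(" ".join(argv))
```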

@mykaul
Contributor

mykaul commented Jun 21, 2023

@ShlomiBalalis - please provide decoded stalls, at least for some of them. For example, here is the last one, which I've never seen before (though of course I may not be familiar with all of them):

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:797
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:816
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1346
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1123
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1140
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1382
?? ??:0
seastar::internal::log_buf::inserter_iterator::operator=(char) at ././seastar/include/seastar/util/log-impl.hh:76
 (inlined by) seastar::internal::log_buf::inserter_iterator fmt::v9::detail::copy_str<char, char*, seastar::internal::log_buf::inserter_iterator>(char*, char*, seastar::internal::log_buf::inserter_iterator) at /usr/include/fmt/core.h:842
 (inlined by) fmt::v9::detail::iterator_buffer<seastar::internal::log_buf::inserter_iterator, char, fmt::v9::detail::buffer_traits>::flush() at /usr/include/fmt/core.h:985
 (inlined by) fmt::v9::detail::iterator_buffer<seastar::internal::log_buf::inserter_iterator, char, fmt::v9::detail::buffer_traits>::grow(unsigned long) at /usr/include/fmt/core.h:979
fmt::v9::detail::buffer<char>::try_reserve(unsigned long) at /usr/include/fmt/core.h:928
 (inlined by) void fmt::v9::detail::buffer<char>::append<char>(char const*, char const*) at /usr/include/fmt/format.h:775
 (inlined by) fmt::v9::appender fmt::v9::detail::copy_str<char, char const*>(char const*, char const*, fmt::v9::appender) at /usr/include/fmt/core.h:1667
 (inlined by) operator() at /usr/include/fmt/format.h:2194
 (inlined by) fmt::v9::appender fmt::v9::detail::write_padded<(fmt::v9::align::type)1, fmt::v9::appender, char, fmt::v9::detail::write<char, fmt::v9::appender>(fmt::v9::appender, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_specs<char> const&)::{lambda(fmt::v9::appender)#1}>(fmt::v9::appender, fmt::v9::basic_format_specs<char> const&, unsigned long, unsigned long, fmt::v9::detail::write<char, fmt::v9::appender>(fmt::v9::appender, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_specs<char> const&)::{lambda(fmt::v9::appender)#1}&&) at /usr/include/fmt/format.h:1665
 (inlined by) fmt::v9::appender fmt::v9::detail::write<char, fmt::v9::appender>(fmt::v9::appender, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_specs<char> const&) at /usr/include/fmt/format.h:2191
fmt::v9::appender fmt::v9::detail::write<char, fmt::v9::appender>(fmt::v9::appender, fmt::v9::basic_string_view<fmt::v9::type_identity<char>::type>, fmt::v9::basic_format_specs<char> const&, fmt::v9::detail::locale_ref) at /usr/include/fmt/format.h:2203
 (inlined by) decltype (({parm#2}.out)()) fmt::v9::formatter<fmt::v9::basic_string_view<char>, char, void>::format<fmt::v9::basic_format_context<fmt::v9::appender, char> >(fmt::v9::basic_string_view<char> const&, fmt::v9::basic_format_context<fmt::v9::appender, char>&) const at /usr/include/fmt/format.h:3727
 (inlined by) fmt::v9::appender fmt::v9::basic_ostream_formatter<char>::format<std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > >, fmt::v9::appender>(std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, fmt::v9::basic_format_context<fmt::v9::appender, char>&) const at /usr/include/fmt/ostream.h:149
void fmt::v9::detail::value<fmt::v9::basic_format_context<fmt::v9::appender, char> >::format_custom_arg<std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > >, fmt::v9::detail::fallback_formatter<std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > >, char, void> >(void*, fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&) at /usr/include/fmt/core.h:1314
fmt::v9::basic_format_arg<fmt::v9::basic_format_context<fmt::v9::appender, char> >::handle::format(fmt::v9::basic_format_parse_context<char, fmt::v9::detail::error_handler>&, fmt::v9::basic_format_context<fmt::v9::appender, char>&) const at /usr/include/fmt/core.h:1594
 (inlined by) fmt::v9::detail::default_arg_formatter<char>::operator()(fmt::v9::basic_format_arg<fmt::v9::basic_format_context<fmt::v9::appender, char> >::handle) at /usr/include/fmt/format.h:3379
 (inlined by) decltype ({parm#1}(0)) fmt::v9::visit_format_arg<fmt::v9::detail::default_arg_formatter<char>, fmt::v9::basic_format_context<fmt::v9::appender, char> >(fmt::v9::detail::default_arg_formatter<char>&&, fmt::v9::basic_format_arg<fmt::v9::basic_format_context<fmt::v9::appender, char> > const&) at /usr/include/fmt/core.h:1658
 (inlined by) on_replacement_field at /usr/include/fmt/format.h:4110
 (inlined by) _ZN3fmt2v96detail23parse_replacement_fieldIcRZNS1_10vformat_toIcEEvRNS1_6bufferIT_EENS0_17basic_string_viewIS5_EENS0_17basic_format_argsINS0_20basic_format_contextINSt11conditionalIXsr3std7is_sameINS0_13type_identityIS5_E4typeEcEE5valueENS0_8appenderESt20back_insert_iteratorINS4_ISF_EEEE4typeESF_EEEENS1_10locale_refEE14format_handlerEEPKS5_SS_SS_OT0_ at /usr/include/fmt/core.h:2653
_ZN3fmt2v96detail19parse_format_stringILb0EcZNS1_10vformat_toIcEEvRNS1_6bufferIT_EENS0_17basic_string_viewIS5_EENS0_17basic_format_argsINS0_20basic_format_contextINSt11conditionalIXsr3std7is_sameINS0_13type_identityIS5_E4typeEcEE5valueENS0_8appenderESt20back_insert_iteratorINS4_ISF_EEEE4typeESF_EEEENS1_10locale_refEE14format_handlerEEvNS8_IT0_EEOT1_ at /usr/include/fmt/core.h:2722
 (inlined by) _ZN3fmt2v96detail10vformat_toIcEEvRNS1_6bufferIT_EENS0_17basic_string_viewIS4_EENS0_17basic_format_argsINS0_20basic_format_contextINSt11conditionalIXsr3std7is_sameINS0_13type_identityIS4_E4typeEcEE5valueENS0_8appenderESt20back_insert_iteratorINS3_ISE_EEEE4typeESE_EEEENS1_10locale_refE at /usr/include/fmt/format.h:4136
 (inlined by) seastar::internal::log_buf::inserter_iterator fmt::v9::vformat_to<seastar::internal::log_buf::inserter_iterator, 0>(seastar::internal::log_buf::inserter_iterator, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) at /usr/include/fmt/core.h:3215
 (inlined by) seastar::internal::log_buf::inserter_iterator fmt::v9::format_to<seastar::internal::log_buf::inserter_iterator, utils::tagged_uuid<tasks::task_id_tag>, int&, unsigned long, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&, 0>(seastar::internal::log_buf::inserter_iterator, fmt::v9::basic_format_string<char, fmt::v9::type_identity<utils::tagged_uuid<tasks::task_id_tag> >::type, fmt::v9::type_identity<int&>::type, fmt::v9::type_identity<unsigned long>::type, fmt::v9::type_identity<unsigned int>::type, fmt::v9::type_identity<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&>::type, fmt::v9::type_identity<std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&>::type, fmt::v9::type_identity<nonwrapping_interval<dht::token> const&>::type, fmt::v9::type_identity<std::vector<gms::inet_address, std::allocator<gms::inet_address> >&>::type, fmt::v9::type_identity<std::vector<gms::inet_address, std::allocator<gms::inet_address> >&>::type, fmt::v9::type_identity<char const*&>::type>, utils::tagged_uuid<tasks::task_id_tag>&&, int&, unsigned long&&, unsigned int&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, 
char const*&) at /usr/include/fmt/core.h:3235
 (inlined by) operator() at ././seastar/include/seastar/util/log.hh:222
 (inlined by) seastar::logger::lambda_log_writer<seastar::logger::log<utils::tagged_uuid<tasks::task_id_tag>, int&, unsigned long, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&>(seastar::log_level, seastar::logger::format_info, utils::tagged_uuid<tasks::task_id_tag>&&, int&, unsigned long&&, unsigned int&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&)::{lambda(seastar::internal::log_buf::inserter_iterator)#1}>::operator()(seastar::internal::log_buf::inserter_iterator) at ././seastar/include/seastar/util/log.hh:108
operator() at ./build/release/seastar/./seastar/src/util/log.cc:323
 (inlined by) seastar::logger::do_log(seastar::log_level, seastar::logger::log_writer&) at ./build/release/seastar/./seastar/src/util/log.cc:343
void seastar::logger::log<utils::tagged_uuid<tasks::task_id_tag>, int&, unsigned long, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&>(seastar::log_level, seastar::logger::format_info, utils::tagged_uuid<tasks::task_id_tag>&&, int&, unsigned long&&, unsigned int&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&) at ././seastar/include/seastar/util/log.hh:227
 (inlined by) void seastar::logger::warn<utils::tagged_uuid<tasks::task_id_tag>, int&, unsigned long, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&>(seastar::logger::format_info, utils::tagged_uuid<tasks::task_id_tag>&&, int&, unsigned long&&, unsigned int&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&) at ././seastar/include/seastar/util/log.hh:335
 (inlined by) shard_repair_task_impl::repair_range(nonwrapping_interval<dht::token> const&, utils::tagged_uuid<table_id_tag>) at ./repair/repair.cc:626
operator() at ./repair/repair.cc:911
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}>(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}>(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> > at ././seastar/include/seastar/core/semaphore.hh:682
seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::with_semaphore<seastar::named_semaphore_exception_factory, shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2> >(nonwrapping_interval<dht::token>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2>&&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) std::invoke_result seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >::then_impl<seastar::with_semaphore<seastar::named_semaphore_exception_factory, shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::future<void> >(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:1613
 (inlined by) seastar::internal::future_result<seastar::with_semaphore<seastar::named_semaphore_exception_factory, shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >::future_type seastar::internal::call_then_impl<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> > >::run<seastar::with_semaphore<seastar::named_semaphore_exception_factory, shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}>(seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >&, nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:1246
 (inlined by) std::invoke_result seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >::then<seastar::with_semaphore<seastar::named_semaphore_exception_factory, shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::future<void> >(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:1532
 (inlined by) seastar::futurize<std::invoke_result<shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}>::type>::type seastar::with_semaphore<seastar::named_semaphore_exception_factory, shard_repair_task_impl::do_repair_ranges()::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&) at ././seastar/include/seastar/core/semaphore.hh:681
 (inlined by) operator()<nonwrapping_interval<dht::token> &> at ./repair/repair.cc:910
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<shard_repair_task_impl::do_repair_ranges()::$_15&, nonwrapping_interval<dht::token>&>(shard_repair_task_impl::do_repair_ranges()::$_15&, nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2147
 (inlined by) auto seastar::futurize_invoke<shard_repair_task_impl::do_repair_ranges()::$_15&, nonwrapping_interval<dht::token>&>(shard_repair_task_impl::do_repair_ranges()::$_15&, nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2178
 (inlined by) parallel_for_each<__gnu_cxx::__normal_iterator<nonwrapping_interval<dht::token> *, std::vector<nonwrapping_interval<dht::token>, std::allocator<nonwrapping_interval<dht::token> > > >, __gnu_cxx::__normal_iterator<nonwrapping_interval<dht::token> *, std::vector<nonwrapping_interval<dht::token>, std::allocator<nonwrapping_interval<dht::token> > > >, (lambda at repair/repair.cc:909:55)> at ././seastar/include/seastar/coroutine/parallel_for_each.hh:121
 (inlined by) parallel_for_each<std::vector<nonwrapping_interval<dht::token>, std::allocator<nonwrapping_interval<dht::token> > > &, (lambda at repair/repair.cc:909:55)> at ././seastar/include/seastar/coroutine/parallel_for_each.hh:143
 (inlined by) shard_repair_task_impl::do_repair_ranges() at ./repair/repair.cc:909
shard_repair_task_impl::run() at ./repair/repair.cc:958
tasks::task_manager::task::impl::run_to_completion() at ./tasks/task_manager.cc:76
 (inlined by) tasks::task_manager::task::start() at ./tasks/task_manager.cc:157
start_repair_task(std::unique_ptr<tasks::task_manager::task::impl, std::default_delete<tasks::task_manager::task::impl> >, seastar::shared_ptr<repair_module>, tasks::task_info) at ./repair/repair.cc:896
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<seastar::lw_shared_ptr<tasks::task_manager::task> >::promise_type>::resume() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/coroutine:244
 (inlined by) seastar::internal::coroutine_traits_base<seastar::lw_shared_ptr<tasks::task_manager::task> >::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:78
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2509
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2946
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:3115
operator() at ./build/release/seastar/./seastar/src/core/reactor.cc:4326
 (inlined by) void std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(std::__invoke_other, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:61
 (inlined by) std::enable_if<is_invocable_r_v<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>, void>::type std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&>(seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/invoke.h:111
 (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_94>::_M_invoke(std::_Any_data const&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:290
std::function<void ()>::operator()() const at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/std_function.h:591
 (inlined by) seastar::posix_thread::start_routine(void*) at ./build/release/seastar/./seastar/src/core/posix.cc:73
?? ??:0
?? ??:0

@ShlomiBalalis
Author

The issue was reproduced in the AWS job as well:
scylla version: 2022.2.8-0.20230612.42d16ffb3a4c with build-id 39872d51924dbad16fdeba9b92a6bf036e463e9c
Client version: 3.1.1-0.20230612.401edeb8
Server version: 3.1.1-0.20230612.401edeb8

< t:2023-06-21 02:41:48,341 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool backup -c 752114f5-a229-461b-925d-864248af08e4 --keyspace keyspace1  --location s3:manager-backup-tests-us-east-1 " finished with status 0
< t:2023-06-21 02:41:48,341 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: backup/fa7deb66-d6e5-484a-ad76-8df3dcd995d6
...
< t:2023-06-21 02:42:57,083 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool  -c 752114f5-a229-461b-925d-864248af08e4 progress backup/fa7deb66-d6e5-484a-ad76-8df3dcd995d6" finished with status 0
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: Run:               2afbcbc2-0fdd-11ee-8782-0294f37910fd
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Status:           DONE
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Start time:       21 Jun 23 02:41:47 UTC
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > End time: 21 Jun 23 02:42:02 UTC
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Duration: 14s
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Progress: 100%
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Snapshot Tag:     sm_20230621024149UTC
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Datacenters:      
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG >   - us-eastscylla_node_east
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG >   - us-west-2scylla_node_west
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > 
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╭─────────────┬──────────┬──────────┬──────────┬──────────────┬────────╮
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ Host        │ Progress │     Size │  Success │ Deduplicated │ Failed │
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ├─────────────┼──────────┼──────────┼──────────┼──────────────┼────────┤
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ 10.12.0.131 │     100% │ 945.714M │ 945.714M │     945.714M │      0 │
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ 10.12.1.34  │     100% │ 945.716M │ 945.716M │     945.716M │      0 │
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ 10.15.1.141 │     100% │ 945.717M │ 945.717M │     945.717M │      0 │
< t:2023-06-21 02:42:57,083 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╰─────────────┴──────────┴──────────┴──────────┴──────────────┴────────╯

Afterwards, we truncate the table (keyspace1.standard1) and execute a restore:

< t:2023-06-21 02:42:59,660 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "cqlsh --no-color   --request-timeout=120 --connect-timeout=60  -e "TRUNCATE keyspace1.standard1" 10.12.0.131 9042"...
< t:2023-06-21 02:43:00,661 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "cqlsh --no-color   --request-timeout=120 --connect-timeout=60  -e "TRUNCATE keyspace1.standard1" 10.12.0.131 9042" finished with status 0

< t:2023-06-21 02:43:06,141 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool restore -c 752114f5-a229-461b-925d-864248af08e4 --restore-tables --location s3:manager-backup-tests-us-east-1  --snapshot-tag sm_20230621024149UTC" finished with status 0
< t:2023-06-21 02:43:06,142 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: restore/e0ed8a46-93fd-40eb-88f0-f9a73e791564
...
< t:2023-06-21 02:44:54,664 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "sudo sctool  -c 752114f5-a229-461b-925d-864248af08e4 progress restore/e0ed8a46-93fd-40eb-88f0-f9a73e791564" finished with status 0
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > sctool output: Run:               595b3f8a-0fdd-11ee-8783-0294f37910fd
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Status:           DONE - tombstone_gc mode reset and repair required (see restore docs)
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Start time:       21 Jun 23 02:43:05 UTC
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > End time: 21 Jun 23 02:44:48 UTC
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Duration: 1m42s
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Progress: 100% | 100%
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > Snapshot Tag:     sm_20230621024149UTC
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > 
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╭───────────┬─────────────┬────────┬─────────┬────────────┬────────╮
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ Keyspace  │    Progress │   Size │ Success │ Downloaded │ Failed │
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ├───────────┼─────────────┼────────┼─────────┼────────────┼────────┤
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > │ keyspace1 │ 100% | 100% │ 2.770G │  2.770G │     2.770G │      0 │
< t:2023-06-21 02:44:54,665 f:cli.py          l:1110 c:sdcm.mgmt.cli        p:DEBUG > ╰───────────┴─────────────┴────────┴─────────┴────────────┴────────╯

We then repair the cluster, and every node in the cluster suffers from reactor stalls. In this run, however, the repairs took significantly longer (over 80 minutes in total across the three nodes):

< t:2023-06-21 02:45:43,645 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool  repair "...
< t:2023-06-21 03:15:08,916 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool  repair " finished with status 0
< t:2023-06-21 03:15:08,916 f:cluster.py      l:2572 c:sdcm.cluster_aws     p:DEBUG > Node manager-regression-manager--db-node-c8db16f3-1 [44.214.141.205 | 10.12.0.131] (seed: True): Command '/usr/bin/nodetool  repair ' duration -> 1765.270760808 s
< t:2023-06-21 03:19:37,708 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool  repair "...
< t:2023-06-21 03:43:06,447 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool  repair " finished with status 0
< t:2023-06-21 03:43:06,447 f:cluster.py      l:2572 c:sdcm.cluster_aws     p:DEBUG > Node manager-regression-manager--db-node-c8db16f3-2 [44.195.66.107 | 10.12.1.34] (seed: False): Command '/usr/bin/nodetool  repair ' duration -> 1408.7381757499998 s
< t:2023-06-21 03:44:16,759 f:remote_base.py  l:520  c:RemoteLibSSH2CmdRunner p:DEBUG > Running command "/usr/bin/nodetool  repair "...
< t:2023-06-21 04:18:02,820 f:base.py         l:142  c:RemoteLibSSH2CmdRunner p:DEBUG > Command "/usr/bin/nodetool  repair " finished with status 0
< t:2023-06-21 04:18:02,820 f:cluster.py      l:2572 c:sdcm.cluster_aws     p:DEBUG > Node manager-regression-manager--db-node-c8db16f3-3 [35.91.34.190 | 10.15.1.141] (seed: False): Command '/usr/bin/nodetool  repair ' duration -> 2026.0596863109986 s
2023-06-21T02:45:04+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 807 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e52cf 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x380fa86 0x3801b48 0x52624c1
2023-06-21 02:45:10.476 <2023-06-21 02:45:05.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=45844 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:45:05+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1460 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x2f8fc74 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x380fa86 0x3801b48 0x52624c1
2023-06-21 02:45:13.998 <2023-06-21 02:45:07.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=58317 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:45:07+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 545 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x1019fb 0xfa721 0xfaa69 0x5388187 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
?? ??:0
2023-06-21 02:45:14.175 <2023-06-21 02:45:07.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=58957 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:45:07+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 854 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9668069f39 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
?? ??:0
2023-06-21 02:48:04.434 <2023-06-21 02:46:56.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=72749 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:46:56+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 632 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9668069ae8 0x7f966806a23b 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:48:08.063 <2023-06-21 02:46:56.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=84721 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:46:56+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1270 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f966806a373 0x1b38279 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:48:11.587 <2023-06-21 02:47:02.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=97169 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:47:02+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 887 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9668069eaf 0x2441160 0x12a5955 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 02:48:11.578 <2023-06-21 02:47:02.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=97121 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:47:02+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 594 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e52ee 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
_start at ??:?
2023-06-21 02:49:08.632 <2023-06-21 02:48:32.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=110408 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:48:32+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 638 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9668069ef6 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:49:12.045 <2023-06-21 02:48:33.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=122391 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:48:33+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1416 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e5446 0x7f966806a373 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:49:15.598 <2023-06-21 02:48:52.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=134922 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:48:52+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 861 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x83c7f 0x6be15 0x782f3 0x58d37 0x2ff624c 0x1b38267 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 02:49:15.588 <2023-06-21 02:48:52.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=134874 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:48:52+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 569 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e53df 0x7f9668069ef6 0x2441160 0x12a5955 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
_start at ??:?
2023-06-21 02:52:51.358 <2023-06-21 02:52:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=198557 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:52:38+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 678 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x1019fb 0xfa721 0xfaa69 0x5388187 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
?? ??:0
2023-06-21 02:52:51.368 <2023-06-21 02:52:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=198606 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:52:38+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 971 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xc0d5f 0xc0db6 0x2f8fc74 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
?? ??:0
2023-06-21 02:53:48.845 <2023-06-21 02:53:45.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=211983 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:53:45+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 640 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x37f0240 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:53:52.219 <2023-06-21 02:53:46.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=223929 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:53:46+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1329 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f966806a22c 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:57:55.230 <2023-06-21 02:57:52.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=237564 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:57:52+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 667 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e52de 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:57:58.705 <2023-06-21 02:57:52.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=249551 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:57:52+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1284 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e52f2 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 02:58:02.313 <2023-06-21 02:57:55.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=262337 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:57:55+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 806 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e5437 0x7f966806a373 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 02:58:02.303 <2023-06-21 02:57:55.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=262288 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T02:57:55+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 532 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x4f966f2 0x5388101 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
_start at ??:?
2023-06-21 03:01:59.659 <2023-06-21 03:01:56.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=301274 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:01:56+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 608 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e52ee 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:02:03.396 <2023-06-21 03:01:57.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=313112 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:01:57+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1323 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e545b 0x7f9668069ef6 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:02:07.039 <2023-06-21 03:02:02.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=326003 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:02:02+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 622 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xabd5c 0x7f9668059a81 0x7f966804ce67 0x7f966804d2a3 0x2f8fcce 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
_start at ??:?
2023-06-21 03:02:07.048 <2023-06-21 03:02:02.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=326051 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:02:02+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 919 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xc0da5 0x12a58a6 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
seastar::noncopyable_function<void ()>::operator()() const at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:209
 (inlined by) seastar::thread_context::main() at ./build/release/seastar/./seastar/src/core/thread.cc:299
2023-06-21 03:03:00.993 <2023-06-21 03:02:37.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=339147 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:02:37+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 635 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9668069eaf 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:03:04.369 <2023-06-21 03:02:38.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=351119 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:02:38+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1275 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xbf26f 0x1b381a4 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:05:22.202 <2023-06-21 03:05:19.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=401270 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:05:19+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 619 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xbc01 0x10009 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:05:26.135 <2023-06-21 03:05:20.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=412912 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:05:20+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1401 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f966806a23b 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:05:30.248 <2023-06-21 03:05:25.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=427465 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:05:25+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 503 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f966806a256 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
_start at ??:?
2023-06-21 03:05:30.258 <2023-06-21 03:05:25.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=427513 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:05:25+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 785 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xa107f 0x7f966806a23b 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
?? ??:0
2023-06-21 03:10:48.863 <2023-06-21 03:10:46.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=441094 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:10:46+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 624 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xc2ecf 0xc507b 0xfa544 0xfaa69 0x5388187 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 03:10:53.936 <2023-06-21 03:10:48.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=459024 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:10:48+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 715 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9667eab78f 0x10e5455 0x7f9668069ef6 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:10:54.183 <2023-06-21 03:10:48.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=459867 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:10:48+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1635 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9667eab951 0x10e5386 0x10e542f 0x7f9668069ef6 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 03:10:55.803 <2023-06-21 03:10:48.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=465546 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:10:48+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1103 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0xa0473 0x7f966804d294 0x10e4081 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
?? ??:0
2023-06-21 03:10:55.813 <2023-06-21 03:10:48.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=465594 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:10:48+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1397 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x7f9667eab95b 0x10e5386 0x10e542f 0x7f9668069ef6 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
seastar::noncopyable_function<void ()>::operator()() const at ./build/release/seastar/./seastar/include/seastar/util/noncopyable_function.hh:209
 (inlined by) seastar::thread_context::main() at ./build/release/seastar/./seastar/src/core/thread.cc:299
2023-06-21 03:12:42.629 <2023-06-21 03:12:39.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=478930 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:12:39+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 662 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e5766 0x10e5300 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 03:12:45.981 <2023-06-21 03:12:40.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=490705 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:12:40+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 1422 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e5766 0x11895 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 03:12:49.896 <2023-06-21 03:12:42.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=503582 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:12:42+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 871 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x6d3f8 0x782f3 0x58d37 0x2ff624c 0x1b38267 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:12:49.886 <2023-06-21 03:12:42.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=503533 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:12:42+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 584 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x541c97f
?? ??:0
2023-06-21 03:15:18.095 <2023-06-21 03:15:10.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=540746 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:15:10+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 573 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e52de 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:15:18.273 <2023-06-21 03:15:10.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=541389 node=manager-regression-manager--db-node-c8db16f3-1
2023-06-21T03:15:10+00:00 manager-regression-manager--db-node-c8db16f3-1     !INFO | scylla[7846]: Reactor stalled for 859 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7f9668c8aa1f 0x10e5455 0x7f966806a373 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:18:51.672 <2023-06-21 03:18:45.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=59099 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:18:45+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 543 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x10e5768 0x10e5300 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:18:51.848 <2023-06-21 03:18:45.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=59731 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:18:45+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 841 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x381baeb 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
?? ??:0
2023-06-21 03:27:20.301 <2023-06-21 03:27:09.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=99393 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:27:09+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 923 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x7fe3ece947f8 0x10e5455 0x7fe3ed053373 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 03:27:20.291 <2023-06-21 03:27:09.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=99344 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:27:09+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 619 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x10e5437 0x7fe3ed053373 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
_start at ??:?
2023-06-21 03:32:30.648 <2023-06-21 03:32:27.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=112838 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:32:27+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 684 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0xae88a 0xabd72 0x7fe3ed041e0e 0x7fe3ed035e73 0x7fe3ed0362a3 0x10e4081 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:32:34.013 <2023-06-21 03:32:28.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=124787 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:32:28+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 1365 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x303e824 0x303ec90 0x37ecbcf 0x37efdef 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:33:15.920 <2023-06-21 03:33:02.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=138417 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:33:02+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 649 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x1019fb 0xfa721 0xfaa69 0x5388187 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
?? ??:0
2023-06-21 03:33:19.947 <2023-06-21 03:33:04.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=150109 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:33:04+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 1412 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x7fe3ed05334e 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 03:33:23.636 <2023-06-21 03:33:06.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=163179 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:33:06+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 640 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x10e53e0 0x7fe3ed052ef6 0x10e40e7 0xf59a 0x11821 0x4f966cb 0x538813c 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
?? ??:0
2023-06-21 03:43:21.527 <2023-06-21 03:43:07.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=239898 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:43:07+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 559 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0x2d30312 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:43:21.912 <2023-06-21 03:43:07.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=240462 node=manager-regression-manager--db-node-c8db16f3-2
2023-06-21T03:43:07+00:00 manager-regression-manager--db-node-c8db16f3-2     !INFO | scylla[7767]: Reactor stalled for 867 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fe3edc73a1f 0xf056 0x117c4 0x24410ab 0x12a5955 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:43:17.000 <2023-06-21 03:43:11.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=35467 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:43:11+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 691 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x10e52de 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x380fa86 0x3801b48 0x52624c1
2023-06-21 03:43:26.892 <2023-06-21 03:43:12.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=47579 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:43:12+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 1362 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x1189d 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x380fa86 0x3801b48 0x52624c1
2023-06-21 03:43:45.102 <2023-06-21 03:43:13.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=59342 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:43:13+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 624 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x7fae74302ae8 0x7fae74302ed4 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:43:46.489 <2023-06-21 03:43:13.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=60116 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:43:13+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 945 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x7fae74302ed4 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:46:37.195 <2023-06-21 03:44:30.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=73486 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:44:30+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 648 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x10e5762 0x10e5300 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:46:40.726 <2023-06-21 03:44:31.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=86020 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:44:31+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 1272 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0xfa97b 0xfaa69 0x5388187 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
?? ??:0
2023-06-21 03:55:59.071 <2023-06-21 03:55:55.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=111268 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:55:55+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 590 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x7fae74302ae8 0x7fae7430323b 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:56:02.405 <2023-06-21 03:55:56.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=123116 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:55:56+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 1290 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x2650f 0xc3381 0xc3867 0xc507b 0xfa544 0xfaa69 0x5388187 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 03:56:06.230 <2023-06-21 03:55:57.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=135731 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:55:57+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 1036 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x10e52ee 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 03:56:06.220 <2023-06-21 03:55:57.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=135681 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T03:55:57+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 748 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x7fae741447f8 0x10e5455 0x7fae74302ef6 0x2d3032c 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 04:08:39.719 <2023-06-21 04:08:36.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=212619 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T04:08:36+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 692 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0xbdcff 0xfa523 0xfaa69 0x5388187 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 04:08:43.459 <2023-06-21 04:08:39.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=224801 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T04:08:39+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 1413 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x10e52cf 0x2d30400 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed
2023-06-21 04:08:46.944 <2023-06-21 04:08:41.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=237087 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T04:08:41+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 575 ms on shard 1. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0x7fae74302ae8 0x7fae7430323b 0x2d30307 0xf59a 0x117c4 0x38048d6 0x5388149 0x3803fa0 0x37f022c 0x381e306 0x381bc16 0x38197e7 0x38187f6 0x38227bb 0x4ff3054 0x4ff4437 0x5013485 0x4fc6dea 0x92a4 0x100322
2023-06-21 04:17:59.518 <2023-06-21 04:17:56.000>: (DatabaseLogEvent Severity.ERROR) period_type=one-time event_id=2c6895cc-69ee-44ac-b3d6-73fe26dbdf72: type=REACTOR_STALLED regex=Reactor stalled line_number=253329 node=manager-regression-manager--db-node-c8db16f3-3
2023-06-21T04:17:56+00:00 manager-regression-manager--db-node-c8db16f3-3     !INFO | scylla[7909]: Reactor stalled for 576 ms on shard 0. Backtrace: 0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0xae10e 0xabd72 0x7fae742ef94e 0x7fae742e5ea9 0x7fae742e62a3 0x10e4081 0xf59a 0x11821 0x4f966cb 0x538813c 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed

Logs:
critical - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/critical-c8db16f3.log.tar.gz
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/db-cluster-c8db16f3.tar.gz
debug - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/debug-c8db16f3.log.tar.gz
email_data - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/email_data-c8db16f3.json.tar.gz
error - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/error-c8db16f3.log.tar.gz
event - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/events-c8db16f3.log.tar.gz
left_processes - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/left_processes-c8db16f3.log.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/loader-set-c8db16f3.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/monitor-set-c8db16f3.tar.gz
normal - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/normal-c8db16f3.log.tar.gz
output - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/output-c8db16f3.log.tar.gz
event - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/raw_events-c8db16f3.log.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/sct-c8db16f3.log.tar.gz
summary - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/summary-c8db16f3.log.tar.gz
warning - https://cloudius-jenkins-test.s3.amazonaws.com/c8db16f3-e663-443a-9fb7-03060b42b0ad/20230621_045221/warning-c8db16f3.log.tar.gz

@mykaul
Contributor

mykaul commented Jun 21, 2023

@ShlomiBalalis - why is this under Manager? Should I move it to core?

@karol-kokoszka

Scylla Manager did the backup correctly and then did the restore.
Nodetool completed the repair, and Scylla shows some problems in the logs.

Even if the issue is somehow connected to the way we restore (which I doubt), we cannot debug it without input from Scylla core explaining why these errors appear.

@mykaul
Contributor

mykaul commented Jun 21, 2023

Decode of
0x4fe3862 0x4fe24c0 0x4fe3770 0x7fae74f23a1f 0xae10e 0xabd72 0x7fae742ef94e 0x7fae742e5ea9 0x7fae742e62a3 0x10e4081 0xf59a 0x11821 0x4f966cb 0x538813c 0x3803fa0 0x37f022c 0x381e306 0x384800f 0x381e14b 0x4ff3054 0x4ff4437 0x4ff368c 0x4f99d98 0x4f99271 0x10cf665 0x10ccc1a 0x27b74 0x10cbaed from build 39872d51924dbad16fdeba9b92a6bf036e463e9c above (Enterprise) looks somewhat similar to the open-source one, only with different library versions (fmtv7?):

[Backtrace #0]
void seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}>(seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}&&) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:59
 (inlined by) seastar::backtrace_buffer::append_backtrace_oneline() at ./build/release/seastar/./seastar/src/core/reactor.cc:774
 (inlined by) seastar::print_with_backtrace(seastar::backtrace_buffer&, bool) at ./build/release/seastar/./seastar/src/core/reactor.cc:793
seastar::internal::cpu_stall_detector::generate_trace() at ./build/release/seastar/./seastar/src/core/reactor.cc:1368
seastar::internal::cpu_stall_detector::maybe_report() at ./build/release/seastar/./seastar/src/core/reactor.cc:1110
 (inlined by) seastar::internal::cpu_stall_detector::on_signal() at ./build/release/seastar/./seastar/src/core/reactor.cc:1127
 (inlined by) seastar::reactor::block_notifier(int) at ./build/release/seastar/./seastar/src/core/reactor.cc:1351
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
?? ??:0
basic_ostream at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/ostream:85
 (inlined by) void fmt::v7::detail::format_value<char, seastar::basic_sstring<char, unsigned int, 15u, true> >(fmt::v7::detail::buffer<char>&, seastar::basic_sstring<char, unsigned int, 15u, true> const&, fmt::v7::detail::locale_ref) at /usr/include/fmt/ostream.h:110
 (inlined by) fmt::v7::detail::buffer_appender<char> fmt::v7::detail::fallback_formatter<seastar::basic_sstring<char, unsigned int, 15u, true>, char, void>::format<fmt::v7::detail::buffer_appender<char> >(seastar::basic_sstring<char, unsigned int, 15u, true> const&, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char>&) at /usr/include/fmt/ostream.h:138
 (inlined by) void fmt::v7::detail::value<fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >::format_custom_arg<seastar::basic_sstring<char, unsigned int, 15u, true>, fmt::v7::detail::fallback_formatter<seastar::basic_sstring<char, unsigned int, 15u, true>, char, void> >(void const*, fmt::v7::basic_format_parse_context<char, fmt::v7::detail::error_handler>&, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char>&) at /usr/include/fmt/core.h:1110
?? ??:0
?? ??:0
std::enable_if<true, seastar::internal::log_buf::inserter_iterator>::type fmt::v7::vformat_to<seastar::internal::log_buf::inserter_iterator, fmt::v7::basic_string_view<char>, char, true>(seastar::internal::log_buf::inserter_iterator, fmt::v7::basic_string_view<char> const&, fmt::v7::basic_format_args<fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<fmt::v7::type_identity<char>::type>, fmt::v7::type_identity<char>::type> >) at /usr/include/fmt/core.h:1984
std::enable_if<true, seastar::internal::log_buf::inserter_iterator>::type fmt::v7::format_to<seastar::internal::log_buf::inserter_iterator, char [7], seastar::basic_sstring<char, unsigned int, 15u, true>&, true>(seastar::internal::log_buf::inserter_iterator, char const (&) [7], seastar::basic_sstring<char, unsigned int, 15u, true>&) at /usr/include/fmt/core.h:2005
 (inlined by) operator() at ./build/release/seastar/./seastar/src/util/log.cc:283
 (inlined by) seastar::logger::do_log(seastar::log_level, seastar::logger::log_writer&) at ./build/release/seastar/./seastar/src/util/log.cc:304
void seastar::logger::log<utils::UUID&, int&, unsigned long, unsigned int&, seastar::basic_sstring<char, unsigned int, 15u, true>&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&>(seastar::log_level, seastar::logger::format_info, utils::UUID&, int&, unsigned long&&, unsigned int&, seastar::basic_sstring<char, unsigned int, 15u, true>&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&) at ././seastar/include/seastar/util/log.hh:227
 (inlined by) void seastar::logger::warn<utils::UUID&, int&, unsigned long, unsigned int&, seastar::basic_sstring<char, unsigned int, 15u, true>&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&>(seastar::logger::format_info, utils::UUID&, int&, unsigned long&&, unsigned int&, seastar::basic_sstring<char, unsigned int, 15u, true>&, std::vector<seastar::basic_sstring<char, unsigned int, 15u, true>, std::allocator<seastar::basic_sstring<char, unsigned int, 15u, true> > > const&, nonwrapping_interval<dht::token> const&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, char const*&) at ././seastar/include/seastar/util/log.hh:335
 (inlined by) operator()<std::vector<gms::inet_address>, std::vector<gms::inet_address> > at ./repair/repair.cc:641
seastar::future<void> std::__invoke_impl<seastar::future<void>, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&>(std::__invoke_other, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&) at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/invoke.h:61
 (inlined by) std::__invoke_result<repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&>::type std::__invoke<repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&>(repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&) at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/invoke.h:96
 (inlined by) decltype(auto) std::__apply_impl<repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::tuple<std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> > >&, 0ul, 1ul>(repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::tuple<std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> > >&, std::integer_sequence<unsigned long, 0ul, 1ul>) at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/tuple:1858
 (inlined by) decltype(auto) std::apply<repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::tuple<std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> > >&>(repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&, std::tuple<std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> > >&) at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/tuple:1869
 (inlined by) auto seastar::internal::do_with_impl<std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> >, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13>(std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&) at ././seastar/include/seastar/core/do_with.hh:96
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::future<void> (*&)(std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&), std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> >, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13>(seastar::future<void> (*&)(std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&), std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&) at ././seastar/include/seastar/core/future.hh:2149
 (inlined by) auto seastar::futurize_invoke<seastar::future<void> (*&)(std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&), std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> >, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13>(seastar::future<void> (*&)(std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&), std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&) at ././seastar/include/seastar/core/future.hh:2180
 (inlined by) auto seastar::do_with<std::vector<gms::inet_address, std::allocator<gms::inet_address> >, std::vector<gms::inet_address, std::allocator<gms::inet_address> >, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13>(std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, std::vector<gms::inet_address, std::allocator<gms::inet_address> >&&, repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID)::$_13&&) at ././seastar/include/seastar/core/do_with.hh:131
 (inlined by) repair_info::repair_range(nonwrapping_interval<dht::token> const&, utils::UUID) at ./repair/repair.cc:614
operator() at ./repair/repair.cc:942
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}>(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2149
 (inlined by) auto seastar::futurize_invoke<do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}>(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2180
 (inlined by) operator()<seastar::semaphore_units<seastar::named_semaphore_exception_factory> > at ././seastar/include/seastar/core/semaphore.hh:682
seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::with_semaphore<seastar::named_semaphore_exception_factory, do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2> >(nonwrapping_interval<dht::token>&, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2>&&) at ././seastar/include/seastar/core/future.hh:2149
 (inlined by) std::invoke_result seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >::then_impl<seastar::with_semaphore<seastar::named_semaphore_exception_factory, do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::future<void> >(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:1615
 (inlined by) seastar::internal::future_result<seastar::with_semaphore<seastar::named_semaphore_exception_factory, do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >::future_type seastar::internal::call_then_impl<seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> > >::run<seastar::with_semaphore<seastar::named_semaphore_exception_factory, do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}>(seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >&, nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:1248
 (inlined by) std::invoke_result seastar::future<seastar::semaphore_units<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock> >::then<seastar::with_semaphore<seastar::named_semaphore_exception_factory, do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&)::{lambda(auto:1)#1}, seastar::future<void> >(nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:1534
 (inlined by) seastar::futurize<std::invoke_result<do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}>::type>::type seastar::with_semaphore<seastar::named_semaphore_exception_factory, do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15::operator()<nonwrapping_interval<dht::token>&>(nonwrapping_interval<dht::token>&) const::{lambda()#1}, std::chrono::_V2::steady_clock>(seastar::basic_semaphore<seastar::named_semaphore_exception_factory, std::chrono::_V2::steady_clock>&, unsigned long, std::invoke_result&&) at ././seastar/include/seastar/core/semaphore.hh:681
 (inlined by) operator()<nonwrapping_interval<dht::token> &> at ./repair/repair.cc:941
 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15&, nonwrapping_interval<dht::token>&>(do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15&, nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2149
 (inlined by) auto seastar::futurize_invoke<do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15&, nonwrapping_interval<dht::token>&>(do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15&, nonwrapping_interval<dht::token>&) at ././seastar/include/seastar/core/future.hh:2180
 (inlined by) parallel_for_each<__gnu_cxx::__normal_iterator<nonwrapping_interval<dht::token> *, std::vector<nonwrapping_interval<dht::token> > >, __gnu_cxx::__normal_iterator<nonwrapping_interval<dht::token> *, std::vector<nonwrapping_interval<dht::token> > >, (lambda at repair/repair.cc:940:59)> at ././seastar/include/seastar/coroutine/parallel_for_each.hh:117
 (inlined by) parallel_for_each<std::vector<nonwrapping_interval<dht::token> > &, (lambda at repair/repair.cc:940:59)> at ././seastar/include/seastar/coroutine/parallel_for_each.hh:139
 (inlined by) do_repair_ranges(seastar::lw_shared_ptr<repair_info>) at ./repair/repair.cc:940
std::experimental::coroutine_handle<void>::resume() const at ././seastar/include/seastar/core/std-coroutine.hh:99
 (inlined by) seastar::coroutine::parallel_for_each<do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15>::resume_or_set_callback() at ././seastar/include/seastar/coroutine/parallel_for_each.hh:100
 (inlined by) seastar::coroutine::parallel_for_each<do_repair_ranges(seastar::lw_shared_ptr<repair_info>)::$_15>::run_and_dispose() at ././seastar/include/seastar/coroutine/parallel_for_each.hh:166
seastar::reactor::run_tasks(seastar::reactor::task_queue&) at ./build/release/seastar/./seastar/src/core/reactor.cc:2345
 (inlined by) seastar::reactor::run_some_tasks() at ./build/release/seastar/./seastar/src/core/reactor.cc:2752
seastar::reactor::do_run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2921
seastar::reactor::run() at ./build/release/seastar/./seastar/src/core/reactor.cc:2804
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:265
seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) at ./build/release/seastar/./seastar/src/core/app-template.cc:156
scylla_main(int, char**) at ./main.cc:566
std::function<int (int, char**)>::operator()(int, char**) const at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/std_function.h:590
 (inlined by) main at ./main.cc:1732
?? ??:0
_start at ??:?

@ShlomiBalalis
Author

Scylla Manager did the backup correctly and then did the restore. Nodetool completed the repair, and Scylla shows some problems in the logs.

I figured as much. I just wanted your input to be sure.

Even if the issue is somehow connected to the way we restore (which I doubt), we cannot debug it without input from Scylla core explaining why these errors appear.

Shall I move the issue to the core, then?

Please provide decoded stalls, at least for some of them. For example, I've never seen the last one (though of course I may not be familiar with all of them):

I will decode each stall and add them to the issue in a separate file.

@mykaul mykaul transferred this issue from scylladb/scylla-manager Jun 21, 2023
@ShlomiBalalis
Author

scylla version: 5.2.1-0.20230508.f1c45553bc29 with build-id 88ac66b1719cc7c5b7e982aa34ba5dc95909b84a
Client version: 3.1.1-0.20230612.401edeb8
Server version: 3.1.1-0.20230612.401edeb8

All of the backtraces from the initial Azure run, decoded:
azure_run_decodings.txt

@mykaul
Contributor

mykaul commented Jun 21, 2023

Thanks, they all look the same to me (same area), but someone else will have to verify that.

@ShlomiBalalis
Author

Thanks, they all look the same to me (same area), but someone else will have to verify that.

They do, but there are slight differences here and there so I included them all (minus the duplicates)

@michoecho
Contributor

michoecho commented Jun 21, 2023

The issue is clear:

do_repair_ranges() runs repair_range() on each range via parallel_for_each():

scylladb/repair/repair.cc

Lines 948 to 950 in 643e69a

co_await coroutine::parallel_for_each(ranges, [this, table_id] (auto&& range) {
    return with_semaphore(rs.get_repair_module().range_parallelism_semaphore(), 1, [this, &range, table_id] {
        return repair_range(range, table_id).then([this] {

But there are 25600 ranges, and parallel_for_each() does not preempt: it synchronously calls each of the passed objects to obtain futures. parallel_for_each() returns only once every object has either completed or been turned into a future.

Normally this wouldn't be much of a problem (I guess, because we didn't bump into the issue before), but in this scenario each object passed to parallel_for_each prints an expensive log before returning a future:

Jun 21 02:31:20 manager-regression-manager--db-node-eastus-1 scylla[7666]:  [shard 0] repair - repair[8f7a1e0e-9db5-4bde-a830-aec4738ef20a]: Repair 19785 out of 25600 ranges,  shard=0, keyspace=ks003, table={table0073, table0097, table0057, table0034, table0061, table009, table0020, table0019, table0046, table0079, table0029, table002, table0022, table0035, table0096, table0069, table0081, table0058, table003, table0066, table0012, table0047, table0025, table0086, table0098, table0026, table0094, table0064, table0054, table0018, table0089, table0090, table0093, table0030, table0074, table0038, table0083, table0048, table0044, table0011, table0021, table0043, table0017, table0060, table0082, table004, table0027, table008, table0036, table0084, table0045, table0016, table0013, table0023, table0071, table0015, table0014, table0063, table0052, table0091, table0055, table007, table0088, table0092, table0010, table0037, table0042, table0059, table0095, table0099, table0078, table0033, table0072, table0065, table0056, table0028, table0080, table0039, table0031, table005, table001, table0051, table0062, table0053, table0032, table0041, table0077, table0087, table0050, table0068, table006, table0085, table0040, table000, table0067, table0075, table0070, table0049, table0076, table0024}, range=(-4829075541514926967, -4782389272508932393], peers={}, live_peers={}, status=skipped_no_followers

So there are 25600 expensive logs that have to be printed in a single reactor cycle, and this is the stall.

In normal operation, the first several objects would start repairs and the rest would immediately block on the limited concurrency semaphore, without doing much work — so even though all 25600 function calls would be executed simultaneously, it wouldn't cause a (big) stall (I guess). But in this scenario the ranges are skipped (and finish synchronously), so the semaphore is released immediately, and each item gets to run some expensive work before being turned into a future, which causes a stall.
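
To make the difference between the two scenarios concrete, here is a rough toy model in plain C++ (no Seastar involved). The function name `bodies_run_in_first_pass` and the figure of 16 semaphore units are made up for illustration; the real limit is whatever range_parallelism_semaphore() is configured with. It only counts how many range bodies, and therefore how many log lines, execute during the initial synchronous pass.

```cpp
#include <cstddef>

// Toy model, not Seastar code: counts how many range bodies run synchronously
// while a parallel_for_each-style loop walks over all ranges once.
// `units` is a hypothetical semaphore limit; `skipped` models ranges that
// finish synchronously and hand their unit back immediately.
std::size_t bodies_run_in_first_pass(std::size_t ranges, std::size_t units, bool skipped) {
    std::size_t available = units;
    std::size_t ran = 0;
    for (std::size_t i = 0; i < ranges; ++i) {
        if (available == 0) {
            continue;     // becomes a cheap waiter on the semaphore, no body runs
        }
        --available;      // unit acquired
        ++ran;            // the body (including its log line) runs right away
        if (skipped) {
            ++available;  // skipped range completes synchronously, unit returned
        }
        // otherwise the repair stays in flight and keeps holding the unit
    }
    return ran;
}
```

With 25600 ranges and 16 units, this model runs 16 bodies in the normal case and all 25600 in the skipped case, which matches the explanation above.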

Quick fix: add yield().then( immediately before the repair_range() call. This will limit the work done synchronously by the tasks and force them to go through the semaphore, bringing this scenario in line with the normal scenario.
True fix: add a variant of parallel_for_each which preempts between items.
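
To make the difference concrete, here is a minimal, stdlib-only toy model of the situation described above. It is a sketch and not Seastar or Scylla code: `ToyReactor`, `log_skipped_range`, and both `longest_turn_*` helpers are hypothetical names invented for illustration. Each queued task counts as one "turn", one logged range costs one unit of work, and the cost of scheduling itself is ignored. The real reactor decides when to preempt differently, so only the shape of the result carries over.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy single-threaded run loop (hypothetical, for illustration only).
// It runs queued tasks one at a time and remembers the largest amount of
// work any single task performed before returning control to the loop.
class ToyReactor {
public:
    using Task = std::function<void()>;

    void schedule(Task t) { queue_.push_back(std::move(t)); }

    // Account for synchronous work done inside the currently running task.
    void charge(std::size_t units) { current_ += units; }

    void run() {
        while (!queue_.empty()) {
            Task t = std::move(queue_.front());
            queue_.pop_front();
            current_ = 0;
            t();
            longest_ = std::max(longest_, current_);
        }
    }

    std::size_t longest_turn() const { return longest_; }

private:
    std::deque<Task> queue_;
    std::size_t current_ = 0;
    std::size_t longest_ = 0;
};

// Stand-in for the skipped-range path: it only "logs", costing one unit.
inline void log_skipped_range(ToyReactor& r) { r.charge(1); }

// No yield: every skipped range is handled inside one task, so all of the
// logging lands in a single turn.
inline std::size_t longest_turn_without_yield(std::size_t ranges) {
    ToyReactor r;
    r.schedule([&r, ranges] {
        for (std::size_t i = 0; i < ranges; ++i) {
            log_skipped_range(r);
        }
    });
    r.run();
    return r.longest_turn();
}

// Yield before each range: the per-range work is re-queued as its own task,
// so the loop regains control between ranges.
inline std::size_t longest_turn_with_yield(std::size_t ranges) {
    ToyReactor r;
    r.schedule([&r, ranges] {
        for (std::size_t i = 0; i < ranges; ++i) {
            r.schedule([&r] { log_skipped_range(r); });
        }
    });
    r.run();
    return r.longest_turn();
}
```

In this model, `longest_turn_without_yield(25600)` reports all 25600 units in one turn, while `longest_turn_with_yield(25600)` never does more than one unit per turn. The total work is unchanged. The yield only splits it into pieces the loop can interleave with other tasks.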

@mykaul mykaul added the P1 Urgent label Jun 26, 2023
@mykaul
Contributor

mykaul commented Jun 26, 2023

@eliransin - Michal diagnosed the issue and there's a quick fix and longer term items above - can you assign it?

@mykaul mykaul changed the title from "Repairing a cluster after a restore causes severe reactor stalls throughout the cluster" to "Repairing a cluster after a restore causes severe reactor stalls throughout the cluster (due to expensive logging within do_repair_ranges() without yield)" Jun 26, 2023
@DoronArazii DoronArazii added this to the 5.4 milestone Aug 6, 2023
@mykaul
Contributor

mykaul commented Aug 9, 2023

@eliransin - ping?

@DoronArazii DoronArazii modified the milestone: 5.4 Sep 3, 2023
@mykaul
Contributor

mykaul commented Oct 26, 2023

@eliransin - ping?

@mykaul mykaul modified the milestones: 5.4, 6.0 Oct 29, 2023
@eliransin eliransin assigned asias and unassigned eliransin Oct 29, 2023
@mykaul
Contributor

mykaul commented Oct 29, 2023

@denesb - please see what needs to be done here - is it 6.0 material? earlier? backlog?

@denesb
Contributor

denesb commented Oct 30, 2023

@denesb - please see what needs to be done here - is it 6.0 material? earlier? backlog?

This is such a simple fix, there is no point in pushing it out. I will send a PR with the fix today.

denesb added a commit to denesb/scylla that referenced this issue Oct 30, 2023
…nges

We have observed do_repair_ranges() receiving tens of thousands of
ranges to repair on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda will become synchronous, only logging a
message. This can cause stalls because there are no opportunities to
yield. Solve this by adding an explicit yield.

Fixes: scylladb#14330
@denesb
Contributor

denesb commented Oct 30, 2023

Fix here: #15879.

@denesb denesb self-assigned this Oct 30, 2023
denesb added a commit to denesb/scylla that referenced this issue Oct 31, 2023
avikivity pushed a commit that referenced this issue Nov 8, 2023
…nges

We have observed do_repair_ranges() receiving tens of thousands of
ranges to repair on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda will become synchronous, only logging a
message. This can cause stalls because there are no opportunities to
yield. Solve this by adding an explicit yield.

Fixes: #14330

Closes #15879

(cherry picked from commit 90a8489)
avikivity pushed a commit that referenced this issue Nov 8, 2023
@avikivity
Member

Backported to 5.2, 5.4. Skipped 5.1, it's performance only.
