system_auth RF change to 3 fails: got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem) #9751

Closed
vponomaryov opened this issue Dec 7, 2021 · 19 comments

@vponomaryov
Contributor

vponomaryov commented Dec 7, 2021

Installation details
Kernel version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.7.dev-0.20211126.44f4ea38c with build-id a88083cca1208179dd36a3847af403f77ad794f0
Cluster size: 5 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0238a987fdfc50310 (aws: eu-west-1)

Scylla running with the following shard counts (live nodes):
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 (3.248.218.248 | 10.0.2.196): 4 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 (54.246.70.108 | 10.0.3.20): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 (34.244.105.91 | 10.0.3.96): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 (34.251.238.10 | 10.0.1.158): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 (54.246.48.89 | 10.0.0.44): 7 shards

Test: longevity-large-partition-asymmetric-cluster-3h
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Issue description
When running on the asymmetric cluster, changing the system_auth RF to 3 fails with the following error on Node 1:
got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)

All other nodes fail with the following error:
Failed to read a fragment from the reader, keyspace=system_auth, table=roles, range=[{-8833127432019027370, end}, {-8822971771378640915, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

More details:

Node 1:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
...
Dec 07 01:51:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 2] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 2 completed successfully, keyspace=system_auth
...
Dec 07 01:51:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 3] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 3 completed successfully, keyspace=system_auth
...
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 0, keyspace=system_auth, cf=roles, range=(-8421410461930041608, -8417409559063327415], got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)
...
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 1] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 1, keyspace=system_auth, cf=roles, range=(-3886665998130067779, -3884745563637348092], got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)
...
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 0, keyspace=system_auth, cf=role_attributes, range=(-744873554611290834, -741035112301363350], got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)
...
Dec 07 04:21:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair[dabb88a7-3a77-4a4c-8a38-30cf7d323d84]: Started to shutdown off-strategy compaction updater
Dec 07 04:21:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair[dabb88a7-3a77-4a4c-8a38-30cf7d323d84]: Finished to shutdown off-strategy compaction updater
Dec 07 04:21:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair_tracker run for repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] failed: std::runtime_error ({shard 0: std::runtime_error (Failed to repair for keyspace=system_auth, cf=roles, range=(-9210611041549644410, -9179067137543748740]), shard 1: std::runtime_error (Failed to repair for keyspace=system_auth, cf=role_members, range=(-3997244064957500063, -3981095411310064477])})

Node 2:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e15430-5700-11ec-844e-13d36e5eb614] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-30-big-Data.db:level=0]. 11kB to 6kB (~52% of original) in 51ms = 118kB/s. ~256 total partitions merged to 6.
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/199229440 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      10        10        160K        system_auth.roles/repair-meta/active/unused
                                                                                      10        0        0B        system_auth.roles/shard-reader/waiting
                                                                                      
                                                                                      20        10        160K        total
                                                                                      
                                                                                      Total: 20 permits with 10 count and 160K memory resources
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=roles, range=[{-8833127432019027370, end}, {-8822971771378640915, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Node 3:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e0df00-5700-11ec-b417-37d90f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-42-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 51ms = 118kB/s. ~256 total partitions merged to 6.
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 37d44ad0-5707-11ec-a43f-37da0f9b64b9] Compacting [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-13-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-7-big-Data.db:level=0:origin=streaming]
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] compaction - [Compact system_schema.columns 37d44ad0-5707-11ec-b417-37d90f9b64b9] Compacting [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-30-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-36-big-Data.db:level=0:origin=compaction]
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 37d44ad0-5707-11ec-87b3-37d50f9b64b9] Compacting [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-16-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-10-big-Data.db:level=0:origin=streaming]
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 37d44ad0-5707-11ec-87b3-37d50f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-22-big-Data.db:level=0]. 69kB to 134kB (~193% of original) in 13ms = 10MB/s. ~256 total partitions merged to 2.
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] compaction - [Compact system_schema.columns 37d44ad0-5707-11ec-b417-37d90f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/md-42-big-Data.db:level=0]. 26kB to 18kB (~68% of original) in 13ms = 1MB/s. ~256 total partitions merged to 6.
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 37d44ad0-5707-11ec-a43f-37da0f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-19-big-Data.db:level=0]. 91kB to 144kB (~157% of original) in 18ms = 8MB/s. ~256 total partitions merged to 2.
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/199229440 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      5        5        80K        system_auth.role_members/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.roles/repair-meta/active/unused
                                                                                      2        2        32K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      3        0        0B        system_auth.role_attributes/shard-reader/waiting
                                                                                      6        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      4        0        0B        system_auth.roles/shard-reader/waiting
                                                                                      
                                                                                      23        10        160K        total
                                                                                      
                                                                                      Total: 23 permits with 10 count and 160K memory resources
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=roles, range=[{4797201525887148681, end}, {4798014872356268772, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Node 4:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e0b7f0-5700-11ec-b215-383abf074923] Compacted 3 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-54-big-Data.db:level=0]. 17kB to 6kB (~35% of original) in 61ms = 101kB/s. ~384 total partitions merged to 6.
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] compaction - [Compact system_schema.columns b729a4b0-5707-11ec-b215-383abf074923] Compacting [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-54-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-48-big-Data.db:level=0:origin=compaction]
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 b729f2d0-5707-11ec-9658-383bbf074923] Compacting [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-16-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-10-big-Data.db:level=0:origin=streaming]
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 b72a19e0-5707-11ec-9399-383cbf074923] Compacting [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-7-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-13-big-Data.db:level=0:origin=streaming]
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 b729f2d0-5707-11ec-9658-383bbf074923] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-22-big-Data.db:level=0]. 69kB to 134kB (~192% of original) in 9ms = 14MB/s. ~256 total partitions merged to 2.
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] compaction - [Compact system_schema.columns b729a4b0-5707-11ec-b215-383abf074923] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/md-60-big-Data.db:level=0]. 26kB to 18kB (~68% of original) in 14ms = 1MB/s. ~256 total partitions merged to 6.
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 b72a19e0-5707-11ec-9399-383cbf074923] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-19-big-Data.db:level=0]. 161kB to 163kB (~100% of original) in 17ms = 9MB/s. ~256 total partitions merged to 2.
Dec 07 04:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/199229440 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      4        4        64K        system_auth.role_members/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.roles/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      3        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      3        0        0B        system_auth.roles/shard-reader/evicted
                                                                                      1        0        0B        system_auth.role_members/shard-reader/evicted
                                                                                      3        0        0B        system_auth.role_attributes/shard-reader/evicted
                                                                                      
                                                                                      20        10        160K        total
                                                                                      
                                                                                      Total: 20 permits with 10 count and 160K memory resources
Dec 07 04:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=role_members, range=[{3069961132509719081, end}, {3077107436340118180, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Node 5:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e28cb0-5700-11ec-af32-62308cdb0fb6] Compacted 3 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-84-big-Data.db:level=0]. 17kB to 6kB (~36% of original) in 62ms = 100kB/s. ~384 total partitions merged to 6.
Dec 07 02:17:37 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:46704: server connection dropped: read: Connection reset by peer
Dec 07 02:17:37 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 6] rpc - client 162.142.125.194:41236: wrong protocol magic: 160301007b010000
Dec 07 02:17:37 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 6] rpc - client 162.142.125.194:41236: server connection dropped: connection is closed
Dec 07 02:17:38 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:45178: wrong protocol magic: 474554202f204854
Dec 07 02:17:38 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:45178: server connection dropped: connection is closed
Dec 07 02:17:39 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:41776: server connection dropped: read: Connection reset by peer
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 2] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 36f9faf0-5708-11ec-8fbc-622e8cdb0fb6] Compacting [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-16-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-9-big-Data.db:level=0:origin=streaming]
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 36f9faf0-5708-11ec-8592-622d8cdb0fb6] Compacting [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-11-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-18-big-Data.db:level=0:origin=streaming]
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 2] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 36f9faf0-5708-11ec-8fbc-622e8cdb0fb6] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-23-big-Data.db:level=0]. 161kB to 163kB (~100% of original) in 12ms = 13MB/s. ~256 total partitions merged to 2.
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 36f9faf0-5708-11ec-8592-622d8cdb0fb6] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-25-big-Data.db:level=0]. 115kB to 170kB (~147% of original) in 14ms = 12MB/s. ~256 total partitions merged to 2.
Dec 07 02:51:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 1] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/170792058 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      5        5        80K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.role_members/repair-meta/active/unused
                                                                                      2        2        32K        system_auth.roles/repair-meta/active/unused
                                                                                      6        0        0B        system_auth.roles/shard-reader/waiting
                                                                                      6        0        0B        system_auth.role_attributes/shard-reader/waiting
                                                                                      4        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      
                                                                                      26        10        160K        total
                                                                                      
                                                                                      Total: 26 permits with 10 count and 160K memory resources
Dec 07 02:51:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 1] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=role_members, range=[{4371617655742164722, end}, {4451096726970068417, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/170792058 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      8        8        128K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      1        1        16K        system_auth.role_members/repair-meta/active/unused
                                                                                      1        1        16K        system_auth.roles/repair-meta/active/unused
                                                                                      1        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      8        0        0B        system_auth.role_attributes/shard-reader/waiting
                                                                                      1        0        0B        system_auth.roles/shard-reader/evicted
                                                                                      
                                                                                      20        10        160K        total
                                                                                      
                                                                                      Total: 20 permits with 10 count and 160K memory resources
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=role_attributes, range=[{-744873554611290834, end}, {-741035112301363350, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Restore Monitor Stack command: $ hydra investigate show-monitor 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb

Test id: 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/db-cluster-9c1ecaf6.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/loader-set-9c1ecaf6.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/monitor-set-9c1ecaf6.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/sct-runner-9c1ecaf6.tar.gz

Jenkins job URL

@roydahan roydahan added the triage/master Looking for assignee label Dec 8, 2021
@slivne
Contributor

slivne commented Dec 8, 2021

@asias please have a look

@asias
Contributor

asias commented Dec 8, 2021

@denesb I think we probably have a reproducer for the reader deadlock. The error in the log suggests we spent 30 minutes trying to read a single fragment.

        auto timeout = db::timeout_clock::now() + std::chrono::minutes(30);
        _reader.set_timeout(timeout);   // reset to db::no_timeout in pause()
        return _reader().then_wrapped([this] (future<mutation_fragment_opt> f) {
            try {
                auto mfopt = f.get0();
                ++_reads_finished;
                return mfopt;
            } catch (seastar::timed_out_error& e) {
                rlogger.warn("Failed to read a fragment from the reader, keyspace={}, table={}, range={}: {}",
                    _schema->ks_name(), _schema->cf_name(), _range, e);
                throw;
            } catch (...) {
                throw;
            }
        });

@vponomaryov
Contributor Author

One more:

Installation details
Kernel version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.7.dev-0.20211126.44f4ea38c with build-id a88083cca1208179dd36a3847af403f77ad794f0
Cluster size: 5 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0238a987fdfc50310 (aws: eu-west-1)

Scylla running with the following shard counts (live nodes):
longevity-large-partitions-3h-maste-db-node-1f9bbacb-1 (54.246.63.21 | 10.0.2.27): 4 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-2 (54.217.62.85 | 10.0.0.177): 6 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-3 (34.243.117.11 | 10.0.3.63): 5 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-4 (54.75.60.157 | 10.0.0.142): 5 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-5 (54.170.181.141 | 10.0.1.209): 5 shards

Test: longevity-large-partition-asymmetric-cluster-3h
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Restore Monitor Stack command: $ hydra investigate show-monitor 1f9bbacb-e846-49db-8a1d-10657b1904f7
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 1f9bbacb-e846-49db-8a1d-10657b1904f7

Test id: 1f9bbacb-e846-49db-8a1d-10657b1904f7

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/db-cluster-1f9bbacb.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/loader-set-1f9bbacb.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/monitor-set-1f9bbacb.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/sct-runner-1f9bbacb.tar.gz

Jenkins job URL

@denesb
Contributor

denesb commented Dec 14, 2021

Looks like there is a genuine problem here. @asias do you know when this started happening? Or is this that same phantom block that has been plaguing us for some time now?

@asias
Contributor

asias commented Dec 15, 2021

@vponomaryov can you tell us the history of this test? When did it first start to fail, and with which Scylla commit? I do not see the longevity-large-partition-asymmetric-cluster-3h test in the past. Perhaps it is a recent SCT test.

@vponomaryov
Contributor Author

@vponomaryov can you tell us the history of this test? When did it first start to fail, and with which Scylla commit? I do not see the longevity-large-partition-asymmetric-cluster-3h test in the past. Perhaps it is a recent SCT test.

I walked through different builds of that job and can tell that it is a racy bug.
The version where it was caught last time (4.7.dev-0.20211126.44f4ea38c with build-id a88083cca1208179dd36a3847af403f77ad794f0) also had builds that did not hit it.

Another build where it was also present, but with a slightly different failure:
4.6.dev-0.20211101.9e8fc6358 with build-id 62d8c292f9ddc15e61ebd6cd609ac91b80e1532a
Failure from there:

Traceback (most recent call last):                                                                     
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 127, in wrapper                        
    return method(*args, **kwargs)                                                                     
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 113, in inner                
    res = func(*args, **kwargs)                                                                        
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 546, in setUp                          
    self.set_system_auth_rf()                                                                          
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 603, in set_system_auth_rf             
    node.run_nodetool(sub_cmd="repair", args="-- system_auth")                                         
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2687, in run_nodetool                 
    result = self.remoter.run(cmd, timeout=timeout, ignore_status=ignore_status, verbose=verbose)   
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 613, in run                
    result = _run()                                                                                    
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 64, in inner                 
    return func(*args, **kwargs)                                                                       
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 604, in _run               
    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)              
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 537, in _run_execute       
    result = connection.run(**command_kwargs)                                                          
  File "<decorator-gen-3>", line 2, in run                                                             
  File "/usr/local/lib/python3.10/site-packages/fabric/connection.py", line 30, in opens               
    return method(self, *args, **kwargs)                                                               
  File "/usr/local/lib/python3.10/site-packages/fabric/connection.py", line 723, in run                
    return self._run(self._remote_runner(), command, **kwargs)                                         
  File "/usr/local/lib/python3.10/site-packages/invoke/context.py", line 102, in _run                  
    return runner.run(command, **kwargs)                                                               
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 380, in run                   
    return self._run_body(command, **kwargs)                                                           
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 442, in _run_body             
    return self.make_promise() if self._asynchronous else self._finish()                               
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 509, in _finish               
    raise UnexpectedExit(result)                                                                       
invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!                                 
                                                                                                       
Command: '/usr/bin/nodetool  repair -- system_auth'                                                    
                                                                                                       
Exit code: 2                                                                                           
                                                                                                       
Stdout:                                                                                                
                                                                                                       
[2021-11-05 04:33:20,720] Starting repair command #1, repairing 1 ranges for keyspace system_auth (parallelism=SEQUENTIAL, full=true)
[2021-11-05 05:03:22,070] Repair session 1 failed                                                      
[2021-11-05 05:03:22,070] Repair session 1 finished                                                    
                                                                                                       
Stderr:                                                                                                
                                                                                                       
error: Repair job has failed with the error message: [2021-11-05 05:03:22,070] Repair session 1 failed
-- StackTrace --                                                                                       
java.lang.RuntimeException: Repair job has failed with the error message: [2021-11-05 05:03:22,070] Repair session 1 failed
    at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:124)                         
    at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

CI link: https://jenkins.scylladb.com/view/master/job/scylla-master/job/longevity/job/longevity-large-partition-asymmetric-cluster-3h/40

A couple more hits:
4.6.dev-0.20210922.bae9c042c2 with build-id 9c8634cb7c1451d444f62e0547f55bc794a2f3f7
CI link: https://jenkins.scylladb.com/view/master/job/scylla-master/job/longevity/job/longevity-large-partition-asymmetric-cluster-3h/33/

4.6.dev-0.20210922.bae9c042c2 with build-id 9c8634cb7c1451d444f62e0547f55bc794a2f3f7
CI link: https://jenkins.scylladb.com/view/master/job/scylla-master/job/longevity/job/longevity-large-partition-asymmetric-cluster-3h/32

So, the build from September 22, 2021 already had this problem, and there are no older builds in Jenkins to check.

@soyacz
Contributor

soyacz commented Dec 17, 2021

Happened again. Test details
Test: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Build number: 52
Backend: aws: eu-west-1
Kernel version: 5.11.0-1022-aws
Test-id: 8cc236e7-a1ba-4ff5-b386-feabff9ff8ea
Scylla version: 4.7.dev-0.20211215.3ac622bdd with build-id c19c7740f10f82f554c1b67da69f82fc48f9386f (ami-0d8ea5e3e5d4fbb40)
Instance type: i3.2xlarge
Number of scylladb nodes: 5

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/db-cluster-8cc236e7.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/loader-set-8cc236e7.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/monitor-set-8cc236e7.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/sct-runner-8cc236e7.tar.gz

@roydahan roydahan added master/high and removed triage/master Looking for assignee labels Dec 19, 2021
@denesb denesb self-assigned this Dec 20, 2021
@denesb
Contributor

denesb commented Dec 21, 2021

The problem is that, in the case of non-local reads, each repair has to obtain 2 permits on the shard where the repair-meta lives:

  • one in either insert_repair_meta() or run();
  • one for the shard reader.

If the semaphore is contested, the repair-metas can block their own shard readers from being admitted.
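
For illustration, here is a minimal, self-contained sketch (plain C++20 with std::counting_semaphore, not Scylla/Seastar code) of the pattern described above: every unit of a bounded semaphore is held by a "repair-meta", so the second acquisition each repair needs for its shard reader can never be admitted and times out, mirroring the "Semaphore timed out: _streaming_concurrency_sem" errors in the logs.

#include <chrono>
#include <iostream>
#include <latch>
#include <semaphore>
#include <thread>
#include <vector>

int main() {
    constexpr int capacity = 4;                       // stands in for the 10-unit streaming semaphore
    std::counting_semaphore<capacity> sem(capacity);
    std::latch all_holding(capacity);                 // makes the starvation deterministic for the demo
    std::vector<std::thread> repairs;

    for (int i = 0; i < capacity; ++i) {
        repairs.emplace_back([&, i] {
            sem.acquire();                            // first permit: the repair-meta itself
            all_holding.count_down();
            all_holding.wait();                       // now every unit is held by a repair-meta
            // Second permit: the shard reader. No unit can be released while
            // everyone is blocked here, so the acquisition times out.
            if (!sem.try_acquire_for(std::chrono::seconds(1))) {
                std::cout << "repair " << i << ": shard reader timed out waiting for a permit\n";
            } else {
                sem.release();
            }
            sem.release();                            // release the repair-meta permit
        });
    }
    for (auto& t : repairs) {
        t.join();
    }
}
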

@slivne
Contributor

slivne commented Dec 21, 2021

So what can we do?

@denesb
Contributor

denesb commented Dec 21, 2021

I'll cook a patch which ensures only a single blocking admission happens on each shard.

@asias
Contributor

asias commented Dec 30, 2021

I'll cook a patch which ensures only a single blocking admission happens on each shard.

Thanks.

BTW, if you send a PR instead of email patches, with one click people can find the relevant patches. Now, I have to search emails to find them. I do not know the subjects of the patches, so I have to search all the emails from you.

@denesb
Contributor

denesb commented Jan 4, 2022

BTW, if you send a PR instead of email patches, with one click people can find the relevant patches. Now, I have to search emails to find them. I do not know the subjects of the patches, so I have to search all the emails from you.

I know, this is one of the good things PRs have going for them. They didn't work out for me for other reasons. I could have still posted the name of the patch, sorry for that.

@juliayakovlev

Reproduced with 4.7.dev-0.20211230

Installation details
Kernel version: 5.11.0-1022-aws
Scylla version (or git commit hash): 4.7.dev-0.20211230.12fa68fe6 with build-id 938d162cf1be1483853bf5b5417f5daf11e5ac24
Cluster size: 5 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0668e251eb4f0f394 (aws: eu-west-1)

Scylla running with the following shard counts (live nodes):
longevity-large-partitions-3h-maste-db-node-64072fe8-1 (54.170.181.144 | 10.0.0.78): 4 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-2 (52.214.100.51 | 10.0.2.59): 7 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-3 (34.244.15.50 | 10.0.1.77): 5 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-4 (54.170.249.5 | 10.0.1.254): 5 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-5 (34.242.237.2 | 10.0.2.233): 7 shards

Test: longevity-large-partition-asymmetric-cluster-3h
Test name: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Test config file(s):

Restore Monitor Stack command: $ hydra investigate show-monitor 64072fe8-2e5b-42de-9d5e-6fed13d3853e
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 64072fe8-2e5b-42de-9d5e-6fed13d3853e

Test id: 64072fe8-2e5b-42de-9d5e-6fed13d3853e

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/64072fe8-2e5b-42de-9d5e-6fed13d3853e/20220109_073401/db-cluster-64072fe8.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/64072fe8-2e5b-42de-9d5e-6fed13d3853e/20220109_073401/sct-runner-64072fe8.tar.gz

Jenkins job URL

@denesb
Contributor

denesb commented Jan 12, 2022

The fix is not merged yet.

denesb pushed a commit that referenced this issue Jan 17, 2022
…s" from Botond

"
Repair obtains a permit for each repair-meta instance it creates. This
permit is supposed to track all resources consumed by that repair, as
well as to ensure the concurrency limit is respected. However, when the
non-local reader path is used (shard config of the master != shard config of
the follower), a second permit is obtained -- for the shard reader of
the multishard reader. This creates a situation where the repair-meta's
permit can block the shard reader's permit, resulting in a deadlock.
This patch solves this by dropping the count resource on the
repair-meta's permit when the non-local reader path is executed -- that is,
when a multishard reader is created.

Fixes: #9751
"

* 'repair-double-permit-block/v4' of https://github.com/denesb/scylla:
  repair: make sure there is one permit per repair with count res
  reader_permit: add release_base_resource()

(cherry picked from commit 52b7778)
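
As a rough illustration of the approach (hypothetical types and function bodies; only release_base_resource() is named after the helper the patch introduces, and this is not the actual Scylla reader_permit API), the repair-meta's permit gives up its count unit as soon as the non-local, multishard reader path is taken, so each repair holds at most one blocking admission on the semaphore at any time:

#include <semaphore>

std::counting_semaphore<> streaming_sem(10);          // stands in for _streaming_concurrency_sem

struct repair_meta_permit {
    bool holds_count = true;
    repair_meta_permit() { streaming_sem.acquire(); } // taken in insert_repair_meta() / run()
    void release_base_resource() {                    // analogue of the helper added by the patch
        if (holds_count) {
            streaming_sem.release();
            holds_count = false;
        }
    }
    ~repair_meta_permit() { release_base_resource(); }
};

void read_over_multishard_reader(repair_meta_permit& permit) {
    permit.release_base_resource();  // non-local path: drop our count so the shard reader can be admitted
    streaming_sem.acquire();         // the shard reader's own admission
    // ... read and compare fragments ...
    streaming_sem.release();
}

int main() {
    repair_meta_permit permit;
    read_over_multishard_reader(permit);
}
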
@denesb
Contributor

denesb commented Jan 17, 2022

Backported to 4.6. No other releases are affected.

avikivity pushed a commit that referenced this issue Jan 19, 2022
If the permit was admitted, _base_resources was already accounted in
_resource and therefore has to be deducted from it, otherwise the permit
will think it leaked some resources on destruction.

Test:
dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count)

Refs: #9751
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>
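
A hypothetical bookkeeping sketch of what this follow-up fixes (illustrative names only, not the actual reader_permit implementation): once a permit has been admitted, its base resources are already folded into its consumed total, so releasing the base resources must also deduct them from that total, otherwise the leak check on destruction trips:

#include <cassert>

struct resources { long count = 0; long memory = 0; };

struct permit_accounting {
    resources base{1, 16 * 1024};   // granted to the permit up front
    resources consumed{};           // everything currently charged against the semaphore
    bool admitted = false;

    void admit() {
        consumed.count += base.count;     // on admission the base resources become part of `consumed`
        consumed.memory += base.memory;
        admitted = true;
    }

    void release_base_resource() {
        if (admitted) {
            consumed.count -= base.count;     // the fix: deduct the base from the consumed total too
            consumed.memory -= base.memory;
        }
        base = {};
    }

    ~permit_accounting() {
        assert(consumed.count == 0 && consumed.memory == 0);  // leak check on destruction
    }
};

int main() {
    permit_accounting permit;
    permit.admit();
    permit.release_base_resource();   // without the deduction above, the leak check would fire here
}
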
avikivity pushed a commit that referenced this issue Jan 20, 2022
If the permit was admitted, _base_resources was already accounted in
_resource and therefore has to be deducted from it, otherwise the permit
will think it leaked some resources on destruction.

Test:
dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count)

Refs: #9751
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>
avikivity pushed a commit that referenced this issue Jan 20, 2022
If the permit was admitted, _base_resources was already accounted in
_resource and therefore has to be deducted from it, otherwise the permit
will think it leaked some resources on destruction.

Test:
dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count)

Refs: #9751
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>
(cherry picked from commit a65b38a)
@asias
Contributor

asias commented Jun 20, 2022

@denesb Could you please confirm 2021.1.10 is not affected?

@denesb
Contributor

denesb commented Jun 20, 2022

Yes, 2021.1 is not affected.

@mykaul
Contributor

mykaul commented Mar 21, 2023

All relevant backports completed.
