system_auth RF change to 3 fails: got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem) #9751

Closed
vponomaryov opened this issue Dec 7, 2021 · 19 comments

@vponomaryov
Contributor

vponomaryov commented Dec 7, 2021

Installation details
Kernel version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.7.dev-0.20211126.44f4ea38c with build-id a88083cca1208179dd36a3847af403f77ad794f0
Cluster size: 5 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0238a987fdfc50310 (aws: eu-west-1)

Scylla running with the following shard counts (live nodes):
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 (3.248.218.248 | 10.0.2.196): 4 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 (54.246.70.108 | 10.0.3.20): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 (34.244.105.91 | 10.0.3.96): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 (34.251.238.10 | 10.0.1.158): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 (54.246.48.89 | 10.0.0.44): 7 shards

Test: longevity-large-partition-asymmetric-cluster-3h
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Issue description
When running on the asymmetric cluster, changing the system_auth RF to 3 fails with the following error on Node 1:
got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)

All other nodes fail with the following error:
Failed to read a fragment from the reader, keyspace=system_auth, table=roles, range=[{-8833127432019027370, end}, {-8822971771378640915, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

More details:

Node 1:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
...
Dec 07 01:51:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 2] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 2 completed successfully, keyspace=system_auth
...
Dec 07 01:51:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 3] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 3 completed successfully, keyspace=system_auth
...
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 0, keyspace=system_auth, cf=roles, range=(-8421410461930041608, -8417409559063327415], got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)
...
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 1] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 1, keyspace=system_auth, cf=roles, range=(-3886665998130067779, -3884745563637348092], got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)
...
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] on shard 0, keyspace=system_auth, cf=role_attributes, range=(-744873554611290834, -741035112301363350], got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)
...
Dec 07 04:21:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair[dabb88a7-3a77-4a4c-8a38-30cf7d323d84]: Started to shutdown off-strategy compaction updater
Dec 07 04:21:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair[dabb88a7-3a77-4a4c-8a38-30cf7d323d84]: Finished to shutdown off-strategy compaction updater
Dec 07 04:21:32 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 scylla[13255]:  [shard 0] repair - repair_tracker run for repair id [id=1, uuid=dabb88a7-3a77-4a4c-8a38-30cf7d323d84] failed: std::runtime_error ({shard 0: std::runtime_error (Failed to repair for keyspace=system_auth, cf=roles, range=(-9210611041549644410, -9179067137543748740]), shard 1: std::runtime_error (Failed to repair for keyspace=system_auth, cf=role_members, range=(-3997244064957500063, -3981095411310064477])})

Node 2:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e15430-5700-11ec-844e-13d36e5eb614] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-30-big-Data.db:level=0]. 11kB to 6kB (~52% of original) in 51ms = 118kB/s. ~256 total partitions merged to 6.
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/199229440 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      10        10        160K        system_auth.roles/repair-meta/active/unused
                                                                                      10        0        0B        system_auth.roles/shard-reader/waiting
                                                                                      
                                                                                      20        10        160K        total
                                                                                      
                                                                                      Total: 20 permits with 10 count and 160K memory resources
Dec 07 02:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 scylla[13981]:  [shard 0] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=roles, range=[{-8833127432019027370, end}, {-8822971771378640915, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Node 3:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e0df00-5700-11ec-b417-37d90f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-42-big-Data.db:level=0]. 11kB to 6kB (~51% of original) in 51ms = 118kB/s. ~256 total partitions merged to 6.
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 37d44ad0-5707-11ec-a43f-37da0f9b64b9] Compacting [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-13-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-7-big-Data.db:level=0:origin=streaming]
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] compaction - [Compact system_schema.columns 37d44ad0-5707-11ec-b417-37d90f9b64b9] Compacting [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-30-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-36-big-Data.db:level=0:origin=compaction]
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 37d44ad0-5707-11ec-87b3-37d50f9b64b9] Compacting [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-16-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-10-big-Data.db:level=0:origin=streaming]
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 37d44ad0-5707-11ec-87b3-37d50f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-22-big-Data.db:level=0]. 69kB to 134kB (~193% of original) in 13ms = 10MB/s. ~256 total partitions merged to 2.
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 0] compaction - [Compact system_schema.columns 37d44ad0-5707-11ec-b417-37d90f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/md-42-big-Data.db:level=0]. 26kB to 18kB (~68% of original) in 13ms = 1MB/s. ~256 total partitions merged to 6.
Dec 07 02:41:45 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 37d44ad0-5707-11ec-a43f-37da0f9b64b9] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-19-big-Data.db:level=0]. 91kB to 144kB (~157% of original) in 18ms = 8MB/s. ~256 total partitions merged to 2.
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/199229440 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      5        5        80K        system_auth.role_members/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.roles/repair-meta/active/unused
                                                                                      2        2        32K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      3        0        0B        system_auth.role_attributes/shard-reader/waiting
                                                                                      6        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      4        0        0B        system_auth.roles/shard-reader/waiting
                                                                                      
                                                                                      23        10        160K        total
                                                                                      
                                                                                      Total: 23 permits with 10 count and 160K memory resources
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 scylla[15884]:  [shard 1] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=roles, range=[{4797201525887148681, end}, {4798014872356268772, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Node 4:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e0b7f0-5700-11ec-b215-383abf074923] Compacted 3 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-54-big-Data.db:level=0]. 17kB to 6kB (~35% of original) in 61ms = 101kB/s. ~384 total partitions merged to 6.
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] compaction - [Compact system_schema.columns b729a4b0-5707-11ec-b215-383abf074923] Compacting [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-54-big-Data.db:level=0:origin=memtable,/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/mc-48-big-Data.db:level=0:origin=compaction]
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 b729f2d0-5707-11ec-9658-383bbf074923] Compacting [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-16-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-10-big-Data.db:level=0:origin=streaming]
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 b72a19e0-5707-11ec-9399-383cbf074923] Compacting [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-7-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-13-big-Data.db:level=0:origin=streaming]
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 b729f2d0-5707-11ec-9658-383bbf074923] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-22-big-Data.db:level=0]. 69kB to 134kB (~192% of original) in 9ms = 14MB/s. ~256 total partitions merged to 2.
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] compaction - [Compact system_schema.columns b729a4b0-5707-11ec-b215-383abf074923] Compacted 2 sstables to [/var/lib/scylla/data/system_schema/columns-24101c25a2ae3af787c1b40ee1aca33f/md-60-big-Data.db:level=0]. 26kB to 18kB (~68% of original) in 14ms = 1MB/s. ~256 total partitions merged to 6.
Dec 07 02:45:19 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 1] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 b72a19e0-5707-11ec-9399-383cbf074923] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-19-big-Data.db:level=0]. 161kB to 163kB (~100% of original) in 17ms = 9MB/s. ~256 total partitions merged to 2.
Dec 07 04:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/199229440 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      4        4        64K        system_auth.role_members/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.roles/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      3        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      3        0        0B        system_auth.roles/shard-reader/evicted
                                                                                      1        0        0B        system_auth.role_members/shard-reader/evicted
                                                                                      3        0        0B        system_auth.role_attributes/shard-reader/evicted
                                                                                      
                                                                                      20        10        160K        total
                                                                                      
                                                                                      Total: 20 permits with 10 count and 160K memory resources
Dec 07 04:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 scylla[17699]:  [shard 0] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=role_members, range=[{3069961132509719081, end}, {3077107436340118180, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Node 5:

Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] schema_tables - Altering keyspace system_auth
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] schema_tables - Schema version changed to beb82c8e-e511-3dc9-b498-eb7d3871caa8
Dec 07 01:51:29 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] compaction - [Compact system_schema.keyspaces 31e28cb0-5700-11ec-af32-62308cdb0fb6] Compacted 3 sstables to [/var/lib/scylla/data/system_schema/keyspaces-abac5682dea631c5b535b3d6cffd0fb6/md-84-big-Data.db:level=0]. 17kB to 6kB (~36% of original) in 62ms = 100kB/s. ~384 total partitions merged to 6.
Dec 07 02:17:37 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:46704: server connection dropped: read: Connection reset by peer
Dec 07 02:17:37 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 6] rpc - client 162.142.125.194:41236: wrong protocol magic: 160301007b010000
Dec 07 02:17:37 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 6] rpc - client 162.142.125.194:41236: server connection dropped: connection is closed
Dec 07 02:17:38 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:45178: wrong protocol magic: 474554202f204854
Dec 07 02:17:38 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:45178: server connection dropped: connection is closed
Dec 07 02:17:39 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] rpc - client 162.142.125.194:41776: server connection dropped: read: Connection reset by peer
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 2] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 36f9faf0-5708-11ec-8fbc-622e8cdb0fb6] Compacting [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-16-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-9-big-Data.db:level=0:origin=streaming]
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 36f9faf0-5708-11ec-8592-622d8cdb0fb6] Compacting [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-11-big-Data.db:level=0:origin=streaming,/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-18-big-Data.db:level=0:origin=streaming]
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 2] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 36f9faf0-5708-11ec-8fbc-622e8cdb0fb6] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/md-23-big-Data.db:level=0]. 161kB to 163kB (~100% of original) in 12ms = 13MB/s. ~256 total partitions merged to 2.
Dec 07 02:48:53 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 4] compaction - [Compact system_distributed_everywhere.cdc_generation_descriptions_v2 36f9faf0-5708-11ec-8592-622d8cdb0fb6] Compacted 2 sstables to [/var/lib/scylla/data/system_distributed_everywhere/cdc_generation_descriptions_v2-234d2227dd633d37ac5fc013e2ea9e6e/md-25-big-Data.db:level=0]. 115kB to 170kB (~147% of original) in 14ms = 12MB/s. ~256 total partitions merged to 2.
Dec 07 02:51:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 1] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/170792058 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      5        5        80K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      3        3        48K        system_auth.role_members/repair-meta/active/unused
                                                                                      2        2        32K        system_auth.roles/repair-meta/active/unused
                                                                                      6        0        0B        system_auth.roles/shard-reader/waiting
                                                                                      6        0        0B        system_auth.role_attributes/shard-reader/waiting
                                                                                      4        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      
                                                                                      26        10        160K        total
                                                                                      
                                                                                      Total: 26 permits with 10 count and 160K memory resources
Dec 07 02:51:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 1] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=role_members, range=[{4371617655742164722, end}, {4451096726970068417, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] reader_concurrency_semaphore - Semaphore _streaming_concurrency_sem with 10/10 count and 163840/170792058 memory resources: timed out, dumping permit diagnostics:
                                                                                      permits        count        memory        table/description/state
                                                                                      8        8        128K        system_auth.role_attributes/repair-meta/active/unused
                                                                                      1        1        16K        system_auth.role_members/repair-meta/active/unused
                                                                                      1        1        16K        system_auth.roles/repair-meta/active/unused
                                                                                      1        0        0B        system_auth.role_members/shard-reader/waiting
                                                                                      8        0        0B        system_auth.role_attributes/shard-reader/waiting
                                                                                      1        0        0B        system_auth.roles/shard-reader/evicted
                                                                                      
                                                                                      20        10        160K        total
                                                                                      
                                                                                      Total: 20 permits with 10 count and 160K memory resources
Dec 07 03:21:31 longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 scylla[19665]:  [shard 0] repair - Failed to read a fragment from the reader, keyspace=system_auth, table=role_attributes, range=[{-744873554611290834, end}, {-741035112301363350, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)

Restore Monitor Stack command: $ hydra investigate show-monitor 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb

Test id: 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/db-cluster-9c1ecaf6.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/loader-set-9c1ecaf6.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/monitor-set-9c1ecaf6.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/sct-runner-9c1ecaf6.tar.gz

Jenkins job URL

@roydahan roydahan added the triage/master Looking for assignee label Dec 8, 2021
@slivne
Contributor

slivne commented Dec 8, 2021

@asias please have a look

@asias
Contributor

asias commented Dec 8, 2021

@denesb I think we probably have a reproducer for the reader deadlock. The error in the log suggests we spent 30 minutes trying to read a single fragment.

        auto timeout = db::timeout_clock::now() + std::chrono::minutes(30);
        _reader.set_timeout(timeout);   // reset to db::no_timeout in pause()
        return _reader().then_wrapped([this] (future<mutation_fragment_opt> f) {
            try {
                auto mfopt = f.get0();
                ++_reads_finished;
                return mfopt;
            } catch (seastar::timed_out_error& e) {
                rlogger.warn("Failed to read a fragment from the reader, keyspace={}, table={}, range={}: {}",
                    _schema->ks_name(), _schema->cf_name(), _range, e);
                throw;
            } catch (...) {
                throw;
            }
        });

@vponomaryov
Contributor Author

One more:

Installation details
Kernel version: 5.4.0-1035-aws
Scylla version (or git commit hash): 4.7.dev-0.20211126.44f4ea38c with build-id a88083cca1208179dd36a3847af403f77ad794f0
Cluster size: 5 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0238a987fdfc50310 (aws: eu-west-1)

Scylla running with the following shard counts (live nodes):
longevity-large-partitions-3h-maste-db-node-1f9bbacb-1 (54.246.63.21 | 10.0.2.27): 4 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-2 (54.217.62.85 | 10.0.0.177): 6 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-3 (34.243.117.11 | 10.0.3.63): 5 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-4 (54.75.60.157 | 10.0.0.142): 5 shards
longevity-large-partitions-3h-maste-db-node-1f9bbacb-5 (54.170.181.141 | 10.0.1.209): 5 shards

Test: longevity-large-partition-asymmetric-cluster-3h
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):

Restore Monitor Stack command: $ hydra investigate show-monitor 1f9bbacb-e846-49db-8a1d-10657b1904f7
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 1f9bbacb-e846-49db-8a1d-10657b1904f7

Test id: 1f9bbacb-e846-49db-8a1d-10657b1904f7

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/db-cluster-1f9bbacb.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/loader-set-1f9bbacb.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/monitor-set-1f9bbacb.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/1f9bbacb-e846-49db-8a1d-10657b1904f7/20211209_130238/sct-runner-1f9bbacb.tar.gz

Jenkins job URL

@denesb
Contributor

denesb commented Dec 14, 2021

Looks like there is a genuine problem here. @asias do you know when this started happening? Or is this that same phantom block that has been plaguing us for some time now?

@asias
Contributor

asias commented Dec 15, 2021

@vponomaryov can you tell us the history of this test? When did it first start to fail, and with which Scylla commit? I do not see the longevity-large-partition-asymmetric-cluster-3h test in the past. Perhaps it is a recent SCT test.

@vponomaryov
Contributor Author

@vponomaryov can you tell us the history of this test? When did it first start to fail, and with which Scylla commit? I do not see the longevity-large-partition-asymmetric-cluster-3h test in the past. Perhaps it is a recent SCT test.

I walked through different builds of that job and can tell that it is a racy bug.
The version where it was caught last time (4.7.dev-0.20211126.44f4ea38c with build-id a88083cca1208179dd36a3847af403f77ad794f0) also had builds that did not hit it.

Another build where it was also present, but with a slightly different failure:
4.6.dev-0.20211101.9e8fc6358 with build-id 62d8c292f9ddc15e61ebd6cd609ac91b80e1532a
Failure from there:

Traceback (most recent call last):                                                                     
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 127, in wrapper                        
    return method(*args, **kwargs)                                                                     
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 113, in inner                
    res = func(*args, **kwargs)                                                                        
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 546, in setUp                          
    self.set_system_auth_rf()                                                                          
  File "/home/ubuntu/scylla-cluster-tests/sdcm/tester.py", line 603, in set_system_auth_rf             
    node.run_nodetool(sub_cmd="repair", args="-- system_auth")                                         
  File "/home/ubuntu/scylla-cluster-tests/sdcm/cluster.py", line 2687, in run_nodetool                 
    result = self.remoter.run(cmd, timeout=timeout, ignore_status=ignore_status, verbose=verbose)   
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 613, in run                
    result = _run()                                                                                    
  File "/home/ubuntu/scylla-cluster-tests/sdcm/utils/decorators.py", line 64, in inner                 
    return func(*args, **kwargs)                                                                       
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 604, in _run               
    return self._run_execute(cmd, timeout, ignore_status, verbose, new_session, watchers)              
  File "/home/ubuntu/scylla-cluster-tests/sdcm/remote/remote_base.py", line 537, in _run_execute       
    result = connection.run(**command_kwargs)                                                          
  File "<decorator-gen-3>", line 2, in run                                                             
  File "/usr/local/lib/python3.10/site-packages/fabric/connection.py", line 30, in opens               
    return method(self, *args, **kwargs)                                                               
  File "/usr/local/lib/python3.10/site-packages/fabric/connection.py", line 723, in run                
    return self._run(self._remote_runner(), command, **kwargs)                                         
  File "/usr/local/lib/python3.10/site-packages/invoke/context.py", line 102, in _run                  
    return runner.run(command, **kwargs)                                                               
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 380, in run                   
    return self._run_body(command, **kwargs)                                                           
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 442, in _run_body             
    return self.make_promise() if self._asynchronous else self._finish()                               
  File "/usr/local/lib/python3.10/site-packages/invoke/runners.py", line 509, in _finish               
    raise UnexpectedExit(result)                                                                       
invoke.exceptions.UnexpectedExit: Encountered a bad command exit code!                                 
                                                                                                       
Command: '/usr/bin/nodetool  repair -- system_auth'                                                    
                                                                                                       
Exit code: 2                                                                                           
                                                                                                       
Stdout:                                                                                                
                                                                                                       
[2021-11-05 04:33:20,720] Starting repair command #1, repairing 1 ranges for keyspace system_auth (parallelism=SEQUENTIAL, full=true)
[2021-11-05 05:03:22,070] Repair session 1 failed                                                      
[2021-11-05 05:03:22,070] Repair session 1 finished                                                    
                                                                                                       
Stderr:                                                                                                
                                                                                                       
error: Repair job has failed with the error message: [2021-11-05 05:03:22,070] Repair session 1 failed
-- StackTrace --                                                                                       
java.lang.RuntimeException: Repair job has failed with the error message: [2021-11-05 05:03:22,070] Repair session 1 failed
    at org.apache.cassandra.tools.RepairRunner.progress(RepairRunner.java:124)                         
    at org.apache.cassandra.utils.progress.jmx.JMXNotificationProgressListener.handleNotification(JMXNotificationProgressListener.java:77)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.dispatchNotification(ClientNotifForwarder.java:583)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.doRun(ClientNotifForwarder.java:533)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$NotifFetcher.run(ClientNotifForwarder.java:452)
    at com.sun.jmx.remote.internal.ClientNotifForwarder$LinearExecutor$1.run(ClientNotifForwarder.java:108)

CI link: https://jenkins.scylladb.com/view/master/job/scylla-master/job/longevity/job/longevity-large-partition-asymmetric-cluster-3h/40

A couple more hits:
4.6.dev-0.20210922.bae9c042c2 with build-id 9c8634cb7c1451d444f62e0547f55bc794a2f3f7
CI link: https://jenkins.scylladb.com/view/master/job/scylla-master/job/longevity/job/longevity-large-partition-asymmetric-cluster-3h/33/

4.6.dev-0.20210922.bae9c042c2 with build-id 9c8634cb7c1451d444f62e0547f55bc794a2f3f7
CI link: https://jenkins.scylladb.com/view/master/job/scylla-master/job/longevity/job/longevity-large-partition-asymmetric-cluster-3h/32

So, the build from September 22, 2021 already had this problem, and there are no older builds in Jenkins to check.

@soyacz
Contributor

soyacz commented Dec 17, 2021

Happened again. Test details
Test: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Build number: 52
Backend: aws: eu-west-1
Kernel version: 5.11.0-1022-aws
Test-id: 8cc236e7-a1ba-4ff5-b386-feabff9ff8ea
Scylla version: 4.7.dev-0.20211215.3ac622bdd with build-id c19c7740f10f82f554c1b67da69f82fc48f9386f (ami-0d8ea5e3e5d4fbb40)
Instance type: i3.2xlarge
Number of scylladb nodes: 5

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/db-cluster-8cc236e7.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/loader-set-8cc236e7.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/monitor-set-8cc236e7.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/8cc236e7-a1ba-4ff5-b386-feabff9ff8ea/20211216_120543/sct-runner-8cc236e7.tar.gz

@roydahan roydahan added master/high and removed triage/master Looking for assignee labels Dec 19, 2021
@denesb denesb self-assigned this Dec 20, 2021
@denesb
Contributor

denesb commented Dec 21, 2021

The problem is that, in the case of non-local reads, each repair has to obtain 2 permits on the shard where the repair-meta lives:

  • one in either insert_repair_meta() or run();
  • one for the shard reader.

If the semaphore is contested, the repair-metas can block their own shard readers from being admitted.
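
For illustration, here is a minimal, self-contained sketch (plain C++20 with std::counting_semaphore, not Scylla/Seastar code) of the pattern described above: every unit of a bounded semaphore is held by a "repair-meta", so the second acquisition each repair needs for its shard reader can never be admitted and times out, mirroring the "Semaphore timed out: _streaming_concurrency_sem" errors in the logs.

#include <chrono>
#include <iostream>
#include <latch>
#include <semaphore>
#include <thread>
#include <vector>

int main() {
    constexpr int capacity = 4;                       // stands in for the 10-unit streaming semaphore
    std::counting_semaphore<capacity> sem(capacity);
    std::latch all_holding(capacity);                 // makes the starvation deterministic for the demo
    std::vector<std::thread> repairs;

    for (int i = 0; i < capacity; ++i) {
        repairs.emplace_back([&, i] {
            sem.acquire();                            // first permit: the repair-meta itself
            all_holding.count_down();
            all_holding.wait();                       // now every unit is held by a repair-meta
            // Second permit: the shard reader. No unit can be released while
            // everyone is blocked here, so the acquisition times out.
            if (!sem.try_acquire_for(std::chrono::seconds(1))) {
                std::cout << "repair " << i << ": shard reader timed out waiting for a permit\n";
            } else {
                sem.release();
            }
            sem.release();                            // release the repair-meta permit
        });
    }
    for (auto& t : repairs) {
        t.join();
    }
}
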

@slivne
Contributor

slivne commented Dec 21, 2021

So what can we do?

@denesb
Contributor

denesb commented Dec 21, 2021

I'll cook a patch which ensures only a single blocking admission happens on each shard.

@asias
Contributor

asias commented Dec 30, 2021

I'll cook a patch which ensures only a single blocking admission happens on each shard.

Thanks.

BTW, if you send a PR instead of email patches, with one click people can find the relevant patches. Now, I have to search emails to find them. I do not know the subjects of the patches, so I have to search all the emails from you.

@denesb
Contributor

denesb commented Jan 4, 2022

BTW, if you send a PR instead of email patches, with one click people can find the relevant patches. Now, I have to search emails to find them. I do not know the subjects of the patches, so I have to search all the emails from you.

I know, this is one of the good things PRs have going for them. They didn't work out for me for other reasons. I could have still posted the name of the patch, sorry for that.

@juliayakovlev

Reproduced with 4.7.dev-0.20211230

Installation details
Kernel version: 5.11.0-1022-aws
Scylla version (or git commit hash): 4.7.dev-0.20211230.12fa68fe6 with build-id 938d162cf1be1483853bf5b5417f5daf11e5ac24
Cluster size: 5 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI): ami-0668e251eb4f0f394 (aws: eu-west-1)

Scylla running with the following shard counts (live nodes):
longevity-large-partitions-3h-maste-db-node-64072fe8-1 (54.170.181.144 | 10.0.0.78): 4 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-2 (52.214.100.51 | 10.0.2.59): 7 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-3 (34.244.15.50 | 10.0.1.77): 5 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-4 (54.170.249.5 | 10.0.1.254): 5 shards
longevity-large-partitions-3h-maste-db-node-64072fe8-5 (34.242.237.2 | 10.0.2.233): 7 shards

Test: longevity-large-partition-asymmetric-cluster-3h
Test name: longevity_large_partition_test.LargePartitionLongevityTest.test_large_partition_longevity
Test config file(s):

Restore Monitor Stack command: $ hydra investigate show-monitor 64072fe8-2e5b-42de-9d5e-6fed13d3853e
Restore monitor on AWS instance using Jenkins job
Show all stored logs command: $ hydra investigate show-logs 64072fe8-2e5b-42de-9d5e-6fed13d3853e

Test id: 64072fe8-2e5b-42de-9d5e-6fed13d3853e

Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/64072fe8-2e5b-42de-9d5e-6fed13d3853e/20220109_073401/db-cluster-64072fe8.tar.gz
sct - https://cloudius-jenkins-test.s3.amazonaws.com/64072fe8-2e5b-42de-9d5e-6fed13d3853e/20220109_073401/sct-runner-64072fe8.tar.gz

Jenkins job URL

@denesb
Contributor

denesb commented Jan 12, 2022

The fix is not merged yet.

denesb pushed a commit that referenced this issue Jan 17, 2022
…s" from Botond

"
Repair obtains a permit for each repair-meta instance it creates. This
permit is supposed to track all resources consumed by that repair, as
well as to ensure the concurrency limit is respected. However, when the
non-local reader path is used (shard config of the master != shard config of
the follower), a second permit is obtained -- for the shard reader of
the multishard reader. This creates a situation where the repair-meta's
permit can block the shard reader's permit, resulting in a deadlock.
This patch solves this by dropping the count resource on the
repair-meta's permit when the non-local reader path is executed -- that is,
when a multishard reader is created.

Fixes: #9751
"

* 'repair-double-permit-block/v4' of https://github.com/denesb/scylla:
  repair: make sure there is one permit per repair with count res
  reader_permit: add release_base_resource()

(cherry picked from commit 52b7778)
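
As a rough illustration of the approach (hypothetical types and function bodies; only release_base_resource() is named after the helper the patch introduces, and this is not the actual Scylla reader_permit API), the repair-meta's permit gives up its count unit as soon as the non-local, multishard reader path is taken, so each repair holds at most one blocking admission on the semaphore at any time:

#include <semaphore>

std::counting_semaphore<> streaming_sem(10);          // stands in for _streaming_concurrency_sem

struct repair_meta_permit {
    bool holds_count = true;
    repair_meta_permit() { streaming_sem.acquire(); } // taken in insert_repair_meta() / run()
    void release_base_resource() {                    // analogue of the helper added by the patch
        if (holds_count) {
            streaming_sem.release();
            holds_count = false;
        }
    }
    ~repair_meta_permit() { release_base_resource(); }
};

void read_over_multishard_reader(repair_meta_permit& permit) {
    permit.release_base_resource();  // non-local path: drop our count so the shard reader can be admitted
    streaming_sem.acquire();         // the shard reader's own admission
    // ... read and compare fragments ...
    streaming_sem.release();
}

int main() {
    repair_meta_permit permit;
    read_over_multishard_reader(permit);
}
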
@denesb
Contributor

denesb commented Jan 17, 2022

Backported to 4.6. No other releases are affected.

avikivity pushed a commit that referenced this issue Jan 19, 2022
If the permit was admitted, _base_resources was already accounted in
_resource and therefore has to be deducted from it, otherwise the permit
will think it leaked some resources on destruction.

Test:
dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count)

Refs: #9751
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>
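
A hypothetical bookkeeping sketch of what this follow-up fixes (illustrative names only, not the actual reader_permit implementation): once a permit has been admitted, its base resources are already folded into its consumed total, so releasing the base resources must also deduct them from that total, otherwise the leak check on destruction trips:

#include <cassert>

struct resources { long count = 0; long memory = 0; };

struct permit_accounting {
    resources base{1, 16 * 1024};   // granted to the permit up front
    resources consumed{};           // everything currently charged against the semaphore
    bool admitted = false;

    void admit() {
        consumed.count += base.count;     // on admission the base resources become part of `consumed`
        consumed.memory += base.memory;
        admitted = true;
    }

    void release_base_resource() {
        if (admitted) {
            consumed.count -= base.count;     // the fix: deduct the base from the consumed total too
            consumed.memory -= base.memory;
        }
        base = {};
    }

    ~permit_accounting() {
        assert(consumed.count == 0 && consumed.memory == 0);  // leak check on destruction
    }
};

int main() {
    permit_accounting permit;
    permit.admit();
    permit.release_base_resource();   // without the deduction above, the leak check would fire here
}
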
avikivity pushed a commit that referenced this issue Jan 20, 2022
If the permit was admitted, _base_resources was already accounted in
_resource and therefore has to be deducted from it, otherwise the permit
will think it leaked some resources on destruction.

Test:
dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count)

Refs: #9751
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>
avikivity pushed a commit that referenced this issue Jan 20, 2022
If the permit was admitted, _base_resources was already accounted in
_resource and therefore has to be deducted from it, otherwise the permit
will think it leaked some resources on destruction.

Test:
dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count)

Refs: #9751
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>
(cherry picked from commit a65b38a)
@asias
Contributor

asias commented Jun 20, 2022

@denesb Could you please confirm 2021.1.10 is not affected?

@denesb
Contributor

denesb commented Jun 20, 2022

Yes, 2021.1 is not affected.

@mykaul
Contributor

mykaul commented Mar 21, 2023

All relevant backports completed.
