
test_remove_node_during_mv_insert_4_nodes: sstable - failed reading index: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem) #12603

Closed
bhalevy opened this issue Jan 22, 2023 · 4 comments
bhalevy commented Jan 22, 2023

Regression seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/179/testReport/materialized_views_test/TestMaterializedViews/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split013___test_remove_node_during_mv_insert_4_nodes/
With Scylla version aab5954.

failed on teardown with "AssertionError: Unexpected errors found: [('node3', ['ERROR 2023-01-19 06:13:55,457 [shard 0] sstable - failed reading index for /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.dtest/dtest-9ngjriog/test/node3/data/mview/users-51f7426097bf11ed978b68c684f7542b/me-42-big-Data.db: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem)', 'ERROR 2023-01-19 06:13:55,460 [shard 0] sstable - failed reading index for /jenkins/workspace/scylla-master/dtest-daily-release/scylla/.dtest/dtest-9ngjriog/test/node3/data/mview/users-51f7426097bf11ed978b68c684f7542b/me-8-big-Data.db: seastar::named_semaphore_timed_out (Semaphore timed out: _read_concurrency_sem)' ...

@denesb please look into this ASAP.

bhalevy added the triage/master (Looking for assignee) and P1 (Urgent) labels on Jan 22, 2023
DoronArazii added this to the 5.2 milestone on Jan 22, 2023
denesb commented Jan 24, 2023

There are reports like this in the logs of node3:

INFO  2023-01-19 06:13:55,453 [shard 0] reader_concurrency_semaphore - Semaphore _read_concurrency_sem with 99/100 count and 20408624/10192158 memory resources: timed out, dumping permit diagnostics:
permits	count	memory	table/description/state
99	99	19M	mview.users/shard-reader/waiting_for_memory
249	0	16K	mview.users/multishard-mutation-query/active/unused
150	0	0B	mview.users/shard-reader/waiting_for_admission

This looks like a kind of deadlock: the top-level multishard reader is waiting for its shard readers, the shard readers are all blocked waiting for memory, and the semaphore is waiting for the multishard reader to release memory before it can let them make progress.
I think we need a special breakout condition here.
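The cycle described above can be sketched as a toy wait-for graph. This is a minimal model, not Scylla code; all the names below are illustrative:

```python
def find_cycle(waits_for):
    """Detect a cycle in a wait-for graph.

    waits_for maps each actor to the single thing it is blocked on;
    a cycle means nothing in it can ever make progress: a deadlock.
    """
    for start in waits_for:
        seen = set()
        node = start
        while node in waits_for:
            if node in seen:
                return True
            seen.add(node)
            node = waits_for[node]
    return False

# Wait-for relations matching the diagnostics above (names illustrative):
# the multishard reader waits on its shard readers, the shard readers
# wait on memory, and memory is only freed once the multishard reader
# consumes their output and releases it -- a cycle, hence no progress.
waits = {
    "multishard_reader": "shard_readers",
    "shard_readers": "memory",
    "memory": "multishard_reader",
}
print(find_cycle(waits))  # True
```

A timeout (like the semaphore's existing one) breaks the symptom but not the cycle, which is why a dedicated breakout condition is needed.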

denesb commented Jan 31, 2023

I have been unsuccessful so far in reproducing this. I saw 3 reports that all look the same: 99 reads waiting for memory in what seems to be a deadlock, with lots of waiters for admission.
The reason I think a deadlock is involved is that no permit holding count resources is in any active/* state. The multishard mutation reader (which is in an active/* state) is not doing any actual reading; it is just merging the output of the shard readers, which all seem to be blocked.
I suspect the bug is in the interaction between inactive reads and read serialization.

DoronArazii added the type/bug label and removed the triage/master (Looking for assignee) label on Feb 1, 2023
denesb commented Feb 1, 2023

After staring at the code hard enough for long enough it finally confessed and I reproduced this with a unit test, on a smaller scale:

Semaphore test_reader_concurrency_semaphore_blessed_read_goes_inactive with 2/3 count and 5120/3072 memory resources: user request, dumping permit diagnostics:
permits count   memory  table/description/state
1       1       3K      ks.cf/permit3/waiting_for_memory
1       1       2K      ks.cf/permit2/waiting_for_memory
1       0       0B      ks.cf/permit1/waiting_for_admission

3       2       5K      total

The problem happens when a permit becomes blessed, is then registered inactive, and is then evicted, and thus has to wait for readmission. If the timing is just right, the permit is queued on the admission queue while still being the blessed permit. If the semaphore is above the serialize limit and reads don't release memory, the situation seen above can happen: all reads get queued on memory, making no progress, because the blessed permit is waiting for admission. Deadlock.
The solution is to release the blessed status upon becoming inactive.
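The fix can be sketched with a toy model of the blessed-permit rule. This is not Seastar's reader_concurrency_semaphore API; the classes and names below are hypothetical and only capture the state transition being fixed:

```python
class Permit:
    """Toy permit; not Seastar's reader_permit."""
    def __init__(self, name):
        self.name = name
        self.inactive = False


class ToySemaphore:
    """Minimal model of the blessed-permit rule: past the serialize
    threshold, only the blessed permit may consume more memory."""
    def __init__(self):
        self.blessed = None

    def bless(self, permit):
        self.blessed = permit

    def may_consume_memory(self, permit):
        # Past the serialize threshold, only the blessed permit proceeds.
        return self.blessed is None or self.blessed is permit

    def register_inactive(self, permit):
        permit.inactive = True
        # The fix: an inactive (and possibly evicted) permit cannot make
        # progress, so it must give up its blessed status here; otherwise
        # every other permit stays queued on memory while this one waits
        # for readmission.
        if self.blessed is permit:
            self.blessed = None


sem = ToySemaphore()
p1, p2 = Permit("permit1"), Permit("permit2")
sem.bless(p1)
print(sem.may_consume_memory(p2))  # False: p1 holds the blessing
sem.register_inactive(p1)          # the fix strips the blessing here
print(sem.may_consume_memory(p2))  # True: p2 can now make progress
```

Without the `register_inactive` release, `p2` would stay blocked on memory indefinitely while `p1` sits in the admission queue, which is exactly the deadlock in the dump above.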

denesb added a commit to denesb/scylla that referenced this issue Feb 1, 2023

When the memory consumption of the semaphore reaches the configured
serialize threshold, all but the blessed permit are blocked from
consuming any more memory. This ensures that past this limit, only one
permit at a time can consume memory.
Such a blessed permit can be registered inactive. Before this patch, it
would still retain its blessed status when doing so. This could result
in this permit being re-queued for admission if it was evicted in the
meantime, potentially resulting in a complete deadlock of the semaphore:
* admission queue permits cannot be admitted because there is no memory
* admitted permits are all queued on memory, as none of them are blessed

This patch strips the blessed status from the permit when it is
registered as inactive. It also adds a unit test to verify this happens.

Fixes: scylladb#12603
avikivity (Member) commented:
No vulnerable branches, not backporting.
