Mitigate semaphore mismatch when possible #14770

Jadw1 · 2023-07-20T09:10:44Z

This issue is an OSS copy of https://github.com/scylladb/scylla-enterprise/issues/3182.

When a semaphore mismatch is detected between two user semaphores, the cached querier can be dropped instead of throwing an internal error.

Also, the check is currently only present in multi-partition queries; it should be done for all queries.

DoronArazii · 2023-07-26T10:55:50Z

Adding 'backport candidate' (@Jadw1 next time please use "Fixes: #12345").
@scylladb/scylla-maint please consider a backport to all active 5.x + our supported enterprise versions

denesb · 2023-07-27T06:36:54Z

This is a potentially disruptive change. We should let it get into enterprise and get some testing with SL addition/removal under read load before backporting.

mykaul · 2023-08-08T14:38:48Z

> This is a potentially disruptive change. We should let it get into enterprise and get some testing with SL addition/removal under read load before backporting.

How will we get it into Enterprise without backporting?

denesb · 2023-08-09T06:10:10Z

> > This is a potentially disruptive change. We should let it get into enterprise and get some testing with SL addition/removal under read load before backporting.
>
> How will we get it into Enterprise without backporting?

Via the regular master -> enterprise merge.

…long to user' from Michał Jadwiszczak

If a semaphore mismatch occurs, check whether both are user semaphores. If so, log a warning, increment the `querier_cache_scheduling_group_mismatches` stat, and drop the cached reader instead of throwing an error.

Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check into `querier_cache` and performs it in all `lookup_*_querier` methods.

The mismatch can happen if the user's scheduling group changed during a query. We don't want to throw an error then, but rather drop and reset the cached reader.

This patch doesn't solve the problem of mismatched semaphores caused by changes in service levels/scheduling groups; it only mitigates it.

Refers: scylladb/scylla-enterprise#3182
Refers: scylladb/scylla-enterprise#3050
Closes: #14770

Closes #14736

* github.com:scylladb/scylladb:
  querier_cache: add stats of scheduling group mismatches
  querier_cache: check semaphore mismatch during querier lookup
  querier_cache: add reference to `replica::database::is_user_semaphore()`
  replica:database: add method to determine if semaphore is user one

(cherry picked from commit a8feb74)
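For readers unfamiliar with the lookup path, the behaviour described in the commit message can be sketched roughly as below. This is only an illustration under stated assumptions, not the actual ScyllaDB code: `semaphore`, `querier`, `cache_stats`, `querier_cache_sketch`, `register_user_semaphore()` and the single `lookup()` method are all hypothetical stand-ins (the real change touches the `lookup_*_querier` methods and `replica::database::is_user_semaphore()`), and the choice to evict the entry before checking is an assumption of the sketch.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-ins for illustration only; not ScyllaDB's real types.
struct semaphore {
    std::string name;
};

struct querier {
    const semaphore* sem; // semaphore the cached reader was created under
};

struct cache_stats {
    // Mirrors the idea of the `querier_cache_scheduling_group_mismatches` stat.
    uint64_t scheduling_group_mismatches = 0;
};

class querier_cache_sketch {
    std::unordered_map<uint64_t, querier> _entries;
    std::unordered_set<const semaphore*> _user_semaphores;
    cache_stats _stats;

public:
    // Assumption: the cache is told which semaphores serve user reads.
    void register_user_semaphore(const semaphore& s) { _user_semaphores.insert(&s); }

    bool is_user_semaphore(const semaphore& s) const {
        return _user_semaphores.count(&s) != 0;
    }

    void insert(uint64_t key, querier q) { _entries.insert_or_assign(key, q); }

    const cache_stats& stats() const { return _stats; }

    // Returns the cached querier if it can be reused, std::nullopt if there is
    // no entry or the entry had to be dropped, and throws on a mismatch that
    // involves a non-user semaphore.
    std::optional<querier> lookup(uint64_t key, const semaphore& current) {
        auto it = _entries.find(key);
        if (it == _entries.end()) {
            return std::nullopt;
        }
        querier q = it->second;
        _entries.erase(it); // a looked-up entry leaves the cache either way

        if (q.sem == &current) {
            return q; // same semaphore: reuse the cached reader
        }
        if (is_user_semaphore(*q.sem) && is_user_semaphore(current)) {
            // Both are user semaphores, e.g. the scheduling group changed
            // mid-query: warn, count it, and drop the cached reader.
            std::cerr << "warning: semaphore mismatch (" << q.sem->name
                      << " vs " << current.name << "), dropping cached reader\n";
            ++_stats.scheduling_group_mismatches;
            return std::nullopt;
        }
        // Any other mismatch is still treated as an internal error.
        throw std::logic_error("semaphore mismatch involving a non-user semaphore");
    }
};
```

With this shape, the caller treats a dropped querier like a cache miss and simply creates a fresh reader under the current semaphore, which is why the change is a mitigation rather than a fix for the underlying mismatch.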

denesb · 2023-08-09T07:36:24Z

Backported to 5.1 and 5.2. There were conflicts, but they were relatively easy to sort out.

Jadw1 self-assigned this Jul 20, 2023

Jadw1 mentioned this issue Jul 20, 2023

semaphore mismatch: don't throw an error if both semaphores belong to user #14736

Merged

scylladb-promoter closed this as completed in a8feb74 Jul 24, 2023

DoronArazii added the Backport candidate, backport/5.2, and Requires-Backport-to-5.1 labels Jul 26, 2023

DoronArazii added this to the 5.4 milestone Jul 26, 2023

denesb removed the Backport candidate, backport/5.2, and Requires-Backport-to-5.1 labels Aug 9, 2023
