Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate semaphore mismatch when possible #14770

Closed
Jadw1 opened this issue Jul 20, 2023 · 5 comments · Fixed by #14736
Closed

Mitigate semaphore mismatch when possible #14770

Jadw1 opened this issue Jul 20, 2023 · 5 comments · Fixed by #14736
Assignees
Milestone

Comments

@Jadw1
Copy link
Contributor

Jadw1 commented Jul 20, 2023

This issue is an OSS copy of https://github.com/scylladb/scylla-enterprise/issues/3182.

When semaphore mismatch is detected between two user's semaphores, the cached querier can be dropped instead of throwing an internal error.

Also the check is only present in multi-partition queires, it should be done in all queries.

@DoronArazii
Copy link

Adding 'backport candidate' (@Jadw1 next time please use "Fixes: #12345").
@scylladb/scylla-maint please consider a backport to all active 5.x + our supported enterprise versions

@DoronArazii DoronArazii added Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed Requires-Backport-to-5.1 labels Jul 26, 2023
@DoronArazii DoronArazii added this to the 5.4 milestone Jul 26, 2023
@denesb
Copy link
Contributor

denesb commented Jul 27, 2023

This is a potentially distrupting change. We should let it get into enterprise and get some testing with SL addition/removal under read load before backporting.

@mykaul
Copy link
Contributor

mykaul commented Aug 8, 2023

This is a potentially distrupting change. We should let it get into enterprise and get some testing with SL addition/removal under read load before backporting.

How will we get it into Enterprise without backporting?

@denesb
Copy link
Contributor

denesb commented Aug 9, 2023

This is a potentially distrupting change. We should let it get into enterprise and get some testing with SL addition/removal under read load before backporting.

How will we get it into Enterprise without backporting?

Via the regular master -> enterprise merge.

denesb added a commit that referenced this issue Aug 9, 2023
…long to user' from Michał Jadwiszczak

If semaphore mismatch occurs, check whether both semaphores belong
to user. If so, log a warning, log a `querier_cache_scheduling_group_mismatches` stat and drop cached reader instead of throwing an error.

Until now, semaphore mismatch was only checked in multi-partition queries.  The PR pushes the check to `querier_cache` and perform it on all `lookup_*_querier` methods.

The mismatch can happen if user's scheduling group changed during
a query. We don't want to throw an error then, but drop and reset
cached reader.

This patch doesn't solve a problem with mismatched semaphores because of changes in service levels/scheduling groups but only mitigate it.

Refers: scylladb/scylla-enterprise#3182
Refers: scylladb/scylla-enterprise#3050
Closes: #14770

Closes #14736

* github.com:scylladb/scylladb:
  querier_cache: add stats of scheduling group mismatches
  querier_cache: check semaphore mismatch during querier lookup
  querier_cache: add reference to `replica::database::is_user_semaphore()`
  replica:database: add method to determine if semaphore is user one

(cherry picked from commit a8feb74)
denesb added a commit that referenced this issue Aug 9, 2023
…long to user' from Michał Jadwiszczak

If semaphore mismatch occurs, check whether both semaphores belong
to user. If so, log a warning, log a `querier_cache_scheduling_group_mismatches` stat and drop cached reader instead of throwing an error.

Until now, semaphore mismatch was only checked in multi-partition queries.  The PR pushes the check to `querier_cache` and perform it on all `lookup_*_querier` methods.

The mismatch can happen if user's scheduling group changed during
a query. We don't want to throw an error then, but drop and reset
cached reader.

This patch doesn't solve a problem with mismatched semaphores because of changes in service levels/scheduling groups but only mitigate it.

Refers: scylladb/scylla-enterprise#3182
Refers: scylladb/scylla-enterprise#3050
Closes: #14770

Closes #14736

* github.com:scylladb/scylladb:
  querier_cache: add stats of scheduling group mismatches
  querier_cache: check semaphore mismatch during querier lookup
  querier_cache: add reference to `replica::database::is_user_semaphore()`
  replica:database: add method to determine if semaphore is user one

(cherry picked from commit a8feb74)
@denesb
Copy link
Contributor

denesb commented Aug 9, 2023

Backported to 5.1 and 5.2. There were conflicts, but they were relatively easy to sort out.

@denesb denesb removed Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed Requires-Backport-to-5.1 labels Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants