New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mitigate semaphore mismatch when possible #14770
Comments
This is a potentially distrupting change. We should let it get into enterprise and get some testing with SL addition/removal under read load before backporting. |
How will we get it into Enterprise without backporting? |
Via the regular master -> enterprise merge. |
…long to user' from Michał Jadwiszczak If semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning, log a `querier_cache_scheduling_group_mismatches` stat and drop cached reader instead of throwing an error. Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check to `querier_cache` and perform it on all `lookup_*_querier` methods. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader. This patch doesn't solve a problem with mismatched semaphores because of changes in service levels/scheduling groups but only mitigate it. Refers: scylladb/scylla-enterprise#3182 Refers: scylladb/scylla-enterprise#3050 Closes: #14770 Closes #14736 * github.com:scylladb/scylladb: querier_cache: add stats of scheduling group mismatches querier_cache: check semaphore mismatch during querier lookup querier_cache: add reference to `replica::database::is_user_semaphore()` replica:database: add method to determine if semaphore is user one (cherry picked from commit a8feb74)
…long to user' from Michał Jadwiszczak If semaphore mismatch occurs, check whether both semaphores belong to user. If so, log a warning, log a `querier_cache_scheduling_group_mismatches` stat and drop cached reader instead of throwing an error. Until now, semaphore mismatch was only checked in multi-partition queries. The PR pushes the check to `querier_cache` and perform it on all `lookup_*_querier` methods. The mismatch can happen if user's scheduling group changed during a query. We don't want to throw an error then, but drop and reset cached reader. This patch doesn't solve a problem with mismatched semaphores because of changes in service levels/scheduling groups but only mitigate it. Refers: scylladb/scylla-enterprise#3182 Refers: scylladb/scylla-enterprise#3050 Closes: #14770 Closes #14736 * github.com:scylladb/scylladb: querier_cache: add stats of scheduling group mismatches querier_cache: check semaphore mismatch during querier lookup querier_cache: add reference to `replica::database::is_user_semaphore()` replica:database: add method to determine if semaphore is user one (cherry picked from commit a8feb74)
Backported to 5.1 and 5.2. There were conflicts, but they were relatively easy to sort out. |
This issue is an OSS copy of https://github.com/scylladb/scylla-enterprise/issues/3182.
When semaphore mismatch is detected between two user's semaphores, the cached querier can be dropped instead of throwing an internal error.
Also the check is only present in multi-partition queires, it should be done in all queries.
The text was updated successfully, but these errors were encountered: