system_auth RF change to 3 fails: got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem) #9751
Comments
@asias please have a look
@denesb I think we probably have a reproducer for the reader deadlock. The error in the log suggests we spent 30 minutes reading a single fragment.
One more:
Installation details
Scylla running with shards number (live nodes):
Test:
Restore Monitor Stack command:
Test id:
Logs:
Looks like there is a genuine problem here. @asias do you know when this started happening? Or is this that same phantom block that has been plaguing us for some time now?
@vponomaryov can you tell us the history of this test? When did it first start to fail, and with which Scylla commit? I do not see the longevity-large-partition-asymmetric-cluster-3h test in the past. Perhaps it is a recent sct test.
Walked through different builds of that job and I can tell that it is a racy bug. Another build where it was also present, but with a slightly different signature:
A couple more hits:
So, the build from September 22, 2021 already had this problem, and there are no older builds in Jenkins to check.
Happened again.
Test details
Logs:
The problem is that, in the case of non-local reads, each repair has to obtain two permits on the shard where the repair-meta lives.
If the semaphore is contested, the repair-metas can block their own shard readers from being admitted; see the sketch below.
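A minimal standalone sketch of that pattern in plain C++ (illustrative only, not Scylla code; the capacity-2 semaphore stands in for _streaming_concurrency_sem): two repairs each take one unit for their repair-meta, and then neither can ever get the second unit for its shard reader.

```cpp
#include <chrono>
#include <iostream>
#include <semaphore>
#include <thread>

int main() {
    // Capacity 2: enough for two repair-metas, but each repair needs 2 units.
    std::counting_semaphore<2> sem(2);

    auto repair = [&](int id) {
        sem.acquire();  // first permit: the repair-meta
        // Give the other repair time to grab its first unit too.
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        // Second permit: the shard reader. Both units are now held by
        // repair-metas, so this can only time out -- the analogue of
        // "Semaphore timed out: _streaming_concurrency_sem".
        if (!sem.try_acquire_for(std::chrono::milliseconds(200))) {
            std::cout << "repair " << id << ": shard reader admission timed out\n";
        } else {
            sem.release();  // not reached in the contested case
        }
        sem.release();  // release the repair-meta's unit
    };

    std::thread t1(repair, 1), t2(repair, 2);
    t1.join();
    t2.join();
}
```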
So what can we do?
I'll cook a patch which ensures only a single blocking admission happens on each shard. |
Thanks. BTW, if you send a PR instead of email patches, people can find the relevant patches with one click. Now I have to search my email to find them, and since I do not know the subjects of the patches, I have to search all the emails from you.
I know, this is one of the good things PRs have going for them. They didn't work out for me for other reasons. I could have still posted the name of the patch, sorry for that.
Reproduced with:
Installation details
Scylla running with shards number (live nodes):
Test:
Restore Monitor Stack command:
Test id:
Logs:
The fix is not merged yet.
…s" from Botond:
"Repair obtains a permit for each repair-meta instance it creates. This permit is supposed to track all resources consumed by that repair, as well as ensure the concurrency limit is respected. However, when the non-local reader path is used (shard config of master != shard config of follower), a second permit will be obtained -- for the shard reader of the multishard reader. This creates a situation where the repair-meta's permit can block the shard permit, creating a deadlock. This patch solves this by dropping the count resource on the repair-meta's permit when the non-local reader path is executed -- that is, when a multishard reader is created.
Fixes: #9751"
* 'repair-double-permit-block/v4' of https://github.com/denesb/scylla:
repair: make sure there is one permit per repair with count res
reader_permit: add release_base_resource()
(cherry picked from commit 52b7778)
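A hedged sketch of the idea in that patch (simplified stand-in types, not the actual reader_permit API): on the non-local path, the repair-meta's permit gives back its count unit before the multishard reader admits its own shard permit, so at most one blocking admission per repair remains on each shard.

```cpp
// Illustrative stand-ins only; the real classes are Scylla's reader_permit
// and reader_concurrency_semaphore.
struct permit_sketch {
    int count_units = 1;            // the "count" resource held by the repair-meta

    void release_base_resource() {  // named after the helper the patch adds
        count_units = 0;            // return the unit to the semaphore
    }
};

void make_repair_reader(permit_sketch& repair_meta_permit, bool non_local) {
    if (non_local) {
        // Shard configs differ, so a multishard reader is created and its
        // shard reader obtains a second permit. Drop this permit's count
        // unit first, so the shard reader cannot deadlock behind it.
        repair_meta_permit.release_base_resource();
        // ... create the multishard reader here ...
    }
    // Local path: the single repair-meta permit is enough.
}
```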
Backported to 4.6. No other releases are affected.
If the permit was admitted, _base_resources was already accounted in _resources and therefore has to be deducted from it, otherwise the permit will think it leaked some resources on destruction.
Test: dtest(repair_additional_test.py.test_repair_one_missing_row_diff_shard_count)
Refs: #9751
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220119132550.532073-1-bdenes@scylladb.com>
(cherry picked from commit a65b38a)
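The accounting rule from that message can be sketched as follows (hypothetical member names chosen to mirror the ones quoted above; not the real class):

```cpp
#include <cstddef>

struct resources { std::size_t count = 0; std::size_t memory = 0; };

// Hypothetical sketch of the invariant from the commit message above.
struct reader_permit_sketch {
    resources _base_resources;  // reserved when the permit was created
    resources _resources;       // everything currently charged to the permit
    bool _admitted = false;

    void release_base_resource() {
        if (_admitted) {
            // On admission the base resources were folded into _resources,
            // so deduct them here as well; otherwise the destructor would
            // see a non-zero _resources and report a resource leak.
            _resources.count -= _base_resources.count;
            _resources.memory -= _base_resources.memory;
        }
        _base_resources = {};
    }
};
```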
@denesb Could you please confirm 2021.1.10 is not affected?
Yes, 2021.1 is not affected.
All relevant backports completed.
Installation details
Kernel version:
5.4.0-1035-aws
Scylla version (or git commit hash):
4.7.dev-0.20211126.44f4ea38c with build-id a88083cca1208179dd36a3847af403f77ad794f0
Cluster size: 5 nodes (i3.2xlarge)
OS (RHEL/CentOS/Ubuntu/AWS AMI):
ami-0238a987fdfc50310
(aws: eu-west-1)
Scylla running with shards number (live nodes):
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-1 (3.248.218.248 | 10.0.2.196): 4 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-2 (54.246.70.108 | 10.0.3.20): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-3 (34.244.105.91 | 10.0.3.96): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-4 (34.251.238.10 | 10.0.1.158): 6 shards
longevity-large-partitions-3h-maste-db-node-9c1ecaf6-5 (54.246.48.89 | 10.0.0.44): 7 shards
Test:
longevity-large-partition-asymmetric-cluster-3h
Test name: longevity_test.LongevityTest.test_custom_time
Test config file(s):
Issue description
Running on the asymmetric cluster, the change of system_auth RF to 3 fails with the following error on Node 1:
got error in row level repair: std::runtime_error (Semaphore timed out: _streaming_concurrency_sem)
And on all other nodes with the following error:
Failed to read a fragment from the reader, keyspace=system_auth, table=roles, range=[{-8833127432019027370, end}, {-8822971771378640915, end}]: seastar::named_semaphore_timed_out (Semaphore timed out: _streaming_concurrency_sem)
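For context on the exception text: it comes from Seastar's named semaphore, which throws seastar::named_semaphore_timed_out carrying the semaphore's name when a waiter's timeout expires. A minimal sketch (the semaphore name is chosen here to match the log; exact API details may vary between Seastar versions):

```cpp
#include <seastar/core/app-template.hh>
#include <seastar/core/semaphore.hh>
#include <iostream>

using namespace std::chrono_literals;

int main(int argc, char** argv) {
    seastar::app_template app;
    return app.run(argc, argv, [] {
        // One unit, named like the streaming concurrency semaphore in the log.
        static seastar::named_semaphore sem{
            1, seastar::named_semaphore_exception_factory{"_streaming_concurrency_sem"}};
        sem.consume(1); // take the only unit and never return it
        // The next waiter times out; the exception message matches the log:
        // "Semaphore timed out: _streaming_concurrency_sem"
        return sem.wait(100ms, 1).handle_exception_type(
            [] (const seastar::named_semaphore_timed_out& e) {
                std::cout << e.what() << "\n";
            });
    });
}
```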
More details:
Node 1:
Node 2:
Node 3:
Node 4:
Node 5:
Restore Monitor Stack command:
$ hydra investigate show-monitor 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb
Restore monitor on AWS instance using Jenkins job
Show all stored logs command:
$ hydra investigate show-logs 9c1ecaf6-195b-4648-9fb5-5b28a471e6bb
Test id:
9c1ecaf6-195b-4648-9fb5-5b28a471e6bb
Logs:
db-cluster - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/db-cluster-9c1ecaf6.tar.gz
loader-set - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/loader-set-9c1ecaf6.tar.gz
monitor-set - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/monitor-set-9c1ecaf6.tar.gz
sct-runner - https://cloudius-jenkins-test.s3.amazonaws.com/9c1ecaf6-195b-4648-9fb5-5b28a471e6bb/20211207_042308/sct-runner-9c1ecaf6.tar.gz
Jenkins job URL