repair should handle abort_requested_exception mode gracefully #15710
Comments
@denesb please assign
Also, I'm not sure if it's the same issue, but the symptoms are related, so there's a good chance they could be fixed together:
If abort is requested during bootstrap, the node should exit normally. To achieve this, abort_requested_exception should be thrown, as main handles it gracefully. In data_sync_repair_task_impl::run, exceptions from all shards are wrapped together into a std::runtime_error, so they aren't handled as they are supposed to be. Throw abort_requested_exception when shutdown was requested. Also throw abort_requested_exception if repair::task_manager_module::is_aborted, so that force_terminate_all_repair_sessions acts the same regardless of the state of the repair. To maintain consistency, do the same for user_requested_repair_task_impl. Fixes: scylladb#15710.
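The difference described in the commit message can be sketched in plain C++ without Seastar. This is only an illustration under stated assumptions: the local abort_requested_exception struct stands in for the real Seastar type, and rethrow_wrapped, rethrow_abort_aware, and run_like_main (with its 0/1 return codes) are hypothetical names invented for this sketch, not functions from the ScyllaDB tree.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for the real abort_requested_exception (assumption: the real type comes from Seastar).
struct abort_requested_exception : std::exception {
    const char* what() const noexcept override { return "abort requested"; }
};

// Behaviour described as problematic: all shard failures are folded into one
// std::runtime_error, so the original exception type is lost.
void rethrow_wrapped(const std::vector<std::exception_ptr>& shard_errors) {
    if (shard_errors.empty()) {
        return;
    }
    std::string msg = "repair failed:";
    for (const auto& ep : shard_errors) {
        try {
            std::rethrow_exception(ep);
        } catch (const std::exception& e) {
            msg += " ";
            msg += e.what();
        }
    }
    throw std::runtime_error(msg);
}

// Behaviour described by the fix: if shutdown was requested or the repair module
// was aborted, throw abort_requested_exception instead of the wrapped error.
void rethrow_abort_aware(const std::vector<std::exception_ptr>& shard_errors,
                         bool shutdown_requested, bool module_aborted) {
    if (shutdown_requested || module_aborted) {
        throw abort_requested_exception();
    }
    rethrow_wrapped(shard_errors);
}

// Hypothetical main-like handler: 0 means a graceful exit, 1 means a failure.
template <typename F>
int run_like_main(F&& f) {
    try {
        f();
        return 0;
    } catch (const abort_requested_exception&) {
        return 0;  // abort is handled gracefully
    } catch (const std::exception&) {
        return 1;  // anything else, including the wrapped runtime_error, is a failure
    }
}
```

With a shard error list containing an abort_requested_exception, run_like_main reports a failure for rethrow_wrapped, because the handler only sees a std::runtime_error, and a graceful exit for rethrow_abort_aware when either flag is set.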
Looks like a cosmetic issue, not backporting.
https://argus.scylladb.com/workspace?state=WyJkN2RkNDJmZS00ZTM3LTRiNWQtYmEwNS02ZDg2N2UzY2IyNzMiXQ
@avikivity, we have tests failing on this in 5.4 SCT
Adding back the labels, so we don't lose track of this request.
@scylladb/scylla-maint - please backport to 5.4.
Already in 5.4.
As seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-debug/258/testReport/bootstrap_test/TestBootstrap/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split004___test_cluster_become_unavailable_when_gracefully_kill_node_during_bootstrap/
The abort_requested_exception gets past the handler in scylladb/main.cc (lines 1904 to 1909 in 055f061), which is supposed to handle it gracefully, because it is wrapped in a std::runtime_error, apparently coming from repair (RBNO bootstrap).
See https://jenkins.scylladb.com/job/scylla-master/job/dtest-debug/258/artifact/logs-full.debug.004/1697351776674_bootstrap_test.py%3A%3ATestBootstrap%3A%3Atest_cluster_become_unavailable_when_gracefully_kill_node_during_bootstrap/node3.log