test_double_compaction_by_cleanup_and_major_compactions: scylla ran OOM #14966
@Deexie please look into this; it seems related to your latest changes around compaction tasks. Decoded backtrace:
Before compaction task executors started inheriting from compaction_task_impl, they were destructed immediately after compaction finished. The destructors of the executors and their fields performed actions that affected global structures and statistics, and had an impact on the compaction process. Currently, task executors are kept in memory much longer, as they are tracked by the task manager. Thus, the destructors are not called right after compaction, so compaction stats are not updated, which causes e.g. an infinite cleanup loop. Add a stop_compaction_executor() method which is called at the end of the compaction process and does what the destructors used to. Fixes: scylladb#14966.
Before compaction task executors started inheriting from compaction_task_impl, they were destructed immediately after compaction finished. The destructors of the executors and their fields performed actions that affected global structures and statistics, and had an impact on the compaction process. Currently, task executors are kept in memory much longer, as they are tracked by the task manager. Thus, the destructors are not called right after compaction, so compaction stats are not updated, which causes e.g. an infinite cleanup loop. Add a release_resources() method which is called at the end of the compaction process and does what the destructors used to. Fixes: scylladb#14966. Fixes: scylladb#15030.
This seems like an important fix, but it doesn't apply cleanly to 5.2. @Deexie, please provide a backport PR if you think this should be backported.
The change that introduced the bug isn't in 5.2.
OK, so there are no affected branches; removing the backport candidate label.
Seen first in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/324/testReport/compaction_additional_test/TestCompactionAdditional/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split001___test_double_compaction_by_cleanup_and_major_compactions/
The node log seems to show infinite recursion:
https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-release/324/artifact/logs-full.release.001/1691200540172_compaction_additional_test.py%3A%3ATestCompactionAdditional%3A%3Atest_double_compaction_by_cleanup_and_major_compactions/node1.log