permit ks.mv_cf_view_13:shard-reader detected a leak of {count=0, memory=32768} resources #13539
Comments
All resource consumption happens via RAII objects, that also have a There is only one object that doesn't completely conform to RAII: |
This API is dangerous: all resource consumption should happen via RAII objects that guarantee that all consumed resources are appropriately released. At this point, said API is just a low-level building block for higher-level RAII objects. To ensure nobody thinks of using it for other purposes, make it private and make external users friends instead. Refs: scylladb#13539
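To illustrate the pattern the commit message describes, here is a minimal sketch with toy classes. It is not Scylla's actual code: `toy_permit`, `toy_units`, `consume()` and `signal()` are hypothetical names chosen for this example. The low-level calls are private, and only the befriended RAII guard can reach them, so every consumed amount is paired with a release.

```cpp
#include <cassert>
#include <cstddef>

class toy_units; // RAII guard, defined below (hypothetical name)

// Toy stand-in for a permit. Not Scylla's reader_permit.
class toy_permit {
    std::size_t _consumed = 0;

    // Low-level building blocks. Private, so ordinary callers cannot
    // consume resources without also arranging for their release.
    void consume(std::size_t n) { _consumed += n; }
    void signal(std::size_t n) { _consumed -= n; }

    // Only the RAII guard may use the low-level calls.
    friend class toy_units;

public:
    std::size_t consumed() const { return _consumed; }
};

// RAII guard: consumes in the constructor, releases in the destructor.
class toy_units {
    toy_permit& _permit;
    std::size_t _amount;

public:
    toy_units(toy_permit& permit, std::size_t amount)
        : _permit(permit), _amount(amount) {
        _permit.consume(_amount);
    }
    ~toy_units() { _permit.signal(_amount); }

    // Non-copyable, so the same amount is never released twice.
    toy_units(const toy_units&) = delete;
    toy_units& operator=(const toy_units&) = delete;
};

// Returns what the permit still holds after the guard went out of scope.
std::size_t consumed_after_scope() {
    toy_permit permit;
    {
        toy_units units(permit, 4096);
        assert(permit.consumed() == 4096);
    }
    return permit.consumed();
}
```

With this layout, a call such as `permit.consume(4096)` from outside the guard fails to compile, which is the point of hiding the API.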
I see a lot of |
After the kill limit is triggered, all |
Picked up by dev, removing "Master/Triage". |
I found this in at least one other run: https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/231/ So it is not a one-off fluke. Interestingly, the other failure also leaks the exact same amount. Not sure if this has any significance. I still haven't managed to zero in on the bug. I haven't managed to find any evidence that this is related to throwing Looking at the core, I found inbound references to the reader permit which is getting destroyed (and has 0 references according to shared pointer accounting). I have not managed to identify any of the references I found; it could be a coincidence (a random blob looking like a pointer). |
Looks like the throwing |
I think I found the problem. As I learned not too long ago, a permit which is in the |
Successfully reproduced with a unit test. |
…mory requests When requesting memory via `reader_permit::request_memory()`, the requested amount is added to the `_requested_memory` member of the permit impl. This is because multiple concurrent requests may be blocked and waiting at the same time. When the requests are fulfilled, the entire amount is consumed and individual requests track their requested amount with `resource_units` to release later. There is a corner case related to this: if a reader permit is registered as inactive while it is waiting for memory, its active requests are killed with `std::bad_alloc`, but the `_requested_memory` field is not cleared. If the read survives because the killed requests were part of a non-vital background read-ahead, a later memory request will also include the amount from the failed requests. This extra amount will not be released and hence will cause a resource leak when the permit is destroyed. Fix by detecting this corner case and clearing the `_requested_memory` field. Modify the existing unit test for the scenario of a permit waiting on memory being registered as inactive, to also cover this corner case, reproducing the bug. Fixes: scylladb#13539
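The accounting problem described above can be sketched with a toy model. This is an illustration only, not Scylla's implementation: `toy_semaphore`, `toy_permit`, `kill_pending()`, `fulfill_pending()` and the `clear_on_kill` switch are hypothetical names, and the amounts (32768 for the killed read-ahead request, 1024 for the later one) are assumed values picked so the result mirrors the leak size in the issue title.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for the semaphore's memory accounting (hypothetical).
struct toy_semaphore {
    std::size_t consumed = 0;
};

// Toy stand-in for the permit impl. Only models the accumulated
// `_requested_memory` counter discussed in the commit message.
class toy_permit {
    toy_semaphore& _sem;
    std::size_t _requested_memory = 0; // sum of all pending requests
    bool _clear_on_kill;               // true models the fixed behaviour

public:
    toy_permit(toy_semaphore& sem, bool clear_on_kill)
        : _sem(sem), _clear_on_kill(clear_on_kill) {}

    // A request is queued: its amount joins the pending total.
    void request_memory(std::size_t n) { _requested_memory += n; }

    // The permit is registered as inactive while waiting: pending
    // requests fail (std::bad_alloc in the real code). The fix is to
    // also forget their amounts.
    void kill_pending() {
        if (_clear_on_kill) {
            _requested_memory = 0;
        }
    }

    // Pending requests are fulfilled: the whole pending total is consumed.
    void fulfill_pending() {
        _sem.consumed += _requested_memory;
        _requested_memory = 0;
    }
};

// Runs the corner case and returns the amount left consumed at the end.
std::size_t leaked_after_scenario(bool clear_on_kill) {
    toy_semaphore sem;
    toy_permit permit(sem, clear_on_kill);

    permit.request_memory(32768); // background read-ahead request
    permit.kill_pending();        // permit becomes inactive, request fails
    permit.request_memory(1024);  // later request from the surviving read
    permit.fulfill_pending();     // consumes everything still pending

    // Only the surviving request holds units, so only its amount is released.
    sem.consumed -= 1024;
    return sem.consumed;
}
```

In this model the stale amount from the killed request is consumed together with the later request but has no owner to release it, so it remains consumed when the permit goes away. Clearing the counter when the pending requests are killed removes the discrepancy.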
Fix here: #13679 |
No branches affected, no backports needed. |
Seen in https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/232/artifact/logs-heavy.release.001/1681710153096_materialized_views_test.py%3A%3ATestMaterializedViews%3A%3Atest_mv_alter_with_synchronous_updates/node2.log
Decoded:
Coredump is here: https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/232/artifact/logs-heavy.release.001/1681710153096_materialized_views_test.py%3A%3ATestMaterializedViews%3A%3Atest_mv_alter_with_synchronous_updates/node2-reactor-1.1233.1681709804.core.gz
Reloc package: http://downloads.scylladb.com/unstable/scylla/master/relocatable/2023-04-17T03:05:10Z/scylla-unstripped-5.3.0~dev-0.20230417.c501163f955e.x86_64.tar.gz