New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cached_file, used by index caching, will potentially cause a crash after OOM #14814
Comments
cc @tgrabiec |
It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page is unused regardless of its presence on LRU, so we can adjust evict_range() to take into account that entries may be unused and didn't have the chance to have share() called on its behalf. A reproducer was added. Fixes scylladb#14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page retrieved via read ahead will be "put on hold" which means linking it into LRU without clearing the non-lsa buf. That's a way of saying we will likely read the page next but if it doesn't happen, e.g. due to an exception, the page won't sit in memory indefinitely. A reproducer was added. Fixes scylladb#14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will be linked into LRU so it doesn't sit in memory indefinitely. Also allowing for cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes scylladb#14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will be linked into LRU so it doesn't sit in memory indefinitely. Also allowing for cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes #14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14818 (cherry picked from commit 050ce9e)
It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will be linked into LRU so it doesn't sit in memory indefinitely. Also allowing for cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes #14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14818 (cherry picked from commit 050ce9e)
Backported into 5.1 + 5.2 Removing labels. |
@mykaul @avikivity I wonder if we should have labels marking issues backport state as final, like: "Backport Done", "No backport required", or "Backport denied"? Today when an issue has no "Backport candidate" label I don't know if somehow it wasn't added by mistake or if it was removed when the backport completed or deemed unneeded or undesired for some reason. |
Yes I discussed it with @benipeled once. I don't know what to think about it now. |
This is exactly the state machine implementation I've been mentioning in the thread about automation and labels... I just think we need to properly design and document our phases first, rather than add labels. |
The logic added ~year ago - https://github.com/scylladb/jenkins/pull/26#issuecomment-1269698701 - the |
@avikivity, Actually we must start using these labels in order to implement https://github.com/scylladb/scylla-pkg/issues/3591, otherwise the enterprise-backport-bot will mark issues already handled on OS |
cached_file can fetch multiple pages (each 4k in size) at once due to read ahead,
when that happens, only the first page is wrapped in
ptr_type = std::unique_ptr<cached_page, cached_page_del>
.cached_page_del is an automatic deleter that will add page to LRU once unused (i.e. refcount drops to zero).
there's an optimistic assumption that pages not wrapped in ptr_type yet will be consumed eventually, and therefore they'll be added to LRU too once unused.
But if OOM happens before that, pages not consumed yet will remain unused and not linked to LRU
when the cached_file is destroyed after OOM, it will try to evict cache under the incorrect assumption that all unused entries are in LRU, but some of the unused entries are not linked to LRU,
so cached_file destructor will fail as follow:
cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion
_cache.empty()' failed.`can be easily reproduced by injecting allocation failure after 2nd page was introduced into cache, but before it's consumed.
The text was updated successfully, but these errors were encountered: