cached_file, used by index caching, will potentially cause a crash after OOM #14814

raphaelsc · 2023-07-25T02:36:12Z

cached_file can fetch multiple pages (each 4k in size) at once due to read ahead,

when that happens, only the first page is wrapped in ptr_type = std::unique_ptr<cached_page, cached_page_del>.

cached_page_del is an automatic deleter that will add page to LRU once unused (i.e. refcount drops to zero).

there's an optimistic assumption that pages not wrapped in ptr_type yet will be consumed eventually, and therefore they'll be added to LRU too once unused.

But if OOM happens before that, pages not consumed yet will remain unused and not linked to LRU

when the cached_file is destroyed after OOM, it will try to evict cache under the incorrect assumption that all unused entries are in LRU, but some of the unused entries are not linked to LRU,

so cached_file destructor will fail as follow:

cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.`

can be easily reproduced by injecting allocation failure after 2nd page was introduced into cache, but before it's consumed.

The text was updated successfully, but these errors were encountered:

bhalevy · 2023-07-25T04:21:49Z

cc @tgrabiec

It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page is unused regardless of its presence on LRU, so we can adjust evict_range() to take into account that entries may be unused and didn't have the chance to have share() called on its behalf. A reproducer was added. Fixes scylladb#14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page retrieved via read ahead will be "put on hold" which means linking it into LRU without clearing the non-lsa buf. That's a way of saying we will likely read the page next but if it doesn't happen, e.g. due to an exception, the page won't sit in memory indefinitely. A reproducer was added. Fixes scylladb#14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will be linked into LRU so it doesn't sit in memory indefinitely. Also allowing for cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes scylladb#14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

It was found that cached_file dtor can hit the following assert after OOM cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion _cache.empty()' failed.` cached_file's dtor iterates through all entries and evict those that are linked to LRU, under the assumption that all unused entries were linked to LRU. That's partially correct. get_page_ptr() may fetch more than 1 page due to read ahead, but it will only call cached_page::share() on the first page, the one that will be consumed now. share() is responsible for automatically placing the page into LRU once refcount drops to zero. If the read is aborted midway, before cached_file has a chance to hit the 2nd page (read ahead) in cache, it will remain there with refcount 0 and unlinked to LRU, in hope that a subsequent read will bring it out of that state. Our main user of cached_file is per-sstable index caching. If the scenario above happens, and the sstable and its associated cached_file is destroyed, before the 2nd page is hit, cached_file will not be able to clear all the cache because some of the pages are unused and not linked. A page read ahead will be linked into LRU so it doesn't sit in memory indefinitely. Also allowing for cached_file dtor to clear all cache if some of those pages brought in advance aren't fetched later. A reproducer was added. Fixes #14814. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14818 (cherry picked from commit 050ce9e)

DoronArazii · 2023-07-30T17:44:08Z

Backported into 5.1 + 5.2

Removing labels.

bhalevy · 2023-10-19T10:05:32Z

@mykaul @avikivity I wonder if we should have labels marking issues backport state as final, like: "Backport Done", "No backport required", or "Backport denied"?

Today when an issue has no "Backport candidate" label I don't know if somehow it wasn't added by mistake or if it was removed when the backport completed or deemed unneeded or undesired for some reason.

avikivity · 2023-10-19T10:09:15Z

Yes I discussed it with @benipeled once. I don't know what to think about it now.

mykaul · 2023-10-19T10:50:41Z

@mykaul @avikivity I wonder if we should have labels marking issues backport state as final, like: "Backport Done", "No backport required", or "Backport denied"?

Today when an issue has no "Backport candidate" label I don't know if somehow it wasn't added by mistake or if it was removed when the backport completed or deemed unneeded or undesired for some reason.

This is exactly the state machine implementation I've been mentioning in the thread about automation and labels... I just think we need to properly design and document our phases first, rather than add labels.

benipeled · 2023-10-19T12:20:41Z

@mykaul @avikivity I wonder if we should have labels marking issues backport state as final, like: "Backport Done", "No backport required", or "Backport denied"?

Today when an issue has no "Backport candidate" label I don't know if somehow it wasn't added by mistake or if it was removed when the backport completed or deemed unneeded or undesired for some reason.

The logic added ~year ago - https://github.com/scylladb/jenkins/pull/26#issuecomment-1269698701 - the backport-done and no-backport-require labels should be used in this case

benipeled · 2023-10-19T13:24:41Z

@avikivity, Actually we must start using these labels in order to implement https://github.com/scylladb/scylla-pkg/issues/3591, otherwise the enterprise-backport-bot will mark issues already handled on OS

raphaelsc added the type/bug label Jul 25, 2023

raphaelsc mentioned this issue Jul 25, 2023

cached_file: Evict unused pages that aren't linked to LRU yet #14818

Closed

DoronArazii added this to the 5.4 milestone Jul 25, 2023

scylladb-promoter closed this as completed in 050ce9e Jul 27, 2023

scylladb-promoter added the Backport candidate label Jul 27, 2023

DoronArazii added backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed Requires-Backport-to-5.1 labels Jul 27, 2023

DoronArazii assigned raphaelsc Jul 27, 2023

mykaul added the P2 High Priority label Jul 27, 2023

DoronArazii removed Backport candidate backport/5.2 Issues that should be backported to 5.2 branch once they'll be fixed Requires-Backport-to-5.1 labels Jul 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cached_file, used by index caching, will potentially cause a crash after OOM #14814

cached_file, used by index caching, will potentially cause a crash after OOM #14814

raphaelsc commented Jul 25, 2023

bhalevy commented Jul 25, 2023

DoronArazii commented Jul 30, 2023

bhalevy commented Oct 19, 2023

avikivity commented Oct 19, 2023

mykaul commented Oct 19, 2023

benipeled commented Oct 19, 2023

benipeled commented Oct 19, 2023

cached_file, used by index caching, will potentially cause a crash after OOM #14814

cached_file, used by index caching, will potentially cause a crash after OOM #14814

Comments

raphaelsc commented Jul 25, 2023

bhalevy commented Jul 25, 2023

DoronArazii commented Jul 30, 2023

bhalevy commented Oct 19, 2023

avikivity commented Oct 19, 2023

mykaul commented Oct 19, 2023

benipeled commented Oct 19, 2023

benipeled commented Oct 19, 2023