cached_file, used by index caching, will potentially cause a crash after OOM #14814

Closed
raphaelsc opened this issue Jul 25, 2023 · 7 comments
Labels: P2 High Priority, type/bug
@raphaelsc
Member

cached_file can fetch multiple pages (each 4 KiB in size) at once due to read-ahead.

When that happens, only the first page is wrapped in ptr_type = std::unique_ptr<cached_page, cached_page_del>.

cached_page_del is an automatic deleter that adds the page to the LRU once it becomes unused (i.e. its refcount drops to zero).

There is an optimistic assumption that pages not yet wrapped in ptr_type will eventually be consumed, and will therefore also be added to the LRU once unused.
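
A minimal sketch of that pattern, under simplified assumptions (cached_page, cached_page_del, lru_list and share() below are illustrative stand-ins, not the actual definitions from utils/cached_file.hh):

```cpp
#include <cstdint>
#include <list>
#include <memory>
#include <vector>

struct cached_page;
using lru_list = std::list<cached_page*>;       // stand-in for the real LRU

struct cached_page {
    std::vector<uint8_t> buf = std::vector<uint8_t>(4096);  // 4 KiB payload
    int refcount = 0;
    bool linked_in_lru = false;
    lru_list* lru = nullptr;
};

// Automatic deleter: once the last reference is dropped, the page becomes
// "unused" and is handed over to the LRU, where eviction can find it later.
struct cached_page_del {
    void operator()(cached_page* p) const {
        if (--p->refcount == 0) {
            p->lru->push_back(p);
            p->linked_in_lru = true;
        }
    }
};

using ptr_type = std::unique_ptr<cached_page, cached_page_del>;

// Only the page about to be consumed gets share()d; read-ahead pages stay
// with refcount == 0 and linked_in_lru == false until a later read (if any).
inline ptr_type share(cached_page& p) {
    ++p.refcount;
    return ptr_type(&p);
}
```

In this simplified model, a page reaches the LRU only when the last ptr_type referring to it is destroyed; a page that was fetched but never share()d has no such trigger.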

But if OOM happens before that, pages not yet consumed remain unused and unlinked to the LRU.

When the cached_file is destroyed after OOM, it tries to evict the cache under the incorrect assumption that all unused entries are in the LRU, even though some of the unused entries are not linked to it,

so the cached_file destructor fails as follows:

cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion `_cache.empty()' failed.

The issue can easily be reproduced by injecting an allocation failure after the 2nd page has been brought into the cache, but before it is consumed.
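
A hypothetical sketch of that reproduction idea (the real test uses the project's own allocation-failure-injection facilities; the injector below is illustrative only):

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Throws std::bad_alloc on the N-th tracked allocation, simulating OOM.
struct alloc_failure_injector {
    std::size_t allocs_until_failure = SIZE_MAX;
    void fail_after(std::size_t n) { allocs_until_failure = n; }
    void on_alloc() {
        if (allocs_until_failure-- == 0) {
            throw std::bad_alloc();
        }
    }
};

// Test flow (pseudocode):
//   1. issue a read that triggers read-ahead, so both page 1 and page 2
//      land in the cache, but only page 1 is wrapped in ptr_type;
//   2. arm the injector so the next allocation throws, aborting the read
//      before page 2 is ever consumed;
//   3. destroy the cached_file: before the fix this trips the
//      _cache.empty() assertion, after the fix teardown is clean.
```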

@bhalevy
Member

bhalevy commented Jul 25, 2023

cc @tgrabiec

raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 25, 2023
It was found that the cached_file dtor can hit the following assert
after OOM:

cached_file_test: utils/cached_file.hh:379: cached_file::~cached_file(): Assertion `_cache.empty()' failed.

cached_file's dtor iterates through all entries and evicts those
that are linked to the LRU, under the assumption that all unused
entries were linked to the LRU.

That's only partially correct. get_page_ptr() may fetch more than one
page due to read-ahead, but it will only call cached_page::share()
on the first page, the one that will be consumed now.

share() is responsible for automatically placing the page into the
LRU once its refcount drops to zero.

If the read is aborted midway, before cached_file has a chance to
hit the 2nd page (brought in by read-ahead) in the cache, that page
will remain there with refcount 0 and unlinked to the LRU, in the
hope that a subsequent read will bring it out of that state.

Our main user of cached_file is per-sstable index caching.
If the scenario above happens and the sstable, with its associated
cached_file, is destroyed before the 2nd page is hit, cached_file
will not be able to clear all of the cache, because some of the
pages are unused yet not linked.

A page is unused regardless of its presence on the LRU, so we can
adjust evict_range() to take into account that entries may be
unused without ever having had share() called on their behalf.

A reproducer was added.

Fixes scylladb#14814.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
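
A rough sketch of the adjustment this revision describes, reusing the illustrative types from the earlier sketch (evict_page is a made-up helper standing in for the real evict_range(), not the actual patch):

```cpp
// Treat refcount == 0 as the definition of "unused": such a page is
// evictable whether or not share() ever linked it into the LRU.
void evict_page(cached_page& p, lru_list& lru) {
    if (p.refcount != 0) {
        return;                 // still referenced by a reader
    }
    if (p.linked_in_lru) {
        lru.remove(&p);         // normal path: unlink before dropping it
    }
    // A read-ahead page that was never share()d is also unused; there is
    // simply nothing to unlink, so it can be dropped directly.
    p.buf.clear();
    p.buf.shrink_to_fit();      // release the 4 KiB payload
}
```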
@DoronArazii added this to the 5.4 milestone Jul 25, 2023
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 26, 2023
(Same commit message as the one above, except that the fix paragraph was revised:)

A page retrieved via read-ahead will be "put on hold", which means
linking it into the LRU without clearing the non-LSA buf. That's a
way of saying we will likely read the page next, but if that doesn't
happen, e.g. due to an exception, the page won't sit in memory
indefinitely.

A reproducer was added.

Fixes scylladb#14814.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
raphaelsc added a commit to raphaelsc/scylla that referenced this issue Jul 26, 2023
(Same commit message as the one above, except that the fix paragraph was revised again:)

A page brought in by read-ahead will be linked into the LRU so it
doesn't sit in memory indefinitely. This also allows the cached_file
dtor to clear the whole cache when some of the pages brought in ahead
of time are never fetched later.

A reproducer was added.

Fixes scylladb#14814.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
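
A rough sketch of this final approach, again with the illustrative types from the first sketch (populate_pages is a made-up helper standing in for the real read-ahead population path):

```cpp
#include <vector>

// Link every read-ahead page into the LRU immediately. If a later read
// does consume it, share() and the deleter take over as usual; if not,
// the page is still reachable by eviction and by the cached_file dtor.
void populate_pages(std::vector<cached_page*>& fetched, lru_list& lru) {
    bool first = true;
    for (cached_page* p : fetched) {
        if (first) {
            first = false;      // first page is handed to the caller via share()
            continue;
        }
        lru.push_back(p);
        p->linked_in_lru = true;
    }
}
```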
@DoronArazii added the backport/5.2 and Requires-Backport-to-5.1 labels Jul 27, 2023
@mykaul added the P2 High Priority label Jul 27, 2023
tgrabiec pushed a commit that referenced this issue Jul 28, 2023
(Same commit message as the final revision above.)

Closes #14818

(cherry picked from commit 050ce9e)
tgrabiec pushed a commit that referenced this issue Jul 28, 2023
(Same commit message as the final revision above.)

Closes #14818

(cherry picked from commit 050ce9e)
@DoronArazii

Backported into 5.1 + 5.2

Removing labels.

@DoronArazii removed the Backport candidate, backport/5.2, and Requires-Backport-to-5.1 labels Jul 30, 2023
@bhalevy
Member

bhalevy commented Oct 19, 2023

@mykaul @avikivity I wonder if we should have labels marking an issue's backport state as final, like "Backport Done", "No backport required", or "Backport denied"?

Today, when an issue has no "Backport candidate" label, I don't know whether it was never added by mistake, or whether it was removed because the backport was completed, or deemed unneeded or undesired for some reason.

@avikivity
Member

Yes, I discussed it with @benipeled once. I don't know what to think about it now.

@mykaul
Contributor

mykaul commented Oct 19, 2023

> @mykaul @avikivity I wonder if we should have labels marking an issue's backport state as final, like "Backport Done", "No backport required", or "Backport denied"?
>
> Today, when an issue has no "Backport candidate" label, I don't know whether it was never added by mistake, or whether it was removed because the backport was completed, or deemed unneeded or undesired for some reason.

This is exactly the state machine implementation I've been mentioning in the thread about automation and labels... I just think we need to properly design and document our phases first, rather than add labels.

@benipeled
Contributor

> @mykaul @avikivity I wonder if we should have labels marking an issue's backport state as final, like "Backport Done", "No backport required", or "Backport denied"?
>
> Today, when an issue has no "Backport candidate" label, I don't know whether it was never added by mistake, or whether it was removed because the backport was completed, or deemed unneeded or undesired for some reason.

This logic was added about a year ago (https://github.com/scylladb/jenkins/pull/26#issuecomment-1269698701); the backport-done and no-backport-require labels should be used in this case.

@benipeled
Contributor

@avikivity, actually we must start using these labels in order to implement https://github.com/scylladb/scylla-pkg/issues/3591; otherwise the enterprise-backport-bot will mark issues that were already handled on OS.
