Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Page faults: fix handling of OOM conditions #1934

Merged
merged 2 commits into from
Sep 7, 2023

Conversation

francescolavra
Copy link
Member

This change set fixes two issues that have been introduced by #1904:

  1. A call to storage_sync() may suspend the current context, e.g. to acquire the filesystem mutex. Therefore, this function cannot be called directly from the code that handles a page fault exception; instead, an asynchronous thunk should be used. This change modifies the mmap_anon_page() closure function so that it calls storage_sync() first, and the demand_anonymous_page() function so that it invokes the above closure via async_apply_bh(). Closes issue(fs): assertion !frame_is_full(ctx->frame) failed at /nanos/src/kernel/mutex.c:109 (IP 0xffffffff800460b4)  #1933.
  2. Commit 3e73a8f implemented an OOM killer that sends a SIGKILL to the user program if enough OOM page faults have occurred in a short amount of time. However, the OOM killer is only triggered if an OOM condition is detected synchronously during a call to do_demand_page(). Since commit 2b3e2d4, a failure to allocate an anonymous page no longer causes do_demand_page() to return an error status: instead, an OOM condition is only handled asynchronously, after flushing the page cache to disk. In addition, failure to allocate pages for file-backed mappings is also handled asynchronously, and as such is not detected by the OOM killer. This change moves the OOM killer logic to a new demand_page_done() function, which is called by both the Unix fault handler (to detect any OOM condition that may be reported by the do_demand_page() function) and the pending fault completion (to properly handle OOM conditions detected asynchronously). The pagecache_map_page() function has been modified to return an error status that is recognized by the OOM killer as an OOM condition. This fixes CI test failures that have been occurring when running the ruby_alloc e2e test.

Copy link
Collaborator

@sanderssj sanderssj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

A call to storage_sync() may suspend the current context, e.g. to
acquire the filesystem mutex. Therefore, this function cannot be
called directly from the code that handles a page fault exception;
instead, an asynchronous thunk should be used.
This change modifies the mmap_anon_page() closure function so that
it calls storage_sync() first, and the demand_anonymous_page()
function so that it invokes the above closure via async_apply_bh().

Closes #1933.
Commit 3e73a8f implemented an OOM
killer that sends a SIGKILL to the user program if enough OOM page
faults have occurred in a short amount of time. However, the OOM
killer is only triggered if an OOM condition is detected
synchronously during a call to do_demand_page(). Since commit
2b3e2d4, a failure to allocate an
anonymous page no longer causes do_demand_page() to return an error
status: instead, an OOM condition is only handled asynchronously,
after flushing the page cache to disk. In addition, failure to
allocate pages for file-backed mappings is also handled
asynchronously, and as such is not detected by the OOM killer.
This change moves the OOM killer logic to a new demand_page_done()
function, which is called by both the Unix fault handler (to detect
any OOM condition that may be reported by the do_demand_page()
function) and the pending fault completion (to properly handle OOM
conditions detected asynchronously). The pagecache_map_page()
function has been modified to return an error status that is
recognized by the OOM killer as an OOM condition.
This fixes CI test failures that have been occurring when running
the ruby_alloc e2e test.
@francescolavra francescolavra merged commit cbb30d3 into master Sep 7, 2023
5 checks passed
@francescolavra francescolavra deleted the fix/mmap_anon_page branch September 7, 2023 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants