Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZFS IO can stall for a second or more under memory pressure #3616

Closed
siebenmann opened this issue Jul 20, 2015 · 4 comments
Closed

ZFS IO can stall for a second or more under memory pressure #3616

siebenmann opened this issue Jul 20, 2015 · 4 comments
Milestone

Comments

@siebenmann
Copy link
Contributor

In arc_get_data_buf, if arc_is_overflowing() is true we signal the reclaim thread and wait for arc_reclaim_waiters_cv. On my machine with git tip ZFS on Linux, I'm periodically seeing multiple processes stalling in this wait for a second or more; this has the effect of drastically slowing down their IO.

This CV is primarily signalled from two places. In arc_evict_state_impl(), it is signalled on !arc_is_overflowing(); in arc_adapt_thread() (which winds up calling arc_evict_state_impl()) it is signalled on 'arc_size <= arc_c || arc_evicted = 0'. However, these two conditions are not quite the same because arc_is_overflowing() adds an overflow block to arc_c. So if arc_evict_state_impl() evicts some space but not the full overflow amount, it will not signal the CV but we'll wind up with arc_evicted > 0, arc_adapt_thread() may not signal the CV either because arc_size is now a bit over arc_c, and arc_adapt_thread() will then wind up sleeping for a second before it tries again and maybe signals the CV this time around. In the mean time, arc_get_data_buf() callers are waiting.

This issue was introduced in ca0bf58, a port of the upstream 'Illumos 5497 - lock contention on arcs_mtx'. The issue is not present in the upstream Illumos code because there the entire cv_timedwait_sig() section is also in the 'arc_size <= arc_c || arc_evicted == 0' conditional, so arc_adapt_thread() always signals if it's going to wait (and otherwise it immediately tries to shrink again).

(I don't know if the best fix is to make the code work like the Illumos code or to change the condition in arc_adapt_thread() to also be '!arc_is_overflowing()' or something like that. The Illumos code clearly pushes ARC shrinking more, since it will keep trying to evict things until nothing is evicted.)

@dweeezil
Copy link
Contributor

I was trying a bit too hard to retain the pre-ca0bf58 code structure. I've worked up a patch which adopts the illumos structure in dweeezil/zfs@9970711.

@kernelOfTruth
Copy link
Contributor

@dweeezil Excellent ! Thanks a bunch,

let's see if this could help @sempervictus and @DeHackEd with their problems, too

@dweeezil
Copy link
Contributor

I'm going to run some metadata-intensive tests in tight memory environments to make sure the arc size doesn't blow up.

EDIT: @siebenmann, your analysis certainly seems reasonable. There was so much involved with that patch that I don't remember exactly what I was thinking when working on arc_adapt_thread() other than wanting to retain the overall structure from ZoL.

@behlendorf
Copy link
Contributor

@siebenmann nice analysis on this issue. I'd suggest we focus on finalizing #3533 which in my mind is the second half of the ca0bf58 changes. It brings ZoL ARC back in sync with the illumos ARC in all the core areas leaving just a few places where we needed to diverge. Specifically, it includes this fix which nicely explains why it resolved @sempervictus's issue.

@behlendorf behlendorf added this to the 0.6.5 milestone Jul 20, 2015
@siebenmann siebenmann mentioned this issue Jul 22, 2015
19 tasks
kernelOfTruth pushed a commit to kernelOfTruth/zfs that referenced this issue Jul 23, 2015
…no_grow

5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5376
  illumos/illumos-gate@2ec99e3

Porting Notes:

The good news is that many of the recent changes made upstream to the
ARC tackled issues previously observed by ZoL with similar solutions.
The bad news is those solution weren't identical to the ones we applied.
This patch is designed to split the difference and apply as much of the
upstream work as possible.

* The arc_available_memory() function was removed previous in ZoL but
due to the upstream changes it makes sense to add it back.  This function
has been customized for Linux so that it can be used to determine a low
memory.  This provides the same basic functionality as the illumos version
allowing us to minimize changes through the rest of the code base.  The
exact mechanism used to detect a low memory state remains unchanged so
this change isn't a significant as it might first appear.

* This patch includes the long standing fix for arc_shrink() which was
originally proposed in openzfs#2167.  Since there were related changes to this
function it made sense to include that work.

* The arc_init() function has been re-factored.  As before it sets sane
default values for the ARC but then calls arc_tuning_update() to apply
user specific tuning made via module options.  The arc_tuning_update()
function is then called periodically by the arc_reclaim_thread() to
apply changes to the tunings made during normal operation.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#3616
Closes openzfs#2167
janlam7 pushed a commit to janlam7/zfs that referenced this issue Jul 25, 2015
…no_grow

5376 arc_kmem_reap_now() should not result in clearing arc_no_grow
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Approved by: Dan McDonald <danmcd@omniti.com>

References:
  https://www.illumos.org/issues/5376
  illumos/illumos-gate@2ec99e3

Porting Notes:

The good news is that many of the recent changes made upstream to the
ARC tackled issues previously observed by ZoL with similar solutions.
The bad news is those solution weren't identical to the ones we applied.
This patch is designed to split the difference and apply as much of the
upstream work as possible.

* The arc_available_memory() function was removed previous in ZoL but
due to the upstream changes it makes sense to add it back.  This function
has been customized for Linux so that it can be used to determine a low
memory.  This provides the same basic functionality as the illumos version
allowing us to minimize changes through the rest of the code base.  The
exact mechanism used to detect a low memory state remains unchanged so
this change isn't a significant as it might first appear.

* This patch includes the long standing fix for arc_shrink() which was
originally proposed in openzfs#2167.  Since there were related changes to this
function it made sense to include that work.

* The arc_init() function has been re-factored.  As before it sets sane
default values for the ARC but then calls arc_tuning_update() to apply
user specific tuning made via module options.  The arc_tuning_update()
function is then called periodically by the arc_reclaim_thread() to
apply changes to the tunings made during normal operation.

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#3616
Closes openzfs#2167
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants