
Scale the dbuf cache with arc_c instead of arc_c_max #6561

Merged
1 commit merged on Sep 29, 2017

Conversation

chrisrd
Contributor

@chrisrd chrisrd commented Aug 25, 2017

Description

Scale the size of the dbuf cache according to the ARC target size rather than use a static size based on the maximum ARC size.

Motivation and Context

Commit d3c2ae1 introduced a dbuf cache with a default size of the minimum of 100M or 1/32 maximum ARC size. (These figures may be adjusted using dbuf_cache_max_bytes and dbuf_cache_max_shift.)

On a 1GB box, the ARC defaults to maximum 370GB and minimum 16G, and the dbuf cache size comes out at 12G. I.e. the dbuf cache is a significant proportion of the minimum ARC size. With other overheads involved this actually means the ARC can't get down to the minimum.

This patch dynamically scales the dbuf cache to the target ARC size instead of statically scaling it to the maximum ARC size. (The scale is still set by dbuf_cache_max_shift and the maximum size is still fixed by dbuf_cache_max_bytes.) Using the target ARC size rather than the current ARC size is done to help the ARC reach the target rather than simply focusing on the current size.

Relates to #6506

How Has This Been Tested?

Tests per #6506:

  • fresh module load
  • 2 x directories with 32769 empty files
  • find dirs -mtime +100
  • echo 3 > /proc/sys/vm/drop_caches

Various arcstats:

a: stock 9b84076
b: patched 9b84076

                       fresh    post-find    post-drop
a: c             517,185,536  517,185,536  113,351,288        
b: c             517,185,536  517,185,536  127,592,224

a: c_min          33,554,432   33,554,432   33,554,432        
b: c_min          33,554,432   33,554,432   33,554,432

a: size            2,026,488  330,604,488   92,910,384
b: size            2,097,696  330,619,392   26,472,472

a: arc_meta_used   2,026,488  330,604,488   92,910,384
b: arc_meta_used   2,097,696  330,619,392   26,472,472

a: dbuf_size          42,976   64,609,992   18,403,208
b: dbuf_size          46,768   64,612,520    4,473,296

a: dnode_size        126,048  158,995,008   45,673,008
b: dnode_size        130,896  158,999,856   11,134,240

a: metadata_size   1,809,408   71,885,824   16,996,864
b: metadata_size   1,866,240   71,891,456    6,053,376

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux code style requirements.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • All commit messages are properly formatted and contain Signed-off-by.
  • Change has been approved by a ZFS on Linux member.

Contributor

@loli10K loli10K left a comment


This LGTM but I think we should add some new test case(s).

@chrisrd
Contributor Author

chrisrd commented Aug 25, 2017

I had a quick look at how zfs-tests is put together, and unfortunately I can't see any similar-enough existing tests to modify for this purpose. It looks like there's nothing in there checking for memory usage regressions etc. and it could take some significant infrastructure to add it in (e.g. reloading modules to get to a known state).

I'm afraid I don't have the bandwidth for that at this time ...especially as it's now beers o'clock on a Friday! 🍻

@loli10K
Contributor

loli10K commented Aug 25, 2017

I'm afraid I don't have the bandwidth for that at this time ...especially as it's now beers o'clock on a Friday! 🍻

No need to rush, but merging this without a test case feels wrong: this issue has been flying under the radar for quite some time (~1 year) precisely because we were missing a test case. I think we need to add one.

Contributor

@tuxoko tuxoko left a comment


@chrisrd
Can you add dbuf_cache_size to arc_evictable_memory? Otherwise, looks good.

@dweeezil
Contributor

@chrisrd The numbers in this part of the commit comment don't make sense:

On a 1GB box, the ARC defaults to maximum 370GB and minimum 16G, and the dbuf cache size comes out at 12G. I.e. the dbuf cache is a significant proportion of the minimum ARC size. With other overheads involved this actually means the ARC can't get down to the minimum.

With 1GB of RAM, the default arc_c_max is 512MiB and arc_c_min is 32MiB. Therefore, dbuf_cache_max_bytes is 16MiB.

Since it seems the newly-introduced dbuf cache is the cause of the problem in #6505, I have to wonder whether a better solution would be to register a shrinker for the dbuf cache. Or does the whole "filesystem shrinker" scheme rely in some way on there being only a single shrinker?

@chrisrd
Contributor Author

chrisrd commented Aug 28, 2017

@loli10K whilst I think it would be beneficial to start adding zfs-test modules for memory issues such as this, I think I'll leave starting that project to someone else at this point.

@tuxoko I'm not sure exactly what value might be added to arc_evictable_memory: the dbuf cache is only evictable down to its minimum size (so maybe arc_evictable_memory += dbuf_cache_cur_size - dbuf_cache_min_size?), and even then, with this patch, it's only evictable in response to arc_c being lowered, i.e. really a second-order effect.

@dweeezil you're right of course! My commit message was confusing ARC and metadata min/max sizes, and I mis-quoted "GB" instead of "MB" as well. Sigh. I'll update the pull request with that part of the commit message changed to:

On a 1GB box the ARC maximum size defaults to c_max 493M which gives a
dbuf cache default minimum size of 15.4M, and the ARC metadata defaults
to minimum 16M. I.e. the dbuf cache is a significant proportion of the
minimum metadata size. With other overheads involved this actually means
the ARC metadata doesn't get down to the minimum.

We could potentially do something about explicitly shrinking the dbuf cache by adding something into zfs_prune(), which is called for echo 1 > /proc/sys/vm/drop_caches as well as under general memory pressure. However, zfs_prune() is per-superblock, so we'd end up trying to shrink the (global) dbuf cache once per sb - not really a problem for a few or even perhaps tens of mounted filesystems, but not so nice for thousands of mounted filesystems (see also #6223). That said, I'm not sure it's really worth it if, with this patch, the dbuf cache is already responsive "enough" to memory pressure.

What I think would be a nice addition would be a way of monitoring the effectiveness of the dbuf cache, e.g. hit/miss ratio, so you can see if changing the dbuf cache size makes a significant difference to a specific workload. But I can't really see where to instrument the hit/miss ratio. If anyone has any idea, feel free to point it out for me! Or even better, submit a patch.

@tuxoko
Contributor

tuxoko commented Aug 28, 2017

@chrisrd

@tuxoko I'm not sure exactly what value might be added to arc_evictable_memory: the dbuf cache is is only evictable down to it's minimum size (so maybe arc_evictable_memory += dbuf_cache_cur_size - dbuf_cache_min_size?)

Yeah, that should do it.

and even then, with this patch, it's only evictable in response to arc_c being lowered, i.e. really a 2nd order effect.

That should be fine, the shrinker shouldn't normally free all the stuff at once, but we should return a more accurate value so it will shrink with a more accurate proportion. Though, we might need to see if dbuf_cache will free fast enough.

@behlendorf
Contributor

@chrisrd could you please rebase this against master. I do think we need something like this, and probably some additional adjustments too. I'd like to do some local testing with this patch.

@chrisrd
Contributor Author

chrisrd commented Sep 27, 2017

@behlendorf Rebased

@codecov

codecov bot commented Sep 27, 2017

Codecov Report

Merging #6561 into master will decrease coverage by 13.54%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #6561       +/-   ##
===========================================
- Coverage   73.57%   60.02%   -13.55%     
===========================================
  Files         294      294               
  Lines       93863    93867        +4     
  Branches        0    28198    +28198     
===========================================
- Hits        69059    56344    -12715     
+ Misses      24804    24569      -235     
- Partials        0    12954    +12954
Flag Coverage Δ
#kernel 57.57% <100%> (-16.2%) ⬇️
#user 50.9% <100%> (-12.05%) ⬇️
Impacted Files Coverage Δ
module/zfs/dbuf.c 75.24% <100%> (-19.37%) ⬇️
module/zfs/arc.c 68.01% <100%> (-17.5%) ⬇️
lib/libspl/include/sys/stat.h 50% <0%> (-50%) ⬇️
lib/libspl/include/synch.h 40% <0%> (-40%) ⬇️
include/sys/dmu.h 62.5% <0%> (-37.5%) ⬇️
module/icp/api/kcf_ctxops.c 56.52% <0%> (-34.79%) ⬇️
module/zfs/vdev_raidz_math_avx512bw.c 16.66% <0%> (-33.34%) ⬇️
lib/libspl/strnlen.c 66.66% <0%> (-33.34%) ⬇️
lib/libuutil/uu_ident.c 28.57% <0%> (-32.15%) ⬇️
module/zfs/gzip.c 53.84% <0%> (-30.77%) ⬇️
... and 238 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 269db7a...91391d2. Read the comment docs.

@chrisrd
Contributor Author

chrisrd commented Sep 28, 2017

Repushed to see if we can clear the error on buildbot/Ubuntu 14.04 i686

Commit d3c2ae1 introduced a dbuf cache with a default size of the
minimum of 100M or 1/32 maximum ARC size. (These figures may be adjusted
using dbuf_cache_max_bytes and dbuf_cache_max_shift.) The dbuf cache
is counted as metadata for the purposes of ARC size calculations.

On a 1GB box the ARC maximum size defaults to c_max 493M which gives a
dbuf cache default minimum size of 15.4M, and the ARC metadata defaults
to minimum 16M. I.e. the dbuf cache is a significant proportion of the
minimum metadata size. With other overheads involved this actually means
the ARC metadata doesn't get down to the minimum.

This patch dynamically scales the dbuf cache to the target ARC size
instead of statically scaling it to the maximum ARC size. (The scale is
still set by dbuf_cache_max_shift and the maximum size is still fixed by
dbuf_cache_max_bytes.) Using the target ARC size rather than the current
ARC size is done to help the ARC reach the target rather than simply
focusing on the current size.

Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Contributor

@behlendorf behlendorf left a comment


This makes good sense and works as expected in my local testing. I'll get it merged.

@tuxoko's recommendation to include the dbuf_cache_size in the arc_evictable_memory() is a good idea but let's do it in a followup PR.

@behlendorf behlendorf merged commit e71cade into openzfs:master Sep 29, 2017
@behlendorf behlendorf added this to PR Needed in 0.7.3 Oct 2, 2017
aerusso pushed a commit to aerusso/zfs that referenced this pull request Oct 11, 2017
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue openzfs#6506
Closes openzfs#6561
tonyhutter pushed a commit that referenced this pull request Oct 16, 2017
Reviewed-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Chris Dunlop <chris@onthe.net.au>
Issue #6506
Closes #6561
@tonyhutter tonyhutter moved this from PR Needed to Merged to 0.7.3 in 0.7.3 Oct 16, 2017
@chrisrd chrisrd deleted the zol-6506 branch February 27, 2018 06:58