Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persistent L2ARC #9582

Open
wants to merge 1 commit into
base: master
from
Open

Conversation

@gamanakis
Copy link
Contributor

gamanakis commented Nov 13, 2019

This commit makes the L2ARC persistent across reboots. It is largely
based on issue 3525 in Illumos by Saso Kiselkov, which was later
ported by Yuxuan Shui to zfsonlinux (#2672) with modifications by
Jorgen Lundman, and rebased to current master with fixes by me.

Co-authored-by: Saso Kiselkov skiselkov@gmail.com
Co-authored-by: Jorgen Lundman lundman@lundman.net
Co-authored-by: George Amanakis gamanakis@gmail.com
Ported-by: Yuxuan Shui yshuiv7@gmail.com
Signed-off-by: George Amanakis gamanakis@gmail.com

Motivation and Context

This feature implements a light-weight persistent L2ARC metadata
structure that allows L2ARC contents to be recovered after a reboot.
This significantly eases the impact a reboot has on read performance
on systems with large caches.

Closes #3744

Description

The following changes were made:

  • fix the log block payload size in arc_impl.h
  • fix memory leaks of abd_t objects
  • fix cancelling of L2ARC rebuild in l2arc_remove_vdev()
  • implement encryption
  • add functional tests
  • fix a bug in l2arc_blk_restore()
  • fix the calculation of l2arc_log_blk_overhead()
  • simplify the end-of-list detection in l2arc_rebuild()

How Has This Been Tested?

Setup:

truncate -s 10G a.img
truncate -s 10G b.img
losetup -f ~/a.img; losetup -f ~/b.img
zpool create -f tst /dev/loop0 cache /dev/loop1
zfs create -o encryption=on -o keysource=passphrase tst/enc
fio --ioengine=libaio --direct=1 --name=test --bs=2M --size=3G --readwrite=randread --runtime=7200 --time_based --iodepth=64 --directory=/tst
fio --ioengine=libaio --direct=1 --name=test --bs=2M --size=3G --readwrite=randread --runtime=7200 --time_based --iodepth=64 --directory=/tst/enc

Monitoring with:

zpool iostat -v2
watch "egrep 'log|rebuild|hdr' /proc/spl/kstat/zfs/arcstats" 
dmesg -Hw

Also running in a production server (VMs with ZVOLs, Samba, NFS, LXC) with unencrypted ZFS since 11/22/2019.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 13, 2019

You could change the commit message to:
"This commit makes the L2ARC persistent across reboots. It is largely
based on 3525 commit on Illumos"

The rest is already in the Authored and ported by tags.

That way it complies with the stylerule a little more.

edit
I've since come to realise i'm a retard and the style rule is 75 PER LINE, not 75 total. Please ignore this.

Copy link
Contributor

Ornias1993 left a comment

While i'm not versed in ZoL enough to review some/most of the actual function calls.
I do want to note it's very neatly structured and byond-well commented.

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 13, 2019

Quick work fixing the commit message!
Seems only some space-tabs issues and the sort are left now :)
http://build.zfsonlinux.org/builders/Ubuntu%2017.10%20x86_64%20%28STYLE%29/builds/18872/steps/shell_2/logs/stdio

@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Nov 13, 2019

@Ornias1993 I accidentally deleted your comment regarding disabling persistent L2ARC. I guess we could make it an option the user can set, ie a zfs property?

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 13, 2019

@gamanakis No worries that was me, I didnt notice I already submitted it and afterwards scrolled up and noted you already had a quite easy/nice hook to disable persistent L2Arc :)

(not a property yet though, so maybe that can be added indeed )

@ahrens

This comment has been minimized.

Copy link
Member

ahrens commented Nov 13, 2019

Awesome, thanks for picking this up!

based on 3525 commit on Illumos

I don't see that commit in illumos. Could you specify which codebase you started from? Is it #2672 ?

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 13, 2019

@ahrens My fault of being unclear with my advice to change the commit message. @gamanakis also accidentily changed the PR summary.
this was the original (that should be put back, in the PR summary but left out of the commit summary):

This commit makes the L2ARC persistent across reboots. It is largely
based on 3525 commit on Illumos by Saso Kiselkov, which was later
ported by Yuxuan Shui to zfsonlinux, and rebased to current master
with fixes by myself.

With regards to the Illumos commit...
I see the mistake now, it is based on a Ilumos Issue, not commit.
https://www.illumos.org/issues/3525

@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Nov 13, 2019

@ahrens Yes, I picked up from #2672

@ahrens

This comment has been minimized.

Copy link
Member

ahrens commented Nov 13, 2019

Thanks for clarifying.

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 15, 2019

@behlendorf Do you have anyone in mind to request a review from?

@behlendorf

This comment has been minimized.

Copy link
Contributor

behlendorf commented Nov 16, 2019

@gamanakis thanks for moving this work forward! I haven't had a chance to take a careful look at this PR to provide any detailed feedback. But when skimming I noticed that it doesn't appear to add any new test cases. One of the next steps for this PR would be add some to the tests/zfs-tests/tests/functional directory to verify everything is working as intended.

@behlendorf behlendorf requested review from ahrens, behlendorf and grwilson Nov 16, 2019
@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Nov 16, 2019

@behlendorf I will work on that. Meanwhile I did a minor cleanup and implemented this as a zpool property, so that the user can set it. Is it ok if I push (or force-push?) those commits in this PR?

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 16, 2019

@gamanakis Usually those changes can get pushed to the same repo no problem, certainly before formal review :)

@scineram

This comment has been minimized.

Copy link

scineram commented Nov 16, 2019

Thanks for picking this up.

What is the point of another property?

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 16, 2019

@scineram I think @gamanakis means the earlier discussed option to enable/disable the persistance

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 16, 2019

@gamanakis Just as headsup: lookup if your git email on your machine is the same as the one setup on github. it doesn't recognise your commits as linked to your github account. you might want too ;)

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Nov 16, 2019

@gamanakis You might need to add your property to the test suite too, test errors:
zpool_get_002_pos
Found zpool features not in the zpool_get test config 59/60.

@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Nov 17, 2019

I pushed a final functional commit finalizing the zpool property, and adding man pages.
I am working on the tests now.

@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch 4 times, most recently from 4374a5a to acf0bac Nov 17, 2019
man/man8/zpool.8 Outdated Show resolved Hide resolved
man/man8/zpool.8 Outdated Show resolved Hide resolved
@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch from 687dae1 to e02d8f0 Nov 17, 2019
@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 21, 2020

I came across another minor thing that needs fixing. If we remove an L2ARC device and then re-add it to the same pool, a rebuild of L2ARC is not triggered. For this to happen we have to export and re-import the pool.

  1. zpool create test a.img cache b.img
  2. fio rand read so that L2ARC starts filling up
  3. zpool remove test b.img
  4. zpool add test cache b.img --> L2ARC is not being restored

While this is minor, I think we should solve it. We first need to add l2arc_spa_rebuild_start(spa) in spa_vdev_add().

The missing thing is that nvlist_lookup_boolean_value in spa_load_l2cache() does not pick up the ZPOOL_CONFIG_L2CACHE_PERSISTENT value in the label of the device. This is strange since we introduced code in module/zfs/vdev_label.c to write it in the label, and it is present when looking with zdb -l.

@behlendorf

This comment has been minimized.

Copy link
Contributor

behlendorf commented Feb 21, 2020

and it is present when looking with zdb -l.

This is likely because the zpool add test cache b.img command will overwrite the vdev label when adding the new cache device. Avoiding this behavior might be a little tricky, I'm also not sure I'd consider this a bug. While it might be nice if it recognized this device was previously part of the pool and rebuild it, I wouldn't expect it to be able to after explicitly removing the vdev with zpool remove.

@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 22, 2020

@behlendorf that makes sense. There is an easier way around this, if we decide to implement it. Right now we rely on both vdev_label and l2ad_dev_hdr being sane in order to trigger the L2ARC rebuild. If we skip the vdev_label check at line 1900 of module/zfs/spa.c then we could trigger the rebuild even with zpool add test cache b.img.

@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch from 70760da to 2458afe Feb 22, 2020
@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 22, 2020

2458afe: replace nvlist_add_boolean_value() with fnvlist_add_boolean_value() in vdev_config_generate()

include/sys/arc_impl.h Outdated Show resolved Hide resolved
include/sys/arc_impl.h Outdated Show resolved Hide resolved
module/zfs/spa.c Outdated Show resolved Hide resolved
(void) nvlist_lookup_boolean_value(l2cache[i],
ZPOOL_CONFIG_L2CACHE_PERSISTENT,
&do_rebuild);
l2arc_add_vdev(spa, vd, do_rebuild);

This comment has been minimized.

Copy link
@grwilson

grwilson Feb 22, 2020

Member

Since l2arc_add_vdev calls l2arc_dev_hdr_read to read the header, do we need to explicitly pass in the do_rebuild boolean to this function?

This comment has been minimized.

Copy link
@gamanakis

gamanakis Feb 22, 2020

Author Contributor

These lines check if the vdev_label has the attribute ZPOOL_CONFIG_L2CACHE_PERSISTENT. Right now we check both the vdev_label and the header before deciding to rebuild the L2ARC. Technically we could rely only on the header (which contains the necessary information to start the rebuild).
@behlendorf If we rely only on the header, and remove this check here, we could also trigger the L2ARC rebuild in case of zpool add test cache b.img.

This comment has been minimized.

Copy link
@gamanakis

gamanakis Feb 23, 2020

Author Contributor

On second thought, we have device zpool offline and online and should not probably trigger L2ARC rebuild upon zpool add. In fact it may be a good idea to reset the device header on zpool add.
However, @gwilson is right (see next comment) and L2ARC is not triggered when a pool with an offlined L2ARC device is imported and the L2ARC device is onlined again. This is mediated by vdev_reopen().

module/zfs/vdev.c Outdated Show resolved Hide resolved
module/zfs/arc.c Outdated Show resolved Hide resolved
module/zfs/arc.c Outdated Show resolved Hide resolved
if (adddev->l2ad_end < l2arc_rebuild_blocks_min_l2size) {
adddev->l2ad_log_entries = 0;
} else {
adddev->l2ad_log_entries = MIN((adddev->l2ad_end -
adddev->l2ad_start) >> SPA_MAXBLOCKSHIFT,
L2ARC_LOG_BLK_MAX_ENTRIES);
}
Comment on lines 8972 to 9027

This comment has been minimized.

Copy link
@grwilson

grwilson Feb 24, 2020

Member

I think this mean that any device which is smaller than l2arc_rebuild_blocks_min_l2size won't write log blocks. Does the code handle a mix of persistent l2arc devices with non-persistent l2arc devices? For example, what if you set l2arc_rebuild_blocks_min_l2size to 1TB and add 512GB cache device, then reset l2arc_rebuild_blocks_min_l2size back to 1GB and add another 512GB device. Now one device would have l2ad_log_entries set to 0 and another which is set to 1023.

This comment has been minimized.

Copy link
@gamanakis

gamanakis Feb 24, 2020

Author Contributor

Good point. I think the code can differentiate between them since each device is handled separately. I will add a test case.

This comment has been minimized.

Copy link
@gamanakis

gamanakis Feb 24, 2020

Author Contributor

@grwilson I see now what you mean, I misinterpreted your comment. l2arc_rebuild_blocks_min_l2size is defined as a module parameter, it applies to every L2ARC device when l2arc_add_vdev() is run.
So in your scenario, when l2arc_rebuild_blocks_min_l2size is set back to 1GB then the old 512GB device would still have no log blocks written to it unless it is offlined/onlined or the pool is exported/imported. The new 512GB will have log blocks written to it because when it was added l2arc_rebuild_blocks_min_l2size was already at 1GB.
Given this behaviour, what would you suggest?

* is going to be rebuilt since dh_log_blk_count will be set to
* 0.
*/
l2arc_dev_hdr_update(adddev);

This comment has been minimized.

Copy link
@grwilson

grwilson Feb 24, 2020

Member

Do we still write the l2arc hdr if adddev->l2ad_log_entries is 0?

This comment has been minimized.

Copy link
@gamanakis

gamanakis Feb 24, 2020

Author Contributor

Yes, we do. It is only 512 bytes, so I left it that way.

@ryao

This comment has been minimized.

Copy link
Contributor

ryao commented Feb 24, 2020

My schedule was not quite as free as I would have liked. I will need an extra day to do review.

@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Feb 24, 2020

@behlendorf I think a delay till the 1st of March is reasonable considering current feedback and the delay @ryao is facing...

@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch from 2458afe to a64fcfe Feb 24, 2020
@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 24, 2020

a64fcfe: Implemented feedback from @grwilson so far. Most notably:

  • trigger L2ARC rebuild in vdev_reopen(), applies to onlining an L2ARC device
  • added a test offlining/onlining an L2ARC device and verifying L2ARC is rebuilt
  • reset the L2ARC device header upon zpool add
  • update man pages
  • rename (d)hdr = dev->l2ad_dev_hdr to l2dhdr, same with dev->l2ad_dev_hdr_asize
@Ornias1993

This comment has been minimized.

Copy link
Contributor

Ornias1993 commented Feb 24, 2020

@gamanakis Question, for clearity, with these changes, if you use l2arc_headroom = 0 it will basically put everything from ARC into L2ARC just like your previous example:

Example:
ARC is at 16GB. l2arc_write_max = 64 MB, l2arc_headroom = 16GB/64MB = 256.

Also: What is the default setting for l2arc_headroom?

@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 24, 2020

@gamanakis Question, for clearity, with these changes, if you use l2arc_headroom = 0 it will basically put everything from ARC into L2ARC just like your previous example:

Yes, it will. Give enough time to write all its buffers. Boosting l2arc_write_max to something like 64MB (default is 8MB) will also help in writing in faster (but also depends on the write speed of the L2ARC device).

Also: What is the default setting for l2arc_headroom?

l2arc_headroom defaults to 2, so 2 * l2arc_write_max. I did not change the default.

cmd/zdb/zdb.c Outdated Show resolved Hide resolved
include/sys/arc_impl.h Outdated Show resolved Hide resolved
*/
uint64_t le_prop;
uint64_t le_daddr; /* buf location on l2dev */
const uint64_t le_pad[11]; /* pad to 128 bytes */

This comment has been minimized.

Copy link
@ahrens

ahrens Feb 24, 2020

Member

Why do we need/want to pad it to 128 bytes? Why not 64 bytes, or no padding at all?

This comment has been minimized.

Copy link
@gamanakis

gamanakis Feb 24, 2020

Author Contributor

I think those were reserved for future use in the original implementation.

include/sys/arc_impl.h Outdated Show resolved Hide resolved
include/sys/arc_impl.h Outdated Show resolved Hide resolved
man/man8/zpoolconcepts.8 Outdated Show resolved Hide resolved
module/zfs/arc.c Outdated Show resolved Hide resolved
@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch from 3c367fd to d020f54 Feb 24, 2020
@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 24, 2020

d020f54: Fixes from @ahrens, fix off/onlining if l2arc_vdev_present, fix a subtle bug in l2arc_rebuild() in case of memory pressure.

@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch from d020f54 to d2353f5 Feb 25, 2020
@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 25, 2020

d2353f5: in case we off/online an L2ARC device (vdev_reopen), but do not export/import the pool, l2arc_vdev_present is true. This means if we restore the contents to L2ARC we need to first clear the lists and counts of buffers (l2ad_buflist) and pointers to log blocks (l2ad_lbptr_list).

db6b38d: also fix vdev stats when off/onlining.

39c2e88: clear lists and counts of buffers and pointers to log blocks only if we are reopening a cache device

9b538e2 : replace L2BLK_GET_GET with L2BLK_GET, GET_SET with SET.

@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch 5 times, most recently from 9b538e2 to 937bd94 Feb 25, 2020
@gamanakis

This comment has been minimized.

Copy link
Contributor Author

gamanakis commented Feb 26, 2020

937bd94 - a500355: Teach zdb to dump log blocks and log entries of a cache device and update the man page.

0cca77f: LSIZE and PSIZE in le_prop and lbp_prop are in bytes, not sectors.
a9342a9: fix printf alignment in zdb.

2f57bbb: Rebase to master and add a test off/onlining a cache device without re-importing the pool.

@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch 4 times, most recently from 0cca77f to a9342a9 Feb 26, 2020
This commit makes the L2ARC persistent across reboots. It is largely
based on issue 3525 in Illumos. This feature implements a light-weight
persistent L2ARC metadata structure that allows L2ARC contents to be
recovered after a reboot. This significantly eases the impact a reboot
has on read performance on systems with large caches.

Co-authored-by: Saso Kiselkov <skiselkov@gmail.com>
Co-authored-by: Jorgen Lundman <lundman@lundman.net>
Co-authored-by: George Amanakis <gamanakis@gmail.com>
Ported-by: Yuxuan Shui <yshuiv7@gmail.com>
Signed-off-by: George Amanakis <gamanakis@gmail.com>
@gamanakis gamanakis force-pushed the gamanakis:persist_l2arc_unified branch from a9342a9 to 2f57bbb Feb 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
OpenZFS 2.0
  
New OpenZFS 2.0 Features (0.8->2.0)
Linked issues

Successfully merging this pull request may close these issues.

You can’t perform that action at this time.