
L2ARC restore sometimes fails #15202

Closed
shodanshok opened this issue Aug 23, 2023 · 16 comments
Labels
Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments

@shodanshok
Contributor

System information

Type                  Version/Name
Distribution Name     Debian
Distribution Version  12.1
Kernel Version        6.1.0-11-amd64
Architecture          x86-64
OpenZFS Version       2.1.11

Describe the problem you're observing

Sometimes, L2ARC restore fails for no apparent reason. /proc/spl/kstat/zfs/dbgmsg simply shows "L2ARC rebuild no valid log blocks". See below for an example.

# L2ARC populated
root@debian12:~/zfs/zfs-2.1.11# zpool iostat -v
                     capacity     operations     bandwidth
pool               alloc   free   read  write   read  write
-----------------  -----  -----  -----  -----  -----  -----
tank                299M  15.2G      1      0  32.5K  5.98K
  vdb               299M  15.2G      1      0  32.5K  5.98K
  indirect-1           -      -      0      0      0      0
cache                  -      -      -      -      -      -
  /root/l2arc.img  6.37M  3.99G      0      0    399  5.54K
-----------------  -----  -----  -----  -----  -----  -----

# exporting and reimporting pool, L2ARC is empty now
root@debian12:~/zfs/zfs-2.1.11# zpool export tank; zpool import tank
root@debian12:~/zfs/zfs-2.1.11# zpool iostat -v
                     capacity     operations     bandwidth
pool               alloc   free   read  write   read  write
-----------------  -----  -----  -----  -----  -----  -----
tank                299M  15.2G     48     78   998K   956K
  vdb               299M  15.2G     48     78   998K   956K
  indirect-1           -      -      0      0      0      0
cache                  -      -      -      -      -      -
  /root/l2arc.img      0  4.00G      5      2   296K  15.4K
-----------------  -----  -----  -----  -----  -----  -----

# no valid log blocks
root@debian12:~/zfs/zfs-2.1.11# tail /proc/spl/kstat/zfs/dbgmsg
1692828434   spa_history.c:307:spa_history_log_sync(): txg 21334 L2ARC rebuild no valid log blocks
1692828434   spa_history.c:294:spa_history_log_sync(): command: zpool import tank

# populate the L2ARC with some new metadata
root@debian12:~/zfs/zfs-2.1.11# time find /tank/test/fsmark/ -exec stat {} \+ > /dev/null
root@debian12:~/zfs/zfs-2.1.11# zpool iostat -v
                     capacity     operations     bandwidth
pool               alloc   free   read  write   read  write
-----------------  -----  -----  -----  -----  -----  -----
tank                299M  15.2G     51      6  1.07M  76.8K
  vdb               299M  15.2G     51      6  1.07M  76.8K
  indirect-1           -      -      0      0      0      0
cache                  -      -      -      -      -      -
  /root/l2arc.img  4.83M  3.99G      0      2  13.2K   139K
-----------------  -----  -----  -----  -----  -----  -----

# exporting and reimporting pool, L2ARC was preserved this time.
root@debian12:~/zfs/zfs-2.1.11# zpool iostat -v
                     capacity     operations     bandwidth
pool               alloc   free   read  write   read  write
-----------------  -----  -----  -----  -----  -----  -----
tank                299M  15.2G     37     47   605K   592K
  vdb               299M  15.2G     37     47   605K   592K
  indirect-1           -      -      0      0      0      0
cache                  -      -      -      -      -      -
  /root/l2arc.img  6.42M  3.99G      7      1   187K  8.87K
-----------------  -----  -----  -----  -----  -----  -----

# log blocks restored
root@debian12:~/zfs/zfs-2.1.11# tail /proc/spl/kstat/zfs/dbgmsg
1692828904   spa_history.c:307:spa_history_log_sync(): txg 21482 L2ARC rebuild successful, restored 4 blocks
1692828904   spa_history.c:294:spa_history_log_sync(): command: zpool import tank

Describe how to reproduce the problem

No specific reproducer yet; it simply happens.
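
A rough way to catch it when it does happen is to loop export/import and check dbgmsg after each cycle. A minimal sketch, using the same pool and commands as above (this is not a deterministic reproducer):

# export/import in a loop and record each L2ARC rebuild result
for i in $(seq 1 10); do
    zpool export tank
    zpool import tank
    grep "L2ARC rebuild" /proc/spl/kstat/zfs/dbgmsg | tail -n 1
done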

Include any warning/errors/backtraces from the system logs

None.

shodanshok added the Type: Defect (Incorrect behavior, e.g. crash, hang) label on Aug 23, 2023
@shodanshok
Contributor Author

@gamanakis any idea on what is the cause of the issue and/or how to reproduce it consistently? Thanks.

@gamanakis
Contributor

Yes, it probably has to do with the label and the ashift. We pushed a commit a while back that is not included in 2.1.11.

@gamanakis
Contributor

Can you try with #14963 applied?

@shodanshok
Contributor Author

I can confirm the issue is the one solved by your patch (i.e., L2ARC fails to rebuild after the first import). Does it mean that, after the first failed import, the L2ARC is recreated with an implicit ashift=9? If so, that would be suboptimal for most SSDs.

Also, it seems that your patch is not even in ZFS 2.1.12. Is it scheduled for the 2.2 release, or will an eventual 2.1.13 include it?

Thanks.

@gamanakis
Contributor

gamanakis commented Aug 27, 2023 via email

@shodanshok
Contributor Author

> It will be in 2.1.13.

Excellent.

Just to be sure I understand: currently, after the first failed import, the L2ARC is recreated with an implicit ashift=9?

If so, when 2.1.13 is out, one has to re-create the L2ARC to get ashift=12, right?
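
(I assume the re-creation would look something like the sketch below, using my test device path; zpool add accepts -o ashift for this.)

# remove the old cache device and re-add it with an explicit ashift
zpool remove tank /root/l2arc.img
zpool add -o ashift=12 tank cache /root/l2arc.img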

@gamanakis
Contributor

gamanakis commented Aug 27, 2023 via email

@shodanshok
Contributor Author

I've updated to 2.1.13 and re-attached the cache device with -o ashift=12. How can I check that the correct ashift was stored in the vdev label? zdb -l and zdb -C show nothing related.

Thanks.

@gamanakis
Contributor

I don't think I taught zdb to read the ashift of the cache device. Let me take a look.

@gamanakis
Contributor

I opened #15331 to enable this.

@shodanshok
Contributor Author

shodanshok commented Dec 19, 2023

@gamanakis I tried zdb -l cachedev on a newly created L2ARC device with ZFS 2.1.14, but I cannot see anything related to ashift in the output. Am I missing something? Thanks.

EDIT: on ZFS 2.2.2, zdb -l cachedev correctly shows the L2ARC device's ashift. Is the missing ashift output on ZFS 2.1.14 a cosmetic issue only?
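
(For the record, the quick check used on 2.2.2 was along these lines, with my test device path:)

# check whether the cache device label stores an ashift
zdb -l /root/l2arc.img | grep ashift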

@gamanakis
Contributor

@shodanshok I do not think this zdb feature was added in the 2.1.14 branch. Would you mind having a look too?

@gamanakis
Contributor

The commit enabling storing of the ashift in the label of cache devices is present in 2.1.14, though.

@shodanshok
Contributor Author

> I do not think this zdb feature was added in the 2.1.14 branch. Would you mind having a look too?

I missed that; I was under the impression that it was added to 2.1.14.

> The commit enabling storing of the ashift in the label of cache devices is present in 2.1.14, though.

I created a test pool on a 2.1.14 machine, added an L2ARC device, updated to 2.2.2, and rebooted. zdb -l showed nothing about ashift. Then, on 2.2.2, I removed and re-added the L2ARC device, and zdb -l correctly showed the right ashift. So it seems that 2.1.14 did not store the ashift in the label.
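
The sequence was roughly the following sketch (device paths from my test setup):

# on 2.1.14: create the pool and attach the cache device
zpool create tank vdb
zpool add -o ashift=12 tank cache /root/l2arc.img
# upgrade to 2.2.2 and reboot, then:
zdb -l /root/l2arc.img | grep ashift   # shows nothing
# re-create the cache device on 2.2.2:
zpool remove tank /root/l2arc.img
zpool add -o ashift=12 tank cache /root/l2arc.img
zdb -l /root/l2arc.img | grep ashift   # now shows the ashift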

I then used hexdump -C to examine the L2ARC device itself:

# 2.1.14 hexdump /root/l2arc.img -C
00044080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00044090  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

# 2.2.2 hexdump /root/l2arc.img -C
00044080  61 73 68 69 66 74 00 00  00 00 00 08 00 00 00 01  |ashift..........|
00044090  00 00 00 00 00 00 00 0c  00 00 00 00 00 00 00 00  |................|
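
(The ashift nvpair can be located directly by grepping the hexdump output, e.g.:)

# look for the ashift nvpair in the raw vdev label
hexdump -C /root/l2arc.img | grep -A1 ashift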

How can I check if ashift was correctly stored/used on ZFS 2.1.14?
Thanks.

@gamanakis
Contributor

See the comments on fe4d055. We need that commit to enable reporting through zdb; otherwise zdb remains unaware of the ashift. I will submit a PR for 2.1.15.

@gamanakis
Contributor

I opened #15690.
