ARM 32bit - Data corrupted on import raidz pool on zfs-0.7.4 #6981

Closed
geaaru opened this issue Dec 18, 2017 · 58 comments · Fixed by #7023
Labels
Type: Architecture (indicates an issue is specific to a single processor architecture)

Comments

geaaru commented Dec 18, 2017

Hi, after upgrading to zfs-0.7.4 I can no longer import my existing raidz pool.

System information

Type                  Version/Name
Distribution Name     Gentoo/Sabayon
Distribution Version  2017
Linux Kernel          Linux version 4.14.7 (vanilla)
Architecture          armv7 (32-bit)
ZFS Version           0.7.4
SPL Version           0.7.4

Describe the problem you're observing


# zpool import data
   pool: data
     id: 15066551362776210275
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

        data        FAULTED  corrupted data
          raidz1-0  ONLINE
            sdb2    ONLINE
            sdc5    ONLINE
            sdd3    ONLINE
          raidz1-1  ONLINE
            sdc6    ONLINE
            sdb1    ONLINE
            sdd4    ONLINE
          raidz1-2  ONLINE
            sdb3    ONLINE
            sdd7    ONLINE
            sdc2    ONLINE
# zpool import data
cannot import 'data': I/O error
        Destroy and re-create the pool from
        a backup source.

Describe how to reproduce the problem

After rebooting with the previous kernel 4.9.22 + zfs-0.6.5.9, the pool is not reported as corrupted:

   pool: data
     id: 15066551362776210275
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
	some features will not be available without an explicit 'zpool upgrade'.
 config:

	data        ONLINE
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
	  raidz1-1  ONLINE
	    sdc6    ONLINE
	    sdb1    ONLINE
	    sdd4    ONLINE
	  raidz1-2  ONLINE
	    sdb3    ONLINE
	    sdd7    ONLINE
	    sdc2    ONLINE
geaaru (Author) commented Dec 18, 2017

Some more info... it seems that this problem is not present with a mirror pool or a striped pool:

  pool: crypt
     id: 16541560646217126425
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

        crypt                                     ONLINE
          mirror-0                                ONLINE
            ed4b60a3-22e0-4ce8-833c-be670830cdc1  ONLINE
            6168902d-88f9-4f45-b786-7c025b837bfc  ONLINE

   pool: data2
     id: 3554911410334045254
  state: ONLINE
 status: Some supported features are not enabled on the pool.
 action: The pool can be imported using its name or numeric identifier, though
        some features will not be available without an explicit 'zpool upgrade'.
 config:

        data2       ONLINE
          sdd8      ONLINE
          sdc4      ONLINE
          sdc8      ONLINE
          sdd6      ONLINE
          sdb4      ONLINE
          sdd2      ONLINE

rincebrain (Contributor) commented Dec 18, 2017

You could try changing the value of /sys/module/zfs/parameters/zfs_vdev_raidz_impl from "fastest" to "original" on 0.7.4 and see if it still thinks the pool is in a bad way. (Also, I'd be curious to see what the contents of it is on your system, as I don't have an ARM system with ZFS handy.)

I would really hope this doesn't change anything, but it being specific to RAIDZ makes me curious.

(Also, what ARM platform specifically? Is this, say, a Raspberry Pi of some flavor, or something else? What's /proc/cpuinfo say?)

geaaru (Author) commented Dec 18, 2017

Hi,

ARMv7 Banana PI :)

Some results with the "original" mode:

# cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
[fastest] original scalar
# echo "original" > /sys/module/zfs/parameters/zfs_vdev_raidz_impl
# cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
fastest [original] scalar
   pool: data
     id: 15066551362776210275
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
	The pool may be active on another system, but can be imported using
	the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

	data        FAULTED  corrupted data
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
	  raidz1-1  ONLINE
	    sdc6    ONLINE
	    sdb1    ONLINE
	    sdd4    ONLINE
	  raidz1-2  ONLINE
	    sdb3    ONLINE
	    sdd7    ONLINE
	    sdc2    ONLINE

Modinfo:

# modinfo zfs | grep ver
version:        0.7.4-r0-gentoo
# modinfo spl | grep ver
version:        0.7.4-r0-gentoo

loli10K added the "Type: Architecture" label on Dec 18, 2017
@rincebrain (Contributor)

Hm, I couldn't readily reproduce this on a new pool I created on an x64 box with 0.6.5.9 or on a new pool I created on my poor RPi3 with 0.6.5.9 and tried importing on 0.7.4 (both worked fine).

You could also try "zpool import -C /tmp/zpool.cache data" to see if it's an issue with zpool.cache having some state that's incorrect, but I would be at least mildly surprised if that were the issue.

geaaru (Author) commented Dec 18, 2017

If it could be helpful: this pool is quite old and has been upgraded at every new release. I don't remember the initial release, but it was maybe 0.6.0, so I don't know if that could be related.

About your test... do you mean -c (not -C)? How can I create the /tmp/zpool.cache file?

I will try to see if I have the same issue on amd64.

geaaru (Author) commented Dec 18, 2017

I did this test:

  • boot with zfs-0.6.5.9
  • zpool import data
  • save /etc/zfs/zpool.cache to /root/
  • reboot with kernel 4.14 + zfs-0.7.4
# zpool import  -c /root/zpool.cache data
cannot import 'data': I/O error
	Destroy and re-create the pool from
	a backup source.

It seems that I receive the same error as without using the zpool.cache file.

@rincebrain (Contributor)

The experiment I wanted to run was explicitly the opposite of that - I wanted you to tell it to use a cachefile that doesn't exist and/or delete the extant one, not point it at a copy from 0.6.5.9.

geaaru (Author) commented Dec 18, 2017

I can't use a file that doesn't exist:

# zpool import  -c /tmp/zpool.cache data
failed to open cache file: No such file or directory
cannot import 'data': no such pool available

@rincebrain (Contributor)

@geaaru Try import -c none, then.

geaaru (Author) commented Dec 19, 2017

@rincebrain: I don't think a "-c none" option exists... it always tries to find a file named "none".

However, here is the command output:

# zpool import -c none data
failed to open cache file: No such file or directory
cannot import 'data': no such pool available

geaaru (Author) commented Dec 20, 2017

Hi, I also tested with the latest version, 0.7.5, and I have the same issue on ARM.

geaaru (Author) commented Dec 20, 2017

Additional information: I attached the same zpool on amd64 with zfs-0.7.4 and I don't receive "FAULTED corrupted data". So, it seems to be an issue specific to the 32-bit ARM environment.

@behlendorf (Contributor)

By chance is the ARMv7 Banana PI a big endian system?

loli10K (Contributor) commented Dec 20, 2017

Output of lscpu from my BananaPi M1:

Architecture:          armv7l
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0
Off-line CPU(s) list:  1
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             1

I've been running ZFS on this system for at least a couple of years now without any major issue; I only use mirrors though.

geaaru (Author) commented Dec 21, 2017

On the same storage I have both mirror and striped pools and I confirm that they work fine. The problem is only with raidz. I'm downgrading to 0.7.1 to narrow the range for a bisect. I will update you with the results ASAP.

Thanks to all for the support.

@rincebrain (Contributor)

@geaaru something else that might be interesting to check, the contents of /proc/spl/kstat/zfs/fletcher_4_bench and trying the other implementations (by changing /sys/module/zcommon/parameters/zfs_fletcher_4_impl) to see if the corrupted metadata message goes away on 0.7.X.

geaaru (Author) commented Dec 21, 2017

Hi,

I executed these tests:

  • zfs-0.7.5 + kernel 4.9.22 (the same kernel used with zfs-0.6.5.9): same issue

  • zfs-0.7.1 + kernel 4.9.22 (the same kernel used with zfs-0.6.5.9): same issue:

#  modinfo zfs | grep version
version:        0.7.1-r0-gentoo
srcversion:     5DDE36456163E5371A1832C

# zpool import data
cannot import 'data': I/O error
	Destroy and re-create the pool from
	a backup source.

About your questions:

With zfs-0.7.5+kernel-4.9.22:

# cat /proc/spl/kstat/zfs/fletcher_4_bench 
0 0 0x01 -1 0 14777921666 72002144963041
implementation   native         byteswap       
scalar           325237018      264763015      
superscalar      212174360      200653065      
superscalar4     172048520      165600854      
fastest          scalar         scalar  

#  cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl 
[fastest] scalar superscalar superscalar4 

# echo superscalar  > /sys/module/zcommon/parameters/zfs_fletcher_4_impl 

# cat /sys/module/zcommon/parameters/zfs_fletcher_4_impl 
fastest scalar [superscalar] superscalar4 

# zpool import data
cannot import 'data': I/O error
	Destroy and re-create the pool from
	a backup source.

# echo superscalar4  > /sys/module/zcommon/parameters/zfs_fletcher_4_impl 

# zpool import data
cannot import 'data': I/O error
	Destroy and re-create the pool from
	a backup source.

# echo scalar  > /sys/module/zcommon/parameters/zfs_fletcher_4_impl 

# zpool import data
cannot import 'data': I/O error
	Destroy and re-create the pool from
	a backup source.

geaaru (Author) commented Dec 21, 2017

I remember that I found this issue in the past; maybe this is useful for identifying it.

The problem was already present on 0.7.0-r3 (see my previous issue #6031).

So, this means that my 0.6.5.9 build includes the patch for issue #6031, and this raidz issue could be a regression introduced between 0.7.0-r3 and 0.7.0-r4/r5.

Does anyone have an idea where the regression could be, to simplify the bisect process?

geaaru (Author) commented Dec 23, 2017

Some other information...

I tested raidz1 backed by plain files:

# for i in {1..3}; do truncate -s 2G ./$i.img ; done
# zpool create test raidz1 /data/test-zfs/1.img /data/test-zfs/2.img /data/test-zfs/3.img

# zpool status 
  pool: test
 state: ONLINE
  scan: none requested
config:

	NAME                      STATE     READ WRITE CKSUM
	test                      ONLINE       0     0     0
	  raidz1-0                ONLINE       0     0     0
	    /data/test-zfs/1.img  ONLINE       0     0     0
	    /data/test-zfs/2.img  ONLINE       0     0     0
	    /data/test-zfs/3.img  ONLINE       0     0     0

errors: No known data errors

And in this case it seems to work fine.

So, I'm not sure where the issue is. Probably it is related to some strange state of the pool. It's very strange that this issue is not present on the amd64 arch. Could it be related to an issue in the checksum algorithm on a 32-bit arch? But then why does a file-backed pool work fine?

Thanks in advance for any suggestions.

A possible solution seems to be destroying and recreating the pool, but there is a lot of data :'(

@rincebrain (Contributor)

@geaaru Yeah, I tried reproducing it with pools created with 0.6.5.11 and 0.7.4 on both my RPi3 and an amd64 VM, and couldn't, so I'm guessing it's something about how old the pool is, and that's...a very large space to search, compared to politely asking you to git bisect. :)

geaaru (Author) commented Dec 28, 2017

OK, I'll proceed with rebuilding the pool.

However, it's strange that it works fine on 0.6.5.11.

Second, the same pool works fine on amd64 with 0.7.4.

Just for information: are there many changes to the checksum functions between 0.6.x and 0.7.x?

I will close the issue after some more tests with the new pool. Thanks to all for the support.

geaaru (Author) commented Jan 1, 2018

Well, maybe I found the correct steps to reproduce this issue.

After removing all the data and creating a new pool (from the amd64 environment):

  • before beginning to copy all the data, I checked whether an empty raidz with a single group of devices is correctly seen in the ARM environment, and the answer is "yes":
   pool: data
     id: 11040670559486866873
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	data        ONLINE
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
  • after completing all three raidz1 groups, I began copying about 500GB of data with a zfs send/receive from another pool. After that I executed a rollback to the snapshot received with the "zfs receive" command, to enable the filesystem. This operation was done in the amd64 environment. After this, I tested again in the ARM environment whether everything works fine, but...
   pool: data
     id: 11040670559486866873
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

	data        FAULTED  corrupted data
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
	  raidz1-1  ONLINE
	    sdb3    ONLINE
	    sdd7    ONLINE
	    sdc2    ONLINE
	  raidz1-2  ONLINE
	    sdc6    ONLINE
	    sdb1    ONLINE
	    sdd4    ONLINE

again the same issue. So, I confirm that the problem is not related to the pool being old.

I will try again with these tests:

  1. create the raidz directly from the ARM environment

  2. avoid receiving data from zfs send; I'm not sure whether it also copies malformed data that generates this error.

@rincebrain (Contributor)

Does it matter which dataset you do the send/recv of?

What do you mean by "After that I executed a rollback to the snapshot received with the 'zfs receive' command, to enable the filesystem"? You shouldn't need to do a rollback to "enable" a filesystem in some way.

What versions of ZFS were you using on the amd64 and the ARM machines? What version was the zfs send done on, and with which flags?

geaaru (Author) commented Jan 1, 2018

Sorry, I will try to clarify my actions.

Preface:

  • On both ARM and amd64 I currently use zfs-0.7.5 with kernel 4.14.

When copying all the data (about 500GB) off the broken pool, to simplify my work I used "zfs send" instead of the classic "cp" command to store the data on another temporary pool without raidz (but with copies=2 set); I'm not sure whether this also copies broken metadata.

So, after creating the new pool "data" with three raidz1 groups, I executed these commands, restoring from the temporary pool:

# zfs send data2/recover/data@171231 | zfs receive data/recover@data
# zfs rollback data/recover@data

After completing this, I tested on ARM and reproduced the same issue.

Just now I tried to create the whole data pool (with three raidz1 groups) from amd64 and checked on ARM whether it is correctly imported, and the answer is yes:

   pool: data
     id: 12860594106812937020
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	data        ONLINE
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
	  raidz1-1  ONLINE
	    sdc6    ONLINE
	    sdb1    ONLINE
	    sdd4    ONLINE
	  raidz1-2  ONLINE
	    sdb3    ONLINE
	    sdd7    ONLINE
	    sdc2    ONLINE

It seems that the empty raidz pool is imported correctly.

So, a copy of the same data from the temporary pool is now in progress, this time not with "zfs send" but with a simple "cp" command, again in the amd64 environment. Tomorrow I will tell you whether I can also reproduce the issue when the data is copied directly with "cp". If yes, the last step is to copy the data directly in the ARM environment.

geaaru (Author) commented Jan 1, 2018

Hi,

I confirm that also with a direct "cp" command (on amd64), when I try to import the pool on ARM I receive the same issue.

   pool: data
     id: 12860594106812937020
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

	data        FAULTED  corrupted data
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
	  raidz1-1  ONLINE
	    sdc6    ONLINE
	    sdb1    ONLINE
	    sdd4    ONLINE
	  raidz1-2  ONLINE
	    sdb3    ONLINE
	    sdd7    ONLINE
	    sdc2    ONLINE

Now I will try to copy the data directly from ARM; however, this is a clear symptom that something strange happens when raidz1 metadata (or data) created in the amd64 environment is processed during an import on ARM (32-bit).

geaaru (Author) commented Jan 2, 2018

Also when I copy the data directly with "cp", or use "zfs send + zfs receive" from another pool (without raidz), I receive checksum errors:

# zpool status -v
  pool: data
 state: ONLINE
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://zfsonlinux.org/msg/ZFS-8000-HC
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     1
	  raidz1-0  ONLINE       0     0    32
	    sdb2    ONLINE       0     0     0
	    sdc5    ONLINE       0     0     0
	    sdd3    ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0    15
	    sdb3    ONLINE       0     0     0
	    sdd7    ONLINE       0     0     0
	    sdc2    ONLINE       0     0     0
	  raidz1-2  ONLINE       0     0    32
	    sdc6    ONLINE       0     0     1
	    sdb1    ONLINE       0     0     0
	    sdd4    ONLINE       0     0     0

errors: List of errors unavailable: pool I/O is currently suspended

These errors aren't visible with the same storage disks and data on the amd64 arch.

On my BPi I currently limit arc_max to this value:

# cat /sys/module/zfs/parameters/zfs_arc_max 
73400320

@rincebrain (Contributor)

@geaaru So, I just tried to reproduce this again, by making a pool on 0.7.3 amd64 out of three 3-disk raidz1 vdevs, zfs send|recv'ing a dataset onto it, zfs rollback'ing it, exporting it, then importing it on my RPi with 0.7.4, and saw no problems.

So if you can find a way to reproduce this by making a small pool and sending a small dataset, that'd be quite useful.

geaaru (Author) commented Jan 3, 2018

@rincebrain Hi, could it be related to some big files? I don't understand why I get these errors on my side while you can't reproduce them. In my case the pool contains big video-camera files, some as large as 10GB.

Regarding this, I will try to create the raidz and copy only small files, and then try to reproduce it to build a test case.

Thanks

geaaru (Author) commented Jan 3, 2018

@rincebrain Maybe I'm on the right track...

If I create small files, it works fine:

(AMD64)
# for ((i=0; i<1000;i++)) ; do truncate -s 1K file$i ; done

(in both the first-level directory of the pool and a sub-dataset)
# zfs create data/test
# zfs create data/test/test
# cd /data/test/test ; for ((i=0; i<1000;i++)) ; do truncate -s 1K file$i ; done

In this case the import works fine on ARM.

Then, if I create a big file in the first-level directory, it seems to work:

# dd if=/dev/zero of=bigfile bs=8k count=100000 && sync
100000+0 records in
100000+0 records out
819200000 bytes (819 MB, 781 MiB) copied, 17.4162 s, 47.0 MB/s

This script seems to reproduce my issue every time:

#!/bin/bash

zpool import data
zpool destroy data

zpool create data raidz1 /dev/sdd2 /dev/sde5 /dev/sdf3  -m /zpool -f
zpool add data raidz1 /dev/sde6 /dev/sdd1 /dev/sdf4 -f
zpool add data raidz1 /dev/sdd3 /dev/sdf7 /dev/sde2 -f
zfs create data/test
zfs create data/test/test

cd /zpool/test/test

for ((i=0;i<3; i++)) ; do
        echo "Creating file bigfile$i..."
	dd if=/dev/zero of=bigfile$i bs=8k count=1000000 && sync
done

I'm not sure the problem is related to the big file itself; maybe it is how the file is stored across the different raidz groups. Is there a way to analyze how a file is split across the pool?

It also seems the problem occurs when executing my script directly on ARM. The script completes correctly, but after a zpool export it is no longer possible to import the pool.

[   73.800015] WARNING: can't open objset 85, error 5
[   73.817469] WARNING: can't open objset 134, error 5
[   83.715658] WARNING: can't open objset 85, error 5
[   83.721939] WARNING: can't open objset 134, error 5
# zpool status

   pool: data
     id: 15605622066018315452
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

	data        ONLINE
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
	  raidz1-1  ONLINE
	    sdc6    ONLINE
	    sdb1    ONLINE
	    sdd4    ONLINE
	  raidz1-2  ONLINE
	    sdb3    ONLINE
	    sdd7    ONLINE
	    sdc2    ONLINE

# zpool  import data
cannot import 'data': one or more devices is currently unavailable

Second test:

# /test-zfs.sh 
Creating file bigfile0...
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 972.847 s, 8.4 MB/s
Creating file bigfile1...
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 972.459 s, 8.4 MB/s
Creating file bigfile2...
1000000+0 records in
1000000+0 records out
8192000000 bytes (8.2 GB, 7.6 GiB) copied, 973.682 s, 8.4 MB/s

# zpool  status
  pool: data
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	data        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     3
	    sdb2    ONLINE       0     0     0
	    sdc5    ONLINE       0     0     0
	    sdd3    ONLINE       0     0     0
	  raidz1-1  ONLINE       0     0    11
	    sdc6    ONLINE       0     0     0
	    sdb1    ONLINE       0     0     0
	    sdd4    ONLINE       0     0     0
	  raidz1-2  ONLINE       0     0     9
	    sdb3    ONLINE       0     0     0
	    sdd7    ONLINE       0     0     0
	    sdc2    ONLINE       0     0     0

errors: No known data errors

Some errors appear already before the zpool export, with the same dmesg errors:

[   73.800015] WARNING: can't open objset 85, error 5
[   73.817469] WARNING: can't open objset 134, error 5
[   83.715658] WARNING: can't open objset 85, error 5
[   83.721939] WARNING: can't open objset 134, error 5

and then:

# zpool  export data

# zpool  status
no pools available

# zpool import
   pool: data
     id: 15760088254129977136
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

	data        FAULTED  corrupted data
	  raidz1-0  ONLINE
	    sdb2    ONLINE
	    sdc5    ONLINE
	    sdd3    ONLINE
	  raidz1-1  ONLINE
	    sdc6    ONLINE
	    sdb1    ONLINE
	    sdd4    ONLINE
	  raidz1-2  ONLINE
	    sdb3    ONLINE
	    sdd7    ONLINE
	    sdc2    ONLINE

Can you check whether the issue is also reproducible on your side?

@rincebrain (Contributor)

@ironMann It doesn't matter whether it's set to "original", "fastest", "scalar", "superscalar"; all settings produce this outcome. I even just rechecked it with explicitly setting the module parameter on load in case somehow changing it dynamically wasn't sticking.

raidz_test -S segfaulted at ab9f4b0, which is neat. I'm going to rebuild with 0.7.5 and try again. The stacktrace I got is at https://gist.github.com/rincebrain/b811097361db0110df48a3c1b3e670ab.

nwf (Contributor) commented Jan 8, 2018

@ironMann Testing on my machine, raidz_test -S is fully happy with the state of the universe, as far as I can tell, and yet I can reproduce the issue (see above) even with the "original" raidz implementation.

nwf@tinkerboard:~/zfs$ git describe
zfs-0.7.0-227-g390d679ac
nwf@tinkerboard:~/zfs$ ./cmd/raidz_test/raidz_test -S
20/176... 40/165... 60/165... 80/165... 100/165... 120/165... 140/165... 160/165...
Waiting for test threads to finish...
Sweep test succeeded on 165 raidz maps!
nwf@tinkerboard:~/zfs$ echo $?
0

I am not sure what my kernel stack size is; rummaging around in /proc/config.gz, I don't immediately see any options that would take it away from whatever the default is.

rincebrain (Contributor) commented Jan 8, 2018

raidz_test -S thinks life is happy for me on 0.7.5 as well, but zpool status disagrees.

root@raspberrypi:/test-zfs# zpool status
  pool: test
 state: ONLINE
  scan: none requested
config:

        NAME                           STATE     READ WRITE CKSUM
        test                           ONLINE       0     0     0
          raidz1-0                     ONLINE       0     0     0
            /test-zfs/1.img  ONLINE       0     0     0
            /test-zfs/2.img  ONLINE       0     0     0
            /test-zfs/3.img  ONLINE       0     0     0

errors: No known data errors
root@raspberrypi:/test-zfs# zfs create test/test;zfs create test/test/test;
root@raspberrypi:/test-zfs# dd if=/dev/zero of=/test/test/test/bigfile1 bs=1M count=2000 & sleep 30 && zpool scrub test
[1] 9948
root@raspberrypi:/test-zfs# zpool status
  pool: test
 state: ONLINE
  scan: scrub in progress since Mon Jan  8 03:09:55 2018
        107M scanned out of 374M at 2.62M/s, 0h1m to go
        8.50K repaired, 28.68% done
config:

        NAME                           STATE     READ WRITE CKSUM
        test                           ONLINE       0     0     0
          raidz1-0                     ONLINE       0     0     4
            /test-zfs/1.img  ONLINE       0     0     0  (repairing)
            /test-zfs/2.img  ONLINE       0     0     0  (repairing)
            /test-zfs/3.img  ONLINE       0     0     0  (repairing)

errors: No known data errors
root@raspberrypi:/test-zfs# cat /sys/module/zfs/parameters/zfs_vdev_raidz_impl
fastest [original] scalar

ironMann (Contributor) commented Jan 8, 2018

For an indication of the stack size, check your zfs config.log; there should be a line showing whether the kernel was built with 16K or larger stacks... I'm really not sure 8K is large enough to run ZFS on top of another filesystem.
Can you try building the pool on top of a few partitions of a USB drive instead?
I imagine that setting up raidz2 produces a similar result?
Try setting the zcommon.zfs_fletcher_4_impl parameter to "scalar" as well.
If possible, try reverting openzfs/spl@6ecfd2b?

rincebrain (Contributor) commented Jan 8, 2018

> For an indication of the stack size, check your zfs config.log; there should be a line showing whether the kernel was built with 16K or larger stacks... I'm really not sure 8K is large enough to run ZFS on top of another filesystem.

/wip/zfs-pre-debug/build/conftest.c:42:4: error: #error "THREAD_SIZE is less than 16K"

> Can you try building the pool on top of a few partitions of a USB drive instead?

The original reporter was doing this on real disks, I was just using flatfiles because my RPi, unlike his Banana Pi, doesn't have SATA.

> I imagine that setting up raidz2 produces a similar result?

Yup.

> Try setting the zcommon.zfs_fletcher_4_impl parameter to "scalar" as well.

Already tried, and it still happens even with checksum=sha256 on the resulting dataset as well.

> If possible, try reverting openzfs/spl@6ecfd2b?

I'll try and report back, but since that commit is from August 2017, and this repros on ab9f4b0, I'm not hopeful.

nwf (Contributor) commented Jan 8, 2018

@ironMann I am testing on real disk devices (over USB), not layered atop another filesystem. I would expect stack corruption to crash the machine or exhibit other odd failure modes, not reliable corruption of on-disk data.

rincebrain (Contributor) commented Jan 8, 2018

So, I'm not an expert on this code, but shouldn't this have caused some kind of error I could see without passing -vvv?

https://gist.github.com/rincebrain/c45b7663682c0a26f3e4c98f9c7152e6

e: Nevermind, -T is expected to fail all tests, that's what happens when I stay up too early working on things.

ironMann (Contributor) commented Jan 8, 2018

@nwf @rincebrain
I think I see the error; it fits with the big file reproducer. Can you give the https://github.com/ironMann/zfs/tree/fix-6981 a go?

nwf (Contributor) commented Jan 8, 2018

In thinking about it, the use of size_t in the structures of vdev_raidz_impl.h seems somewhere between "odd", "suspect", and "wrong". They were not originally defined so: contrast (original)

typedef struct raidz_col {
	uint64_t rc_devidx;		/* child device index for I/O */
	uint64_t rc_offset;		/* device offset */
	uint64_t rc_size;		/* I/O size */
	void *rc_data;			/* I/O data */
	void *rc_gdata;			/* used to store the "good" version */
	int rc_error;			/* I/O error for this device */
	uint8_t rc_tried;		/* Did we attempt this I/O column? */
	uint8_t rc_skipped;		/* Did we skip this I/O column? */
} raidz_col_t;

with (refactored by ab9f4b0)

typedef struct raidz_col {
	size_t rc_devidx;		/* child device index for I/O */
	size_t rc_offset;		/* device offset */
	size_t rc_size;			/* I/O size */
	void *rc_data;			/* I/O data */
	void *rc_gdata;			/* used to store the "good" version */
	int rc_error;			/* I/O error for this device */
	unsigned int rc_tried;		/* Did we attempt this I/O column? */
	unsigned int rc_skipped;	/* Did we skip this I/O column? */
} raidz_col_t;

I am deeply confused by the un-commented-upon and seemingly un-merited changes to types made as part of this refactoring; it is suggestive that preserving the original code's semantics exactly was an afterthought, not a design priority. (The structures should simply have been relocated, not mutated, if semantics preservation were a foremost concern.) Since none of these fields are, in fact, sizes of objects in memory (i.e. size_t's meaning), with the possible exception of rc_size (since I/O must fit in memory, I suppose), should we revert all of the following fields to other types?

  • struct raidz_col
    • rc_devidx should be uint32_t or uint64_t or similar?
    • rc_offset should be uint64_t as done in fix-6981.
    • rc_size might justifiably be size_t (ETA: no way, since the C standard only guarantees SIZE_MAX >= 65535), but uint64_t (as it was) seems better.
  • struct raidz_map
    • rm_cols should be uint32_t for alignment, though I assume even uint8_t would be large enough (it was, originally, a uint64_t)
    • rm_scols likewise
    • rm_bigcols likewise
    • rm_asize should be uint64_t as it was
    • rm_missingdata should be as rm_cols.
    • rm_missingparity likewise
    • rm_firstdatacol likewise
    • rm_nskip should probably be uint32_t
    • rm_skipstart is a column index, and so should be as rm_cols.
    • rm_reports was uintptr_t, which seems like the right answer to me.

I would like to see the corresponding parts of ab9f4b0 reverted entirely (reverting to uintN_t everywhere that matters, not just for some fields as in fix-6981). If there is strong reason to minimize the memory footprint of struct raidz_map, the following definition could, I think, be used instead; effort has been taken to minimize structure padding.

typedef struct raidz_map {
	uint64_t rm_asize;		/* Actual total I/O size */
	uintptr_t rm_reports;		/* # of referencing checksum reports */
	abd_t *rm_abd_copy;		/* rm_asize-buffer of copied data */
	raidz_impl_ops_t *rm_ops;	/* RAIDZ math operations */
	uint32_t rm_nskip;		/* Skipped sectors for padding */
	uint8_t rm_cols;			/* Regular column count */
	uint8_t rm_scols;		/* Count including skipped columns */
	uint8_t rm_bigcols;		/* Number of oversized columns */
	uint8_t rm_missingdata;		/* Count of missing data devices */
	uint8_t rm_missingparity;	/* Count of missing parity devices */
	uint8_t rm_firstdatacol;		/* First data column/parity count */
	uint8_t rm_skipstart;		/* Column index of padding start */
	uint8_t rm_freed;		/* map no longer has referencing ZIO */
	uint8_t rm_ecksuminjected;	/* checksum error was injected */
	uint8_t rm_pad1;
	uint8_t rm_pad2;
	uint8_t rm_pad3;
	raidz_col_t rm_col[1];		/* Flexible array of I/O columns */
} raidz_map_t;
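
To make the overflow concrete, here is a minimal stand-alone sketch (not taken from the ZFS sources; it assumes the problematic value is a 64-bit device offset such as rc_offset, and uses uint32_t to stand in for a 32-bit size_t so it behaves the same on any host). On an ILP32 ARM target, storing an offset at or beyond 4 GiB into a size_t silently discards the high bits, so a RAIDZ I/O computed with such an offset would land at the wrong location and the checksums would then report the pool as corrupted.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* A device offset slightly past 5 GiB; it does not fit in 32 bits. */
	uint64_t offset64 = 5ULL * 1024 * 1024 * 1024 + 4096;

	/* On ILP32 ARM, size_t is 32 bits wide; uint32_t models that here,
	 * and the conversion silently drops the high bits. */
	uint32_t offset32 = (uint32_t)offset64;

	printf("intended offset:  %llu\n", (unsigned long long)offset64);
	printf("truncated offset: %u\n", offset32);
	/* Prints 5368713216 vs 1073745920: the I/O lands roughly 4 GiB away
	 * from where it should, which ZFS then sees as corrupted data/metadata. */
	return 0;
}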

@rincebrain (Contributor)

@ironMann I can't reproduce it any more with that partial revert; that said, unless there's some specific reason for the sweeping type changes, @nwf's suggestion to revert them all seems pretty reasonable.

nwf pushed a commit to nwf/zfs that referenced this issue Jan 9, 2018
As part of the refactoring of ab9f4b0,
several uint64_t-s and uint8_t-s were changed to other types.  This
caused ZoL github issue openzfs#6981, an overflow of a size_t on a 32-bit ARM
machine.  In absense of any strong motivation for the type changes, this
simply puts them back, modulo the changes accumulated for ABD.

Compile-tested on amd64 and run-tested on armhf.

Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>

Fixes: openzfs#6981
nwf (Contributor) commented Jan 9, 2018

The patch above fixes this issue for me and should cause no badness elsewhere. Assuming that's true, I'd like to propose it for inclusion in the next 0.7 point release as well as on master.

@behlendorf, could we get an ARM32 testbot to go along with the existing buildbot? It'd be nice to know things were being looked over automagically. :)

@behlendorf (Contributor)

Yup, we'll get this fix into 0.7.6 and master. As for adding a buildbot for ARM32, I'll see what can be done; it certainly would be nice.

behlendorf pushed a commit that referenced this issue Jan 9, 2018
As part of the refactoring of ab9f4b0,
several uint64_t-s and uint8_t-s were changed to other types.  This
caused ZoL github issue #6981, an overflow of a size_t on a 32-bit ARM
machine.  In absense of any strong motivation for the type changes, this
simply puts them back, modulo the changes accumulated for ABD.

Compile-tested on amd64 and run-tested on armhf.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Nathaniel Wesley Filardo <nwf@cs.jhu.edu>
Closes #6981 
Closes #7023