zpool scrub / status fails to address [read?] errors to disks #159

jessegit · 2013-01-03T19:36:42Z

I have a backup/archive pool (a single raidz1 vdev with 5 disks) in which one disk is failing. Since the pool is mostly only written to, and the data can be re-collected if a pool rebuild fails, I've been running zpool scrub to repair data errors on the failing disk as some protection against losing one of the other disks.

After new writes to the pool, a scrub always finds some read or checksum errors.

The pool used to report errors under Solaris 11:

NAME        STATE     READ WRITE CKSUM
    c0t4d0  ONLINE      12     0    14  458K repaired

All scrubs of the pool under Solaris 11 returned read errors, with occasional checksum errors, for this disk.

Previously, on SmartOS, I got:

NAME        STATE     READ WRITE CKSUM
    c0t4d0  ONLINE       0     0     2
and
    c0t4d0  ONLINE       0     0     6

with the total amount of corrected data shown above the vdev list, like:

  scan: scrub repaired 15.3M in 9h54m with 0 errors on Thu Jan  3 12:22:57 2013

The latest scrub I ran was on smartos 20121018T224723Z, but it reported (in addition to the scan: line above):

NAME        STATE     READ WRITE CKSUM
    c0t4d0  ONLINE       0     0     0

which makes me suspect that either scrub is not counting read errors properly or zpool status is not reporting them properly.
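The mismatch described here (a non-zero "scrub repaired" total while every device shows 0/0/0) can be checked mechanically. The following is only a rough sketch in Python, not part of any ZFS tooling: the function names and the regular expressions are illustrative assumptions, and it assumes the plain-integer counter layout shown in the output pasted in this report (abbreviated counters such as 1.2K would not be parsed).

```python
import re

# Assumed layout, taken from the output quoted in this issue:
#   scan: scrub repaired 15.3M in 9h54m with 0 errors on ...
#       c0t4d0  ONLINE       0     0     0
SCAN_RE = re.compile(r"scan:\s+scrub repaired\s+(\S+)\s+in\s")
DEV_RE = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def repaired_amount(status_text):
    """Return the 'scrub repaired' figure as a string, or None if absent."""
    match = SCAN_RE.search(status_text)
    return match.group(1) if match else None


def device_errors(status_text):
    """Map device name -> (read, write, cksum) for each parsed device line."""
    errors = {}
    for line in status_text.splitlines():
        match = DEV_RE.match(line)
        if match:
            name, _state, read, write, cksum = match.groups()
            errors[name] = (int(read), int(write), int(cksum))
    return errors


def unattributed_repair(status_text):
    """True if data was repaired but no device carries any error count."""
    repaired = repaired_amount(status_text)
    devices = device_errors(status_text)
    if repaired is None or repaired in ("0", "0B") or not devices:
        return False
    return all(sum(counts) == 0 for counts in devices.values())


# Sample assembled from the lines quoted in this issue.
SAMPLE = """\
  scan: scrub repaired 15.3M in 9h54m with 0 errors on Thu Jan  3 12:22:57 2013

NAME        STATE     READ WRITE CKSUM
    c0t4d0  ONLINE       0     0     0
"""

if __name__ == "__main__":
    print(unattributed_repair(SAMPLE))
```

Feeding it the saved output of zpool status after each scrub would flag runs where repairs happened but were not attributed to any disk.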

It would also be nice to attribute the amount of corrected data to individual disks in addition to the total, as Solaris 11 does.

psy0rz · 2013-09-26T14:38:51Z

Could this problem be related to this one? https://gist.github.com/psy0rz/6714836

Note that this is on the latest SmartOS release.

alcir · 2013-09-26T14:46:32Z

Or to this: #259

jessegit · 2013-09-26T15:29:23Z

I experienced the same as https://gist.github.com/psy0rz/6714836 (on another box, unrelated to the problem described above), and it was fixed by destroying and recreating the dump zvol, as described in #259 (my guess: the on-disk format changed between OS versions, so the checksums are wrong).

This bug, however, stems from a real hardware problem that is not reflected in the zpool output. The total number of bytes repaired is shown, but none of the disks in the pool is assigned any read (or checksum) errors. (It's also possible that checksum errors are sometimes not assigned to any disk, but at least that works sometimes, as seen above.)
