PANIC: blkptr at [...] has invalid COMPRESS 30 #4467

@gordan-bobic

The call trace is here:

[24313.528339] PANIC: blkptr at ffffffdd81f6c240 has invalid COMPRESS 30
[24313.529681] Showing stack for process 1431
[24313.530980] CPU: 6 PID: 1431 Comm: txg_sync Tainted: P OE 4.4.6-2.el7.aarch64 #2
[24313.532315] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Feb 22 2016
[24313.533646] Call trace:
[24313.534948] [] dump_backtrace+0x0/0x17c
[24313.536258] [] show_stack+0x24/0x2c
[24313.537564] [] dump_stack+0x90/0xb4
[24313.538886] [] spl_dumpstack+0x44/0x60 [spl]
[24313.540199] [] vcmn_err+0xb8/0x108 [spl]
[24313.541650] [] zfs_panic_recover+0x88/0x9c [zfs]
[24313.543094] [] zfs_blkptr_verify+0x2b4/0x348 [zfs]
[24313.544535] [] zio_read+0x54/0xec [zfs]
[24313.545977] [] dsl_scan_scrub_cb+0x40c/0x4bc [zfs]
[24313.547439] [] dsl_scan_visitbp+0x318/0x878 [zfs]
[24313.548908] [] dsl_scan_visitbp+0x4a8/0x878 [zfs]
[24313.550367] [] dsl_scan_visitbp+0x284/0x878 [zfs]
[24313.551819] [] dsl_scan_visitbp+0x284/0x878 [zfs]
[24313.553257] [] dsl_scan_visitbp+0x284/0x878 [zfs]
[24313.554694] [] dsl_scan_visitbp+0x284/0x878 [zfs]
[24313.556131] [] dsl_scan_visitbp+0x284/0x878 [zfs]
[24313.557566] [] dsl_scan_visitbp+0x284/0x878 [zfs]
[24313.558999] [] dsl_scan_visitbp+0x618/0x878 [zfs]
[24313.560437] [] dsl_scan_visitds+0xd4/0x458 [zfs]
[24313.561855] [] dsl_scan_sync+0x338/0xb2c [zfs]
[24313.563259] [] spa_sync+0x324/0x9e0 [zfs]
[24313.564653] [] txg_sync_thread+0x32c/0x5b8 [zfs]
[24313.565887] [] thread_generic_wrapper+0x74/0x88 [spl]
[24313.567101] [] kthread+0xe8/0xfc
[24313.568302] [] ret_from_fork+0x10/0x40

Setting the zfs module parameter zfs_recover=1 downgrades the panic to a warning, and everything then appears to continue normally. With the pool in question, this happens on both:
aarch64, Kernel 4.4.6, ZoL 0.6.5.5
x86-64, Kernel 3.18.29, ZoL 0.6.5.6

In both cases with zfs_recover=1, scrub completes successfully without any errors detected on any of the disks.
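
For anyone reproducing this, a minimal Python sketch for checking whether zfs_recover is currently active on a host. It assumes the module exposes the parameter at /sys/module/zfs/parameters/zfs_recover, as Linux kernel modules usually do; the function name is made up for illustration.

```python
from pathlib import Path


def zfs_recover_enabled(param_path="/sys/module/zfs/parameters/zfs_recover"):
    """Return True/False for the current setting, or None if it cannot be read.

    The default path is an assumption about where the loaded zfs module
    exposes the parameter; pass another path if your system differs.
    """
    try:
        # The file is expected to hold a single integer, "0" meaning disabled.
        return Path(param_path).read_text().strip() != "0"
    except OSError:
        # Module not loaded, parameter missing, or no permission to read it.
        return None


if __name__ == "__main__":
    print("zfs_recover enabled:", zfs_recover_enabled())
```

A None result only means the file could not be read, for example because the module is not loaded.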

There seem to be a total of 130 block pointers that have the invalid COMPRESS=30. I have not been able to detect any actual data or metadata damage with zfs_recover=1, but I would prefer to stop having to run this pool with zfs_recover=1 permanently enabled.
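
A rough Python sketch of how such a count can be taken from a saved kernel log. It assumes every message follows the "blkptr at ADDRESS has invalid FIELD VALUE" shape seen in the trace above; the function name is hypothetical. The address in the message appears to be the in-memory location of the block pointer rather than an on-disk offset, so counting distinct addresses is only an approximation of the number of affected block pointers.

```python
import re
from collections import Counter

# Matches messages such as:
#   [24313.528339] PANIC: blkptr at ffffffdd81f6c240 has invalid COMPRESS 30
# The pattern ignores the severity prefix, so it does not matter whether the
# message was logged as a panic or as a warning.
BLKPTR_RE = re.compile(r"blkptr at ([0-9a-fA-F]+) has invalid (\w+) (\d+)")


def summarize_invalid_blkptrs(log_text):
    """Count distinct blkptr addresses per (field, value) pair."""
    seen = set()
    for match in BLKPTR_RE.finditer(log_text):
        address, field, value = match.group(1), match.group(2), int(match.group(3))
        # Using a set collapses repeated reports of the same address.
        seen.add((address, field, value))
    return Counter((field, value) for _, field, value in seen)


if __name__ == "__main__":
    import sys

    # Usage: dmesg > kernel.log; python this_script.py kernel.log
    with open(sys.argv[1], errors="replace") as handle:
        for (field, value), count in summarize_invalid_blkptrs(handle.read()).items():
            print(f"{field}={value}: {count} distinct blkptr addresses")
```

This only summarizes the log; it does not map the block pointers back to files or datasets.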

  1. Is there a way to identify which files / file systems are affected by the block pointers listed in the logs? I am hoping I could simply remove them and restore those files from a backup to cure the issue.
  2. If this is a sanity check failure, should the scrub not fix it by resetting the COMPRESS value to a sane value? At the moment scrub happily completes without detecting any errors.
