Data corruption in cluster environment with shared storage on ZOL 0.7.0-rc5 and above #6603

@arturpzol

Description

System information

Type Version/Name
Distribution Name Debian Jessie
Distribution Version 8
Linux Kernel 4.4.45, 3.10
Architecture x86_64
ZFS Version 0.7.0-rc5 and above
SPL Version 0.7.1-1

Describe the problem you're observing

I experienced data corruption in a cluster environment (corosync, pacemaker) with shared storage after a forced power-off of one of the cluster nodes (tested on KVM, VMware, and real hardware).

I have one pool:

zpool status
  pool: Pool-0
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        Pool-0                                        ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-4  ONLINE       0     0     0
            scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-3  ONLINE       0     0     0

with one zvol (primarycache=metadata, sync=always, logbias=throughput) which is shared with a client host.
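For reference, a zvol with these properties could be created roughly as below. This is only a sketch: the volume name, size, and the 64k volblocksize are assumptions for illustration, not taken from my setup.

```shell
# Create a zvol on the existing pool (volume name, size, and
# volblocksize are illustrative, not from the original report)
zfs create -V 100G -o volblocksize=64k Pool-0/vol0

# Properties as described above
zfs set primarycache=metadata Pool-0/vol0
zfs set sync=always Pool-0/vol0
zfs set logbias=throughput Pool-0/vol0
```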

After a forced power-off of one of the cluster nodes, the second node takes over the resource and data corruption on the zvol can be observed.

I tested all 0.7.0 RC versions and it seems that changes introduced in 0.7.0-rc5 affected synchronization. After reverting commit 1b7c1e5 the corruption did not occur anymore.

Additionally, I tried different volblocksize values for the zvol, and it seems that only volumes with a 64k or 128k block size have broken synchronization.
If I add a ZIL device to the pool, the corruption also does not happen.
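For completeness, adding a dedicated log (SLOG) device to the pool would look roughly like this; the device path is an assumption for illustration only.

```shell
# Add a separate ZIL/SLOG device to the pool
# (the device path is illustrative, not from the original report)
zpool add Pool-0 log /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-5
```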

I also reported this bug on #3577, but after deeper analysis I think it is a different bug.
