Description
System information
| Type | Version/Name |
|---|---|
| Distribution Name | Debian Jessie |
| Distribution Version | 8 |
| Linux Kernel | 4.4.45, 3.10 |
| Architecture | x86_64 |
| ZFS Version | 0.7-rc5 and above |
| SPL Version | 0.7.1-1 |
Describe the problem you're observing
I experienced data corruption in a cluster environment (corosync, pacemaker) with shared storage after forcibly powering off one of the cluster nodes (tested on KVM, VMware, and real hardware).
I have one pool:
```
zpool status
  pool: Pool-0
 state: ONLINE
  scan: none requested
config:

	NAME                                          STATE     READ WRITE CKSUM
	Pool-0                                        ONLINE       0     0     0
	  mirror-0                                    ONLINE       0     0     0
	    scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-4  ONLINE       0     0     0
	    scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-3  ONLINE       0     0     0
```
with one zvol (primarycache=metadata, sync=always, logbias=throughput) that is shared with a client host.
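For context, a minimal sketch of how a pool and zvol with these properties could be created; the device paths, volume size, and zvol name are assumptions for illustration, not taken from the report:

```sh
# Hypothetical reconstruction of the setup; device paths, volume
# size, and zvol name are assumptions.
zpool create Pool-0 mirror \
    /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-4 \
    /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-3

# Create the zvol with the properties described above.
zfs create -V 100G \
    -o primarycache=metadata \
    -o sync=always \
    -o logbias=throughput \
    Pool-0/vol0
```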
After a forced power-off of one of the cluster nodes, the second node takes over the resource, and data corruption on the zvol can be observed.
I tested all 0.7.0 RC versions, and it seems that the changes in 0.7.0-rc5 affected synchronization. After reverting commit 1b7c1e5, the corruption no longer occurred.
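For anyone trying to reproduce this, reverting the suspected commit in a ZFS source checkout might look like the following; this is only a sketch, and the checkout path and build steps are assumptions:

```sh
# Sketch only: revert the suspected commit in a zfs source tree and
# rebuild. The directory name and build invocation are assumptions.
cd zfs
git revert 1b7c1e5
./autogen.sh && ./configure && make -s -j"$(nproc)"
sudo make install
```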
Additionally, I tried different volblocksize values for the zvol, and it seems that only volumes with a 64k or 128k block size exhibit the broken synchronization.
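A sketch of how the block-size comparison could be run; the volume names and size are assumptions (volblocksize can only be set at creation time):

```sh
# Hypothetical test volumes with varying volblocksize; per the report,
# only the 64k and 128k variants showed corruption. Names and sizes
# are assumptions.
for bs in 8k 16k 32k 64k 128k; do
    zfs create -V 10G -o volblocksize=$bs \
        -o primarycache=metadata -o sync=always -o logbias=throughput \
        Pool-0/test-$bs
done
```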
When I added a dedicated ZIL device to the pool, the corruption also did not occur.
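Adding a dedicated log device, which avoided the corruption here, might look like this; the device path is an assumption for illustration:

```sh
# Attach a dedicated ZIL (separate log) device to the pool;
# the device path is an assumption.
zpool add Pool-0 log /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi3-0-5
```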
I also reported this bug in #3577, but after deeper analysis I think it is a different bug.