Data corruption in cluster environment with shared storage on ZOL 0.7.0-rc5 and above #6603
Comments
Can you pull git head and try again? This might be something fixed with f763c3d.
@arturpzol can you describe the corruption you're able to reproduce? This could potentially be related to
If you're not already aware of it, I'd also suggest you enable the new multihost feature when running in a failover environment.
I tried the git head source but corruption still occurred. I also tried zfs-0.7.1 with the source patched using only f763c3d, but again corruption occurred. With set

Environment description: Two physical or virtual machines with 2 shared disks. The zpool is created from the shared disks, uses only one mirror vdev, and has two zvols (one with volblocksize=8k, the second with volblocksize=128k).

Test description: The test uses bst5 to write data from Windows to the iSCSI-connected storage. During the write I force power off the node which currently has the pool imported. The second node takes over the cluster resource by importing the pool and configuring SCST to share the zvols as iSCSI LUNs. bst5 is able to continue writing to the LUNs without breaking the test. When the sequential write finishes, I wait until bst5 reads the data back with compare. bst5 then reports a data mismatch error when it reads back, with compare, the LUN backed by the zvol with volblocksize=128k. For the LUN backed by the zvol with volblocksize=8k no corruption is reported.

I also eliminated the cluster environment and used a single node which is force rebooted and then automatically imports the pool and configures SCST to share the zvols as iSCSI LUNs, but again corruption occurred.

Is it safe to revert 1b7c1e5 and run ZOL 0.7.1 with that revert?
I set the write tool to save data with block sizes of 128k and above on the zvol with volblocksize=128k, so the second condition from the source code:

is true for my test, write_state is set to WR_INDIRECT, and I assume that is what causes the corruption. Debug showed: If

If

If

so each time the second condition is true and corruption occurred. If

so the third condition is true and corruption did not occur. Should
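For readers following the numbering above: the "second" and "third" conditions refer to the if/else chain in zvol_log_write() that selects the ZIL write state. Here is a minimal, standalone sketch of that chain, reconstructed from the zvol.c diff quoted later in this thread; the function name and its boolean parameters are simplified stand-ins, not actual ZFS APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Trimmed stand-in for the real itx_wr_state_t enum in sys/zil.h. */
typedef enum { WR_INDIRECT, WR_COPIED, WR_NEED_COPY } itx_wr_state_t;

/*
 * Sketch of the write_state selection order in zvol_log_write(),
 * reconstructed from the diff quoted below. The boolean parameters
 * stand in for zilog->zl_logbias, spa_has_slogs() and the sync flag.
 */
itx_wr_state_t
pick_write_state(bool logbias_throughput, bool has_slog, bool sync,
    uint64_t size, uint64_t blocksize, uint64_t immediate_write_sz)
{
	if (logbias_throughput)
		return (WR_INDIRECT);	/* 1st condition */
	else if (!has_slog && size >= blocksize &&
	    blocksize > immediate_write_sz)
		return (WR_INDIRECT);	/* 2nd condition */
	else if (sync)
		return (WR_COPIED);	/* 3rd condition */
	else
		return (WR_NEED_COPY);
}
```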
@arekinath thanks for the additional information. I'm working on reproducing this issue with a simpler test case so it can be debugged further.
After performing the test scenario that caused corruption with the changes below:

```diff
--- zfs/zvol.c	(revision 47737)
+++ zfs/zvol.c	(working copy)
@@ -684,13 +684,13 @@
 	if (zil_replaying(zilog, tx))
 		return;
-	if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)
+	if (sync)
+		write_state = WR_COPIED;
+	else if (zilog->zl_logbias == ZFS_LOGBIAS_THROUGHPUT)
 		write_state = WR_INDIRECT;
 	else if (!spa_has_slogs(zilog->zl_spa) &&
 	    size >= blocksize && blocksize > zvol_immediate_write_sz)
 		write_state = WR_INDIRECT;
-	else if (sync)
-		write_state = WR_COPIED;
 	else
 		write_state = WR_NEED_COPY;
```

the issue did not occur anymore for any combination of volblocksize and write block size.
@arturpzol that's one way to side-step the issue for the moment; it effectively disables WR_INDIRECT log records when
The portion of the zvol_replay_write() handler responsible for replaying indirect log records for some reason never existed. As a result indirect log records were not being correctly replayed. This went largely unnoticed since the majority of zvol log records were of the type WR_COPIED or WR_NEED_COPY prior to OpenZFS 7578.

This patch updates zvol_replay_write() to correctly handle these log records and adds a new test case which verifies volume replay to prevent any regression. The existing test case which verified replay on filesystem was renamed slog_replay_fs.ksh for clarity.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#6603
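For context, the three record types named in the commit message are the ZIL write states. Below is a paraphrased sketch of the itx_wr_state_t enum from the ZFS headers (sys/zil.h); the comments are descriptive paraphrases rather than verbatim source.

```c
/* Paraphrased sketch of the ZIL write states from sys/zil.h (not verbatim). */
typedef enum {
	WR_INDIRECT,	/* data written via dmu_sync(); the log record holds
			 * only the block pointer, not the data itself */
	WR_COPIED,	/* data copied into the log record immediately */
	WR_NEED_COPY,	/* data copied into the record only if the itx is
			 * actually committed to the ZIL */
	WR_NUM_STATES	/* number of states */
} itx_wr_state_t;
```

Because a WR_INDIRECT record carries only a block pointer, a replay handler that cannot process these records cannot restore the corresponding writes after an unclean failover, which lines up with the corruption described in this issue.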
@arturpzol I've opened #6615, which fixes the root cause of this issue and adds a test case to prevent any regression. I'd appreciate it if you could also verify the fix in your environment. Thanks!
@behlendorf it seems that the fix works. I performed all tests with different volblocksize and write block size combinations and corruption did not occur. Thanks for the fix.
@arturpzol thanks for reporting this and verifying the fix. We'll get this into the zfs-0.7-release branch for the next point release.
The portion of the zvol_replay_write() handler responsible for replaying indirect log records for some reason never existed. As a result indirect log records were not being correctly replayed. This went largely unnoticed since the majority of zvol log records were of the type WR_COPIED or WR_NEED_COPY prior to OpenZFS 7578.

This patch updates zvol_replay_write() to correctly handle these log records and adds a new test case which verifies volume replay to prevent any regression. The existing test case which verified replay on filesystem was renamed slog_replay_fs.ksh for clarity.

Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6603
System information
Describe the problem you're observing
I experienced data corruption in a cluster environment (corosync, pacemaker) with shared storage after forcing power off on one of the cluster nodes (tested on KVM, VMware, and real hardware).
I have one pool:
with one zvol (primarycache=metadata, sync=always, logbias=throughput) which is shared to a client host.
After a forced power off of one of the cluster nodes, the second node takes over the resource and data corruption on the zvol can be observed.
I tested all 0.7.0 RC versions and it seems that the changes in 0.7.0-rc5 had an impact on synchronization. After reverting commit 1b7c1e5, the corruption did not occur anymore.
Additionally, I tried different volblocksize values for the zvol and it seems that only volumes with 64k and 128k block sizes have broken synchronization.
If I add a dedicated ZIL (log) device to the pool, the corruption also did not happen.
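Both observations are consistent with the size/slog part of the check in zvol_log_write() quoted earlier in this thread. Here is a rough, standalone illustration of just that part; it ignores the logbias branch and assumes the default zvol_immediate_write_sz of 32768 bytes, which is a tunable module parameter and may differ on a given system.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* Assumed default of the zvol_immediate_write_sz module parameter. */
	const uint64_t immediate_write_sz = 32768;
	const uint64_t volblocksizes[] = { 8192, 16384, 32768, 65536, 131072 };
	const bool has_slog = false;	/* pool without a separate log device */

	for (size_t i = 0; i < sizeof (volblocksizes) / sizeof (volblocksizes[0]); i++) {
		uint64_t bs = volblocksizes[i];
		uint64_t write_size = bs;	/* a full-block write */
		bool indirect = !has_slog && write_size >= bs &&
		    bs > immediate_write_sz;

		printf("volblocksize=%6llu -> %s\n", (unsigned long long)bs,
		    indirect ? "WR_INDIRECT (hit by the missing replay)" :
		    "WR_COPIED/WR_NEED_COPY (replayed correctly)");
	}
	return (0);
}
```

Only the 64k and 128k block sizes satisfy the threshold, and setting has_slog to true makes the condition false for every size, which matches the observation that adding a log device made the corruption disappear.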
I also reported this bug in #3577, but after deeper analysis I think it is a different bug.