BRT: Fix FICLONE/FICLONERANGE shortened copy #15842

behlendorf · 2024-02-01T00:17:38Z

Motivation and Context

Issue #15728 describes an issue with the Linux integration of BRT which needs to be resolved.

Description

On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls are expected to either fully clone the specified range or return an error. The range may be for an entire file. While ZFS internally supports cloning partial ranges, there's no way to return the length cloned to the caller, so we need to make this all or nothing.
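From the caller's side, the all-or-nothing contract described above might be handled as in the following minimal sketch. It assumes a Linux system with the `FICLONE` ioctl from `<linux/fs.h>`; the helper names `try_clone_file`, `copy_fallback`, and `clone_or_copy` are hypothetical and are not part of this PR. Because a failed clone may leave the destination in an undefined state, the fallback truncates and rewrites the destination rather than trusting any partial result.

```c
#include <errno.h>
#include <linux/fs.h>   /* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Ask the filesystem to clone all of src_fd into dst_fd.
 * Returns 0 on success or a negative errno value. There is no
 * "bytes cloned" result: the ioctl reports only success or failure.
 */
static int
try_clone_file(int src_fd, int dst_fd)
{
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return (0);
    return (-errno);
}

/* Hypothetical fallback: discard the destination and do a plain copy. */
static int
copy_fallback(int src_fd, int dst_fd)
{
    char buf[65536];
    ssize_t n;

    if (lseek(src_fd, 0, SEEK_SET) < 0 || lseek(dst_fd, 0, SEEK_SET) < 0)
        return (-errno);
    if (ftruncate(dst_fd, 0) < 0)
        return (-errno);

    while ((n = read(src_fd, buf, sizeof (buf))) > 0) {
        char *p = buf;

        /* write() may be short, so loop until the chunk is written. */
        while (n > 0) {
            ssize_t w = write(dst_fd, p, (size_t)n);
            if (w < 0)
                return (-errno);
            p += w;
            n -= w;
        }
    }
    return (n < 0 ? -errno : 0);
}

/* Clone if the filesystem can do it, otherwise copy the data. */
static int
clone_or_copy(int src_fd, int dst_fd)
{
    if (try_clone_file(src_fd, dst_fd) == 0)
        return (0);
    return (copy_fallback(src_fd, dst_fd));
}
```

On a filesystem without clone support the ioctl simply fails and the data is copied instead, which is roughly the behavior one would expect from a `cp --reflink=auto` style tool.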

As part of this change, support for the REMAP_FILE_CAN_SHORTEN flag has been added. When REMAP_FILE_CAN_SHORTEN is set, zfs_clone_range() will return a shortened range when it encounters pending dirty records. When the flag is clear, zfs_clone_range() will block and wait for the records to be written out, allowing the blocks to be cloned.

Furthermore, the file rangelock is held over the region being cloned to prevent it from being modified while cloning. This doesn't quite provide atomic semantics, since if an error is encountered only a portion of the range may be cloned. If REMAP_FILE_CAN_SHORTEN was not provided, the partial clone is converted to an error and returned to the caller. However, the destination file range is left in an undefined state.
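The conversion of a partial clone into an error can be pictured with a small toy model. This is only a sketch of the rule as described here, not the OpenZFS code: the function name `clone_result` and its parameters are hypothetical, and the use of `EINVAL` as the error is taken from the later discussion in this PR.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model (not the OpenZFS implementation): decide what the caller
 * sees after a clone attempt that managed `cloned` of `requested` bytes.
 * `can_shorten` stands in for REMAP_FILE_CAN_SHORTEN being set.
 */
static int64_t
clone_result(uint64_t requested, uint64_t cloned, bool can_shorten)
{
    if (cloned == requested)
        return ((int64_t)cloned);   /* whole range was cloned */
    if (can_shorten)
        return ((int64_t)cloned);   /* a short count is acceptable */
    return (-EINVAL);               /* all-or-nothing caller gets an error */
}
```

For example, `clone_result(100, 40, false)` yields `-EINVAL`, while the same partial clone with `can_shorten` set reports 40 bytes. In both cases the model says nothing about the destination contents, which the description above leaves undefined after an error.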

A test case has been added which exercises this functionality by verifying that cp --reflink=never|auto|always works correctly.

How Has This Been Tested?

A test case has been added which is modeled after the original reproducer from #15728. Without this change applied the test fails; with it applied the test passes.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

module/zfs/zfs_vnops.c

tests/zfs-tests/tests/functional/cp_files/cp_files_002_pos.ksh

robn · 2024-02-01T21:13:34Z

Any reason for cp --reflink vs clonefile? Not that I'm precious about it; it's just there to ensure we get exactly what we want, rather than whatever the local cp of the day does.

behlendorf · 2024-02-01T22:00:35Z

> Any reason for cp --reflink vs clonefile? Not that I'm precious about it; it's just there to ensure we get exactly what we want, rather than whatever the local cp of the day does.

That's a good point. At the time I was concerned about cp working as expected so I wanted to test that. But you're right, clonefile would be more precise and portable. Let me play with that a bit.

behlendorf · 2024-02-02T00:39:28Z

After giving this some more thought I've updated the PR with the following changes.

Added a new zfs_bclone_wait_dirty kmod tunable to control the behavior when there are dirty blocks. Mainly I added this to provide an easy way to compare the performance between waiting on dirty blocks and instead returning an error and making the copy.
Switched the default behavior to not wait on dirty blocks, zfs_bclone_wait_dirty=0. EINVAL is returned when FICLONE or FICLONERANGE can't clone the entire range due to dirty blocks.
Updated the test case to verify both the zfs_bclone_wait_dirty=0 and zfs_bclone_wait_dirty=1 behavior. The test still uses cp --reflink; we should probably consider adding another test which uses clonefile.
Restructured how the wait_dirty flag is passed. Really we don't need to pass it at all, but could simply check zfs_bclone_wait_dirty in zfs_clone_range if there's no good reason to let the caller control this. For now I left it to get feedback.
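To compare the two behaviors, a test harness might first check how the tunable is currently set. The sketch below is an assumption-laden illustration: it relies on the general Linux convention that module parameters appear under `/sys/module/<module>/parameters/`, so the path `/sys/module/zfs/parameters/zfs_bclone_wait_dirty` is assumed rather than taken from this PR, and `read_int_param` is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Read a single integer from a sysfs-style parameter file.
 * Returns the parsed value, or -1 if the file cannot be opened or
 * parsed (for instance when the module is not loaded).
 */
static int
read_int_param(const char *path)
{
    FILE *fp = fopen(path, "r");
    int value;

    if (fp == NULL)
        return (-1);
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);
    return (value);
}
```

A caller could then use `read_int_param("/sys/module/zfs/parameters/zfs_bclone_wait_dirty")` and treat 0 as "return an error on dirty blocks" and 1 as "wait for dirty blocks to be written", matching the two modes described above. Since the tunable only takes 0 or 1, using -1 as the failure value is unambiguous here.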

rrevans

I patched this and confirmed it fixes the original reproducer case.

tests/zfs-tests/tests/functional/cp_files/cp_files_002_pos.ksh

module/os/linux/zfs/zpl_file_range.c

behlendorf · 2024-02-02T18:45:24Z

Refreshed. The updated version addresses all of the outstanding feedback. I also simplified the patch to no longer allow the caller to pass zfs_bclone_wait_dirty. If at some point in the future we decide this is needed we can always add it.

rrevans · 2024-02-05T12:47:36Z

> The updated version addresses all of the outstanding feedback.

Thanks, the patch looks very good to me.

On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls are expected to either fully clone the specified range or return an error. The range may be for an entire file. While internally ZFS supports cloning partial ranges there's no way to return the length cloned to the caller so we need to make this all or nothing.

As part of this change support for the REMAP_FILE_CAN_SHORTEN flag has been added. When REMAP_FILE_CAN_SHORTEN is set zfs_clone_range() will return a shortened range when encountering pending dirty records. When it's clear zfs_clone_range() will block and wait for the records to be written out allowing the blocks to be cloned.

Furthermore, the file rangelock is held over the region being cloned to prevent it from being modified while cloning. This doesn't quite provide an atomic semantics since if an error is encountered only a portion of the range may be cloned. This will be converted to an error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the caller. However, the destination file range is left in an undefined state.

A test case has been added which exercises this functionality by verifying that `cp --reflink=never|auto|always` works correctly.

Signed-off-by: Brian D Behlendorf <behlendo@slag12.llnl.gov>
Issue openzfs#15728

module/os/freebsd/zfs/zfs_vfsops.c

Relocate the declarations of zfs_bclone_enabled and zfs_bclone_wait_dirty to the platform-independent code. Add some additional documentation to these tunables at the same time.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>

module/zfs/zfs_vnops.c

Alexandero89 · 2024-02-11T14:17:39Z

@behlendorf could you maybe check the GitHub Actions? Something seems to have gone wrong.

buildbot/CentOS 8 x86_64 (TEST) is shown as "Build started", but when you click on Details it finished long ago.
The cleanup action ran out of space in the last run.

On Linux the ioctl_ficlonerange() and ioctl_ficlone() system calls are expected to either fully clone the specified range or return an error. The range may be for an entire file. While internally ZFS supports cloning partial ranges there's no way to return the length cloned to the caller so we need to make this all or nothing.

As part of this change support for the REMAP_FILE_CAN_SHORTEN flag has been added. When REMAP_FILE_CAN_SHORTEN is set zfs_clone_range() will return a shortened range when encountering pending dirty records. When it's clear zfs_clone_range() will block and wait for the records to be written out allowing the blocks to be cloned.

Furthermore, the file range lock is held over the region being cloned to prevent it from being modified while cloning. This doesn't quite provide an atomic semantics since if an error is encountered only a portion of the range may be cloned. This will be converted to an error if REMAP_FILE_CAN_SHORTEN was not provided and returned to the caller. However, the destination file range is left in an undefined state.

A test case has been added which exercises this functionality by verifying that `cp --reflink=never|auto|always` works correctly.

Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #15728
Closes #15842


behlendorf added the Status: Code Review Needed Ready for review and testing label Feb 1, 2024

behlendorf mentioned this pull request Feb 1, 2024

BRT: Linux FICLONE truncates large files with dirty blocks #15728

Closed

behlendorf force-pushed the issue-15728 branch from 40667cd to 45d016a Compare February 1, 2024 01:06

behlendorf mentioned this pull request Feb 1, 2024

xfstests: add zfs support to latest xfstests #5481

Open

rrevans reviewed Feb 1, 2024


behlendorf force-pushed the issue-15728 branch from 45d016a to 46e82de Compare February 2, 2024 00:24

rrevans reviewed Feb 2, 2024


Vlad1mir-D mentioned this pull request Feb 2, 2024

zfs-2.2.3 patchset #15836

Merged


behlendorf force-pushed the issue-15728 branch from 46e82de to 4cc5f57 Compare February 2, 2024 18:36

behlendorf force-pushed the issue-15728 branch from 4cc5f57 to c5a762f Compare February 2, 2024 19:13

behlendorf force-pushed the issue-15728 branch from c5a762f to 26128c6 Compare February 5, 2024 17:52

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Feb 5, 2024

amotin reviewed Feb 5, 2024


amotin approved these changes Feb 5, 2024


tonyhutter mentioned this pull request Feb 5, 2024

Version update for Kernel 6.7 compatibility #15759

Open

behlendorf merged commit 6dccdf5 into openzfs:master Feb 6, 2024
23 of 25 checks passed

lundman mentioned this pull request Feb 14, 2024

cargo build breaks with "resource temporarily unavailable (error 35)" (I'm assuming EAGAIN) on v2.2.0 and v2.2.2 openzfsonosx/zfs#809

Open

rrevans mentioned this pull request Mar 14, 2024

zfs_bclone_wait_dirty=1 broken for files with unallocated blocks at the end #15994

Closed
