Join GitHub today
GitHub is home to over 31 million developers working together to host and review code, manage projects, and build software together.
Sign upSparse-file handling for the Linux implementation of std::fs::copy() #55909
Conversation
tarka
added some commits
Nov 3, 2018
rust-highfive
assigned
dtolnay
Nov 13, 2018
This comment has been minimized.
This comment has been minimized.
|
r? @dtolnay (rust_highfive has picked a reviewer for you, use r? to override) |
rust-highfive
added
the
S-waiting-on-review
label
Nov 13, 2018
This comment has been minimized.
This comment has been minimized.
|
The job Click to expand the log.
I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact |
This comment has been minimized.
This comment has been minimized.
|
I can‘t really review right now since I‘m on vacation. |
This comment has been minimized.
This comment has been minimized.
|
Reassigning because I am swamped this week: |
rust-highfive
assigned
alexcrichton
and unassigned
dtolnay
Nov 15, 2018
This comment has been minimized.
This comment has been minimized.
the8472
commented
Nov 19, 2018
|
Afaik every call to |
This comment has been minimized.
This comment has been minimized.
the8472
commented
Nov 19, 2018
|
I have posted some ideas on internals how this could be optimized further, but that's out of scope for this PR. |
This comment has been minimized.
This comment has been minimized.
Good point, I've cleaned it up. |
the8472
reviewed
Nov 20, 2018
|
|
||
| // Slightly modified version of io::copy() that only copies a set amount of bytes. | ||
| fn copy_bytes_uspace(mut reader: &File, mut writer: &File, nbytes: usize) -> io::Result<u64> { | ||
| const BLKSIZE: usize = 4 * 1024; // Assume 4k blocks on disk. |
This comment has been minimized.
This comment has been minimized.
the8472
Nov 20, 2018
This is a reasonable default for io::copy that handles generic Read and Write types but for fs::copy we know we're doing syscalls and can probably do better. E.g. gnu cp has bloated up over the years and now does 128KiB or stat.blksize, whichever is larger.
This is probably due to ever-growing throughput (thus more syscalls per unit of time) and larger true block sizes, e.g. on flash and raid storage.
This comment has been minimized.
This comment has been minimized.
the8472
Nov 20, 2018
Oh, and of course it only does this for large files.
On the other hand the rust std default buffer size was dropped to rust 8k in #32695 due to problems with jemalloc. But that might not be relevant here since jemalloc is not the default anymore and this would only apply to copying large files.
This comment has been minimized.
This comment has been minimized.
the8472
commented
Nov 20, 2018
|
GNU cp does |
This comment has been minimized.
This comment has been minimized.
|
Thanks for the PR @tarka! Unfortunately I'm very swamped right now and this is a quite a large amount of code that I won't have time to review. Others from @rust-lang/libs may be interested in this as well. I'm not entirely certain if we want to have such complicated code for this in the standard library, this may be crossing into the boundary of "should start on crates.io and maybe move some time afterwards". If no one else gets around to this I will try to get around to it in the coming weeks |
the8472
reviewed
Nov 20, 2018
| let out_meta = outfd.metadata()?; | ||
|
|
||
| let (is_sparse, is_xmount) = copy_parms(&in_meta, &out_meta)?; | ||
| let uspace = is_xmount; |
This comment has been minimized.
This comment has been minimized.
the8472
Nov 20, 2018
This check may soon be too strict in new kernels, cross-device clone_file_range might become supported e.g. between NFS mounts or multiple subvolumes of btrfs systems.
Instead one should always try copy_file_range and fall back to regular copy on the first EXDEV.
This comment has been minimized.
This comment has been minimized.
|
Thanks @alexcrichton, understood. @the8472 Thanks for the pointers. Based on Alex's comments I might hold off on any changes until it's decided if this is appropriate for std. That said, even if this is accepted into std I think there is a place for an external library with more advanced features and tunables. In particular with xcp I had to duplicate most of what was in |
tarka
added a commit
to tarka/fs-extra
that referenced
this pull request
Nov 22, 2018
TimNN
added
A-allocators
and removed
A-allocators
labels
Nov 27, 2018
This comment has been minimized.
This comment has been minimized.
|
Ping from triage. @rust-lang/libs can someone review this? |
This comment has been minimized.
This comment has been minimized.
|
r? @sfackler |
rust-highfive
assigned
sfackler
and unassigned
alexcrichton
Dec 10, 2018
This comment has been minimized.
This comment has been minimized.
|
I don't think I'm really convinced that the significant extra complexity here is worth it. I'd probably agree with @alexcrichton that this may be best served starting externally. |
This comment has been minimized.
This comment has been minimized.
|
ping from triage. Closing this since it is better to have this started externally. |
tarka commentedNov 13, 2018
This PR adds sparse-file support to the Linux implementation of
std::fs:copy(). The current implementation uses the Linux sys-callcopy_file_range(), which has a number of benefits over user-space copying, but it will expand 'holes' in any sparse files if not explicitly checked-for. This PR also adds a user-space copy function as a fall back for cases wherecopy_file_range()is unavailable or cannot handle the operation.There are a few judgement-call changes made to support the additional code; I'm happy to take pointers on style if this doesn't fit with the overall Rust codebase:
#[cfg()]s, minimise nesting, and simplify testing.It's worth noting that the user-space copy and sparse-hole seek code could be ported to other Unix-like OSs that support sparse-files, however I didn't want to get too into the weeds.
cc @nicokoch who added the original
copy_file_range()implementation.