SUPERCILEX
Contributor

@SUPERCILEX SUPERCILEX commented Dec 20, 2024

Currently, fs::copy first tries a regular file copy (via copy_file_range) and then falls back to userspace read/write copying. We should use io::copy instead as it tries copy_file_range, sendfile, and splice before falling back to userspace copying. This was discovered here: SUPERCILEX/fuc#40

Perf impact: fs::copy will now have two additional statx calls to decide which syscall to use. I wonder if we should get rid of the statx calls and only continue down the next fallback when the relevant syscalls say the FD isn't supported.
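
For readers who want to see the io::copy route concretely, here is a minimal sketch of copying one file to another through `io::copy` on two `File` handles. The helper name `copy_via_io_copy` is made up for illustration and this is not the code in the PR. Unlike `fs::copy`, the sketch does not copy permission bits.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Copies `from` to `to` through `io::copy` and returns the number of bytes copied.
/// Hypothetical helper for illustration; it does not copy permission bits.
fn copy_via_io_copy(from: &Path, to: &Path) -> io::Result<u64> {
    let mut reader = File::open(from)?;
    let mut writer = File::create(to)?;
    // Which kernel-assisted path (if any) gets used is decided inside the standard library.
    io::copy(&mut reader, &mut writer)
}
```

Calling `copy_via_io_copy(Path::new("LICENSE"), Path::new("/tmp/LICENSE"))` under strace should show which syscall ends up doing the work on a given system.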

@rustbot
Collaborator

rustbot commented Dec 20, 2024

r? @thomcc

rustbot has assigned @thomcc.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 20, 2024
@SUPERCILEX SUPERCILEX changed the title Unify fs::copy and io::copy Unify fs::copy and io::copy on Linux Dec 20, 2024
@thomcc
Member

thomcc commented Dec 21, 2024

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 21, 2024
@thomcc
Member

thomcc commented Dec 21, 2024

It would be nice to dedupe the stat calls, but I assume that's too much of a pain?

@bors
Collaborator

bors commented Dec 21, 2024

⌛ Trying commit 2b4334e with merge d179dd5...

@SUPERCILEX
Contributor Author

Sorry, what do you mean by "dedupe the stat calls"? There aren't any unnecessary calls: we use them to determine which syscall to use. I'm suggesting that instead of doing stat -> stat -> relevant_syscall, we could do failed_syscall -> failed_syscall -> failed_syscall -> userspace copy in the worst case and go straight to copy_file_range in the fast path. So the worst case costs one extra syscall, while reaching any of the copy-acceleration syscalls costs no more than the two stat calls do today.
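
The "try the syscall, move on if it says unsupported" idea can be sketched roughly as below. Everything here is hypothetical: `Attempt` and `copy_with_fallbacks` are made-up names, and the strategies are plain closures standing in for copy_file_range, sendfile, and splice rather than real syscall wrappers.

```rust
use std::io;

/// Outcome of one accelerated-copy attempt (hypothetical type for this sketch).
enum Attempt {
    /// The strategy handled the copy and moved this many bytes.
    Copied(u64),
    /// The strategy cannot handle these file descriptors; try the next one.
    Unsupported,
}

/// Tries each strategy in order and returns the first byte count produced.
/// If every strategy reports `Unsupported`, runs `userspace_copy` instead.
/// Real I/O errors from any strategy are propagated immediately.
fn copy_with_fallbacks(
    strategies: &mut [&mut dyn FnMut() -> io::Result<Attempt>],
    userspace_copy: &mut dyn FnMut() -> io::Result<u64>,
) -> io::Result<u64> {
    for strategy in strategies.iter_mut() {
        if let Attempt::Copied(n) = (*strategy)()? {
            return Ok(n);
        }
    }
    userspace_copy()
}
```

With three strategies, the worst case is three failed attempts before the userspace copy, and the fast path succeeds on the first attempt with no up-front stat calls.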

@thomcc
Member

thomcc commented Dec 21, 2024

Ah, I had misunderstood. We can leave something like that for a future pr.

@SUPERCILEX
Contributor Author

Sounds good! Do we know what proportion of fs::copy/io::copy calls are between regular files vs. pipes vs. sockets, etc.? No idea how we'd get this data lol.

@bors
Collaborator

bors commented Dec 21, 2024

☀️ Try build successful - checks-actions
Build commit: d179dd5 (d179dd52797cf4ff504a7914bf84a41fd5603089)

@rust-timer
Collaborator

Finished benchmarking commit (d179dd5): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (secondary 3.2%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 3.2% | [3.2%, 3.2%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 767.128s -> 766.815s (-0.04%)
Artifact size: 330.26 MiB -> 330.24 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 21, 2024
@thomcc
Member

thomcc commented Dec 21, 2024

@bors r+

@bors
Collaborator

bors commented Dec 21, 2024

📌 Commit 2b4334e has been approved by thomcc

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 21, 2024
@SUPERCILEX
Contributor Author

Ok, I took a look at the io::copy code, and #108283 means we can't skip the stat calls for sendfile, so avoiding the stat calls for copy_file_range probably isn't worth it.

Though also I lied and there are duplicate stat calls! Will fix.

@SUPERCILEX
Contributor Author

SUPERCILEX commented Dec 21, 2024

Ok, fixed. Old:

openat(AT_FDCWD, "LICENSE", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=11358, ...}) = 0
openat(AT_FDCWD, "/tmp/LICENSE", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100664) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=0, ...}) = 0
fchmod(4, 0100664)                      = 0
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=11358, ...}) = 0
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=0, ...}) = 0
copy_file_range(3, NULL, 4, NULL, 1073741824, 0) = -1 EXDEV (Invalid cross-device link)
sendfile(4, 3, NULL, 2147479552)        = 11358
sendfile(4, 3, NULL, 2147479552)        = 0
close(4)                                = 0
close(3)                                = 0

New:

openat(AT_FDCWD, "LICENSE", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=11358, ...}) = 0
openat(AT_FDCWD, "/tmp/LICENSE", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100664) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0664, stx_size=0, ...}) = 0
fchmod(4, 0100664)                      = 0
copy_file_range(3, NULL, 4, NULL, 1073741824, 0) = -1 EXDEV (Invalid cross-device link)
sendfile(4, 3, NULL, 2147479552)        = 11358
sendfile(4, 3, NULL, 2147479552)        = 0
close(4)                                = 0
close(3)                                = 0
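
The traces above differ only in the second pair of statx calls going away. One way to picture that kind of fix is to fetch the metadata once and carry it alongside the file handle, so later code can consult the cached copy instead of asking the kernel again. The sketch below is only an illustration of that idea: `FileWithMetadata` and its methods are hypothetical names, not the standard library's internal types.

```rust
use std::fs::{File, Metadata};
use std::io;
use std::path::Path;

/// A file handle bundled with metadata that was fetched exactly once
/// (hypothetical helper for this sketch).
struct FileWithMetadata {
    file: File,
    metadata: Metadata,
}

impl FileWithMetadata {
    /// Opens `path` and queries its metadata a single time.
    fn open(path: &Path) -> io::Result<Self> {
        let file = File::open(path)?;
        let metadata = file.metadata()?;
        Ok(Self { file, metadata })
    }

    /// Borrows the underlying handle for the actual copy.
    fn file(&self) -> &File {
        &self.file
    }

    /// Answers "is this a regular file?" from the cached metadata.
    fn is_regular_file(&self) -> bool {
        self.metadata.is_file()
    }

    /// Returns the size recorded when the metadata was fetched.
    fn size(&self) -> u64 {
        self.metadata.len()
    }
}
```

The cached values can go stale if the file changes after `open`, which is the usual trade-off of this approach.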

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@rustbot
Collaborator

rustbot commented Dec 23, 2024

Could not assign reviewer from: thomcc.
User(s) thomcc are either the PR author, already assigned, or on vacation, and there are no other candidates.
Use r? to specify someone else to assign.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 23, 2024
Comment on lines 1989 to 1990
&mut crate::sys::kernel_copy::CachedFileMetadata(reader, reader_metadata),
&mut crate::sys::kernel_copy::CachedFileMetadata(writer, writer_metadata),
Member


The kernel_copy module is only defined for Linux/Android, so this would fail on some other Unix targets.

Contributor Author


Good point, fixed. I'm surprised CI doesn't catch that.
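
For context, platform-specific paths like this are typically gated with `cfg` attributes so that only Linux and Android builds reference the Linux-only module. A minimal sketch of that pattern follows. The function name `copy_strategy` and the returned strings are invented for illustration and do not reflect how the fix was actually written.

```rust
/// Reports which copy path this sketch would take on the current target.
/// Compiled only on Linux and Android, where a kernel-assisted path is assumed to exist.
#[cfg(any(target_os = "linux", target_os = "android"))]
fn copy_strategy() -> &'static str {
    "kernel-assisted copy with cached metadata"
}

/// Fallback for every other target, which must not reference Linux-only modules.
#[cfg(not(any(target_os = "linux", target_os = "android")))]
fn copy_strategy() -> &'static str {
    "generic copy"
}
```

Exactly one of the two definitions is compiled for any given target, so a reference to a Linux-only module inside the first one never reaches other Unix builds.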

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@thomcc
Member

thomcc commented Dec 28, 2024

@bors r+

@bors
Collaborator

bors commented Dec 28, 2024

📌 Commit 96cc078 has been approved by thomcc

It is now in the queue for this repository.

@bors bors removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 28, 2024
@bors
Collaborator

bors commented Dec 28, 2024

⌛ Testing commit 96cc078 with merge 3c1e750...

@bors
Collaborator

bors commented Dec 28, 2024

☀️ Test successful - checks-actions
Approved by: thomcc
Pushing 3c1e750 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 28, 2024
@bors bors merged commit 3c1e750 into rust-lang:master Dec 28, 2024
7 checks passed
@rustbot rustbot added this to the 1.85.0 milestone Dec 28, 2024
@rust-timer
Collaborator

Finished benchmarking commit (3c1e750): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 1.1%, secondary -0.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.7% | [2.2%, 3.1%] | 2 |
| Regressions ❌ (secondary) | 1.6% | [1.6%, 1.6%] | 1 |
| Improvements ✅ (primary) | -2.1% | [-2.1%, -2.1%] | 1 |
| Improvements ✅ (secondary) | -1.9% | [-1.9%, -1.9%] | 1 |
| All ❌✅ (primary) | 1.1% | [-2.1%, 3.1%] | 3 |

Cycles

Results (primary -2.3%, secondary -4.1%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.2% | [2.2%, 2.2%] | 2 |
| Improvements ✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |
| Improvements ✅ (secondary) | -6.6% | [-7.6%, -4.0%] | 5 |
| All ❌✅ (primary) | -2.3% | [-2.3%, -2.3%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 763.637s -> 763.352s (-0.04%)
Artifact size: 325.42 MiB -> 325.46 MiB (0.01%)

Labels
merged-by-bors This PR was explicitly merged by bors. O-unix Operating system: Unix-like S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

8 participants