Ensure `ptr::read` gets all the same LLVM `load` metadata that dereferencing does #109035

scottmcm · 2023-03-11T23:37:05Z

I was looking into array::IntoIter optimization, and noticed that it wasn't annotating the loads with noundef for simple things like array::IntoIter<i32, N>. Trying to narrow it down, it seems that was because MaybeUninit::assume_init_read isn't marking the load as initialized (https://rust.godbolt.org/z/Mxd8TPTnv), which is unfortunate since that's basically its reason to exist.

The root cause is that ptr::read is currently implemented via the untyped copy_nonoverlapping, and thus the load doesn't get any type-aware metadata: no noundef, no !range. This PR solves that by lowering ptr::read(p) to copy *p in MIR, for which the backends already do the right thing.

Fortuitously, this also improves the IR we give to LLVM for things like mem::replace, and fixes a couple of long-standing bugs where ptr::read on Copy types produced worse code than dereferencing them with *.
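
To make the comparison concrete, here is a minimal sketch of three ways to load an `i32` that, as far as I can tell, should all be able to carry the same load metadata once `ptr::read` is lowered to a typed copy. It is not taken from the PR's tests, and the function names are made up for illustration. The functions are observably equivalent; the difference this PR is about only shows up in the emitted IR.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Plain dereference: the backend sees a typed load of `i32`.
pub fn read_by_deref(p: &i32) -> i32 {
    *p
}

/// `ptr::read` on the same location. The aim of the PR is for this load to
/// get the same metadata as the dereference above.
pub fn read_by_ptr_read(p: &i32) -> i32 {
    // SAFETY: `p` comes from a reference, so it is valid, aligned, and initialized.
    unsafe { ptr::read(p as *const i32) }
}

/// `MaybeUninit::assume_init_read`, which goes through `ptr::read`.
///
/// # Safety
/// The caller must guarantee that `p` holds an initialized `i32`.
pub unsafe fn read_by_assume_init(p: &MaybeUninit<i32>) -> i32 {
    // SAFETY: forwarded to the caller, see the function-level contract.
    unsafe { p.assume_init_read() }
}
```

Compiling something like this with optimizations and comparing the `load` instructions for the three functions is one way to check whether `noundef` is present on each.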

Zulip conversation: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Move.20array.3A.3AIntoIter.20to.20ManuallyDrop/near/341189936

cc @erikdesjardins @JakobDegen @workingjubilee @the8472

Fixes #106369
Fixes #73258

rustbot · 2023-03-11T23:37:10Z

r? @WaffleLapkin

(rustbot has picked a reviewer for you, use r? to override)

rustbot · 2023-03-11T23:37:13Z

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

saethlin · 2023-03-11T23:44:13Z

cc @rust-lang/opsem for awareness

saethlin · 2023-03-12T00:05:25Z

Does it make sense to give ptr::write the same treatment?

scottmcm · 2023-03-12T00:16:39Z

@saethlin It might, since write is also using copy_nonoverlapping. But I'd rather that be a separate change: if it's worth doing, the motivation is different from the one here, since there's no noundef metadata on stores.

As a simple demo, https://rust.godbolt.org/z/8GbEsEj43 shows that read is worse than *p, but write is emitting the same thing as *p = x, so I want to just focus on the read side for now.

EDIT: there's also no !range metadata on stores, so that reason to do this for read also doesn't apply to write.
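
A minimal sketch of the four shapes being compared, written from scratch rather than copied from the godbolt link (the function names are hypothetical). `bool` is used because a typed load of it can carry both `noundef` and `!range`, while neither kind of metadata exists for the stores.

```rust
use std::ptr;

/// Typed load through a dereference.
///
/// # Safety
/// `p` must be valid for reads, aligned, and point to an initialized `bool`.
pub unsafe fn load_deref(p: *const bool) -> bool {
    unsafe { *p }
}

/// The same load through `ptr::read`.
///
/// # Safety
/// Same requirements as `load_deref`.
pub unsafe fn load_read(p: *const bool) -> bool {
    unsafe { ptr::read(p) }
}

/// Store through an assignment. This drops the old value first, which is a
/// no-op for `bool`.
///
/// # Safety
/// `p` must be valid for writes and aligned.
pub unsafe fn store_assign(p: *mut bool, x: bool) {
    unsafe { *p = x }
}

/// The same store through `ptr::write`.
///
/// # Safety
/// Same requirements as `store_assign`.
pub unsafe fn store_write(p: *mut bool, x: bool) {
    unsafe { ptr::write(p, x) }
}
```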

compiler/rustc_mir_transform/src/lower_intrinsics.rs

JakobDegen · 2023-03-12T00:30:25Z

tests/ui/consts/const-eval/ub-ref-ptr.stderr

@@ -148,11 +148,11 @@ LL | const DATA_FN_PTR: fn() = unsafe { mem::transmute(&13) };
               HEX_DUMP
           }

-error: accessing memory with alignment 1, but alignment 4 is required
+error[E0080]: evaluation of constant value failed


This does make a future-incompat warning into a hard error. Based on the comment here though, this seems to be pre-approved by T-lang. In any case, cc @RalfJung and @oli-obk for awareness

Yeah, these future incompat warnings were because we wanted to make moving away from dubious const-eval patterns smoother as part of the Const UB Armistice of #99923, so that const UB doesn't immediately turn into const-break-the-build due to compiler changes. If people feel it's been enough time we can switch this off.

This particular PR just affects ptr::read() which should definitely be fine imo. I'll leave the bikeshedding about what to do with the other cases to everyone else :)

IIRC, general temperature was that for these UB-in-const cases, a single warning stable cycle is probably sufficient, two is definitely sufficient, and that we're within rights to do no warning releases if we wanted to (i.e. warning at all is a good faith best effort to give some time to migrate).

Amusingly, the "see issue" is pointing at #68585, which is

Tracking issue for conflicting repr(...) hints future compatibility

Purely if it was up to me: go for it.

It seems like this will be a hugely beneficial change, crates impacted were still technically "doing it wrong", we're landing this no sooner than 1.70 (so they've had 1.68 and will have 1.69 to fix it), and "const-stable since 1.63" actually means that we should cut it off sooner rather than later due to the "Lindy effect" that bad code patterns have (i.e. the longer a pattern exists, the longer it is expected to continue existing). "More time to migrate" is something we should be considering for const fn stabilizations with version numbers like 1.49

Yeah this is fine, we can by now probably make that entire lint into a hard error.

What I don't understand immediately is why this PR changes behavior here though...

It changes behavior because unaligned copy_nonoverlapping is (temporarily) allowed, but unaligned derefs are not: https://godbolt.org/z/M6f5MrjKo

error: accessing memory with alignment 1, but alignment 4 is required
  --> /rustc/8a73f50d875840b8077b8ec080fa41881d7ce40d/library/core/src/intrinsics.rs:2393:9
   |
   = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
   = note: for more information, see issue #68585 <https://github.com/rust-lang/rust/issues/104616>
note: inside `copy_nonoverlapping::<u32>`
  --> /rustc/8a73f50d875840b8077b8ec080fa41881d7ce40d/library/core/src/intrinsics.rs:2393:9
note: inside `COPY_NONOVERLAPPING`
  --> <source>:8:5
   |
8  |     ptr::copy_nonoverlapping(unaligned, ptr::addr_of_mut!(dest), 1);
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = note: `#[deny(invalid_alignment)]` on by default

error[E0080]: evaluation of constant value failed
  --> <source>:14:5
   |
14 |     *unaligned
   |     ^^^^^^^^^^ accessing memory with alignment 1, but alignment 4 is required

(and this PR changes ptr::read from using copy_nonoverlapping to a deref)
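
Not part of this PR, but for code that trips over the alignment requirement: a minimal sketch of reading a `u32` from a possibly unaligned position with `ptr::read_unaligned`, which has no alignment precondition. The helper name `read_u32_at` is hypothetical, and the example runs at runtime rather than in a const context.

```rust
use std::ptr;

/// Reads a native-endian `u32` starting at byte `offset` of `bytes`,
/// regardless of how that position happens to be aligned.
/// Returns `None` if fewer than 4 bytes are available at `offset`.
pub fn read_u32_at(bytes: &[u8], offset: usize) -> Option<u32> {
    let end = offset.checked_add(4)?;
    let window = bytes.get(offset..end)?;
    // SAFETY: `window` is exactly 4 initialized, readable bytes, and
    // `read_unaligned` does not require the pointer to be aligned for `u32`.
    Some(unsafe { ptr::read_unaligned(window.as_ptr().cast::<u32>()) })
}
```

Using `ptr::read` or `*` on the same pointer would be undefined behavior whenever the address is not a multiple of 4, which is the situation the const-eval error above is reporting.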

It changes behavior because unaligned copy_nonoverlapping is (temporarily) allowed, but unaligned derefs are not

That's the thing, unaligned derefs should also be temporarily allowed...

I am... confused. On playground both * and copy_nonoverlapping error: [play]. Moreover the lint from copy_nonoverlapping is not actually a lint, you can't allow it: [play]. Lastly, the compiler says there are 3 errors, but only shows 2??...

compiler/rustc_mir_transform/src/lower_intrinsics.rs

I was looking into `array::IntoIter` optimization, and noticed that it wasn't annotating the loads with `noundef` for simple things like `array::IntoIter<i32, N>`. Turned out to be a more general problem as `MaybeUninit::assume_init_read` isn't marking the load as initialized (<https://rust.godbolt.org/z/Mxd8TPTnv>), which is unfortunate since that's basically its reason to exist. This PR lowers `ptr::read(p)` to `copy *p` in MIR, which fortuitously also improves the IR we give to LLVM for things like `mem::replace`.

scottmcm · 2023-03-12T02:26:37Z

I fully expect this to be fine, but since read is pretty core, let's check perf just in case:
@bors try @rust-timer queue

workingjubilee · 2023-03-12T02:49:48Z

LLVM-sama, please bless this patch.

rust-timer · 2023-03-12T10:21:37Z

Finished benchmarking commit (b7c032a129d0565b7e3f96e008ac8baf713fddb0): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.9%	[0.2%, 1.3%]	5
Regressions ❌ (secondary)	1.2%	[0.2%, 3.1%]	8
Improvements ✅ (primary)	-0.6%	[-1.4%, -0.2%]	27
Improvements ✅ (secondary)	-2.5%	[-3.7%, -0.1%]	16
All ❌✅ (primary)	-0.3%	[-1.4%, 1.3%]	32

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.0%	[1.1%, 2.5%]	4
Regressions ❌ (secondary)	2.6%	[1.2%, 3.7%]	5
Improvements ✅ (primary)	-3.7%	[-6.9%, -0.8%]	6
Improvements ✅ (secondary)	-2.6%	[-2.6%, -2.6%]	2
All ❌✅ (primary)	-1.4%	[-6.9%, 2.5%]	10

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.1%	[1.1%, 1.1%]	1
Regressions ❌ (secondary)	2.3%	[2.2%, 2.5%]	2
Improvements ✅ (primary)	-1.5%	[-1.5%, -1.5%]	1
Improvements ✅ (secondary)	-3.0%	[-3.8%, -2.0%]	13
All ❌✅ (primary)	-0.2%	[-1.5%, 1.1%]	2

RalfJung · 2023-03-12T16:34:51Z

Turned out to be a more general problem as MaybeUninit::assume_init_read isn't marking the load as initialized (https://rust.godbolt.org/z/Mxd8TPTnv), which is unfortunate since that's basically its reason to exist.

There's nothing special about assume_init_read though, it's a completely normal function returning a T. So it seems odd to me that we would treat it in any particular way for optimizations?

library/core/src/ptr/mod.rs

erikdesjardins · 2023-03-12T18:01:59Z

There's nothing special about assume_init_read though, it's a completely normal function returning a T. So it seems odd to me that we would treat it in any particular way for optimizations?

It performs a read using ptr::read, which internally does that read at MaybeUninit<T> instead of T, so we don't put noundef on the "original" load. Nothing else after the original load really matters, because all other places where we could put noundef (e.g. the assume_init return value) are inlined or optimized out by MIR opt or LLVM. In this case, it's due to MIR opt, but e.g. disabling MIR inlining wouldn't help since LLVM inlining doesn't preserve knowledge either.

This results in us giving the following IR to LLVM: https://rust.godbolt.org/z/z31dn1zjM

define noundef i32 @demo(ptr noalias noundef readonly align 4 dereferenceable(4) %x) unnamed_addr #0 {
  %tmp = alloca i32, align 4
  call void @llvm.lifetime.start.p0(i64 4, ptr %tmp)
  call void @llvm.memcpy.p0.p0.i64(ptr align 4 %tmp, ptr align 4 %x, i64 4, i1 false)
  %self = load i32, ptr %tmp, align 4
  call void @llvm.lifetime.end.p0(i64 4, ptr %tmp)
  ret i32 %self
}

Note that there is no noundef on the load (nor the memcpy, since there's no way to express "writes 4 noundef bytes to its first argument"). There is noundef on the return value, but if the value was used in demo instead of being returned, of course it wouldn't be there.
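
For readers who want the source-level picture behind that IR, here is a rough sketch of the two shapes involved. `read_untyped` approximates how `ptr::read` was written before this PR (it is an approximation from the description above, not the exact library source), and `read_typed` is the dereference it is being compared against. Both names are made up for this example.

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Approximation of the old `ptr::read`: an untyped copy into a
/// `MaybeUninit<T>` temporary, followed by `assume_init`. The copy moves
/// bytes without knowing they form a valid `T`, so the load that matters
/// cannot be annotated.
///
/// # Safety
/// `src` must be valid for reads, aligned, and point to an initialized `T`.
pub unsafe fn read_untyped<T>(src: *const T) -> T {
    let mut tmp = MaybeUninit::<T>::uninit();
    unsafe {
        ptr::copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
        tmp.assume_init()
    }
}

/// A typed load at `T`, which is where the backend can attach `noundef`
/// (and `!range` for types with a restricted set of valid values).
///
/// # Safety
/// Same requirements as `read_untyped`.
pub unsafe fn read_typed<T: Copy>(src: *const T) -> T {
    unsafe { *src }
}
```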

erikdesjardins · 2023-03-12T18:10:07Z

Finished benchmarking commit (b7c032a): comparison URL.

All of the regressions seem to be due to LLVM doing more work, which makes sense--this new IR is much more optimizable.

erikdesjardins · 2023-03-12T18:14:18Z

library/core/src/ptr/mod.rs

+            // This uses a dedicated intrinsic, not `copy_nonoverlapping`,
+            // so that it gets a *typed* copy, not an *untyped* one.
+            crate::intrinsics::read_via_copy(src)


@JakobDegen it would be super nice if this entire PR was just

mir!({ RET = *src; Return() })

Are there any plans to allow defining intrinsics with custom MIR? Maybe it's difficult because both are in core?

There's https://stdrs.dev/nightly/x86_64-unknown-linux-gnu/std/intrinsics/mir/macro.mir.html (err, which of course you know because you used it in the example 🤦), but I don't know if that's something we'd ever want to use for productized things, rather than just in tests.

library/core/src/ptr/mod.rs

RalfJung · 2023-03-13T20:34:53Z

library/core/src/intrinsics.rs

+    /// The stabilized form of this intrinsic is [`crate::ptr::read`], so
+    /// that can be implemented without needing to do an *untyped* copy
+    /// via [`copy_nonoverlapping`], and thus can get proper metadata.


Suggested change

-    /// The stabilized form of this intrinsic is [`crate::ptr::read`], so
-    /// that can be implemented without needing to do an *untyped* copy
-    /// via [`copy_nonoverlapping`], and thus can get proper metadata.
+    /// The stabilized form of this intrinsic is [`crate::ptr::read`], so that
+    /// it is easier for the compiler to generate a load with proper metadata.

scottmcm · 2023-03-15T06:33:12Z

@bors r=WaffleLapkin,JakobDegen

bors · 2023-03-15T06:33:15Z

📌 Commit e7c6ad8 has been approved by WaffleLapkin,JakobDegen

It is now in the queue for this repository.

bors · 2023-03-15T06:50:11Z

⌛ Testing commit e7c6ad8 with merge 51c1bef70e3a6986eb7f91880e9123f53fe24a08...

bors · 2023-03-15T07:14:17Z

💔 Test failed - checks-actions

Apparently in CI it's getting generated in the opposite order; one function per file will make the test pass either way.

scottmcm · 2023-03-15T08:18:58Z

@bors r=WaffleLapkin,JakobDegen

bors · 2023-03-15T08:19:00Z

📌 Commit dfc3377 has been approved by WaffleLapkin,JakobDegen

It is now in the queue for this repository.

bors · 2023-03-15T11:44:15Z

⌛ Testing commit dfc3377 with merge e4b9f86...

bors · 2023-03-15T14:49:56Z

☀️ Test successful - checks-actions
Approved by: WaffleLapkin,JakobDegen
Pushing e4b9f86 to master...

rust-timer · 2023-03-17T02:34:46Z

Finished benchmarking commit (e4b9f86): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.8%	[0.3%, 1.9%]	7
Regressions ❌ (secondary)	1.5%	[0.3%, 2.9%]	5
Improvements ✅ (primary)	-0.8%	[-1.3%, -0.3%]	14
Improvements ✅ (secondary)	-2.2%	[-3.7%, -0.4%]	20
All ❌✅ (primary)	-0.2%	[-1.3%, 1.9%]	21

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.9%	[1.1%, 6.8%]	2
Regressions ❌ (secondary)	3.4%	[2.4%, 4.4%]	2
Improvements ✅ (primary)	-2.8%	[-6.3%, -0.8%]	8
Improvements ✅ (secondary)	-2.4%	[-4.1%, -1.1%]	10
All ❌✅ (primary)	-1.4%	[-6.3%, 6.8%]	10

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.2%	[2.1%, 2.4%]	4
Improvements ✅ (primary)	-1.0%	[-1.0%, -1.0%]	1
Improvements ✅ (secondary)	-4.3%	[-9.4%, -2.1%]	18
All ❌✅ (primary)	-1.0%	[-1.0%, -1.0%]	1

nnethercote · 2023-03-17T03:21:15Z

Improvements significantly outweigh regressions, plus there's a non-trivial improvement of ~5 seconds on bootstrap time.

@rustbot label: +perf-regression-triaged

rustbot assigned WaffleLapkin Mar 11, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 11, 2023

scottmcm force-pushed the ptr-read-should-know-undef branch from bfb3857 to b0f3e14 Compare March 11, 2023 23:41

JakobDegen reviewed Mar 12, 2023

scottmcm force-pushed the ptr-read-should-know-undef branch from b0f3e14 to b2c717f Compare March 12, 2023 01:44

JakobDegen approved these changes Mar 12, 2023

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 12, 2023

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Mar 12, 2023

RalfJung reviewed Mar 12, 2023

erikdesjardins reviewed Mar 12, 2023

RalfJung reviewed Mar 13, 2023

RalfJung reviewed Mar 13, 2023

Improved implementation and comments after code review feedback

e7c6ad8

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 15, 2023

bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 15, 2023

Split the mem-replace codegen test

dfc3377

Apparently in CI it's getting generated in the opposite order; one function per file will make the test pass either way.

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 15, 2023

bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 15, 2023

bors merged commit e4b9f86 into rust-lang:master Mar 15, 2023

rustbot added this to the 1.70.0 milestone Mar 15, 2023

scottmcm deleted the ptr-read-should-know-undef branch March 15, 2023 15:05

the8472 mentioned this pull request Mar 17, 2023

Permit the MIR inliner to inline diverging functions #106428

Merged

pnkfelix mentioned this pull request Mar 21, 2023

remove obsolete givens from regionck #107376

Merged

scottmcm mentioned this pull request Sep 4, 2023

read_via_copy: don't prematurely optimize away the read #115531

Merged

scottmcm mentioned this pull request Jul 3, 2024

Miscompilation on release profile when std::ptr::read is used to cast byte primitive to some tuples in unreachable code paths #127286

Closed
