Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

non-temporal stores: use inline assembly #1541

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Feb 25, 2024

LLVM treats !nontemporal as just a hint on store operations, which is unsound -- they have a totally different semantics, similar to atomic memory orderings. So I'd like to avoid any risk of that causing any issues by entirely avoiding their !nontemporal attribute. Is it acceptable to use inline assembly to implement these intrinsics?

Note that this is my first time ever writing inline assembly, so the code may or may not make any sense.^^

@rustbot
Copy link
Collaborator

rustbot commented Feb 25, 2024

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@Amanieu
Copy link
Member

Amanieu commented Feb 25, 2024

My understanding is that LLVM can turn a nontemporal store into a normal one, but not the other way around. This seems to be fine as far as I understand.


The CI failure happens because the target_feature attribute only enables sse2 and rustc isn't smart enough to figure out that this implies sse (only LLVM knowns that). You fix it by enabling the sse feature as well.

@RalfJung
Copy link
Member Author

RalfJung commented Feb 25, 2024

My understanding is that LLVM can turn a nontemporal store into a normal one, but not the other way around. This seems to be fine as far as I understand.

It's completely unclear. LangRef talks about it as a hint:

The optional !nontemporal metadata must reference a single metadata name <nontemp_node> corresponding to a metadata node with one i32 entry of value 1. The existence of the !nontemporal metadata on the instruction tells the optimizer and code generator that this load is not expected to be reused in the cache. The code generator may select special instructions to save cache bandwidth, such as the MOVNT instruction on x86.

That would mean the flag can be added or removed arbitrarily ("this load is not expected to be reused in the cache" -- but no semantic constraints or anything). But that's clearly wrong. LLVM doesn't acknowledge in the slightest the extra UB that can be caused by non-temporal stores (llvm/llvm-project#64521). Therefore I have zero confidence that anyone thought about how !nontemporal interacts with all the LLVM passes that work on load (almost all of which probably just ignore the attribute entirely). I'm not even aware of any cross-platform memory model with support for nontemporal stores that they could be using here -- and they clearly need a cross-platform memory model since they are doing optimizations in the context of a C++11-style model.

@RalfJung
Copy link
Member Author

SDE ERROR:  TID: 1064 executed instruction with an unaligned memory reference to address 0x7f27229035e0 INSTR: 0x562d8a5e21f3: IFORM: VMOVNTPS_MEMf32_ZMMf32_AVX512 :: vmovntps zmmword ptr [rax], zmm0
	IMAGE:    /checkout/target/x86_64-unknown-linux-gnu/release/deps/core_arch-59198cd2fc79a24a
	FUNCTION: _ZN9core_arch9core_arch3x867avx512f5tests20test_mm512_stream_ps20test_mm512_stream_ps17hb7f0b28acc824410E.llvm.13799798511543115899
	FUNCTION ADDR: 0x562d8a5e21c0

Hm, yes, this requires alignment, but that shouldn't be new...?

struct Memory {
pub data: [f32; 16],
pub data: [f32; 16], // 64 bytes
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should have failed many times already. The only explanation I have for why that did not happen is that maybe LLVM optimizes away the entire test...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the whole stack frame gets 64-byte aligned, since there are __m512 values involved.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that happened it would also happen with this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants