Skip to content

Conversation

sayantn
Copy link
Contributor

@sayantn sayantn commented Oct 1, 2025

  • Use inline asm for _mm_stream_{ss,sd}
  • Add _mm_sfence calls in tests, without it the tests are technically UB

@rustbot
Copy link
Collaborator

rustbot commented Oct 1, 2025

r? @Amanieu

rustbot has assigned @Amanieu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@Amanieu
Copy link
Member

Amanieu commented Oct 2, 2025

Can you add a comment explaining why inline asm is used here?

@sayantn
Copy link
Contributor Author

sayantn commented Oct 5, 2025

Shouldn't the implementation of stream_load intrinsics use _mm_sfence? Because afaiu stream_load is just stream, but we are loading the data onto the stack after the move, so the user doesn't need to call _mm_sfence. But doesn't this mean that we need to call it, otherwise the stack variable might end up (partially) uninitalized?

@Amanieu
Copy link
Member

Amanieu commented Oct 5, 2025

My understanding is that x86 memory model weirdness only applies to non-temporal stores, not non-temporal loads.

@Amanieu Amanieu added this pull request to the merge queue Oct 5, 2025
Merged via the queue into rust-lang:master with commit db9f604 Oct 5, 2025
63 checks passed
@sayantn
Copy link
Contributor Author

sayantn commented Oct 6, 2025

SDM says

MOVNTDQA loads a double quadword from the source operand (second operand) to the destination operand (first
operand) using a non-temporal hint if the memory source is WC (write combining) memory type

and

Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with
a MFENCE instruction should be used in conjunction with MOVNTDQA instructions if multiple processors might use
different memory types for the referenced memory locations or to synchronize reads of a processor with writes by other agents in the system.

so it seems like this can be UB if some very specific conditions are met (I am not too good with memory types). Do we not support such things (like using different memory types for same chunk in different processors)? It still seems like the second condition (syncing with other agents) might be a vulnerability

@Amanieu
Copy link
Member

Amanieu commented Oct 6, 2025

Memory type is an attribute that the OS sets for a range of physical memory. Normal memory uses the WB protocol which has the expected memory ordering semantics. Essentially we don't consider WC memory as normal memory for the Rust abstract machine, so the condition doesn't apply here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants