non-temporal stores: use inline assembly #1541
My understanding is that LLVM can turn a nontemporal store into a normal one, but not the other way around. This seems to be fine as far as I understand. The CI failure happens because the
It's completely unclear. LangRef talks about it as a hint:
That would mean the flag can be added or removed arbitrarily ("this load is not expected to be reused in the cache" -- but no semantic constraints or anything). But that's clearly wrong. LLVM doesn't acknowledge in the slightest the extra UB that can be caused by non-temporal stores (llvm/llvm-project#64521). Therefore I have zero confidence that anyone thought about how
Hm, yes, this requires alignment, but that shouldn't be new...?
  struct Memory {
-     pub data: [f32; 16],
+     pub data: [f32; 16], // 64 bytes
  }
This test should have failed many times already. The only explanation I have for why that did not happen is that maybe LLVM optimizes away the entire test...
Maybe the whole stack frame gets 64-byte aligned, since there are __m512 values involved.
If that happened it would also happen with this PR.
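To make the alignment concern in this thread concrete, here is a minimal sketch. It assumes the test's `Memory` struct carries no alignment attribute (the excerpt does not show one); `AlignedMemory`, `layout_of`, and `is_aligned_to` are hypothetical names introduced only for illustration.

```rust
use core::mem::{align_of, size_of};

/// Mirrors the struct from the diff: 16 f32 values, 64 bytes in total.
/// Assumption: no `repr(align)` attribute, so it only has f32's alignment.
struct Memory {
    pub data: [f32; 16], // 64 bytes
}

/// Hypothetical variant that forces the 64-byte alignment a 512-bit
/// non-temporal store needs.
#[repr(align(64))]
struct AlignedMemory {
    pub data: [f32; 16], // 64 bytes
}

/// Returns (size, alignment) of `T` in bytes.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

/// True if `ptr` is a multiple of `align` bytes.
fn is_aligned_to<T>(ptr: *const T, align: usize) -> bool {
    (ptr as usize) % align == 0
}
```

With the plain struct, a local variable may or may not land on a 64-byte boundary, which would explain a test that passes by luck; the `repr(align(64))` variant makes the alignment a guarantee rather than an accident of stack layout.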
LLVM treats !nontemporal as just a hint on store operations, which is unsound -- they have totally different semantics, similar to atomic memory orderings. So I'd like to avoid any risk of that causing issues by entirely avoiding their !nontemporal attribute. Is it acceptable to use inline assembly to implement these intrinsics?

Note that this is my first time ever writing inline assembly, so the code may or may not make any sense.^^
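As a rough illustration of the approach described above, the following sketch issues a 128-bit non-temporal store through `asm!` instead of LLVM's `!nontemporal` metadata. It is not the code from this PR: `Aligned16` and `stream_store` are hypothetical names, the trailing `sfence` is a choice made for this sketch so the store is ordered before the function returns, and the non-x86_64 fallback is an ordinary store.

```rust
/// 16-byte-aligned buffer; `movntps` faults on an unaligned address.
#[repr(align(16))]
pub struct Aligned16(pub [f32; 4]);

/// Hypothetical helper: writes `src` into `dst` with a non-temporal store.
#[cfg(target_arch = "x86_64")]
pub fn stream_store(dst: &mut Aligned16, src: [f32; 4]) {
    use core::arch::asm;
    use core::arch::x86_64::_mm_loadu_ps;

    // SAFETY: `src` holds 4 readable f32 values, and `dst` is valid for a
    // 16-byte write and 16-byte aligned thanks to `repr(align(16))`.
    unsafe {
        let v = _mm_loadu_ps(src.as_ptr());
        asm!(
            // Intel syntax (Rust's default on x86): store `v` to `[p]`,
            // bypassing the cache hierarchy.
            "movntps [{p}], {a}",
            // Assumption of this sketch: fence immediately so the store is
            // ordered before anything that follows the call.
            "sfence",
            p = in(reg) dst.0.as_mut_ptr(),
            a = in(xmm_reg) v,
            // No `nomem`/`readonly`: the block writes memory, and the
            // compiler must know that.
            options(nostack, preserves_flags),
        );
    }
}

/// Fallback for other targets: a normal store.
#[cfg(not(target_arch = "x86_64"))]
pub fn stream_store(dst: &mut Aligned16, src: [f32; 4]) {
    dst.0 = src;
}
```

Because the store is hidden inside the assembly block, LLVM never sees a `!nontemporal` store and cannot reason about it as an ordinary write; whether the fence belongs inside the helper or is left to the caller is a separate design decision.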