-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMP] Align up the size when calling aligned_alloc #65525
Conversation
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size` is supposed to be a multiple of `alignment`, and it is implementation defined behavior if not. We have a non-conformant use in `kmp_barrier.h` when allocating distribute barrier. The size of the barrier is 576 and the alignment is `4*CACHE_LINE`, which is 64 on most systems. Apparently it works perfectly fine for Linux and Intel-based Mac, but not for Apple Silicon based Mac. Fix llvm#63194.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the `size` is supposed to be a multiple of `alignment`, and it is implementation defined behavior if not. We have a non-conformant use in `kmp_barrier.h` when allocating distribute barrier. The size of the barrier is 576 and the alignment is `4*CACHE_LINE`, which is 256 on most systems. Apparently it works perfectly fine for Linux and Intel-based Mac, but not for Apple Silicon based Mac. Fix llvm#63194.
Was this performance tested on Intel hardware? Distributed barrier is designed specifically for use on Intel hardware. |
If this causes issues we need to distinguish apple here or catch the nullptr that is supposed to be returned https://open-std.org/JTC1/SC22/WG14/www/docs/summary.htm#dr_460 |
First, I don't think it will affect the performance of Intel hardware since it just "wastes" some memory. Next, as pointed by the two links in this PR, it is UB if the size is not a multiple of the alignment. The fact that it works on Intel hardware doesn't make it a right implementation. Lastly, if this feature is designed specifically for Intel, we need to restrict its use to Intel only hardware and disable the test on other platforms. However, on the other hand, if we only restrict it to Intel, then we can keep the previous implementation since it works on Intel. |
FWIW, I think the last change to the standard clarified it's not ub. I might have read it wrong but that is what my link was about. |
Based on https://en.cppreference.com/w/c/memory/aligned_alloc, the
size
is supposedto be a multiple of
alignment
, and it is implementation defined behavior if not.We have a non-conformant use in
kmp_barrier.h
when allocating distribute barrier.The size of the barrier is 576 and the alignment is
4*CACHE_LINE
, which is 256on most systems. Apparently it works perfectly fine for Linux and Intel-based Mac,
but not for Apple Silicon based Mac.
Fix #63194.