Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP][AIX]Define struct kmp_base_tas_lock with the order of two members swapped for big-endian #79188

Merged
merged 2 commits into from
Feb 13, 2024

Conversation

xingxue-ibm
Copy link
Contributor

The direct lock data structure has bit 0 (the least significant bit) of the first 32-bit word set to 1 to indicate it is a direct lock. On the other hand, the first word (in 32-bit mode) or first two words (in 64-bit mode) of an indirect lock are the address of the entry allocated from the indirect lock table. The runtime checks bit 0 of the first 32-bit word to tell if this is a direct or an indirect lock. This works fine for 32-bit and 64-bit little-endian because its memory layout of a 64-bit address is (low word, high word). However, this causes problems for big-endian where the memory layout of a 64-bit address is (high word, low word). If an address of the indirect lock table entry is something like 0x110035300, i.e., (0x1, 0x10035300), it is treated as a direct lock. This patch defines struct kmp_base_tas_lock with the ordering of the two 32-bit members flipped for big-endian PPC64 so that when checking/setting tags in member poll, the second word (the low word) is used. This patch also changes places where poll is not already explicitly specified for checking/setting tags.

@xingxue-ibm xingxue-ibm added the openmp:libomp OpenMP host runtime label Jan 23, 2024
@xingxue-ibm xingxue-ibm self-assigned this Jan 23, 2024
@@ -120,8 +120,15 @@ extern void __kmp_validate_locks(void);

struct kmp_base_tas_lock {
// KMP_LOCK_FREE(tas) => unlocked; locked: (gtid+1) of owning thread
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ && KMP_ARCH_PPC64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I question whether KMP_ARCH_PPC64 should be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to use KMP_ARCH_PPC64 to limit the scope. This issue may also affect s390x which I believe is also big-endian. I've changed to guard the order flipping with #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ && __LP64__. @iii-i please let me know if it does not work for s390x. The reason for guarding it with __LP64__ is the original ordering works as designed in 32-bit mode both little- and big-endian. If these members are flipped for 32-bit big-endian, tags of a direct lock will take the second word while the address of the entry allocated from the indirect lock table will take the first word.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought to use KMP_ARCH_PPC64 to limit the scope. This issue may also affect s390x which I believe is also big-endian.

Ya, I was also thinking about other big-endian processors that might be affected like mips64 or if sparc support is eventually added sparc64.

@iii-i
Copy link
Member

iii-i commented Jan 24, 2024

How did you find this? I would like to check if this affects s390x. If the repro is simple enough, maybe consider adding a test?

@xingxue-ibm
Copy link
Contributor Author

How did you find this? I would like to check if this affects s390x. If the repro is simple enough, maybe consider adding a test?

There were 36 libomp/runtime LIT test affected on AIX in 64-bit, including critical/omp_critical_with_hint.c, and many test cases in the test/parallel and worksharing directories. Investigating these failures shows the root cause is the memory layout of the address in 64-bit big-endian. Since s390x is big-endian, this may affect s390x as well. In terms of adding a new test case for this issue, I am not sure it will be necessary because without this patch, test cases mentioned above would be failing.

@iii-i iii-i requested a review from uweigand January 30, 2024 01:42
@iii-i
Copy link
Member

iii-i commented Jan 30, 2024

I think this currently works on s390x by accident, since heap and executable addresses look roughly like this: 0x2aa2e7e0000. The leading 0x2aa tends to stay there regardless of ASLR, and it is even.

The changes look good to me. I tested them on s390x and there are no test failures.

@xingxue-ibm
Copy link
Contributor Author

Gentle ping...Are there any further comments or concerns?

Copy link
Collaborator

@kkwli kkwli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks.

@xingxue-ibm xingxue-ibm merged commit ac97562 into llvm:main Feb 13, 2024
4 checks passed
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Feb 14, 2024
…mbers swapped for big-endian (llvm#79188)

The direct lock data structure has bit `0` (the least significant bit)
of the first 32-bit word set to `1` to indicate it is a direct lock. On
the other hand, the first word (in 32-bit mode) or first two words (in
64-bit mode) of an indirect lock are the address of the entry allocated
from the indirect lock table. The runtime checks bit `0` of the first
32-bit word to tell if this is a direct or an indirect lock. This works
fine for 32-bit and 64-bit little-endian because its memory layout of a
64-bit address is (`low word`, `high word`). However, this causes
problems for big-endian where the memory layout of a 64-bit address is
(`high word`, `low word`). If an address of the indirect lock table
entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it
is treated as a direct lock. This patch defines `struct
kmp_base_tas_lock` with the ordering of the two 32-bit members flipped
for big-endian PPC64 so that when checking/setting tags in member
`poll`, the second word (the low word) is used. This patch also changes
places where `poll` is not already explicitly specified for
checking/setting tags.

(cherry picked from commit ac97562)
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Feb 16, 2024
…mbers swapped for big-endian (llvm#79188)

The direct lock data structure has bit `0` (the least significant bit)
of the first 32-bit word set to `1` to indicate it is a direct lock. On
the other hand, the first word (in 32-bit mode) or first two words (in
64-bit mode) of an indirect lock are the address of the entry allocated
from the indirect lock table. The runtime checks bit `0` of the first
32-bit word to tell if this is a direct or an indirect lock. This works
fine for 32-bit and 64-bit little-endian because its memory layout of a
64-bit address is (`low word`, `high word`). However, this causes
problems for big-endian where the memory layout of a 64-bit address is
(`high word`, `low word`). If an address of the indirect lock table
entry is something like `0x110035300`, i.e., (`0x1`, `0x10035300`), it
is treated as a direct lock. This patch defines `struct
kmp_base_tas_lock` with the ordering of the two 32-bit members flipped
for big-endian PPC64 so that when checking/setting tags in member
`poll`, the second word (the low word) is used. This patch also changes
places where `poll` is not already explicitly specified for
checking/setting tags.

(cherry picked from commit ac97562)
@pointhex pointhex mentioned this pull request May 7, 2024
@xingxue-ibm xingxue-ibm deleted the openmp-lock-big-endian branch May 24, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
openmp:libomp OpenMP host runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants