Opportunity to use BTS instruction instead of cmpxchg #37322
Just as a comment, https://reviews.llvm.org/D48606#1144023 suggested that the memory-operand versions of the BT* instructions should not be used.
We also don't optimize the single-bit immediate version.
Even though the memory forms of BTS/BTR/BTC are flows of more than 10 uops, they are probably still useful for this atomic case because they would remove the entire loop around the cmpxchg. Unfortunately, this is tricky to implement. The AtomicExpandPass currently turns every atomicrmw or/and/xor whose previous value is used into a binop + cmpxchg loop before X86 SelectionDAG instruction selection. We could try to disable this for the single-bit case, but if the shift instruction and the atomicrmw are in separate basic blocks, we won't be able to find the bit position, because instruction selection works one basic block at a time. We almost need an IR construct that performs an atomic bit test and set/clear/complement as a single instruction.
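To make the trade-off concrete, here is a rough source-level sketch of the two shapes. It is an approximation, not the IR that AtomicExpandPass actually emits, and the helper names `test_and_set_cas_loop` and `test_and_set_single` are hypothetical. The first retries a compare-exchange until it succeeds, like the expanded binop + cmpxchg loop; the second consumes only one bit of the `fetch_or` result, which is the shape that could in principle be lowered to a single `lock bts` (BTS leaves the old value of the selected bit in the carry flag).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper: roughly what the binop + cmpxchg loop computes,
// written at the C++ level. Returns the previous state of the bit.
inline bool test_and_set_cas_loop(std::atomic<std::uint32_t> &a, unsigned bit) {
  assert(bit < 32 && "bit index must fit in 32 bits");
  const std::uint32_t mask = std::uint32_t{1} << bit;
  std::uint32_t old = a.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak stores the current value into `old`,
  // so `old | mask` is recomputed from fresh data on every retry.
  while (!a.compare_exchange_weak(old, old | mask, std::memory_order_seq_cst,
                                  std::memory_order_relaxed)) {
  }
  return (old & mask) != 0;
}

// Hypothetical helper: the same operation, but only one bit of the
// fetch_or result is used, so no retry loop is needed.
inline bool test_and_set_single(std::atomic<std::uint32_t> &a, unsigned bit) {
  assert(bit < 32 && "bit index must fit in 32 bits");
  const std::uint32_t mask = std::uint32_t{1} << bit;
  return (a.fetch_or(mask) & mask) != 0;
}
```

Both helpers return the previous value of the bit and leave it set; the difference this issue is about lies only in the machine code a compiler can select for them.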
Much of this seems to work in GCC 12.2.0 as well as in …

I noticed a missed optimization in both g++-12 and clang++-15: some operations involving bit 31 degrade to loops around `lock cmpxchg`.

```cpp
#include <atomic>
#include <cstdint>

// Each function loops on an atomic fetch_or/fetch_and/fetch_xor of the
// constant mask b and tests the previous value of that bit.
template<uint32_t b>
void lock_bts(std::atomic<uint32_t> &a) { while (!(a.fetch_or(b) & b)); }
template<uint32_t b>
void lock_btr(std::atomic<uint32_t> &a) { while (a.fetch_and(~b) & b); }
template<uint32_t b>
void lock_btc(std::atomic<uint32_t> &a) { while (a.fetch_xor(b) & b); }

// bit 30, for comparison
template void lock_bts<1U<<30>(std::atomic<uint32_t> &a);
template void lock_btr<1U<<30>(std::atomic<uint32_t> &a);
template void lock_btc<1U<<30>(std::atomic<uint32_t> &a);
// bug: bit 31 uses lock cmpxchg
template void lock_bts<1U<<31>(std::atomic<uint32_t> &a);
template void lock_btr<1U<<31>(std::atomic<uint32_t> &a);
template void lock_btc<1U<<31>(std::atomic<uint32_t> &a);
```
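Why bit 31 in particular? One plausible contributor, which is an assumption and not something established in this thread, is that LLVM's InstCombine canonicalizes a sign-bit test such as `(x & 0x80000000) != 0` into a signed comparison `x < 0`, so a backend matcher looking for an `and` with a single-bit mask would no longer see that pattern. The equivalence itself can be checked at compile time; `bit31_by_mask` and `bit31_by_sign` are illustrative names only.

```cpp
#include <cstdint>

// Test bit 31 with an explicit single-bit mask (the form written in the source).
constexpr bool bit31_by_mask(std::uint32_t x) {
  return (x & (std::uint32_t{1} << 31)) != 0;
}

// Test bit 31 as a sign check on the same 32 bits viewed as signed.
// The uint32_t -> int32_t conversion wraps modulo 2^32 (guaranteed since
// C++20, and the behavior of mainstream compilers before that).
constexpr bool bit31_by_sign(std::uint32_t x) {
  return static_cast<std::int32_t>(x) < 0;
}

// The two predicates agree on representative values, including both edges.
static_assert(bit31_by_mask(0x00000000u) == bit31_by_sign(0x00000000u), "zero");
static_assert(bit31_by_mask(0x7FFFFFFFu) == bit31_by_sign(0x7FFFFFFFu), "INT32_MAX");
static_assert(bit31_by_mask(0x80000000u) == bit31_by_sign(0x80000000u), "sign bit only");
static_assert(bit31_by_mask(0xFFFFFFFFu) == bit31_by_sign(0xFFFFFFFFu), "all ones");
```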
Extended Description
Hello,
Code:
Clang -O3:
GCC 7+ generates code using the BTS instruction:
For more code examples, see:
https://godbolt.org/g/28nkRu