Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CP15 barrier instructions should be emitted before the exclusives loops #41201

Open
llvmbot opened this issue May 13, 2019 · 4 comments
Open
Labels
backend:ARM bugzilla Issues migrated from bugzilla

Comments

@llvmbot
Copy link
Collaborator

llvmbot commented May 13, 2019

Bugzilla Link 41856
Version trunk
OS Linux
Attachments Never ending loop
Reporter LLVM Bugzilla Contributor
CC @jpalus,@comex,@efriedma-quic,@kbeyls,@plugwash,@smithp35

Extended Description

Symptoms

strex never succeeds and is looping indefinitely.

Environment

  • Linux kernel with CP15_BARRIER_EMULATION=y
  • abi.cp15_barrier set to 1 (emulate)
  • arm-unknown-linux-gnueabihf toolchain

CP15 barrier instructions

Issue description

parking_lot author:

This seems to be closer to an LLVM bug than a parking_lot bug. The source
of the problem is the CP15 emulation in the kernel. Essentially the
mcr p15, #​0x0, r12, c7, c10, #​0x5 is trapping to the kernel every time,
which invalidates the exclusive monitor between the ldrex and strex
instructions. This results in the strex never succeeding and looping
indefinitely.

See attached loop.png image.

ARM engineer (Will Deacon) response on this:

Hi again, Robert,

Just a quick update on this:

  1. CP15 barriers remain deprecated in the Armv8 architecture, and so
    may be removed entirely from future CPUs.

  2. Because of (1), the kernel defaults to trap+emulate, so that it can
    warn about the use of these instructions. I think this is the right
    thing to do because, once the instructions have been removed, we
    will have no choice but to trap+emulate (this happened for the SWP
    instruction already). This trapping will prevent your exclusives loop
    from ever succeeding.

  3. The right place to address this issue is in LLVM, where atomic
    read-modify-write operations with conditional release semantics (i.e.
    release on success) should actually emit the CP15 barrier before the
    exclusives loop. Assuming that contention is rare (which it kind of
    needs to be for performant compare-and-swap anyway), I don't see this
    having a meaningful impact on performance.

I've reached out to one of our upstream LLVM developers, and I'll be talking
with him face-to-face next week about getting this fixed.

Will

Solution

Will's third point:

Atomic read-modify-write operations with conditional release semantics (i.e.
release on success) should actually emit the CP15 barrier before the
exclusives loop. Assuming that contention is rare (which it kind of
needs to be for performant compare-and-swap anyway), I don't see this
having a meaningful impact on performance.

No fix

People aren't / won't be able to use Rust / Swift / ... on Linux with
CP15_BARRIER_EMULATION=y & abi.cp15_barrier=1 (emulation, default value)
& arm-unknown-linux-gnueabihf toolchain.

@llvmbot
Copy link
Collaborator Author

llvmbot commented May 13, 2019

Rust compiler issue: rust-lang/rust#60605

@efriedma-quic
Copy link
Collaborator

I'm not sure I understand why this is being deprecated, but it's not my decision, I guess.

To be clear, the issue here is specifically with weak cmpxchg with release semantics, where the compiler is targeting armv6? (Next time, please include a testcase; it took me a few minutes to work that out based on just this report.)

If the barrier is going to trap to the kernel, we shouldn't be putting between an ldrex and an strex, yes.

If we're using a barrier we know won't trap, like dmb, we want to avoid the barrier where possible, for the sake of performance. For example, if a "release" cmpxchg fails, we don't need a barrier at all. This may not be due to contention; it might just be that the variable is in a state where the cmpxchg will always fail. This is more likely to be relevant for a "strong" cmpxchg.

Independent of where we place the barrier, if we expect that cp15 barriers will trap on future Linux releases, should LLVM be changed so it never uses cp15 barriers on Linux? We can call __sync_synchronize instead.

@efriedma-quic
Copy link
Collaborator

Independent of where we place the barrier, if we expect that cp15 barriers
will trap on future Linux releases, should LLVM be changed so it never uses
cp15 barriers on Linux? We can call __sync_synchronize instead.

Actually, thinking about this a bit more, this isn't really independent. We probably should just expand all non-"monotonic" atomicrmw and cmpxchg operations to _sync* on Linux if we have ldrex, but not dmb. If we have to call a helper for the barrier anyway, there isn't much point to expanding the atomic operation inline.

@plugwash
Copy link
Mannequin

plugwash mannequin commented Jan 21, 2021

Hi, I maintain the raspbian project I was recently directed to this bug as being the root cause of an issue that has been a major PITA in the raspbian project for some time. I decided to take a bit of a look.

Actually, thinking about this a bit more, this isn't really independent. We
probably should just expand all non-"monotonic" atomicrmw and cmpxchg operations
to _sync* on Linux if we have ldrex, but not dmb. If we have to call a helper
for the barrier anyway, there isn't much point to expanding the atomic operation > inline.

I agree, when people build code for "armv6" they generally expect it to work on "armv6 or better" and given the situation with these instructions going forward
avoiding them seems the safest option.

I decided to take a look at this myself, but I was rather out of my depth in terms of both rust and LLVM. After some searching the code I put the following patch in place, which I thought would disable the use of the cp15 barriers.

--- llvm-toolchain-11-11.0.0.orig/llvm/lib/Target/ARM/ARMSubtarget.h
+++ llvm-toolchain-11-11.0.0/llvm/lib/Target/ARM/ARMSubtarget.h
@@ -647,7 +647,9 @@ public:
bool hasAcquireRelease() const { return HasAcquireRelease; }
.
bool hasAnyDataBarrier() const {

  • return HasDataBarrier || (hasV6Ops() && !isThumb());
  • //avoid using armv6 cp15 barriers, they cause more
  • //problems than they solve.
  • return HasDataBarrier;// || (hasV6Ops() && !isThumb());
    }
    .
    bool useMulOps() const { return UseMulOps; }

Unfortunately after building and installing the new llvm and clearing the cargo caches, the parking lot testsuite still hangs, I don't know if this is because my change was ineffective or because there was some other problem.

Am I looking along the right lines?

Does anyone know how to reproduce this issue without involving rust or swift?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:ARM bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

2 participants