Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrated assembler rejects ldrexd/strexd in thumb mode #30406

Closed
sbc100 opened this issue Nov 18, 2016 · 12 comments
Closed

Integrated assembler rejects ldrexd/strexd in thumb mode #30406

sbc100 opened this issue Nov 18, 2016 · 12 comments
Assignees
Labels
bugzilla Issues migrated from bugzilla clang:frontend Language frontend issues, e.g. anything involving "Sema" worksforme Resolved as "works for me"

Comments

@sbc100
Copy link
Collaborator

sbc100 commented Nov 18, 2016

Bugzilla Link 31058
Resolution WORKSFORME
Resolved on Jan 24, 2017 09:32
Version trunk
OS Linux
Blocks #20796
CC @kbeyls,@nico,@rengolin,@frobtech

Extended Description

These instructions are apparently valid in thumb2 mode but clang's integrated asm rejects them:

void func() {
int volatile *ptr;
int ret;
asm volatile("ldrexd %0,[%1]\n"
: "=&r" (ret)
: "r" (ptr)
: "cc", "memory"
);
}

$ clang --target=arm-linux-gnueabihf -march=armv7-a -o test.o -c test.c -mthumb
test.c:4:24: error: instruction requires: arm-mode
asm volatile("ldrexd %0,[%1]\n"
^
:1:2: note: instantiated into assembly here
ldrexd r1,[r0]
^
1 error generated.

With -no-integrated-as the same code compiles fine:

See https://bugs.chromium.org/p/chromium/issues/detail?id=564059

@sbc100
Copy link
Collaborator Author

sbc100 commented Nov 18, 2016

assigned to @rengolin

@frobtech
Copy link
Contributor

Also bites strexd, even the three-operand version that ironically is only fully available in Thumb mode and not in ARM mode. The full original case is:

void foo(long long *ptr, long long value) {
int store_failed;
long long dummy;
asm volatile(
"1:\n"
// Dummy load to lock cache line.
"ldrexd %1, [%3]\n"
"strexd %0, %2, [%3]\n"
"teq %0, #​0\n"
"bne 1b"
: "=&r" (store_failed), "=&r"(dummy)
: "r"(value), "r" (ptr)
: "cc", "memory");
}

@rengolin
Copy link
Member

Interesting... Their definitions are in the table-gen file and not even the ARM description has Requires<[IsARM]>. Must be a bug in the assembler itself, probably matching the wrong instruction.

@kbeyls
Copy link
Collaborator

kbeyls commented Nov 24, 2016

Looking into the issue briefly, I think clang technically isn't wrong.

For LDREXD, the ARMARM only defines the 3-argument assembly instruction: LDREXD{}{} , , []
For STREXD, the ARMARM only defines the 4-argument assembly instruction: STREXD{}{} , , , []

The reproducer added to #30406 shows that the assembly instructions used in that code aren't official ARM syntax, but rather, seemingly, a gas-specific extension with fewer arguments ("ldrexd %1, [%3]\n", "strexd %0, %2, [%3]\n").

The clang integrated assembler tends to stick closer to official assembly syntax and not try to support unofficial extensions.

I agree the error message is confusing.

The difficulty in this case in using the official syntax seems to be on how to represent in inline assembly constraints that you want Rt1 and Rt2 to respectively be the high/low (or vice versa, I didn't try to check which way around) part of a 64-bit value. The closest solution I've found so far to achieve this in inline assembly is something I found in clang's unit tests, i.e. splitting the 64 bit value explicitly into 2 32 bit values at the C level. Not the nicest solution, but it could be a work-around if there are no officially supported inline asm constraints to express the necessary constraints:

int64_t foo(int64_t v, volatile int64_t *p)
{
register uint32_t rl asm("r1");
register uint32_t rh asm("r2");

int64_t r;
uint32_t t;

asm volatile(
"ldrexd%[_rl], %[_rh], [%[_p]]"
: [_rl] "=&r" (rl), [_rh] "=&r" (rh)
: [_p] "p" (p) : "memory");

return r;
}

I think the better solution is to use intrinsics or builtins. Ideally, the ACLE-defined intrinsic __ldrexd and __strexd should be used here. Alas, these don't seem to have been implemented yet in clang. In the mean while, the best solution seems to be using the clang-specific builtins __builtin_arm_strexd and __builtin_arm_ldrexd. I think something like the following is semantically equivalent to the reproducer in the Roland's above comment:

void foo(long long *ptr, long long value) {
int store_failed=1;
long long dummy;
while (store_failed) {
dummy = __builtin_arm_ldrexd(ptr);
store_failed = __builtin_arm_strexd(value, ptr);
}
}

In my opinion, the code using the builtins is cleaner, so that could be the best solution for now to implement this, while waiting for the ACLE-defined intrinsics to get implemented.

@llvmbot
Copy link
Collaborator

llvmbot commented Dec 6, 2016

Hi Kristof,
I suspect Clang may actually support this syntax, as the error disappears when -mthumb is removed from the clang command line. Clang only complains when -mthumb is given to it from the command line.
Cheers,
Matteo

@rengolin
Copy link
Member

Script to test all arch/isa combinations on GAS and MC
If anything, LLVM is too lax on support and allow those instructions to be incorrectly emitted on Thumb1 architectures.

The attached script iterates through all arches, ISAs and EX instructions, trying them both on GAS and LLVM-MC, and the behaviour is mostly coherent, with the following exceptions:

LLVM passes with: echo 'ldrexd r2, r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'strexd r0, r2, r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'ldrexd r2, r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'strexd r0, r2, r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'ldrexd r2, r3, [r4]' | llvm-mc -triple armv6 -o /dev/null
LLVM passes with: echo 'strexd r0, r2, r3, [r4]' | llvm-mc -triple armv6 -o /dev/null
LLVM passes with: echo 'ldrexd r2, r3, [r4]' | llvm-mc -triple armv6t2 -o /dev/null
LLVM passes with: echo 'strexd r0, r2, r3, [r4]' | llvm-mc -triple armv6t2 -o /dev/null
LLVM passes with: echo 'ldrex r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'strex r2, r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'ldrex r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'strex r2, r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'ldrexb r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'strexb r2, r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'ldrexb r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'strexb r2, r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'ldrexb r3, [r4]' | llvm-mc -triple armv6 -o /dev/null
LLVM passes with: echo 'strexb r2, r3, [r4]' | llvm-mc -triple armv6 -o /dev/null
LLVM passes with: echo 'ldrexb r3, [r4]' | llvm-mc -triple armv6t2 -o /dev/null
LLVM passes with: echo 'strexb r2, r3, [r4]' | llvm-mc -triple armv6t2 -o /dev/null
LLVM passes with: echo 'ldrexh r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'strexh r2, r3, [r4]' | llvm-mc -triple armv4 -o /dev/null
LLVM passes with: echo 'ldrexh r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'strexh r2, r3, [r4]' | llvm-mc -triple armv5 -o /dev/null
LLVM passes with: echo 'ldrexh r3, [r4]' | llvm-mc -triple armv6 -o /dev/null
LLVM passes with: echo 'strexh r2, r3, [r4]' | llvm-mc -triple armv6 -o /dev/null
LLVM passes with: echo 'ldrexh r3, [r4]' | llvm-mc -triple armv6t2 -o /dev/null
LLVM passes with: echo 'strexh r2, r3, [r4]' | llvm-mc -triple armv6t2 -o /dev/null
Number of tests: 128
GCC failures: 0
LLVM failures: 28

Basically, all Thumb1 arches, though surprisingly not v6M, "support" exclusive loads and stores.

It seems LLVM is not failing anything that GCC isn't, when it comes to ARM ARM syntax.

@rengolin
Copy link
Member

Kristof, regarding the intrinsics, I've tested the ACLE recommendation and neither Clang (trunk) nor GCC (5.3, 6.3) support them. Is there any other GCC-specific intrinsic that one can do #ifdefs while using versions that do not support the ACLE syntax?

Also, do we have a time frame for the new ACLE version to be out? We would have an elegant solution to this problem...

IIGIR, the problem here is that you can't bind a long double into two registers in inline assembly, for example:

long long value;
long long *ptr;
asm volatile(
"ldrexd %0.lo, %0.hi, [%1]\n"
: "=r" (value)
: "*r" (ptr)
);

But it also seems that LLVM does understand the single-bind syntax, at least in Thumb mode. So, I guess this bug is more about supporting the "specific inline asm need to use long doubles" than deviating from ARM ARM syntax.

Clang does interpret that syntax (in ARM mode) and prints out ARM ARM compatible three-register syntax, but complains in Thumb mode that it's not compatible.

strexd.c:7:8: error: instruction requires: arm-mode
"ldrexd %1, [%3]\n"
strexd.c:8:8: error: instruction requires: arm-mode
"strexd %0, %2, [%3]\n"

which is not true, at least not in Thumb 2.

Given that llvm-mc did accept the syntax for ARMv7*, I believe this is just a bug in the inline assembler logic, not arch support.

@rengolin
Copy link
Member

s/at least in Thumb mode/at least in ARM mode/, obviously.

@rengolin
Copy link
Member

IR file with the inline asm for {ld,st}rexd
Funny enough, trunk now bails on ARM and errors out on Thumb:

$ llc strexd-arm.ll -o -
...
@app
.Ltmp0:
ldrexd
llc: llvm/include/llvm/MC/MCInst.h:75: int64_t llvm::MCOperand::getImm() const: Assertion `isImm() && "This is not an immediate"' failed.

IR file attached.

@llvmbot
Copy link
Collaborator

llvmbot commented Jan 23, 2017

I had another look at the issue some time ago, below are some notes. These notes may benefit whoever is trying to use the GAS syntax for ldrexd (and similar syntax for strexd). Using %0, %H0 it is possible to explicitly provide the two registers that make together the 64-bit value (read below for more info).

Looking at the ARM ARM I can see only one variant for ldrexd, which takes 3-arguments as follows:

ldrexd , , []

When compiling this assembly instruction as Thumb, rN and rM can be choosen separately, without constraints. On the contrary, when compiling for the ARM instruction set, rM is always the register following rN (M = N + 1). In other words, ARM has an encoding for instructions like "ldrexd r0, r1, [r2]", but not for "ldrexd r0, r3, [r2]". Indeed, if you look at the ARM encoding (the bit pattern defining the instruction) you'll see that only N is provided. This is maybe the reason why the GNU assembler accepts a 2-arguments variant:

ldrexd , []

Which it interprets as a "ldrexd , <r(N+1)>, []", both when using ARM mode and when using Thumb mode (interestingly, gdb uses this form when disassemblying instructions in ARM mode). The integrated Clang assembler, on the other hand, seems to only accept this syntax when in ARM mode.

Said this, the one register syntax is far from ideal. What we would ideally want when using asm(...) is to somehow pass to ldrexd both the registers that make the 64-bit entity. Fortunately, there is a way of doing this. I discovered it by searching code that I knew had to solve this same problem (e.g. the Linux kernel implementations in arch/arm/include/asm/atomic.h). This is one example:

static inline long long atomic64_read(const atomic64_t *v)
{
long long result;
asm volatile("@ atomic64_read\n"
" ldrexd %0, %H0, [%1]"
: "=&r" (result)
: "r" (&v->counter), "Qo" (v->counter)
);
return result;
}

So %0, %H0 gives the couple of registers that contain the values for the first argument, when this is 64-bit in size. The only useful documentation I have for the H register operand is the one from the source code at:

https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/arm/arm.c#l21562

Search for "An explanation of the 'Q', 'R' and 'H' register operands". My understanding is that using '%H0' (rather than '%R0') should work on big endian as well. Importantly, it also seems to be correctly understood by Clang.

@rengolin
Copy link
Member

Hi Matteo, Thanks!

That puts away all the concerns in this bug. There are the two new bugs I found which I'll report separately, as neither of them block Chromium.

I have tested and the %H0 syntax works well on Clang (3.9 and trunk) and GCC (5.3 and 6.3), for both ARM and Thumb.

Sam, please check that this solution is good for Chromium, so we can close the bug.

cheers,
--renato

@rengolin
Copy link
Member

Both bugs reported as bug #​31720 and bug #​31721.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@Quuxplusone Quuxplusone added the worksforme Resolved as "works for me" label Jan 20, 2022
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla Issues migrated from bugzilla clang:frontend Language frontend issues, e.g. anything involving "Sema" worksforme Resolved as "works for me"
Projects
None yet
Development

No branches or pull requests

6 participants