windows-arm64 support#8995
Conversation
Credits to MSYS2 Ruby package using this patch.
Fix compilation error when using MSYS2 environment. Credits to MSYS2 Ruby package using this patch.
When adding preprocessor option for llvm-windres (using clang as parameter), it fails. Thus, do not add this. It's needed to be able to compile windows-arm64 version, because MSYS2 toolchain is LLVM based (instead of GCC/binutils).
This fixes "unexpected ucrtbase.dll" for native windows-arm64 ruby binary. It does not solve issue with x64 version emulated on this platform. Value of pioinfo pointer can be found in ucrtbase.dll at latest adrp/add sequence before return of _isatty function. This works for both release and debug ucrt. Due to the nature of aarch64 ISA (vs x86 or x64), it's needed to disassemble instructions to retrieve offset value, which is a bit more complicated than matching specific string patterns. Details about adrp/add usage can be found in this blog post: https://devblogs.microsoft.com/oldnewthing/20220809-00/?p=106955 For instruction decoding, the Arm documentation was used as a reference.
|
For now, 2 tests are failing. I can investigate and fix this, once this PR will be merged. |
|
@dennisameling @lazka You could be interested by some of those patches for MSYS2 Ruby package. It would enable clangarm64 version of it (I didn't try yet to build additional gems). |
|
@nurse You're welcome to review it, as this |
Out of curiosity, would it be possible to remove/replace that __pioinfo mess entirely? The assembly instructions in ucrtbase may change anytime in any Windows version. |
|
@larskanis did something in that area last year: https://bugs.ruby-lang.org/issues/18605#note-6 I don't know what the status there is. |
|
I completely agree that the best solution would be to get rid of For now, it's a tentative to have a pragmatic solution, and hopefully, @ArminG-MSFT can make sure ucrtbase |
|
The patch looks good. Thank you for your contribution! |
|
As lazka quoted and larskanis said, to remove
As far as I understand Python is under slightly different situation because they don't provide HANDLE/SOCKET unification. (their Perl seems similar situation, but they introduced another hack this year (!). Perl/perl5@8552f09 |
|
fyi, the USE_LLVM_WINDRES workaround in this PR should no longer be needed starting with llvm v18, and the relevant llvm changes are backported in MSYS2 as of today -> msys2/MINGW-packages#19524 |
| /* imm = immhi:immlo:Zeros(12), 64 */ | ||
| const uint64_t adrp_imm = ((adrp_immhi << 2) | adrp_immlo) << 12; |
There was a problem hiding this comment.
I'm messing with some similar code, and noticed what I think is a bug here: adrp's immediate should be sign-extended. I think something like
| /* imm = immhi:immlo:Zeros(12), 64 */ | |
| const uint64_t adrp_imm = ((adrp_immhi << 2) | adrp_immlo) << 12; | |
| /* imm = SignExtend(immhi:immlo:Zeros(12), 64) */ | |
| int64_t adrp_imm = (adrp_immhi << 2) | adrp_immlo; | |
| /* sign extend */ | |
| if (adrp_imm & (1 << 20)) | |
| adrp_imm |= ~((1 << 21) - 1); | |
| adrp_imm <<= 12; |
if you really want that to be const I suppose you could do that in a ternary but I don't know if it's more legible that way:
| /* imm = immhi:immlo:Zeros(12), 64 */ | |
| const uint64_t adrp_imm = ((adrp_immhi << 2) | adrp_immlo) << 12; | |
| /* imm = SignExtend(immhi:immlo:Zeros(12), 64) */ | |
| const int64_t adrp_imm = (((adrp_immhi & (1 << 18)) ? ~((1 << 21) - 1) : 0) | (adrp_immhi << 2) | adrp_immlo) << 12; |
There was a problem hiding this comment.
You're right, I missed that, and luckily, we didn't hit the case where adrp immhi sign matters.
I won't push a fix, but you're welcome to contribute it.
There was a problem hiding this comment.
Maybe... The code I'm messing with didn't hit a negative offset either, but I was thinking it wouldn't be very useful if it wasn't signed, so checked the docs instead of just relying on what this code was doing... I didn't check the specifics of the add instruction either, the code I was looking at didn't use that, instead an ldr with an immediate unsigned offset to add in the low-order bits.
There was a problem hiding this comment.
By curiosity, I checked the QEMU source as well, and sign is carefully extended for the immediate when decoding instruction.
| const uint32_t add_id = 0x11000000; | ||
| const uint32_t add_mask = 0x3fc00000; |
There was a problem hiding this comment.
I'm having trouble understanding the add id/mask. I think this is trying to match https://developer.arm.com/documentation/ddi0602/2024-09/Index-by-Encoding/Data-Processing----Immediate?lang=en#addsub_imm.
id 0 0 0 1 0 0 0 1 0 0
mask 0 0 1 1 1 1 1 1 1 1
desc sf op S 1 0 0 0 1 0 sh
So you mask off sf (size, 32 or 64 bit), and op (apparently add vs subtract), and match S must be off (not setting flags) and sh must be off (not shifting by 12 bits). You then later implement checking and handling the sh bit left shifting by 12, but never check or implement the op bit turning it into a subtract. Perhaps you meant mask 0x7f800000 instead? Or am I off by one somewhere?
Anyway, sh should not be set, the adrp sets everything but the least-significant 12 bits, so the purpose of the following add in the pair is to set the lower 12 bits, and shifting the add's immediate by 12 bits would result in it not setting those 12 bits either. So a mask of 0x7fc00000 would probably be fine, and then drop the add_sh handling completely.
There was a problem hiding this comment.
I used doc pages for instructions encoding: add - vs sub.
_mask are the bits that matters to identify encoding (same between sub and add).
_id are the bits expected, i.e. the concrete encoding. (bit 30 allow to differentiate sub vs add).
To identify an add, we need to check bits 23 to 30, which represent the instruction (fixed) encoding.
add_id: 0x11000000 (bits 23 to 30 are 0b00100010, sub_id would be 0b10100010)
add_mask: 0x3fc00000 (0b0111111110000000000000000000000, bits 23 to 30), sub_mask would be the same.
Note: The code included here is not supposed to be a complete disassembler, just a simple way to find a couple of sequences expected (it's supposed that only immediates may be changed in the future). So it makes assumptions (S bit at 0) and is "enough" to work, no more.
There was a problem hiding this comment.
_maskare the bits that matters to identify encoding (same between sub and add)._idare the bits expected, i.e. the concrete encoding. (bit 30 allow to differentiate sub vs add).To identify an add, we need to check bits 23 to 30, which represent the instruction (fixed) encoding.
add_id: 0x11000000 (bits 23 to 30 are 0b00100010,sub_idwould be 0b10100010)add_mask: 0x3fc00000 (0b0111111110000000000000000000000, bits 23 to 30), sub_mask would be the same.
sub x1, x2, #0x159 is:
0xd1056441 is 0b11010001000001010110010001000001
your mask is:
0x3fc00000 is 0b00111111110000000000000000000000
anding them results in:
0x11000000 is the same as add_id.
I think you may be off-by-one in your bits due to them numbering from 0 to 31, so counting from MSB you need to start at 31 not 32.
I still recommend mask of 0x7fc00000 and removing add_sh. I could open a PR for that, if you agree with my reasoning.
There was a problem hiding this comment.
oops, you're right, it seems that the mask is shifted to the right, and missing the op bit which is wrong.
And counting again yesterday, I did the same mistake...
The correct mask I get is 0x7f800000 (not including the sh bit). (0xff << 23, bits 23 to 30). Unluckily, since bit 30 is 0 for add, we missed that because mask exclude it. I would have catch the problem if it was a sub instead.
There was a problem hiding this comment.
Feel free to make a PR for it (and ping me, I can give you a +1).
It was introduced as part of the Arm64-on-Windows patch: ruby#8995 But a few days later it was fixed on the LLVM side for llvm-18 and backported to MSYS2: msys2/MINGW-packages#19157 (comment) Now this code is only unnecessary complexity.
It was introduced as part of the Arm64-on-Windows patch: ruby#8995 But a few days later it was fixed on the LLVM side for llvm-18 and backported to MSYS2: msys2/MINGW-packages#19157 (comment) Now this code is only unnecessary complexity.
It was introduced as part of the Arm64-on-Windows patch: #8995 But a few days later it was fixed on the LLVM side for llvm-18 and backported to MSYS2: msys2/MINGW-packages#19157 (comment) Now this code is only unnecessary complexity.
This series allows to build and run Ruby natively on windows-arm64.
It contains fixes for build system (some of them taken from MSYS2 Ruby package).
Most of the work is detection of
__pioinfopointer, which was the blocking step. Details are available in commit message. This was tested on windows-arm64 11 retail build (not insiders).You can find a build log here, that uses this script from an MSYS2 clangarm64 shell.
I'm not sure if Ruby community accepts a serie of patches, or if several PR are needed. Let me know.
Note: This is an effort part of my work at Linaro.