Skip to content

windows-arm64 support#8995

Merged
nurse merged 4 commits into
ruby:masterfrom
p-b-o:windows-arm64
Nov 23, 2023
Merged

windows-arm64 support#8995
nurse merged 4 commits into
ruby:masterfrom
p-b-o:windows-arm64

Conversation

@p-b-o
Copy link
Copy Markdown
Contributor

@p-b-o p-b-o commented Nov 22, 2023

This series allows to build and run Ruby natively on windows-arm64.

It contains fixes for build system (some of them taken from MSYS2 Ruby package).
Most of the work is detection of __pioinfo pointer, which was the blocking step. Details are available in commit message. This was tested on windows-arm64 11 retail build (not insiders).

You can find a build log here, that uses this script from an MSYS2 clangarm64 shell.

I'm not sure if Ruby community accepts a serie of patches, or if several PR are needed. Let me know.
Note: This is an effort part of my work at Linaro.

Pierrick Bouvier added 4 commits November 21, 2023 18:48
Credits to MSYS2 Ruby package using this patch.
Fix compilation error when using MSYS2 environment.
Credits to MSYS2 Ruby package using this patch.
When adding preprocessor option for llvm-windres (using clang as
parameter), it fails. Thus, do not add this.

It's needed to be able to compile windows-arm64 version, because MSYS2
toolchain is LLVM based (instead of GCC/binutils).
This fixes "unexpected ucrtbase.dll" for native windows-arm64 ruby
binary. It does not solve issue with x64 version emulated on this
platform.

Value of pioinfo pointer can be found in ucrtbase.dll at latest adrp/add
sequence before return of _isatty function. This works for both release
and debug ucrt.

Due to the nature of aarch64 ISA (vs x86 or x64), it's needed to
disassemble instructions to retrieve offset value, which is a bit more
complicated than matching specific string patterns.

Details about adrp/add usage can be found in this blog post:
https://devblogs.microsoft.com/oldnewthing/20220809-00/?p=106955
For instruction decoding, the Arm documentation was used as a reference.
@p-b-o p-b-o changed the title Windows-arm64 support windows-arm64 support Nov 22, 2023
@p-b-o
Copy link
Copy Markdown
Contributor Author

p-b-o commented Nov 22, 2023

For now, 2 tests are failing. I can investigate and fix this, once this PR will be merged.

#209 test_fiber.rb:37: 
     Fiber.new(&Object.method(:class_eval)).resume("foo")
  #=> killed by SIGSEGV (signal 11)  [ruby-dev:34128]
#1524 test_thread.rb:336: 
     g = enum_for(:binding)
     loop { g.next }
  #=> killed by SIGSEGV (signal 11)  [ruby-dev:34128]
FAIL 2/0 tests failed

@p-b-o
Copy link
Copy Markdown
Contributor Author

p-b-o commented Nov 22, 2023

@dennisameling @lazka You could be interested by some of those patches for MSYS2 Ruby package. It would enable clangarm64 version of it (I didn't try yet to build additional gems).

@p-b-o
Copy link
Copy Markdown
Contributor Author

p-b-o commented Nov 23, 2023

@nurse You're welcome to review it, as this __pioinfo stuff should be familiar to you ;).

@Biswa96
Copy link
Copy Markdown

Biswa96 commented Nov 23, 2023

Most of the work is detection of __pioinfo pointer, which was the blocking step.

Out of curiosity, would it be possible to remove/replace that __pioinfo mess entirely? The assembly instructions in ucrtbase may change anytime in any Windows version.

@lazka
Copy link
Copy Markdown

lazka commented Nov 23, 2023

@larskanis did something in that area last year: https://bugs.ruby-lang.org/issues/18605#note-6 I don't know what the status there is.

@p-b-o
Copy link
Copy Markdown
Contributor Author

p-b-o commented Nov 23, 2023

I completely agree that the best solution would be to get rid of pioinfo pointer. Python, Perl, and MinGW (at least) got rid of it from what I investigated. As @lazka mentioned, @larskanis started an implementation here. However, it seems it has been untouched for a while.

For now, it's a tentative to have a pragmatic solution, and hopefully, @ArminG-MSFT can make sure ucrtbase _isatty function will keep the same pattern for windows-arm64 too.

@nurse
Copy link
Copy Markdown
Member

nurse commented Nov 23, 2023

The patch looks good. Thank you for your contribution!

@nurse
Copy link
Copy Markdown
Member

nurse commented Nov 23, 2023

As lazka quoted and larskanis said, to remove __pioninfo it needs to reimplement C standard library I/O functions to use our HANDLE/SOCKET mapping table. Why the table is needed is to emulate Unix-like behavior which unifies file handle and socket. In other words to provide IO.select. The strategy larskanis wrote in https://bugs.ruby-lang.org/issues/18605#note-6 is aligned our (I and usa) understanding. The most tough part is implement CRLF handling (text mode).

I completely agree that the best solution would be to get rid of pioinfo pointer. Python, Perl, and MinGW (at least) got rid of it from what I investigated.

As far as I understand Python is under slightly different situation because they don't provide HANDLE/SOCKET unification. (their select doesn't handle files on Windows) https://docs.python.org/3/library/select.html

Perl seems similar situation, but they introduced another hack this year (!). Perl/perl5@8552f09
It looks interesting and it's worth investigating how stable the strategy is.

@lazka
Copy link
Copy Markdown

lazka commented Dec 24, 2023

fyi, the USE_LLVM_WINDRES workaround in this PR should no longer be needed starting with llvm v18, and the relevant llvm changes are backported in MSYS2 as of today -> msys2/MINGW-packages#19524

Comment thread win32/win32.c
Comment on lines +2669 to +2670
/* imm = immhi:immlo:Zeros(12), 64 */
const uint64_t adrp_imm = ((adrp_immhi << 2) | adrp_immlo) << 12;
Copy link
Copy Markdown
Contributor

@jeremyd2019 jeremyd2019 Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm messing with some similar code, and noticed what I think is a bug here: adrp's immediate should be sign-extended. I think something like

Suggested change
/* imm = immhi:immlo:Zeros(12), 64 */
const uint64_t adrp_imm = ((adrp_immhi << 2) | adrp_immlo) << 12;
/* imm = SignExtend(immhi:immlo:Zeros(12), 64) */
int64_t adrp_imm = (adrp_immhi << 2) | adrp_immlo;
/* sign extend */
if (adrp_imm & (1 << 20))
adrp_imm |= ~((1 << 21) - 1);
adrp_imm <<= 12;

if you really want that to be const I suppose you could do that in a ternary but I don't know if it's more legible that way:

Suggested change
/* imm = immhi:immlo:Zeros(12), 64 */
const uint64_t adrp_imm = ((adrp_immhi << 2) | adrp_immlo) << 12;
/* imm = SignExtend(immhi:immlo:Zeros(12), 64) */
const int64_t adrp_imm = (((adrp_immhi & (1 << 18)) ? ~((1 << 21) - 1) : 0) | (adrp_immhi << 2) | adrp_immlo) << 12;

Copy link
Copy Markdown
Contributor Author

@p-b-o p-b-o Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I missed that, and luckily, we didn't hit the case where adrp immhi sign matters.
I won't push a fix, but you're welcome to contribute it.

Copy link
Copy Markdown
Contributor

@jeremyd2019 jeremyd2019 Nov 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe... The code I'm messing with didn't hit a negative offset either, but I was thinking it wouldn't be very useful if it wasn't signed, so checked the docs instead of just relying on what this code was doing... I didn't check the specifics of the add instruction either, the code I was looking at didn't use that, instead an ldr with an immediate unsigned offset to add in the low-order bits.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By curiosity, I checked the QEMU source as well, and sign is carefully extended for the immediate when decoding instruction.

Comment thread win32/win32.c
Comment on lines +2648 to +2649
const uint32_t add_id = 0x11000000;
const uint32_t add_mask = 0x3fc00000;
Copy link
Copy Markdown
Contributor

@jeremyd2019 jeremyd2019 Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble understanding the add id/mask. I think this is trying to match https://developer.arm.com/documentation/ddi0602/2024-09/Index-by-Encoding/Data-Processing----Immediate?lang=en#addsub_imm.

id   0  0  0  1  0  0  0  1  0  0
mask 0  0  1  1  1  1  1  1  1  1
desc sf op S  1  0  0  0  1  0  sh

So you mask off sf (size, 32 or 64 bit), and op (apparently add vs subtract), and match S must be off (not setting flags) and sh must be off (not shifting by 12 bits). You then later implement checking and handling the sh bit left shifting by 12, but never check or implement the op bit turning it into a subtract. Perhaps you meant mask 0x7f800000 instead? Or am I off by one somewhere?

Anyway, sh should not be set, the adrp sets everything but the least-significant 12 bits, so the purpose of the following add in the pair is to set the lower 12 bits, and shifting the add's immediate by 12 bits would result in it not setting those 12 bits either. So a mask of 0x7fc00000 would probably be fine, and then drop the add_sh handling completely.

Copy link
Copy Markdown
Contributor Author

@p-b-o p-b-o Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used doc pages for instructions encoding: add - vs sub.

_mask are the bits that matters to identify encoding (same between sub and add).
_id are the bits expected, i.e. the concrete encoding. (bit 30 allow to differentiate sub vs add).

To identify an add, we need to check bits 23 to 30, which represent the instruction (fixed) encoding.
add_id: 0x11000000 (bits 23 to 30 are 0b00100010, sub_id would be 0b10100010)
add_mask: 0x3fc00000 (0b0111111110000000000000000000000, bits 23 to 30), sub_mask would be the same.

Note: The code included here is not supposed to be a complete disassembler, just a simple way to find a couple of sequences expected (it's supposed that only immediates may be changed in the future). So it makes assumptions (S bit at 0) and is "enough" to work, no more.

Copy link
Copy Markdown
Contributor

@jeremyd2019 jeremyd2019 Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_mask are the bits that matters to identify encoding (same between sub and add). _id are the bits expected, i.e. the concrete encoding. (bit 30 allow to differentiate sub vs add).

To identify an add, we need to check bits 23 to 30, which represent the instruction (fixed) encoding. add_id: 0x11000000 (bits 23 to 30 are 0b00100010, sub_id would be 0b10100010) add_mask: 0x3fc00000 (0b0111111110000000000000000000000, bits 23 to 30), sub_mask would be the same.

sub x1, x2, #0x159 is:
0xd1056441 is 0b11010001000001010110010001000001
your mask is:
0x3fc00000 is 0b00111111110000000000000000000000
anding them results in:
0x11000000 is the same as add_id.

I think you may be off-by-one in your bits due to them numbering from 0 to 31, so counting from MSB you need to start at 31 not 32.

I still recommend mask of 0x7fc00000 and removing add_sh. I could open a PR for that, if you agree with my reasoning.

Copy link
Copy Markdown
Contributor Author

@p-b-o p-b-o Nov 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, you're right, it seems that the mask is shifted to the right, and missing the op bit which is wrong.
And counting again yesterday, I did the same mistake...

The correct mask I get is 0x7f800000 (not including the sh bit). (0xff << 23, bits 23 to 30). Unluckily, since bit 30 is 0 for add, we missed that because mask exclude it. I would have catch the problem if it was a sub instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to make a PR for it (and ping me, I can give you a +1).

larskanis added a commit to larskanis/ruby that referenced this pull request Dec 18, 2024
It was introduced as part of the Arm64-on-Windows patch:
  ruby#8995

But a few days later it was fixed on the LLVM side for llvm-18 and backported to MSYS2:
  msys2/MINGW-packages#19157 (comment)

Now this code is only unnecessary complexity.
hsbt pushed a commit to larskanis/ruby that referenced this pull request Jan 15, 2026
It was introduced as part of the Arm64-on-Windows patch:
  ruby#8995

But a few days later it was fixed on the LLVM side for llvm-18 and backported to MSYS2:
  msys2/MINGW-packages#19157 (comment)

Now this code is only unnecessary complexity.
hsbt pushed a commit that referenced this pull request Jan 15, 2026
It was introduced as part of the Arm64-on-Windows patch:
  #8995

But a few days later it was fixed on the LLVM side for llvm-18 and backported to MSYS2:
  msys2/MINGW-packages#19157 (comment)

Now this code is only unnecessary complexity.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants