YJIT: optimized codegen for rb_ary_empty_p #7242

Merged
merged 3 commits into from Feb 9, 2023

maximecb
Contributor

@maximecb maximecb commented Feb 3, 2023

This starts by factoring out the code that reads the length of an array and checking that the change passes CI tests. We were repeating some ops to work around register allocation, but I think we have added more registers since then, so that should no longer be needed.

Codegen before

== BLOCK 2/3, ISEQ RANGE [2,4), 196 bytes =====================
  # regenerate_branch
  # Insn: opt_empty_p
  # call to Array#empty?
  # guard not immediate
  # regenerate_branch
  0x10fe60064: test byte ptr [rbx], 7
  0x10fe60067: jne 0x10fe62072
  0x10fe6006d: cmp qword ptr [rbx], 0
  0x10fe60071: je 0x10fe62086
  0x10fe60077: mov rax, qword ptr [rbx]
  # guard known class
  0x10fe6007a: movabs rcx, 0x10e950c10
  0x10fe60084: cmp qword ptr [rax + 8], rcx
  0x10fe60088: jne 0x10fe6209a
  # RUBY_VM_CHECK_INTS(ec)
  0x10fe6008e: mov eax, dword ptr [r12 + 0x20]
  0x10fe60093: test eax, eax
  0x10fe60095: jne 0x10fe62051
  # stack overflow check
  0x10fe6009b: lea rax, [rbx + 0xa8]
  0x10fe600a2: cmp r13, rax
  0x10fe600a5: jbe 0x10fe62051
  # save PC to CFP
  0x10fe600ab: movabs rax, 0x6000035283e0
  0x10fe600b5: mov qword ptr [r13], rax
  0x10fe600b9: lea rax, [rbx + 0x20]
  # push cme, specval, frame type
  0x10fe600bd: movabs rcx, 0x10d8fda60
  0x10fe600c7: mov qword ptr [rax - 0x18], rcx
  0x10fe600cb: mov qword ptr [rax - 0x10], 0
  0x10fe600d3: mov qword ptr [rax - 8], 0x55550083
  # push callee control frame
  0x10fe600db: mov qword ptr [r13 - 0x40], 0
  0x10fe600e3: mov qword ptr [r13 - 0x10], rax
  0x10fe600e7: mov qword ptr [r13 - 0x38], rax
  0x10fe600eb: mov qword ptr [r13 - 0x30], 0
  0x10fe600f3: mov rcx, qword ptr [rbx]
  0x10fe600f6: mov qword ptr [r13 - 0x28], rcx
  0x10fe600fa: mov qword ptr [r13 - 0x18], 0
  0x10fe60102: sub rax, 8
  0x10fe60106: mov qword ptr [r13 - 0x20], rax
  # switch to new CFP
  0x10fe6010a: lea rax, [r13 - 0x40]
  0x10fe6010e: mov qword ptr [r12 + 0x10], rax
  0x10fe60113: lea rax, [rbx + 8]
  # call C function
  0x10fe60117: mov rdi, qword ptr [rax - 8]
  0x10fe6011b: call 0x102ecbc60
  0x10fe60120: mov qword ptr [rbx], rax
  0x10fe60123: mov qword ptr [r12 + 0x10], r13

Codegen after

This is less than half the code size (90 bytes versus 196), and there is no longer an interrupt check, a stack overflow check, or a function call.

== BLOCK 2/3, ISEQ RANGE [2,4), 90 bytes ======================
  # regenerate_branch
  # Insn: opt_empty_p
  # call to Array#empty?
  # guard not immediate
  # regenerate_branch
  0x10ff90064: test byte ptr [rbx], 7
  0x10ff90067: jne 0x10ff92072
  0x10ff9006d: cmp qword ptr [rbx], 0
  0x10ff90071: je 0x10ff92086
  0x10ff90077: mov rax, qword ptr [rbx]
  # guard known class
  0x10ff9007a: movabs rcx, 0x10ea80c00
  0x10ff90084: cmp qword ptr [rax + 8], rcx
  0x10ff90088: jne 0x10ff9209a
  0x10ff9008e: mov rax, qword ptr [rbx]
  # get array length for embedded or heap
  0x10ff90091: mov rcx, qword ptr [rax]
  0x10ff90094: and rcx, 0x3f8000
  0x10ff9009b: sar rcx, 0xf
  0x10ff9009f: test word ptr [rax], 0x2000
  0x10ff900a4: cmove rcx, qword ptr [rax + 0x10]
  0x10ff900a9: cmp rcx, 0
  0x10ff900ad: mov eax, 0x14
  0x10ff900b2: mov ecx, 0
  0x10ff900b7: cmovne rax, rcx
  0x10ff900bb: mov qword ptr [rbx], rax
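
For readers who would rather not trace the assembly, the length read and the emptiness test in the listing above can be modelled in plain Rust. This is a rough sketch, not the code in yjit/src/codegen.rs: the `ArrayHeader` struct and the function names are hypothetical, the constant names are borrowed from CRuby's array headers as I understand them, and the values are copied from the immediates in the disassembly. The struct models only the two words the generated code reads (flags at offset 0, heap length at offset 0x10) and is not layout-accurate.

```rust
// Values copied from the immediates in the disassembly; names are assumptions.
const RARRAY_EMBED_FLAG: u64 = 0x2000; // `test word ptr [rax], 0x2000`
const RARRAY_EMBED_LEN_MASK: u64 = 0x3f8000; // `and rcx, 0x3f8000`
const RARRAY_EMBED_LEN_SHIFT: u32 = 15; // `sar rcx, 0xf`
const QTRUE: u64 = 0x14; // `mov eax, 0x14`
const QFALSE: u64 = 0; // `mov ecx, 0`

/// Hypothetical stand-in for the two fields the generated code reads.
struct ArrayHeader {
    flags: u64,    // loaded from [rax]
    heap_len: u64, // loaded from [rax + 0x10], meaningful only for heap arrays
}

/// Mirrors "get array length for embedded or heap".
fn array_len(a: &ArrayHeader) -> u64 {
    // Speculatively decode the embedded length from the flag bits.
    let embedded_len = (a.flags & RARRAY_EMBED_LEN_MASK) >> RARRAY_EMBED_LEN_SHIFT;
    // `cmove` replaces it with the heap length when the embed flag is clear.
    if a.flags & RARRAY_EMBED_FLAG == 0 {
        a.heap_len
    } else {
        embedded_len
    }
}

/// Mirrors the final `cmp` / `cmovne` pair: Qtrue when the length is zero.
fn array_empty_p(a: &ArrayHeader) -> u64 {
    if array_len(a) == 0 {
        QTRUE
    } else {
        QFALSE
    }
}
```

The generated code uses conditional moves for both selections, so the fast path has no branches after the class guard; the `if` expressions here are just the readable equivalent.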

@maximecb maximecb marked this pull request as draft February 3, 2023 16:08
@matzbot matzbot requested a review from a team February 3, 2023 16:08
@maximecb maximecb marked this pull request as ready for review February 9, 2023 18:11
@matzbot matzbot requested a review from a team February 9, 2023 18:11
Co-authored-by: Takashi Kokubun <takashikkbn@gmail.com>
@maximecb maximecb merged commit 810aeb2 into ruby:master Feb 9, 2023
@maximecb maximecb deleted the yjit_ary_empty branch February 9, 2023 20:14
@k0kubun k0kubun changed the title YJIT: optimized codegen for rb_ary_empty_p (WIP) YJIT: optimized codegen for rb_ary_empty_p Mar 7, 2023