Spill fewer temps on iv writes #9974

Merged: tenderlove merged 4 commits into ruby:master from fewer-spills on Feb 16, 2024

tenderlove (Member)

Not all IV writes require calling a C function. If we don't need to execute a write barrier (i.e. the written value is an immediate) and we don't need to expand the object to make room for a new IV, we can skip the C call and avoid spilling temps.
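A minimal sketch of the decision described above, modeled in Ruby for illustration. IvarWrite and ivar_write_strategy are hypothetical names, not YJIT's actual Rust code; the sketch only captures the two conditions under which the write needs a C call, and therefore a temp spill.

```ruby
# Hypothetical model of when an IV write needs a C call. The names are
# illustrative only; YJIT's real codegen is written in Rust.
IvarWrite = Struct.new(:value_immediate, :needs_expansion, keyword_init: true)

# Returns :inline_store when the write can be emitted as plain stores,
# or :spill_and_call_c when a C function must be called. In this model,
# temps held in registers are spilled to the VM stack only on the C-call path.
def ivar_write_strategy(write)
  # Immediates (e.g. fixnums, nil, true, false) are not heap objects,
  # so storing one never needs a GC write barrier.
  needs_write_barrier = !write.value_immediate

  if needs_write_barrier || write.needs_expansion
    :spill_and_call_c
  else
    :inline_store
  end
end

# `@a = 1` on an object with spare IV capacity: no C call, no spill.
p ivar_write_strategy(IvarWrite.new(value_immediate: true, needs_expansion: false))
# => :inline_store

# Storing a heap object (e.g. a String) needs a write barrier.
p ivar_write_strategy(IvarWrite.new(value_immediate: false, needs_expansion: false))
# => :spill_and_call_c
```

Either condition alone is enough to force the slow path, which is why the check is a simple OR.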

matzbot requested a review from a team on February 15, 2024 02:06
tenderlove (Member, Author) commented Feb 15, 2024

I tried this code:

class Foo
  def initialize
    @a = 1
    @b = 1
    @c = 1
  end
end

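def foo
  # Instantiate Foo enough times for YJIT to compile Foo#initialize
  50.times { Foo.new }
end

foo
# Print the machine code YJIT generated for Foo#initialize
# (requires a YJIT dev build, --enable-yjit=dev, run with --yjit)
puts RubyVM::YJIT.disasm(Foo.instance_method(:initialize))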

Disasm before this patch:

TOTAL INLINE CODE SIZE: 192 bytes
== BLOCK 1/4, ISEQ RANGE [0,4), 4 bytes =======================
  # Insn: 0000 putobject_INT2FIX_1_ (stack_size: 0)
  # reg_temps: 00000000 -> 00000001
  0x1088c43d8: mov x1, #3

== BLOCK 2/4, ISEQ RANGE [1,8), 60 bytes ======================
  # regenerate_branch
  # Insn: 0001 setinstancevariable (stack_size: 1)
  # regenerate_branch
  0x1088c43dc: ldur x11, [x19, #0x18]
  # guard object is heap
  0x1088c43e0: tst x11, #7
  0x1088c43e4: b.ne #0x1088c6614
  0x1088c43e8: cmp x11, #0
  0x1088c43ec: b.eq #0x1088c6614
  # guard shape
  0x1088c43f0: ldur w12, [x11, #4]
  0x1088c43f4: cmp w12, #5
  0x1088c43f8: b.ne #0x1088c65ac
  0x1088c43fc: nop 
  # spill_temps: 00000001 -> 00000000
  0x1088c4400: stur x1, [x21]
  # write IV
  0x1088c4404: ldur x12, [x21]
  0x1088c4408: stur x12, [x11, #0x10]
  # write shape
  0x1088c440c: mov x12, #0x17
  0x1088c4410: stur w12, [x11, #4]
  # Insn: 0004 putobject_INT2FIX_1_ (stack_size: 0)
  # reg_temps: 00000000 -> 00000001
  0x1088c4414: mov x1, #3

== BLOCK 3/4, ISEQ RANGE [5,13), 48 bytes =====================
  # regenerate_branch
  # Insn: 0005 setinstancevariable (stack_size: 1)
  # regenerate_branch
  0x1088c4418: ldur x11, [x19, #0x18]
  # guard shape
  0x1088c441c: ldur w12, [x11, #4]
  0x1088c4420: cmp w12, #0x17
  0x1088c4424: b.ne #0x1088c664c
  0x1088c4428: nop 
  # spill_temps: 00000001 -> 00000000
  0x1088c442c: stur x1, [x21]
  # write IV
  0x1088c4430: ldur x12, [x21]
  0x1088c4434: stur x12, [x11, #0x18]
  # write shape
  0x1088c4438: mov x12, #0x18
  0x1088c443c: stur w12, [x11, #4]
  # Insn: 0008 putobject_INT2FIX_1_ (stack_size: 0)
  # reg_temps: 00000000 -> 00000001
  0x1088c4440: mov x1, #3
  # Insn: 0009 dup (stack_size: 1)
  # reg_temps: 00000001 -> 00000011
  0x1088c4444: mov x9, x1

== BLOCK 4/4, ISEQ RANGE [10,14), 80 bytes ====================
  # regenerate_branch
  # Insn: 0010 setinstancevariable (stack_size: 2)
  # regenerate_branch
  0x1088c4448: ldur x11, [x19, #0x18]
  # guard shape
  0x1088c444c: ldur w12, [x11, #4]
  0x1088c4450: cmp w12, #0x18
  0x1088c4454: b.ne #0x1088c66b8
  0x1088c4458: nop 
  # spill_temps: 00000011 -> 00000000
  0x1088c445c: stur x1, [x21]
  0x1088c4460: stur x9, [x21, #8]
  # write IV
  0x1088c4464: ldur x12, [x21, #8]
  0x1088c4468: stur x12, [x11, #0x20]
  # write shape
  0x1088c446c: mov x12, #0x19
  0x1088c4470: stur w12, [x11, #4]
  # Insn: 0013 leave (stack_size: 1)
  # RUBY_VM_CHECK_INTS(ec)
  0x1088c4474: ldur w11, [x20, #0x20]
  0x1088c4478: tst w11, w11
  0x1088c447c: b.ne #0x1088c66f0
  # pop stack frame
  0x1088c4480: adds x11, x19, #0x38
  0x1088c4484: mov x19, x11
  0x1088c4488: stur x19, [x20, #0x10]
  0x1088c448c: ldur x0, [x21]
  0x1088c4490: ldur x11, [x19, #-8]
  0x1088c4494: br x11

Disasm after this patch:

TOTAL INLINE CODE SIZE: 164 bytes
== BLOCK 1/4, ISEQ RANGE [0,4), 4 bytes =======================
  # Insn: 0000 putobject_INT2FIX_1_ (stack_size: 0)
  # reg_temps: 00000000 -> 00000001
  0x1050e83d8: mov x1, #3

== BLOCK 2/4, ISEQ RANGE [1,8), 52 bytes ======================
  # regenerate_branch
  # Insn: 0001 setinstancevariable (stack_size: 1)
  # regenerate_branch
  0x1050e83dc: ldur x11, [x19, #0x18]
  # guard object is heap
  0x1050e83e0: tst x11, #7
  0x1050e83e4: b.ne #0x1050ea614
  0x1050e83e8: cmp x11, #0
  0x1050e83ec: b.eq #0x1050ea614
  # guard shape
  0x1050e83f0: ldur w12, [x11, #4]
  0x1050e83f4: cmp w12, #5
  0x1050e83f8: b.ne #0x1050ea5ac
  0x1050e83fc: nop 
  # write IV
  0x1050e8400: stur x1, [x11, #0x10]
  # write shape
  0x1050e8404: mov x12, #0x17
  0x1050e8408: stur w12, [x11, #4]
  # Insn: 0004 putobject_INT2FIX_1_ (stack_size: 0)
  # reg_temps: 00000000 -> 00000001
  0x1050e840c: mov x1, #3

== BLOCK 3/4, ISEQ RANGE [5,13), 40 bytes =====================
  # regenerate_branch
  # Insn: 0005 setinstancevariable (stack_size: 1)
  # regenerate_branch
  0x1050e8410: ldur x11, [x19, #0x18]
  # guard shape
  0x1050e8414: ldur w12, [x11, #4]
  0x1050e8418: cmp w12, #0x17
  0x1050e841c: b.ne #0x1050ea64c
  0x1050e8420: nop 
  # write IV
  0x1050e8424: stur x1, [x11, #0x18]
  # write shape
  0x1050e8428: mov x12, #0x18
  0x1050e842c: stur w12, [x11, #4]
  # Insn: 0008 putobject_INT2FIX_1_ (stack_size: 0)
  # reg_temps: 00000000 -> 00000001
  0x1050e8430: mov x1, #3
  # Insn: 0009 dup (stack_size: 1)
  # reg_temps: 00000001 -> 00000011
  0x1050e8434: mov x9, x1

== BLOCK 4/4, ISEQ RANGE [10,14), 68 bytes ====================
  # regenerate_branch
  # Insn: 0010 setinstancevariable (stack_size: 2)
  # regenerate_branch
  0x1050e8438: ldur x11, [x19, #0x18]
  # guard shape
  0x1050e843c: ldur w12, [x11, #4]
  0x1050e8440: cmp w12, #0x18
  0x1050e8444: b.ne #0x1050ea6b8
  0x1050e8448: nop 
  # write IV
  0x1050e844c: stur x9, [x11, #0x20]
  # write shape
  0x1050e8450: mov x12, #0x19
  0x1050e8454: stur w12, [x11, #4]
  # Insn: 0013 leave (stack_size: 1)
  # RUBY_VM_CHECK_INTS(ec)
  0x1050e8458: ldur w11, [x20, #0x20]
  0x1050e845c: tst w11, w11
  0x1050e8460: b.ne #0x1050ea6f0
  # pop stack frame
  0x1050e8464: adds x11, x19, #0x38
  0x1050e8468: mov x19, x11
  0x1050e846c: stur x19, [x20, #0x10]
  0x1050e8470: mov x0, x1
  0x1050e8474: ldur x11, [x19, #-8]
  0x1050e8478: br x11

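The code that spills temps to the VM stack and reloads them before each IV store is eliminated, and the leave instruction now returns the value directly from x1 instead of loading it from [x21]. Total inline code size drops from 192 to 164 bytes.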
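The savings can be checked against the per-block sizes reported in the two dumps. This small Ruby calculation assumes only that every AArch64 instruction is 4 bytes, so each 4-byte reduction is one instruction removed.

```ruby
# Inline code size per block (bytes), copied from the two disassembly dumps.
before = [4, 60, 48, 80]
after  = [4, 52, 40, 68]

# AArch64 instructions are 4 bytes each, so the per-block difference
# divided by 4 is the number of instructions removed.
saved_bytes = before.zip(after).map { |b, a| b - a }
saved_insns = saved_bytes.map { |bytes| bytes / 4 }

puts "total: #{before.sum} -> #{after.sum} bytes"        # total: 192 -> 164 bytes
puts "saved per block (insns): #{saved_insns.inspect}"   # saved per block (insns): [0, 2, 2, 3]
```

Blocks 2 and 3 each lose one spill store and one reload (2 instructions); block 4 loses two spill stores and one reload (3 instructions). The leave sequence stays the same size because ldur x0, [x21] is replaced by mov x0, x1.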

Review comments on yjit/src/codegen.rs were resolved.
k0kubun (Member) left a comment

I ended up rewriting the whole patch, but this looks good :trollface:

This does skip spilling temps when ivar_index is known and the written value is an immediate. This seems like the only obvious improvement on the current setivar.

tenderlove (Member, Author)
@k0kubun lol thank you!

@tenderlove tenderlove merged commit 9d81741 into ruby:master Feb 16, 2024
97 checks passed
@tenderlove tenderlove deleted the fewer-spills branch February 16, 2024 00:38
k0kubun (Member) commented Feb 17, 2024

As Maxime shared, this was a nice change for Optcarrot.

before: ruby 3.4.0dev (2024-02-15T23:04:38Z master 1b9b960963) +YJIT [x86_64-linux]
after: ruby 3.4.0dev (2024-02-16T00:38:21Z master 9d81741f27) +YJIT [x86_64-linux]

---------  -----------  ----------  ----------  ----------  -------------  ------------
bench      before (ms)  stddev (%)  after (ms)  stddev (%)  after 1st itr  before/after
optcarrot  1524.8       0.7         1490.6      0.8         1.02           1.02
---------  -----------  ----------  ----------  ----------  -------------  ------------
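The before/after column is the ratio of the two mean iteration times; a quick Ruby check also expresses the same result as the fraction of runtime saved.

```ruby
# Mean iteration times (ms) from the yjit-bench table above.
before_ms = 1524.8
after_ms  = 1490.6

speedup   = before_ms / after_ms                 # the before/after column
reduction = (before_ms - after_ms) / before_ms   # fraction of time saved

puts format("speedup: %.2fx", speedup)             # speedup: 1.02x
puts format("time saved: %.1f%%", reduction * 100) # time saved: 2.2%
```

A 2% gain is within a few stddevs of noise for a single run, but the table's stddev of 0.7 to 0.8% suggests it is a real improvement.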

maximecb (Contributor)
Railsbench is also 75% faster on speed.yjit.org now. Liquid-render 2.56x, pretty impressive!

Though Alan's work on nibbling at exits may have something to do with that too :)
