optimize gc sweep #1049

Open
wants to merge 10 commits into
from

kazuho commented Oct 8, 2015

This PR optimizes the gc_page_sweep function.

Consider the case of sweeping a string.

On trunk, the code path is: switch -> call obj_free -> switch -> switch -> call rb_str_free (note that obj_free does not get inlined, even though it is marked inline).
This PR changes the code path to: call obj_free_string (via table lookup) -> jmp rb_str_free.
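
To illustrate the difference between the two code paths, here is a minimal, self-contained sketch of switch-based dispatch versus table-based dispatch. It is not the code in this PR: the object model, the `freed` field, and every name except `obj_free_string` (`str_free`, `ary_free`, `free_by_switch`, `free_by_table`, `free_table`) are hypothetical stand-ins chosen for the example.

```c
#include <stddef.h>

/* Hypothetical object model; illustrative only, not CRuby's RVALUE. */
enum obj_type { T_STR, T_ARY, T_COUNT };

struct obj {
    enum obj_type type;
    int freed;   /* set by the free routine so the effect is observable */
};

/* Leaf routines, standing in for rb_str_free and friends. */
static void str_free(struct obj *o) { o->freed = 1; }
static void ary_free(struct obj *o) { o->freed = 2; }

/* Trunk-like shape: the caller goes through a function that
 * switches on the type before reaching the leaf routine. */
static void free_by_switch(struct obj *o)
{
    switch (o->type) {
      case T_STR: str_free(o); break;
      case T_ARY: ary_free(o); break;
      default: break;
    }
}

/* PR-like shape: one small handler per type. The call to the leaf
 * routine is in tail position, so an optimizing compiler may emit
 * a plain jmp (or inline it) instead of a call. */
static void obj_free_string(struct obj *o) { str_free(o); }
static void obj_free_array(struct obj *o)  { ary_free(o); }

typedef void (*free_func)(struct obj *);

/* Table indexed by type tag; the lookup replaces the switch chain. */
static const free_func free_table[T_COUNT] = {
    [T_STR] = obj_free_string,
    [T_ARY] = obj_free_array,
};

static void free_by_table(struct obj *o)
{
    free_table[o->type](o);
}
```

Both variants produce the same result for a given object; the table version trades the nested branches for one indirect call, which is the effect described above. Whether the tail call actually becomes a jmp depends on the compiler and optimization level.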

On my MacBook Pro with GCC 5.1.0, the following synthetic benchmark went from 7.57 sec (trunk) to 7.21 sec (this PR).

require "benchmark"
N = 100_000_000
GC.start; GC.start # magic
Benchmark.bm{|bm|
  bm.report{N.times{str = "a"}}
}

The numbers for the GC benchmarks changed as follows:

| benchmark | trunk (user time / gc time) | this PR (user time / gc time) |
| --- | --- | --- |
| gcbench-rdoc | 97.5 / 5.27 | 95.0 / 5.15 |
| gcbench-aobench | 72.3 / 4.15 | 69.3 / 3.98 |

Please take the changes in user time with a grain of salt; there was some variation between repeated runs, and I am not sure why the numbers changed. Does Ruby's incremental GC refer to some timer? If so, the reduced number of sweep runs (since each one is faster) would lead to better CPU cache usage, which would explain the gain. If not, it could be due to better use of the branch prediction unit.
OTOH the gc time numbers observed were mostly consistent.

hsbt assigned ko1 Oct 8, 2015

Contributor

ko1 commented Nov 5, 2016

We'll need to employ this technique. Please give me some time.
Sorry for the wait.
